The future of LLM costs

By Alan Zabihi

The future of Large Language Model (LLM) costs has become a subject of intense debate in the AI community, with experts presenting starkly different visions of what lies ahead. Understanding these competing perspectives, and the forces that support each view, is crucial for organizations planning their AI strategies.

Competing expert perspectives

On one side, Anthropic CEO Dario Amodei presents a vision of increasing costs, predicting that training advanced AI models could reach $100 billion per model in the coming years. This view is supported by current trends: Google's PaLM required $9-17 million to train, while the newer Gemini's training costs reached $30-191 million, excluding staff salaries. As models grow more sophisticated and incorporate advanced reasoning capabilities, like those seen in OpenAI's o3, their computational requirements—and associated costs—continue to rise.

In contrast, Chamath Palihapitiya argues that we're witnessing the powerful deflationary effects of technology in AI. He points to examples like DeepSeek v3, which offers high-performance capabilities at just $0.28 per million output tokens. This view suggests that while we're currently in an AI investment hype cycle, market forces will inevitably drive down costs, potentially creating challenges for companies that have made massive upfront investments in AI development.

The energy factor

The energy demands of AI represent a critical and growing cost consideration. The computational power needed to sustain AI is doubling approximately every 100 days, creating significant implications for both operational costs and environmental sustainability. To put this in perspective, even if AI model efficiency improves tenfold, the demand for computational power could still increase by up to 10,000 times.
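As a rough check on what a 100-day doubling time implies, the sketch below converts it into an annual growth factor and into the time needed to reach a 10,000-fold increase. It assumes steady exponential growth and a 365-day year, neither of which the quoted figures guarantee, and the function names are illustrative only.

```python
import math

DOUBLING_DAYS = 100  # doubling period quoted in the text


def growth_factor(days, doubling_days=DOUBLING_DAYS):
    """Multiplicative growth after `days`, assuming steady exponential growth."""
    return 2 ** (days / doubling_days)


def days_to_reach(multiple, doubling_days=DOUBLING_DAYS):
    """Days needed to grow by `multiple` at the same steady rate."""
    return doubling_days * math.log2(multiple)


# Assumption: a 365-day year.
annual_factor = growth_factor(365)
years_to_10000x = days_to_reach(10_000) / 365

print(f"Growth per year: about {annual_factor:.1f}x")
print(f"Years to reach 10,000x: about {years_to_10000x:.1f}")
```

Under these assumptions, demand grows roughly 12.5-fold per year and would reach 10,000 times today's level in a little over three and a half years.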

The scale of energy consumption is substantial. For example, with just 100 million weekly active users each making five queries per week, GPT-3.5's annual energy consumption reaches 44,200 MWh – equivalent to powering 4,150 U.S. households for a year. Energy intensity also varies significantly with model size: larger models like CodeLlama 70B consume dramatically more energy than their smaller counterparts, which directly raises operational costs.
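To make the GPT-3.5 figure more concrete, the following back-of-the-envelope sketch derives the energy per query and the per-household consumption implied by the numbers above. It assumes the five queries are per user, that a year has 52 weeks, and that the 44,200 MWh figure covers exactly this query volume; the variable names are illustrative.

```python
USERS = 100_000_000          # weekly active users (from the text)
QUERIES_PER_USER_WEEK = 5    # from the text, read as per user (assumption)
WEEKS_PER_YEAR = 52          # assumption
ANNUAL_ENERGY_MWH = 44_200   # from the text
HOUSEHOLDS = 4_150           # from the text

WH_PER_MWH = 1_000_000       # 1 MWh = 1,000,000 Wh

queries_per_year = USERS * QUERIES_PER_USER_WEEK * WEEKS_PER_YEAR
wh_per_query = ANNUAL_ENERGY_MWH * WH_PER_MWH / queries_per_year
mwh_per_household = ANNUAL_ENERGY_MWH / HOUSEHOLDS

print(f"Queries per year: {queries_per_year:,}")
print(f"Energy per query: {wh_per_query:.2f} Wh")
print(f"Implied household use: {mwh_per_household:.1f} MWh per year")
```

On these assumptions the figures work out to roughly 1.7 Wh per query and about 10.7 MWh per household per year.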

Looking ahead, U.S. data centers housing these models are projected to consume about 88 terawatt-hours annually by 2030 – 1.6 times the electricity consumption of New York City in 2023. This surge in demand is expected to drive up energy prices in major data center regions, with some areas potentially seeing increases of up to 70% over the next decade.

Hardware economics

Training modern language models requires specialized hardware like GPUs and TPUs, with NVIDIA's Blackwell AI chips costing $30,000-$40,000 each. Renting an 8-GPU H100 cluster from hyperscalers can cost between $50 and $150 per hour, which works out to $36,000 to $108,000 per month of continuous usage.
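The monthly figures follow directly from the hourly rates. A minimal sketch of the arithmetic, assuming a 30-day month and round-the-clock usage (the `utilization` parameter is a hypothetical addition for estimating partial usage):

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30      # assumption: 30-day month
GPUS_PER_CLUSTER = 8     # 8-GPU H100 cluster from the text


def monthly_cost(hourly_rate, utilization=1.0):
    """Monthly rental cost in USD for a cluster billed by the hour."""
    return hourly_rate * HOURS_PER_DAY * DAYS_PER_MONTH * utilization


def per_gpu_hour(hourly_rate):
    """Cluster hourly rate spread evenly across its GPUs."""
    return hourly_rate / GPUS_PER_CLUSTER


for rate in (50, 150):
    print(
        f"${rate}/hour -> ${monthly_cost(rate):,.0f}/month, "
        f"${per_gpu_hour(rate):.2f} per GPU-hour"
    )
```

Spreading the cluster rate across its eight GPUs gives $6.25 to $18.75 per GPU-hour, which is a convenient unit for comparing providers.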

Two opposing forces are at work: while technological advancement typically leads to more efficient and cheaper computing over time, growing demand and increasing model complexity create upward pressure on prices and requirements. The energy requirements of this hardware add another layer of cost complexity, as data centers seek locations with average energy costs below $0.07 per kilowatt-hour, with some operations securing prices below $0.04 per kilowatt-hour in areas with hydroelectric power generation.

Data and compliance costs

Data represents another crucial cost factor. This isn't just about quantity – it's about quality, consistency, security, compliance, and legal considerations. Organizations must navigate complex intellectual property rights, data protection regulations, licensing agreements, and potential liability issues. The cost of ensuring legal compliance and managing IP rights for training data has become a significant component of overall LLM development expenses.

The inference challenge

While training costs often dominate discussions, inference costs are becoming increasingly significant. Models like OpenAI's o3, which use reasoning steps and chain-of-thought approaches, significantly increase compute requirements at inference time. Aggregated over a model's lifetime, the cost of inference often exceeds the initial training cost, because the same model serves a very large number of requests.

The energy implications of inference are substantial. Processing 500,000 input and output tokens can cost $7.50 with smaller models, but energy consumption, and with it cost, scales dramatically with model size and complexity.
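Token prices are usually quoted per million tokens, so it helps to normalize the $7.50 figure before projecting it onto a workload. The sketch below does that and then estimates the cost of a hypothetical volume of one billion tokens; the helper names and the workload size are assumptions, and the rate is a blend of input and output tokens rather than a published price.

```python
TOKENS_PER_MILLION = 1_000_000


def price_per_million(total_cost_usd, tokens):
    """Blended USD price per million tokens."""
    return total_cost_usd / tokens * TOKENS_PER_MILLION


def cost_for(tokens, usd_per_million):
    """Projected USD cost of processing `tokens` at a given rate."""
    return tokens / TOKENS_PER_MILLION * usd_per_million


# Figures from the text: $7.50 for 500,000 input and output tokens.
blended_rate = price_per_million(7.50, 500_000)

# Hypothetical workload: one billion tokens.
projected = cost_for(1_000_000_000, blended_rate)

print(f"Blended rate: ${blended_rate:.2f} per million tokens")
print(f"One billion tokens: ${projected:,.0f}")
```

At that blended rate of about $15 per million tokens, a billion tokens would cost roughly $15,000, before accounting for any difference between input and output pricing.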

Emerging solutions

Several developments could help reduce costs. Synthetic data offers potential savings in data acquisition and compliance, though it comes with challenges in ensuring realism and generalization. Small Language Models (SLMs) provide more efficient alternatives for specific applications. Open-source alternatives, exemplified by models like DeepSeek v3, are demonstrating that high performance doesn't always require massive investment.

On the energy front, advancements in AI research are yielding promising efficiency improvements. Optimized scheduling of AI tasks can reduce energy consumption by 12% to 15%, though this may come with increased processing time. Power usage capping and emerging technologies like neuromorphic computing – which mimics the human brain's neural structure – show potential for significant energy savings.

The transition to renewable energy sources could also help mitigate long-term energy costs, though this requires substantial infrastructure investment. AI itself is being leveraged to optimize energy systems, with potential to reduce energy consumption in buildings and transportation by up to 20% and 15% respectively.

Market evolution

The global LLM market is projected to grow from $1.59 billion in 2023 to $259.8 billion by 2030. This growth is being driven by increasing adoption across multiple sectors as organizations seek to leverage AI capabilities. While companies like Google, OpenAI, and Microsoft have traditionally dominated with their closed-source models, there's a growing trend toward open-source alternatives, exemplified by models like DeepSeek v3.
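For a sense of how steep that projection is, the sketch below computes the compound annual growth rate implied by the two endpoints. It takes both figures at face value and assumes smooth compounding over the seven years from 2023 to 2030; the function name is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in billions of USD, from the text.
implied_growth = cagr(1.59, 259.8, 2030 - 2023)

print(f"Implied annual growth: {implied_growth:.0%}")
```

Taken literally, the two endpoints imply a market that slightly more than doubles every year through 2030.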

A two-tier future

Rather than either perspective winning out entirely, we're likely to see the market split into two distinct segments. At the high end, companies pushing the boundaries of AI capabilities will continue to face escalating costs as they develop increasingly sophisticated models. These organizations will need to find ways to monetize their investments effectively despite deflationary pressures in the broader market.

At the same time, we'll likely see the emergence of more accessible and cost-effective solutions for specific applications. This tier of the market will benefit from the deflationary effects Palihapitiya describes, potentially democratizing access to AI capabilities.

Alan Zabihi

Co-founder & CEO
