AI is getting expensive. Good.

Everyone on LinkedIn is panicking about the same thing right now. Token costs are going up. The subsidies are ending. GPUs are melting. Data centers can't get built fast enough. Uber burned through its entire 2026 AI budget in four months. An Nvidia VP admitted that compute costs now exceed employee salaries on his own team.

None of that is wrong.  The question is whether it tells the whole story.

The AI industry is running a subsidy model that looks increasingly like early-stage ride-hailing. The frontier labs are spending significantly more than they earn per dollar of revenue. Both OpenAI and Anthropic are heading toward IPOs, and public markets will scrutinize every token served at a loss. Google restructured its AI subscriptions into metered credits. Claude moved toward seat-cost-plus-credits for enterprise. The "all you can eat" era is ending, because some power users were generating tens of thousands of dollars in compute costs while paying a couple hundred a month. That math doesn't work.

Meanwhile, the physical infrastructure can't keep up. Data center construction is bottlenecked by mundane electrical components: transformers, switchgear, that hold up billion-dollar campuses. The power grid was never designed for this density of demand. Tariffs on Chinese-manufactured components and export controls on chips compound everything.

And the agentic use case makes it worse. A chatbot conversation takes a handful of turns. An agentic coding workflow that writes, tests and iterates code can burn orders of magnitude more tokens per task. Enterprise teams are discovering that the real cost of AI is always higher than the sticker price: retrieval, embeddings, retry logic, and context management add up fast.

So yes. The squeeze is real. But here's what that story leaves out.

The per-token price of intelligence has been collapsing at a rate with almost no precedent in technology. The cost of running what was frontier-level performance two years ago has dropped by orders of magnitude. What GPT-4 could do at launch in 2023, you can now get equivalent quality for pennies through multiple providers.

What's actually happening is a Jevons Paradox. Intelligence per token is getting dramatically cheaper. But because it's cheaper, people are consuming vastly more of it. The bill is going up not because intelligence got more expensive but because we're using orders of magnitude more of it.

That distinction matters enormously. One of those problems has a ceiling. The other is solving itself through forces the frontier labs cannot control.

Start with what's happened in China. DeepSeek's latest models charge a fraction of what OpenAI and Anthropic charge for comparable work. These aren't toy models, they perform within striking distance of the best closed-source systems. DeepSeek's own technical report acknowledges they trail frontier by roughly three to six months. At a fraction of the cost.

The mechanism is distillation, training smaller models to mimic the outputs of larger ones. Everyone in the industry does it. OpenAI accuses Chinese labs of it. Anthropic accuses them too. Then Musk admitted under oath that xAI did the same thing to OpenAI's models. The frontier labs are accusing competitors of doing what the entire industry treats as standard practice. So maybe you can’t charge premium prices for intelligence that can be approximately replicated at commodity cost. Not forever. The frontier still leads on the hardest tasks. But "best for the hardest tasks" is a shrinking moat when the second-best option costs a tenth as much and the gap narrows every quarter.

Then there's edge computing. Google's Gemma 4 models run on everything from phones to laptops under a fully open license. Nvidia is shipping desktop hardware that runs large models locally. The architecture this enables is simple: handle routine tasks on-device at zero API cost. Reserve cloud calls for the problems that genuinely need frontier capability.

Most people don't need the best models for everything.

Summarization, classification, translation, everyday code completion, simple Q&A. A well-tuned local model handles these perfectly well. The assumption baked into the panic is that everyone needs frontier, all the time. They don't. The 80% of tasks that drive the bulk of token consumption are increasingly served just fine by models that run on hardware you already own.

I run models locally on a MacBook Pro M3 Max (a laptop). Models that run on this hardware today are roughly ten times faster than what was available eighteen months ago, at the same quality or better. What required a data center a year ago runs on my desk now.

And then there are the coding harnesses.

The reason tools like Claude Code work as well as they do isn't just the model. It's the deterministic software around it: context engineering, tool orchestration, routing, retry logic. Research shows that harness quality alone can swing performance by enormous margins. Same model, different harness, wildly different results.

If the value lives in the harness as much as the model, what happens when the harnesses go open-source? It's already happening. A wave of open-source agent frameworks now support multi-provider routing run most of your workload through cheap or local models, escalate only the hard stuff to a frontier API.

This is the structural threat to frontier pricing power. Not that someone builds a model equally good tomorrow. But that a great harness makes a slightly less capable model good enough for the vast majority of real work. And the customers who burn the most tokens — developers — are the most technically capable and the least sticky users in the ecosystem. If the harness decouples from the model, they will switch.

The one constraint in this system that doesn't have a software solution is energy. You can't distill your way out of a power grid bottleneck. If the infrastructure buildout stalls badly enough, compute scarcity could keep prices elevated for a while regardless of what the software economics suggest.

But even that cuts both ways. Energy constraints on cloud compute accelerate the shift toward local and edge inference, toward exactly the architecture that reduces dependence on centralized pricing. The more expensive cloud becomes, the more attractive it is to run what you can locally.

Think back to what we paid for frontier tokens two years ago. That price is effectively zero today. The frontier of 2024 is the edge model of 2026. That cycle isn't slowing down.

The frontier labs can't charge what they want because competitors undercut them. They can't charge what they truly need because developers will route around them. They can't lock anyone in because the harness layer is separating from the model layer. And they can't ignore the efficiency gains coming from their own ecosystem.

The squeeze is real. But prices cannot explode for long under these constraints. The panic is about a transition. And transitions end.

By Daniel Huszár

Next
Next

AI Capability vs. Data Control: Why the Trade-Off Is a Myth