The S-curve hits: AI capability gains stall while burn rates accelerate

What happened

Ed Zitron's June 8 post "AI Is Slowing Down" lays out a now-familiar but increasingly hard-to-rebut argument: the major frontier labs are shipping models faster than ever, yet each release delivers a smaller capability delta than the one before. The piece runs through the release timeline (GPT-5 in mid-2025, Claude Sonnet 4.5, Gemini 3, Grok 4) and notes that none produced the kind of step-change reaction GPT-4 did in March 2023, when developers genuinely rewrote their workflows in a weekend.

The post leans heavily on benchmarks and on the labs' own framing. SWE-Bench Verified scores have moved from ~65% (Claude 3.5 Sonnet, mid-2024) to ~75% (current frontier) — a real gain, but not the doubling we got from GPT-3.5 to GPT-4. Hallucination rates on factual recall are roughly flat. Long-context retrieval is better but still unreliable past 200K tokens in adversarial tests. Zitron's read: the labs have spent ~18 months optimizing the same architecture against the same benchmarks, and the curve is bending.

Meanwhile the spending isn't bending. OpenAI's projected 2026 burn is north of $40B. Anthropic has raised at a $170B valuation against ~$5B ARR. Microsoft, Meta, Google, and Amazon are collectively committed to over $500B in datacenter capex through 2027. The HN thread (575 points, top-of-page when this was written) is mostly developers comparing their own day-to-day experience and concluding that, yes, Sonnet 4.5 is better than 3.5, but it's not better in a way that justifies the energy footprint of a small European country.

Why it matters

The interesting question isn't whether Zitron is right that progress is slowing — reasonable people can fight about that benchmark by benchmark. The interesting question is what happens to the industry's business model if the *marginal* user stops noticing the upgrades. That's the regime we may already be in.

For the first two years of the LLM boom, the pitch to enterprise was "buy now because the next model will obsolete your current integration." That fear of being left behind is what justified six- and seven-figure annual contracts for capability that, in many cases, a quantized 70B open model running on a rented H100 could approximate. If the gap between frontier and open models stops widening (and Qwen 3, DeepSeek V3, and Llama 4 suggest it is already narrowing in many domains), the premium for API access compresses fast. OpenAI's own pricing moves bear this out: GPT-5 launched cheaper per token than GPT-4 did, not more expensive.

The counter-argument, well-represented in the HN comments, is that we're measuring the wrong thing. Agentic workloads — long-horizon, tool-using, self-correcting — are where the real frontier action is, and benchmarks like SWE-Bench Verified don't capture it well. There's something to this. The jump from "single-shot code completion" to "autonomously close a GitHub issue end-to-end" is the kind of qualitative shift that doesn't show up in token-level metrics. But agentic reliability is also where the most embarrassing failures live: Devin's revenue per active customer remains unclear, Cursor's agent mode burns through tokens faster than humans can review the output, and Anthropic's own Claude Code traces show recovery loops that would horrify any SRE.

Zitron's sharpest point, and one the post underplays, is that capex doesn't depreciate on the same curve as capability gains. H100s have a five-year useful life on the books and roughly a two-year useful life in practice as Blackwell and Rubin land. Datacenters have 30-year amortization schedules. If the model that justifies a $10B training run in 2026 is only 8% better than the one from 2025, and the 2025 model is already commoditized by open weights, the math gets ugly fast. This is the same dynamic that ended the dot-com fiber buildout: Level 3 and Global Crossing weren't wrong about demand; they were wrong about who would capture the margin.
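The arithmetic behind the depreciation mismatch can be sketched as follows. This sketch rests on a few assumptions that do not come from the post: straight-line depreciation, zero salvage value, and a GPU fleet purchase price normalized to 100. Under those assumptions, it compares the book value on the five-year accounting schedule with the value remaining under a two-year economic life.

```python
def straight_line_value(cost: float, useful_life_years: float, years_elapsed: float) -> float:
    """Remaining value of an asset under straight-line depreciation, zero salvage."""
    if years_elapsed >= useful_life_years:
        return 0.0
    return cost * (1 - years_elapsed / useful_life_years)


COST = 100.0         # hypothetical: normalized purchase price of a GPU fleet
BOOK_LIFE = 5.0      # useful life on the books, as stated in the text
ECONOMIC_LIFE = 2.0  # approximate useful life in practice, as stated in the text

for year in range(6):
    book = straight_line_value(COST, BOOK_LIFE, year)
    economic = straight_line_value(COST, ECONOMIC_LIFE, year)
    # Gap: value still on the books beyond what the hardware is economically worth.
    print(f"year {year}: book {book:5.1f}  economic {economic:5.1f}  gap {book - economic:5.1f}")
```

At year 2, these assumptions leave 60 of the original 100 still on the books, while the economic value has already reached zero. That residual either has to be recovered from somewhere or written down. Real depreciation schedules, salvage values, and secondary GPU markets will move these numbers.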

What this means for your stack

First: if you're building on top of frontier APIs, stop optimizing your prompts for the model you're using today. The half-life of a tuned prompt is now shorter than the procurement cycle to swap providers. Build an abstraction layer. Test the same workload against Sonnet, GPT-5, Gemini 3, and at least one open model monthly. The cost of doing this is a weekend; the cost of being locked in when your vendor raises prices 3x to chase profitability is your runway.
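Here is a minimal Python sketch of that abstraction layer. Application code talks to a small provider-neutral interface, and a harness runs the same eval cases against every configured model. All the names here are hypothetical: `ChatModel`, `CannedModel`, `EvalCase`, `run_suite`, and the model labels. `CannedModel` returns fixed strings so the example runs offline. A real adapter would wrap each vendor's own SDK, or a local inference server, behind the same `complete` method.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class ChatModel(Protocol):
    """Provider-neutral interface; application code depends only on this."""

    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class CannedModel:
    """Hypothetical stand-in for a real provider adapter; returns fixed answers."""

    name: str
    answers: dict[str, str] = field(default_factory=dict)

    def complete(self, prompt: str) -> str:
        return self.answers.get(prompt, "")


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # True if the model output is acceptable


def run_suite(models: list[ChatModel], cases: list[EvalCase]) -> dict[str, float]:
    """Run the same workload against every model and return each one's pass rate."""
    results: dict[str, float] = {}
    for model in models:
        passed = sum(1 for case in cases if case.check(model.complete(case.prompt)))
        results[model.name] = passed / len(cases) if cases else 0.0
    return results


cases = [
    EvalCase("Capital of France?", lambda out: "paris" in out.lower()),
    EvalCase("2 + 2 = ?", lambda out: out.strip() == "4"),
]
models: list[ChatModel] = [
    CannedModel("frontier-a", {"Capital of France?": "Paris", "2 + 2 = ?": "4"}),
    CannedModel("open-weights-b", {"Capital of France?": "It is Paris.", "2 + 2 = ?": "four"}),
]
print(run_suite(models, cases))
```

With this structure, swapping providers means writing one new adapter rather than touching every call site. The monthly comparison then becomes a scheduled `run_suite` run over a fixed case set. With the canned answers above, the harness reports a pass rate of 1.0 for `frontier-a` and 0.5 for `open-weights-b`.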

Second: stop assuming the next model will fix your current quality problems. It won't, or at least not enough to matter. The gains from RAG hygiene, eval suites, and constrained output formats now exceed the gains from waiting six months for the next checkpoint. The teams shipping useful AI products in mid-2026 are the ones who treated the model as a fixed component twelve months ago and put their effort into the surrounding system. The teams still waiting for AGI to fix their hallucination problem are the ones whose Series B is getting harder to raise.
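One way to act on the "constrained output formats" point is to validate model output at the boundary. Anything off-contract is treated as a failure, rather than waiting for a future checkpoint to stop producing it. Here is a minimal standard-library sketch. The schema, its field names, and the `generate` callable are hypothetical placeholders for your own task and provider adapter.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical output contract for an extraction task: field name -> required type.
SCHEMA: dict[str, type] = {"invoice_id": str, "total": float, "currency": str}


def parse_structured(raw: str, schema: dict[str, type]) -> Optional[dict[str, Any]]:
    """Return the parsed object if it matches the schema exactly, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON at all (e.g. chatty preamble around the payload)
    if not isinstance(data, dict) or set(data) != set(schema):
        return None  # missing or unexpected fields
    for key, expected in schema.items():
        value = data[key]
        # JSON has one number type, so accept ints where a float is expected (but not bools).
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            data[key] = float(value)
        elif not isinstance(value, expected):
            return None
    return data


def generate_validated(
    generate: Callable[[str], str], prompt: str, schema: dict[str, type], attempts: int = 2
) -> Optional[dict[str, Any]]:
    """Call the model, validate its output, and retry a bounded number of times."""
    for _ in range(attempts):
        result = parse_structured(generate(prompt), schema)
        if result is not None:
            return result
    return None  # caller decides: fall back, queue for human review, or fail loudly


# Fake model: first reply is off-contract, second one conforms.
outputs = iter([
    'Sure! Here is the JSON: {"invoice_id": "A-17"}',
    '{"invoice_id": "A-17", "total": 120, "currency": "EUR"}',
])
print(generate_validated(lambda prompt: next(outputs), "Extract the invoice fields.", SCHEMA))
```

The retry budget is deliberately small and bounded. The point is to make quality failures visible and countable in your own system, not to loop until the model happens to comply.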

Third, and this is where Zitron is genuinely useful even if you think he's too bearish: the people selling you the next platform shift have a strong incentive to undersell the plateau. When NVIDIA, Microsoft, OpenAI, and your local AI consultancy all agree that the next model will change everything, ask what their book looks like if it doesn't. Most of them have answers. Some of them don't.

Looking ahead

The honest version of the next 18 months probably isn't "AI winter" and it isn't "AGI by Christmas." It's a slow-grinding consolidation where the frontier labs converge on similar capability profiles, pricing pressure from open models compresses margins, and the value capture moves up the stack — toward applications, tooling, evals, and the unglamorous middleware that makes models actually usable in production. That's a worse story for the trillion-dollar valuations and a better story for the developers who are tired of rewriting their integrations every six months. Whether the capex cycle can absorb that transition without something breaking is the question Zitron is really asking. The answer probably arrives in someone's Q3 earnings call.
