Zitron argues that frontier model releases (GPT-5, Claude 4.5 Sonnet, Gemini 3, Grok 4) are delivering smaller capability deltas than prior generations: SWE-Bench Verified scores have moved from ~65% to ~75%, a real gain but not the doubling seen from GPT-3.5 to GPT-4. He points to ~18 months of optimizing the same architecture against the same benchmarks while OpenAI's projected 2026 burn exceeds $40B and hyperscalers have committed over $500B in datacenter capex through 2027, and frames this as a fundamental mismatch between flattening returns and escalating spend.
The editorial shifts the debate from whether progress is technically slowing to what happens to the industry's business model when the marginal user stops perceiving improvements. In that regime, each release is real but imperceptible to the daily user, which undermines the enterprise upgrade cycle that justifies the capex.
The HN thread's 575 points reflect a consensus among working developers comparing their own usage: Sonnet 4.5 is genuinely better than 3.5, but the delta is incremental rather than transformative. Commenters argue the marginal improvement cannot justify the energy footprint of a small European country, treating their lived workflow experience as more credible than benchmark deltas.
Ed Zitron's June 8 post "AI Is Slowing Down" lays out a now-familiar but increasingly hard-to-rebut argument: the major frontier labs are shipping models faster than ever, and each release is delivering a smaller capability delta than the one before. The piece runs through the release timeline — GPT-5 in mid-2025, Claude 4.5 Sonnet, Gemini 3, Grok 4 — and notes that none produced the kind of step-change reaction that GPT-4 did in March 2023, when developers genuinely rewrote their workflows in a weekend.
The post leans heavily on benchmarks and on the labs' own framing. SWE-Bench Verified scores have moved from ~65% (Claude 3.5 Sonnet, mid-2024) to ~75% (current frontier) — a real gain, but not the doubling we got from GPT-3.5 to GPT-4. Hallucination rates on factual recall are roughly flat. Long-context retrieval is better but still unreliable past 200K tokens in adversarial tests. Zitron's read: the labs have spent ~18 months optimizing the same architecture against the same benchmarks, and the curve is bending.
Meanwhile the spending isn't bending. OpenAI's projected 2026 burn is north of $40B. Anthropic has raised at a $170B valuation against ~$5B ARR. Microsoft, Meta, Google, and Amazon are collectively committed to over $500B in datacenter capex through 2027. The HN thread (575 points, top-of-page when this was written) is mostly developers comparing their own day-to-day experience and concluding that, yes, Sonnet 4.5 is better than 3.5, but it's not better in a way that justifies the energy footprint of a small European country.
The interesting question isn't whether Zitron is right that progress is slowing — reasonable people can fight about that benchmark by benchmark. The interesting question is what happens to the industry's business model if the *marginal* user stops noticing the upgrades. That's the regime we may already be in.
For the first two years of the LLM boom, the pitch to enterprise was "buy now because the next model will obsolete your current integration." That fear of being left behind is what justified six- and seven-figure annual contracts for capability that, in many cases, a 70B open model running on a rented H100 could approximate. If the gap between frontier and open stops widening — and Qwen 3, DeepSeek V3, and Llama 4 suggest it's narrowing in many domains — then the premium for API access compresses fast. OpenAI's own pricing moves bear this out: GPT-5 launched cheaper per token than GPT-4 did, not more expensive.
The counter-argument, well-represented in the HN comments, is that we're measuring the wrong thing. Agentic workloads — long-horizon, tool-using, self-correcting — are where the real frontier action is, and benchmarks like SWE-Bench Verified don't capture it well. There's something to this. The jump from "single-shot code completion" to "autonomously close a GitHub issue end-to-end" is the kind of qualitative shift that doesn't show up in token-level metrics. But agentic reliability is also where the most embarrassing failures live: Devin's revenue per active customer remains unclear, Cursor's agent mode burns through tokens faster than humans can review the output, and Anthropic's own Claude Code traces show recovery loops that would horrify any SRE.
Zitron's sharpest point, and the one underplayed in the post, is that capex doesn't depreciate on the same curve as capability gains. H100s have a five-year useful life on the books and a ~two-year useful life in practice as Blackwell and Rubin land. Datacenters have 30-year amortization schedules. If the model that justifies a $10B training run in 2026 is only 8% better than the one from 2025, and the one from 2025 is already commoditized by open weights, the math gets ugly fast. This is the same dynamic that ended the dot-com fiber buildout — Level 3 and Global Crossing weren't wrong about demand, they were wrong about who would capture the margin.
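The depreciation mismatch above can be made concrete with a little arithmetic. The sketch below is illustrative only: the $1B purchase price is a hypothetical figure, and straight-line depreciation with zero salvage value is an assumption chosen for simplicity. Only the five-year book life and the roughly two-year practical life come from the text.

```python
def annual_depreciation(cost: float, useful_life_years: float) -> float:
    """Straight-line depreciation: an equal charge each year, no salvage value."""
    return cost / useful_life_years


def unrecovered_value(cost: float, book_life_years: float, years_in_service: float) -> float:
    """Book value still on the balance sheet after years_in_service."""
    remaining_years = max(book_life_years - years_in_service, 0)
    return cost * remaining_years / book_life_years


# Hypothetical GPU purchase; not a figure from the article.
COST = 1_000_000_000

if __name__ == "__main__":
    book_charge = annual_depreciation(COST, 5)       # what the books record per year
    practical_charge = annual_depreciation(COST, 2)  # what the hardware "really" costs per year
    stranded = unrecovered_value(COST, 5, 2)         # still on the books when the chips are outclassed

    print(f"Annual charge on a 5-year schedule: ${book_charge:,.0f}")
    print(f"Annual charge on a 2-year schedule: ${practical_charge:,.0f}")
    print(f"Book value remaining after 2 years:  ${stranded:,.0f}")
```

Under these assumptions, the books recognize 40% of the cost over the two years the hardware is competitive, leaving 60% to be carried against chips that newer generations have already outclassed.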
If you're building on top of frontier APIs, the first practical takeaway is to stop optimizing your prompts for the model you're using today. The half-life of a tuned prompt is now shorter than the procurement cycle to swap providers. Build an abstraction layer. Every month, run the same workload against Sonnet, GPT-5, Gemini 3, and at least one open model. The cost of doing this is a weekend; the cost of being locked in when your vendor raises prices 3x to chase profitability is your runway.
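One possible shape for that abstraction layer is sketched below. It treats every provider as a plain "prompt in, text out" function and runs the same cases against each of them. Everything named here (`Provider`, `Case`, `run_suite`, the stub vendors) is hypothetical; real adapters would wrap each vendor's SDK behind the same signature, and the stubs exist only so the sketch runs without network access or API keys.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A provider is any function that maps a prompt to a completion.
# Real adapters would wrap a vendor SDK; these are stand-ins.
Provider = Callable[[str], str]


@dataclass
class Case:
    prompt: str
    check: Callable[[str], bool]  # True if the output is acceptable


def run_suite(providers: Dict[str, Provider], cases: List[Case]) -> Dict[str, float]:
    """Run every case against every provider; return the pass rate per provider."""
    results: Dict[str, float] = {}
    for name, call in providers.items():
        passed = 0
        for case in cases:
            try:
                if case.check(call(case.prompt)):
                    passed += 1
            except Exception:
                # A provider error counts as a failed case, not a crashed run.
                pass
        results[name] = passed / len(cases) if cases else 0.0
    return results


# Hypothetical stub providers standing in for real API clients.
def stub_upper(prompt: str) -> str:
    return prompt.upper()


def stub_echo(prompt: str) -> str:
    return prompt


def stub_broken(prompt: str) -> str:
    raise RuntimeError("simulated outage")


PROVIDERS: Dict[str, Provider] = {
    "vendor_a": stub_upper,
    "vendor_b": stub_echo,
    "vendor_c": stub_broken,
}

CASES = [
    Case("hello", lambda out: out == "HELLO"),
    Case("abc", lambda out: "abc" in out.lower()),
]

if __name__ == "__main__":
    for name, rate in run_suite(PROVIDERS, CASES).items():
        print(f"{name}: {rate:.0%}")
```

Because the application code only ever sees the `Provider` signature, swapping vendors means writing one new adapter and rerunning the suite, rather than re-tuning prompts scattered through the codebase.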
Second: stop assuming the next model will fix your current quality problems. It won't, or at least not enough to matter. The gains from RAG hygiene, eval suites, and constrained output formats now exceed the gains from waiting six months for the next checkpoint. The teams shipping useful AI products in mid-2026 are the ones who treated the model as a fixed component twelve months ago and put their effort into the surrounding system. The teams still waiting for AGI to fix their hallucination problem are the ones whose Series B is getting harder to raise.
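As a rough illustration of what "constrained output formats" can mean in practice, the sketch below validates a model's reply against a small schema and retries when the reply does not conform. The schema (`answer`, `confidence`), the function names, and the flaky stub model are all hypothetical, and production systems often rely on a provider's structured-output feature or a schema library instead. The point is only that the surrounding system, not the model, enforces the contract.

```python
import json
from typing import Callable, Optional

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"answer": str, "confidence": float}


def parse_constrained(raw: str) -> Optional[dict]:
    """Return the parsed object if it matches the schema, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected in REQUIRED_FIELDS.items():
        value = data.get(field)
        # JSON has a single number type, so accept ints where a float is expected.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            return None
        data[field] = value
    if not 0.0 <= data["confidence"] <= 1.0:
        return None
    return data


def call_with_validation(
    model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Optional[dict]:
    """Call the model until its reply validates, or give up after max_attempts."""
    for _ in range(max_attempts):
        parsed = parse_constrained(model(prompt))
        if parsed is not None:
            return parsed
    return None


def make_flaky_model() -> Callable[[str], str]:
    """Stub model: first reply is malformed prose, second is valid JSON."""
    replies = iter(["sorry, here you go:", '{"answer": "42", "confidence": 1}'])
    return lambda prompt: next(replies)


if __name__ == "__main__":
    print(call_with_validation(make_flaky_model(), "What is 6 * 7?"))
```

The same `parse_constrained` check doubles as an eval: the fraction of replies that validate on the first attempt is a metric you can track across model versions without waiting for the next checkpoint.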
Third, and this is where Zitron is genuinely useful even if you think he's too bearish: the people selling you the next platform shift have a strong incentive to downplay the plateau. When NVIDIA, Microsoft, OpenAI, and your local AI consultancy all agree that the next model will change everything, ask what their book looks like if it doesn't. Most of them have answers. Some of them don't.
The honest version of the next 18 months probably isn't "AI winter" and it isn't "AGI by Christmas." It's a slow-grinding consolidation where the frontier labs converge on similar capability profiles, pricing pressure from open models compresses margins, and the value capture moves up the stack — toward applications, tooling, evals, and the unglamorous middleware that makes models actually usable in production. That's a worse story for the trillion-dollar valuations and a better story for the developers who are tired of rewriting their integrations every six months. Whether the capex cycle can absorb that transition without something breaking is the question Zitron is really asking. The answer probably arrives in someone's Q3 earnings call.
Lots of dismissive comments ITT, very few tackling the substance of the article.

> AI Cannot Afford To Slow Down - It Needs $3 Trillion Or More In Revenue By End Of 2030 To Sustain Its Existence

Is this true? With the total 2024 wages being 11.7 trillion USD [0], and nonfarm payrolls totaling 158,0
Ed is an interesting character. His financial analysis of the AI industry makes logical sense to me (though I am not knowledgeable enough to actually know if it is correct.) However, he seems to be so angry at AI in general, that he misses the obvious areas where LLMs are actually changing the State
Today Apple launched its revamped AI offering. Judging by several reports, Apple pays Google a mere billion dollars a year to operate it. Essentially just licensing the IP. Google are (allegedly) happy to turn over the right to operate and distill their models for only a billion a year. Consumer reve
One of the "smells" that gives away a quacky ranter is they speak in impassioned, "Why doesn't everyone understand this?" tones, but in fact their argument just doesn't flow. If Zitron's argument were as solid as he keeps saying it is, you would read it and underst
Number of active users on ChatGPT is at an all-time high. Number of tokens consumed on OpenRouter is at an all-time high. I'm not seeing the plateau.