The scaling curve bent: what 'AI is slowing down' means for your stack

├── "Frontier AI improvements have structurally plateaued and the economics don't justify the capex"
│  ├── Ed Zitron (Where's Your Ed At)

Zitron argues the GPT-5 vs GPT-4o benchmark delta is a fraction of the GPT-3.5 → GPT-4 jump despite an order of magnitude more training compute, and that Claude 4.5 Sonnet and Gemini 2.5 Pro are now clustered within 1–2 points of each other. He pairs this with OpenAI's projected 2026 cash burn, xAI's gas-turbine Memphis buildout, and Anthropic's reported gross margins to argue the unit economics of frontier inference don't close.

│  └── @crescit_eundo (Hacker News, 497 pts)

Submitted the piece to Hacker News where it reached 497 points, signaling broad community resonance with Zitron's plateau thesis. The submission framing endorses treating the slowdown as the central story rather than a contrarian provocation.

├── "The S-curve has another inflection — RL-on-reasoning is the next gain vector"
│  └── @HN skeptics of the plateau thesis (Hacker News)

A faction in the 1,200+ comment thread argues that pointing at pretraining benchmark deltas misses where the gains are now coming from: reinforcement learning on reasoning traces, as seen in OpenAI's o-series and Claude's extended thinking. They view the recent benchmark clustering as a measurement artifact, not a ceiling.

├── "The plateau is real but the cause is contested — data, architecture, or distribution tail"
│  └── @HN practitioners (Hacker News)

Working engineers in the thread largely accepted Zitron's framing that returns have bent, but split on the mechanism: some blamed data exhaustion as the web's high-quality corpus is depleted, others pointed to transformer architectural ceilings, and a third camp argued the labs simply harvested the useful part of the capability distribution and the long tail is genuinely hard. The disagreement on cause matters because each diagnosis implies a different escape route.

└── "Treat plateau as a structural condition to engineer around, not a lull to wait out"
  └── top10.dev editorial (top10.dev) → read below

The editorial reframes Zitron's piece by arguing the prediction itself is a coin flip given noisy curves and unreleased lab capability. The more useful move is for builders to abandon the three-year 'upgrade and ship' pattern and design systems assuming the underlying model won't get dramatically smarter — making evals, scaffolding, and unit economics the differentiator rather than the next checkpoint.

What happened

Ed Zitron's latest at Where's Your Ed At, *AI Is Slowing Down*, landed at 497 points on Hacker News with a thesis the room has been circling for months but mostly refused to say out loud: the frontier-model improvement curve has visibly bent, while the capex and inference cost curves have not.

Zitron's specific claims are worth pinning down before the discourse swallows them. He points to the gap between GPT-5 and GPT-4o on standard reasoning suites (MMLU-Pro, GPQA, SWE-bench Verified) being a fraction of the GPT-3.5 → GPT-4 jump, despite roughly an order of magnitude more compute thrown at training. He notes that Anthropic's Claude 4.5 Sonnet and Google's Gemini 2.5 Pro are now clustered within 1–2 points of each other on most public benchmarks — close enough that ordering depends on which subset you cherry-pick. And he hammers on the gap between *demoed* capability and *deployed* economics: OpenAI's projected 2026 cash burn, xAI's gas-turbine-powered Memphis buildout, and Anthropic's reported gross margins all imply the unit economics of inference at the frontier still don't close without enterprise contract leverage.

The HN comments — 1,200+ deep by the time the front page moved on — split predictably. Skeptics pointed to RL-on-reasoning gains (o-series, Claude's extended thinking) as proof the S-curve has another inflection. Practitioners mostly agreed with Zitron's framing but disagreed on the cause: data exhaustion vs. architectural ceilings vs. simply 'we hit the useful part of the distribution and the long tail is hard.'

Why it matters

The interesting move here isn't the prediction. Predicting an AI slowdown in 2026 is a coin flip — the curves are noisy and the labs have unreleased capability. The interesting move is treating 'plateau' as a structural condition to engineer around, rather than a temporary lull to wait out.

For three years the dominant pattern was *upgrade and ship*. You'd build against GPT-4, the next model would drop, your eval scores would jump 8–15 points without code changes, and your roadmap would assume that cadence forever. That pattern is breaking. Claude 4.5 to 4.6 was a 2–3 point bump on most of the internal evals that teams have shared. GPT-5 underwhelmed relative to the leaks. Gemini 2.5 closed the gap but didn't open a new one. The era of the 'free upgrade' is over.

What replaces it is uglier and more interesting: scaffolding eats the delta. The teams shipping the best AI products in 2026 aren't the ones with privileged access to a smarter base model — everyone has access to roughly the same intelligence ceiling. They're the ones with better retrieval, better tool-use loops, better verification layers, and better human-in-the-loop UX. Cursor, Cognition, and the better Claude Code competitors have demonstrated this repeatedly: the same Sonnet checkpoint produces wildly different product quality depending on harness design.
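One way to picture a "verification layer" is a generate, check, retry loop wrapped around an unchanged model. The sketch below is illustrative only: `verify_json_with_keys`, `generate_verified`, and `flaky_model` are hypothetical names, and the toy model stands in for a real API call so the example runs offline.

```python
import json
from typing import Callable, Optional, Tuple


def verify_json_with_keys(output: str, required: Tuple[str, ...]) -> Optional[str]:
    """Return None if `output` is a JSON object with all required keys, else an error message."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return "expected a JSON object"
    missing = [key for key in required if key not in data]
    return f"missing keys: {missing}" if missing else None


def generate_verified(
    model: Callable[[str], str],
    prompt: str,
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> str:
    """Call the model, verify the output, and retry with the error appended to the prompt."""
    attempt_prompt = prompt
    for _ in range(max_attempts):
        output = model(attempt_prompt)
        error = verify(output)
        if error is None:
            return output
        attempt_prompt = f"{prompt}\n\nPrevious attempt failed verification: {error}"
    raise ValueError(f"no verified output after {max_attempts} attempts")


# Toy stand-in for a real model call (assumption): chatty on the first try,
# well-formed once the prompt carries verification feedback.
def flaky_model(prompt: str) -> str:
    if "failed verification" in prompt:
        return '{"title": "ok", "severity": "low"}'
    return "Sure! Here is the JSON you asked for."


result = generate_verified(
    flaky_model,
    "Summarize the bug as JSON",
    lambda out: verify_json_with_keys(out, ("title", "severity")),
)
print(result)
```

The model is identical on both attempts; only the harness changed what the caller receives, which is the sense in which scaffolding can absorb the delta.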

This matches what compiler and systems people learned in the mid-2000s. Once single-threaded performance growth slowed, the action moved up the stack: cache-friendly data layouts, branch-friendly code, SIMD, then concurrency. The chip stopped getting faster; the people who *used* the chip got smarter. We're at the equivalent inflection for LLM-powered products: the model is the silicon, and the engineering is everything you wrap around it.

The second-order effect Zitron underplays is that this is *bad news for the labs and good news for application developers.* If Sonnet-class intelligence is now commoditized across three vendors, switching cost collapses. Pricing power moves to whoever owns the workflow, not whoever owns the weights. That's why every frontier lab is suddenly shipping IDEs, agents, and 'Code' products — they can read the same balance sheet.

What this means for your stack

Three concrete adjustments if you ship LLM-backed product code:

Stop pricing in capability gains that may not come. If your roadmap has a Q3 feature that assumes 'the next model will handle this', kill it or rewrite it against today's capabilities. The teams hurt worst by the GPT-5 disappointment were the ones whose 2026 plans depended on it being a step change. Plan for flat capability and treat any improvement as upside.

Invest in evals like they're load tests. When models were improving fast, evals were a nice-to-have because the next checkpoint would fix your regression. With a flat curve, eval infrastructure becomes the primary lever for product quality. Teams that have been treating evals as a chore — running a 50-case suite by hand quarterly — need the equivalent of a CI pipeline: thousands of cases, branched by use case, run on every prompt change, with regression alerts. This is unglamorous and it's where the next year's product wins come from.
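A minimal sketch of what an eval regression gate could look like, under stated assumptions: the `EvalCase` schema, `run_suite`, `find_regressions`, the 2-point tolerance, and the toy `fake_model` are all hypothetical and not taken from any eval framework. It scores each use-case bucket separately and flags any bucket whose pass rate falls below a stored baseline by more than the tolerance.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One eval case (hypothetical schema)."""
    use_case: str                  # bucket name, e.g. "classification"
    prompt: str
    check: Callable[[str], bool]   # True if the model output is acceptable


def run_suite(model: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    """Return the pass rate per use-case bucket."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for case in cases:
        total[case.use_case] += 1
        if case.check(model(case.prompt)):
            passed[case.use_case] += 1
    return {bucket: passed[bucket] / total[bucket] for bucket in total}


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """List buckets whose pass rate fell more than `tolerance` below the baseline."""
    return sorted(
        bucket
        for bucket, base_rate in baseline.items()
        if current.get(bucket, 0.0) < base_rate - tolerance
    )


# Toy stand-in for a real model call (assumption), so the sketch runs offline.
def fake_model(prompt: str) -> str:
    return "positive" if "love" in prompt else "unknown"


CASES = [
    EvalCase("classification", "I love this product", lambda out: out == "positive"),
    EvalCase("classification", "I hate this product", lambda out: out == "negative"),
    EvalCase("extraction", "Invoice total: 40 USD", lambda out: "40" in out),
]

# Pass rates recorded from an earlier run (illustrative numbers).
BASELINE = {"classification": 0.5, "extraction": 1.0}

scores = run_suite(fake_model, CASES)
print(scores)
print(find_regressions(BASELINE, scores))
```

In a CI job, a non-empty regression list would fail the build on the prompt change that caused it, the same way a failed load test blocks a deploy.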

Multi-model architecture becomes table stakes. If the labs are within noise of each other, route by cost and latency, not capability. The cheapest correct answer wins. Use Haiku/Flash/GPT-5-mini for high-volume retrieval and classification, reserve Sonnet/Opus-class for the actual reasoning step, and design fallback chains so a single provider outage doesn't take you down. The `claude-cli.js → codex-cli` fallback chain in this codebase isn't aspirational anymore — it's the default architecture.
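As a rough sketch of routing by cost and latency with a fallback chain: the `Provider` record, the tier labels, the vendor names, and every cost and latency figure below are made-up placeholders, not real pricing or any vendor's SDK. The `call` field stands in for a real API client.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Provider:
    """A model endpoint. All fields are illustrative placeholders."""
    name: str
    tier: str                    # "small" or "frontier" (hypothetical labels)
    cost_per_mtok: float         # assumed USD per million tokens
    p50_latency_ms: int
    call: Callable[[str], str]   # stand-in for a real SDK call


class AllProvidersFailed(RuntimeError):
    pass


def build_chain(
    providers: List[Provider],
    tier: str,
    max_latency_ms: Optional[int] = None,
) -> List[Provider]:
    """Candidates in the requested tier, cheapest first, within the latency budget."""
    eligible = [
        p for p in providers
        if p.tier == tier
        and (max_latency_ms is None or p.p50_latency_ms <= max_latency_ms)
    ]
    return sorted(eligible, key=lambda p: (p.cost_per_mtok, p.p50_latency_ms))


def complete(prompt: str, chain: List[Provider]) -> str:
    """Try each provider in order; fall through to the next on any error."""
    errors = []
    for provider in chain:
        try:
            return provider.call(prompt)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors) or "no eligible providers")


def _down(prompt: str) -> str:
    raise ConnectionError("simulated outage")


PROVIDERS = [
    Provider("vendor-a-small", "small", 0.25, 300, _down),
    Provider("vendor-b-small", "small", 0.40, 250, lambda p: f"b:{p}"),
    Provider("vendor-c-frontier", "frontier", 5.00, 900, lambda p: f"c:{p}"),
]

chain = build_chain(PROVIDERS, tier="small")
print([p.name for p in chain])
print(complete("classify this ticket", chain))
```

Here the cheapest small-tier provider is simulated as down, so the call falls through to the next one. Keeping tier selection separate from ordering means the caller decides which steps need frontier-class reasoning, while the router only optimizes cost, latency, and availability within that tier.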

Looking ahead

Zitron is right that the discourse has gotten ahead of the capability, and the bill for that gap is going to come due in 2026 — for the labs first, then for the funds, then for the application teams whose pitch decks promised AGI-adjacent features. But the engineer's takeaway isn't pessimism. It's that the interesting work just shifted from 'wait for the next model' to 'extract everything from this one.' That's the work that compounds, that's the work that builds product moats, and that's the work the labs can't ship on your behalf. The slowdown, if it's real, is the best thing that's happened to application developers since the API opened.

Hacker News 619 pts 693 comments

AI Is Slowing Down

→ read on Hacker News
Eighth · Hacker News

Number of active users on ChatGPT is at an all-time high. Number of tokens consumed on OpenRouter is at an all-time high. I'm not seeing the plateau.

jollyllama · Hacker News

Lots of dismissive comments ITT, very few tackling the substance of the article.

> AI Cannot Afford To Slow Down — It Needs $3 Trillion Or More In Revenue By End Of 2030 To Sustain Its Existence

Is this true? With the total 2024 wages being 11.7 trillion USD [0], and nonfarm payrolls totaling 158,0

adamtaylor_13 · Hacker News

Ed is an interesting character. His financial analysis of the AI industry makes logical sense to me (though I am not knowledgeable enough to actually know if it is correct.) However, he seems to be so angry at AI in general, that he misses the obvious areas where LLMs are actually changing the State

dofm · Hacker News

Today Apple launched its revamped AI offering. Judging by several reports, Apple pays Google a mere billion dollars a year to operate it. Essentially just licensing the IP. Google are (allegedly) happy to turn over the right to operate and distill their models for only a billion a year. Consumer reve

putzdown · Hacker News

One of the "smells" that gives away a quacky ranter is they speak in impassioned, "Why doesn't everyone understand this?" tones, but in fact their argument just doesn't flow. If Zitron's argument were as solid as he keeps saying it is, you would read it and underst
