The throughput trap: why 'AI makes me 10x faster' is the wrong metric

What happened

Nolan Lawson, a longtime Microsoft and Salesforce engineer, PouchDB maintainer, and recurring voice on web-platform performance, published a post on May 25 titled *Using AI to write better code more slowly*. It hit the Hacker News front page at 994 points, which is unusual for a personal blog post with no acquisition, no benchmark, and no controversy bait. The thesis is in the title: he uses LLMs heavily, and his throughput has gone down.

Lawson's workflow, distilled: he writes the first draft himself. Then he hands the diff to Claude (or whatever model is current) and asks it to argue against the change — find the bug, propose a simpler version, name the edge case, suggest the test he forgot. He iterates until the model stops finding things or starts hallucinating. Only then does he commit. The LLM is configured as an adversarial reviewer, not a junior pair-programmer typing alongside him.

The HN thread is what made the post a phenomenon. Hundreds of senior engineers, many of them posting under their real names, showed up to say some version of *yes, this is what I do too, and I've been embarrassed to admit it because every PM in my org is measuring me on velocity.* The comments are a quiet rebellion against the industry's preferred narrative.

Why it matters

The dominant pitch for coding AI in 2026 is throughput. Cursor's marketing copy, GitHub's Copilot keynotes, the analyst decks from a16z and Sequoia — all of them lead with the same number: developers ship X% more code, Y% faster. Anthropic's own internal data, leaked in the Claude 4.7 launch, claimed a 55% reduction in time-to-merge for engineering tasks in their dogfooding cohort.

Lawson's argument is that this is the wrong KPI for any codebase that has to keep working. Lines of code per hour was a bad metric in 1975 and it's a worse one now, because the marginal cost of producing plausible-looking code has collapsed to roughly zero. When generating a function takes three seconds instead of three minutes, the binding constraint shifts entirely to *reviewing, integrating, and being responsible for* the function. That part hasn't gotten faster. If anything, it's gotten harder, because the code arrives without the author having built a mental model of it.

This is the same dynamic Fred Brooks identified in *No Silver Bullet* and that DORA's research has confirmed for a decade: the bottleneck in software is not typing. It's understanding. Anything that increases typing speed without increasing understanding speed produces a system that looks faster on the dashboard and breaks more often in production. Stripe's engineering org published a postmortem in March attributing a 22-minute outage to an AI-generated change that passed review because *it looked like code the team would have written* — a failure mode that has a name now: plausibility debt.

The second half of Lawson's piece is the part the discourse keeps missing. He's not anti-AI. He's against the framing that AI's value is measured in commits per day. The actual value, he argues, is that for the first time in his career he has an on-demand interlocutor who has read every Stack Overflow answer, every RFC, every well-known codebase, and will engage with his specific diff at 11pm on a Tuesday. That is a profound force multiplier for *learning*. It is approximately zero help for *shipping more*.

What this means for your stack

If you manage engineers, the takeaway is mechanical: stop measuring AI ROI in PR count, lines added, or tickets-per-sprint. Those numbers will go up regardless of whether the underlying work got better or worse, because the floor of "how much code can a person plausibly produce in a day" has been raised by a model that never sleeps. Measure regression rate, mean time to recovery, and the fraction of merged PRs that needed a follow-up fix within 14 days. Those are the numbers that fall when AI is being used well and rise when it is being used as a typing accelerant.
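
For teams that want to start tracking two of those measurements, here is a minimal sketch in Python. It assumes a hypothetical record format: each merged PR has an id, a merge time, and optionally the id of an earlier PR it repairs, and each incident has a start and a recovery time. The class and field names are illustrative only; they do not come from Lawson's post or from any particular tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class MergedPR:
    # Hypothetical record shape; adapt to whatever your tracker exports.
    pr_id: int
    merged_at: datetime
    fixes_pr: Optional[int] = None  # id of the earlier PR this one repairs


@dataclass(frozen=True)
class Incident:
    # Hypothetical record shape for a production incident.
    started_at: datetime
    recovered_at: datetime


def follow_up_fix_rate(prs: list[MergedPR], window_days: int = 14) -> float:
    """Fraction of merged PRs that needed a fix merged within the window."""
    if not prs:
        return 0.0
    merged_at = {pr.pr_id: pr.merged_at for pr in prs}
    window = timedelta(days=window_days)
    needed_fix: set[int] = set()
    for pr in prs:
        target = pr.fixes_pr
        if target is None or target not in merged_at:
            continue  # not a fix, or it fixes a PR outside this data set
        delay = pr.merged_at - merged_at[target]
        if timedelta(0) <= delay <= window:
            needed_fix.add(target)  # count each repaired PR once
    return len(needed_fix) / len(prs)


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Average duration from incident start to recovery."""
    if not incidents:
        return timedelta(0)
    total = sum((i.recovered_at - i.started_at for i in incidents), timedelta(0))
    return total / len(incidents)
```

The hard part in practice is populating `fixes_pr`: the sketch assumes someone, or some convention in commit messages, records which earlier PR a fix repairs. Without that link the rate cannot be computed honestly.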

If you're an individual contributor, the practical pattern from Lawson and the HN consensus is roughly this: write the code yourself first, even if it's slow. Then ask the model to attack it. Ask for the test you didn't write. Ask what breaks at scale, what breaks under concurrency, what breaks on a cold cache. Reject any suggestion you can't explain to a colleague without re-prompting. The discipline is to treat the LLM as the most patient, most well-read, most aggressively pedantic senior engineer who has ever existed — and to refuse the temptation to let it drive.
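
The "ask the model to attack it" step can be made repeatable with a small prompt builder. The sketch below is one possible shape, not Lawson's actual tooling: the function name, the prompt wording, and the delimiter lines are all assumptions, and it deliberately stops short of calling any model API. The review angles are the ones listed in the text above.

```python
# Review angles taken from the workflow described in the article.
REVIEW_ANGLES = (
    "Find a bug in this change.",
    "Propose a simpler version.",
    "Name an edge case it mishandles.",
    "Suggest a test the author forgot.",
    "What breaks at scale, under concurrency, or on a cold cache?",
)


def build_adversarial_prompt(
    diff: str, angles: tuple[str, ...] = REVIEW_ANGLES
) -> str:
    """Wrap a hand-written diff in instructions asking a model to argue against it.

    Hypothetical helper: the wording is illustrative, not a tested prompt.
    """
    if not diff.strip():
        raise ValueError("diff is empty; write the change yourself first")
    numbered = "\n".join(
        f"{n}. {angle}" for n, angle in enumerate(angles, start=1)
    )
    return (
        "You are reviewing a change I wrote. Argue against it; do not rewrite it.\n"
        "Answer each point, or say 'nothing found' for that point:\n"
        f"{numbered}\n\n"
        "--- diff ---\n"
        f"{diff.strip()}\n"
        "--- end diff ---"
    )
```

Passing the output of `git diff` as `diff` and pasting the result into whatever model you use keeps the human as the author and the model as the critic. The rule about rejecting any suggestion you cannot explain still has to be enforced by the person, not the script.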

The inverse pattern — letting the model generate and accepting what compiles — is what Simon Willison has been calling "vibe coding," and it works fine for throwaway scripts, prototypes, and code you'll personally maintain for less than a week. It is catastrophic for anything that crosses a team boundary or has to survive a production incident at 3am, because the person paged is now debugging code they never understood.

Looking ahead

The interesting question isn't whether AI makes engineers faster — that debate is already lost to whoever is selling seats. The interesting question is whether the industry will develop the metric vocabulary to distinguish between *more code* and *better code*, because right now the dashboards only know how to count the former. Lawson's post resonated because a lot of senior engineers have figured this out privately and are tired of pretending otherwise. The next 18 months of AI-coding tooling will be defined by whoever builds the first IDE that treats the model as a reviewer by default and an author only on request. That product doesn't exist yet. The team that ships it will own the segment of the market that actually has to maintain its own code.
