The throughput trap: why 'AI makes me 10x faster' is the wrong metric

├── "LLMs are most valuable as adversarial reviewers, not as code-generation accelerators"
│  └── Nolan Lawson (nolanlawson.com)

Lawson writes the first draft himself, then hands the diff to Claude and asks it to argue against the change — find bugs, propose simpler versions, name edge cases, suggest missing tests. He iterates until the model stops finding issues or starts hallucinating, treating the LLM as an adversarial reviewer rather than a junior pair-programmer typing alongside him.

├── "Throughput and velocity are the wrong KPIs for AI-assisted coding"
│  └── Nolan Lawson (nolanlawson.com)

Lawson argues that the industry's pitch — Cursor, Copilot, a16z decks all leading with developers shipping X% more code Y% faster — is the wrong frame for any codebase that has to keep working. Since the marginal cost of producing plausible-looking code has collapsed to near zero, lines-of-code-per-hour is a worse metric now than it was in 1975, and his own throughput has gone down precisely because he's using AI well.

└── "Senior engineers are quietly working this way but feel pressured to hide it"
  └── @Hacker News thread (Hacker News, 994 pts)

Hundreds of senior engineers showed up in the 376-comment thread to say they use AI the same way Lawson does but feel embarrassed to admit it because their PMs measure them on velocity. The 994-point score on a personal blog with no benchmark or controversy bait suggests a quiet rebellion against the industry's preferred throughput narrative.

What happened

Nolan Lawson — longtime Mozilla and Salesforce engineer, the person behind PouchDB and a recurring voice on web-platform performance — published a post on May 25 titled *Using AI to write better code more slowly*. It hit the Hacker News front page at 994 points, which is unusual for a personal blog with no acquisition, no benchmark, and no controversy bait. The thesis is in the title: he uses LLMs heavily, and his throughput has gone down.

Lawson's workflow, distilled: he writes the first draft himself. Then he hands the diff to Claude (or whatever model is current) and asks it to argue against the change — find the bug, propose a simpler version, name the edge case, suggest the test he forgot. He iterates until the model stops finding things or starts hallucinating. Only then does he commit. The LLM is configured as an adversarial reviewer, not a junior pair-programmer typing alongside him.

The HN thread is what made the post a phenomenon. Hundreds of senior engineers — many of them named, with bylines — showed up to say some version of *yes, this is what I do too, and I've been embarrassed to admit it because every PM in my org is measuring me on velocity.* The comments are a quiet rebellion against the industry's preferred narrative.

Why it matters

The dominant pitch for coding AI in 2026 is throughput. Cursor's marketing copy, GitHub's Copilot keynotes, the analyst decks from a16z and Sequoia — all of them lead with the same number: developers ship X% more code, Y% faster. Anthropic's own internal data, leaked in the Claude 4.7 launch, claimed a 55% reduction in time-to-merge for engineering tasks in their dogfooding cohort.

Lawson's argument is that this is the wrong KPI for any codebase that has to keep working. Lines of code per hour was a bad metric in 1975 and it's a worse one now, because the marginal cost of producing plausible-looking code has collapsed to roughly zero. When generating a function takes three seconds instead of three minutes, the binding constraint shifts entirely to *reviewing, integrating, and being responsible for* the function. That part hasn't gotten faster. If anything, it's gotten harder, because the code arrives without the author having built a mental model of it.

This is the same dynamic Fred Brooks identified in *No Silver Bullet* and that DORA's research has confirmed for a decade: the bottleneck in software is not typing. It's understanding. Anything that increases typing speed without increasing understanding speed produces a system that looks faster on the dashboard and breaks more often in production. Stripe's engineering org published a postmortem in March attributing a 22-minute outage to an AI-generated change that passed review because *it looked like code the team would have written* — a failure mode that has a name now: plausibility debt.

The second half of Lawson's piece is the part the discourse keeps missing. He's not anti-AI. He's anti the framing that AI's value is measured in commits per day. The actual value, he argues, is that for the first time in his career he has an on-demand interlocutor who has read every Stack Overflow answer, every RFC, every well-known codebase, and will engage with his specific diff at 11pm on a Tuesday. That is a profound force multiplier for *learning*. It is approximately zero help for *shipping more*.

What this means for your stack

If you manage engineers, the takeaway is mechanical: stop measuring AI ROI in PR count, lines added, or tickets-per-sprint. Those numbers will go up regardless of whether the underlying work got better or worse, because the floor of "how much code can a person plausibly produce in a day" has been raised by a model that never sleeps. Instead, measure regression rate, mean time to recovery, and the fraction of merged PRs that needed a follow-up fix within 14 days. Those are the numbers that fall when AI is being used well and rise when it's being used as a typing accelerant.
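Two of those measures are simple to compute once the data is in hand. The sketch below is one way to do it, not a standard definition: the `MergedPR` record, its field names, and the rule that a "follow-up fix" is any later fix linked back to the PR are all assumptions made for illustration, and the sample data is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MergedPR:
    # Hypothetical record shape; map your own tracker's fields onto it.
    number: int
    merged_at: datetime
    # Merge time of the first follow-up fix linked to this PR, if any.
    first_fix_merged_at: Optional[datetime] = None


def follow_up_fix_rate(prs: list[MergedPR], window_days: int = 14) -> float:
    """Fraction of merged PRs that needed a follow-up fix within the window."""
    if not prs:
        return 0.0
    window = timedelta(days=window_days)
    needed_fix = 0
    for pr in prs:
        if pr.first_fix_merged_at is None:
            continue
        delay = pr.first_fix_merged_at - pr.merged_at
        # Count only fixes that landed after the merge and inside the window.
        if timedelta(0) <= delay <= window:
            needed_fix += 1
    return needed_fix / len(prs)


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - detected) across incidents."""
    if not incidents:
        return timedelta(0)
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(0),
    )
    return total / len(incidents)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    day = datetime(2026, 1, 1)
    prs = [
        MergedPR(1, day, day + timedelta(days=3)),   # fixed inside the window
        MergedPR(2, day, day + timedelta(days=20)),  # fixed, but too late to count
        MergedPR(3, day),
        MergedPR(4, day),
    ]
    incidents = [
        (day, day + timedelta(minutes=22)),
        (day, day + timedelta(minutes=38)),
    ]
    print(follow_up_fix_rate(prs))
    print(mean_time_to_recovery(incidents))
```

The hard part in practice is not the arithmetic but the linking: deciding which later change counts as a fix for which earlier PR. That rule is a team decision, and it is worth writing down before anyone looks at the resulting number.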

If you're an individual contributor, the practical pattern from Lawson and the HN consensus is roughly this: write the code yourself first, even if it's slow. Then ask the model to attack it. Ask for the test you didn't write. Ask what breaks at scale, what breaks under concurrency, what breaks on a cold cache. Reject any suggestion you can't explain to a colleague without re-prompting. The discipline is to treat the LLM as the most patient, most well-read, most aggressively pedantic senior engineer who has ever existed — and to refuse the temptation to let it drive.
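The "ask the model to attack it" step can be made repeatable with a small prompt builder. This is a minimal sketch, not Lawson's actual prompt: the function name, the wording of the instructions, and the list of review angles are assumptions drawn from the questions above. It only assembles text; sending it to a model is left to whatever client you already use.

```python
# Review angles paraphrased from the pattern described above; adjust freely.
REVIEW_ANGLES = (
    "Find a bug in this change.",
    "Propose a simpler version.",
    "Name an edge case it mishandles.",
    "Suggest a test the author forgot.",
    "Say what breaks at scale, under concurrency, or on a cold cache.",
)


def build_review_prompt(diff: str, angles: tuple[str, ...] = REVIEW_ANGLES) -> str:
    """Assemble a prompt asking a model to argue against a hand-written diff."""
    if not diff.strip():
        raise ValueError("diff is empty; write the first draft before asking for review")
    numbered = "\n".join(f"{i}. {angle}" for i, angle in enumerate(angles, start=1))
    return (
        "You are reviewing a change I wrote myself. Argue against it.\n"
        "Do not rewrite the whole change; list concrete problems only.\n"
        "If you find nothing for an item, say so instead of guessing.\n\n"
        f"{numbered}\n\n"
        "Diff:\n"
        f"{diff}"
    )


if __name__ == "__main__":
    example_diff = "--- a/cache.py\n+++ b/cache.py\n+    return store.get(key)\n"
    print(build_review_prompt(example_diff))
```

The prompt asks for problems rather than replacement code on purpose. Keeping the author as the one who edits the diff is what makes the "reject anything you can't explain" rule enforceable.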

The inverse pattern, letting the model generate and accepting what compiles, is what Andrej Karpathy named "vibe coding" and Simon Willison has written about at length. It works fine for throwaway scripts, prototypes, and code you'll personally maintain for less than a week. It is catastrophic for anything that crosses a team boundary or has to survive a production incident at 3am, because the person paged is now debugging code they never understood.

Looking ahead

The interesting question isn't whether AI makes engineers faster — that debate is already lost to whoever is selling seats. The interesting question is whether the industry will develop the metric vocabulary to distinguish between *more code* and *better code*, because right now the dashboards only know how to count the former. Lawson's post resonated because a lot of senior engineers have figured this out privately and are tired of pretending otherwise. The next 18 months of AI-coding tooling will be defined by whoever builds the first IDE that treats the model as a reviewer by default and an author only on request. That product doesn't exist yet. The team that ships it will own the segment of the market that actually has to maintain its own code.

Hacker News 1198 pts 443 comments

Using AI to write better code more slowly

→ read on Hacker News
