Kingsbury argues that LLMs are 'bullshit machines' in the philosophical sense — systems architecturally indifferent to truth. Drawing on his career testing distributed systems for correctness violations, he applies the same scrutiny to LLMs and concludes their fundamental design produces outputs uncoupled from factual grounding, making them dangerous infrastructure for society.
Kingsbury traces his concern back to 2019, when he publicly asked a hyperscaler whether making deep learning cheaper would enable new forms of spam and propaganda. He frames the current landscape as a predictable consequence of prioritizing capability over ethics, and positions his essay as filling the 'negative space' that launch keynotes deliberately omit.
The editorial argues that Kingsbury's Jepsen work — which has publicly embarrassed major databases by proving their consistency guarantees didn't hold — establishes a unique authority for this critique. When someone whose career is built on ruthlessly testing systems' truth claims says LLMs are architecturally truth-indifferent, it carries specific technical credibility that generic AI skepticism does not.
Kingsbury explicitly frames his essay as intentionally unbalanced, acknowledging it is 'neither balanced nor complete.' He argues that boosterism needs no amplification and that others have better covered ecological and IP concerns, so his contribution should focus exclusively on mapping risks and failure modes — the negative space that isn't represented in mainstream AI coverage.
Kyle Kingsbury — better known as Aphyr, the person behind Jepsen, the gold-standard distributed-systems testing framework that has found correctness bugs in virtually every major database it has examined — has released a multi-part essay titled *The Future of Everything Is Lies, I Guess*. Available as a series of blog posts plus PDF and EPUB, the piece represents years of deferred writing on the social and technical implications of large language models.
Kingsbury opens with a disarming admission: he grew up on Asimov and Clarke, dreamed of intelligent machines, and never imagined the Turing test would fall in his lifetime. He also never imagined he'd feel so disheartened when it did. The essay traces his skepticism back to 2019, when he asked a hyperscaler presenting new LLM training hardware whether what they were doing was ethical — whether making deep learning cheaper would enable new forms of spam and propaganda. Five years later, the essay finally exists, and it is, in his own words, "bullshit about bullshit machines."
The piece is deliberately one-sided. Kingsbury acknowledges that others have covered ecological and intellectual property dimensions more thoroughly, and that boosterism needs no additional amplification. His goal is to map the negative space — the risks and failure modes that don't make it into launch keynotes.
This isn't a random blog post from a concerned citizen. Kingsbury's entire career has been built on one principle: systems that claim correctness properties should be tested against those claims, ruthlessly and publicly. Jepsen has embarrassed Redis, MongoDB, Elasticsearch, CockroachDB, and dozens of other databases by demonstrating that their consistency guarantees didn't hold under real failure conditions. When that person turns their attention to LLMs and says the fundamental architecture is truth-indifferent, it carries a specific weight.
The essay's core argument — that LLMs are bullshit machines in the philosophical sense, producing outputs without regard to truth value — isn't new. Harry Frankfurt's *On Bullshit* framework has been applied to language models since GPT-3. What Kingsbury adds is the systems-thinking perspective: what happens when you deploy truth-indifferent components into truth-dependent pipelines? In distributed systems, a single node that lies about its state can corrupt an entire cluster. The analogy to LLM-generated code, documentation, legal filings, and medical advice is not subtle.
The Hacker News discussion around the essay is itself instructive. Commenter danieltanfh95 pushed back on the "LLMs can't do X so they're idiots" framing, arguing that LLMs with harnesses — tool use, retrieval augmentation, chain-of-thought scaffolding — are "clearly capable of engaging with logical problems that only need text." This is the strongest version of the counterargument: nobody serious claims raw token prediction is reasoning, but the composite systems built around LLMs may be.
The most interesting tension in the discourse isn't between "AI works" and "AI doesn't work" — it's between people building verification layers fast enough and people deploying without them. Commenter munificent drew a parallel to the Industrial Revolution: before industrialization, the natural world was nearly infinitely abundant relative to our capacity to exploit it. LLMs may have done something similar to information — made the generation of plausible-sounding text so cheap that we've overwhelmed our capacity to verify it.
Meanwhile, commenter beders highlighted the terminological problem: the phrase "AI" is so overloaded that conversations about capabilities, risks, and ethics constantly talk past each other. When a product marketer says "AI" and a machine learning researcher says "AI" and Kyle Kingsbury says "AI," they're describing different things — and the ambiguity is not accidental.
If you're using LLM-generated code in production — and at this point, most teams are — Kingsbury's essay is a useful forcing function to audit your verification pipeline. The question isn't whether Copilot or Claude or GPT wrote the code. The question is whether your review process, test coverage, and deployment safeguards were designed for a world where a substantial fraction of submitted code was generated by a system that optimizes for plausibility rather than correctness.
Concretely, this means:
Testing budgets need to account for LLM-generated code. If a substantial share of your new code is AI-assisted (GitHub has reported Copilot suggestion-acceptance rates of roughly 30%), your test suite needs to cover failure modes that human developers rarely produce but LLMs produce routinely: subtly wrong boundary conditions, hallucinated API surfaces, correct-looking code that breaks under concurrency. Property-based testing and fuzzing become more valuable, not less, in an LLM-assisted workflow.
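As a sketch of what that looks like in practice, here is a minimal property-based check in plain Python, without a Hypothesis dependency. The `chunk` helper and `check_chunk_properties` harness are invented for illustration: `chunk` is the kind of small utility an assistant often generates, and the randomized invariants (lossless reassembly, size bounds) are exactly the checks that catch the off-by-one boundary bugs mentioned above.

```python
import random

def chunk(items, size):
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def check_chunk_properties(trials=500, seed=0):
    """Randomized property check for chunk(): reassembly and size invariants."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randint(-10, 10) for _ in range(rng.randint(0, 50))]
        size = rng.randint(1, 10)
        chunks = chunk(items, size)
        # Flattening the chunks must reproduce the input exactly.
        assert [x for c in chunks for x in c] == items
        # No chunk exceeds the requested size.
        assert all(len(c) <= size for c in chunks)
        # Every chunk except possibly the last must be full.
        assert all(len(c) == size for c in chunks[:-1])
    return True
```

A dedicated library like Hypothesis adds input shrinking and smarter generation, but even this hand-rolled loop exercises boundary cases (empty input, size 1, final partial chunk) that a handful of example-based tests typically miss.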
Code review norms need updating. The traditional code review assumes a human author who understands the code's intent and can explain their reasoning when questioned. When the author is a human who accepted a suggestion from a system that has no intent, the review dynamic changes. Some teams are experimenting with requiring reviewers to run AI-generated code locally before approving, or flagging AI-assisted PRs for additional scrutiny. Neither approach scales perfectly, but doing nothing scales worse.
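One lightweight mechanism for the flagging approach, assuming a team adopts an "Assisted-by:" commit trailer as a local convention (the trailer name is hypothetical, not a git or GitHub standard), is a CI check that routes marked commits for extra scrutiny:

```python
def needs_extra_review(commit_message: str) -> bool:
    """Return True when a git trailer marks the commit as AI-assisted.

    Assumes a team convention of adding e.g. "Assisted-by: Copilot" to
    commit messages; the trailer name is illustrative, not a standard.
    """
    for line in commit_message.splitlines():
        if line.strip().lower().startswith("assisted-by:"):
            return True
    return False
```

A CI job could call this on each commit in a PR and apply a label or require a second approver. The honor-system dependency is obvious, but it makes the policy checkable rather than purely aspirational.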
Observability matters more. Kingsbury's Jepsen work proved that distributed systems fail in ways their authors didn't anticipate. LLM-generated code fails the same way — it's syntactically valid, it passes the obvious tests, and it breaks in production under conditions the model never saw in training. If you're not already running comprehensive observability on LLM-assisted codepaths, you're flying blind in exactly the way Kingsbury has spent a decade warning database vendors about.
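A minimal sketch of that per-call visibility, using only the standard library as a stand-in for a real telemetry stack such as OpenTelemetry; the decorator and the `parse_order_total` helper it wraps are both invented for illustration:

```python
import functools
import logging
import time

log = logging.getLogger("llm_codepath")

def observed(fn):
    """Emit latency and error telemetry for every call to fn.

    A stand-in for real instrumentation: the point is that AI-assisted
    codepaths get per-call visibility in production, not just CI checks.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            log.exception("%s raised", fn.__name__)
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("%s completed in %.2f ms", fn.__name__, elapsed_ms)
    return wrapper

@observed
def parse_order_total(line: str) -> float:
    # Hypothetical AI-assisted helper under observation.
    return float(line.rsplit(",", 1)[-1])
```

Tagging the telemetry by origin (a logger or metric label per AI-assisted codepath) is what lets you later ask whether those paths fail at a different rate than human-written ones.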
The broader point extends beyond code. If your product uses LLM outputs for customer-facing content, search results, documentation, or decision support, the verification layer is your product's integrity. The LLM is a generation engine. Verification is your job.
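A sketch of such a verification gate in Python, treating the model as an untrusted generator whose output must pass explicit shape checks before it touches the product; the field names in `REQUIRED` are invented, not from any real model API:

```python
import json

# Illustrative schema: these field names are assumptions, not a real API.
REQUIRED = {"title": str, "summary": str, "confidence": (int, float)}

def verify_llm_output(raw: str) -> dict:
    """Gate an untrusted model response behind explicit checks.

    Raises ValueError for non-JSON output, missing or mistyped fields,
    or an out-of-range confidence; only verified data passes through.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return data
```

Schema validation is the cheapest layer; it catches malformed output but not fluent falsehoods, which is why fact-level checks (retrieval cross-references, human review) still sit above it.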
Kingsbury's essay is the first installment of a series, with subsequent sections releasing over the coming days. Given his track record — Jepsen reports are exhaustive, well-sourced, and devastating to their subjects — the full work is likely to become a reference text for the LLM-skeptic position. Whether you agree with his framing or not, the systems-level question he's asking is the right one: we know how to build reliable systems from unreliable components (it's literally what distributed computing is), but only when we acknowledge the unreliability upfront rather than marketing it away. The industry's track record on that front, as Jepsen has documented across dozens of databases, is not encouraging.
> 2017’s Attention is All You Need was groundbreaking and paved the way for ChatGPT et al. Since then ML researchers have been trying to come up with new architectures, and companies have thrown gazillions of dollars at smart people to play around and see if they can make a better kind of model.
> It remains unclear whether continuing to throw vast quantities of silicon and ever-bigger corpuses at the current generation of models will lead to human-equivalent capabilities. Massive increases in training costs and parameter count seem to be yielding diminishing returns. Or maybe this effect…
> I think the discussion has to be more nuanced than this. "LLMs still can't do X so it's an idiot" is a bad line of thought. LLMs with harnesses are clearly capable of engaging with logical problems that only need text. LLMs are not there yet with images, but we are improving.
> Thank you for putting it so succinctly. I keep explaining to my peers, friends, and family that what actually happens inside an LLM has nothing to do with consciousness or agency, and that the term AI is just completely overloaded right now.
> There is a whole giant essay I probably need to write at some point, but I can't help but see parallels between today and the Industrial Revolution. Prior to the Industrial Revolution, the natural world was nearly infinitely abundant. We simply weren't efficient enough to fully exploit it.