Kingsbury argues the debate about whether LLMs are 'good enough' misses the point entirely. The systems engineers rely on — code review, issue triage, peer review, moderation — were all built as cost-asymmetry machines assuming human-scale effort to produce content. LLMs invert that asymmetry, and the resulting flood of fluent-but-unverified text is devouring maintainer time faster than any existing filter can cope.
The editorial cites a pattern of defensive moves across the ecosystem: curl's Daniel Stenberg publicly complaining about AI-generated CVE reports, the Linux kernel tightening contributor policies, and Stack Overflow's still-unrescinded December 2022 ban on unmarked GPT answers. These aren't isolated incidents but early evidence that trusted information sources are being flooded faster than they can filter.
The editorial argues aphyr's essay landed (138 HN points for a meditative personal post) precisely because of his Jepsen background. A career spent formally proving that distributed systems lie gives him unusual standing to diagnose an ecosystem drowning in plausible-sounding text with no verification layer underneath.
Kyle Kingsbury — better known as aphyr, the engineer behind Jepsen's distributed systems torture tests — published a long essay on his blog titled *The Future of Everything Is Lies, I Guess: Where Do We Go from Here?* It hit 138 points on Hacker News, which for a meditative personal post rather than a product launch is a meaningful signal about where the practitioner mood sits right now.
The thesis, stripped to its load-bearing beams: the information substrate that working engineers depend on — documentation, bug trackers, forum answers, search results, code review, even peer-reviewed papers — is being flooded with fluent, confident, plausible-sounding generated text faster than any existing social or technical system can filter it. The problem is not that LLMs occasionally hallucinate; the problem is that producing plausible nonsense is now cheaper than refuting it, and that asymmetry is devouring maintainer time.
Aphyr is not the first person to say this. curl's Daniel Stenberg has been on record for over a year about AI-generated CVE reports eating his weekends. The Linux kernel has tightened contributor policies. Stack Overflow banned unmarked GPT answers in December 2022 and has not walked it back. What makes this essay land is the voice — a person whose entire career is built on formally verifying that systems lie, now watching the rest of the field drown in fluency without truth.
The familiar response to AI content is to argue about quality: is GPT-5 better than a junior, can Claude write a correct sort function, etc. Aphyr's framing cuts under that debate. He's not asking whether the models are good. He's asking what happens to the ecosystems humans built on an implicit assumption that producing a comment, a bug report, or a paper took roughly human-scale effort. Every system we use — code review, issue triage, peer review, moderation — is a cost-asymmetry machine, and we just inverted the cost asymmetry.
Compare the responses so far. The optimistic camp (most VCs, most model labs) says detection and provenance tooling will catch up: watermarking, C2PA, attestation, retrieval-augmented verification. The pessimistic camp — which increasingly includes maintainers who have to live with the output — says watermarking is defeated by paraphrase, C2PA requires platform cooperation that will not arrive, and the economic gradient points the wrong way. Platforms that would filter slop are also the platforms selling slop generation as a feature. The slop is not a bug in the business model; for most of the companies producing it, the slop is the product.
There's a useful analogy buried in here for anyone who has worked on distributed systems, which is probably why aphyr reached for it. Byzantine fault tolerance assumes a bounded fraction of lying nodes. The classical results — PBFT, HotStuff — need honest supermajorities. Our social information protocols (Stack Overflow reputation, GitHub review, arXiv endorsement, Wikipedia consensus) were also implicitly Byzantine systems with honest-majority assumptions. Generative models don't add a few more Byzantine nodes; they let one actor spin up arbitrarily many at near-zero cost. That is not a condition the original protocol was designed for, and patching it after the fact is — to borrow aphyr's own professional experience — historically very hard.
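To make that concrete, here is a back-of-envelope sketch in Python (the node counts are illustrative, not from the essay). Classical BFT safety needs n ≥ 3f + 1, so with a fixed pool of h honest participants the guarantee survives only while the attacker's Sybil count s satisfies 2s ≤ h − 1: roughly half the honest population, a bar that API-priced identities clear trivially.

```python
def bft_safe(honest: int, sybil: int) -> bool:
    """Classical BFT safety condition: n >= 3f + 1, treating every
    Sybil identity as faulty (f = sybil, n = honest + sybil)."""
    n, f = honest + sybil, sybil
    return n >= 3 * f + 1

# A community with 100 honest reviewers. Under human-cost assumptions,
# recruiting even a few dozen fake identities was expensive.
honest = 100
for sybil in (10, 49, 50, 1000):
    print(f"{sybil:>5} sybils: safe={bft_safe(honest, sybil)}")

# Output:
#    10 sybils: safe=True
#    49 sybils: safe=True
#    50 sybils: safe=False   <- threshold: 2s <= h - 1 fails here
#  1000 sybils: safe=False   <- and the API bill barely notices
```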
The community reaction on the HN thread leaned toward grim agreement, with one recurring dissent worth engaging: *you always could fake this, spam is old, the internet has been a sewer for decades.* True but incomplete. The cost floor matters. Email spam was cheap but undifferentiated; content farms were scale-limited by human writers; Sybil attacks on forums were bottlenecked by captchas and karma. Generative models collapsed all three constraints into a single line item on an API bill. Quantitative change at three orders of magnitude is a qualitative change, and pretending otherwise is how you end up maintaining curl.
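To put a crude number on "three orders of magnitude": every figure below is an assumption chosen for illustration, not a measurement, but the ratio survives any reasonable tweaking of the inputs.

```python
# All figures are illustrative assumptions, not measurements.
HUMAN_REPORT_MIN  = 30      # minutes to hand-write a plausible bug report
HUMAN_RATE_USD_HR = 50      # a modest engineering wage
LLM_TOKENS        = 1500    # tokens for a fluent, well-formatted report
LLM_USD_PER_MTOK  = 10.0    # assumed API price per million output tokens

human_cost = HUMAN_REPORT_MIN / 60 * HUMAN_RATE_USD_HR   # $25.00
llm_cost   = LLM_TOKENS / 1e6 * LLM_USD_PER_MTOK         # $0.015

print(f"human: ${human_cost:.2f}  llm: ${llm_cost:.3f}  "
      f"ratio: {human_cost / llm_cost:,.0f}x")           # ~1,667x
```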
First, tighten your trust graph. If you maintain anything that accepts public contributions, the era of treating a well-formatted bug report as prima facie evidence of good faith is over. Require reproduction steps that exercise actual code paths. Auto-close reports that can't produce a failing test. Several large OSS projects have quietly adopted variants of this and seen triage volume fall without a corresponding loss of real bugs — the signal was always concentrated in reports that came with evidence.
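A naive sketch of what "no evidence, no triage" might look like; the heuristic here (a fenced code block containing something that reads like a failing check) and the function names are made-up stand-ins for whatever evidence bar a real project would set.

```python
import re

REPRO_HINTS = ("assert", "Traceback", "FAILED", "panic", "Segmentation fault")

def has_runnable_evidence(issue_body: str) -> bool:
    """Crude filter: does the report contain a fenced code block
    *and* something that looks like a failing check or a crash?"""
    fenced = re.findall(r"`{3}.*?`{3}", issue_body, flags=re.DOTALL)
    return any(hint in block for block in fenced for hint in REPRO_HINTS)

def triage(issue_body: str) -> str:
    if has_runnable_evidence(issue_body):
        return "queue-for-human"
    # Auto-close with a template asking for a failing test, not silence.
    return "close: please attach a reproduction with a failing test"
```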
Second, move your own reading upstream. RSS against a curated list of humans beats algorithmic feeds. Primary sources beat summaries. For security specifically, prefer vendor advisories and signed commits to aggregator posts; for language ecosystems, prefer release notes from core maintainers to Medium explainers. This is not Luddism — it is the same discipline you apply when you read a distributed systems paper and check the authors before you check the abstract.
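A minimal version of that reading setup, using the feedparser library; the feed URLs below are placeholders for your own curated list of humans.

```python
import feedparser  # pip install feedparser

# Placeholder feeds -- substitute your own curated list.
CURATED = [
    "https://aphyr.com/posts.atom",
    "https://daniel.haxx.se/blog/feed/",
]

for url in CURATED:
    feed = feedparser.parse(url)
    for entry in feed.entries[:3]:  # latest three posts per author
        print(f"{feed.feed.get('title', url)}: {entry.title}\n  {entry.link}")
```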
Third, audit your dependencies like the supply chain they are. Star counts are manipulable. Download counts are manipulable. What is harder to fake is a five-year commit history from a named human whose other work you can read. If you cannot, in ten minutes, name a human being responsible for a library you are about to pull into production, that is now a risk signal, not a neutral fact. Package ecosystems from npm to PyPI to crates.io have already seen typosquatting and slop-maintainer takeover incidents; the cost of diligence is lower than the cost of the first incident.
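A rough script for the ten-minute test, assuming the dependency is already cloned locally; the thresholds at the bottom are arbitrary illustrations, not recommendations.

```python
import subprocess
from datetime import datetime, timezone

def provenance(repo_path: str) -> dict:
    """Summarize a cloned dependency's history: first-commit age and
    distinct author names. Heuristics only -- both can be forged, but
    forging five years of plausible history costs more than a README."""
    def git(*args: str) -> str:
        return subprocess.check_output(
            ["git", "-C", repo_path, *args], text=True
        ).strip()

    first_commit_ts = int(git("log", "--reverse", "--format=%at").splitlines()[0])
    age_years = (datetime.now(timezone.utc)
                 - datetime.fromtimestamp(first_commit_ts, timezone.utc)).days / 365
    authors = set(git("log", "--format=%an").splitlines())
    return {"age_years": round(age_years, 1), "named_authors": len(authors)}

# Arbitrary illustrative bar: under two years old with a single author
# is a "look closer" signal, not an automatic rejection.
info = provenance("./vendor/somelib")  # placeholder path to a cloned dep
if info["age_years"] < 2 and info["named_authors"] < 2:
    print("risk signal:", info)
```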
Fourth, assume your own outputs are being scraped into the next model and act accordingly. This is less a security concern than a civic one. If you write good technical prose and publish it openly, you are feeding the same system that is degrading the commons. Aphyr doesn't offer a tidy answer here and neither should anyone else; the tradeoffs are real. But the question deserves to be asked out loud rather than deferred.
Aphyr's essay is deliberately short on prescription — the title ends with a question mark for a reason. The honest forecast is that the near-term equilibrium is worse information, smaller and more defensive communities, and a bifurcated web where signed, known-human content is a premium tier and everything else is treated as suspect by default. That is not a pleasant outcome but it is a legible one, and legibility is the first thing you need before you can build anything. The engineers most likely to come through this with working systems are the ones who are already acting as if the trust substrate is gone — because operationally, for a growing class of problems, it is.