Hotz argues that around 2022, AI-generated content began outpacing human-generated content on the open web, and that, unlike the seasonal September influx of newcomers that Usenet once absorbed each year, this shift has no end state. He frames the pre-2022 internet as a fossil layer already scraped by every major lab, with everything after contaminated by a slop base rate too high to filter at the document level.
razin submitted the post to Hacker News, where it climbed to 369 points. That traction suggests broad agreement with Hotz's thesis that the contamination is permanent and that the internet has bifurcated into a clean pre-2022 archive and a polluted post-2022 stream.
Hotz identifies three concrete failure modes: search engines surfacing AI summaries of AI summaries, model training runs now requiring expensive human-verification passes, and developer tooling like GitHub Trending, npm popularity, and r/programming being gamed by automated content farms. He treats slop as a measurable infrastructure problem, not an aesthetic complaint.
Hotz explicitly declines to propose a remedy, framing the slop-saturated web as the new baseline rather than a problem to solve. His position is that engineers and founders should stop hoping for cleanup and instead design systems — search, training, discovery — that assume contamination is permanent.
The editorial argues that Anthropic, OpenAI, and Google have all shifted toward licensed data deals (Reddit, Stack Overflow, publishers) and synthetic data pipelines precisely because the open crawl is degrading. Pre-2022 Common Crawl snapshots are now treated internally as scarce, valuable assets — a tacit admission that the dead internet thesis is operationally true at the labs.
George Hotz (geohot) published 'The Eternal Sloptember' on May 24, 2026, and Hacker News pushed it to 369 points within hours. The post extends the 'Eternal September' meme — the 1993 moment Usenet was overrun by AOL newbies and never recovered — to a 2020s thesis: the open web crossed a threshold around 2022 where AI-generated content (slop) began outpacing human-generated content, and unlike September, this one isn't going to end.
Hotz's argument is short and unsentimental. The pre-2022 internet is now a fossil layer: a finite snapshot of mostly human writing that everyone — Google, Anthropic, OpenAI, Meta — has already scraped. Everything written after is contaminated, not because every sentence is machine-generated, but because the base rate of slop is high enough that you can't tell at the document level anymore. The blog post itself is barely 600 words; the comment thread is where the substance lives, with HN regulars trading examples of poisoned Stack Overflow answers, GitHub README inflation, and Amazon reviews written by GPT to sell products written by GPT.
The specifics matter. Hotz calls out three failure modes: search engines surfacing AI summaries of AI summaries, model training runs that now require expensive human-verification passes, and developer tooling — GitHub Trending, npm popularity, Reddit r/programming — being gamed by automated content farms. He doesn't propose a fix. He suggests this is the new equilibrium and the question is what you build on top of it.
The 'dead internet theory' has been a fringe meme for years. What's changed is that the people who actually train the models are now saying it out loud. Anthropic, OpenAI, and Google have all quietly shifted toward licensed data deals (Reddit, Stack Overflow, news publishers) and synthetic data pipelines precisely because the open crawl is degrading. Pre-2022 Common Crawl snapshots are now hoarded inside frontier labs the way 1960s mainframe tapes once were: as irreplaceable artifacts of a cleaner era.
The second-order effects are where this gets uncomfortable for working developers. GitHub Trending, which a lot of tooling (including, candidly, content-curation systems like this one) treats as a quality signal, has been demonstrably gamed for at least 18 months — fake-star campaigns, coordinated forks, and AI-generated 'awesome-X' lists with thousands of links to repos that don't compile. Stack Overflow's traffic is down roughly 50% from its 2021 peak, and the answers that remain are increasingly LLM-pasted with the confident wrongness LLMs specialize in. The signal-to-noise ratio that made the 2010s developer web usable is gone, and nothing has replaced it.
The counter-argument, which Hotz nods at but doesn't engage, is that this is just a tooling problem: with better classifiers, watermarking, and provenance standards like C2PA, surely we can detect the slop and filter it. The evidence so far points the other way: every detector released has been beaten within weeks, and watermarking only works when the model that wrote the text cooperates. Open-weight models don't watermark. Fine-tunes strip watermarks. The arms race has a clear winner and it isn't the detectors.
There's also a generational issue nobody wants to name. The cohort of developers who learned to code by reading Stack Overflow, GitHub source, and random 2014 blog posts is the last cohort that will have done so. New developers learn by asking Claude or Copilot, which were trained on the pre-2022 web. The corpus is frozen, the teachers are derivative, and the loop closes. It's not obviously catastrophic — Hotz isn't a doomer here — but it is a phase change.
Three concrete adjustments are worth making this quarter.
First, stop trusting recency as a quality signal. If your retrieval pipeline, RAG store, or trend-detector weights newer content higher, you are actively selecting for slop. Invert it where you can. For technical documentation, a 2019 Stack Overflow answer with 400 upvotes beats a 2025 blog post with 4,000 words almost every time. Pin dependency docs to specific versions and dates. Cache the good snapshots locally; treat the live web as a fallback, not a primary source.
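One way to picture "inverting" a recency boost is a re-ranking step that discounts anything published after a cutoff instead of promoting it. The sketch below is illustrative only: the `Doc` type, the 2022-01-01 cutoff, and the 0.5 discount are assumptions made for the example, not values from Hotz's post or from any particular retrieval library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Doc:
    url: str
    relevance: float  # retriever similarity score, assumed to lie in [0, 1]
    published: date


# Assumed values for illustration; tune both against your own corpus.
CUTOFF = date(2022, 1, 1)
POST_CUTOFF_WEIGHT = 0.5


def adjusted_score(doc: Doc) -> float:
    """Discount documents published on or after the cutoff instead of boosting them."""
    weight = 1.0 if doc.published < CUTOFF else POST_CUTOFF_WEIGHT
    return doc.relevance * weight


def rerank(docs: list[Doc]) -> list[Doc]:
    """Sort by discounted relevance, highest first."""
    return sorted(docs, key=adjusted_score, reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("https://example.com/2025-blog-post", 0.80, date(2025, 3, 1)),
        Doc("https://example.com/2019-answer", 0.70, date(2019, 5, 1)),
    ]
    for d in rerank(docs):
        print(f"{adjusted_score(d):.2f}  {d.url}")
```

With these assumed numbers the 2019 answer ranks first even though the raw retriever score prefers the 2025 post. A flat discount is the bluntest possible choice; a post-cutoff document that has passed human review could reasonably be restored to full weight.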
Second, audit your trust roots. GitHub Trending, npm download counts, Reddit upvotes, and HN points are all gameable by anyone with a $20/month OpenAI key and a botnet. If any of those signals feed an automated decision in your stack — package selection, vendor evaluation, content curation — replace them with explicitly curated lists maintained by humans you can name. Awesome-lists from known maintainers, RFCs, vendor-published case studies with named authors. Identity is the new PageRank.
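As a rough sketch of what swapping a popularity signal for a curated list might look like in an automated package-selection step: the package names, the vouching contacts, and the `select_trusted` helper below are all hypothetical. The point is only that stars and download counts never enter the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    stars: int             # gameable signal, deliberately unused below
    weekly_downloads: int  # gameable signal, deliberately unused below


# Hypothetical curated list: package name -> the named human who vouches for it.
CURATED = {
    "example-http-client": "Alice Example <alice@example.com>",
    "example-json-parser": "Bob Example <bob@example.com>",
}


def select_trusted(candidates, curated=CURATED):
    """Keep only candidates a named person vouches for; popularity is ignored."""
    return [(c.name, curated[c.name]) for c in candidates if c.name in curated]


if __name__ == "__main__":
    candidates = [
        Candidate("viral-but-unknown", stars=90_000, weekly_downloads=5_000_000),
        Candidate("example-json-parser", stars=120, weekly_downloads=3_000),
    ]
    for name, voucher in select_trusted(candidates):
        print(f"{name} (vouched for by {voucher})")
```

Returning the voucher alongside the package keeps the trust root visible: when a selection turns out badly, there is a name to go back to rather than a star count.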
Third, price in the verification tax. Anything you ingest from the post-2022 web — articles, code samples, datasets, even GitHub issues — needs a human or a domain-expert model in the loop before it touches production. Budget for it. The teams that are quietly winning at AI-assisted development right now are the ones who treat LLM output as a draft that requires review, not a search result that requires copy-paste.
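The verification tax can be made explicit as a gate in the ingestion path. The following is a minimal sketch under stated assumptions: the `Ingested` record, the 2022-01-01 boundary, and the `reviewed_by` field are invented for illustration, and a real pipeline would also need to record what was reviewed and when.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CUTOFF = date(2022, 1, 1)  # assumed boundary between "fossil layer" and live web


@dataclass
class Ingested:
    source_url: str
    published: date
    reviewed_by: Optional[str] = None  # human or reviewer model that signed off


def needs_review(item: Ingested) -> bool:
    """Post-cutoff content is blocked until someone has signed off on it."""
    return item.published >= CUTOFF and item.reviewed_by is None


def partition(items):
    """Split items into (ready_for_production, review_queue), preserving order."""
    ready = [i for i in items if not needs_review(i)]
    queue = [i for i in items if needs_review(i)]
    return ready, queue


if __name__ == "__main__":
    items = [
        Ingested("https://example.com/a", date(2019, 1, 1)),
        Ingested("https://example.com/b", date(2024, 6, 1)),
        Ingested("https://example.com/c", date(2024, 6, 1), reviewed_by="dana"),
    ]
    ready, queue = partition(items)
    print(f"{len(ready)} ready, {len(queue)} awaiting review")
```

The length of the review queue is the number to budget against: it shows how much reviewer time the pipeline is actually consuming.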
The interesting question Hotz doesn't ask is whether the slop tide eventually reveals new high ground. Closed gardens (Discord servers, paid Substacks, internal company wikis, private GitHub orgs) are accumulating the human writing that used to be public. The next decade's developer knowledge probably lives there, behind authentication, monetized or gated. That's a worse internet in most ways that matter, but it's the one we're building. The Eternal September ended Usenet. The Eternal Sloptember might end the open web as a learning surface, and what replaces it will look a lot more like the pre-internet professional guilds than anything Tim Berners-Lee had in mind.
>But each time I suspected I could have done it better and faster manually. I've heard this said so many times, but my experience has just been so dramatically the opposite that it rings false. But geohot seems to be a pretty productive and smart guy, so it's hard to just dismiss what he
So currently there are people who are buying grey market peptides[1], marked "not for human consumption" and injecting themselves with them based on dubious anecdotes and vibes, to make their skin clearer, build muscle mass, and so on. Are they all suddenly turning into zombies? No. Do
With the level of ability that AI is at right now, I've found it useful personally to think of it something like a very good search over existing knowledge. Another step up in searchability in the lineage of reference books, stack overflow, GitHub etc. Programmers are rewriting and reinventing t
One under-discussed phenomenon here, I think: The hardest thing in software engineering is solving the right problem. The ability to identify the right problem to solve is, IMO, what distinguishes the top senior engineers. And we could have endless discussions about what constitutes the right problem
I think a lot of the problem with the current discourse is how black-and-white it is. Either you're a luddite or "ai pilled". In most cases, LLMs can get you 80-95% of the way, sometimes less, sometimes more. And heck, sometimes, it just gets you somewhere wrong. But it seems everyone i