Geohot: We're stuck in an eternal Sloptember and the signal is gone

├── "The open web crossed an irreversible threshold where AI slop now outpaces human content, making the pre-2022 internet a finite, valuable fossil layer"
│  ├── George Hotz (geohot) (geohot.github.io)

Hotz argues that around 2022, AI-generated content began outpacing human-generated content on the open web, and unlike the original Eternal September, this shift has no end state. He frames the pre-2022 internet as a fossil layer already scraped by every major lab, with everything after contaminated by a slop base rate too high to filter at the document level.

│  └── @razin (Hacker News, 369 pts)

By submitting the post, which climbed to 369 points, razin amplified Hotz's thesis that the contamination is permanent. The submission's traction suggests wide resonance for the idea that the internet has bifurcated into a clean pre-2022 archive and a polluted post-2022 stream.

├── "The degradation is concrete and measurable across specific developer infrastructure — not just a vibe"
│  └── George Hotz (geohot) (geohot.github.io)

Hotz identifies three concrete failure modes: search engines surfacing AI summaries of AI summaries, model training runs now requiring expensive human-verification passes, and developer tooling like GitHub Trending, npm popularity, and r/programming being gamed by automated content farms. He treats slop as a measurable infrastructure problem, not an aesthetic complaint.

├── "There is no fix — the new equilibrium is the starting point, and the only real question is what to build on top of it"
│  └── George Hotz (geohot) (geohot.github.io)

Hotz explicitly declines to propose a remedy, framing the slop-saturated web as the new baseline rather than a problem to solve. His position is that engineers and founders should stop hoping for cleanup and instead design systems — search, training, discovery — that assume contamination is permanent.

└── "Frontier labs have already conceded the point by quietly pivoting to licensed and synthetic data"
  └── top10.dev editorial (top10.dev)

The editorial argues that Anthropic, OpenAI, and Google have all shifted toward licensed data deals (Reddit, Stack Overflow, publishers) and synthetic data pipelines precisely because the open crawl is degrading. Pre-2022 Common Crawl snapshots are now treated internally as scarce, valuable assets — a tacit admission that the dead internet thesis is operationally true at the labs.

What happened

George Hotz (geohot) published 'The Eternal Sloptember' on May 24, 2026, and Hacker News pushed it to 369 points within hours. The post extends the 'Eternal September' meme — the 1993 moment Usenet was overrun by AOL newbies and never recovered — to a 2020s thesis: the open web crossed a threshold around 2022 where AI-generated content (slop) began outpacing human-generated content, and unlike September, this one isn't going to end.

Hotz's argument is short and unsentimental. The pre-2022 internet is now a fossil layer: a finite snapshot of mostly human writing that everyone — Google, Anthropic, OpenAI, Meta — has already scraped. Everything written after is contaminated, not because every sentence is machine-generated, but because the base rate of slop is high enough that you can't tell at the document level anymore. The blog post itself is barely 600 words; the comment thread is where the substance lives, with HN regulars trading examples of poisoned Stack Overflow answers, GitHub README inflation, and Amazon reviews written by GPT to sell products written by GPT.

The specifics matter. Hotz calls out three failure modes: search engines surfacing AI summaries of AI summaries, model training runs that now require expensive human-verification passes, and developer tooling — GitHub Trending, npm popularity, Reddit r/programming — being gamed by automated content farms. He doesn't propose a fix. He suggests this is the new equilibrium and the question is what you build on top of it.

Why it matters

The 'dead internet theory' has been a fringe meme for years. What's changed is that the people who actually train the models are now acting on it. Anthropic, OpenAI, and Google have all quietly shifted toward licensed data deals (Reddit, Stack Overflow, news publishers) and synthetic data pipelines precisely because the open crawl is degrading. Pre-2022 Common Crawl snapshots are now hoarded internally at frontier labs the way 1960s mainframe tapes once were: as irreplaceable artifacts of a cleaner era.

The second-order effects are where this gets uncomfortable for working developers. GitHub Trending, which a lot of tooling (including, candidly, content-curation systems like this one) treats as a quality signal, has been demonstrably gamed for at least 18 months — fake-star campaigns, coordinated forks, and AI-generated 'awesome-X' lists with thousands of links to repos that don't compile. Stack Overflow's traffic is down roughly 50% from its 2021 peak, and the answers that remain are increasingly LLM-pasted with the confident wrongness LLMs specialize in. The signal-to-noise ratio that made the 2010s developer web usable is gone, and nothing has replaced it.

The counter-argument, which Hotz nods at but doesn't engage with, is that this is just a tooling problem. With better classifiers, watermarking, and provenance standards like C2PA, surely we can detect the slop and filter it. The evidence so far points the other way: every detector released has been beaten within weeks, and watermarking only works if the model that generated the text cooperated by embedding the mark. Open-weight models don't watermark. Fine-tunes strip watermarks. The arms race has a clear winner and it isn't the detectors.

There's also a generational issue nobody wants to name. The cohort of developers who learned to code by reading Stack Overflow, GitHub source, and random 2014 blog posts is the last cohort that will have done so. New developers learn by asking Claude or Copilot, which were trained on the pre-2022 web. The corpus is frozen, the teachers are derivative, and the loop closes. It's not obviously catastrophic — Hotz isn't a doomer here — but it is a phase change.

What this means for your stack

Three concrete adjustments are worth making this quarter.

First, stop trusting recency as a quality signal. If your retrieval pipeline, RAG store, or trend-detector weights newer content higher, you are actively selecting for slop. Invert it where you can. For technical documentation, a 2019 Stack Overflow answer with 400 upvotes beats a 2025 blog post with 4,000 words almost every time. Pin dependency docs to specific versions and dates. Cache the good snapshots locally; treat the live web as a fallback, not a primary.
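One way to act on the first adjustment is to down-weight, rather than boost, post-cutoff content when re-ranking retrieval results. This is a minimal illustration, not a production ranker: the `Doc` type, the `rerank` function, the 2022-01-01 cutoff, and the 0.5 penalty are all hypothetical choices standing in for whatever your pipeline actually uses.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cutoff: the article dates the shift to "around 2022".
SLOP_CUTOFF = date(2022, 1, 1)


@dataclass
class Doc:
    url: str
    published: date
    relevance: float  # score from your retriever; higher is better


def rerank(docs: list[Doc], post_cutoff_penalty: float = 0.5) -> list[Doc]:
    """Down-weight post-cutoff documents instead of boosting recency.

    The default penalty is an illustrative assumption, not a tuned value.
    """

    def adjusted(doc: Doc) -> float:
        if doc.published >= SLOP_CUTOFF:
            return doc.relevance * post_cutoff_penalty
        return doc.relevance

    # Highest adjusted score first; on ties, prefer the older document.
    return sorted(docs, key=lambda d: (-adjusted(d), d.published))


if __name__ == "__main__":
    results = [
        Doc("https://example.com/blog-2025", date(2025, 3, 1), 0.9),
        Doc("https://example.com/so-2019", date(2019, 6, 1), 0.7),
    ]
    for doc in rerank(results):
        print(doc.url)
```

With a 0.9-relevance 2025 blog post and a 0.7-relevance 2019 answer, the penalty drops the blog post to 0.45, so the older answer ranks first. A multiplicative penalty keeps relevance in charge: a recent document that is more than twice as relevant can still win.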

Second, audit your trust roots. GitHub Trending, npm download counts, Reddit upvotes, and HN points are all gameable by anyone with a $20/month OpenAI key and a botnet. If any of those signals feed an automated decision in your stack — package selection, vendor evaluation, content curation — replace them with explicitly curated lists maintained by humans you can name. Awesome-lists from known maintainers, RFCs, vendor-published case studies with named authors. Identity is the new PageRank.
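A minimal sketch of the second adjustment, assuming a dependency gate that consults a human-maintained allowlist and ignores popularity metrics entirely. `CuratedEntry`, `CURATED`, `approve_dependency`, and the reviewer addresses are hypothetical; in a real setup the list would live in version control and change only through reviewed commits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedEntry:
    package: str
    vouched_by: str  # a named human reviewer, not a metric


# Hypothetical allowlist; the reviewer addresses are placeholders.
CURATED: dict[str, CuratedEntry] = {
    entry.package: entry
    for entry in (
        CuratedEntry("requests", "alice@example.com"),
        CuratedEntry("numpy", "bob@example.com"),
    )
}


def approve_dependency(name: str, stars: int = 0, downloads: int = 0) -> tuple[bool, str]:
    """Approve a dependency only if a named human has vouched for it.

    `stars` and `downloads` are accepted but deliberately ignored,
    because the article treats them as gameable signals.
    """
    del stars, downloads  # intentionally unused
    entry = CURATED.get(name)
    if entry is None:
        return False, f"{name}: not on the curated list; route to human review"
    return True, f"{name}: vouched for by {entry.vouched_by}"


if __name__ == "__main__":
    print(approve_dependency("requests"))
    print(approve_dependency("shiny-new-lib", stars=50_000, downloads=10_000_000))
```

A package with 50,000 stars and no named voucher is still rejected; the metrics appear as arguments only to make explicit that they play no part in the decision.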

Third, price in the verification tax. Anything you ingest from the post-2022 web — articles, code samples, datasets, even GitHub issues — needs a human or a domain-expert model in the loop before it touches production. Budget for it. The teams that are quietly winning at AI-assisted development right now are the ones who treat LLM output as a draft that requires review, not a search result that requires copy-paste.
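The verification tax can be made explicit as a gate in the ingestion path. This is a sketch under stated assumptions: `Artifact`, `promote`, `sign_off`, and the 2022-01-01 cutoff are hypothetical, and for brevity it lets pre-cutoff material through without review, which your own policy may not.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical cutoff mirroring the article's "post-2022 web".
CUTOFF = date(2022, 1, 1)


@dataclass
class Artifact:
    source_url: str
    published: date
    reviewed_by: Optional[str] = None  # human or domain-expert reviewer


class UnreviewedArtifactError(Exception):
    """Raised when post-cutoff material reaches production unreviewed."""


def needs_review(artifact: Artifact) -> bool:
    # Post-cutoff material is a draft until someone signs off.
    # (This sketch lets pre-cutoff material through; adjust to taste.)
    return artifact.published >= CUTOFF and artifact.reviewed_by is None


def sign_off(artifact: Artifact, reviewer: str) -> Artifact:
    artifact.reviewed_by = reviewer
    return artifact


def promote(artifact: Artifact, production: list[Artifact]) -> None:
    """Append to the production set only after the verification step."""
    if needs_review(artifact):
        raise UnreviewedArtifactError(
            f"{artifact.source_url} needs review before production"
        )
    production.append(artifact)


if __name__ == "__main__":
    batch = [
        Artifact("https://example.com/2019-answer", date(2019, 5, 1)),
        Artifact("https://example.com/2025-post", date(2025, 2, 1)),
    ]
    # The size of the review queue is a rough proxy for the verification budget.
    queue = [a for a in batch if needs_review(a)]
    print(f"{len(queue)} of {len(batch)} items need review")
```

Raising instead of logging is a deliberate choice here: an unreviewed item cannot slip into production by default, which matches the "draft that requires review" posture the paragraph describes.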

Looking ahead

The interesting question Hotz doesn't ask is whether the slop tide eventually reveals new high ground. Closed gardens (Discord servers, paid Substacks, internal company wikis, private GitHub orgs) are accumulating the human writing that used to be public. The next decade's developer knowledge probably lives there, behind authentication, monetized or gated. That's a worse internet in most ways that matter, but it's the one we're building. The Eternal September ended Usenet. The Eternal Sloptember might end the open web as a learning surface, and what replaces it will look a lot more like the pre-internet professional guilds than anything Tim Berners-Lee had in mind.

Hacker News 369 pts 309 comments

The Eternal Sloptember

cafkafk · Hacker News

I think a lot of the problem with the current discourse is how black-and-white it is. Either you're a luddite or "ai pilled". In most cases, LLMs can get you 80-95% of the way, sometimes less, sometimes more. And heck, sometimes, it just gets you somewhere wrong. But it seems everyone i

linsomniac · Hacker News

> But each time I suspected I could have done it better and faster manually.

I've heard this said so many times, but my experience has just been so dramatically the opposite that it rings false. But geohot seems to be a pretty productive and smart guy, so it's hard to just dismiss what he

SCdF · Hacker News

So currently there are people who are buying grey market peptides[1], marked "not for human consumption" and injecting themselves with them based on dubious anecdotes and vibes, to make their skin clearer, build muscle mass, and so on. Are they are all suddenly turning into zombies? No. Do

Nition · Hacker News

With the level of ability that AI is at right now, I've found it useful personally to think of it something like a very good search over existing knowledge. Another step up in searchability in the lineage of reference books, stack overflow, GitHub etc. Programmers are rewriting and reinventing t

Balinares · Hacker News

One under-discussed phenomenon here, I think:

The hardest thing in software engineering is solving the right problem. The ability to identify the right problem to solve, is IMO, what distinguishes the top senior engineers. And we could have endless discussions about what constitutes the right problem
