Willison argues that after three years of hedging on LLM use cases, the evidence is now clear: Anthropic and OpenAI have stopped searching and started executing around a single proven use case, agentic coding. He cites product convergence (Claude Code, the 2025 Codex, Cursor, Windsurf, Cline, and Aider all rhyming on the same execution loop) as the tell that the underlying capability crossed a threshold for everyone at roughly the same time.
Posting his own piece to HN, Willison framed the shift as narrow and specific — not chat, not general assistance, not enterprise knowledge work, but writing, running, and debugging code in an agent loop. The 199-point score suggests the developer audience broadly recognizes the pattern he's describing.
Willison points to Anthropic's Max plan and OpenAI's Pro tier — both priced at the high end of consumer software — as evidence that a real population of developers happily pays premium rates for agents that close tickets autonomously. This isn't a 'tools for thought' market with hobbyist pricing; it prices like labor replacement because that's effectively what it is.
Willison argues the real tell isn't that coding works — that's been obvious since Copilot in 2021 — but that Anthropic in particular has visibly de-emphasized other product directions to concentrate on Claude Code. When a frontier lab narrows its focus rather than broadening it, that's the behavioral signature of having found a market rather than still hunting for one.
On May 27, Simon Willison published a short post titled *I think Anthropic and OpenAI have found product-market fit* — and for a writer who has spent three years carefully refusing to declare anything about LLMs settled, that's a notable shift in tone. His argument is narrow and specific: the two leading frontier labs have stopped behaving like companies searching for a use case and started behaving like companies that found one. That use case is coding — not chat, not 'general assistance,' not enterprise knowledge work, but writing, running, and debugging software in an agent loop.
The evidence Willison points to is mostly product-shape evidence rather than revenue. Anthropic has spent the past year rebuilding around Claude Code, which graduated from an experimental CLI to the centerpiece of the company's developer story. OpenAI has done the same with Codex — not the 2021 model, but the 2025 reincarnation: a cloud-hosted agent that picks up a repo, runs tests, and ships patches. Cursor, Windsurf, Cline, Aider, Zed's agent mode, and a half-dozen forks all converged on roughly the same execution loop within a six-month window. When products from competing labs and competing startups all rhyme this closely, it's usually because the underlying capability finally crossed a threshold and everyone hit the same local maximum at once.
Willison also notes the pricing pattern. Anthropic's Max plan at $100–$200/month and OpenAI's Pro tier at $200/month both exist because there's a population of developers who will pay that — happily — for an agent that closes tickets while they sleep. That's not a 'tools for thought' market; that's a 'replace the junior engineer's overflow queue' market, and it prices accordingly.
The interesting move here isn't that coding turned out to be a good fit for LLMs. Everyone has known that since Copilot shipped in 2021. The interesting move is what the labs *stopped* doing. Anthropic has visibly de-emphasized the consumer chatbot race — Claude.ai is still there, but every product announcement for eighteen months has been about Code, MCP, computer use, or the API. OpenAI's consumer ChatGPT business is enormous and won't go anywhere, but the *innovation budget* — the new SKUs, the new model variants, the agentic scaffolding — is flowing to Codex and the developer platform. Google, the third lab in the room, is the conspicuous holdout still trying to make Gemini work as a general assistant inside Workspace, and it shows in the comparative product velocity.
This matters because product-market fit (PMF) for a frontier lab isn't just a revenue signal; it's a *training data* signal. Once a lab decides coding is the product, every subsequent decision (data mix, RL environments, eval suites, post-training recipes) gets pulled toward that target, and the gap to general-assistant competitors widens with every release cycle. This is why Claude 4.5 and GPT-5-Codex feel qualitatively different from their predecessors at code-shaped work but only incrementally better at, say, summarizing a PDF. The labs are optimizing what they measure, and what they measure is increasingly `pass@1 on a real PR`.
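For readers who haven't met the metric: pass@k is commonly estimated with the unbiased formula introduced alongside the HumanEval benchmark, 1 - C(n - c, k) / C(n, k), where n attempts are sampled per task and c of them pass. The sketch below assumes that definition; how either lab actually scores "a real PR" internally is not public, and the function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: attempts sampled for one task
    c: attempts that passed the task's tests
    k: attempt budget being scored (k = 1 gives pass@1)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With k = 1 this reduces to the plain pass rate c / n, which is why pass@1 is the unforgiving version of the metric: the agent gets one shot at the task.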
The community reaction on Hacker News (199 points within a few hours, which is high for a Willison post that isn't about a model release) splits along predictable lines. The bullish read: this is the moment LLMs became infrastructure for a specific profession, the way Bloomberg terminals became infrastructure for traders. The bearish read: 'PMF' is doing a lot of work in a sentence where the underlying businesses still burn capital faster than they earn it, and a $200/month plan only looks like PMF if you ignore inference cost per agent-hour. Both can be true. Bloomberg also lost money for years before the lock-in turned into a moat.
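The bearish argument is ultimately arithmetic, and a rough sketch makes its shape visible. Everything below is an assumption for illustration: the article gives no inference-cost figures, so `cost_per_agent_hour` is a hypothetical input, and the function names are invented for this example.

```python
def breakeven_agent_hours(monthly_price: float, cost_per_agent_hour: float) -> float:
    """Agent-hours per month at which inference cost equals the subscription price."""
    if cost_per_agent_hour <= 0:
        raise ValueError("cost_per_agent_hour must be positive")
    return monthly_price / cost_per_agent_hour


def monthly_margin(monthly_price: float, agent_hours: float,
                   cost_per_agent_hour: float) -> float:
    """Subscription revenue minus inference cost for one subscriber in one month."""
    return monthly_price - agent_hours * cost_per_agent_hour


# Hypothetical inputs, NOT reported figures: a $200/month plan and an
# assumed inference cost of $2 per agent-hour.
hours = breakeven_agent_hours(200.0, 2.0)
margin = monthly_margin(200.0, 150.0, 2.0)
```

The point of the sketch is the sensitivity, not the numbers: a flat-rate plan is only profitable while the heaviest users stay under the break-even line, and agent loops that run unattended push usage in exactly the wrong direction.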
The more honest framing, which Willison gestures at without quite saying: the labs have found product-market fit with developers specifically, and they're betting the rest of the economy will follow developers the way the rest of the economy followed Slack and GitHub. That's a real bet, not a sure thing. Developers are unusually tolerant of broken tooling, unusually willing to glue things together, and unusually bad as a proxy for what knowledge workers in general will adopt. The fit is real; the extrapolation is the speculation.
If you're a senior engineer reading this and you haven't restructured how you use these tools in the last six months, you're probably leaving a lot on the table. The 2024 workflow — open Cursor, accept tab-completions, occasionally chat with the sidebar — is now the floor, not the ceiling. The 2026 workflow looks more like: write a one-paragraph spec, hand it to Claude Code or Codex in agent mode, let it run for ten to thirty minutes against a sandboxed checkout, review the PR, and iterate. The unit of work shifted from *line* to *task*, and the cognitive load shifted from *typing* to *specifying and reviewing*.
A few concrete adjustments worth making. One: invest in your repo's machine-readability. Good README, good CLAUDE.md / AGENTS.md, fast test suite, deterministic local dev setup. The labs have effectively standardized on 'agent reads the repo, runs the tests, iterates' — repos that don't support that loop are now actively harder to work in than repos that do. Two: stop paying for IDE-only AI plans if you're not also paying for an agentic plan. The IDE is the wrong unit. Three: build the habit of reading agent-generated diffs the way you'd read a contractor's PR — skeptically, with focus on the boundary conditions and the tests, not the happy path.
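The first adjustment is easy to check mechanically. Here is a minimal sketch, assuming the signals live at the top level of the repo: the file names come from the conventions mentioned above (README, CLAUDE.md, AGENTS.md), while the test-directory names and the function names are assumptions made for this example.

```python
from pathlib import Path

# File names follow the conventions named in the text; the test-directory
# names are an assumption and will differ between projects.
README_NAMES = ("README.md", "README.rst", "README.txt", "README")
AGENT_DOCS = ("CLAUDE.md", "AGENTS.md")
TEST_DIRS = ("tests", "test")


def agent_readiness(repo: Path) -> dict[str, bool]:
    """Report which agent-friendly signals exist at the top level of a repo."""
    return {
        "readme": any((repo / name).is_file() for name in README_NAMES),
        "agent_instructions": any((repo / name).is_file() for name in AGENT_DOCS),
        "tests": any((repo / name).is_dir() for name in TEST_DIRS),
    }


def missing_signals(repo: Path) -> list[str]:
    """Return the names of absent signals, in checklist order."""
    return [name for name, present in agent_readiness(repo).items() if not present]
```

Calling `missing_signals(Path("."))` from a repo root lists what an agent would not find. It says nothing about whether the test suite is fast or the dev setup deterministic; those still need a human to measure.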
The harder organizational question is what to do about the junior-engineer pipeline, because if Willison is right about pricing, the labs are explicitly targeting the overflow queue that used to be how junior engineers learned. There's no good answer yet, but pretending it isn't happening is the worst answer.
The interesting thing to watch over the next two quarters isn't whether Anthropic and OpenAI keep winning at coding — they will, until someone open-sources a model that closes the gap. The interesting thing is whether either lab can translate developer PMF into a *second* vertical. Anthropic is clearly probing legal and finance with computer-use demos; OpenAI keeps poking at the enterprise knowledge-worker seat. If one of them lands a second PMF in the next 18 months, the 2027 conversation is about platform companies. If neither does, the 2027 conversation is about two very profitable, very narrow developer-tools businesses with $50B+ valuations and a lot of explaining to do.
I find this analysis confusing. PMF for coding was likely reached some time last year. Profitability, which is different, we don’t know. The article kind of confuses both without making a strong economic case or using numbers in a compelling way. I don’t understand what the Uber case has to do with
I feel like there's a bit of AI psychosis in this particular post.
> "These are tools which burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals."
> "Somehow this fragment turned into headlines
So how do openai and anthropic plan to keep customers when GLM-5.1 is just as good and open source and a lot cheaper? I don't see the business model working. My closest friend actually does automation software for large companies. He does not use Claude or openai at all. He primarily uses gpt 120
> $2,180.16 worth of tokens for $200
“Tokens” don’t have an intrinsic cost or value. Saying that I used $2,180.16 worth of tokens is like relying on the salesperson to convince me I’m getting a billion dollars worth of pots and pans for $19.99. I think it’s funny how we are throwing critical thinkin
They've got, ballpark, $5t to $10t to make back in the next 5 years, or the hardware buildouts will start getting written down. This means we're going to need $1t+ per year in spending on tokens. 200m knowledge workers in the world, 30m developers. We're talking about a worl