An Amateur 'Vibe Mathed' a 60-Year-Old Erdős Problem With ChatGPT

5 min read 1 source clear_take
├── "LLMs are valuable as iterative thinking partners for mathematical research, not as autonomous solvers"
│  ├── Scientific American (Scientific American) → read

The article emphasizes that the solver didn't just paste the problem and receive a proof — they used ChatGPT as an iterative thinking partner across many sessions, bouncing proof strategies, checking logical steps, and exploring dead ends faster. This frames the achievement as a human-directed collaboration, not an AI breakthrough.

│  └── top10.dev editorial (top10.dev) → read below

The editorial highlights that the process involved 'significant human direction' and that the LLM's pattern-matching suggested approaches from adjacent areas of mathematics. The value was in accelerating exploration, not replacing the mathematician's structural insight.

├── "'Vibe maths' is a meaningful parallel to vibe coding, but mathematics demands higher rigor"
│  └── top10.dev editorial (top10.dev) → read below

The editorial argues that the 'vibe maths' framing is doing real analytical work — mathematics is harder for LLM-assisted workflows because proofs demand absolute rigor with no 'it works on my machine' escape hatch, yet easier in that correctness is formally verifiable. This tension makes the achievement more notable than vibe coding successes.

└── "This democratizes mathematical research by lowering barriers for non-affiliated researchers"
  └── Scientific American (Scientific American) → read

The article's framing highlights that the solver was an amateur without a university position or formal research affiliation, yet solved a problem that stumped the professional community for 60 years. The implicit argument is that LLM tools can level the playing field, giving outsiders access to the kind of broad mathematical knowledge previously gatekept by institutional affiliation.

What happened

An amateur mathematician — someone without a university position or formal research affiliation — used OpenAI's ChatGPT to solve Erdős Problem #1196, a conjecture from the legendary Hungarian mathematician Paul Erdős that had remained open for approximately 60 years. The result was reported by Scientific American under the headline framing of "vibe maths," a deliberate callback to the now-ubiquitous "vibe coding" phenomenon.

Erdős problems carry a particular weight in mathematics. Paul Erdős, who died in 1996, posed hundreds of open problems throughout his career, many accompanied by cash bounties. The problems cataloged at erdosproblems.com range from elementary-sounding number theory questions to deep combinatorial conjectures. Problem #1196 falls in the combinatorics and number theory space — the kind of problem where the statement is deceptively simple but the proof requires genuine structural insight.

The solver didn't just paste the problem into ChatGPT and receive a proof. They used the LLM as an iterative thinking partner — bouncing proof strategies off it, asking it to check logical steps, exploring dead ends faster than they could alone, and using the model's pattern-matching to suggest approaches from adjacent areas of mathematics. The process reportedly involved many sessions and significant human direction.

Why it matters

The phrase "vibe maths" is doing real work here, and it's worth unpacking why. In software, vibe coding describes the practice of using LLMs to write code by intent rather than by hand — you describe what you want, the model generates it, you iterate. The results range from surprisingly good to subtly catastrophic, depending on the complexity and the operator's ability to evaluate output.

Mathematics is, in some ways, a harder domain for this workflow and in other ways an easier one. Harder because mathematical proofs demand absolute rigor — there's no "it works on my machine" escape hatch. A proof is correct or it isn't. But easier because correctness is verifiable in a way that software behavior often isn't. You can check a proof. You can formalize it. The LLM doesn't need to be right on the first try; it needs to help you search the space of possible arguments faster.
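That verifiability point can be made concrete: proof assistants like Lean mechanically check every inference step, so a formalized proof is either accepted by the kernel or rejected. A toy illustration of the workflow (this is an invented example, not anything from the Erdős proof):

```lean
-- Trivial claim, machine-checked by Lean's kernel:
-- the sum of two even numbers is even.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k := by
  cases ha with
  | intro m hm =>
    cases hb with
    | intro n hn =>
      -- witness k := m + n; 2*m + 2*n = 2*(m+n) by distributivity
      exact ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```

The point isn't this particular lemma — it's that an LLM's suggested argument can be run through a checker like this, which makes mathematics unusually tolerant of a fallible search assistant.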

This result lands in the middle of an ongoing debate about AI's role in mathematical research. On one side, projects like DeepMind's AlphaProof and Meta's formal theorem-proving work aim to automate proof discovery end-to-end. Those systems target the Fields Medal frontier — IMO problems, millennium-prize-adjacent conjectures. On the other side, working mathematicians have quietly been using ChatGPT and Claude as sophisticated rubber ducks: tools for checking intuitions, generating counterexamples, and exploring unfamiliar subfields.

The Erdős result suggests the second approach may be underrated. The solver didn't need a purpose-built theorem prover. They needed a general-purpose language model that could hold context about a mathematical argument and respond usefully to natural-language queries about proof strategies. That's a much lower bar than autonomous proof discovery, and it's available to anyone with a browser right now.

The Hacker News discussion (285 points) reflected a community genuinely engaged with the implications. The predictable objections surfaced — "the human did the real work," "ChatGPT just got lucky," "wait for peer review" — but the dominant thread was more nuanced. Several commenters with mathematical backgrounds noted that the bottleneck in solving many open problems isn't raw intelligence but exposure: knowing which techniques from which subfields might apply. LLMs, trained on the entire mathematical literature, serve as a kind of compressed library that can surface connections a lone researcher might never encounter.

What this means for your stack

If you're a developer who uses AI tools daily, this story validates something you probably already suspect: the value of LLMs as thinking partners scales with the difficulty and openness of the problem, not just with the volume of boilerplate to generate.

The practical implications extend beyond mathematics. Consider the parallel to debugging complex distributed systems, designing novel algorithms, or reasoning about security threat models. These are all domains where the bottleneck is often navigating a vast search space of possible approaches, not executing the final solution. The Erdős result demonstrates that a general-purpose LLM, used by someone with strong domain intuition, can meaningfully compress that search.

There's a workforce implication too, and it cuts both ways. The amateur framing is significant — this wasn't a tenured professor at a research university. It was someone outside the institutional system who lacked access to a department full of collaborators but found a substitute in AI. For developers and technically minded people who've always wanted to contribute to adjacent fields — mathematics, physics, biology — the barrier to meaningful participation just got measurably lower. You still need the intuition. You still need the ability to evaluate whether an AI-suggested approach is nonsense. But you no longer need to be embedded in an institution to have access to a knowledgeable sounding board.

The flip side: if an amateur with ChatGPT can solve problems that stumped professionals for decades, what does that say about the problems? Some mathematicians will argue (not unreasonably) that Erdős problems vary enormously in difficulty, and that a 60-year-old unsolved problem isn't necessarily a hard problem — it may just be an overlooked one. The real test will be whether AI-assisted amateurs start cracking problems that active research groups have been grinding on. That hasn't happened yet.

Looking ahead

The "vibe maths" label will stick, for better or worse. Expect more stories like this as the population of people using LLMs for serious intellectual work outside software engineering grows. The more interesting question isn't whether AI can help solve math problems — at this point, clearly yes — but whether the verification pipeline can keep up. Peer review in mathematics already takes months to years. If AI-assisted solvers start submitting proofs at higher volume, the bottleneck shifts from discovery to validation. That's a good problem to have, but it's a real one.

For now, the takeaway is simple: the most productive use of LLMs isn't replacing human thinking. It's making human thinking cheaper to iterate on. An amateur just proved that with a 60-year-old conjecture and a ChatGPT subscription.

Hacker News 759 pts 543 comments

Amateur armed with ChatGPT solves an Erdős problem

https://www.erdosproblems.com/1196

→ read on Hacker News
ravenical · Hacker News

https://archive.ph/2w4fi

adamgordonbell · Hacker News

Here is the chat: don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. Provide a full unconditional proof or disproof of the problem. {{problem}} REMEMBER - this uncondit

CSMastermind · Hacker News

For the uninitiated, Paul Erdős was a pretty famous but very eccentric mathematician who lived for most of the 1900s. He had a habit of seeking out and documenting mathematical problems people were working on. The problems range in difficulty from "easy homework for a current undergrad in math"

lqstuart · Hacker News

Buried pretty deep in the article: > "The raw output of ChatGPT's proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say," Lichtman says. But now he and Tao have shortened the proof so that it better distills the LLM's key i

shybear · Hacker News

It seems like a lot of scientific advancements occurred by someone applying technique X from one field to problem Y in another. I feel like LLMs are much better at making these types of connections than humans because they 1) know about many more theories/approaches than a single human can 2) do

// get daily digest

Top 10 dev stories every morning at 8am UTC. AI-curated. Retro terminal HTML email.