The article frames the achievement as validation of the 'AI-as-collaborator' model, in which ChatGPT served as a high-bandwidth thinking partner rather than an oracle. The significant finding, it argues, is that LLMs can make amateurs productive in domains that previously required years of specialized training even to attempt.
The editorial describes this as the 'bicycle for the mind' thesis made concrete, arguing the real story isn't that LLMs can do math but that they can make amateurs productive in one of the most demanding intellectual disciplines. The human provided taste, direction, and verification while the AI provided exploratory breadth.
The editorial explicitly states 'ChatGPT did not solve this problem. A human solved it, with ChatGPT as an accelerant.' It notes the final proof required the human to exercise mathematical judgment that the model itself could not provide, even as it acknowledges this distinction may undersell the collaboration.
The article coins and explores the term 'vibe maths' — a riff on Karpathy's 'vibe coding' — as a distinct methodology where the mathematician uses ChatGPT not for finished proofs but to suggest directions, check intermediate steps, and iterate on partial arguments far faster than working alone.
An amateur mathematician — someone without a professional academic position in mathematics — has solved Erdős problem #1196, a combinatorics question that has stood open for approximately 60 years. The tool that helped bridge the gap between enthusiast and published result: ChatGPT.
The problem, catalogued on erdosproblems.com, is one of hundreds posed by the legendary Hungarian mathematician Paul Erdős, who spent decades scattering unsolved problems across the mathematical landscape like seeds. Many remain open. Some carry cash bounties. All carry prestige. Solving any Erdős problem is notable; solving one without institutional affiliation, using an LLM as a collaborator, is unprecedented.
The approach has been dubbed "vibe maths" — a riff on Andrej Karpathy's "vibe coding" meme — where the human mathematician used ChatGPT not as an oracle that produces finished proofs, but as a high-bandwidth thinking partner. The LLM suggested directions to explore, checked intermediate steps, and helped the solver iterate on partial arguments far faster than they could alone. The final proof, however, required the human to exercise mathematical judgment that the model itself could not.
Let's get the obvious objection out of the way: ChatGPT did not solve this problem. A human solved it, with ChatGPT as an accelerant. But that distinction, while technically correct, undersells what actually happened here.
The significant finding isn't that LLMs can do mathematics — it's that LLMs can make *amateurs* productive in domains that previously required years of specialized training to even attempt. This is the "bicycle for the mind" thesis made concrete in one of the most demanding intellectual disciplines that exists.
The Hacker News discussion (score: 190 and climbing) has split predictably into camps. One side argues this validates the AI-as-collaborator model: the human provided taste, direction, and verification while the AI provided breadth, speed, and pattern-matching across a vast corpus of mathematical techniques. The other side worries about what "vibe maths" means for rigor — if the solver doesn't fully understand every step the LLM suggested, is the proof actually trustworthy?
The answer, at least in this case, appears to be yes. Professional mathematicians have reviewed the work. The proof stands on its own merits regardless of how the ideas were generated. Mathematics doesn't care about provenance — a correct proof is correct whether it was conceived in a shower, on a napkin, or in a ChatGPT conversation.
But the meta-question is more interesting: what does it mean when the barrier to entry for serious mathematical research drops from "PhD plus years of specialization" to "deep interest plus an LLM subscription"? We've seen analogous shifts in software engineering (GitHub Copilot turning junior devs into mid-level contributors on unfamiliar codebases), in writing (LLMs helping non-native speakers produce polished English), and in legal research (AI tools letting small firms compete with BigLaw on document review). Mathematics was supposed to be different — too abstract, too rigorous, too dependent on deep structural intuition.
Apparently not.
The term "vibe maths" is deliberately provocative, but the underlying workflow is recognizable to anyone who's used LLMs productively for technical work. It follows a pattern:
1. Human frames the problem. The solver identified which Erdős problem to attempt, understood the existing literature, and knew what a solution would need to look like.
2. LLM generates candidate approaches. ChatGPT suggested proof strategies, relevant theorems, and structural ideas — many wrong, some interesting, a few genuinely useful.
3. Human filters and steers. The solver evaluated which directions were promising based on mathematical intuition the model lacks. This is the "vibe" part — pattern-matching on what *feels* right before you can prove it.
4. Iterate rapidly. The feedback loop between human judgment and LLM generation compressed what might have been months of solo exploration into a much shorter timeline.
5. Human writes the final proof. The published result is a standard mathematical proof that stands independent of the process that generated it.
This workflow maps almost exactly to how productive developers use Copilot or Claude for complex engineering tasks: you need to know what good looks like, but the AI helps you get there faster. The people who dismiss this as "the AI did it" misunderstand the process. The people who dismiss it as "the human did everything" ignore the counterfactual — this particular human likely would not have solved this particular problem without the tool.
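For readers who think in code, here is a minimal sketch of that loop. It is an illustration under stated assumptions, not the solver's actual tooling: `llm_suggest` is a hypothetical stand-in for whatever chat-completion client you use, and `human_filter` is where mathematical judgment enters.

```python
# Minimal sketch of the explore-filter-iterate loop described above.
# llm_suggest is a placeholder, not a real API; the point is the division
# of labor: the model supplies breadth, the human supplies judgment.

from dataclasses import dataclass


@dataclass
class Candidate:
    idea: str            # a proof strategy or structural idea proposed by the model
    keep: bool = False   # set by the human after checking the intermediate steps


def llm_suggest(problem: str, context: list[str], n: int = 5) -> list[str]:
    """Placeholder for a chat-completion call returning n candidate directions."""
    raise NotImplementedError("wire this to whatever LLM client you use")


def human_filter(candidates: list[Candidate]) -> list[Candidate]:
    """The 'vibe' step: the human marks which directions feel structurally right."""
    for c in candidates:
        c.keep = input(f"Pursue '{c.idea[:60]}'? [y/N] ").strip().lower() == "y"
    return [c for c in candidates if c.keep]


def explore(problem: str, rounds: int = 10) -> list[str]:
    context: list[str] = []   # partial arguments carried between rounds
    for _ in range(rounds):
        ideas = llm_suggest(problem, context)                   # step 2: LLM breadth
        keepers = human_filter([Candidate(i) for i in ideas])   # step 3: human taste
        if keepers:
            context.extend(c.idea for c in keepers)             # step 4: iterate on what survived
        else:
            context.append("Previous directions were dead ends; propose a different angle.")
    return context   # step 5, writing the actual proof, stays entirely human
```

The sketch makes the claim above concrete: the model only touches steps 2 through 4, and everything load-bearing sits on the human side of the loop.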
If you're building AI-assisted developer tools, this is a case study worth examining closely. The key architectural insight is that the LLM's value wasn't in producing a correct final output; it was in compressing the exploration phase. The solver explored more dead ends faster, which meant finding the right path sooner.
This has direct implications for how we design AI coding tools. The current generation of AI assistants is optimized for code generation — give me a function that does X. But the vibe maths result suggests the higher-value application might be exploration assistance: help me understand which architectural approach is worth pursuing before I commit to building it. Help me enumerate the failure modes I haven't considered. Help me find the relevant prior art I don't know to search for.
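As a rough illustration of that shift, here is a hedged sketch of what an exploration-first request could look like as a tool primitive. The prompt wording and the `complete()` placeholder are assumptions for illustration, not any existing product's API.

```python
# Sketch of an "exploration assistance" primitive: ask for ranked approaches,
# failure modes, and prior art instead of code. The prompt shape and the
# complete() placeholder are assumptions, not any vendor's API.

EXPLORATION_PROMPT = """\
You are helping me decide, not build.
Problem: {problem}
Constraints: {constraints}

1. List three to five candidate approaches, one line each.
2. For each, name the main failure mode I should check before committing.
3. Point me to prior art or standard techniques I may not know to search for.
Do not write implementation code."""


def complete(prompt: str) -> str:
    """Placeholder for whatever chat-completion client the team already uses."""
    raise NotImplementedError


def explore_decision(problem: str, constraints: str) -> str:
    # Returns a survey of the option space; the commit decision stays with the human.
    return complete(EXPLORATION_PROMPT.format(problem=problem, constraints=constraints))
```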
For teams evaluating AI tool ROI: stop measuring lines of code generated and start measuring time-to-good-decision. The amateur mathematician didn't need ChatGPT to write the proof — they needed it to figure out which proof to write.
There's also a talent-market implication. If domain amateurs with AI tools can now compete with specialists on well-defined open problems, the moat around deep specialization narrows. This doesn't mean expertise becomes worthless — the solver still needed substantial mathematical knowledge to frame the problem and evaluate outputs. But it does mean the minimum viable expertise for serious contributions is dropping.
We're early in understanding what "AI-assisted research" actually looks like in practice, as opposed to the breathless predictions and dismissive counter-takes that dominate the discourse. This Erdős result is a single data point, but it's a striking one. The next question isn't whether LLMs can help amateurs solve hard problems — we now have proof they can. The question is whether this scales: can the vibe maths approach work on problems where verification is harder than generation, where you can't easily check if the AI's suggestions led you astray? In mathematics, a proof is a proof. In engineering, a system that works in testing can still fail in production. The gap between "LLM helped me find an answer" and "LLM helped me find the *right* answer" remains the central unsolved problem of AI-assisted work.
https://www.erdosproblems.com/1196
→ read on Hacker News

Here is the chat: don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. Provide a full unconditional proof or disproof of the problem. {{problem}} REMEMBER - this uncondit…
For the uninitiated, Paul Erdős was a pretty famous but very eccentric mathematician who lived for most of the 1900s. He had a habit of seeking out and documenting mathematical problems people were working on. The problems range in difficulty from "easy homework for a current undergrad in math"…
Buried pretty deep in the article: “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says. But now he and Tao have shortened the proof so that it better distills the LLM’s key i…
It seems like a lot of scientific advancements occurred by someone applying technique X from one field to problem Y in another. I feel like LLMs are much better at making these types of connections than humans because they 1) know about many more theories/approaches than a single human can 2) do…
https://archive.ph/2w4fi