Epoch AI has confirmed that OpenAI's GPT-5.4 Pro solved an open problem in Ramsey hypergraph theory — a result verified through their FrontierMath benchmark, which tracks AI performance on research-level mathematics problems that remain unsolved, or were only recently solved, by human mathematicians.
This is not another benchmark score improvement. This is an AI system producing a novel mathematical result on a problem that professional mathematicians had not yet cracked. The distinction matters: solving known competition problems tests pattern matching against a training distribution. Solving open problems requires generating genuinely new reasoning chains.
FrontierMath, for context, was designed specifically to resist the 'contamination' problem that plagues most AI math benchmarks. Problems are contributed by working mathematicians, held private until solved, and verified through formal proof checking — not pattern-matched against known solution templates. When Epoch launched the benchmark, frontier models solved under 2% of problems. The gap between 'can solve IMO problems' and 'can solve open research questions' was supposed to be measured in years, not months.
The Ramsey hypergraph problem sits in combinatorics: it concerns colorings of hypergraphs, a generalization of the classical Ramsey numbers, whose growth rates have resisted tight bounds for decades. The fact that GPT-5.4 Pro found a valid construction or proof here suggests the model isn't just interpolating from training data; it's performing something closer to mathematical exploration.
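The article doesn't reproduce the actual problem statement, but the classical flavor of Ramsey theory is easy to illustrate. A minimal brute-force sketch (mine, not Epoch's) verifying the textbook fact R(3,3) = 6: every 2-coloring of the edges of K6 contains a monochromatic triangle, while K5 admits a coloring with none. Hypergraph Ramsey problems generalize this from edges (pairs) to k-element subsets, which is where the hard open questions live.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """coloring maps each edge (i, j), i < j, to color 0 or 1."""
    for a, b, c in combinations(range(n), 3):
        if coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]:
            return True
    return False

def every_coloring_has_mono_triangle(n):
    """True iff every 2-coloring of K_n's edges forces a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    for colors in product([0, 1], repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, colors))):
            return False  # found a triangle-free 2-coloring of K_n
    return True

# K_5 escapes (a 5-cycle coloring avoids monochromatic triangles);
# K_6 cannot, so the Ramsey number R(3,3) equals 6.
print(every_coloring_has_mono_triangle(5))  # False
print(every_coloring_has_mono_triangle(6))  # True
```

The exhaustive check over K6 covers 2^15 = 32,768 colorings, which runs in well under a second; the combinatorial explosion beyond toy cases is exactly why research-level Ramsey bounds can't be settled by search.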
The HN discussion (322 points) splits predictably: one camp sees this as the beginning of AI-driven mathematics becoming routine; the other argues a single verified result doesn't constitute a research program and that the hard part — identifying which problems are tractable and why — remains human territory. Both camps are partially right, but the skeptics are losing ground faster than they expected to.
For practitioners, the immediate takeaway isn't about pure math. It's about what this implies for reasoning capability in applied domains. If GPT-5.4 Pro can navigate the search space of open combinatorial problems, the same capability applied to code optimization, formal verification, or constraint satisfaction gets materially more interesting. The models aren't just getting better at conversation — they're getting better at thinking.
The longer-term question: does this accelerate the timeline for AI systems that can meaningfully contribute to research, not just assist with it? Epoch's own AI timeline forecasts may need updating based on their own benchmark results. That's either ironic or inevitable, depending on your priors.
I don't know why I am still perpetually shocked that the default assumption is that humans are somehow unique. It's this pervasive belief that underlies so much discussion around what it means to be intelligent. The null hypothesis goes out the window. People constantly make comments like …
I have long said I am an AI doubter until AI could print out the answers to hard problems or ones requiring tons of innovation. Assuming this is verified to be correct (not by AI), then I just became a believer. I would like to see a few more AI inventions to know for sure, but wow, it really is a ne…
For those, like me, who find the prompt itself of interest:

> A full transcript of the original conversation with GPT-5.4 Pro can be found here [0] and GPT-5.4 Pro's write-up from the end of that transcript can be found here [1].

[0] https://epoch.ai/files/open-problems/gp…
I like to imagine that the number of consumed tokens before a solution is found is a proxy for how difficult a problem is, and it looks like Opus 4.6 consumed around 250k tokens. That means that a tricky React refactor I did earlier today at work was about half as hard as an open problem in mathemat…
I am kind of amazed at how many commenters respond to this result by confidently asserting that LLMs will never generate 'truly novel' ideas or problem solutions.

> AI is a remixer; it remixes all known ideas together. It won't come up with new ideas

> it's not because the mo…