GPT-5.4 Pro Just Solved an Open Math Problem. Epoch Confirmed It.


Epoch AI has confirmed that OpenAI's GPT-5.4 Pro solved an open problem in Ramsey hypergraph theory — a problem from FrontierMath, the benchmark Epoch specifically designed to be unsolvable by current AI systems.

This is worth pausing on. FrontierMath isn't another MMLU-style multiple choice quiz. It's a curated set of original, unpublished math problems created by working mathematicians, specifically constructed so that correct answers are verifiable but the solution paths require genuine mathematical reasoning. When FrontierMath launched, models scored in the low single digits. The whole point was to create a benchmark with years of headroom.

That headroom just got a lot shorter.

The solved problem involves Ramsey hypergraphs — a domain in extremal combinatorics dealing with the conditions under which order must appear in sufficiently large structures. These aren't textbook exercises. Open problems in Ramsey theory have resisted human mathematicians for decades. The fact that a language model produced a novel, verified solution moves this from 'AI is good at math competitions' to 'AI is generating results that professional mathematicians haven't.'
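For intuition on the flavor of these results, consider the simplest classical case (far easier than the open hypergraph problem the model solved): the Ramsey number R(3,3) = 6, which says every 2-coloring of the edges of a complete graph on 6 vertices contains a monochromatic triangle, while 5 vertices do not suffice. A brute-force check is feasible at this tiny scale — the point of the open problems is precisely that exhaustive search stops working:

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    # coloring: dict mapping each edge (i, j), i < j, to color 0 or 1
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def every_coloring_has_mono_triangle(n):
    # Exhaustively check all 2-colorings of the edges of K_n
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, colors)))
        for colors in product((0, 1), repeat=len(edges))
    )

# R(3,3) = 6: K5 admits a triangle-free 2-coloring, K6 does not.
print(every_coloring_has_mono_triangle(5))  # False
print(every_coloring_has_mono_triangle(6))  # True
```

K6 has only 15 edges, so 2^15 colorings are checkable in seconds; hypergraph Ramsey problems blow this search space up far beyond any brute force, which is why open problems in the area have resisted decades of human effort.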

For the skeptics: Epoch's verification process matters here. This isn't a model self-reporting success on a benchmark it may have seen during training. Epoch uses problems with exact, verifiable numerical answers that aren't available anywhere online. The confirmation means the output was checked against ground truth established by the problem's author.
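This is what makes exact numerical answers attractive for benchmarking: grading reduces to an equality check against withheld ground truth. A toy sketch of that idea — the function name and answer format here are hypothetical, not Epoch's actual harness:

```python
from fractions import Fraction

def verify_submission(model_answer: str, ground_truth: Fraction) -> bool:
    # Parse the model's final answer and compare exactly: no tolerance,
    # no partial credit. Fraction avoids floating-point ambiguity.
    try:
        return Fraction(model_answer) == ground_truth
    except (ValueError, ZeroDivisionError):
        return False

print(verify_submission("22/7", Fraction(22, 7)))     # True
print(verify_submission("3.14159", Fraction(22, 7)))  # False
```

The hard part isn't the check; it's authoring problems whose answers are exact, unpublished, and unreachable without genuine reasoning.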

What does this mean practically? Two things.

First, the benchmark treadmill is accelerating. FrontierMath was supposed to be the benchmark that lasted. Roughly a year after launch, it is already being chipped away. For anyone building evaluation infrastructure, the lesson is clear: static benchmarks have an ever-shrinking shelf life.

Second, the 'reasoning model' trajectory is producing qualitatively different outputs than the 'predict the next token' framing suggests. Whether GPT-5.4 Pro 'understands' Ramsey theory is a philosophical question. Whether it can produce novel, correct mathematical results that extend human knowledge is now an empirical one. The answer is yes.

The HN discussion (170 points) reflects the split you'd expect: mathematicians debating whether the solution pathway constitutes genuine insight or sophisticated pattern-matching, and engineers pointing out that the distinction matters less than the output. Both camps are right, which is exactly what makes this uncomfortable.

For developers building on top of these models: the ceiling on what you can delegate to an LLM just moved up again. If it can solve open combinatorics problems, your 'too complex for AI' assumptions about domain-specific reasoning tasks deserve a fresh audit.

Hacker News 442 pts 643 comments

Epoch confirms GPT-5.4 Pro solved a frontier math open problem

→ read on Hacker News
qnleigh · Hacker News

I am kind of amazed at how many commenters respond to this result by confidently asserting that LLMs will never generate 'truly novel' ideas or problem solutions. > AI is a remixer; it remixes all known ideas together. It won't come up with new ideas > it's not because the mo

virgildotcodes · Hacker News

I don't know why I am still perpetually shocked that the default assumption is that humans are somehow unique. It's this pervasive belief that underlies so much discussion around what it means to be intelligent. The null hypothesis goes out the window. People constantly make comments like

Validark · Hacker News

I have long said I am an AI doubter until AI could print out the answers to hard problems or ones requiring tons of innovation. Assuming this is verified to be correct (not by AI) then I just became a believer. I would like to see a few more AI inventions to know for sure, but wow, it really is a ne

alberth · Hacker News

For those, like me, who find the prompt itself of interest … > A full transcript of the original conversation with GPT-5.4 Pro can be found here [0] and GPT-5.4 Pro's write-up from the end of that transcript can be found here [1]. [0] https://epoch.ai/files/open-problems/gp

johnfn · Hacker News

I like to imagine that the number of consumed tokens before a solution is found is a proxy for how difficult a problem is, and it looks like Opus 4.6 consumed around 250k tokens. That means that a tricky React refactor I did earlier today at work was about half as hard as an open problem in mathemat

