Timothy Gowers, a Fields Medalist, tested ChatGPT 5.5 Pro on open problems in number theory and found it generated original research contributions without human intervention. The model improved an exponential bound to a polynomial one in under an hour, a breakthrough that an MIT researcher verified as involving a "completely original" idea.

The results suggest LLMs now operate at PhD-level mathematical competence. Gowers ran the system on genuine unsolved problems, not toy exercises. The model not only produced valid mathematics but discovered approaches researchers hadn't identified before. This wasn't pattern matching against training data. The improvements were novel enough to constitute publishable research.

The implications are immediate and uncomfortable for academic mathematics. Gowers notes that the bar for what counts as a mathematical contribution has shifted: proving something an LLM cannot solve becomes the new standard for meaningful work. Routine problem-solving, incremental progress on known techniques, and technical improvements on existing bounds now risk obsolescence.

This differs sharply from the AI-and-mathematics hype of recent years. ChatGPT 5.5 Pro didn't just pass standardized tests or solve textbook problems. It engaged with live open problems and contributed genuinely novel solutions. The mathematics was rigorous. The ideas were fresh.

The research timeline matters. Delivering PhD-level output in under an hour with zero human guidance eliminates the time bottleneck that has historically gated mathematical progress. Researchers spent months or years on problems ChatGPT 5.5 Pro addressed in minutes.

Gowers raises a practical question: What remains for mathematicians to do? The answer involves problems so hard or so novel that no LLM has encountered them. The field shifts from problem-solving to problem-generation. Mathematicians become curators of difficulty rather than executors of solutions.

This accelerates existing pressure on mathematics departments and research agendas.