Can Large Models Really Handle 'PhD-Level' Reasoning? New Study Pours Cold Water
A new study tested AI programming ability with PhD-level dynamic programming problems, and AI coding tools solved less than 1% of them. Human innovation + AI execution may be the optimal pairing.
Some keep boasting that large models have 'PhD-level reasoning,' but the latest study, 'FormulaOne,' throws a set of PhD-level dynamic programming problems at them, and AI coding tools achieve under 1% accuracy.
Paper link: https://alphaxiv.org/pdf/2507.13337
This outcome isn’t surprising. Current AI excels at pattern recombination: stitching together solutions seen in training data. But when faced with truly hardcore problems requiring deep algorithmic innovation, it’s like asking a high schooler to write a PhD thesis.
The comment section is gold:
- One commenter joked, 'Witness the birth of a new benchmark' (wake up babe, a new KPI just dropped)
- Others questioned the study's methodological rigor (reminiscent of Apple's controversial reasoning paper)
- The sharpest take came from @astealoth: 'PhD education has long been corrupted by corporate standardization. AI saw through the charade and low-key got mad.'
The study's conclusion is pragmatic: humans handle the groundbreaking innovation, AI handles the messy execution work, and that is the winning combination. Expecting AI to replace the most complex parts of human cognition is like waiting for a calculator to spontaneously prove the Riemann hypothesis.
(Note: Dynamic programming is a technique in algorithm design that solves a complex problem by breaking it into overlapping subproblems and reusing their solutions; it is widely used for optimizing decisions in complex systems, such as robot path planning and gene sequence alignment.)
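For readers who want to see what the note means in practice, here is a minimal sketch of dynamic programming applied to gene sequence alignment, using the classic Needleman-Wunsch recurrence. The function name and scoring values are illustrative choices, not anything from the FormulaOne paper, whose problems are far harder than this textbook case.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score via dynamic programming (Needleman-Wunsch).

    dp[i][j] holds the best score for aligning a[:i] with b[:j]; each
    cell is derived from three smaller, already-solved subproblems,
    which is the defining trick of dynamic programming.
    """
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * gap  # a[:i] aligned against nothing costs i gaps
    for j in range(1, n + 1):
        dp[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag,                # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,  # gap in b
                           dp[i][j - 1] + gap)  # gap in a
    return dp[m][n]


print(align_score("GATTACA", "GCATGCU"))  # -> 0 with these toy scores
```

The table has (m+1) x (n+1) cells and each one is filled in constant time, so the whole computation runs in O(mn). The hard part of problems like FormulaOne's is not filling in such a table but inventing the right state space in the first place, which is exactly the kind of algorithmic innovation the article argues still belongs to humans.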
Published: 2025-09-06 01:19