Wink Pings

Can Large Models Really Handle 'PhD-Level' Reasoning? New Study Pours Cold Water

A new study tested AI programming capabilities with PhD-level dynamic programming problems, yielding less than 1% accuracy. Human innovation + AI execution might be the optimal solution.

Some keep boasting about large models having 'PhD-level reasoning,' but the latest study 'FormulaOne' directly throws a set of PhD-level dynamic programming problems at them—AI coding tools achieved under 1% accuracy.

Paper link: https://alphaxiv.org/pdf/2507.13337

This outcome isn’t surprising. Current AI excels at pattern recombination: stitching together solutions seen in training data. But when faced with truly hardcore problems requiring deep algorithmic innovation, it’s like asking a high schooler to write a PhD thesis.

The comment section is gold:

- One joked, 'Witness the birth of a new benchmark' (Wake up, sweetie, new KPI just dropped)

- Others questioned methodological rigor (reminiscent of Apple’s controversial paper)

- The sharpest take came from @astealoth: 'PhD education has long been corrupted by corporate standardization. AI saw through the charade and low-key got mad'

The study’s conclusion is pragmatic: Humans handle groundbreaking innovation; AI manages execution-layer chaos—that’s the golden combo. Expecting AI to replace the most complex parts of human cognition is like waiting for a calculator to spontaneously prove the Riemann hypothesis.

(Note: Dynamic programming is a core technique in algorithm design that solves a problem by breaking it into overlapping subproblems and reusing their solutions, commonly applied to tasks like robot path planning and gene sequence alignment.)
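To make the note concrete, here is the textbook dynamic-programming computation of edit distance between two sequences, a simplified relative of the alignment algorithms used in bioinformatics. This is a standard illustrative sketch, not code from the study:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string `a` into string `b` (Levenshtein distance)."""
    m, n = len(a), len(b)
    # dp[i][j] = edit distance between the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # turn a[:i] into "" by deleting i characters
    for j in range(n + 1):
        dp[0][j] = j          # turn "" into b[:j] by inserting j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete a[i-1]
                dp[i][j - 1] + 1,         # insert b[j-1]
                dp[i - 1][j - 1] + cost,  # match or substitute
            )
    return dp[m][n]

print(edit_distance("GATTACA", "GCATGCU"))  # DNA-like strings, prints 4
```

Each table entry is filled once from three already-computed neighbors, which is exactly the overlapping-subproblem reuse that defines dynamic programming. Of course, problems at this level are routine; the FormulaOne tasks demand far deeper algorithmic invention.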

Published: 2025-09-06 01:19