Wink Pings

Can Large Models Really Handle 'PhD-Level' Reasoning? New Study Pours Cold Water

A new study tested AI programming capabilities with PhD-level dynamic programming problems, yielding less than 1% accuracy. Human innovation + AI execution might be the optimal solution.

Some keep boasting about large models having 'PhD-level reasoning,' but the latest study 'FormulaOne' directly throws a set of PhD-level dynamic programming problems at them—AI coding tools achieved under 1% accuracy.

Paper link: https://alphaxiv.org/pdf/2507.13337

This outcome isn’t surprising. Current AI excels at pattern recombination: stitching together solutions seen in training data. But when faced with truly hardcore problems requiring deep algorithmic innovation, it’s like asking a high schooler to write a PhD thesis.

The comment section is gold:

- One joked, 'Witness the birth of a new benchmark' (Wake up, sweetie, new KPI just dropped)

- Others questioned methodological rigor (reminiscent of Apple’s controversial paper)

- The sharpest take came from @astealoth: 'PhD education has long been corrupted by corporate standardization. AI saw through the charade and low-key got mad'

The study’s conclusion is pragmatic: Humans handle groundbreaking innovation; AI manages execution-layer chaos—that’s the golden combo. Expecting AI to replace the most complex parts of human cognition is like waiting for a calculator to spontaneously prove the Riemann hypothesis.

(Note: Dynamic programming is a core technique in algorithm design that solves a problem by breaking it into overlapping subproblems and reusing their solutions, commonly applied to tasks like robot path planning and gene sequence alignment.)
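To make the note concrete, here is the textbook dynamic-programming computation of edit distance between two sequences, a simplified relative of the alignment algorithms used in bioinformatics. This is a standard illustrative sketch, not code from the study:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string `a` into string `b` (Levenshtein distance)."""
    m, n = len(a), len(b)
    # dp[i][j] = edit distance between the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # turn a[:i] into "" by deleting i characters
    for j in range(n + 1):
        dp[0][j] = j          # turn "" into b[:j] by inserting j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete a[i-1]
                dp[i][j - 1] + 1,         # insert b[j-1]
                dp[i - 1][j - 1] + cost,  # match or substitute
            )
    return dp[m][n]

print(edit_distance("GATTACA", "GCATGCU"))  # DNA-like strings, prints 4
```

Each table entry is filled once from three already-computed neighbors, which is exactly the overlapping-subproblem reuse that defines dynamic programming. Of course, problems at this level are routine; the FormulaOne tasks demand far deeper algorithmic invention.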

Published: 2025-09-06 01:19