Can AI Think? The "Human-Like" Claim About the Centaur Model Comes Under Fire

The question of whether AI can think is hot again. A new study quantifies how much of the "human-like" performance reported in LLM evaluations reflects genuine comprehension versus pure pattern matching.

Centaur's Claim

Centaur, a model built on standard LLMs and fine-tuned on data from psychological experiments, reported strong results on tasks involving decision-making and executive control. Nature presented it as a success story in July 2025.

The New Study's Findings

According to a study by Zhejiang University researchers published in National Science Open, much of Centaur's performance may stem from overfitting: the model appears to recognise patterns in its training data and reproduce the expected answers rather than understand the tasks.

The Striking Test

The researchers replaced the original multiple-choice prompts with the instruction "please pick option A." Centaur kept selecting the "correct answers" from the original dataset instead of option A, which suggests that it does not interpret the meaning of the questions.
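
The logic of this test can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the `Trial` structure, the `ask_model` callable, the toy trials, and both stub models are hypothetical. The sketch also assumes that each trial splits into an instruction and the remaining trial content, and that only the instruction is swapped.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trial:
    instruction: str      # task instruction shown to the model (hypothetical format)
    context: str          # remaining trial content: options, stimuli
    original_answer: str  # label recorded in the original dataset


def counterfactual_rates(
    trials: List[Trial],
    ask_model: Callable[[str], str],
    new_instruction: str = "Please pick option A.",
    instructed_answer: str = "A",
) -> Dict[str, float]:
    """Swap each trial's instruction and tally how the model responds."""
    followed = 0   # model obeyed the new instruction
    memorised = 0  # model still gave the dataset's original answer
    scored = 0
    for trial in trials:
        # Trials whose original answer is already "A" cannot tell the two apart.
        if trial.original_answer == instructed_answer:
            continue
        scored += 1
        reply = ask_model(f"{new_instruction}\n{trial.context}").strip().upper()
        if reply == instructed_answer:
            followed += 1
        elif reply == trial.original_answer:
            memorised += 1
    if scored == 0:
        return {"followed": 0.0, "memorised": 0.0}
    return {"followed": followed / scored, "memorised": memorised / scored}


# Toy trials, invented for illustration only.
TRIALS = [
    Trial("Choose the better gamble.", "A: 10% of $100  B: 90% of $20", "B"),
    Trial("Choose the better gamble.", "A: 50% of $10  B: 50% of $40", "B"),
    Trial("Choose the larger number.", "A: 3  B: 7  C: 9", "C"),
]
_LOOKUP = {t.context: t.original_answer for t in TRIALS}


def memorising_stub(prompt: str) -> str:
    """Stand-in for an overfitted model: ignores the instruction line."""
    context = prompt.split("\n", 1)[1]
    return _LOOKUP[context]


def obedient_stub(prompt: str) -> str:
    """Stand-in for a model that follows the new instruction."""
    return "A"


MEMORISED = counterfactual_rates(TRIALS, memorising_stub)
OBEDIENT = counterfactual_rates(TRIALS, obedient_stub)
```

A high `memorised` rate alongside a low `followed` rate is the pattern the article attributes to Centaur. In real use, `ask_model` would wrap a call to the model under test.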

Implications for AI Evaluation

Care is required when assessing LLM capabilities. The "black box" nature of these models hides how outputs are generated, so the risk of hallucination and misinterpretation is high. Evaluation suites should include out-of-distribution and counterfactual trials, not only headline benchmarks.
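
One way to act on this recommendation is to report each extra split next to the headline score, so that any drop is visible. The following is a rough sketch under stated assumptions: the split names, the toy examples, and `surface_stub` (a stand-in model that answers from a single surface cue) are all invented for illustration and do not come from the study.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (prompt, expected answer)


def accuracy(examples: List[Example], ask_model: Callable[[str], str]) -> float:
    """Fraction of examples where the model's reply equals the expected answer."""
    if not examples:
        raise ValueError("cannot score an empty split")
    hits = sum(ask_model(prompt).strip() == expected for prompt, expected in examples)
    return hits / len(examples)


def gap_report(
    splits: Dict[str, List[Example]],
    ask_model: Callable[[str], str],
    baseline: str = "headline",
) -> Dict[str, Dict[str, float]]:
    """Score every split and report its drop relative to the baseline split."""
    scores = {name: accuracy(examples, ask_model) for name, examples in splits.items()}
    return {
        name: {"accuracy": score, "drop": scores[baseline] - score}
        for name, score in scores.items()
    }


def surface_stub(prompt: str) -> str:
    """Hypothetical model that keys on the word 'safe' and nothing else."""
    return "A" if "safe" in prompt else "B"


# Toy splits, invented for illustration only.
SPLITS = {
    "headline": [
        ("Pick the safe option. A: sure $5  B: 50% of $10", "A"),
        ("Pick the risky option. A: sure $5  B: 50% of $10", "B"),
    ],
    # Same tasks, reworded so the memorised cue is absent.
    "out_of_distribution": [
        ("Pick the certain payoff. A: sure $5  B: 50% of $10", "A"),
        ("Pick the gamble. A: sure $5  B: 50% of $10", "B"),
    ],
    # Cue still present, but the correct answer has changed.
    "counterfactual": [
        ("Do not pick the safe option. A: sure $5  B: 50% of $10", "B"),
        ("Pick the safe option. A: 50% of $10  B: sure $5", "B"),
    ],
}

REPORT = gap_report(SPLITS, surface_stub)
```

In this toy setup the stub scores perfectly on the headline split and loses accuracy on the other two. That gap is what a headline benchmark alone would hide.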

Cite this: AI Mevzuları · April 28, 2026 · aimevzulari.com/haberler/yapay-zeka-dusunebiliyor-mu-centaur-arastirma-supheli