GSM8K is a benchmark of about 8,500 grade-school math word problems released by OpenAI in 2021, designed to measure step-by-step reasoning. Each problem takes 2 to 8 steps to solve, which made it a classic showcase of how Chain-of-Thought prompting boosts performance. The climb from ~35% on GPT-3 to 90%+ on GPT-4 and later models captured the rapid rise in reasoning capability. As models saturated the benchmark, harder math benchmarks such as MATH and AIME took the spotlight.
MEVZU · N°124 · ISTANBUL · YEAR I, VOL. III
Glossary · Intermediate · 2021
GSM8K
A benchmark that measures step-by-step reasoning with grade-school math problems.
- EN (English term): GSM8K
- TR (Turkish term): GSM8K
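
To make the step-by-step idea concrete, here is a minimal Python sketch of how a GSM8K-style answer could be scored. It relies on one fact about the dataset, that reference solutions end with a `#### <number>` line, and on one assumption, that the model is prompted to end its output the same way. The helper names `extract_final_answer` and `accuracy` and the sample problem are illustrative only; they do not come from the dataset or from any official evaluation script.

```python
import re
from typing import Optional, Sequence

# GSM8K reference solutions end with a line such as "#### 6".
# Assumption: model outputs are prompted to finish in the same format.
ANSWER_RE = re.compile(r"####\s*(-?[\d,]+(?:\.\d+)?)")


def extract_final_answer(text: str) -> Optional[str]:
    """Return the number after the last '####' marker, or None if absent."""
    matches = ANSWER_RE.findall(text)
    if not matches:
        return None
    # Drop thousands separators so "1,234" and "1234" compare equal.
    return matches[-1].replace(",", "")


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Exact-match accuracy on the final answers (hypothetical helper)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        gold = extract_final_answer(ref)
        guess = extract_final_answer(pred)
        # String comparison: "6" and "6.0" would count as different here.
        if gold is not None and guess == gold:
            correct += 1
    return correct / len(references)


# Illustrative two-step problem (made up, not taken from the dataset):
# "A baker fills 12 trays with 8 rolls each and sells 90. How many are left?"
reference = "12 * 8 = 96 rolls baked.\n96 - 90 = 6 rolls left.\n#### 6"
chain_of_thought = "First, 12 trays * 8 rolls = 96. Then 96 - 90 = 6.\n#### 6"
wrong_guess = "The baker has 96 rolls.\n#### 96"

print(accuracy([chain_of_thought, wrong_guess], [reference, reference]))
```

Only the final number is compared, so a correct answer reached through flawed intermediate steps still counts as correct under this sketch.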