MMLU (Massive Multitask Language Understanding), released by Hendrycks et al. in 2021, is a multiple-choice benchmark of roughly 16,000 questions across 57 subjects, from math and law to history and medicine. It became the headline metric for frontier-model comparisons; the climb from roughly 70% at the GPT-3.5 level to 85%+ with GPT-4 and beyond is one of the cleanest illustrations of progress. As models neared saturation, harder variants such as MMLU-Pro (2024) emerged. It remains one of the most-cited references for measuring a model's breadth of general knowledge.
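The benchmark's metric is plain accuracy over multiple-choice items: each question has four options (A–D) and one gold letter. A minimal sketch of that scoring, using made-up example items rather than real dataset contents:

```python
# Minimal sketch of MMLU-style scoring. Each item has a question, four
# options (A-D), and one gold answer letter; the reported score is the
# fraction of items answered correctly. Items below are illustrative,
# not drawn from the actual dataset.

def mmlu_accuracy(predictions, gold):
    """Fraction of items where the predicted letter matches the gold letter."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

items = [
    {"question": "2 + 2 = ?",
     "options": ["3", "4", "5", "6"], "answer": "B"},
    {"question": "Which city is the capital of France?",
     "options": ["Rome", "Madrid", "Paris", "Berlin"], "answer": "C"},
]
preds = ["B", "A"]  # hypothetical model outputs
print(mmlu_accuracy(preds, [it["answer"] for it in items]))  # 0.5
```

Reported scores like "85%+" are this accuracy averaged over all 57 subjects.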
MEVZU · N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Intermediate · 2021
MMLU
A broad multiple-choice benchmark that tests knowledge and reasoning across 57 subjects.
- EN — MMLU
- TR — MMLU