MBPP | AI Mevzuları

MBPP (Mostly Basic Python Problems) is a coding benchmark introduced by Google researchers in Austin et al. (2021). It contains 974 crowd-sourced basic Python tasks. Each item ships with a natural-language description, a reference solution, and unit tests, so the benchmark measures how well a model translates natural language into working code. It is positioned as a complement to HumanEval: HumanEval is smaller, at 164 hand-written problems, while MBPP is broader and more routine. Modern models have largely saturated MBPP at 90%+ pass@1, but it still appears in reports as a quick sanity check.
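
To make the item structure and the pass@1 figure concrete, here is a minimal sketch of how a single MBPP-style task might be scored. The item is invented for illustration and is not taken from the dataset. The field names `text`, `code`, and `test_list` are assumed to mirror the commonly distributed release, and `passes_tests` is a hypothetical helper. The `pass_at_k` function follows the unbiased estimator described in the HumanEval paper (Chen et al., 2021), which is how pass@k numbers are usually reported.

```python
from math import comb

# Illustrative item in the MBPP style (invented, not copied from the dataset).
# Field names are an assumption about the usual release format.
item = {
    "text": "Write a function to return the sum of squares of a list of numbers.",
    "code": "def sum_squares(nums):\n    return sum(n * n for n in nums)",
    "test_list": [
        "assert sum_squares([1, 2, 3]) == 14",
        "assert sum_squares([]) == 0",
    ],
}


def passes_tests(candidate_source, tests):
    """Return True if the candidate runs and every assert statement holds.

    Hypothetical helper. exec() on model output is unsafe and has no timeout;
    a real harness would run candidates in a sandboxed subprocess.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)  # define the candidate function
        for test in tests:
            exec(test, namespace)  # each test is a single assert statement
    except Exception:
        return False
    return True


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Two hypothetical model samples for the same task: one correct, one buggy.
samples = [
    "def sum_squares(nums):\n    return sum(n ** 2 for n in nums)",
    "def sum_squares(nums):\n    return sum(nums) ** 2",
]

results = [passes_tests(s, item["test_list"]) for s in samples]
score = pass_at_k(n=len(results), c=sum(results), k=1)
print(results, score)
```

For a single task with k = 1, the estimate reduces to the fraction of samples that pass. A benchmark-level pass@1 is then the mean of this per-task value over all tasks.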