BLEU (Bilingual Evaluation Understudy), introduced by Papineni et al. at IBM in 2002, is a classic machine-translation metric based on n-gram overlap with reference translations. For more than two decades it served as the standard yardstick for translation quality: simple, deterministic, and fast, which made it indispensable in practice. The critique is equally well-known: it measures surface-level lexical overlap, captures meaning poorly, and penalizes paraphrases. Modern evaluation has shifted toward COMET, BERTScore, and LLM-as-Judge approaches.
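The core mechanics can be sketched in a few lines: clipped n-gram precisions combined as a geometric mean, scaled by a brevity penalty. This is an illustrative sketch assuming a single reference, uniform weights over 1- to 4-grams, and no smoothing; it is not the exact Papineni et al. implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    # Assumption: one reference, whitespace tokenization, no smoothing.
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipping: each candidate n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if matches == 0:
            return 0.0  # without smoothing, any zero precision zeroes BLEU
        log_prec_sum += math.log(matches / total)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

Note how a perfect match scores 1.0, while a candidate with no matching 4-gram scores 0.0 outright; this all-or-nothing behavior at the n-gram level is exactly the brittleness toward paraphrases mentioned above.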
MEVZU N°124 · ISTANBUL · Year I, Vol. III
Glossary · Intermediate · 2002
BLEU
A classic machine-translation metric based on n-gram overlap with reference translations.
- EN: BLEU
- TR: BLEU