LLM-as-Judge is an evaluation method in which one LLM scores another model's output, or makes a pairwise comparison between two candidate responses. LMSYS's MT-Bench work (2023) showed that strong judge models can agree with human preferences at rates comparable to human-human agreement, and the technique was widely adopted. Its main advantages are automation, scalability, and relatively low cost. Its typical biases — self-preference, position bias, and length preference — must be controlled, commonly via response position swapping, multiple judges, and rubric calibration.
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Intermediate · 2023
LLM-as-Judge
An evaluation method in which an LLM is used to judge another model's output.
- EN — English term
- LLM-as-Judge
- TR — Turkish term
- Yargıç Olarak LLM