Pairwise comparison is an evaluation method that places two models' (or two versions') answers to the same prompt side by side and asks which is better. It is generally more reliable than absolute scoring because both human raters and LLM-as-Judge setups tend to be more consistent when making relative judgments than when assigning absolute scores. Chatbot Arena scaled this idea: users rate random, blind pairings, and the votes are aggregated into an Elo Rating leaderboard. On production teams, it is commonly the engine behind A/B tests and version-promotion decisions.
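To make the aggregation step concrete, here is a minimal sketch of how a stream of pairwise votes could be turned into Elo-style ratings. It uses the standard Elo update formula. The starting rating of 1000, the K-factor of 32, the model names, and the helper names (`expected_score`, `update_elo`, `rate`) are all illustrative assumptions, and this is not a reproduction of Chatbot Arena's actual pipeline.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation: probability that A beats B given current ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, model_a: str, model_b: str, outcome: float, k: float = 32.0) -> None:
    """Apply one pairwise vote in place.

    outcome is from model A's point of view: 1.0 = A preferred,
    0.0 = B preferred, 0.5 = tie. k=32 is an assumed K-factor.
    """
    e_a = expected_score(ratings[model_a], ratings[model_b])
    delta = k * (outcome - e_a)
    ratings[model_a] += delta
    ratings[model_b] -= delta  # zero-sum: B loses exactly what A gains


def rate(votes, initial: float = 1000.0, k: float = 32.0) -> dict:
    """Fold a sequence of (model_a, model_b, outcome) votes into ratings."""
    ratings = defaultdict(lambda: initial)  # assumed starting rating
    for model_a, model_b, outcome in votes:
        update_elo(ratings, model_a, model_b, outcome, k)
    return dict(ratings)


# Hypothetical votes: (left model, right model, outcome for the left model).
votes = [
    ("model_v1", "model_v2", 0.0),  # rater preferred model_v2
    ("model_v2", "model_v1", 1.0),  # rater preferred model_v2 again
    ("model_v1", "model_v2", 0.5),  # tie
]
ratings = rate(votes)
leaderboard = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
```

Because each vote is applied sequentially, the final ratings depend on the order of the votes. With few comparisons the gap between two models is noisy, so a version-promotion decision would typically rest on many votes rather than a handful.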
MEVZU · N°124 · ISTANBUL · YEAR I, VOL. III
Glossary · Beginner · 2023
Pairwise Comparison
An eval method that asks which of two models' answers to the same prompt is better.
- EN (English term): Pairwise Comparison
- TR (Turkish term): İkili Karşılaştırma