Over-refusal is when a model declines requests that are harmless or clearly legitimate, such as discussing history or helping with fiction. It usually emerges when providers tune safety too aggressively or the RLHF reward model is over-cautious. Anthropic and OpenAI have publicly worked on rebalancing helpfulness against Refusal in successive releases. In practice over-refusal is a real product metric — it directly affects perceived usefulness and user retention.
MEVZU N°124ISTANBULYEAR I — VOL. III
Glossary · Intermediate · 2023
Over-refusal
When a model refuses harmless or reasonable requests it should have answered.
- EN — English term
- Over-refusal
- TR — Turkish term
- Aşırı Reddetme