A guardrail is a control layer that keeps an LLM or agent within sanctioned boundaries: input filtering, output validation, allow-listed tools, or topic-scope policies. NVIDIA NeMo Guardrails, Llama Guard, and Anthropic's safety filters are well-known examples. A guardrail typically hardens the application around the model rather than the model itself, defending against prompt injection, data leakage, and incorrect actions caused by hallucination. Reliable production systems do not trust model behavior alone; they layer guardrails and evals as defense in depth.
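As a rough illustration of the layers named above (input filtering, a tool allow-list, and output validation), here is a minimal Python sketch. It is not how NeMo Guardrails, Llama Guard, or any other product works. Every name in it (`INJECTION_PATTERNS`, `ALLOWED_TOOLS`, `guarded_call`, `fake_model`) is hypothetical, and the regex checks are deliberately naive placeholders for real classifiers or policies.

```python
import re
from typing import Callable

# Hypothetical policy values, chosen only for illustration.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]
ALLOWED_TOOLS = {"search_docs", "get_weather"}
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class GuardrailViolation(Exception):
    """Raised when a request or tool call falls outside the policy."""


def check_input(user_text: str) -> str:
    """Input filtering: reject text matching a known injection phrase."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(user_text):
            raise GuardrailViolation("input rejected: possible prompt injection")
    return user_text


def check_tool_call(tool_name: str) -> str:
    """Allow-listed tools: refuse any tool the policy does not name."""
    if tool_name not in ALLOWED_TOOLS:
        raise GuardrailViolation(f"tool not allow-listed: {tool_name}")
    return tool_name


def check_output(model_text: str) -> str:
    """Output validation: redact email-like strings to limit data leakage."""
    return EMAIL_PATTERN.sub("[redacted email]", model_text)


def guarded_call(model: Callable[[str], str], user_text: str) -> str:
    """Wrap a model call so input and output both pass through the checks."""
    return check_output(model(check_input(user_text)))


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; always leaks an email address."""
    return "Contact alice@example.com for details."
```

Calling `guarded_call(fake_model, "hello")` returns the model's text with the address redacted, while a prompt containing "ignore previous instructions" raises `GuardrailViolation` before the model is ever invoked. The checks wrap the model call rather than changing the model, which is the sense in which a guardrail hardens the application around it.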
MEVZU · N°124 · ISTANBUL · YEAR I · VOL. III
Glossary · Intermediate · 2023
Guardrail
A control layer that keeps an LLM or agent within sanctioned behavior boundaries.
- EN (English term): Guardrail
- TR (Turkish term): Korkuluk (Guardrail)