Red teaming is a concept from the military and cybersecurity worlds; in AI it means systematically probing a model's or product's safety, ethical, and behavioral limits via a team that plays an adversarial user. Anthropic's 2022 red-team paper and OpenAI's GPT-4 system card helped make the practice an industry standard. Typical targets include harmful content generation, Jailbreak paths, Prompt Injection attacks, Bias leaks, and misinformation in sensitive domains. Teams usually combine human experts with automated adversarial tools, and feed results back into Eval suites and Guardrail design.
MEVZU N°124ISTANBULYEAR I — VOL. III
Glossary · Advanced · 2022
Red Teaming
The practice of probing an AI system's limits and weaknesses with adversarial methods.
- EN — English term
- Red Teaming
- TR — Turkish term
- Red Teaming