Prompt injection, named in Simon Willison's September 2022 post, is the attack class where an adversary uses untrusted text — a comment, a web page, an email — to tell the model "ignore previous instructions and do this instead." It comes in two variants: direct, where the adversarial instructions sit in the user's own prompt, and indirect, where the model reads them inside a retrieved document. The indirect variant is especially dangerous in RAG and browser-use systems, because the model processes attacker-controlled content as a matter of course. There is no complete fix yet; practical defenses layer input/output guardrails, careful parser design, least-privilege tools, and human approval steps. OWASP listed it as the #1 risk in its 2023 LLM security top ten.
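As an illustration of one defensive layer, the sketch below shows a pattern-based input guardrail that quarantines retrieved documents containing known override phrases before they reach the model. All names and patterns here are hypothetical, and pattern matching is easy to evade — this is one imperfect layer among several, not a fix.

```python
import re

# Hypothetical, illustrative patterns; real attacks paraphrase freely
# and will slip past a static list like this.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(system|developer) prompt",
]

def flag_injection(untrusted_text: str) -> bool:
    """Return True if the text matches a known injection phrase."""
    lowered = untrusted_text.lower()
    return any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)

def build_prompt(system: str, retrieved_docs: list[str]) -> str:
    """Drop flagged documents and delimit the rest as untrusted context."""
    safe = [d for d in retrieved_docs if not flag_injection(d)]
    # Delimiters help the model (imperfectly) separate data from instructions
    context = "\n".join(f"<untrusted>{d}</untrusted>" for d in safe)
    return f"{system}\n\nContext:\n{context}"
```

In a fuller system this check would sit alongside least-privilege tool permissions and a human approval step for sensitive actions, since any single filter can be bypassed.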
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Intermediate · 2022
Prompt Injection
An attack where an adversary tries to override an LLM's instructions via untrusted external text.
- EN (English term): Prompt Injection
- TR (Turkish term): Prompt Enjeksiyonu