Prompt injection, named in Simon Willison's September 2022 post, is the attack class where an adversary uses untrusted text — a comment, a web page, an email — to tell the model "ignore previous instructions and do this instead." It comes in two variants: direct, where the adversarial instructions sit in the user's own prompt, and indirect, where the model reads them inside a retrieved document. The indirect variant is especially dangerous in RAG and browser-use systems, because the model processes attacker-controlled content as a matter of course. There is no complete fix yet; practical defenses layer input/output guardrails, careful parser design, least-privilege tools, and human approval steps. OWASP listed it as the #1 risk in its 2023 LLM security top ten.
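As an illustration of one defensive layer, the sketch below shows a pattern-based input guardrail that quarantines retrieved documents containing known override phrases before they reach the model. All names and patterns here are hypothetical, and pattern matching is easy to evade — this is one imperfect layer among several, not a fix.

```python
import re

# Hypothetical, illustrative patterns; real attacks paraphrase freely
# and will slip past a static list like this.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(system|developer) prompt",
]

def flag_injection(untrusted_text: str) -> bool:
    """Return True if the text matches a known injection phrase."""
    lowered = untrusted_text.lower()
    return any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)

def build_prompt(system: str, retrieved_docs: list[str]) -> str:
    """Drop flagged documents and delimit the rest as untrusted context."""
    safe = [d for d in retrieved_docs if not flag_injection(d)]
    # Delimiters help the model (imperfectly) separate data from instructions
    context = "\n".join(f"<untrusted>{d}</untrusted>" for d in safe)
    return f"{system}\n\nContext:\n{context}"
```

In a fuller system this check would sit alongside least-privilege tool permissions and a human approval step for sensitive actions, since any single filter can be bypassed.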
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Intermediate · 2022
Prompt Injection
An attack where an adversary tries to override an LLM's instructions via untrusted external text.
- EN (English term): Prompt Injection
- TR (Turkish term): Prompt Enjeksiyonu