Prompt caching is a server-side feature that stores and reuses the KV cache for a long recurring prefix — a system prompt, a document set, few-shot examples — when many requests share it. Anthropic launched "Prompt Caching" in August 2024, OpenAI enabled automatic prompt caching in October 2024, and Google offers "context caching" for Gemini. Cost reductions of 50–90% on cached tokens and TTFT improvements of up to 50% are typical. It has become a near-mandatory optimization for RAG apps over long documents, multi-turn agent flows, and any product with a large system prompt.
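The core idea can be sketched in a few lines: key the expensive prefix computation (the KV cache built during prefill) by a hash of the shared prefix, so repeated requests that begin with the same system prompt skip recomputing it. This is a toy illustration of the mechanism, not any provider's actual implementation or API; all names here (`handle_request`, `kv_store`) are hypothetical.

```python
import hashlib

kv_store: dict[str, str] = {}          # prefix hash -> simulated KV cache
stats = {"hits": 0, "misses": 0}

def handle_request(system_prompt: str, user_turn: str) -> str:
    # The cache key is derived only from the recurring prefix,
    # so any request sharing that prefix can reuse the entry.
    key = hashlib.sha256(system_prompt.encode()).hexdigest()
    if key in kv_store:
        stats["hits"] += 1             # prefix already processed: reuse KV
    else:
        stats["misses"] += 1           # first request: "prefill" the prefix
        kv_store[key] = f"kv({len(system_prompt)} chars)"
    return f"answer to {user_turn!r} using {kv_store[key]}"

SYSTEM = "You are a helpful assistant. " * 100   # large recurring prefix
handle_request(SYSTEM, "first question")         # miss: full prefill cost
handle_request(SYSTEM, "second question")        # hit: prefix cost skipped
```

In real deployments the cached value is the transformer's per-token key/value tensors rather than a string, and providers add details this sketch omits: minimum cacheable prefix lengths, short TTLs, and (in Anthropic's case) explicit `cache_control` markers on the prompt blocks to cache.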
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Intermediate · 2024
Prompt Caching
A feature that caches large recurring prompt prefixes for major cost and latency savings.
- EN (English term) — Prompt Caching
- TR (Turkish term) — Prompt Önbellekleme