Prompt caching is a server-side feature that stores and reuses the KV cache for a long recurring prefix — a system prompt, a document set, few-shot examples — when many requests share it. Anthropic launched "Prompt Caching" in August 2024, OpenAI enabled automatic prompt caching in October 2024, and Google offers "context caching" for Gemini. Cost reductions of 50–90% on cached tokens and TTFT improvements of up to 50% are typical. It has become a near-mandatory optimization for RAG apps over long documents, multi-turn agent flows, and any product with a large system prompt.
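The core idea can be sketched in a few lines: key the expensive prefix computation (the KV cache built during prefill) by a hash of the shared prefix, so repeated requests that begin with the same system prompt skip recomputing it. This is a toy illustration of the mechanism, not any provider's actual implementation or API; all names here (`handle_request`, `kv_store`) are hypothetical.

```python
import hashlib

kv_store: dict[str, str] = {}          # prefix hash -> simulated KV cache
stats = {"hits": 0, "misses": 0}

def handle_request(system_prompt: str, user_turn: str) -> str:
    # The cache key is derived only from the recurring prefix,
    # so any request sharing that prefix can reuse the entry.
    key = hashlib.sha256(system_prompt.encode()).hexdigest()
    if key in kv_store:
        stats["hits"] += 1             # prefix already processed: reuse KV
    else:
        stats["misses"] += 1           # first request: "prefill" the prefix
        kv_store[key] = f"kv({len(system_prompt)} chars)"
    return f"answer to {user_turn!r} using {kv_store[key]}"

SYSTEM = "You are a helpful assistant. " * 100   # large recurring prefix
handle_request(SYSTEM, "first question")         # miss: full prefill cost
handle_request(SYSTEM, "second question")        # hit: prefix cost skipped
```

In real deployments the cached value is the transformer's per-token key/value tensors rather than a string, and providers add details this sketch omits: minimum cacheable prefix lengths, short TTLs, and (in Anthropic's case) explicit `cache_control` markers on the prompt blocks to cache.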
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Intermediate · 2024
Prompt Caching
A feature that caches large recurring prompt prefixes for major cost and latency savings.
- EN (English term) — Prompt Caching
- TR (Turkish term) — Prompt Önbellekleme