The context window is the maximum number of tokens a language model can 'see' in a single pass: the system prompt, chat history, user message, and the model's own response all share the same budget. Early GPT models had only 2K-4K tokens; today Claude offers 200K and Gemini ships variants with 1M, making context size a major axis of competition. Larger windows let you work with longer documents, but the quadratic cost of self-attention means latency and price both climb fast. In the long-context era, 'how much context can we actually use well?' is a far more interesting engineering question than the raw window number.
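The shared-budget idea can be sketched in a few lines. This is a minimal illustration, not any provider's API: `count_tokens` is a hypothetical stand-in for a real tokenizer (a crude whitespace split), and the eviction policy (drop oldest turns first) is just one common choice.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace chunk."""
    return len(text.split())

def fit_to_window(system_prompt, history, user_message,
                  window=200_000, reserve_for_response=10):
    """Drop the oldest history turns until everything fits in the window.

    The budget is shared by the system prompt, the chat history, the new
    user message, AND the tokens reserved for the model's own reply.
    """
    fixed = count_tokens(system_prompt) + count_tokens(user_message)
    budget = window - reserve_for_response - fixed
    kept = list(history)
    while kept and sum(count_tokens(t) for t in kept) > budget:
        kept.pop(0)  # evict the oldest turn first
    return kept

history = ["turn one " * 10, "turn two " * 10, "newest turn " * 10]
kept = fit_to_window("You are helpful.", history, "Summarize.", window=50)
print(len(kept))  # only the newest turn survives a tiny 50-token window
```

Real chat stacks do the same accounting with the model's actual tokenizer, and often summarize evicted turns instead of discarding them outright.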
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Beginner · 2018
Context Window
The maximum number of tokens a language model can process in a single forward pass.
- EN — Context Window
- TR — Bağlam Penceresi