TTFT (time to first token) is the latency between sending a request and receiving the very first generated token, and it is the most important factor in how responsive a chat feels. Most of it is spent in the prefill phase, in which the model processes the whole prompt in parallel and writes its keys and values into the KV cache before it can emit the first token; long contexts, cold starts and heavy traffic all push it up. However high your TPS (tokens per second during generation) is, a slow TTFT makes the system feel sluggish. This is why serving stacks such as vLLM rely on PagedAttention, prompt (prefix) caching and continuous batching: PagedAttention and continuous batching keep more requests in flight so new ones wait less in the queue, and prefix caching lets a repeated prompt prefix skip prefill. Streaming UIs can't hide a poor TTFT, but they do let users feel the benefits of high TPS more directly.
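To make the TTFT/TPS distinction concrete, here is a minimal Python sketch that times a token stream. `fake_token_stream` and `measure_stream` are hypothetical helpers written for illustration, not part of vLLM or any other library; the fake stream simply sleeps to imitate prefill and decode. With a real streaming client you would pass a function that sends the request and returns its token iterator, so that the measured TTFT also covers network and queueing time.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def fake_token_stream(prefill_s: float, per_token_s: float, n_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM endpoint (not a real API)."""
    time.sleep(prefill_s)            # simulated prefill before the first token
    for i in range(n_tokens):
        if i > 0:
            time.sleep(per_token_s)  # simulated decode step for each later token
        yield f"tok{i}"


def measure_stream(make_stream: Callable[[], Iterable[str]]) -> Tuple[float, float, int]:
    """Return (ttft_s, decode_tps, n_tokens) for one streamed response.

    The clock starts before `make_stream` is called (i.e. before the request
    is sent), so TTFT includes any queueing and prefill time. Decode TPS counts
    the tokens after the first one over the time elapsed since the first one.
    """
    start = time.perf_counter()
    first_at = None
    last_at = None
    count = 0
    for _ in make_stream():
        now = time.perf_counter()
        if first_at is None:
            first_at = now           # arrival time of the first token
        last_at = now                # arrival time of the latest token
        count += 1
    if first_at is None or last_at is None:
        raise ValueError("stream produced no tokens")
    ttft = first_at - start
    decode_s = last_at - first_at
    tps = (count - 1) / decode_s if count > 1 and decode_s > 0 else 0.0
    return ttft, tps, count


if __name__ == "__main__":
    ttft, tps, n = measure_stream(lambda: fake_token_stream(0.2, 0.01, 20))
    print(f"TTFT={ttft:.3f}s  decode~{tps:.0f} tok/s  tokens={n}")
```

Keeping the two numbers separate, rather than reporting a single end-to-end latency, makes it easier to see whether a change affected prefill or decode.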
MEVZU N°124 · ISTANBUL · YEAR I · VOL. III
Glossary · Intermediate · 2023
Time to First Token (TTFT)
The time between sending a request and receiving the first generated token.
- EN (English term): Time to First Token (TTFT)
- TR (Turkish term): İlk Token Süresi (TTFT)