Latency is the time between sending a request and receiving a result: a metric inherited from web and distributed-systems engineering, and one of the most product-critical numbers in any LLM stack. In an LLM context latency isn't a single number: time to first token (TTFT) tracks how quickly output starts, "tail latency" tracks the slowest requests (the worst 1%, i.e. p99), and total completion time measures the end-to-end response. At equal throughput, guaranteeing low tail latency is dramatically more expensive than improving the average, which is why product requirements are usually expressed in percentiles such as "p95 < 2 seconds". Streaming UIs can't hide latency, but they reshape the user's perception of waiting, which is why they have become near-standard in modern LLM products.
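As a rough illustration of these three numbers, the sketch below measures TTFT and total completion time per request and then reports the percentiles a product requirement would target. The `fake_stream` generator and its timings are stand-in assumptions, not a real client API; any iterable that yields tokens would work in its place.

```python
import random
import time

def measure(stream):
    """Measure time-to-first-token (TTFT) and total completion time
    for one request, given any iterable that yields tokens."""
    start = time.perf_counter()
    ttft = None
    for _ in stream:
        if ttft is None:
            ttft = time.perf_counter() - start  # first token arrived
    return ttft, time.perf_counter() - start    # (TTFT, end-to-end)

def percentile(samples, p):
    """Nearest-rank percentile, e.g. percentile(latencies, 95) for p95."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

def fake_stream(n_tokens=50):
    """Hypothetical stand-in for a streaming client; yields tokens
    with simulated per-token jitter."""
    for i in range(n_tokens):
        time.sleep(random.uniform(0.0005, 0.002))
        yield f"tok{i}"

if __name__ == "__main__":
    totals = []
    for _ in range(100):
        ttft, total = measure(fake_stream())
        totals.append(total)
    print(f"p50 = {percentile(totals, 50):.3f}s")
    print(f"p95 = {percentile(totals, 95):.3f}s")  # a typical SLO target
    print(f"p99 = {percentile(totals, 99):.3f}s  (tail latency)")
```

Note that the p95 and p99 figures come from the slowest handful of requests, which is exactly why holding them down costs more capacity than holding down the median.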
Glossary · Beginner · 2000
Latency
The time between issuing a request and receiving a result.
- EN (English term) — Latency
- TR (Turkish term) — Gecikme