A token is the smallest unit a Transformer model uses while processing text — it can be a full word, a sub-word fragment, or a single character. Modern LLMs typically rely on sub-word tokenisation algorithms such as BPE (Byte-Pair Encoding) or SentencePiece, which let them represent even rare and unseen words by splitting them into meaningful pieces. Each token maps to an embedding vector learned during training; the model's knowledge effectively lives in this vector space. Because everything from context-window limits to API pricing is measured in tokens, understanding tokenisation is a foundational skill in LLM engineering.
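The sub-word idea can be sketched in a few lines. This is a minimal illustration, not a real BPE implementation: the toy vocabulary and the greedy longest-match rule below are assumptions for demonstration, whereas production tokenisers learn their merge rules from corpus statistics.

```python
# Toy vocabulary of known sub-word pieces (illustrative, not learned).
VOCAB = {"token", "isation", "un", "seen"}

def tokenise(word, vocab=VOCAB):
    """Split a word into the longest known pieces, scanning left to right."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest possible substring first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary match: fall back to a single character,
            # so even fully unseen words can still be represented.
            pieces.append(word[i])
            i += 1
    return pieces

print(tokenise("tokenisation"))  # → ['token', 'isation']
print(tokenise("unseen"))        # → ['un', 'seen']
```

Even a word absent from the vocabulary as a whole is recovered from known fragments — the property that lets LLMs handle rare and novel words.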
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Beginner · 2017
Token
The smallest unit a language model processes — a word fragment, character, or symbol.
- EN (English term) — Token
- TR (Turkish term) — Token