Pre-training is the initial phase in which a language model acquires its general linguistic and world knowledge from large-scale, broadly sourced data. Modern models are trained on terabytes of text, code, and multimodal data, usually with a next-token-prediction (autoregressive) or Masked Language Modeling objective. It is by far the most compute-hungry stage and the largest single investment in most frontier-lab budgets; Scaling Laws research is essentially the science of predicting how capable a model this phase will produce for a given compute and data budget. The 'raw' model that emerges from pre-training does not yet follow instructions, which is why Post-training and Fine-tuning come next.
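To make the next-token-prediction objective concrete, the sketch below computes the autoregressive cross-entropy loss for one toy batch. `TinyLM`, its dimensions, and the random token ids are illustrative placeholders, not any particular lab's setup; a real pre-training run would use a transformer decoder and stream tokenized text from a large corpus.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    """Minimal stand-in for a decoder-only language model: embeds tokens and
    predicts the next token with a single linear layer (no attention, purely
    to make the objective concrete)."""
    def __init__(self, vocab_size: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):          # (batch, seq) -> (batch, seq, vocab)
        return self.head(self.embed(token_ids))

def next_token_loss(model, token_ids):
    # Shift by one position: the model sees tokens 0..T-2 and predicts 1..T-1.
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                 # (batch, seq-1, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

# Toy batch of random token ids; real pre-training streams trillions of tokens
# produced by a tokenizer over web text, code, and other corpora.
vocab_size = 32_000
model = TinyLM(vocab_size)
batch = torch.randint(0, vocab_size, (4, 128))
loss = next_token_loss(model, batch)
loss.backward()                            # one gradient step of "pre-training"
print(f"next-token cross-entropy: {loss.item():.3f}")
```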
Glossary · Intermediate · 2018
Pre-training
The initial training phase where a model learns general language ability from trillions of tokens of generic data.
- EN (English term): Pre-training
- TR (Turkish term): Ön Eğitim