Masked Language Modeling
Glossary · Intermediate · 2018

A training objective where the model learns to predict tokens that have been masked out of a sentence.

Masked language modeling (MLM), popularized by BERT in 2018, is a training objective in which some tokens in a sentence are randomly masked and the model must predict them from the surrounding context. The task gives the model bidirectional context: it can draw on information from both the left and the right at once, which is particularly valuable for classification, semantic search, and embedding generation. Encoder-only models train with MLM, while decoder-only models such as the GPT family use the autoregressive next-token objective instead. Although standalone MLM models lost the spotlight in the generative-LLM era, RoBERTa, DeBERTa, and friends still do quiet work behind retrieval and classification pipelines.
- EN: Masked Language Modeling
- TR: Maskeli Dil Modelleme
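
To make the objective concrete, here is a minimal PyTorch sketch of BERT-style masking as described in the BERT paper: roughly 15% of positions are chosen as prediction targets, and of those, 80% are replaced with a [MASK] token, 10% with a random token, and 10% are left unchanged. The vocabulary size, the PAD_ID and MASK_ID constants, the mask_tokens helper, and the tiny embedding-plus-head "model" are all illustrative stand-ins, not BERT's actual components.

```python
import torch
import torch.nn as nn

# Toy vocabulary; ids and sizes are illustrative stand-ins, not BERT's.
VOCAB_SIZE = 1000
PAD_ID = 0    # hypothetical padding id
MASK_ID = 1   # hypothetical [MASK] id

def mask_tokens(input_ids, mask_prob=0.15):
    """BERT-style corruption: pick ~15% of non-pad positions as targets;
    of those, 80% -> [MASK], 10% -> random token, 10% -> left unchanged."""
    labels = input_ids.clone()
    probs = torch.full(input_ids.shape, mask_prob)
    probs[input_ids == PAD_ID] = 0.0          # never mask padding
    target = torch.bernoulli(probs).bool()    # positions the model must predict
    labels[~target] = -100                    # -100 is ignored by CrossEntropyLoss

    corrupted = input_ids.clone()
    # 80% of targets become [MASK]
    masked = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & target
    corrupted[masked] = MASK_ID
    # half of the remaining 20% (i.e. 10% of targets) become a random token
    randomized = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & target & ~masked
    corrupted[randomized] = torch.randint(VOCAB_SIZE, input_ids.shape)[randomized]
    # the rest of the targets keep their original token
    return corrupted, labels

# Stand-in "encoder": embedding + output head. A real MLM model would put a
# bidirectional Transformer encoder between these two layers.
model = nn.Sequential(nn.Embedding(VOCAB_SIZE, 64), nn.Linear(64, VOCAB_SIZE))
loss_fn = nn.CrossEntropyLoss()  # ignores positions labeled -100

batch = torch.randint(2, VOCAB_SIZE, (4, 16))   # fake token ids (skip PAD/MASK)
inputs, labels = mask_tokens(batch)
logits = model(inputs)                           # (batch, seq_len, vocab)
loss = loss_fn(logits.view(-1, VOCAB_SIZE), labels.view(-1))
loss.backward()
print(f"loss on masked positions only: {loss.item():.3f}")
```

Note that the loss is computed only on the selected target positions; everything else is ignored via the -100 label. In practice this corruption step is handled by library utilities such as Hugging Face's DataCollatorForLanguageModeling, and the sketch above omits details like special-token handling and padding-aware batching.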