Masked Language Modeling
Glossary · Intermediate · 2018

A training objective where the model learns to predict tokens that have been masked out of a sentence.

Masked language modeling (MLM), popularized by BERT in 2018, is a training objective in which some tokens in a sentence are randomly masked and the model must predict them from the surrounding context. The task gives the model bidirectional context: it can draw on information from both the left and the right at once, which is particularly valuable for classification, semantic search, and embedding generation. Encoder-only models train with MLM, while decoder-only models such as the GPT family use the autoregressive next-token objective instead. Although standalone MLM models lost the spotlight in the generative-LLM era, RoBERTa, DeBERTa, and friends still do quiet work behind retrieval and classification pipelines.
- EN: Masked Language Modeling
- TR: Maskeli Dil Modelleme
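
To make the objective concrete, here is a minimal PyTorch sketch of BERT-style masking as described in the BERT paper: roughly 15% of positions are chosen as prediction targets, and of those, 80% are replaced with a [MASK] token, 10% with a random token, and 10% are left unchanged. The vocabulary size, the PAD_ID and MASK_ID constants, the mask_tokens helper, and the tiny embedding-plus-head "model" are all illustrative stand-ins, not BERT's actual components.

```python
import torch
import torch.nn as nn

# Toy vocabulary; ids and sizes are illustrative stand-ins, not BERT's.
VOCAB_SIZE = 1000
PAD_ID = 0    # hypothetical padding id
MASK_ID = 1   # hypothetical [MASK] id

def mask_tokens(input_ids, mask_prob=0.15):
    """BERT-style corruption: pick ~15% of non-pad positions as targets;
    of those, 80% -> [MASK], 10% -> random token, 10% -> left unchanged."""
    labels = input_ids.clone()
    probs = torch.full(input_ids.shape, mask_prob)
    probs[input_ids == PAD_ID] = 0.0          # never mask padding
    target = torch.bernoulli(probs).bool()    # positions the model must predict
    labels[~target] = -100                    # -100 is ignored by CrossEntropyLoss

    corrupted = input_ids.clone()
    # 80% of targets become [MASK]
    masked = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & target
    corrupted[masked] = MASK_ID
    # half of the remaining 20% (i.e. 10% of targets) become a random token
    randomized = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & target & ~masked
    corrupted[randomized] = torch.randint(VOCAB_SIZE, input_ids.shape)[randomized]
    # the rest of the targets keep their original token
    return corrupted, labels

# Stand-in "encoder": embedding + output head. A real MLM model would put a
# bidirectional Transformer encoder between these two layers.
model = nn.Sequential(nn.Embedding(VOCAB_SIZE, 64), nn.Linear(64, VOCAB_SIZE))
loss_fn = nn.CrossEntropyLoss()  # ignores positions labeled -100

batch = torch.randint(2, VOCAB_SIZE, (4, 16))   # fake token ids (skip PAD/MASK)
inputs, labels = mask_tokens(batch)
logits = model(inputs)                           # (batch, seq_len, vocab)
loss = loss_fn(logits.view(-1, VOCAB_SIZE), labels.view(-1))
loss.backward()
print(f"loss on masked positions only: {loss.item():.3f}")
```

Note that the loss is computed only on the selected target positions; everything else is ignored via the -100 label. In practice this corruption step is handled by library utilities such as Hugging Face's DataCollatorForLanguageModeling, and the sketch above omits details like special-token handling and padding-aware batching.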