The Transformer is arguably the most influential architecture in modern AI, introduced by Vaswani et al. (a team spanning Google Brain, Google Research, and the University of Toronto) in the 2017 paper "Attention Is All You Need". It dropped the recurrence of earlier RNNs and LSTMs entirely, building everything on self-attention and feed-forward layers, which let training parallelise across GPUs and unlocked the era of truly large models. GPT, BERT, Llama 3, Claude, Gemini: virtually every LLM you can name is a Transformer in decoder-only, encoder-only, or encoder-decoder form. It has since become the default architecture far beyond NLP, dominating vision, audio, code, and even protein structure prediction.
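To make "self-attention plus feed-forward" concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. The shapes and names (`d_model`, `W_q`, and so on) are illustrative assumptions, not the paper's exact configuration, and it omits multi-head splitting, masking, and positional encodings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (a sketch).

    X:             (seq_len, d_model) token embeddings.
    W_q, W_k, W_v: (d_model, d_k) projection matrices.
    Returns:       (seq_len, d_k) context vectors.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    # Every token attends to every other token in one matrix product;
    # this is the step that parallelises across the whole sequence,
    # unlike an RNN's token-by-token recurrence.
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # rows sum to 1
    return weights @ V

# Toy usage: 4 tokens, model width 8, head width 8 (arbitrary sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

In the full architecture this attention step alternates with a position-wise feed-forward layer, with residual connections and normalisation around each; stacking those blocks gives the encoder and decoder variants named above.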