"Attention Is All You Need" is the landmark paper that Vaswani and colleagues at Google published in June 2017 (presented at NeurIPS 2017). At a time when RNNs and LSTMs were the standard for sequence-to-sequence tasks, the paper's message was essentially: drop recurrence entirely and just use attention.
The result: the Transformer not only set new state-of-the-art results on WMT 2014 English-German and English-French machine translation, it did so at a fraction of the training cost of prior models, because self-attention parallelizes across sequence positions where recurrence cannot. From 2018 on, BERT, GPT, and T5 were all built on top of it. It's now one of the most-cited papers in modern AI/ML.
- Introduced self-attention, multi-head attention, and positional encoding.
- Of the eight authors, three were affiliated with Google Brain, four with Google Research, and one with the University of Toronto at the time of publication.
- Most of the authors later founded their own companies (Cohere, Adept, Inceptive).
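The core operation the paper introduces, scaled dot-product attention, is compact enough to sketch directly. The following is a minimal NumPy illustration of the paper's formula Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V; the toy shapes and random inputs are illustrative only, not from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ V                            # weighted sum of value vectors

# Toy example: 3 query positions attending over 3 key/value positions, d_k = 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # each query position gets a (d_v,)-dimensional output
```

Multi-head attention simply runs several such attentions in parallel on learned linear projections of Q, K, and V, then concatenates the results; the √d_k scaling keeps the dot products from pushing the softmax into regions with vanishing gradients.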