"Attention Is All You Need" is the landmark paper that Vaswani and colleagues at Google published in June 2017 (presented at NeurIPS 2017). At a time when RNNs and LSTMs were the standard for sequence-to-sequence tasks, the paper's message was essentially: drop recurrence entirely and just use attention.
The result: the Transformer not only set new state-of-the-art results on WMT 2014 English-German and English-French machine translation, it did so at a fraction of the training cost of prior models, because self-attention parallelizes across sequence positions where recurrence cannot. From 2018 on, BERT, GPT, and T5 were all built on top of it. It's now one of the most-cited papers in modern AI/ML.
- Introduced self-attention, multi-head attention, and positional encoding.
- Of the eight authors, three were affiliated with Google Brain, four with Google Research, and one with the University of Toronto at the time of publication.
- Most of the authors later founded their own companies (Cohere, Adept, Inceptive).
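The core operation the paper introduces, scaled dot-product attention, is compact enough to sketch directly. The following is a minimal NumPy illustration of the paper's formula Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V; the toy shapes and random inputs are illustrative only, not from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in the paper."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key axis
    return weights @ V                            # weighted sum of value vectors

# Toy example: 3 query positions attending over 3 key/value positions, d_k = 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # each query position gets a (d_v,)-dimensional output
```

Multi-head attention simply runs several such attentions in parallel on learned linear projections of Q, K, and V, then concatenates the results; the √d_k scaling keeps the dot products from pushing the softmax into regions with vanishing gradients.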