Cross-attention is the mechanism that lets one sequence attend to information from a different sequence — in the classic Transformer, it is where the decoder looks at the encoder output. Unlike self-attention, where queries, keys, and values all come from the same sequence, here the queries come from one source while the keys and values come from another, so a decoder generating a Turkish translation can consult the encoder's representation of the English input. Vision-language models (VLMs), diffusion image models, and most text-conditioned generation systems rely on cross-attention to ground their output in the conditioning input. It is, in short, where an output sequence learns how to consume an input.
MEVZU N°124 · ISTANBUL · YEAR I — VOL. III
Glossary · Intermediate · 2017
Cross-Attention
An attention mechanism where one sequence attends to a different sequence, typically connecting encoder and decoder.
- EN (English term) — Cross-Attention
- TR (Turkish term) — Çapraz-Dikkat
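
To make the definition concrete, here is a minimal single-head cross-attention sketch in PyTorch. The class name `CrossAttention`, the single-head layout, and all dimensions are illustrative assumptions, not a reference implementation; real Transformers use multi-head attention with an output projection.

```python
# Minimal single-head cross-attention sketch (illustrative, not a
# reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Queries are projected from the decoder stream; keys and
        # values are projected from the encoder stream.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, decoder_states, encoder_states):
        # decoder_states: (batch, tgt_len, d_model)
        # encoder_states: (batch, src_len, d_model)
        q = self.q_proj(decoder_states)
        k = self.k_proj(encoder_states)
        v = self.v_proj(encoder_states)
        # Scaled dot-product attention: each target position scores
        # every source position, then mixes the source values.
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v  # (batch, tgt_len, d_model)

# Usage: a decoder generating Turkish attends over English encodings.
dec = torch.randn(1, 5, 64)   # 5 target tokens generated so far
enc = torch.randn(1, 9, 64)   # 9 source tokens, fully encoded
out = CrossAttention(64)(dec, enc)  # (1, 5, 64)
```

In an encoder-decoder translation model, a block like this sits between each decoder layer's masked self-attention and its feed-forward sublayer; the encoder states are computed once per source sentence and reused at every decoding step.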