Attention mechanism

The method by which transformer models determine which parts of an input matter for producing each part of an output. When generating the next token, the model assigns a weight to every previous token, computed from learned similarity scores between token representations, attending more to relevant context and less to irrelevant text. Attention is not comprehension; the model is computing statistical relevance, not understanding meaning. Attention patterns can sometimes be examined to explain why a model produced a particular output, though this kind of interpretability has limits.
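A minimal sketch of the underlying computation, scaled dot-product attention with a causal mask, written in NumPy; the function name and the q, k, v arrays are illustrative assumptions, not part of this entry:

```python
import numpy as np

def causal_attention(q, k, v):
    # q, k, v: (seq_len, d) arrays of query, key, and value vectors
    # (illustrative shapes; real models add batch and head dimensions).
    n, d = q.shape
    # Each query is scored against every key; scaling by sqrt(d)
    # keeps the softmax from saturating.
    scores = q @ k.T / np.sqrt(d)
    # Causal mask: a token may attend only to itself and previous tokens.
    scores = np.where(np.triu(np.ones((n, n), dtype=bool), k=1), -np.inf, scores)
    # Softmax turns each row of scores into attention weights summing to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: each position becomes a weighted average of the value vectors.
    return weights @ v, weights

# Usage: 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out, w = causal_attention(q, k, v)
print(w.round(2))  # lower-triangular rows, each summing to 1
```

The weights matrix here is the "attention pattern" the entry refers to: row i shows how heavily token i attends to each earlier token.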

See: Context window; Self-attention; Transformer