Grouped-query attention

Introduced by Ainslie et al. in GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Grouped-query attention is an interpolation between multi-query and multi-head attention: it uses an intermediate number of key-value heads (more than one, but fewer than the number of query heads), each shared by a group of query heads. This achieves quality close to multi-head attention at speed comparable to multi-query attention.
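The interpolation can be sketched as follows: with `g` key-value heads and `h` query heads (`g` dividing `h`), each K/V head is broadcast to `h // g` query heads. Setting `g = 1` recovers multi-query attention and `g = h` recovers standard multi-head attention. This is a minimal numpy sketch, not the paper's implementation; shapes and names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    # q: (n_q_heads, seq_len, d_head); k, v: (n_kv_heads, seq_len, d_head)
    n_q_heads, _, d_head = q.shape
    n_kv_heads = k.shape[0]
    assert n_q_heads % n_kv_heads == 0
    group_size = n_q_heads // n_kv_heads
    # Each K/V head is shared by `group_size` query heads:
    # repeat K and V along the head axis to match the query heads.
    k = np.repeat(k, group_size, axis=0)
    v = np.repeat(v, group_size, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    return softmax(scores) @ v

# 8 query heads sharing 2 KV heads (groups of 4).
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))
k = rng.normal(size=(2, 4, 16))
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 4, 16)
```

The memory saving comes from the KV cache: at inference time only `n_kv_heads` K/V tensors are stored per layer instead of `n_q_heads`, while the query projection keeps its full head count.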
