RealFormer

Introduced by He et al. in RealFormer: Transformer Likes Residual Attention

RealFormer is a Transformer variant based on the idea of residual attention. It adds skip edges to the backbone Transformer to create multiple direct paths, one for each type of attention module, while introducing no new parameters or hyper-parameters. Specifically, RealFormer uses a Post-LN Transformer as its backbone and adds skip edges connecting the Multi-Head Attention modules in adjacent layers, so that each layer's raw (pre-softmax) attention scores are added to those of the next.
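The residual attention idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, shapes, and the single-layer structure are assumptions; the key point is that the raw attention scores are passed along and added to the next layer's scores before the softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualMultiHeadAttention(nn.Module):
    """Sketch of RealFormer-style residual attention (names/shapes assumed)."""

    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, prev_scores=None):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each projection to (B, n_heads, T, d_head).
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product scores, shape (B, n_heads, T, T).
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        if prev_scores is not None:
            # The skip edge: add the previous layer's raw scores
            # before the softmax.
            scores = scores + prev_scores
        attn = F.softmax(scores, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        # Return the raw scores so the next layer can reuse them.
        return self.out(y), scores
```

A stack of such layers simply threads `scores` from one layer to the next, which is why the scheme adds direct paths without any new parameters.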
