
PAR Transformer

Introduced by Mandava et al. in Pay Attention when Required

The PAR Transformer is a Transformer model that uses 63% fewer self-attention blocks, replacing them with feed-forward blocks, while retaining comparable test accuracy. It is based on the Transformer-XL architecture and uses neural architecture search to find an efficient ordering of attention and feed-forward blocks within the architecture.

Source: Pay Attention when Required
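Below is a minimal PyTorch sketch of the idea: a stack where self-attention blocks appear only at a few positions and the remaining positions are feed-forward blocks. This is illustrative only, not the authors' implementation; the hard-coded pattern string is hypothetical, since in the paper the block ordering is discovered via neural architecture search.

```python
import torch
import torch.nn as nn


class FeedForwardBlock(nn.Module):
    """Position-wise feed-forward sublayer with a residual connection."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return x + self.ff(self.norm(x))


class SelfAttentionBlock(nn.Module):
    """Multi-head self-attention sublayer with a residual connection."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


class PARStack(nn.Module):
    """Stack blocks by a pattern string: 's' = self-attention,
    'f' = feed-forward. A PAR-style model uses far fewer 's' blocks
    than a standard Transformer, which interleaves them 1:1."""
    def __init__(self, pattern: str, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.blocks = nn.ModuleList(
            SelfAttentionBlock(d_model, n_heads) if c == "s"
            else FeedForwardBlock(d_model, d_ff)
            for c in pattern
        )

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


# Hypothetical pattern: attention concentrated early, feed-forward elsewhere.
model = PARStack(pattern="ssffsffffff")
x = torch.randn(2, 128, 512)  # (batch, sequence, d_model)
print(model(x).shape)         # torch.Size([2, 128, 512])
```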

Tasks

| Task | Papers | Share |
| --- | --- | --- |
| Language Modelling | 1 | 25.00% |
| Paraphrase Identification | 1 | 25.00% |
| Question Answering | 1 | 25.00% |
| Sentiment Analysis | 1 | 25.00% |
