Funnel Transformer

Funnel-Transformer is a type of Transformer that gradually compresses the sequence of hidden states to a shorter one, reducing computation cost. By re-investing the FLOPs saved from length reduction in a deeper or wider model, model capacity is further improved. In addition, to perform the token-level predictions required by common pretraining objectives, Funnel-Transformer can recover a deep representation for each token from the reduced hidden sequence via a decoder.

The proposed model keeps the same overall skeleton of interleaved self-attention (S-Attn) and position-wise feed-forward (P-FFN) sub-modules wrapped in residual connections and layer normalization. The difference is that, to achieve representation compression and computation reduction, the model employs an encoder that gradually reduces the sequence length of the hidden states as the layers get deeper. In addition, for tasks involving per-token predictions, such as pretraining, a simple decoder is used to reconstruct a full sequence of token-level representations from the compressed encoder output. Compression is achieved via a pooling operation, specifically strided mean pooling along the sequence dimension, applied before each encoder block after the first, as sketched below.

Source: Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
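To make the encoder-decoder flow concrete, the following is a minimal PyTorch sketch of pooling-based length compression, pooling only the attention query so that keys and values keep the full sequence (the paper's "pool-query-only" variant), plus a bare decoder-side upsampling step. The names here (TransformerLayer, FunnelEncoder, pool, upsample) and the hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of Funnel-Transformer-style compression, assuming PyTorch.
# Names and sizes are illustrative, not the authors' reference code.
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    """Standard S-Attn + P-FFN sub-modules with residuals and layer norm."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, q, kv):
        # The query may be the pooled (shorter) sequence while keys/values
        # stay full length ("pool-query-only").
        h = self.norm1(q + self.attn(q, kv, kv, need_weights=False)[0])
        return self.norm2(h + self.ffn(h))

def pool(x):
    # Strided mean pooling along the sequence axis halves the length.
    # x: (batch, seq_len, d_model)
    return nn.functional.avg_pool1d(
        x.transpose(1, 2), kernel_size=2, stride=2, ceil_mode=True
    ).transpose(1, 2)

class FunnelEncoder(nn.Module):
    def __init__(self, n_blocks=3, layers_per_block=2, d_model=256):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.ModuleList(TransformerLayer(d_model) for _ in range(layers_per_block))
            for _ in range(n_blocks)
        )

    def forward(self, x):
        for i, block in enumerate(self.blocks):
            if i > 0:
                # Compress before every block after the first: the pooled
                # query attends to the still-full-length keys/values.
                x = block[0](pool(x), x)
            else:
                x = block[0](x, x)
            for layer in block[1:]:
                x = layer(x, x)
        return x  # length reduced by 2 ** (n_blocks - 1)

def upsample(x, target_len):
    # Decoder side: repeat each compressed state to restore full length
    # so token-level predictions are possible again.
    return x.repeat_interleave(target_len // x.size(1), dim=1)[:, :target_len]

if __name__ == "__main__":
    enc = FunnelEncoder()
    tokens = torch.randn(2, 128, 256)       # (batch, seq_len, d_model)
    compressed = enc(tokens)                # -> (2, 32, 256)
    full = upsample(compressed, 128)        # -> (2, 128, 256)
    print(compressed.shape, full.shape)
```

In the full model the decoder does more than this sketch: the up-sampled sequence is combined with the full-length hidden states from the first encoder block via a residual connection and then refined by a few additional full-length Transformer layers before the token-level prediction head.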

Tasks


Task                   Papers  Share
Reading Comprehension  1       50.00%
Text Classification    1       50.00%
