Funnel Transformer

Funnel-Transformer is a type of Transformer that gradually compresses the sequence of hidden states to a shorter one, reducing computation cost. By re-investing the FLOPs saved from length reduction in a deeper or wider model, model capacity is further improved. In addition, to perform the token-level predictions required by common pretraining objectives, Funnel-Transformer can recover a deep representation for each token from the reduced hidden sequence via a decoder.

The proposed model keeps the same overall skeleton of interleaved self-attention (S-Attn) and position-wise feed-forward (P-FFN) sub-modules wrapped in residual connections and layer normalization. The difference is that, to achieve representation compression and computation reduction, the model employs an encoder that gradually reduces the sequence length of the hidden states as the layers get deeper. In addition, for tasks involving per-token predictions, such as pretraining, a simple decoder is used to reconstruct a full sequence of token-level representations from the compressed encoder output. Compression is achieved via a pooling operation, specifically strided mean pooling along the sequence dimension, applied before each encoder block after the first, as sketched below.

Source: Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
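To make the encoder-decoder flow concrete, the following is a minimal PyTorch sketch of pooling-based length compression, pooling only the attention query so that keys and values keep the full sequence (the paper's "pool-query-only" variant), plus a bare decoder-side upsampling step. The names here (TransformerLayer, FunnelEncoder, pool, upsample) and the hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of Funnel-Transformer-style compression, assuming PyTorch.
# Names and sizes are illustrative, not the authors' reference code.
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    """Standard S-Attn + P-FFN sub-modules with residuals and layer norm."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, q, kv):
        # The query may be the pooled (shorter) sequence while keys/values
        # stay full length ("pool-query-only").
        h = self.norm1(q + self.attn(q, kv, kv, need_weights=False)[0])
        return self.norm2(h + self.ffn(h))

def pool(x):
    # Strided mean pooling along the sequence axis halves the length.
    # x: (batch, seq_len, d_model)
    return nn.functional.avg_pool1d(
        x.transpose(1, 2), kernel_size=2, stride=2, ceil_mode=True
    ).transpose(1, 2)

class FunnelEncoder(nn.Module):
    def __init__(self, n_blocks=3, layers_per_block=2, d_model=256):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.ModuleList(TransformerLayer(d_model) for _ in range(layers_per_block))
            for _ in range(n_blocks)
        )

    def forward(self, x):
        for i, block in enumerate(self.blocks):
            if i > 0:
                # Compress before every block after the first: the pooled
                # query attends to the still-full-length keys/values.
                x = block[0](pool(x), x)
            else:
                x = block[0](x, x)
            for layer in block[1:]:
                x = layer(x, x)
        return x  # length reduced by 2 ** (n_blocks - 1)

def upsample(x, target_len):
    # Decoder side: repeat each compressed state to restore full length
    # so token-level predictions are possible again.
    return x.repeat_interleave(target_len // x.size(1), dim=1)[:, :target_len]

if __name__ == "__main__":
    enc = FunnelEncoder()
    tokens = torch.randn(2, 128, 256)       # (batch, seq_len, d_model)
    compressed = enc(tokens)                # -> (2, 32, 256)
    full = upsample(compressed, 128)        # -> (2, 128, 256)
    print(compressed.shape, full.shape)
```

In the full model the decoder does more than this sketch: the up-sampled sequence is combined with the full-length hidden states from the first encoder block via a residual connection and then refined by a few additional full-length Transformer layers before the token-level prediction head.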

Tasks


Task                   Papers  Share
Reading Comprehension  1       50.00%
Text Classification    1       50.00%
