Finetuning Pretrained Transformers into RNNs

Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. However, this comes at a significant computational cost, as the attention mechanism's complexity scales quadratically with sequence length...
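To make the contrast concrete, below is a minimal numpy sketch comparing quadratic causal softmax attention with the kind of linear-attention recurrence that transformer-to-RNN conversions such as T2R build on: the recurrence keeps a constant-size running state instead of an n-by-n attention matrix. The feature map `phi`, the function names, and the toy shapes here are illustrative assumptions, not the paper's learned feature map or released code.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Causal softmax attention: O(n^2) time and memory in sequence length n."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) score matrix
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # disallow attending to future tokens
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def recurrent_linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Causal linear attention run as an RNN: O(n) time with a constant-size state.

    `phi` is a placeholder positive feature map; the T2R recipe instead learns a
    small feature map during finetuning (this sketch does not reproduce it)."""
    n, d_v = V.shape
    d_phi = phi(K[0]).shape[0]
    S = np.zeros((d_phi, d_v))       # running sum of phi(k_t) v_t^T
    z = np.zeros(d_phi)              # running sum of phi(k_t)
    out = np.zeros((n, d_v))
    for t in range(n):
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        q = phi(Q[t])
        out[t] = (q @ S) / (q @ z)   # normalized weighted sum, no n x n matrix needed
    return out

# Toy check: both produce one output vector per position from the same inputs.
rng = np.random.default_rng(0)
n, d = 6, 4
Q, K, V = rng.normal(size=(3, n, d))
print(softmax_attention(Q, K, V).shape)           # (6, 4)
print(recurrent_linear_attention(Q, K, V).shape)  # (6, 4)
```

At generation time the recurrent form only needs to update S and z once per new token, which is what makes the converted model behave like an RNN with constant per-step cost.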

No code implementations are available yet.
| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
| --- | --- | --- | --- | --- | --- |
| Language Modelling | WikiText-103 | T2R + Pretrain | Validation perplexity | 19 | #12 |
| Language Modelling | WikiText-103 | T2R + Pretrain | Test perplexity | 19.6 | #20 |
| Machine Translation | WMT2014 English-French | T2R + Pretrain | BLEU score | 42.1 | #15 |
| Machine Translation | WMT2014 English-German | T2R + Pretrain | BLEU score | 28.7 | #30 |
| Machine Translation | WMT2017 Chinese-English | T2R + Pretrain | BLEU | 23.8 | #1 |

Methods used in the Paper

| METHOD | TYPE |
| --- | --- |
| Softmax | Output Functions |