TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Language Modelling	WikiText-103	T2R + Pretrain	Validation perplexity	19	# 17
Language Modelling	WikiText-103	T2R + Pretrain	Test perplexity	19.6	# 41
Machine Translation	WMT2014 English-French	T2R + Pretrain	BLEU score	42.1	# 18
Machine Translation	WMT2014 English-French	T2R + Pretrain	Hardware Burden	None	# 1
Machine Translation	WMT2014 English-French	T2R + Pretrain	Operations per network pass	None	# 1
Machine Translation	WMT2014 English-German	T2R + Pretrain	BLEU score	28.7	# 39
Machine Translation	WMT2014 English-German	T2R + Pretrain	Hardware Burden	None	# 1
Machine Translation	WMT2014 English-German	T2R + Pretrain	Operations per network pass	None	# 1
Machine Translation	WMT2017 Chinese-English	T2R + Pretrain	BLEU	23.8	# 2

Finetuning Pretrained Transformers into RNNs

EMNLP 2021 · Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith

Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. But this comes with a significant computational cost, as the attention mechanism's complexity scales quadratically with sequence length. Efficient transformer variants have received increasing interest in recent works. Among them, a linear-complexity recurrent variant has proven well suited for autoregressive generation. It approximates the softmax attention with randomized or heuristic feature maps, but can be difficult to train and may yield suboptimal accuracy. This work aims to convert a pretrained transformer into its efficient recurrent counterpart, improving efficiency while maintaining accuracy. Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune. With a learned feature map, our approach provides an improved tradeoff between efficiency and accuracy over the standard transformer and other recurrent variants. We also show that the finetuning process has lower training cost relative to training these recurrent variants from scratch. As many models for natural language tasks are increasingly dependent on large-scale pretrained transformers, this work presents a viable approach to improving inference efficiency without repeating the expensive pretraining process.
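
The recurrent view described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, toy dimensions, and random weights are hypothetical, and the feature map is assumed to be a ReLU over an affine projection (the abstract only says the map is learned). The sketch checks that causal attention with a kernel feature map gives the same outputs whether computed over all previous positions (quadratic in sequence length) or with a fixed-size running state (linear in sequence length).

```python
import random


def relu_feature_map(x, W, b):
    """phi(x) = relu(W x + b). Assumed form; W and b stand in for learned weights."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_causal_attention(qs, ks, vs, eps=1e-6):
    """Reference form: each position re-reads all earlier keys and values."""
    d_v = len(vs[0])
    outputs = []
    for i, q in enumerate(qs):
        weights = [dot(q, ks[j]) for j in range(i + 1)]
        norm = sum(weights) + eps
        outputs.append([sum(w * vs[j][c] for j, w in enumerate(weights)) / norm
                        for c in range(d_v)])
    return outputs


def recurrent_attention(qs, ks, vs, eps=1e-6):
    """RNN form: the state (S, z) has fixed size regardless of sequence length."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of outer(phi(k), v)
    z = [0.0] * d_k                        # running sum of phi(k)
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for a in range(d_k):
            z[a] += k[a]
            for c in range(d_v):
                S[a][c] += k[a] * v[c]
        norm = dot(q, z) + eps
        outputs.append([sum(q[a] * S[a][c] for a in range(d_k)) / norm
                        for c in range(d_v)])
    return outputs


# Toy setup: n tokens, input size d, feature size r, value size d_v (all hypothetical).
rng = random.Random(0)
n, d, r, d_v = 5, 4, 3, 2
W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(r)]
b = [rng.uniform(0.1, 0.5) for _ in range(r)]
raw_q = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
raw_k = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
vs = [[rng.uniform(-1, 1) for _ in range(d_v)] for _ in range(n)]

# For brevity the same map is applied to queries and keys.
qs = [relu_feature_map(x, W, b) for x in raw_q]
ks = [relu_feature_map(x, W, b) for x in raw_k]

out_quadratic = quadratic_causal_attention(qs, ks, vs)
out_recurrent = recurrent_attention(qs, ks, vs)
max_diff = max(abs(a - c) for ra, rc in zip(out_quadratic, out_recurrent)
               for a, c in zip(ra, rc))
print("max abs difference:", max_diff)
```

In the swap-then-finetune setting, the weights of such a feature map would be learned during finetuning while the remaining parameters start from the pretrained transformer; the sketch above only covers the attention computation itself.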

Code

yashbonde/RNN-sim

Tasks

Language Modelling

Machine Translation

Text Generation

Datasets

WikiText-2

WikiText-103

WMT 2014

Results from the Paper

Ranked #2 on Machine Translation on WMT2017 Chinese-English

Methods

Softmax
