Lessons on Parameter Sharing across Layers in Transformers

13 Apr 2021  ·  Sho Takase, Shun Kiyono ·

We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique that shares the parameters of one layer across all layers, as in Universal Transformers (Dehghani et al., 2019), to improve computational efficiency. We propose three strategies for assigning parameters to layers: Sequence, Cycle, and Cycle (rev). Experimental results show that the proposed strategies are efficient in both parameter size and computational time. Moreover, we show that the proposed strategies are also effective in settings with large amounts of training data, such as the recent WMT competitions.
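The three strategies can be read as different ways of mapping N Transformer layers onto M unique parameter sets. A minimal sketch of this mapping, under the assumed interpretation that Sequence repeats each set for consecutive layers, Cycle repeats the sets in order, and Cycle (rev) cycles but stacks the final repetition in reverse order (function and strategy names are illustrative, not from the paper's code):

```python
def assign_layers(strategy: str, m: int, n: int) -> list[int]:
    """Map n Transformer layers to m unique parameter sets (m must divide n).

    Assumed interpretation of the paper's three strategies, for m=3, n=6:
      - "sequence":  consecutive layers share parameters -> [0, 0, 1, 1, 2, 2]
      - "cycle":     repeat the m sets in order          -> [0, 1, 2, 0, 1, 2]
      - "cycle_rev": cycle, final repetition reversed    -> [0, 1, 2, 2, 1, 0]
    """
    assert n % m == 0, "layer count must be a multiple of the unique set count"
    reps = n // m
    if strategy == "sequence":
        return [i // reps for i in range(n)]
    if strategy == "cycle":
        return [i % m for i in range(n)]
    if strategy == "cycle_rev":
        # Cycle through the first (reps - 1) repetitions, then reverse the last one.
        return [i % m for i in range(n - m)] + list(reversed(range(m)))
    raise ValueError(f"unknown strategy: {strategy}")
```

With this mapping, layer i of the network simply reuses the parameter module at index `assign_layers(strategy, m, n)[i]`, so memory grows with m rather than n.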


Results from the Paper

 Ranked #1 on Machine Translation on WMT2014 English-German (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Machine Translation | WMT2014 English-German | Transformer Cycle (Rev) | BLEU score | 35.14 | # 1 |
| Machine Translation | WMT2014 English-German | Transformer Cycle (Rev) | SacreBLEU | 33.54 | # 2 |
| Machine Translation | WMT2014 English-German | Transformer Cycle (Rev) | Hardware Burden | None | # 1 |
| Machine Translation | WMT2014 English-German | Transformer Cycle (Rev) | Operations per network pass | None | # 1 |

