TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Question Answering	SQuAD1.1	SRU	EM	71.4	# 150
Question Answering	SQuAD1.1	SRU	F1	80.2	# 153
Question Answering	SQuAD1.1	SRU	Hardware Burden	4G	# 1
Question Answering	SQuAD1.1	SRU	Operations per network pass	None	# 1
Question Answering	SQuAD1.1 dev	SRU	EM	71.4	# 32
Question Answering	SQuAD1.1 dev	SRU	F1	80.2	# 35
Machine Translation	WMT2014 English-German	Transformer + SRU	BLEU score	28.4	# 44
Machine Translation	WMT2014 English-German	Transformer + SRU	Hardware Burden	34G	# 1
Machine Translation	WMT2014 English-German	Transformer + SRU	Operations per network pass	None	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/simple-recurrent-units-for-highly/question-answering-on-squad11-dev)](https://paperswithcode.com/sota/question-answering-on-squad11-dev?p=simple-recurrent-units-for-highly)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/simple-recurrent-units-for-highly/machine-translation-on-wmt2014-english-german)](https://paperswithcode.com/sota/machine-translation-on-wmt2014-english-german?p=simple-recurrent-units-for-highly)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/simple-recurrent-units-for-highly/question-answering-on-squad11)](https://paperswithcode.com/sota/question-answering-on-squad11?p=simple-recurrent-units-for-highly)`

Simple Recurrent Units for Highly Parallelizable Recurrence

EMNLP 2018 · Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, Yoav Artzi ·

Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable highly parallelized implementation, and comes with careful initialization to facilitate training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU achieves 5--9x speed-up over cuDNN-optimized LSTM on classification and question answering datasets, and delivers stronger results than LSTM and convolutional models. We also obtain an average of 0.7 BLEU improvement over the Transformer model on translation by incorporating SRU into the architecture.

PDF Abstract EMNLP 2018 PDF EMNLP 2018 Abstract

Code

Add Remove Mark official

asappresearch/sru official

2,098

aymericdamien/TopDeepLearning

5,795

memray/OpenNMT-kpg-release

213

bzhangGo/lrn

Helsinki-NLP/OpenNMT-py

See all 11 implementations

Tasks

Add Remove

General Classification

Machine Translation

Question Answering

Text Classification

Translation

Datasets

SST

SQuAD

MPQA Opinion Corpus

WMT 2014

Results from the Paper

Edit

Ranked #32 on Question Answering on SQuAD1.1 dev

Get a GitHub badge

Task	Dataset	Model	Metric Name	Metric Value	Global Rank	Benchmark
Question Answering	SQuAD1.1	SRU	EM	71.4	# 150	Compare
			F1	80.2	# 153	Compare
			Hardware Burden	4G	# 1	Compare
			Operations per network pass	None	# 1	Compare
Question Answering	SQuAD1.1 dev	SRU	EM	71.4	# 32	Compare
Question Answering	SQuAD1.1 dev	SRU	F1	80.2	# 35	Compare
Machine Translation	WMT2014 English-German	Transformer + SRU	BLEU score	28.4	# 44	Compare
			Hardware Burden	34G	# 1	Compare
			Operations per network pass	None	# 1	Compare

Methods

Add Remove

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Highway Layer • Label Smoothing • Layer Normalization • Linear Layer • LSTM • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Sigmoid Activation • Softmax • SRU • Tanh Activation • Transformer

Edit Social Preview

Simple Recurrent Units for Highly Parallelizable Recurrence

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Edit

Methods

Add Remove