Language Modeling with Gated Convolutional Networks

ICML 2017 · Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier

The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context...

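The model described in the paper stacks convolutions whose outputs are modulated by gated linear units (GLUs), i.e. an element-wise product of a linear path and a sigmoid gate. The sketch below is a minimal illustration of one such causally padded gated convolutional layer, assuming PyTorch; the class name `GatedConvBlock` and the kernel size are hypothetical choices for the example, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedConvBlock(nn.Module):
    """One gated convolutional layer (GLU-style gate), a sketch of the idea.

    Left-only padding keeps the convolution causal (no look-ahead), and the
    sigmoid gate modulates the linear output: h(X) = (X*W + b) * sigmoid(X*V + c).
    """

    def __init__(self, channels: int, kernel_size: int = 4):
        super().__init__()
        self.left_pad = kernel_size - 1
        # Produce 2 * channels so the output can be split into value and gate halves.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len); pad only on the left for causality.
        h = self.conv(F.pad(x, (self.left_pad, 0)))
        value, gate = h.chunk(2, dim=1)
        return value * torch.sigmoid(gate)


if __name__ == "__main__":
    # Toy usage: embed a batch of token ids and run one gated conv layer.
    vocab, dim = 1000, 64
    emb = nn.Embedding(vocab, dim)
    block = GatedConvBlock(dim)
    tokens = torch.randint(0, vocab, (2, 16))        # (batch, seq_len)
    out = block(emb(tokens).transpose(1, 2))         # (batch, dim, seq_len)
    print(out.shape)                                 # torch.Size([2, 64, 16])
```

In the full model, several such blocks are stacked (with residual connections in the paper's deeper configurations) and followed by a softmax over the vocabulary; the sketch shows only the gating mechanism itself.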

Evaluation Results from the Paper


Task                 Dataset           Model                Metric                  Value   Global Rank
Language Modelling   One Billion Word  GCNN-14 bottleneck   PPL                     31.9    #10
Language Modelling   WikiText-103      GCNN-8               Test perplexity         44.9    #27
Language Modelling   WikiText-103      GCNN-14              Validation perplexity   -       #16
Language Modelling   WikiText-103      GCNN-14              Test perplexity         37.2    #23