Recurrent neural networks (RNNs) sequentially process data by updating their state with each new data point, and have long been the de facto choice for sequence modeling tasks. However, their inherently sequential computation makes them slow to train. The Universal Transformer addresses this by combining the parallelizability of the Transformer with the recurrent inductive bias of RNNs: a single, weight-shared self-attention block is applied repeatedly over depth, optionally with a dynamic per-position halting mechanism, and the model is evaluated on tasks including LAMBADA language modelling and WMT 2014 English-German translation.
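
A minimal sketch of this weight-tied recurrence, assuming PyTorch and illustrative hyperparameters; the paper's exact coordinate (position + timestep) embeddings are simplified and dynamic halting is omitted here (it is sketched further below), so this illustrates the idea rather than the reference implementation:

```python
import torch
import torch.nn as nn

class UniversalTransformerEncoder(nn.Module):
    """One shared Transformer block applied recurrently over depth (weights tied across steps)."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, max_steps=6, dropout=0.1):
        super().__init__()
        self.max_steps = max_steps
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.transition = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)
        self.step_emb = nn.Embedding(max_steps, d_model)   # learned per-step (timestep) signal

    def forward(self, x):                                   # x: (batch, seq_len, d_model)
        for t in range(self.max_steps):
            h = x + self.step_emb.weight[t]                 # inject the recurrence-step signal
            a, _ = self.attn(h, h, h, need_weights=False)   # the *same* attention block every step
            x = self.norm1(h + self.drop(a))
            x = self.norm2(x + self.drop(self.transition(x)))
        return x

enc = UniversalTransformerEncoder()
print(enc(torch.randn(2, 10, 512)).shape)   # torch.Size([2, 10, 512])
```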

Published at ICLR 2019.

Datasets

LAMBADA, WMT 2014 English-German

Results from the Paper


TASK                  DATASET                    MODEL                                        METRIC       VALUE   GLOBAL RANK
Language Modelling    LAMBADA                    Universal Transformer (w/ dynamic halting)   Accuracy     56.25   #2
Machine Translation   WMT 2014 English-German    Universal Transformer (base)                 BLEU score   28.9    #17
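
The LAMBADA entry above uses the dynamic-halting variant: an Adaptive Computation Time (ACT) mechanism learns a per-position halting probability so that "easier" positions stop refining their representation earlier. A deliberately simplified sketch, assuming PyTorch and hypothetical names (step_fn, halt_proj); the paper's full ACT formulation with remainders and a ponder-cost penalty is omitted:

```python
import torch
import torch.nn as nn

def dynamic_halting(x, step_fn, halt_proj, max_steps=8, threshold=0.99):
    """x: (batch, seq, d); step_fn: the shared per-step block; halt_proj: nn.Linear(d, 1)."""
    cum_halt = x.new_zeros(x.shape[:-1])               # cumulative halting prob per position
    for _ in range(max_steps):
        running = cum_halt < threshold                 # positions still computing
        if not running.any():
            break
        p = torch.sigmoid(halt_proj(x)).squeeze(-1)    # per-position halting probability
        cum_halt = cum_halt + p * running.float()
        # refine only positions that are still running; halted positions keep their state
        x = torch.where(running.unsqueeze(-1), step_fn(x), x)
    return x

# usage with hypothetical pieces
d = 16
halt_proj = nn.Linear(d, 1)
step_fn = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
print(dynamic_halting(torch.randn(2, 5, d), step_fn, halt_proj).shape)  # torch.Size([2, 5, 16])
```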

Methods used in the Paper


METHOD                         TYPE
Residual Connection            Skip Connections
Attention Dropout              Regularization
Universal Transformer          Transformers
BPE                            Subword Segmentation
Dense Connections              Feedforward Networks
Label Smoothing                Regularization
ReLU                           Activation Functions
Adam                           Stochastic Optimization
Softmax                        Output Functions
Dropout                        Regularization
Multi-Head Attention           Attention Modules
Layer Normalization            Normalization
Scaled Dot-Product Attention   Attention Mechanisms
Transformer                    Transformers
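
Several of the listed components (Scaled Dot-Product Attention, Multi-Head Attention) build on one primitive: attention weights computed from query-key dot products, scaled by sqrt(d_k), and used to mix the value vectors. A minimal PyTorch sketch; the function name and mask argument are illustrative, not the paper's code:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (..., seq_len, d_k); mask broadcastable to (..., seq_len, seq_len)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)        # query-key similarities
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)                   # attention distribution
    return weights @ v                                        # weighted sum of values

q = k = v = torch.randn(2, 5, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 5, 64])
```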