R-Transformer: Recurrent Neural Network Enhanced Transformer

12 Jul 2019 · Zhiwei Wang, Yao Ma, Zitao Liu, Jiliang Tang

Recurrent Neural Networks have long been the dominant choice for sequence modeling. However, they suffer severely from two issues: they are ineffective at capturing very long-term dependencies, and their sequential computation procedure cannot be parallelized...

Evaluation results from the paper

| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Music Modeling | Nottingham | R-Transformer | NLL | 2.37 | #1 |
| Music Modeling | Nottingham | Transformer | NLL | 3.34 | #4 |
| Language Modelling | Penn Treebank (Character Level) | R-Transformer | Bits per Character (BPC) | 1.24 | #9 |
| Language Modelling | Penn Treebank (Word Level) | R-Transformer | Test perplexity | 84.38 | #24 |
| Sequential Image Classification | Sequential MNIST | R-Transformer | Unpermuted Accuracy | 99.1% | #2 |
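
The language-modelling rows report two different views of the same underlying quantity, the mean negative log-likelihood (NLL) per token. As a minimal sketch, assuming the listed values follow the standard definitions (perplexity is the exponential of the mean NLL in nats, and BPC is the mean per-character NLL expressed in bits), the conversions look roughly like this. The function names are illustrative and are not taken from the paper or its code.

```python
import math


def perplexity_from_nll(mean_nll_nats: float) -> float:
    """Perplexity is exp of the mean per-token NLL, measured in nats."""
    return math.exp(mean_nll_nats)


def nll_from_perplexity(perplexity: float) -> float:
    """Inverse of the above: mean per-token NLL in nats."""
    return math.log(perplexity)


def bpc_from_nll(mean_nll_nats: float) -> float:
    """Bits per character: mean per-character NLL converted from nats to bits."""
    return mean_nll_nats / math.log(2)


def nll_from_bpc(bpc: float) -> float:
    """Inverse of the above: mean per-character NLL in nats."""
    return bpc * math.log(2)


if __name__ == "__main__":
    # Values copied from the table; the conversions assume standard definitions.
    print(f"word-level mean NLL (nats): {nll_from_perplexity(84.38):.3f}")
    print(f"char-level mean NLL (nats): {nll_from_bpc(1.24):.3f}")
```

Under these assumptions, a test perplexity of 84.38 corresponds to roughly 4.44 nats per word, and 1.24 BPC to roughly 0.86 nats per character. The Nottingham NLL is left unconverted here because the page does not state how it is normalized.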