Search Results for author: Stephen Merity

Found 8 papers, 6 papers with code

Single Headed Attention RNN: Stop Thinking With Your Head

5 code implementations • 26 Nov 2019 • Stephen Merity

The leading approaches in language modeling are all obsessed with TV shows of my youth - namely Transformers and Sesame Street.

Ranked #27 on Language Modelling on enwik8

Hyperparameter Optimization, Language Modelling

1,174

An Analysis of Neural Language Modeling at Multiple Scales

12 code implementations • 22 Mar 2018 • Stephen Merity, Nitish Shirish Keskar, Richard Socher

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures.

Ranked #7 on Language Modelling on Penn Treebank (Character Level)

Language Modelling

1,956

A Flexible Approach to Automated RNN Architecture Generation

no code implementations • ICLR 2018 • Martin Schrimpf, Stephen Merity, James Bradbury, Richard Socher

The process of designing neural architectures requires expert knowledge and extensive trial and error.

Language Modelling, Machine Translation, +2

Regularizing and Optimizing LSTM Language Models

47 code implementations • ICLR 2018 • Stephen Merity, Nitish Shirish Keskar, Richard Socher

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering.

Ranked #17 on Language Modelling on Penn Treebank (Word Level)

Language Modelling, Translation

32,756

Revisiting Activation Regularization for Language RNNs

no code implementations • 3 Aug 2017 • Stephen Merity, Bryan McCann, Richard Socher

Both of these techniques require minimal modification to existing RNN architectures and result in performance improvements comparable or superior to more complicated regularization techniques or custom cell architectures.

L2 Regularization, Language Modelling

Quasi-Recurrent Neural Networks

8 code implementations • 5 Nov 2016 • James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher

Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences.

Ranked #15 on Machine Translation on IWSLT2015 German-English

Language Modelling, Machine Translation, +4

1,257

Pointer Sentinel Mixture Models

9 code implementations • 26 Sep 2016 • Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies.

Language Modelling

201

Dynamic Memory Networks for Visual and Textual Question Answering

11 code implementations • 4 Mar 2016 • Caiming Xiong, Stephen Merity, Richard Socher

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.

Ranked #4 on Visual Question Answering (VQA) on VQA v1 test-std

Question Answering, Visual Question Answering

240
