Search Results for author: Erich Elsen

Found 29 papers, 22 papers with code

The State of Sparse Training in Deep Reinforcement Learning

1 code implementation 17 Jun 2022 Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision.

reinforcement-learning Reinforcement Learning (RL)

Step-unrolled Denoising Autoencoders for Text Generation

1 code implementation ICLR 2022 Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, Aaron van den Oord

In this paper we propose a new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models.

Denoising Language Modelling +2
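
To make the non-autoregressive idea concrete, here is a minimal sampling-loop sketch in NumPy. It is an illustration under my own assumptions, not the paper's code: the function denoiser_logits is a hypothetical stand-in for the trained denoiser network, and every position is resampled in parallel at each step rather than left-to-right.

    import numpy as np

    # Minimal sketch (assumption, not the paper's code): unrolled denoising sampling.
    # Start from random tokens and repeatedly resample every position in parallel
    # from a learned denoiser; denoiser_logits is a hypothetical placeholder.
    rng = np.random.default_rng(0)
    vocab, length, steps = 100, 16, 10

    def denoiser_logits(tokens):
        # placeholder: a real model returns per-position logits over the vocabulary
        return rng.normal(size=(len(tokens), vocab))

    tokens = rng.integers(vocab, size=length)            # random starting sequence
    for _ in range(steps):
        logits = denoiser_logits(tokens)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        tokens = np.array([rng.choice(vocab, p=p) for p in probs])  # parallel update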

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

2 code implementations NA 2021 Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent SIfre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d'Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.

Abstract Algebra Anachronisms +133

Top-KAST: Top-K Always Sparse Training

2 code implementations NeurIPS 2020 Siddhant M. Jayakumar, Razvan Pascanu, Jack W. Rae, Simon Osindero, Erich Elsen

Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint.

Language Modelling
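
A minimal sketch of the core idea as I read it: keep a dense parameter tensor, use only the top-K weights by magnitude in the forward pass, and let gradients flow to a somewhat larger top-M subset so the active set can change during training. This is an assumption-laden illustration, not the paper's implementation.

    import numpy as np

    # Minimal sketch (assumption): Top-KAST-style masks. The forward pass sees only
    # the top-K magnitude weights; the backward pass updates a larger top-M subset.
    def topk_mask(weights, k):
        flat = np.abs(weights).ravel()
        thresh = np.partition(flat, flat.size - k)[flat.size - k]
        return np.abs(weights) >= thresh

    W = np.random.default_rng(0).normal(size=(64, 64))
    forward_mask = topk_mask(W, k=int(0.1 * W.size))    # e.g. 10% of weights forward
    backward_mask = topk_mask(W, k=int(0.2 * W.size))   # gradients reach a larger set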

Practical Real Time Recurrent Learning with a Sparse Approximation

no code implementations ICLR 2021 Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

For highly sparse networks, SnAp with $n=2$ remains tractable and can outperform backpropagation through time in terms of learning speed when updates are done online.
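
A rough NumPy sketch of where the approximation enters, assuming a vanilla RNN and the simplest (SnAp-1-like) "immediate influence" sparsity pattern on the influence matrix; the loss, sizes, and pattern are illustrative choices, not the authors' code.

    import numpy as np

    # Minimal sketch (assumption): RTRL for h_t = tanh(W h_{t-1} + U x_t), with a
    # fixed SnAp-style mask applied to the influence matrix dh_t/dvec(W) each step
    # so only a subset of its entries is ever stored.
    rng = np.random.default_rng(0)
    n, m, T = 8, 4, 20                      # hidden units, input size, timesteps
    W = rng.normal(scale=0.3, size=(n, n))
    U = rng.normal(scale=0.3, size=(n, m))

    J = np.zeros((n, n * n))                # influence matrix J[k, i*n + j] = dh_k/dW_ij
    mask = np.zeros((n, n * n), dtype=bool) # SnAp-1-like pattern: keep entries with k == i
    for i in range(n):
        mask[i, i * n:(i + 1) * n] = True

    h = np.zeros(n)
    grad_W = np.zeros((n, n))
    for t in range(T):
        x = rng.normal(size=m)
        h_new = np.tanh(W @ h + U @ x)
        D = np.diag(1.0 - h_new ** 2)       # tanh derivative at the pre-activation
        imm = np.zeros((n, n * n))          # immediate Jacobian: d(pre_i)/dW_ij = h_j
        for i in range(n):
            imm[i, i * n:(i + 1) * n] = h
        J = D @ (W @ J + imm)               # full RTRL recursion...
        J *= mask                           # ...followed by the sparse approximation
        h = h_new
        grad_W += (h @ J).reshape(n, n)     # online gradient of a toy loss 0.5*||h||^2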

On the Generalization Benefit of Noise in Stochastic Gradient Descent

no code implementations ICML 2020 Samuel L. Smith, Erich Elsen, Soham De

It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks.

Sparse GPU Kernels for Deep Learning

1 code implementation 18 Jun 2020 Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

In this work, we study sparse matrices from deep learning applications and identify favorable properties that can be exploited to accelerate computation.
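
One of the central primitives the paper accelerates is sparse matrix-dense matrix multiplication (SpMM). The sketch below is only a CPU-side illustration using SciPy's CSR format under assumed shapes and density; the paper's contribution is hand-written GPU kernels for this operation.

    import numpy as np
    from scipy.sparse import random as sparse_random

    # Minimal sketch (assumption): SpMM, the sparse-weight x dense-activation product
    # at the heart of sparse inference. SciPy's CSR format stands in for illustration.
    rng = np.random.default_rng(0)
    A = sparse_random(1024, 1024, density=0.1, format="csr", random_state=0)  # sparse weights
    X = rng.normal(size=(1024, 256))                                          # dense activations
    Y = A @ X            # only the ~10% stored non-zeros contribute multiply-adds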

AlgebraNets

1 code implementation 12 Jun 2020 Jordan Hoffmann, Simon Schmitt, Simon Osindero, Karen Simonyan, Erich Elsen

Neural networks have historically been built layerwise from the set of functions $\{ f: \mathbb{R}^n \to \mathbb{R}^m \}$, i.e. with activations and weights/parameters represented by real numbers, $\mathbb{R}$.

Computational Efficiency Image Classification +1
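
As a concrete illustration of swapping the underlying algebra, here is a hedged sketch of a linear layer whose weights and activations are complex numbers rather than reals. The choice of algebra and nonlinearity is my own for illustration, not taken from the paper.

    import numpy as np

    # Minimal sketch (assumption): a linear layer over C instead of R. The same
    # pattern applies to other algebras (e.g. quaternions or small matrix rings).
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 32)) + 1j * rng.normal(size=(16, 32))   # complex weights
    x = rng.normal(size=32) + 1j * rng.normal(size=32)               # complex activations
    y = W @ x                                                        # complex multiply-adds
    y = np.tanh(y.real) + 1j * np.tanh(y.imag)                       # split nonlinearity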

A Practical Sparse Approximation for Real Time Recurrent Learning

no code implementations 12 Jun 2020 Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

Current methods for training recurrent neural networks are based on backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights 'online' (after every timestep).

End-to-End Adversarial Text-to-Speech

2 code implementations ICLR 2021 Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan

Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest.

Adversarial Text Dynamic Time Warping +2

Rigging the Lottery: Making All Tickets Winners

10 code implementations ICML 2020 Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model.

Image Classification Language Modelling +1
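
A minimal sketch of a single RigL-style drop/grow update on a flattened weight tensor, assuming magnitude-based dropping and gradient-magnitude-based growing; the sizes and the stand-in gradient are illustrative, and this is not the released implementation.

    import numpy as np

    # Minimal sketch (assumption): one RigL-style connectivity update. Drop the k
    # smallest-magnitude active weights, regrow k inactive connections where the
    # dense gradient magnitude is largest.
    def rigl_update(weights, mask, grad, k):
        active = np.flatnonzero(mask)
        inactive = np.flatnonzero(~mask)
        drop = active[np.argsort(np.abs(weights[active]))[:k]]
        grow = inactive[np.argsort(-np.abs(grad[inactive]))[:k]]
        mask[drop] = False
        mask[grow] = True
        weights[grow] = 0.0          # regrown connections start from zero
        return weights, mask

    rng = np.random.default_rng(0)
    w = rng.normal(size=1000)
    m = rng.random(1000) < 0.1       # ~90% sparse to begin with
    g = rng.normal(size=1000)        # stand-in for a dense gradient estimate
    w, m = rigl_update(w, m, g, k=5)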

Fast Sparse ConvNets

4 code implementations CVPR 2020 Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet v1, MobileNet v2 and EfficientNet architectures substantially outperform strong dense baselines on the efficiency-accuracy curve.

High Fidelity Speech Synthesis with Adversarial Networks

3 code implementations ICLR 2020 Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech.

Generative Adversarial Network Speech Synthesis +1

Hyperparameter Tuning and Implicit Regularization in Minibatch SGD

no code implementations 25 Sep 2019 Samuel L. Smith, Erich Elsen, Soham De

First, we argue that stochastic gradient descent exhibits two regimes with different behaviours: a noise-dominated regime, which typically arises for small or moderate batch sizes, and a curvature-dominated regime, which typically arises when the batch size is large.
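
A hedged rule-of-thumb reading of the practical consequence: while noise dominates, the optimal learning rate grows roughly linearly with batch size, and once curvature dominates it stays roughly flat. The helper and crossover point below are illustrative assumptions, not a prescription from the paper.

    # Hedged sketch (assumption): scale the learning rate linearly with batch size
    # in the noise-dominated regime, hold it flat once curvature dominates.
    def suggested_lr(base_lr, base_batch, batch, crossover_batch):
        return base_lr * min(batch, crossover_batch) / base_batch

    print(suggested_lr(0.1, 128, 512, crossover_batch=2048))   # 0.4
    print(suggested_lr(0.1, 128, 8192, crossover_batch=2048))  # 1.6 (capped)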

The Difficulty of Training Sparse Neural Networks

no code implementations ICML Workshop Deep_Phenomena 2019 Utku Evci, Fabian Pedregosa, Aidan Gomez, Erich Elsen

Additionally, our attempts to find a decreasing objective path from "bad" solutions to the "good" ones in the sparse subspace fail.
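
A minimal sketch of the kind of check involved: evaluating the loss along a straight line between a "bad" and a "good" solution that share a sparsity mask, and looking for a monotone decrease. The paper's search is more sophisticated; the helper below, including the loss_fn argument, is only an illustration.

    import numpy as np

    # Minimal sketch (assumption): losses along a linear path, restricted to the
    # sparse subspace defined by a shared mask, between two solutions.
    def losses_on_path(loss_fn, w_bad, w_good, mask, n_points=50):
        alphas = np.linspace(0.0, 1.0, n_points)
        return [loss_fn(((1 - a) * w_bad + a * w_good) * mask) for a in alphas]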

Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods

no code implementations 7 Jun 2019 Karel Lenc, Erich Elsen, Tom Schaul, Karen Simonyan

While using ES for differentiable parameters is computationally impractical (although possible), we show that a hybrid approach is practically feasible in the case where the model has both differentiable and non-differentiable parameters.
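
For the non-differentiable parameters, the ES side of such a hybrid boils down to a stochastic gradient estimate of the kind sketched below, here with antithetic sampling. The setup and hyperparameters are assumptions made for illustration, not the paper's code.

    import numpy as np

    # Minimal sketch (assumption): an antithetic evolution-strategies gradient
    # estimate, the estimator a hybrid method could apply only to the
    # non-differentiable parameters while backprop handles the rest.
    def es_gradient(loss_fn, theta, sigma=0.1, n_samples=64, seed=0):
        rng = np.random.default_rng(seed)
        grad = np.zeros_like(theta)
        for _ in range(n_samples):
            eps = rng.normal(size=theta.shape)
            grad += (loss_fn(theta + sigma * eps) - loss_fn(theta - sigma * eps)) * eps
        return grad / (2 * sigma * n_samples)

    # toy check on a differentiable loss: should approximate 2 * theta
    theta = np.array([1.0, -2.0, 0.5])
    print(es_gradient(lambda w: np.sum(w ** 2), theta, n_samples=2000))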

The State of Sparsity in Deep Neural Networks

6 code implementations 25 Feb 2019 Trevor Gale, Erich Elsen, Sara Hooker

We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet.

Model Compression Sparse Learning
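
Of the three techniques evaluated (magnitude pruning, variational dropout, and l0 regularization), magnitude pruning is the simplest to sketch. The one-shot keep-mask below is a hedged illustration with assumed shapes; the paper studies gradual pruning schedules rather than a single cut.

    import numpy as np

    # Minimal sketch (assumption): one-shot magnitude pruning to a target sparsity,
    # returning a boolean keep-mask over the weights.
    def magnitude_prune_mask(weights, sparsity):
        flat = np.abs(weights).ravel()
        k = int(sparsity * flat.size)            # number of weights to remove
        if k == 0:
            return np.ones(weights.shape, dtype=bool)
        threshold = np.partition(flat, k - 1)[k - 1]
        return np.abs(weights) > threshold

    W = np.random.default_rng(0).normal(size=(256, 256))
    mask = magnitude_prune_mask(W, sparsity=0.9)   # keep roughly 10% of the weights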

Onsets and Frames: Dual-Objective Piano Transcription

1 code implementation 30 Oct 2017 Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck

We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network which is trained to jointly predict onsets and frames.

Music Transcription

Exploring Sparsity in Recurrent Neural Networks

1 code implementation 17 Apr 2017 Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta

Benchmarks show that, using our technique, model size can be reduced by 90% and speed-ups of around 2x to 7x are achieved.

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

2 code implementations 15 Jul 2016 Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.

8k Caption Generation +3
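
A hedged toy walkthrough of the three phases on a linear least-squares problem, just to show the shape of the flow; the model, data, pruning fraction, and learning rates are illustrative assumptions, not taken from the paper.

    import numpy as np

    # Minimal sketch (assumption): dense training, magnitude pruning plus masked
    # retraining, then re-densifying and training again at a lower learning rate.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 32))
    true_w = rng.normal(size=32) * (rng.random(32) < 0.3)   # sparse ground truth
    y = X @ true_w

    def train(w, mask, lr, steps=200):
        for _ in range(steps):
            grad = X.T @ (X @ w - y) / len(X)
            w = (w - lr * grad) * mask          # masked SGD step
        return w

    w = np.zeros(32)
    dense_mask = np.ones(32)
    w = train(w, dense_mask, lr=0.1)            # D: dense training
    thresh = np.quantile(np.abs(w), 0.7)        # S: prune the 70% smallest weights
    sparse_mask = (np.abs(w) > thresh).astype(float)
    w = train(w * sparse_mask, sparse_mask, lr=0.1)
    w = train(w, dense_mask, lr=0.01)           # D: re-dense, lower learning rate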
