Search Results for author: Erich Elsen

Found 29 papers, 22 papers with code

The State of Sparse Training in Deep Reinforcement Learning

1 code implementation 17 Jun 2022 Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision.

reinforcement-learning Reinforcement Learning (RL)

Step-unrolled Denoising Autoencoders for Text Generation

1 code implementation ICLR 2022 Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, Aaron van den Oord

In this paper we propose a new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models.

Denoising Language Modelling +2
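
To make the non-autoregressive idea concrete, here is a minimal sampling-loop sketch in NumPy. It is an illustration under my own assumptions, not the paper's code: the function denoiser_logits is a hypothetical stand-in for the trained denoiser network, and every position is resampled in parallel at each step rather than left-to-right.

    import numpy as np

    # Minimal sketch (assumption, not the paper's code): unrolled denoising sampling.
    # Start from random tokens and repeatedly resample every position in parallel
    # from a learned denoiser; denoiser_logits is a hypothetical placeholder.
    rng = np.random.default_rng(0)
    vocab, length, steps = 100, 16, 10

    def denoiser_logits(tokens):
        # placeholder: a real model returns per-position logits over the vocabulary
        return rng.normal(size=(len(tokens), vocab))

    tokens = rng.integers(vocab, size=length)            # random starting sequence
    for _ in range(steps):
        logits = denoiser_logits(tokens)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        tokens = np.array([rng.choice(vocab, p=p) for p in probs])  # parallel update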

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

2 code implementations NA 2021 Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent SIfre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d'Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.

Abstract Algebra Anachronisms +133

Top-KAST: Top-K Always Sparse Training

2 code implementations NeurIPS 2020 Siddhant M. Jayakumar, Razvan Pascanu, Jack W. Rae, Simon Osindero, Erich Elsen

Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint.

Language Modelling
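
A minimal sketch of the core idea as I read it: keep a dense parameter tensor, use only the top-K weights by magnitude in the forward pass, and let gradients flow to a somewhat larger top-M subset so the active set can change during training. This is an assumption-laden illustration, not the paper's implementation.

    import numpy as np

    # Minimal sketch (assumption): Top-KAST-style masks. The forward pass sees only
    # the top-K magnitude weights; the backward pass updates a larger top-M subset.
    def topk_mask(weights, k):
        flat = np.abs(weights).ravel()
        thresh = np.partition(flat, flat.size - k)[flat.size - k]
        return np.abs(weights) >= thresh

    W = np.random.default_rng(0).normal(size=(64, 64))
    forward_mask = topk_mask(W, k=int(0.1 * W.size))    # e.g. 10% of weights forward
    backward_mask = topk_mask(W, k=int(0.2 * W.size))   # gradients reach a larger set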

Practical Real Time Recurrent Learning with a Sparse Approximation

no code implementations ICLR 2021 Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

For highly sparse networks, SnAp with $n=2$ remains tractable and can outperform backpropagation through time in terms of learning speed when updates are done online.
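
A rough NumPy sketch of where the approximation enters, assuming a vanilla RNN and the simplest (SnAp-1-like) "immediate influence" sparsity pattern on the influence matrix; the loss, sizes, and pattern are illustrative choices, not the authors' code.

    import numpy as np

    # Minimal sketch (assumption): RTRL for h_t = tanh(W h_{t-1} + U x_t), with a
    # fixed SnAp-style mask applied to the influence matrix dh_t/dvec(W) each step
    # so only a subset of its entries is ever stored.
    rng = np.random.default_rng(0)
    n, m, T = 8, 4, 20                      # hidden units, input size, timesteps
    W = rng.normal(scale=0.3, size=(n, n))
    U = rng.normal(scale=0.3, size=(n, m))

    J = np.zeros((n, n * n))                # influence matrix J[k, i*n + j] = dh_k/dW_ij
    mask = np.zeros((n, n * n), dtype=bool) # SnAp-1-like pattern: keep entries with k == i
    for i in range(n):
        mask[i, i * n:(i + 1) * n] = True

    h = np.zeros(n)
    grad_W = np.zeros((n, n))
    for t in range(T):
        x = rng.normal(size=m)
        h_new = np.tanh(W @ h + U @ x)
        D = np.diag(1.0 - h_new ** 2)       # tanh derivative at the pre-activation
        imm = np.zeros((n, n * n))          # immediate Jacobian: d(pre_i)/dW_ij = h_j
        for i in range(n):
            imm[i, i * n:(i + 1) * n] = h
        J = D @ (W @ J + imm)               # full RTRL recursion...
        J *= mask                           # ...followed by the sparse approximation
        h = h_new
        grad_W += (h @ J).reshape(n, n)     # online gradient of a toy loss 0.5*||h||^2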

On the Generalization Benefit of Noise in Stochastic Gradient Descent

no code implementations ICML 2020 Samuel L. Smith, Erich Elsen, Soham De

It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks.

Sparse GPU Kernels for Deep Learning

1 code implementation 18 Jun 2020 Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

In this work, we study sparse matrices from deep learning applications and identify favorable properties that can be exploited to accelerate computation.
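
One of the central primitives the paper accelerates is sparse matrix-dense matrix multiplication (SpMM). The sketch below is only a CPU-side illustration using SciPy's CSR format under assumed shapes and density; the paper's contribution is hand-written GPU kernels for this operation.

    import numpy as np
    from scipy.sparse import random as sparse_random

    # Minimal sketch (assumption): SpMM, the sparse-weight x dense-activation product
    # at the heart of sparse inference. SciPy's CSR format stands in for illustration.
    rng = np.random.default_rng(0)
    A = sparse_random(1024, 1024, density=0.1, format="csr", random_state=0)  # sparse weights
    X = rng.normal(size=(1024, 256))                                          # dense activations
    Y = A @ X            # only the ~10% stored non-zeros contribute multiply-adds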

AlgebraNets

1 code implementation 12 Jun 2020 Jordan Hoffmann, Simon Schmitt, Simon Osindero, Karen Simonyan, Erich Elsen

Neural networks have historically been built layerwise from the set of functions $\{ f: \mathbb{R}^n \to \mathbb{R}^m \}$, i.e. with activations and weights/parameters represented by real numbers, $\mathbb{R}$.

Computational Efficiency Image Classification +1
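
As a concrete illustration of swapping the underlying algebra, here is a hedged sketch of a linear layer whose weights and activations are complex numbers rather than reals. The choice of algebra and nonlinearity is my own for illustration, not taken from the paper.

    import numpy as np

    # Minimal sketch (assumption): a linear layer over C instead of R. The same
    # pattern applies to other algebras (e.g. quaternions or small matrix rings).
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 32)) + 1j * rng.normal(size=(16, 32))   # complex weights
    x = rng.normal(size=32) + 1j * rng.normal(size=32)               # complex activations
    y = W @ x                                                        # complex multiply-adds
    y = np.tanh(y.real) + 1j * np.tanh(y.imag)                       # split nonlinearity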

A Practical Sparse Approximation for Real Time Recurrent Learning

no code implementations 12 Jun 2020 Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

Current methods for training recurrent neural networks are based on backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights 'online' (after every timestep).

End-to-End Adversarial Text-to-Speech

2 code implementations ICLR 2021 Jeff Donahue, Sander Dieleman, Mikołaj Bińkowski, Erich Elsen, Karen Simonyan

Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest.

Adversarial Text Dynamic Time Warping +2

Rigging the Lottery: Making All Tickets Winners

10 code implementations ICML 2020 Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model.

Image Classification Language Modelling +1
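
A minimal sketch of a single RigL-style drop/grow update on a flattened weight tensor, assuming magnitude-based dropping and gradient-magnitude-based growing; the sizes and the stand-in gradient are illustrative, and this is not the released implementation.

    import numpy as np

    # Minimal sketch (assumption): one RigL-style connectivity update. Drop the k
    # smallest-magnitude active weights, regrow k inactive connections where the
    # dense gradient magnitude is largest.
    def rigl_update(weights, mask, grad, k):
        active = np.flatnonzero(mask)
        inactive = np.flatnonzero(~mask)
        drop = active[np.argsort(np.abs(weights[active]))[:k]]
        grow = inactive[np.argsort(-np.abs(grad[inactive]))[:k]]
        mask[drop] = False
        mask[grow] = True
        weights[grow] = 0.0          # regrown connections start from zero
        return weights, mask

    rng = np.random.default_rng(0)
    w = rng.normal(size=1000)
    m = rng.random(1000) < 0.1       # ~90% sparse to begin with
    g = rng.normal(size=1000)        # stand-in for a dense gradient estimate
    w, m = rigl_update(w, m, g, k=5)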

Fast Sparse ConvNets

4 code implementations CVPR 2020 Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet v1, MobileNet v2 and EfficientNet architectures substantially outperform strong dense baselines on the efficiency-accuracy curve.

High Fidelity Speech Synthesis with Adversarial Networks

3 code implementations ICLR 2020 Mikołaj Bińkowski, Jeff Donahue, Sander Dieleman, Aidan Clark, Erich Elsen, Norman Casagrande, Luis C. Cobo, Karen Simonyan

However, their application in the audio domain has received limited attention, and autoregressive models, such as WaveNet, remain the state of the art in generative modelling of audio signals such as human speech.

Generative Adversarial Network Speech Synthesis +1

Hyperparameter Tuning and Implicit Regularization in Minibatch SGD

no code implementations 25 Sep 2019 Samuel L. Smith, Erich Elsen, Soham De

First, we argue that stochastic gradient descent exhibits two regimes with different behaviours: a noise-dominated regime, which typically arises for small or moderate batch sizes, and a curvature-dominated regime, which typically arises when the batch size is large.
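
A hedged rule-of-thumb reading of the practical consequence: while noise dominates, the optimal learning rate grows roughly linearly with batch size, and once curvature dominates it stays roughly flat. The helper and crossover point below are illustrative assumptions, not a prescription from the paper.

    # Hedged sketch (assumption): scale the learning rate linearly with batch size
    # in the noise-dominated regime, hold it flat once curvature dominates.
    def suggested_lr(base_lr, base_batch, batch, crossover_batch):
        return base_lr * min(batch, crossover_batch) / base_batch

    print(suggested_lr(0.1, 128, 512, crossover_batch=2048))   # 0.4
    print(suggested_lr(0.1, 128, 8192, crossover_batch=2048))  # 1.6 (capped)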

The Difficulty of Training Sparse Neural Networks

no code implementations ICML Workshop Deep_Phenomena 2019 Utku Evci, Fabian Pedregosa, Aidan Gomez, Erich Elsen

Additionally, our attempts to find a decreasing objective path from "bad" solutions to the "good" ones in the sparse subspace fail.
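
A minimal sketch of the kind of check involved: evaluating the loss along a straight line between a "bad" and a "good" solution that share a sparsity mask, and looking for a monotone decrease. The paper's search is more sophisticated; the helper below, including the loss_fn argument, is only an illustration.

    import numpy as np

    # Minimal sketch (assumption): losses along a linear path, restricted to the
    # sparse subspace defined by a shared mask, between two solutions.
    def losses_on_path(loss_fn, w_bad, w_good, mask, n_points=50):
        alphas = np.linspace(0.0, 1.0, n_points)
        return [loss_fn(((1 - a) * w_bad + a * w_good) * mask) for a in alphas]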

Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods

no code implementations 7 Jun 2019 Karel Lenc, Erich Elsen, Tom Schaul, Karen Simonyan

While using ES for differentiable parameters is computationally impractical (although possible), we show that a hybrid approach is practically feasible in the case where the model has both differentiable and non-differentiable parameters.
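
For the non-differentiable parameters, the ES side of such a hybrid boils down to a stochastic gradient estimate of the kind sketched below, here with antithetic sampling. The setup and hyperparameters are assumptions made for illustration, not the paper's code.

    import numpy as np

    # Minimal sketch (assumption): an antithetic evolution-strategies gradient
    # estimate, the estimator a hybrid method could apply only to the
    # non-differentiable parameters while backprop handles the rest.
    def es_gradient(loss_fn, theta, sigma=0.1, n_samples=64, seed=0):
        rng = np.random.default_rng(seed)
        grad = np.zeros_like(theta)
        for _ in range(n_samples):
            eps = rng.normal(size=theta.shape)
            grad += (loss_fn(theta + sigma * eps) - loss_fn(theta - sigma * eps)) * eps
        return grad / (2 * sigma * n_samples)

    # toy check on a differentiable loss: should approximate 2 * theta
    theta = np.array([1.0, -2.0, 0.5])
    print(es_gradient(lambda w: np.sum(w ** 2), theta, n_samples=2000))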

The State of Sparsity in Deep Neural Networks

6 code implementations 25 Feb 2019 Trevor Gale, Erich Elsen, Sara Hooker

We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet.

Model Compression Sparse Learning
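
Of the three techniques evaluated (magnitude pruning, variational dropout, and l0 regularization), magnitude pruning is the simplest to sketch. The one-shot keep-mask below is a hedged illustration with assumed shapes; the paper studies gradual pruning schedules rather than a single cut.

    import numpy as np

    # Minimal sketch (assumption): one-shot magnitude pruning to a target sparsity,
    # returning a boolean keep-mask over the weights.
    def magnitude_prune_mask(weights, sparsity):
        flat = np.abs(weights).ravel()
        k = int(sparsity * flat.size)            # number of weights to remove
        if k == 0:
            return np.ones(weights.shape, dtype=bool)
        threshold = np.partition(flat, k - 1)[k - 1]
        return np.abs(weights) > threshold

    W = np.random.default_rng(0).normal(size=(256, 256))
    mask = magnitude_prune_mask(W, sparsity=0.9)   # keep roughly 10% of the weights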

Onsets and Frames: Dual-Objective Piano Transcription

1 code implementation 30 Oct 2017 Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, Douglas Eck

We advance the state of the art in polyphonic piano music transcription by using a deep convolutional and recurrent neural network which is trained to jointly predict onsets and frames.

Music Transcription

Exploring Sparsity in Recurrent Neural Networks

1 code implementation 17 Apr 2017 Sharan Narang, Erich Elsen, Gregory Diamos, Shubho Sengupta

Benchmarks show that, using our technique, model size can be reduced by 90% and speed-ups of around 2x to 7x are achieved.

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

2 code implementations 15 Jul 2016 Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.

8k Caption Generation +3
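
A hedged toy walkthrough of the three phases on a linear least-squares problem, just to show the shape of the flow; the model, data, pruning fraction, and learning rates are illustrative assumptions, not taken from the paper.

    import numpy as np

    # Minimal sketch (assumption): dense training, magnitude pruning plus masked
    # retraining, then re-densifying and training again at a lower learning rate.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 32))
    true_w = rng.normal(size=32) * (rng.random(32) < 0.3)   # sparse ground truth
    y = X @ true_w

    def train(w, mask, lr, steps=200):
        for _ in range(steps):
            grad = X.T @ (X @ w - y) / len(X)
            w = (w - lr * grad) * mask          # masked SGD step
        return w

    w = np.zeros(32)
    dense_mask = np.ones(32)
    w = train(w, dense_mask, lr=0.1)            # D: dense training
    thresh = np.quantile(np.abs(w), 0.7)        # S: prune the 70% smallest weights
    sparse_mask = (np.abs(w) > thresh).astype(float)
    w = train(w * sparse_mask, sparse_mask, lr=0.1)
    w = train(w, dense_mask, lr=0.01)           # D: re-dense, lower learning rate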
