Search Results for author: Utku Evci

Found 22 papers, 12 papers with code

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

1 code implementation • 7 Feb 2024 • Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

In this work, we study the effectiveness of existing sparse training recipes at high-sparsity regions and argue that these methods fail to sustain the model quality on par with low-sparsity regions.
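
The N:M pattern referenced here keeps at most N non-zero weights in every contiguous block of M (e.g., 2:4). As a rough illustration only, and not the paper's training recipe, a magnitude-based N:M mask could be derived as in the following sketch (function and variable names are ours):

```python
import torch

def nm_sparsity_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude weights in every contiguous group of m
    along the last dimension; return the binary mask."""
    assert weight.shape[-1] % m == 0, "last dimension must be divisible by m"
    groups = weight.abs().reshape(-1, m)      # one row per group of m weights
    keep = groups.topk(n, dim=-1).indices     # indices of the n survivors per group
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, keep, 1.0)
    return mask.reshape(weight.shape)

# example: prune a 4x8 weight matrix to 2:4 sparsity
w = torch.randn(4, 8)
w_sparse = w * nm_sparsity_mask(w, n=2, m=4)
```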

Scaling Laws for Sparsely-Connected Foundation Models

no code implementations • 15 Sep 2023 • Elias Frantar, Carlos Riquelme, Neil Houlsby, Dan Alistarh, Utku Evci

We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets (i.e., "foundation models"), in both vision and language domains.

Computational Efficiency

Dynamic Sparse Training with Structured Sparsity

1 code implementation • 3 May 2023 • Mike Lasby, Anna Golubeva, Utku Evci, Mihai Nica, Yani Ioannou

Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while enabling sparse training and inference.
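
The structured variant studied here constrains the sparse connectivity to a hardware-friendly pattern such as a constant fan-in per output neuron. A hedged sketch of a per-neuron top-k mask is given below; the paper's full drop-and-grow schedule is not reproduced, and the names are illustrative:

```python
import torch

def constant_fan_in_mask(weight: torch.Tensor, fan_in: int) -> torch.Tensor:
    """Keep the `fan_in` largest-magnitude incoming weights of every output
    neuron (row), so all rows end up with identical sparsity."""
    idx = weight.abs().topk(fan_in, dim=1).indices
    mask = torch.zeros_like(weight)
    mask.scatter_(1, idx, 1.0)
    return mask

w = torch.randn(64, 256)
mask = constant_fan_in_mask(w, fan_in=32)   # every neuron keeps 32 of its 256 inputs
```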

The Dormant Neuron Phenomenon in Deep Reinforcement Learning

1 code implementation • 24 Feb 2023 • Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci

In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity.

Reinforcement Learning (RL)
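
The paper's precise dormancy criterion is based on activation statistics; as a simplified sketch under that assumption (the threshold and names are illustrative), one can count neurons whose mean absolute activation, normalised by the layer average, falls below a small threshold:

```python
import torch

def dormant_fraction(activations: torch.Tensor, tau: float = 0.1) -> float:
    """activations: (batch, num_neurons) post-ReLU outputs of one layer.
    A neuron is flagged dormant if its mean |activation|, normalised by the
    layer-wide average, is at or below `tau`."""
    per_neuron = activations.abs().mean(dim=0)            # (num_neurons,)
    score = per_neuron / (per_neuron.mean() + 1e-9)        # normalise by layer mean
    return (score <= tau).float().mean().item()

acts = torch.relu(torch.randn(128, 512))
print(f"dormant fraction: {dormant_fraction(acts):.3f}")
```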

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask

no code implementations • 15 Sep 2022 • Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna

In this work, we focus on N:M sparsity and extensively study and evaluate various training recipes for N:M sparsity in terms of the trade-off between model accuracy and compute cost (FLOPs).

The State of Sparse Training in Deep Reinforcement Learning

1 code implementation • 17 Jun 2022 • Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision.

Reinforcement Learning (RL)

GradMax: Growing Neural Networks using Gradient Information

1 code implementation • ICLR 2022 • Utku Evci, Bart van Merriënboer, Thomas Unterthiner, Max Vladymyrov, Fabian Pedregosa

The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified.
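
For context, a new unit can be appended without changing the network's function by zero-initialising its outgoing weights; GradMax then chooses the incoming weights from gradient information (via an SVD), which is not reproduced here. A minimal, hedged sketch of the function-preserving growth step, with a placeholder initialisation in place of the gradient-based one:

```python
import torch

def grow_neuron(w_in: torch.Tensor, w_out: torch.Tensor):
    """Append one hidden unit to a two-layer block h = W_out f(W_in x).
    The new unit's outgoing weights are zero, so the function is unchanged.
    (GradMax would pick `new_in` from gradient information; a small random
    vector is used here purely as a placeholder.)"""
    new_in = torch.randn(1, w_in.shape[1]) * 0.01      # placeholder initialisation
    new_out = torch.zeros(w_out.shape[0], 1)           # zero => function preserved
    return torch.cat([w_in, new_in], dim=0), torch.cat([w_out, new_out], dim=1)

w_in, w_out = torch.randn(16, 8), torch.randn(4, 16)
w_in2, w_out2 = grow_neuron(w_in, w_out)               # block now has 17 hidden units
```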

Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning

1 code implementation • 10 Jan 2022 • Utku Evci, Vincent Dumoulin, Hugo Larochelle, Michael C. Mozer

We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target-domain.

Transfer Learning
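
The core idea is to build the probe's input from every layer of the frozen backbone rather than only the last one. The sketch below shows just the feature-concatenation step under the assumption that `features_per_layer` already holds pooled activations; the paper's group-lasso-based feature selection is omitted:

```python
import torch

def head2toe_features(features_per_layer):
    """Concatenate flattened activations from all layers of a frozen
    backbone into one feature vector per example."""
    pooled = [f.flatten(start_dim=1) for f in features_per_layer]
    return torch.cat(pooled, dim=1)

# toy example: three "layers" of pre-pooled activations for a batch of 32
feats = [torch.randn(32, d) for d in (64, 128, 256)]
x = head2toe_features(feats)                      # (32, 448)
head = torch.nn.Linear(x.shape[1], 10)            # linear head for the target task
logits = head(x)
```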

Head2Toe: Utilizing Intermediate Representations for Better OOD Generalization

no code implementations • 29 Sep 2021 • Utku Evci, Vincent Dumoulin, Hugo Larochelle, Michael Curtis Mozer

We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target-domain.

Transfer Learning

Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark

1 code implementation • 6 Apr 2021 • Vincent Dumoulin, Neil Houlsby, Utku Evci, Xiaohua Zhai, Ross Goroshin, Sylvain Gelly, Hugo Larochelle

To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD), and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB).

Few-Shot Learning • General Classification • +1

Practical Real Time Recurrent Learning with a Sparse Approximation

no code implementations • ICLR 2021 • Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

For highly sparse networks, SnAp with $n=2$ remains tractable and can outperform backpropagation through time in terms of learning speed when updates are done online.

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

1 code implementation • 7 Oct 2020 • Utku Evci, Yani A. Ioannou, Cem Keskin, Yann Dauphin

Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training.

A Practical Sparse Approximation for Real Time Recurrent Learning

no code implementations • 12 Jun 2020 • Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

Current methods for training recurrent neural networks are based on backpropagation through time, which requires storing a complete history of network states, and prohibits updating the weights 'online' (after every timestep).
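
SnAp, the sparse approximation studied in this work, keeps only a sparse subset of the Real Time Recurrent Learning influence matrix dh_t/dθ so that weights can be updated online. The sketch below is a heavily simplified illustration of the masked influence recursion, with stand-in Jacobians rather than a real RNN:

```python
import torch

def snap_influence_update(J_prev, D_t, I_t, mask):
    """One step of the RTRL influence recursion J_t = D_t @ J_prev + I_t,
    with SnAp's approximation of zeroing entries outside a fixed sparsity
    pattern `mask` (for SnAp-1, the pattern of the immediate Jacobian)."""
    return (D_t @ J_prev + I_t) * mask

# shapes: hidden size H, number of parameters P (illustrative only)
H, P = 8, 64
J = torch.zeros(H, P)                      # influence dh_t/dtheta
mask = (torch.rand(H, P) < 0.1).float()    # placeholder sparsity pattern
for _ in range(5):                         # toy unrolled recurrence
    D = torch.randn(H, H) * 0.1            # dh_t/dh_{t-1} (stand-in)
    I = torch.randn(H, P) * mask           # immediate Jacobian (stand-in)
    J = snap_influence_update(J, D, I, mask)
```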

Rigging the Lottery: Making All Tickets Winners

10 code implementations • ICML 2020 • Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model.

Image Classification • Language Modelling • +1
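
RigL updates the sparse connectivity periodically during training: it drops the lowest-magnitude active weights and regrows the same number of connections where the dense gradient magnitude is largest. A minimal sketch of one such mask update is shown below; it is illustrative, not the authors' implementation:

```python
import torch

def rigl_update(weight, mask, grad, drop_frac=0.3):
    """One simplified RigL-style connectivity update: drop the smallest-
    magnitude active weights and regrow the same number of inactive weights
    where the dense gradient magnitude is largest. (Weights regrown this way
    would be zero-initialised before training continues.)"""
    active = mask.bool()
    k = int(drop_frac * active.sum().item())
    if k == 0:
        return mask
    # drop: smallest |w| among active connections
    w_mag = weight.abs().masked_fill(~active, float("inf"))
    drop_idx = torch.topk(w_mag.flatten(), k, largest=False).indices
    # grow: largest |grad| among inactive connections
    g_mag = grad.abs().masked_fill(active, float("-inf"))
    grow_idx = torch.topk(g_mag.flatten(), k, largest=True).indices
    new_mask = mask.clone().flatten()
    new_mask[drop_idx] = 0.0
    new_mask[grow_idx] = 1.0
    return new_mask.reshape(mask.shape)
```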

Natural Language Understanding with the Quora Question Pairs Dataset

no code implementations • 1 Jul 2019 • Lakshay Sharma, Laura Graesser, Nikita Nangia, Utku Evci

This paper explores the task Natural Language Understanding (NLU) by looking at duplicate question detection in the Quora dataset.

BIG-bench Machine Learning • Natural Language Understanding

The Difficulty of Training Sparse Neural Networks

no code implementations • ICML 2019 Workshop on Deep Phenomena • Utku Evci, Fabian Pedregosa, Aidan Gomez, Erich Elsen

Additionally, our attempts to find a decreasing objective path from "bad" solutions to the "good" ones in the sparse subspace fail.
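
The quoted experiment amounts to evaluating the training objective along straight-line paths between two solutions while keeping the sparsity mask fixed. A minimal sketch of that evaluation, assuming a user-supplied `loss_fn` that maps a flat parameter vector to a scalar loss:

```python
import torch

def loss_along_path(loss_fn, theta_a, theta_b, mask, steps=11):
    """Evaluate `loss_fn` on straight-line interpolations between two
    solutions that share the same sparsity mask, to check whether a
    monotonically decreasing path exists between them (sketch only)."""
    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        theta = ((1 - alpha) * theta_a + alpha * theta_b) * mask
        losses.append(float(loss_fn(theta)))
    return losses
```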

Mean Replacement Pruning

no code implementations • ICLR 2019 • Utku Evci, Nicolas Le Roux, Pablo Castro, Leon Bottou

Finally, we show that the units selected by the best-performing scoring functions are somewhat consistent over the course of training, implying the dead parts of the network appear during the early stages of training.
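
Mean replacement prunes a unit by substituting its mean activation for its output, which can be folded into the bias of the following layer before the unit is removed. A hedged sketch of that folding step (shapes and names are illustrative):

```python
import torch

def mean_replacement_prune(w_next, b_next, mean_acts, prune_idx):
    """Prune hidden units by replacing their activations with their mean:
    fold `W_next[:, i] * mean_act_i` into the next layer's bias, then zero
    the corresponding columns (illustrative sketch of the idea)."""
    for i in prune_idx:
        b_next = b_next + w_next[:, i] * mean_acts[i]
    w_next = w_next.clone()
    w_next[:, prune_idx] = 0.0
    return w_next, b_next

w2, b2 = torch.randn(10, 64), torch.zeros(10)
mean_acts = torch.rand(64)                      # mean activation of each hidden unit
w2p, b2p = mean_replacement_prune(w2, b2, mean_acts, prune_idx=[3, 17])
```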

Detecting Dead Weights and Units in Neural Networks

no code implementations15 Jun 2018 Utku Evci

We propose an efficient way for detecting dead units and use it to select which units to prune.

Quantization

Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

no code implementations • ICLR 2018 • Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou

In particular, we present a case that links the two observations: small and large batch gradient descent appear to converge to different basins of attraction but we show that they are in fact connected through their flat region and so belong to the same basin.
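
Probing the Hessian spectrum of a network of realistic size is usually done through Hessian-vector products rather than by forming the Hessian explicitly; the sketch below estimates the top eigenvalue with power iteration over double backprop, as a generic illustration rather than the authors' code:

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Estimate the largest eigenvalue of the loss Hessian with power
    iteration over Hessian-vector products (double backprop).
    `params` must be the leaf tensors (requires_grad=True) that `loss`
    was computed from; the Hessian is never materialised."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((vi ** 2).sum() for vi in v))
    v = [vi / norm for vi in v]
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate <grads, v> w.r.t. params
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        eig = sum((hvi * vi).sum() for hvi, vi in zip(hv, v)).item()
        norm = torch.sqrt(sum((hvi ** 2).sum() for hvi in hv)) + 1e-12
        v = [hvi / norm for hvi in hv]
    return eig
```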
