Search Results for author: Soham De

Found 33 papers, 7 papers with code

Analyzing the Effect of Neural Network Architecture on Training Performance

no code implementations ICML 2020 Karthik Abinav Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein

Through novel theoretical and experimental results, we show how the neural net architecture affects gradient confusion, and thus the efficiency of training.
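
As a quick reference for the term above (my paraphrase of the paper's definition; the exact constants should be checked against the paper): a set of training losses $f_1, \dots, f_n$ has gradient confusion bound $\eta \ge 0$ at parameters $w$ if

\[
\langle \nabla f_i(w), \nabla f_j(w) \rangle \ge -\eta \quad \text{for all } i \neq j .
\]

When $\eta$ is small, per-example gradients rarely conflict, and SGD makes consistent progress across minibatches.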

Gemma: Open Models Based on Gemini Research and Technology

no code implementations 13 Mar 2024 Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Christian Muraru, Grigory Rozhdestvenskiy, Henryk Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, Jeff Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, Justin Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Rahma Chaabouni, Ramona Comanescu, Reena Jana, Rohan Anil, Ross McIlroy, Ruibo Liu, Ryan Mullins, Samuel L Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, Tom Hennigan, Vlad Feinberg, Wojciech Stokowiec, Yu-Hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, Oriol Vinyals, Jeff Dean, Koray Kavukcuoglu, Demis Hassabis, Zoubin Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy

This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models.

ConvNets Match Vision Transformers at Scale

no code implementations 25 Oct 2023 Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De

Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to web-scale datasets.

Unlocking Accuracy and Fairness in Differentially Private Image Classification

2 code implementations 21 Aug 2023 Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle

The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy preserving machine learning in industry.

Classification Fairness +2

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

no code implementations 21 Jul 2023 Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith

Deep neural networks based on linear complex-valued RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches to sequence modeling.

Computational Efficiency
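
A rough sketch of the architecture class under study, under my own assumptions (parameter names and shapes are illustrative, not the paper's code): a diagonal complex-valued linear recurrence followed by a position-wise MLP.

```python
import numpy as np

def linear_recurrence_block(x, lam, B, W1, W2):
    """Diagonal complex-valued linear RNN followed by a position-wise MLP.

    x:   (T, d_in) real input sequence
    lam: (d_h,) complex eigenvalues, |lam| < 1 for stability
    B:   (d_h, d_in) complex input projection
    W1:  (d_mlp, 2 * d_h) and W2: (d_out, d_mlp) real MLP weights
    """
    h = np.zeros(lam.shape[0], dtype=np.complex128)
    ys = []
    for t in range(x.shape[0]):
        h = lam * h + B @ x[t]                   # linear state update, no non-linearity
        u = np.concatenate([h.real, h.imag])     # real view of the complex state
        ys.append(W2 @ np.maximum(W1 @ u, 0.0))  # position-wise MLP (ReLU)
    return np.stack(ys)
```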

Differentially Private Diffusion Models Generate Useful Synthetic Images

no code implementations 27 Feb 2023 Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle

By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data.

Image Generation Privacy Preserving

Unlocking High-Accuracy Differentially Private Image Classification through Scale

2 code implementations 28 Apr 2022 Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle

Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points.

Classification Image Classification with Differential Privacy +1
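
This line of work builds on DP-SGD; below is a minimal sketch of one step, assuming standard names for the clipping norm and noise multiplier (a sketch of the generic recipe, not this paper's exact training setup).

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One DP-SGD step: clip each per-example gradient, average,
    then add Gaussian noise calibrated to the clipping norm."""
    batch_size = len(per_example_grads)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / batch_size,
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```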

Regularising for invariance to data augmentation improves supervised learning

no code implementations 7 Mar 2022 Aleksander Botev, Matthias Bauer, Soham De

Data augmentation is used in machine learning to make the classifier invariant to label-preserving transformations.

Data Augmentation
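
A generic illustration of this idea (not the paper's exact objective): penalize disagreement between the model's outputs on two independently augmented views of the same input.

```python
import numpy as np

def invariance_penalty(model, x, augment):
    """Illustrative invariance regularizer: outputs on two independently
    augmented views of the same input should agree."""
    p1 = model(augment(x))
    p2 = model(augment(x))
    return np.mean((p1 - p2) ** 2)
```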

A study on the plasticity of neural networks

no code implementations 31 May 2021 Tudor Berariu, Wojciech Czarnecki, Soham De, Jorg Bornschein, Samuel Smith, Razvan Pascanu, Claudia Clopath

One aim shared by multiple settings, such as continual learning or transfer learning, is to leverage previously acquired knowledge to converge faster on the current task.

Continual Learning Transfer Learning

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

no code implementations 27 May 2021 Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held-out data when training deep ResNets.

Data Augmentation Image Classification
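
A tiny sketch of the batch construction being studied (names are mine): each unique image contributes k independently augmented copies.

```python
def build_batch(images, augment, k):
    """Draw k augmentation samples per unique image, so n unique
    images yield an effective batch of n * k training examples."""
    batch = []
    for img in images:
        batch.extend(augment(img) for _ in range(k))
    return batch
```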

High-Performance Large-Scale Image Recognition Without Normalization

19 code implementations 11 Feb 2021 Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples.

Image Classification
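
One ingredient this paper pairs with its normalizer-free networks is adaptive gradient clipping (AGC). Below is a minimal per-parameter sketch from my reading of the paper (the paper applies the rule unit-wise, which is omitted here for brevity).

```python
import numpy as np

def adaptive_grad_clip(param, grad, clip=0.01, eps=1e-3):
    """Clip the gradient when its norm grows too large relative to the
    norm of the parameter it updates, rescaling onto the boundary."""
    p_norm = max(np.linalg.norm(param), eps)
    g_norm = np.linalg.norm(grad)
    if g_norm / p_norm > clip:
        grad = grad * (clip * p_norm / g_norm)
    return grad
```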

On the Origin of Implicit Regularization in Stochastic Gradient Descent

no code implementations ICLR 2021 Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De

To interpret this phenomenon, we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss.
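
For concreteness, the modified loss as I recall it from the paper ($\epsilon$ is the learning rate, $m$ the number of minibatches per epoch, $\hat{C}_k$ the loss on minibatch $k$; treat the constants as an assumption to verify against the paper):

\[
\widetilde{C}_{\mathrm{SGD}}(\omega) = C(\omega) + \frac{\epsilon}{4m} \sum_{k=1}^{m} \bigl\lVert \nabla \hat{C}_k(\omega) \bigr\rVert^2 .
\]

The extra term penalizes large minibatch gradient norms, which is the claimed source of implicit regularization.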

Characterizing signal propagation to close the performance gap in unnormalized ResNets

4 code implementations ICLR 2021 Andrew Brock, Soham De, Samuel L. Smith

Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs.
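
The remedy proposed in this paper centres on Scaled Weight Standardization; a sketch from my reading of the paper (the fixed nonlinearity-dependent gain constant is folded into `gain` here):

```python
import numpy as np

def scaled_weight_standardization(W, gain, eps=1e-5):
    """Standardize each row of W (one output unit's fan-in weights) to
    zero mean and variance 1/N, then apply a gain."""
    N = W.shape[1]  # fan-in
    mean = W.mean(axis=1, keepdims=True)
    var = W.var(axis=1, keepdims=True)
    return gain * (W - mean) / np.sqrt(N * var + eps)
```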

On the Generalization Benefit of Noise in Stochastic Gradient Descent

no code implementations ICML 2020 Samuel L. Smith, Erich Elsen, Soham De

It has long been argued that minibatch stochastic gradient descent can generalize better than large batch gradient descent in deep neural networks.

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

no code implementations NeurIPS 2020 Soham De, Samuel L. Smith

Batch normalization dramatically increases the largest trainable depth of residual networks, and this benefit has been crucial to the empirical success of deep residual networks on a wide range of benchmarks.
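
The mechanism, as I recall the paper's analysis (a sketch, not verbatim): on the skip path the activation variance grows linearly with depth, while each normalized residual branch contributes unit variance, so at initialization the $l$-th residual block is effectively downscaled:

\[
\operatorname{Var}(x_{l+1}) \approx \operatorname{Var}(x_l) + 1 \quad \Longrightarrow \quad \frac{\text{residual branch scale}}{\text{skip path scale}} \sim \frac{1}{\sqrt{l}} ,
\]

biasing deep normalized ResNets towards the identity function and keeping them trainable.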

Hyperparameter Tuning and Implicit Regularization in Minibatch SGD

no code implementations 25 Sep 2019 Samuel L Smith, Erich Elsen, Soham De

First, we argue that stochastic gradient descent exhibits two regimes with different behaviours: a noise-dominated regime, which typically arises for small or moderate batch sizes, and a curvature-dominated regime, which typically arises when the batch size is large.
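
A one-line summary of the practical consequence, as I recall it (treat the exact scaling as an assumption to check against the paper): the optimal learning rate $\epsilon^{\ast}$ grows with batch size $B$ in the noise-dominated regime and saturates in the curvature-dominated one,

\[
\epsilon^{\ast}(B) \propto B \;\; \text{(noise dominated)}, \qquad \epsilon^{\ast}(B) \approx \text{const} \;\; \text{(curvature dominated)} .
\]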

The Effect of Neural Net Architecture on Gradient Confusion & Training Performance

no code implementations 25 Sep 2019 Karthik A. Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein

Through novel theoretical and experimental results, we show how the neural net architecture affects gradient confusion, and thus the efficiency of training.

Batch Normalization has Multiple Benefits: An Empirical Study on Residual Networks

no code implementations 25 Sep 2019 Soham De, Samuel L Smith

The proposed initialization scheme outperforms batch normalization when the batch size is very small, and is competitive with batch normalization for batch sizes that are not too large.

Adversarial Robustness through Local Linearization

no code implementations NeurIPS 2019 Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli

Using this regularizer, we exceed the current state of the art and achieve 47% adversarial accuracy on ImageNet with l-infinity adversarial perturbations of radius 4/255 under an untargeted, strong, white-box attack.

Adversarial Defense Adversarial Robustness
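
The regularizer targets a local linearity measure; as I recall the paper's definition (hedged, up to notation), for loss $\ell$ at input $x$ and perturbation radius $\epsilon$,

\[
\gamma(\epsilon, x) = \max_{\lVert \delta \rVert_\infty \le \epsilon} \bigl| \ell(x + \delta) - \ell(x) - \delta^{\top} \nabla_x \ell(x) \bigr| ,
\]

and training penalizes $\gamma$ so the loss surface stays near-linear inside the perturbation ball, at a fraction of the cost of multi-step adversarial training.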

The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent

no code implementations 15 Apr 2019 Karthik A. Sankararaman, Soham De, Zheng Xu, W. Ronny Huang, Tom Goldstein

Our results show that, for popular initialization techniques, increasing the width of neural networks leads to lower gradient confusion, and thus faster model training.

Training Quantized Nets: A Deeper Understanding

no code implementations NeurIPS 2017 Hao Li, Soham De, Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein

Currently, deep neural networks are deployed on low-power portable devices by first training a full-precision model using powerful hardware, and then deriving a corresponding low-precision model for efficient inference on such systems.
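
The training schemes analyzed in this paper keep a full-precision copy of the weights during training and quantize on the fly (BinaryConnect-style); a minimal sketch under that assumption:

```python
import numpy as np

def quantized_sgd_step(w_full, grad_fn, lr):
    """Quantize for the forward/backward pass, but accumulate the
    update in the full-precision copy of the weights."""
    w_q = np.sign(w_full)   # e.g. 1-bit quantization
    g = grad_fn(w_q)        # gradient evaluated at the quantized weights
    return w_full - lr * g
```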

Son of Zorn's Lemma: Targeted Style Transfer Using Instance-aware Semantic Segmentation

no code implementations 9 Jan 2017 Carlos Castillo, Soham De, Xintong Han, Bharat Singh, Abhay Kumar Yadav, Tom Goldstein

This work considers targeted style transfer, in which the style of a template image is used to alter only part of a target image.

LEMMA Object +2

An Empirical Study of ADMM for Nonconvex Problems

no code implementations 10 Dec 2016 Zheng Xu, Soham De, Mario Figueiredo, Christoph Studer, Tom Goldstein

The alternating direction method of multipliers (ADMM) is a common optimization tool for solving constrained and non-differentiable problems.

Image Denoising Regression +1
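
For reference, the standard scaled-form ADMM iterations for $\min_{x,z} f(x) + g(z)$ subject to $Ax + Bz = c$ (textbook form, not anything specific to this paper's experiments):

\[
\begin{aligned}
x^{k+1} &= \arg\min_x \; f(x) + \tfrac{\rho}{2} \lVert Ax + Bz^{k} - c + u^{k} \rVert^2 , \\
z^{k+1} &= \arg\min_z \; g(z) + \tfrac{\rho}{2} \lVert Ax^{k+1} + Bz - c + u^{k} \rVert^2 , \\
u^{k+1} &= u^{k} + Ax^{k+1} + Bz^{k+1} - c .
\end{aligned}
\]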

Big Batch SGD: Automated Inference using Adaptive Batch Sizes

no code implementations 18 Oct 2016 Soham De, Abhay Yadav, David Jacobs, Tom Goldstein

The high-fidelity gradients enable automated learning rate selection and do not require stepsize decay.
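
A sketch of the adaptive batch-size idea as I understand it (the variance test below is my paraphrase; `theta` is an assumed tolerance): grow the batch until the gradient estimate's variance is small relative to its squared norm.

```python
import numpy as np

def adaptive_batch_grad(sample_grads, batch_size, theta=1.0, max_size=8192):
    """Grow the batch until the minibatch gradient is high fidelity:
    the variance of the mean estimate is small relative to the
    squared norm of the mean gradient."""
    while True:
        grads = sample_grads(batch_size)               # (B, dim) per-example grads
        mean_g = grads.mean(axis=0)
        var_of_mean = grads.var(axis=0).sum() / batch_size
        if var_of_mean <= theta * mean_g @ mean_g or batch_size >= max_size:
            return mean_g, batch_size
        batch_size *= 2
```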

Efficient Distributed SGD with Variance Reduction

no code implementations 9 Dec 2015 Soham De, Tom Goldstein

Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets.

Stochastic Optimization

Variance Reduction for Distributed Stochastic Gradient Descent

no code implementations 5 Dec 2015 Soham De, Gavin Taylor, Tom Goldstein

Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant stepsizes and preserving linear convergence rates.

Stochastic Optimization
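
The classical variance-reduced update this line of work builds on is SVRG-style; a minimal centralized sketch (not the distributed algorithm of the paper):

```python
import numpy as np

def svrg_epoch(w, grad_i, n, lr, num_steps, rng):
    """SVRG: correct each stochastic gradient with a control variate
    built from the full gradient at a snapshot point."""
    w_snap = w.copy()
    full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
    for _ in range(num_steps):
        i = rng.integers(n)
        # Unbiased, lower-variance estimate of the full gradient at w.
        g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
        w = w - lr * g
    return w
```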

Layer-Specific Adaptive Learning Rates for Deep Networks

no code implementations 15 Oct 2015 Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, Gavin Taylor

In this paper, we attempt to overcome the two problems above by proposing an optimization method for training deep neural networks that uses learning rates both specific to each layer of the network and adaptive to the curvature of the function, increasing the learning rate at points of low curvature.

Image Classification
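
Purely as an illustration of the idea described above (the paper's actual update rule may differ): scale each layer's learning rate down where a local curvature estimate is high, so flat layers take larger steps.

```python
import numpy as np

def layerwise_lr_step(params, grads, prev_grads, base_lr):
    """Illustrative only: estimate per-layer curvature from the change
    in successive gradients; low curvature earns a larger step."""
    updated = []
    for w, g, g_prev in zip(params, grads, prev_grads):
        curvature = np.linalg.norm(g - g_prev) / (np.linalg.norm(g) + 1e-8)
        lr = base_lr / (1.0 + curvature)
        updated.append(w - lr * g)
    return updated
```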

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation

no code implementations 27 Feb 2015 Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj

Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright.

General Classification
