Search Results for author: Satrajit Chatterjee

Found 10 papers, 2 papers with code

On the Generalization Mystery in Deep Learning

no code implementations 18 Mar 2022 Satrajit Chatterjee, Piotr Zielinski

The generalization mystery in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size?

Enabling Binary Neural Network Training on the Edge

2 code implementations 8 Feb 2021 Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides

The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training.

Quantization

Apollo: Transferable Architecture Exploration

no code implementations 2 Feb 2021 Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou, Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi Narayanaswami, James Laudon

We further show that by transferring knowledge between target architectures with different design constraints, Apollo is able to find optimal configurations faster and often with better objective value (up to 25% improvements).

Making Coherence Out of Nothing At All: Measuring Evolution of Gradient Alignment

no code implementations 1 Jan 2021 Satrajit Chatterjee, Piotr Zielinski

Using $m$-coherence, we study the evolution of alignment of per-example gradients in ResNet and EfficientNet models on ImageNet and several variants with label noise, particularly from the perspective of the recently proposed Coherent Gradients (CG) theory that provides a simple, unified explanation for memorization and generalization [Chatterjee, ICLR 20].

Making Coherence Out of Nothing At All: Measuring the Evolution of Gradient Alignment

no code implementations 3 Aug 2020 Satrajit Chatterjee, Piotr Zielinski

Using $m$-coherence, we study the evolution of alignment of per-example gradients in ResNet and Inception models on ImageNet and several variants with label noise, particularly from the perspective of the recently proposed Coherent Gradients (CG) theory that provides a simple, unified explanation for memorization and generalization [Chatterjee, ICLR 20].

Weak and Strong Gradient Directions: Explaining Memorization, Generalization, and Hardness of Examples at Scale

no code implementations 16 Mar 2020 Piotr Zielinski, Shankar Krishnan, Satrajit Chatterjee

The key insight of CGH is that, since the overall gradient for a single step of SGD is the sum of the per-example gradients, it is strongest in directions that reduce the loss on multiple examples if such directions exist.
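The summation argument in this abstract can be illustrated with a toy sketch. This is not code from the paper: the gradient values and the helper name `summed_gradient` are hypothetical, chosen only to show that components shared across examples add up while components specific to individual examples can cancel.

```python
def summed_gradient(per_example_grads):
    """Sum per-example gradient vectors component by component."""
    return [sum(components) for components in zip(*per_example_grads)]


# Hypothetical per-example gradients for 4 examples and 3 parameters.
# Component 0 is small but shared by every example; components 1 and 2
# are larger but differ in sign across examples.
per_example_grads = [
    [1.0, 2.0, 0.0],
    [1.0, -2.0, 0.0],
    [1.0, 0.0, 2.0],
    [1.0, 0.0, -2.0],
]

total = summed_gradient(per_example_grads)
print(total)  # [4.0, 0.0, 0.0]
```

In this made-up case each example's largest gradient component is idiosyncratic, yet the summed gradient points entirely along the shared direction, which is the kind of reinforcement the abstract describes.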

Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization

no code implementations ICLR 2020 Satrajit Chatterjee

We propose an approach to answering this question based on a hypothesis about the dynamics of gradient descent that we call Coherent Gradients: Gradients from similar examples are similar and so the overall gradient is stronger in certain directions where these reinforce each other.

Circuit-Based Intrinsic Methods to Detect Overfitting

no code implementations ICML 2020 Satrajit Chatterjee, Alan Mishchenko

By intrinsic methods, we mean methods that rely only on the model and the training data, as opposed to traditional methods (we call them extrinsic methods) that rely on performance on a test set or on bounds from model complexity.

Learning and Memorization

no code implementations ICML 2018 Satrajit Chatterjee

In the machine learning research community, it is generally believed that there is a tension between memorization and generalization.
