Search Results for author: Yamini Bansal

Found 13 papers, 4 papers with code

The unreasonable effectiveness of few-shot learning for machine translation

no code implementations · 2 Feb 2023 · Xavier Garcia, Yamini Bansal, Colin Cherry, George Foster, Maxim Krikun, Fangxiaoyu Feng, Melvin Johnson, Orhan Firat

We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs.

Few-Shot Learning · Machine Translation · +2

Limitations of the NTK for Understanding Generalization in Deep Learning

no code implementations · 20 Jun 2022 · Nikhil Vyas, Yamini Bansal, Preetum Nakkiran

The "Neural Tangent Kernel" (NTK) (Jacot et al., 2018) and its empirical variants have been proposed as proxies to capture certain behaviors of real neural networks.
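
For reference, the empirical NTK mentioned above is the standard kernel built from the network's parameter gradients; the definition below is the textbook one (Jacot et al., 2018), not notation taken from this paper:

$$K_\theta(x, x') = \left\langle \nabla_\theta f_\theta(x),\, \nabla_\theta f_\theta(x') \right\rangle$$

where $f_\theta$ denotes the network output and $\theta$ its parameters.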

Data Scaling Laws in NMT: The Effect of Noise and Architecture

no code implementations · 4 Feb 2022 · Yamini Bansal, Behrooz Ghorbani, Ankush Garg, Biao Zhang, Maxim Krikun, Colin Cherry, Behnam Neyshabur, Orhan Firat

In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT).

Language Modelling · Machine Translation · +1
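
A common parameterization of such a data scaling law, shown purely as an illustration (the functional form and symbols here are assumptions, not necessarily the paper's), fits test loss as a power law in the dataset size $D$:

$$L(D) \approx L_\infty + \beta\, D^{-\alpha}$$

where $L_\infty$ is the irreducible loss and $\alpha$ controls how quickly additional data helps.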

Distributional Generalization: Structure Beyond Test Error

no code implementations · 29 Sep 2021 · Preetum Nakkiran, Yamini Bansal

Classifiers in machine learning are often reduced to single dimensional quantities, such as test error or loss.

Revisiting Model Stitching to Compare Neural Representations

no code implementations · NeurIPS 2021 · Yamini Bansal, Preetum Nakkiran, Boaz Barak

We revisit and extend model stitching (Lenc & Vedaldi, 2015) as a methodology to study the internal representations of neural networks.

Self-Supervised Learning
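
A minimal sketch of model stitching for the paper above, under assumed names (the `bottom_a`/`top_b` split, the channel count, and the 1x1-conv stitching layer are illustrative, not the paper's exact setup): the lower layers of one frozen network are connected to the upper layers of another through a small trainable layer, and only that layer is trained.

```python
import torch.nn as nn

class StitchedModel(nn.Module):
    """Feed frozen lower layers of network A into frozen upper layers of network B
    through a trainable stitching layer; only the stitching layer is optimized."""
    def __init__(self, bottom_a: nn.Module, top_b: nn.Module, channels: int):
        super().__init__()
        self.bottom_a = bottom_a.requires_grad_(False)  # frozen lower half of A
        self.top_b = top_b.requires_grad_(False)        # frozen upper half of B
        self.stitch = nn.Conv2d(channels, channels, kernel_size=1)  # trainable stitch

    def forward(self, x):
        return self.top_b(self.stitch(self.bottom_a(x)))

# Training then optimizes only StitchedModel.stitch.parameters() on the task loss;
# a low stitched-task error suggests the two representations are compatible.
```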

Distributional Generalization: Characterizing Classifiers Beyond Test Error

no code implementations · NeurIPS 2021 · Preetum Nakkiran, Yamini Bansal

We present a new set of empirical properties of interpolating classifiers, including neural networks, kernel machines and decision trees.

Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modelling

no code implementations · 25 Oct 2020 · Akash Srivastava, Yamini Bansal, Yukun Ding, Cole Hurwitz, Kai Xu, Bernhard Egger, Prasanna Sattigeri, Josh Tenenbaum, David D. Cox, Dan Gutfreund

Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.

Disentanglement
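
One common instance of such an aggregate-posterior penalty, given here as background rather than as this paper's exact objective, adds a total-correlation term that pushes the aggregate posterior $q(z)$ toward a factorized distribution:

$$\mathcal{L} = \mathrm{ELBO} - \lambda\, \mathrm{KL}\!\left( q(z) \,\middle\|\, \textstyle\prod_j q(z_j) \right)$$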

For self-supervised learning, Rationality implies generalization, provably

2 code implementations · ICLR 2021 · Yamini Bansal, Gal Kaplun, Boaz Barak

We prove a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a representation $r$ of the training data, and then fitting a simple (e.g., linear) classifier $g$ to the labels.

Representation Learning · Self-Supervised Learning
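
A minimal sketch of the two-stage pipeline described above, with assumed names (`encode` stands in for a pretrained self-supervised representation $r$; logistic regression as the simple classifier $g$ is an illustrative choice):

```python
from sklearn.linear_model import LogisticRegression

def probe_generalization_gap(encode, x_train, y_train, x_test, y_test):
    """Fit a simple classifier g on a frozen representation r and report its
    train/test accuracy gap (the quantity the bound above concerns)."""
    r_train = encode(x_train)            # r: representation of the training data
    r_test = encode(x_test)
    g = LogisticRegression(max_iter=1000).fit(r_train, y_train)  # simple classifier g
    return g.score(r_train, y_train) - g.score(r_test, y_test)   # generalization gap
```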

Distributional Generalization: A New Kind of Generalization

1 code implementation · 17 Sep 2020 · Preetum Nakkiran, Yamini Bansal

We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error.

2D object detection
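
An illustrative sketch of the idea above (not the paper's formal definition, and all names below are assumptions): compare the distribution of a classifier's predicted labels on training data with the distribution on test data, rather than only comparing average errors.

```python
import numpy as np

def output_distribution_gap(predict, x_train, x_test, num_classes):
    """Total-variation distance between the classifier's predicted-label
    distributions on train and test inputs."""
    p_train = np.bincount(predict(x_train), minlength=num_classes).astype(float)
    p_test = np.bincount(predict(x_test), minlength=num_classes).astype(float)
    p_train /= p_train.sum()
    p_test /= p_test.sum()
    return 0.5 * np.abs(p_train - p_test).sum()
```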

CZ-GEM: A Framework for Disentangled Representation Learning

no code implementations · ICLR 2020 · Akash Srivastava, Yamini Bansal, Yukun Ding, Bernhard Egger, Prasanna Sattigeri, Josh Tenenbaum, David D. Cox, Dan Gutfreund

In this work, we tackle a slightly more intricate scenario where the observations are generated from a conditional distribution of some known control variate and some latent noise variate.

Disentanglement
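
Schematically, and with symbols chosen here for illustration rather than taken from the paper, the setup above generates each observation $x$ from a known control variate $c$ and a latent noise variate $z$:

$$x \sim p(x \mid c, z), \qquad c~\text{observed (control variate)}, \quad z~\text{latent (noise variate)}$$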

Deep Double Descent: Where Bigger Models and More Data Hurt

3 code implementations · ICLR 2020 · Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever

We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.
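
An illustrative sketch of the kind of model-size sweep behind such a curve (the model family, widths, and data here are assumptions, not the paper's experimental setup):

```python
from sklearn.neural_network import MLPClassifier

def width_sweep(x_train, y_train, x_test, y_test, widths=(4, 16, 64, 256, 1024)):
    """Train models of increasing size and record test error;
    a double-descent curve shows the error rising and then falling again."""
    test_error = {}
    for width in widths:
        model = MLPClassifier(hidden_layer_sizes=(width,), max_iter=2000)
        model.fit(x_train, y_train)
        test_error[width] = 1.0 - model.score(x_test, y_test)
    return test_error
```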

Minnorm training: an algorithm for training over-parameterized deep neural networks

no code implementations · 3 Jun 2018 · Yamini Bansal, Madhu Advani, David D. Cox, Andrew M. Saxe

To solve this constrained optimization problem, our method employs Lagrange multipliers that act as integrators of error over training and identify "support vector"-like examples.

Generalization Bounds
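
A schematic form of the constrained problem described above (the exact norm, constraint, and update rule are illustrative, not the paper's precise formulation): minimize parameter norm subject to fitting the training data, with per-example Lagrange multipliers $\lambda_i$ that accumulate each example's error over training.

$$\min_\theta \ \|\theta\|^2 \quad \text{s.t.} \quad \ell\big(f_\theta(x_i), y_i\big) = 0 \ \ \forall i, \qquad \mathcal{L}(\theta, \lambda) = \|\theta\|^2 + \sum_i \lambda_i\, \ell\big(f_\theta(x_i), y_i\big)$$

Examples whose multipliers remain nonzero at convergence play the "support vector"-like role mentioned in the abstract.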

On the Information Bottleneck Theory of Deep Learning

1 code implementation · ICLR 2018 · Andrew Michael Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan Daniel Tracey, David Daniel Cox

The practical successes of deep neural networks have not been matched by theoretical progress that satisfyingly explains their behavior.

Information Plane
