no code implementations • 2 Feb 2023 • Xavier Garcia, Yamini Bansal, Colin Cherry, George Foster, Maxim Krikun, Fangxiaoyu Feng, Melvin Johnson, Orhan Firat
We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high- and low-resource language pairs.
no code implementations • 20 Jun 2022 • Nikhil Vyas, Yamini Bansal, Preetum Nakkiran
The Neural Tangent Kernel (NTK) (Jacot et al., 2018) and its empirical variants have been proposed as proxies to capture certain behaviors of real neural networks.
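As a rough illustration of what the *empirical* NTK is, here is a minimal sketch under placeholder choices of model and data (not the setups studied in the paper): it is the Gram matrix of per-example parameter gradients of the network output.

```python
# Minimal sketch of the empirical NTK: the Gram matrix of per-example
# parameter gradients, K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>.
# The small MLP and random inputs below are illustrative placeholders.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
x = torch.randn(8, 10)  # 8 toy inputs

def flat_grad(scalar_out):
    """Gradient of a scalar network output w.r.t. all parameters, flattened."""
    grads = torch.autograd.grad(scalar_out, list(net.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

grads = torch.stack([flat_grad(net(x[i:i + 1]).squeeze()) for i in range(len(x))])
empirical_ntk = grads @ grads.T  # (8, 8) kernel matrix
print(empirical_ntk.shape)
```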
no code implementations • 4 Feb 2022 • Yamini Bansal, Behrooz Ghorbani, Ankush Garg, Biao Zhang, Maxim Krikun, Colin Cherry, Behnam Neyshabur, Orhan Firat
In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT).
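For context, data scaling in this literature is usually summarized by fitting a saturating power law in dataset size. The sketch below fits such a curve to made-up (size, loss) pairs; the functional form and numbers are assumptions for illustration, not the paper's fitted results.

```python
# Hedged sketch: fit a saturating power law L(D) = a * D**(-p) + c to
# (dataset size, dev loss) pairs, the kind of data scaling curve such studies
# compare across architectures. The numbers below are made up.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(D, a, p, c):
    return a * D ** (-p) + c

sizes = np.array([1e5, 3e5, 1e6, 3e6, 1e7])    # sentence pairs (hypothetical)
losses = np.array([4.1, 3.6, 3.2, 2.9, 2.7])   # dev loss (hypothetical)

(a, p, c), _ = curve_fit(scaling_law, sizes, losses, p0=[300.0, 0.45, 2.5])
print(f"fit: L(D) ~= {a:.1f} * D^(-{p:.3f}) + {c:.2f}")
```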
no code implementations • 29 Sep 2021 • Preetum Nakkiran, Yamini Bansal
Classifiers in machine learning are often reduced to single-dimensional quantities, such as test error or loss.
no code implementations • NeurIPS 2021 • Yamini Bansal, Preetum Nakkiran, Boaz Barak
We revisit and extend model stitching (Lenc & Vedaldi, 2015) as a methodology to study the internal representations of neural networks.
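A minimal sketch of the stitching operation itself (the toy MLPs and dimensions are placeholders, not the paper's models): freeze the bottom of one network and the top of another, and train only a small stitching layer in between; good stitched performance is read as evidence that the two representations are compatible.

```python
# Freeze the bottom of net A and the top of net B; train only the stitch.
import torch
import torch.nn as nn

bottom_a = nn.Sequential(nn.Linear(32, 128), nn.ReLU())  # first layers of net A (pretrained, frozen)
top_b = nn.Sequential(nn.Linear(128, 10))                # last layers of net B (pretrained, frozen)
stitch = nn.Linear(128, 128)                             # the only trainable part

for p in list(bottom_a.parameters()) + list(top_b.parameters()):
    p.requires_grad_(False)

opt = torch.optim.SGD(stitch.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))  # toy batch
for _ in range(100):                                     # train the stitch only
    logits = top_b(stitch(bottom_a(x)))
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"stitched training loss: {loss.item():.3f}")
```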
no code implementations • NeurIPS 2021 • Preetum Nakkiran, Yamini Bansal
We present a new set of empirical properties of interpolating classifiers, including neural networks, kernel machines and decision trees.
no code implementations • 25 Oct 2020 • Akash Srivastava, Yamini Bansal, Yukun Ding, Cole Hurwitz, Kai Xu, Bernhard Egger, Prasanna Sattigeri, Josh Tenenbaum, David D. Cox, Dan Gutfreund
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
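As a hedged gloss on what penalizing the aggregate posterior usually means in this line of work (generic notation, not necessarily the paper's): the aggregate posterior is $q(z) = \mathbb{E}_{x \sim p_{\text{data}}}[q(z \mid x)]$, and independence of the latent factors is encouraged by penalizing, e.g., its total correlation $\mathrm{TC}(z) = \mathrm{KL}\big(q(z) \,\|\, \prod_j q(z_j)\big)$, which vanishes exactly when the coordinates of $z$ are statistically independent under $q(z)$.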
2 code implementations • ICLR 2021 • Yamini Bansal, Gal Kaplun, Boaz Barak
We prove a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a representation $r$ of the training data, and then fitting a simple (e.g., linear) classifier $g$ to the labels.
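A hedged sketch of the two-stage setting the bound applies to, with a random frozen feature map standing in for a learned self-supervised representation (everything below is a toy stand-in, not the paper's experiments): fit a simple classifier $g$ on top of a fixed $r$, then measure the train/test gap.

```python
# Toy pipeline: fixed representation r(x), simple linear classifier g,
# and the generalization gap (train accuracy minus test accuracy).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_train, y_train = rng.normal(size=(500, 64)), rng.integers(0, 10, 500)
x_test, y_test = rng.normal(size=(500, 64)), rng.integers(0, 10, 500)

W = rng.normal(size=(64, 32))               # frozen stand-in for a learned r
r = lambda x: np.maximum(x @ W, 0.0)        # random ReLU feature map

g = LogisticRegression(max_iter=2000).fit(r(x_train), y_train)   # simple classifier
gap = g.score(r(x_train), y_train) - g.score(r(x_test), y_test)  # generalization gap
print(f"train-test accuracy gap: {gap:.3f}")
```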
1 code implementation • 17 Sep 2020 • Preetum Nakkiran, Yamini Bansal
We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error.
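To make the notion concrete, here is a minimal sketch (toy model and data, illustrative only) of comparing a classifier's output distributions on train and test points, rather than only its average error.

```python
# Compare the distribution of predicted labels on train vs. test points,
# here via total variation distance between the two label histograms.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x_train, y_train = rng.normal(size=(2000, 20)), rng.integers(0, 5, 2000)
x_test = rng.normal(size=(2000, 20))

clf = DecisionTreeClassifier().fit(x_train, y_train)   # an interpolating classifier

def label_hist(preds, n_classes=5):
    return np.bincount(preds, minlength=n_classes) / len(preds)

tv = 0.5 * np.abs(label_hist(clf.predict(x_train)) - label_hist(clf.predict(x_test))).sum()
print(f"total variation between train/test output distributions: {tv:.3f}")
```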
no code implementations • ICLR 2020 • Akash Srivastava, Yamini Bansal, Yukun Ding, Bernhard Egger, Prasanna Sattigeri, Josh Tenenbaum, David D. Cox, Dan Gutfreund
In this work, we tackle a slightly more intricate scenario where the observations are generated from a conditional distribution of some known control variate and some latent noise variate.
3 code implementations • ICLR 2020 • Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever
We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.
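A minimal sketch of the underlying measurement, in a standard toy setting that is an assumption of this sketch rather than the paper's deep-network experiments: sweep model size and record test error, with the interpolation threshold near width equal to the number of training points.

```python
# Sweep the width of a ridgeless random-ReLU-feature regression and record
# test error; the interpolation threshold sits near width == n_train.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 10
w_true = rng.normal(size=d)

def make_data(n):
    x = rng.normal(size=(n, d))
    return x, x @ w_true + 0.5 * rng.normal(size=n)   # noisy linear targets

x_tr, y_tr = make_data(n_train)
x_te, y_te = make_data(n_test)

for width in [10, 50, 90, 100, 110, 200, 1000]:
    V = rng.normal(size=(d, width)) / np.sqrt(d)
    phi = lambda x: np.maximum(x @ V, 0.0)            # random ReLU features
    coef = np.linalg.pinv(phi(x_tr)) @ y_tr           # min-norm least squares
    test_mse = np.mean((phi(x_te) @ coef - y_te) ** 2)
    print(f"width={width:5d}  test MSE={test_mse:.3f}")
```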
no code implementations • 3 Jun 2018 • Yamini Bansal, Madhu Advani, David D. Cox, Andrew M. Saxe
To solve this constrained optimization problem, our method employs Lagrange multipliers that act as integrators of error over training and identify "support vector"-like examples.
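A hedged sketch of that mechanism on a linear toy problem (a stand-in for the paper's deep-network setting, not its method verbatim): minimize the parameter norm subject to per-example constraints, with one multiplier per example updated by gradient ascent so that it accumulates, i.e. integrates, that example's error over training; examples whose multipliers remain nonzero play the "support vector"-like role.

```python
# Primal-dual sketch: minimize ||w||^2 subject to y_i * (w . x_i) >= 1,
# with one Lagrange multiplier per example updated by gradient ascent
# (it integrates that example's constraint violation over training).
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
y = rng.choice([-1.0, 1.0], size=n)
x = rng.normal(size=(n, d))
x[:, 0] = y * (1.0 + rng.uniform(size=n))   # separable toy data with margin >= 1

w = np.zeros(d)
lam = np.zeros(n)                           # one multiplier per training example
lr_w, lr_lam = 0.05, 0.05
for _ in range(2000):
    margin = y * (x @ w)
    w -= lr_w * (w - x.T @ (lam * y))                    # descent on the Lagrangian
    lam = np.maximum(lam + lr_lam * (1 - margin), 0.0)   # ascent, kept >= 0

support = np.flatnonzero(lam > 1e-3)
print(f"{len(support)} 'support vector'-like examples out of {n}")
```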
1 code implementation • ICLR 2018 • Andrew Michael Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan Daniel Tracey, David Daniel Cox
The practical successes of deep neural networks have not been matched by theoretical progress that satisfyingly explains their behavior.