no code implementations • NeurIPS 2018 • Neha Gupta, Aaron Sidford
This running time improves upon the previous best unaccelerated running time of $\tilde{O}(nd + n L d / \mu)$.
no code implementations • 19 Apr 2019 • Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant
We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term: the sum, over the data points, of the squared $\ell_2$ norm of the gradient of the model with respect to the parameter vector, evaluated at each data point.
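The regularizer described above can be made concrete with a small sketch (not the paper's code): it sums, over the data points, the squared $\ell_2$ norm of the per-example gradient of the model output with respect to the parameters, here estimated by central finite differences on a hypothetical toy model.

```python
import numpy as np

def implicit_reg(f, theta, X, eps=1e-6):
    """Sum over data points of the squared l2 norm of the per-example
    gradient of the model output w.r.t. theta, estimated by central
    finite differences (illustrative sketch only)."""
    total = 0.0
    for x in X:
        grad = np.zeros_like(theta)
        for j in range(len(theta)):
            tp, tm = theta.copy(), theta.copy()
            tp[j] += eps
            tm[j] -= eps
            grad[j] = (f(x, tp) - f(x, tm)) / (2 * eps)
        total += np.sum(grad ** 2)
    return total

# hypothetical one-hidden-unit model; any differentiable model works
f = lambda x, th: th[1] * np.tanh(th[0] * x)
X = np.array([0.5, -1.0, 2.0])
theta = np.array([0.3, 1.2])
print(implicit_reg(f, theta, X))
```

For a linear model $f(x;\theta)=\theta x$ the per-example gradient is just $x$, so the regularizer reduces to $\sum_i x_i^2$, which gives a quick sanity check.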
no code implementations • 31 Aug 2020 • Arturs Backurs, Avrim Blum, Neha Gupta
In particular, the number of label queries should be independent of the complexity of $H$, and the function $h$ should be well-defined, independent of $x$.
no code implementations • NeurIPS 2020 • Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan
We propose a simple extension of top-down decision tree learning heuristics such as ID3, C4.5, and CART.
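The top-down heuristics named above all share the same core step: greedily choose the split that most reduces an impurity measure of the labels. A minimal ID3-style sketch of that step, assuming binary features and entropy as the impurity (illustrative, not the paper's extension):

```python
import numpy as np

def entropy(y):
    """Binary label entropy in bits."""
    p = np.mean(y)
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def best_split(X, y):
    """ID3-style scoring: pick the feature whose split most reduces
    label entropy (binary features, illustrative sketch)."""
    base = entropy(y)
    gains = []
    for j in range(X.shape[1]):
        mask = X[:, j] == 1
        if mask.all() or (~mask).all():
            gains.append(0.0)  # feature is constant: no information gain
            continue
        w = mask.mean()
        gains.append(base - w * entropy(y[mask]) - (1 - w) * entropy(y[~mask]))
    return int(np.argmax(gains)), max(gains)

X = np.array([[0, 1], [1, 1], [0, 0], [1, 0]])
y = np.array([0, 1, 0, 1])  # label equals feature 0
print(best_split(X, y))
```

CART uses Gini impurity in place of entropy, and C4.5 normalizes the gain, but the greedy top-down structure is the same.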
no code implementations • NeurIPS 2020 • Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan
We show that top-down decision tree learning heuristics are amenable to highly efficient learnability estimation: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with polylogarithmically many labeled examples - exponentially fewer than the number needed to run these heuristics, and indeed exponentially fewer than the information-theoretic minimum required to learn a good decision tree.
no code implementations • 8 Feb 2022 • Ben Adlam, Neha Gupta, Zelda Mariet, Jamie Smith
We show that, similarly to the label, the central prediction can be interpreted as the mean of a random variable, where the mean operates in a dual space defined by the loss function itself.
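One way to make the "mean in a dual space" concrete (a sketch based on the standard Bregman-divergence view, not the paper's code): under squared loss the dual map is the identity, so the central prediction is the ordinary arithmetic mean; under log loss, averaging in the dual (log) space and mapping back yields the normalized geometric mean of the predicted distributions.

```python
import numpy as np

def central_prediction_log_loss(probs):
    """Dual-space mean under log loss: average the predictions in log
    space, map back with exp, and renormalize, giving the normalized
    geometric mean. Function name is illustrative, not from the paper."""
    log_mean = np.mean(np.log(probs), axis=0)
    p = np.exp(log_mean)
    return p / p.sum()

# two predictive distributions over three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2]])
print(central_prediction_log_loss(probs))
# under squared loss the central prediction would simply be
# np.mean(probs, axis=0)
```

The choice of loss thus changes what "averaging" the predictions means, which is exactly why the central prediction differs from the naive mean.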
no code implementations • 21 Jun 2022 • Neha Gupta, Jamie Smith, Ben Adlam, Zelda Mariet
Empirically, standard ensembling reduces the bias, leading us to hypothesize that ensembles of classifiers may perform well in part because of this unexpected reduction. We conclude with an empirical analysis of recent deep learning methods that ensemble over hyperparameters, revealing that these techniques indeed favor bias reduction.
no code implementations • NeurIPS 2023 • Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar
Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers is invoked in turn.
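A two-stage cascade of the kind described above can be sketched in a few lines (a hypothetical setup, not the paper's method): a cheap model handles each sample when its confidence clears a threshold, and defers to an expensive model otherwise.

```python
import numpy as np

def cascade_predict(cheap_probs, expensive_probs, threshold=0.9):
    """Two-stage cascade sketch: accept the cheap model's prediction
    when its confidence (max probability) clears the threshold,
    otherwise defer to the expensive model. Illustrative only."""
    confident = cheap_probs.max(axis=1) >= threshold
    preds = np.where(confident,
                     cheap_probs.argmax(axis=1),
                     expensive_probs.argmax(axis=1))
    return preds, confident

cheap = np.array([[0.95, 0.05],    # confident: cheap model decides
                  [0.55, 0.45]])   # uncertain: defer downstream
expensive = np.array([[0.90, 0.10],
                      [0.20, 0.80]])
preds, used_cheap = cascade_predict(cheap, expensive)
print(preds, used_cheap)
```

Inference cost then varies per sample: only the deferred fraction pays for the expensive model, and the threshold trades accuracy against cost.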
no code implementations • 15 Apr 2024 • Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
While the principles underpinning cascading are well-studied for classification tasks - with deferral based on predicted class uncertainty favored theoretically and practically - a similar understanding is lacking for generative LM tasks.