no code implementations • 18 Apr 2025 • Helen Jin, Anton Xue, Weiqiu You, Surbhi Goel, Eric Wong
Stability guarantees are an emerging tool for evaluating feature attributions, but existing certification methods rely on smoothed classifiers and often yield conservative guarantees.
no code implementations • 8 Apr 2025 • Natalie Collina, Ira Globus-Harris, Surbhi Goel, Varun Gupta, Aaron Roth, Mirah Shi
Our results require no knowledge of (or even the existence of) a prior distribution and are computationally efficient.
no code implementations • 11 Mar 2025 • Nirmit Joshi, Gal Vardi, Adam Block, Surbhi Goel, Zhiyuan Li, Theodor Misiakiewicz, Nathan Srebro
We present a simple base class that allows for universal representability and computationally tractable chain-of-thought learning.
no code implementations • 15 Jan 2025 • Surbhi Goel, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan
In this work, we show that this task is tractable and present the first efficient algorithm to test various noise assumptions on the training data.
no code implementations • 29 Nov 2024 • Natalie Collina, Surbhi Goel, Varun Gupta, Aaron Roth
We then address the case where parties hold beliefs over distributions with d outcomes, exploring two feedback mechanisms.
no code implementations • 7 Oct 2024 • Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Andrej Risteski, Surbhi Goel
Our theoretical and empirical findings on sparse parity, complemented by empirical observations on more complex tasks, highlight the benefit of progressive distillation via an implicit curriculum across settings.
no code implementations • 21 Jun 2024 • Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong
We study how to subvert large language models (LLMs) from following prompt-specified rules.
no code implementations • 4 Jun 2024 • Mahdi Sabbaghi, George Pappas, Hamed Hassani, Surbhi Goel
Empirically, our method allows a Transformer trained on numbers with at most 5 digits for addition and multiplication to generalize to numbers with up to 50 digits, without using additional data for longer sequences.
no code implementations • 4 Jun 2024 • Surbhi Goel, Abhishek Shetty, Konstantinos Stavropoulos, Arsen Vasilyan
We study the problem of learning under arbitrary distribution shift, where the learner is trained on a labeled set from one distribution but evaluated on a different, potentially adversarially generated test distribution.
1 code implementation • 12 May 2024 • Kan Xu, Hamsa Bastani, Surbhi Goel, Osbert Bastani
Our key insight is that we can exploit the piecewise linear structure of ReLU activations and convert the problem into a linear bandit in a transformed feature space, once we have learned the ReLU parameters sufficiently accurately during the exploration stage.
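As a rough illustration of this insight (not the paper's algorithm; the reward model, feature map, and LinUCB rule below are assumptions made for the sketch), once estimates of the ReLU weights are available, each context can be mapped into a gated feature space in which a reward of the form $\sum_i a_i\,\mathrm{ReLU}(w_i\cdot x)$ is linear, so any off-the-shelf linear bandit rule applies:

```python
import numpy as np

def relu_features(x, W_hat):
    """Map context x (shape (d,)) into the gated feature space.

    W_hat: (k, d) estimates of the first-layer ReLU weights, assumed to have
    been learned to reasonable accuracy during an exploration stage.  For a
    reward of the form sum_i a_i * ReLU(w_i . x), the reward is linear in
    these gated features, so a linear bandit applies.
    """
    gates = (W_hat @ x > 0).astype(float)          # which ReLU units are active
    return (gates[:, None] * x[None, :]).ravel()   # one block of x per active unit

def linucb_choose(contexts, W_hat, A, b, alpha=1.0):
    """Pick a context with a standard LinUCB rule on the transformed features.

    A (ridge Gram matrix) and b (reward-weighted feature sum) are the usual
    LinUCB statistics maintained by the caller.
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b
    scores = []
    for x in contexts:
        z = relu_features(x, W_hat)
        scores.append(theta @ z + alpha * np.sqrt(z @ A_inv @ z))
    return int(np.argmax(scores))
```

After observing a reward r for the chosen context with transformed features z, the caller would update A += z z^T and b += r z, exactly as in ordinary LinUCB.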
1 code implementation • 5 Mar 2024 • Guanwen Qiu, Da Kuang, Surbhi Goel
Existing research often posits that spurious features are easier to learn than core features in neural network optimization, but the impact of their relative simplicity remains under-explored.
no code implementations • 16 Feb 2024 • Benjamin L. Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis
We examine how learning is affected by varying the prior distribution over Markov chains, and consider the generalization of our in-context learning of Markov chains (ICL-MC) task to $n$-grams for $n > 2$.
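A minimal sketch of how one such ICL-MC training example might be generated (the Dirichlet prior, state-space size, and sequence length below are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def sample_icl_mc_example(num_states=3, seq_len=64, alpha=1.0, rng=None):
    """Generate one in-context-learning-of-Markov-chains (ICL-MC) sequence.

    A fresh transition matrix is drawn for every sequence, here from a
    symmetric Dirichlet prior with concentration alpha (the prior over chains
    that is varied in the entry above); the model must infer the chain's
    statistics from the in-context tokens alone to predict the next state.
    """
    rng = rng if rng is not None else np.random.default_rng()
    P = rng.dirichlet(alpha * np.ones(num_states), size=num_states)  # rows are next-state distributions
    seq = [int(rng.integers(num_states))]
    for _ in range(seq_len - 1):
        seq.append(int(rng.choice(num_states, p=P[seq[-1]])))
    return np.array(seq), P
```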
no code implementations • 7 Sep 2023 • Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning.
no code implementations • NeurIPS 2023 • Surbhi Goel, Steve Hanneke, Shay Moran, Abhishek Shetty
We study the problem of sequential prediction in the stochastic setting with an adversary that is allowed to inject clean-label adversarial (or out-of-distribution) examples.
no code implementations • 20 Apr 2023 • Sitan Chen, Zehao Dou, Surbhi Goel, Adam R. Klivans, Raghu Meka
We consider the well-studied problem of learning a linear combination of $k$ ReLU activations with respect to a Gaussian distribution on inputs in $d$ dimensions.
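Concretely, the function class in question can be written as follows (the scaling and norm conventions here are assumptions, not taken from the paper):

$$ f(x) \;=\; \sum_{i=1}^{k} a_i \, \mathrm{ReLU}(w_i \cdot x), \qquad x \sim \mathcal{N}(0, I_d), \quad a_i \in \mathbb{R}, \; w_i \in \mathbb{R}^d. $$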
no code implementations • 19 Oct 2022 • Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang
Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine.
no code implementations • 1 Sep 2022 • Surbhi Goel, Sham Kakade, Adam Tauman Kalai, Cyril Zhang
For example, on parity problems, the NN learns as well as Gaussian elimination, an efficient algorithm that can be succinctly described.
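For reference, here is a minimal sketch of that succinctly describable baseline, Gaussian elimination over GF(2) applied to noiseless parity examples (illustrative only; the entry above is about a neural network matching this baseline, not about this code):

```python
import numpy as np

def solve_parity_gf2(X, y):
    """Recover a parity function from noiseless examples by Gaussian
    elimination over GF(2), i.e. find s with X @ s = y (mod 2).

    X: (n, d) array of 0/1 inputs; y: (n,) array of 0/1 parity labels.
    Free variables are set to 0, so any consistent solution is returned.
    """
    n, d = X.shape
    A = np.concatenate([X % 2, (y % 2)[:, None]], axis=1).astype(np.int8)
    row = 0
    for col in range(d):
        pivot = next((r for r in range(row, n) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]] = A[[pivot, row]]          # move pivot row into place
        for r in range(n):
            if r != row and A[r, col]:
                A[r] ^= A[row]                     # eliminate column col elsewhere
        row += 1
    s = np.zeros(d, dtype=np.int8)
    for r in range(row):                           # read solution off pivot rows
        lead = np.flatnonzero(A[r, :d])[0]
        s[lead] = A[r, d]
    return s
```

With slightly more than d noiseless examples this recovers the hidden parity exactly; the claim referenced above is that a trained network learns parities about as well as this algebraic baseline.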
no code implementations • 18 Jul 2022 • Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times.
no code implementations • 28 Feb 2022 • Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy
Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.
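To make that objective concrete, here is a minimal sketch of a standard InfoNCE-style contrastive loss over two views of a batch (a generic instantiation of the principle described, not necessarily the exact loss analyzed in the paper; the temperature and normalization are assumptions):

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """Generic InfoNCE-style contrastive loss over two views of a batch.

    z1, z2: (batch, dim) representations of two augmentations of the same
    inputs, row-aligned.  Row i of z1 is pulled toward row i of z2 and pushed
    away from every other row, so views of the same input end up more similar
    than views of different inputs.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                 # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # positive pairs sit on the diagonal
```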
no code implementations • ICLR 2022 • Jordan T. Ash, Cyril Zhang, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade
Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning.
no code implementations • 19 Oct 2021 • Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang
Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond.
no code implementations • 20 Jul 2021 • Yuval Dagan, Constantinos Daskalakis, Nishanth Dikkala, Surbhi Goel, Anthimos Vardis Kandiros
We consider a general statistical estimation problem in which binary labels across different observations are not independent conditioned on their feature vectors, but dependent: this captures settings where the observations are collected on, e.g., a spatial domain, a temporal domain, or a social network, each of which induces dependencies.
no code implementations • 18 Jun 2021 • Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Dipendra Misra
We focus on disambiguating the role of one of these parameters: the number of negative examples.
1 code implementation • NeurIPS 2021 • Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade
There is an increasing need for effective active learning algorithms that are compatible with deep neural networks.
no code implementations • 1 Mar 2021 • Naman Agarwal, Surbhi Goel, Cyril Zhang
In practical applications of iterative first-order optimization, the learning rate schedule remains notoriously difficult to understand and expensive to tune.
no code implementations • 27 Nov 2020 • Surbhi Goel, Adam Klivans, Pasin Manurangsi, Daniel Reichman
We are also able to obtain lower bounds on the running time in terms of the desired additive error $\epsilon$.
no code implementations • NeurIPS 2020 • Surbhi Goel, Adam Klivans, Frederic Koehler
Graphical models are powerful tools for modeling high-dimensional data, but learning graphical models in the presence of latent variables is well-known to be difficult.
no code implementations • NeurIPS 2020 • Surbhi Goel, Aravind Gollakota, Adam Klivans
We give the first statistical-query lower bounds for agnostically learning any non-polynomial activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign).
no code implementations • ICML 2020 • Surbhi Goel, Aravind Gollakota, Zhihan Jin, Sushrut Karmalkar, Adam Klivans
Our lower bounds hold for broad classes of activations including ReLU and sigmoid.
no code implementations • 26 May 2020 • Ilias Diakonikolas, Surbhi Goel, Sushrut Karmalkar, Adam R. Klivans, Mahdi Soltanolkotabi
We consider the fundamental problem of ReLU regression, where the goal is to output the best fitting ReLU with respect to square loss given access to draws from some unknown distribution.
no code implementations • ICML 2020 • Omar Montasser, Surbhi Goel, Ilias Diakonikolas, Nathan Srebro
We study the problem of learning adversarially robust halfspaces in the distribution-independent setting.
no code implementations • NeurIPS 2019 • Surbhi Goel, Sushrut Karmalkar, Adam Klivans
Let $\mathsf{opt} < 1$ be the population loss of the best-fitting ReLU.
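One natural formalization of this quantity (the exact norm constraint and normalization may differ in the paper) is:

$$ \mathsf{opt} \;:=\; \min_{\|w\|_2 \le 1} \; \mathbb{E}_{(x,y)\sim \mathcal{D}} \big[ (\mathrm{ReLU}(w \cdot x) - y)^2 \big], \qquad \mathrm{ReLU}(t) = \max(0, t). $$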
no code implementations • 15 Jun 2019 • Surbhi Goel
We study the problem of learning graphical models with latent variables.
no code implementations • ICML 2020 • Jessica Hoffmann, Soumya Basu, Surbhi Goel, Constantine Caramanis
When the conditions are met, i.e., when the graphs are connected with at least three edges, we give an efficient algorithm for learning the weights of both graphs with optimal sample complexity (up to log factors).
no code implementations • ICLR 2019 • Surbhi Goel, Rina Panigrahy
Giving provable guarantees for learning neural networks is a core challenge of machine learning theory.
no code implementations • 21 Feb 2019 • Matt Jordan, Naren Manoj, Surbhi Goel, Alexandros G. Dimakis
To demonstrate the value of quantifying the perceptual distortion of adversarial examples, we present and employ a unifying framework fusing different attack styles.
no code implementations • 13 Feb 2019 • Surbhi Goel, Daniel M. Kane, Adam R. Klivans
We give the first efficient algorithm for learning the structure of an Ising model that tolerates independent failures; that is, each entry of the observed sample is missing with some unknown probability $p$. Our algorithm matches the essentially optimal runtime and sample complexity bounds of recent work for learning Ising models due to Klivans and Meka (2017).
no code implementations • ICLR 2019 • Simon S. Du, Surbhi Goel
We propose a new algorithm to learn a one-hidden-layer convolutional neural network where both the convolutional weights and the output weights are parameters to be learned.
no code implementations • ICML 2018 • Surbhi Goel, Adam Klivans, Raghu Meka
We give the first provably efficient algorithm for learning a one hidden layer convolutional network with respect to a general class of (potentially overlapping) patches.
no code implementations • 18 Sep 2017 • Surbhi Goel, Adam Klivans
We give a polynomial-time algorithm for learning neural networks with one layer of sigmoids feeding into any Lipschitz, monotone activation function (e.g., sigmoid or ReLU).
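A plausible reading of the architecture described, with $u$ any Lipschitz, monotone outer activation (the paper's exact parameterization may differ):

$$ f(x) \;=\; u\!\Big( \sum_{i=1}^{k} a_i \, \sigma(w_i \cdot x) \Big), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}. $$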
no code implementations • NeurIPS 2017 • Surbhi Goel, Adam Klivans
In this work we show that a natural distributional assumption corresponding to eigenvalue decay of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g., feed-forward networks of ReLUs).
no code implementations • 30 Nov 2016 • Surbhi Goel, Varun Kanade, Adam Klivans, Justin Thaler
These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where $\epsilon$ must be $\Omega(1)$ and strong assumptions are required on the marginal distribution.