Search Results for author: Surbhi Goel

Found 32 papers, 2 papers with code

Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations

1 code implementation • 5 Mar 2024 • Guanwen Qiu, Da Kuang, Surbhi Goel

Existing research often posits spurious features as "easier" to learn than core features in neural network optimization, but the impact of their relative simplicity remains under-explored.

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains

no code implementations • 16 Feb 2024 • Benjamin L. Edelman, Ezra Edelman, Surbhi Goel, Eran Malach, Nikolaos Tsilivis

We examine how learning is affected by varying the prior distribution over Markov chains, and consider the generalization of our in-context learning of Markov chains (ICL-MC) task to $n$-grams for $n > 2$.

In-Context Learning
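
To make the ICL-MC task above concrete, here is a minimal sketch of the statistical-induction-head behavior it studies: sample a transition matrix from a Dirichlet prior, generate an in-context sequence, and predict the next token from in-context bigram counts. The state count, context length, and smoothing below are placeholder choices, not the paper's exact setup.

```python
# Sketch of an ICL-MC-style task: predict the next token of a sequence
# drawn from a random Markov chain using in-context bigram statistics.
import numpy as np

rng = np.random.default_rng(0)
k, T = 3, 256                                 # states, context length (placeholders)

P = rng.dirichlet(np.ones(k), size=k)         # prior over chains: Dirichlet rows

seq = [rng.integers(k)]                       # roll out one in-context sequence
for _ in range(T - 1):
    seq.append(rng.choice(k, p=P[seq[-1]]))

# "Statistical induction head": estimate P(next | current) from bigram
# counts in the context, with add-one smoothing.
counts = np.ones((k, k))
for a, b in zip(seq[:-1], seq[1:]):
    counts[a, b] += 1
pred = counts[seq[-1]] / counts[seq[-1]].sum()

print("true next-state distribution:", np.round(P[seq[-1]], 3))
print("bigram-count prediction:    ", np.round(pred, 3))
```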

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck

no code implementations • 7 Sep 2023 • Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning.

tabular-classification
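
Since the sparse parity task serves as the proxy problem in this entry, a minimal data generator may be useful; the dimension, sparsity, and sample count below are placeholder choices, not the paper's.

```python
# Sketch: (n, k)-sparse parity data -- the label is the XOR of k hidden
# coordinates of a uniformly random n-bit input.
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 50, 3, 10_000                          # placeholders
support = rng.choice(n, size=k, replace=False)   # hidden relevant coordinates

X = rng.integers(0, 2, size=(m, n))              # uniform random bits
y = X[:, support].sum(axis=1) % 2                # parity over the hidden subset
```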

Adversarial Resilience in Sequential Prediction via Abstention

no code implementations • NeurIPS 2023 • Surbhi Goel, Steve Hanneke, Shay Moran, Abhishek Shetty

We study the problem of sequential prediction in the stochastic setting with an adversary that is allowed to inject clean-label adversarial (or out-of-distribution) examples.

Learning Narrow One-Hidden-Layer ReLU Networks

no code implementations • 20 Apr 2023 • Sitan Chen, Zehao Dou, Surbhi Goel, Adam R. Klivans, Raghu Meka

We consider the well-studied problem of learning a linear combination of $k$ ReLU activations with respect to a Gaussian distribution on inputs in $d$ dimensions.
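
The data model in this entry has a simple generative form; the following sketch uses my own placeholder sizes and unit-norm hidden units, not the paper's parameters.

```python
# Sketch: samples (x, y) with y = sum_i a_i * ReLU(<w_i, x>) and
# x ~ N(0, I_d), i.e., a width-k one-hidden-layer ReLU network.
import numpy as np

rng = np.random.default_rng(0)
d, k, m = 20, 4, 1000                            # placeholders

W = rng.normal(size=(k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # unit-norm hidden units
a = rng.normal(size=k)                           # combination coefficients

X = rng.normal(size=(m, d))                      # Gaussian inputs
y = np.maximum(X @ W.T, 0.0) @ a                 # labels from the network
```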

Transformers Learn Shortcuts to Automata

no code implementations • 19 Oct 2022 • Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang

Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine.

Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms

no code implementations • 1 Sep 2022 • Surbhi Goel, Sham Kakade, Adam Tauman Kalai, Cyril Zhang

For example, on parity problems, the NN learns as well as Gaussian elimination, an efficient algorithm that can be succinctly described.
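
The Gaussian-elimination baseline mentioned above amounts to solving a linear system over GF(2): with enough examples, row reduction of $Xw = y \pmod 2$ recovers the hidden parity set. A minimal sketch, assuming parity labels of the form $\langle x, w \rangle \bmod 2$:

```python
# Sketch: recover a hidden parity from labeled examples by Gaussian
# elimination over GF(2), i.e., solve X w = y (mod 2).
import numpy as np

def solve_gf2(X, y):
    A = np.concatenate([X, y[:, None]], axis=1) % 2
    m, n = X.shape
    row = 0
    for col in range(n):                         # reduce to RREF mod 2
        pivot = next((r for r in range(row, m) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]] = A[[pivot, row]]
        for r in range(m):
            if r != row and A[r, col]:
                A[r] = (A[r] + A[row]) % 2
        row += 1
    w = np.zeros(n, dtype=int)
    for r in range(row):                         # free variables default to 0
        cols = np.flatnonzero(A[r, :n])
        w[cols[0]] = A[r, n]
    return w

# With enough random examples the solution is unique and matches w_true.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 12))
w_true = rng.integers(0, 2, size=12)
print(np.array_equal(solve_gf2(X, X @ w_true % 2), w_true))
```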

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

no code implementations • 18 Jul 2022 • Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times.

Understanding Contrastive Learning Requires Incorporating Inductive Biases

no code implementations • 28 Feb 2022 • Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy

Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.

Contrastive Learning • Self-Supervised Learning
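
The "more similar representations for views of the same input" objective is usually instantiated as an InfoNCE-style loss; here is the standard formulation as a sketch (generic contrastive learning, not this paper's analysis).

```python
# Sketch: InfoNCE contrastive loss for a batch of paired augmentations,
# where z1[i] and z2[i] embed two views of the same input.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature       # similarities of all pairs
    targets = torch.arange(z1.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 32), torch.randn(8, 32))
```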

Anti-Concentrated Confidence Bonuses for Scalable Exploration

no code implementations • ICLR 2022 • Jordan T. Ash, Cyril Zhang, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade

Intrinsic rewards play a central role in handling the exploration-exploitation trade-off when designing sequential decision-making algorithms, in both foundational theory and state-of-the-art deep reinforcement learning.

Decision Making • reinforcement-learning • +1
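
For context, the classical intrinsic reward in the linear setting is the elliptical confidence bonus, which bonuses like the paper's are designed to approximate at scale. A textbook-style sketch (not the paper's anti-concentrated construction):

```python
# Sketch: elliptical bonus sqrt(phi^T A^{-1} phi), where A accumulates
# outer products of visited feature vectors.
import numpy as np

class EllipticalBonus:
    def __init__(self, d, reg=1.0):
        self.A = reg * np.eye(d)             # regularized design matrix

    def bonus(self, phi):
        return float(np.sqrt(phi @ np.linalg.solve(self.A, phi)))

    def update(self, phi):
        self.A += np.outer(phi, phi)         # shrinks bonuses along visited directions

b = EllipticalBonus(d=4)
phi = np.array([1.0, 0.0, 0.0, 0.0])
print(b.bonus(phi)); b.update(phi); print(b.bonus(phi))  # bonus decreases
```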

Inductive Biases and Variable Creation in Self-Attention Mechanisms

no code implementations • 19 Oct 2021 • Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Cyril Zhang

Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond.
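
For reference, the motif in question is scaled dot-product self-attention; a single-head sketch of the standard definition (nothing here is specific to this paper):

```python
# Sketch: single-head scaled dot-product self-attention over a sequence.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (T, d) sequence; Wq, Wk, Wv: (d, d_head) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])          # (T, T) pairwise interactions
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ V                              # mix values across positions
```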

Statistical Estimation from Dependent Data

no code implementations • 20 Jul 2021 • Yuval Dagan, Constantinos Daskalakis, Nishanth Dikkala, Surbhi Goel, Anthimos Vardis Kandiros

We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors, but dependent. This captures settings where, e.g., the observations are collected on a spatial domain, a temporal domain, or a social network, each of which induces dependencies.

regression • text-classification • +1

Gone Fishing: Neural Active Learning with Fisher Embeddings

1 code implementation • NeurIPS 2021 • Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade

There is an increasing need for effective active learning algorithms that are compatible with deep neural networks.

Active Learning

Acceleration via Fractal Learning Rate Schedules

no code implementations • 1 Mar 2021 • Naman Agarwal, Surbhi Goel, Cyril Zhang

In practical applications of iterative first-order optimization, the learning rate schedule remains notoriously difficult to understand and expensive to tune.
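
As background for this entry: for quadratics with curvature in $[\mu, L]$, the classical accelerated schedule takes step sizes equal to reciprocals of Chebyshev nodes on that interval, and the ordering of those steps is precisely what makes the schedule numerically delicate. A sketch of the unordered steps, under my recollection of the standard construction:

```python
# Sketch: Chebyshev step sizes for a quadratic with eigenvalues in
# [mu, L]; the schedule's stability depends on how these are ordered.
import numpy as np

def chebyshev_steps(mu, L, T):
    t = np.arange(1, T + 1)
    nodes = (L + mu) / 2 + (L - mu) / 2 * np.cos((2 * t - 1) * np.pi / (2 * T))
    return 1.0 / nodes                       # reciprocals of Chebyshev nodes

print(chebyshev_steps(0.1, 1.0, 8))
```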

Tight Hardness Results for Training Depth-2 ReLU Networks

no code implementations • 27 Nov 2020 • Surbhi Goel, Adam Klivans, Pasin Manurangsi, Daniel Reichman

We are also able to obtain lower bounds on the running time in terms of the desired additive error $\epsilon$.

From Boltzmann Machines to Neural Networks and Back Again

no code implementations • NeurIPS 2020 • Surbhi Goel, Adam Klivans, Frederic Koehler

Graphical models are powerful tools for modeling high-dimensional data, but learning graphical models in the presence of latent variables is well-known to be difficult.

Statistical-Query Lower Bounds via Functional Gradients

no code implementations • NeurIPS 2020 • Surbhi Goel, Aravind Gollakota, Adam Klivans

We give the first statistical-query lower bounds for agnostically learning any non-polynomial activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign).

Approximation Schemes for ReLU Regression

no code implementations • 26 May 2020 • Ilias Diakonikolas, Surbhi Goel, Sushrut Karmalkar, Adam R. Klivans, Mahdi Soltanolkotabi

We consider the fundamental problem of ReLU regression, where the goal is to output the best fitting ReLU with respect to square loss given access to draws from some unknown distribution.

regression
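
As a bare-bones illustration of the problem statement above, one can fit a single ReLU under square loss with plain gradient descent; this is an illustrative baseline only, not the paper's approximation scheme.

```python
# Sketch: fit ReLU(<w, x>) to noisy labels by gradient descent on the
# square loss (illustrative baseline, with placeholder sizes).
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 2000
X = rng.normal(size=(m, d))
w_true = rng.normal(size=d)
y = np.maximum(X @ w_true, 0.0) + 0.1 * rng.normal(size=m)

w = 0.01 * rng.normal(size=d)     # small random init (w = 0 is a stationary point)
for _ in range(500):
    pred = np.maximum(X @ w, 0.0)
    grad = X.T @ ((pred - y) * (X @ w > 0)) / m   # chain rule through the ReLU
    w -= 0.1 * grad

print("square loss:", np.mean((np.maximum(X @ w, 0.0) - y) ** 2))
```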

Efficiently Learning Adversarially Robust Halfspaces with Noise

no code implementations • ICML 2020 • Omar Montasser, Surbhi Goel, Ilias Diakonikolas, Nathan Srebro

We study the problem of learning adversarially robust halfspaces in the distribution-independent setting.

Learning Restricted Boltzmann Machines with Arbitrary External Fields

no code implementations • 15 Jun 2019 • Surbhi Goel

We study the problem of learning graphical models with latent variables.

Learning Mixtures of Graphs from Epidemic Cascades

no code implementations • ICML 2020 • Jessica Hoffmann, Soumya Basu, Surbhi Goel, Constantine Caramanis

When the conditions are met, i.e., when the graphs are connected with at least three edges, we give an efficient algorithm for learning the weights of both graphs with optimal sample complexity (up to log factors).

Quantifying Perceptual Distortion of Adversarial Examples

no code implementations • 21 Feb 2019 • Matt Jordan, Naren Manoj, Surbhi Goel, Alexandros G. Dimakis

To demonstrate the value of quantifying the perceptual distortion of adversarial examples, we present and employ a unifying framework fusing different attack styles.

SSIM

Learning Ising Models with Independent Failures

no code implementations • 13 Feb 2019 • Surbhi Goel, Daniel M. Kane, Adam R. Klivans

We give the first efficient algorithm for learning the structure of an Ising model that tolerates independent failures; that is, each entry of the observed sample is missing with some unknown probability $p$. Our algorithm matches the essentially optimal runtime and sample complexity bounds of recent work for learning Ising models due to Klivans and Meka (2017).
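
The corruption model in this entry is easy to state in code: each coordinate of each sample goes missing independently with probability $p$. A sketch of applying the failures (generating true Ising samples is elided; random spins stand in below):

```python
# Sketch: independent failures -- each entry of each observed sample is
# dropped independently with probability p.
import numpy as np

rng = np.random.default_rng(0)
p = 0.2
samples = rng.choice([-1, 1], size=(100, 10))   # stand-in spins; real data would
                                                # come from an Ising model
mask = rng.random(samples.shape) < p
observed = np.where(mask, 0, samples)           # 0 marks a missing entry
```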

Improved Learning of One-hidden-layer Convolutional Neural Networks with Overlaps

no code implementations • ICLR 2019 • Simon S. Du, Surbhi Goel

We propose a new algorithm to learn a one-hidden-layer convolutional neural network where both the convolutional weights and the output weights are parameters to be learned.

regression

Learning One Convolutional Layer with Overlapping Patches

no code implementations • ICML 2018 • Surbhi Goel, Adam Klivans, Raghu Meka

We give the first provably efficient algorithm for learning a one hidden layer convolutional network with respect to a general class of (potentially overlapping) patches.

Learning Neural Networks with Two Nonlinear Layers in Polynomial Time

no code implementations • 18 Sep 2017 • Surbhi Goel, Adam Klivans

We give a polynomial-time algorithm for learning neural networks with one layer of sigmoids feeding into any Lipschitz, monotone activation function (e.g., sigmoid or ReLU).

Learning Theory • PAC learning • +1

Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

no code implementations • NeurIPS 2017 • Surbhi Goel, Adam Klivans

In this work we show that a natural distributional assumption corresponding to eigenvalue decay of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g., feed-forward networks of ReLUs).
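
The assumption in question concerns the spectrum of the kernel Gram matrix; a quick way to see such decay empirically (with an RBF kernel and Gaussian data as my stand-ins):

```python
# Sketch: eigenvalue decay of an RBF-kernel Gram matrix on Gaussian data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
G = np.exp(-sq / 2.0)                                 # RBF Gram matrix

eigs = np.sort(np.linalg.eigvalsh(G))[::-1]
print(eigs[:10] / eigs[0])                            # how fast does the spectrum decay?
```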

Reliably Learning the ReLU in Polynomial Time

no code implementations • 30 Nov 2016 • Surbhi Goel, Varun Kanade, Adam Klivans, Justin Thaler

These results are in contrast to known efficient algorithms for reliably learning linear threshold functions, where $\epsilon$ must be $\Omega(1)$ and strong assumptions are required on the marginal distribution.
