no code implementations • 5 Nov 2024 • Omar Salemohamed, Laurent Charlin, Shivam Garg, Vatsal Sharan, Gregory Valiant
We also adapt our framework to the problem of estimating frequencies over a data stream, and believe it could serve as a powerful discovery tool for new problems.
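Although the learned approach from the paper is not shown here, a minimal classical baseline helps fix what "estimating frequencies over a data stream" means. The sketch below is a standard Count-Min Sketch; the class name, widths, and seeds are illustrative choices, not the paper's method.

```python
# Minimal Count-Min Sketch: a classical baseline for streaming frequency
# estimation (illustrative only; not the learned approach proposed in the paper).
import random

class CountMinSketch:
    def __init__(self, width=2048, depth=5, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]
        self.salts = [rng.getrandbits(32) for _ in range(depth)]

    def update(self, item, count=1):
        for row, salt in enumerate(self.salts):
            self.tables[row][hash((salt, item)) % self.width] += count

    def estimate(self, item):
        # The estimate is an upper bound on the true count (it only overestimates).
        return min(self.tables[row][hash((salt, item)) % self.width]
                   for row, salt in enumerate(self.salts))

sketch = CountMinSketch()
for token in ["a", "b", "a", "c", "a"]:
    sketch.update(token)
print(sketch.estimate("a"))  # >= 3
```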
no code implementations • 30 Oct 2024 • Kristian Georgiev, Roy Rinberg, Sung Min Park, Shivam Garg, Andrew Ilyas, Aleksander Madry, Seth Neel
This perspective naturally suggests a reduction from the unlearning problem to that of data attribution, where the goal is to predict the effect of changing the training set on a model's outputs.
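A minimal sketch of the shape of this reduction, assuming datamodel-style attribution scores are already available: if `tau[i, j]` approximates the marginal effect of training example `i` on the model's output at test point `j`, then the effect of deleting a set of examples can be predicted additively. The array names and sizes below are hypothetical, and computing the scores themselves is the hard part.

```python
# Hedged sketch of the attribution-to-unlearning reduction: predict the model's
# outputs after "forgetting" a set of training points from per-example
# attribution scores, using a linear (datamodel-style) approximation.
import numpy as np

def predict_outputs_after_forgetting(base_outputs, tau, forget_set):
    """base_outputs: (n_test,) outputs of the model trained on all data.
    tau: (n_train, n_test) attribution scores.
    forget_set: indices of training examples to unlearn."""
    return base_outputs - tau[forget_set].sum(axis=0)

rng = np.random.default_rng(0)
tau = rng.normal(scale=0.01, size=(1000, 5))   # toy attribution scores
base = rng.normal(size=5)                      # toy model outputs
print(predict_outputs_after_forgetting(base, tau, [3, 17, 256]))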
no code implementations • 19 Nov 2023 • Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant
From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance.
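For intuition on the rate (a hedged reading; the symbol $N$ and the estimator below are ours, not necessarily the paper's): $\Theta(k/\varepsilon^2)$ matches the classical complexity of learning a single distribution over a domain of size $k$, and the pooled empirical distribution of all $N$ draws is an unbiased estimator of $\mathbf{p}_{\mathrm{avg}}$:

$$\hat{\mathbf{p}}_{\mathrm{avg}}(j) \;=\; \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\{x_i = j\}, \qquad \mathbb{E}\big[\hat{\mathbf{p}}_{\mathrm{avg}}(j)\big] \;=\; \mathbf{p}_{\mathrm{avg}}(j),$$

so $N = \Theta(k/\varepsilon^2)$ samples (one per distribution when $c=1$) suffice for TV error $\varepsilon$, and the matching lower bound says no estimator does better in general.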
1 code implementation • 1 Aug 2022 • Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class?
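A minimal sketch of the data side of this setup, plus the least-squares baseline such a model is naturally compared against; the transformer and its training loop are omitted, and the function names and dimensions below are illustrative.

```python
# Sketch of in-context learning of linear functions: each prompt is a sequence
# (x_1, f(x_1), ..., x_n, f(x_n), x_query) with f(x) = w.x for a freshly sampled w.
# A model trained on such prompts must predict f(x_query); least squares on the
# in-context examples is a natural baseline.
import numpy as np

def sample_prompt(d=20, n_points=40, rng=None):
    rng = rng or np.random.default_rng()
    w = rng.normal(size=d)                 # the latent linear function
    X = rng.normal(size=(n_points, d))     # in-context inputs
    y = X @ w                              # in-context labels
    x_query = rng.normal(size=d)
    return X, y, x_query, x_query @ w

def least_squares_baseline(X, y, x_query):
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x_query @ w_hat

X, y, xq, target = sample_prompt(rng=np.random.default_rng(0))
print(least_squares_baseline(X, y, xq), target)
```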
no code implementations • 12 Jan 2022 • Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant
The "sample amplification" problem formalizes the following question: Given $n$ i.i.d. samples from an unknown distribution $D$, when is it possible to produce a larger set of samples that cannot be distinguished from i.i.d. draws from $D$?
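For reference, a standard formalization consistent with the notation in the ICML 2020 entry below (the constant $1/3$ is a conventional choice of ours, not necessarily the paper's): an $(n, m)$ amplifier is a possibly randomized map $A:\mathcal{X}^n \to \mathcal{X}^m$ with

$$d_{\mathrm{TV}}\big(A(X_1,\dots,X_n),\; D^{m}\big) \;\le\; \tfrac{1}{3} \quad \text{whenever } (X_1,\dots,X_n) \sim D^{n},$$

i.e., the amplified set cannot be reliably distinguished from $m$ fresh i.i.d. samples from $D$.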
1 code implementation • 22 Dec 2021 • Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood
Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, i.e., when the policy concentrates its probability mass on sub-optimal actions.
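A toy illustration of the saturation issue on a 3-armed bandit (not the estimator proposed in the paper): when the softmax policy concentrates on a bad arm, the exact policy gradient is vanishingly small, so vanilla PG moves away from the saturated policy very slowly.

```python
# Softmax policy on a 3-armed bandit: compare the exact policy gradient at a
# uniform policy vs. a policy saturated on a sub-optimal arm.
import numpy as np

rewards = np.array([1.0, 0.0, 0.0])          # arm 0 is optimal

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def exact_policy_gradient(theta):
    pi = softmax(theta)
    J = pi @ rewards
    # d pi_a / d theta_b = pi_a (1[a=b] - pi_b)  =>  dJ/d theta_b = pi_b (r_b - J)
    return pi * (rewards - J)

for theta in [np.zeros(3), np.array([-10.0, 10.0, 0.0])]:  # uniform vs. saturated on arm 1
    g = exact_policy_gradient(theta)
    print(softmax(theta).round(4), np.abs(g).max())
```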
no code implementations • 17 Nov 2021 • Shivam Garg, Santosh S. Vempala
We also show that feedback alignment (FA) can be far from optimal when $r < \mathrm{rank}(Y)$.
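A definitional sketch of the setting, assuming the usual feedback-alignment substitution (matrix names and shapes are illustrative, and this is not the paper's analysis): for the objective $\|Y - W_2 W_1\|_F^2$, FA replaces $W_2^\top$ in the update for $W_1$ with a fixed random matrix $B$.

```python
# Feedback alignment (FA) vs. the true gradient for the factorization objective
# L(W1, W2) = ||Y - W2 @ W1||_F^2: FA substitutes a fixed random matrix B for W2.T.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 12, 3
Y = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))   # rank-r target
W1, W2 = rng.normal(size=(r, n)), rng.normal(size=(m, r))
B = rng.normal(size=(r, m))                              # fixed feedback matrix

E = Y - W2 @ W1                  # residual
grad_W1_backprop = -W2.T @ E     # true gradient direction (up to a factor of 2)
grad_W1_fa = -B @ E              # feedback-alignment surrogate

cos = np.sum(grad_W1_backprop * grad_W1_fa) / (
    np.linalg.norm(grad_W1_backprop) * np.linalg.norm(grad_W1_fa))
print(cos)   # "alignment" between the FA direction and the true gradient
```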
1 code implementation • 12 Aug 2021 • Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux
Common policy gradient methods rely on the maximization of a sequence of surrogate functions.
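For concreteness, one widely used surrogate of this kind is the importance-weighted advantage objective (a representative textbook example, not necessarily the exact family analyzed in the paper), maximized approximately at each iteration:

$$\pi_{t+1} \;\approx\; \arg\max_{\pi}\; \mathbb{E}_{s \sim d^{\pi_t},\, a \sim \pi_t(\cdot \mid s)}\!\left[\frac{\pi(a \mid s)}{\pi_t(a \mid s)}\, A^{\pi_t}(s, a)\right].$$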
1 code implementation • ICML 2020 • Sina Ghiassian, Andrew Patterson, Shivam Garg, Dhawal Gupta, Adam White, Martha White
It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well.
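For reference, generic textbook forms of the two update families being contrasted, with linear function approximation: semi-gradient TD(0), which can diverge off-policy, and TDC, one of the sound Gradient-TD methods. Step sizes and features below are arbitrary illustrative choices, not the paper's experimental setup.

```python
# Semi-gradient TD(0) vs. TDC (a Gradient-TD method with a secondary weight
# vector h) under linear function approximation.
import numpy as np

def td0_update(w, x, r, x_next, alpha=0.1, gamma=0.99):
    delta = r + gamma * (w @ x_next) - (w @ x)
    return w + alpha * delta * x

def tdc_update(w, h, x, r, x_next, alpha=0.1, beta=0.05, gamma=0.99):
    delta = r + gamma * (w @ x_next) - (w @ x)
    w_new = w + alpha * (delta * x - gamma * (h @ x) * x_next)
    h_new = h + beta * (delta - (h @ x)) * x
    return w_new, h_new

w, h = np.zeros(4), np.zeros(4)
x, x_next = np.array([1.0, 0, 0, 1]), np.array([0, 1.0, 0, 1])
w = td0_update(w, x, 1.0, x_next)
w2, h = tdc_update(np.zeros(4), h, x, 1.0, x_next)
print(w, w2)
```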
no code implementations • ICML 2020 • Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant
In the Gaussian case, we show that an $\left(n,\, n+\Theta(n/\sqrt{d})\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples.
no code implementations • NeurIPS 2018 • Shivam Garg, Vatsal Sharan, Brian Hu Zhang, Gregory Valiant
This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset.
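A hedged sketch in the spirit of that connection, using a generic spectral construction (a neighborhood graph over the dataset and a low-order Laplacian eigenvector); the threshold and other details are illustrative choices, not necessarily the paper's exact procedure. The intuition: nearby points, which are confusable under small perturbations, receive similar feature values.

```python
# Build a neighborhood graph over the data and use the second-smallest Laplacian
# eigenvector as a feature that varies smoothly across nearby points.
import numpy as np

def spectral_feature(X, threshold):
    # Adjacency: connect points within `threshold` of each other.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (D <= threshold).astype(float)
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A            # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, 1]                      # second-smallest eigenvector

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 1, size=(30, 2)), rng.normal(3, 1, size=(30, 2))])
f = spectral_feature(X, threshold=2.5)
print(f[:5].round(3), f[-5:].round(3))        # roughly constant within each cluster
```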