no code implementations • 5 Nov 2024 • Omar Salemohamed, Laurent Charlin, Shivam Garg, Vatsal Sharan, Gregory Valiant
We also adapt our framework to the problem of estimating frequencies over a data stream, and we believe it could be a powerful discovery tool for new problems.
3 code implementations • 10 Jun 2024 • Dutch Hansen, Siddartha Devic, Preetum Nakkiran, Vatsal Sharan
Calibration is a well-studied property of predictors which guarantees meaningful uncertainty estimates.
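As background for the calibration property discussed above, here is a minimal sketch of the standard binned expected calibration error (ECE) for a binary predictor; the binning and function names are choices of this sketch, not the measurements studied in the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for a binary predictor: average gap between predicted
    probability and empirical frequency, weighted by bin size."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            conf = probs[mask].mean()       # average predicted probability in the bin
            acc = labels[mask].mean()       # empirical frequency of the positive class
            ece += mask.mean() * abs(acc - conf)
    return ece

# A predictor whose labels really are drawn with its predicted probabilities
# is (nearly) calibrated, so its ECE is close to zero.
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = rng.binomial(1, p)
print(expected_calibration_error(p, y))
```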
no code implementations • 5 Jun 2024 • Tianyi Zhou, Deqing Fu, Vatsal Sharan, Robin Jia
This paper shows that pre-trained LLMs add numbers using Fourier features -- dimensions in the hidden state that represent numbers via a set of features sparse in the frequency domain.
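A toy numpy illustration of the underlying idea (an assumed setup, not the paper's probing methodology): if an integer is represented by phase features at a few sparse periods, then adding two numbers corresponds to rotating, i.e. multiplying, the per-frequency phasors.

```python
import numpy as np

# Assumed toy periods, chosen so their least common multiple exceeds the decode range.
PERIODS = np.array([3, 5, 7, 11, 100])

def encode(n):
    """One unit phasor per period: exp(2*pi*i*n/T)."""
    return np.exp(2j * np.pi * n / PERIODS)

def decode(phasors, max_val=1000):
    """Return the integer in [0, max_val) whose encoding best matches `phasors`."""
    candidates = np.arange(max_val)
    errs = np.abs(np.exp(2j * np.pi * np.outer(candidates, 1.0 / PERIODS)) - phasors).sum(axis=1)
    return int(candidates[np.argmin(errs)])

a, b = 137, 245
# Addition in "number space" is rotation (elementwise multiplication) in phasor space.
print(decode(encode(a) * encode(b)))   # 382
```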
no code implementations • 28 May 2024 • Haipeng Luo, Spandan Senapati, Vatsal Sharan
We consider the problem of online multiclass U-calibration, where a forecaster aims to make sequential distributional predictions over $K$ classes with low U-calibration error, that is, low regret with respect to all bounded proper losses simultaneously.
no code implementations • 11 Mar 2024 • Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Vatsal Sharan
Transformers achieve state-of-the-art accuracy and robustness across many tasks, but an understanding of their inductive biases, and of how those biases differ from those of other neural network architectures, remains elusive.
no code implementations • 15 Feb 2024 • Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng
We demonstrate a compactness result holding broadly across supervised learning with a general class of loss functions: Any hypothesis class $H$ is learnable with transductive sample complexity $m$ precisely when all of its finite projections are learnable with sample complexity $m$.
no code implementations • 14 Feb 2024 • Siddartha Devic, Aleksandra Korolova, David Kempe, Vatsal Sharan
However, when predictors trained for classification tasks have intrinsic uncertainty, it is not obvious how this uncertainty should be represented in the derived rankings.
1 code implementation • 26 Oct 2023 • Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan
Transformers excel at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they do so remains a mystery.
1 code implementation • 9 Oct 2023 • Bhavya Vasudeva, Kameron Shahabi, Vatsal Sharan
Neural networks (NNs) are known to exhibit a simplicity bias, whereby they prefer learning 'simple' features over more 'complex' ones, even when the latter may be more informative.
no code implementations • 24 Sep 2023 • Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng
We demonstrate that an agnostic version of the Hall complexity again characterizes error rates exactly, and exhibit an optimal learner using maximum entropy programs.
no code implementations • 8 Feb 2023 • Siddartha Devic, David Kempe, Vatsal Sharan, Aleksandra Korolova
The prevalence and importance of algorithmic two-sided marketplaces have drawn attention to the issue of fairness in such settings.
no code implementations • 29 Mar 2022 • Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant
We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - \delta}$ bits of memory must make at least $\tilde{\Omega}(d^{1 + (4/3)\delta})$ first-order queries (for any constant $\delta \in [0, 1/4]$).
1 code implementation • 28 Feb 2022 • Parikshit Gopalan, Nina Narodytska, Omer Reingold, Vatsal Sharan, Udi Wieder
Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory.
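As background for the estimation task above, a minimal plug-in estimator over a small finite support (an assumed discrete setting, with add-one smoothing so the estimate stays finite); this sketch is not the approach of the paper.

```python
import numpy as np

def plugin_kl(samples_p, samples_q, support):
    """Naive plug-in estimate of KL(P || Q) on a finite support: estimate both
    distributions by (add-one smoothed) empirical frequencies, then apply the
    KL formula to the estimates."""
    def empirical(samples):
        counts = np.array([np.sum(samples == x) for x in support], float) + 1.0
        return counts / counts.sum()
    p_hat, q_hat = empirical(samples_p), empirical(samples_q)
    return float(np.sum(p_hat * np.log(p_hat / q_hat)))

rng = np.random.default_rng(0)
support = np.arange(4)
p, q = np.array([0.1, 0.2, 0.3, 0.4]), np.full(4, 0.25)
sp = rng.choice(support, size=50_000, p=p)
sq = rng.choice(support, size=50_000, p=q)
print(plugin_kl(sp, sq, support))          # close to the true KL of about 0.107 nats
```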
no code implementations • 12 Jan 2022 • Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant
The "sample amplification" problem formalizes the following question: Given $n$ i.i.d. samples drawn from an unknown distribution, when is it possible to produce a larger set of samples that is indistinguishable from i.i.d. samples drawn from the same distribution?
no code implementations • 4 Nov 2021 • Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan
We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square-root of the condition numbers of the components.
no code implementations • 11 Sep 2021 • Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, Udi Wieder
We suggest a rigorous new paradigm for loss minimization in machine learning where the loss function can be ignored at the time of learning and only be taken into account when deciding an action.
no code implementations • ICLR 2021 • Atish Agarwala, Abhimanyu Das, Brendan Juba, Rina Panigrahy, Vatsal Sharan, Xin Wang, Qiuyi Zhang
Can deep learning solve multiple tasks simultaneously, even when they are unrelated and very different?
no code implementations • 10 Mar 2021 • Parikshit Gopalan, Omer Reingold, Vatsal Sharan, Udi Wieder
We significantly strengthen previous work that uses the maximum-entropy approach, which defines the importance weights based on the distribution $Q$ closest to $P$ that looks the same as $R$ on every set $C \in \mathcal{C}$, where $\mathcal{C}$ may be a huge collection of sets.
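For intuition about marginal-matching importance weights, here is a minimal iterative proportional fitting ("raking") sketch that reweights samples so the weighted mass in each set matches a target frequency; the data setup is assumed, and this is standard background rather than the paper's algorithm or guarantees.

```python
import numpy as np

def raking_weights(memberships, target_fracs, n_rounds=200):
    """Iterative proportional fitting ('raking'): reweight samples so that the
    weighted fraction falling in each set matches its target frequency.

    memberships: (n, m) boolean matrix, column c marks membership in set C_c.
    target_fracs: length-m vector of each set's frequency under the target R.
    """
    n, m = memberships.shape
    w = np.ones(n) / n
    for _ in range(n_rounds):
        for c in range(m):
            cur = w[memberships[:, c]].sum()
            if 0.0 < cur < 1.0:
                # Rescale mass inside and outside C_c to hit the target split.
                w[memberships[:, c]] *= target_fracs[c] / cur
                w[~memberships[:, c]] *= (1.0 - target_fracs[c]) / (1.0 - cur)
    return w

rng = np.random.default_rng(0)
M = rng.random((5000, 3)) < 0.5            # random set memberships for samples from P
w = raking_weights(M, np.array([0.7, 0.3, 0.5]))
print([round(w[M[:, c]].sum(), 3) for c in range(3)])   # approximately [0.7, 0.3, 0.5]
```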
1 code implementation • NeurIPS 2019 • Parikshit Gopalan, Vatsal Sharan, Udi Wieder
We consider the problem of detecting anomalies in a large dataset.
no code implementations • ICML 2020 • Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant
In the Gaussian case, we show that an $\left(n, n+\Theta(\frac{n}{\sqrt{d}})\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples.
no code implementations • 18 Apr 2019 • Vatsal Sharan, Aaron Sidford, Gregory Valiant
We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints.
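For contrast with this lower bound, a minimal sketch of the memory-light baseline: single-pass SGD for least squares, which keeps only the $d$-dimensional iterate. The synthetic data and step size are assumptions of this sketch, not taken from the paper.

```python
import numpy as np

def streaming_sgd_regression(stream, d, lr=0.01):
    """Single-pass SGD for least squares over a stream of (x, y) examples.

    Only the d-dimensional iterate is kept, i.e. O(d) memory, which is the
    memory-constrained regime the result above concerns."""
    w = np.zeros(d)
    for x, y in stream:
        w -= lr * (x @ w - y) * x          # gradient of 0.5 * (x @ w - y)**2
    return w

# Synthetic example (materialized up front for brevity; conceptually one pass).
rng = np.random.default_rng(0)
d = 20
w_star = rng.normal(size=d)
X = rng.normal(size=(100_000, d)) / np.sqrt(d)
stream = ((x, x @ w_star + 0.01 * rng.normal()) for x in X)
print(np.linalg.norm(streaming_sgd_regression(stream, d) - w_star))  # small
```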
no code implementations • NeurIPS 2018 • Shivam Garg, Vatsal Sharan, Brian Hu Zhang, Gregory Valiant
This connection can be leveraged to provide both robust features and a lower bound on the robustness of any function that has significant variance across the dataset.
no code implementations • 31 Oct 2018 • Hongyang R. Zhang, Vatsal Sharan, Moses Charikar, Yingyu Liang
We consider the tensor completion problem of predicting the missing entries of a tensor.
no code implementations • NeurIPS 2018 • Vatsal Sharan, Parikshit Gopalan, Udi Wieder
We consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores.
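One popular PCA-based score is the squared reconstruction error outside the top principal subspace; the sketch below (with an assumed synthetic example) illustrates that kind of score, not the paper's analysis.

```python
import numpy as np

def pca_reconstruction_scores(X, k):
    """Score each row by its squared residual outside the top-k principal subspace."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:k].T @ Vt[:k]          # projection onto the top-k subspace
    return np.sum((Xc - proj) ** 2, axis=1)

# Synthetic low-rank data with a few planted off-subspace anomalies.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 50)) + 0.1 * rng.normal(size=(1000, 50))
X[:5] += 5.0 * rng.normal(size=(5, 50))
scores = pca_reconstruction_scores(X, k=10)
print(np.sort(np.argsort(scores)[-5:]))    # the planted rows 0..4 score highest
```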
no code implementations • NeurIPS 2017 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.
1 code implementation • 7 Nov 2017 • Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant
We introduce a new sub-linear space sketch---the Weight-Median Sketch---for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model.
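A simplified sketch in the spirit of storing a classifier's weights in a count-sketch and reading them back via medians; the hashing, loss, and parameters here are simplifications assumed for illustration, not the Weight-Median Sketch as specified in the paper.

```python
import numpy as np

class SketchedLinearClassifier:
    """Toy count-sketch of a linear classifier's weight vector.

    Gradient updates from online logistic regression are folded into a small
    R x B table instead of a d-dimensional weight vector; individual weights
    are recovered as a median of signed bucket values."""

    _PRIME = 2_147_483_647

    def __init__(self, rows=5, buckets=512, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((rows, buckets))
        self.lr = lr
        self.buckets = buckets
        # Assumed simple multiplicative hashing; a real implementation would
        # use a proper pairwise-independent hash family.
        self.h_seed = rng.integers(1, self._PRIME, size=rows)
        self.s_seed = rng.integers(1, self._PRIME, size=rows)

    def _bucket(self, r, j):
        return int((self.h_seed[r] * (j + 1)) % self._PRIME) % self.buckets

    def _sign(self, r, j):
        return 1.0 if ((self.s_seed[r] * (j + 1)) % self._PRIME) % 2 == 0 else -1.0

    def estimate_weight(self, j):
        """Point query: median over rows of the signed bucket contents."""
        return float(np.median([self._sign(r, j) * self.table[r, self._bucket(r, j)]
                                for r in range(len(self.table))]))

    def update(self, indices, values, y):
        """One SGD step of logistic regression on a sparse example; y in {-1, +1}."""
        z = sum(v * self.estimate_weight(j) for j, v in zip(indices, values))
        g = -y / (1.0 + np.exp(y * z))      # derivative of log(1 + exp(-y*z)) w.r.t. z
        for j, v in zip(indices, values):
            delta = -self.lr * g * v        # the update the dense weight w_j would receive
            for r in range(len(self.table)):
                self.table[r, self._bucket(r, j)] += self._sign(r, j) * delta
```

Point queries cost one lookup per row plus a median, and the table size is independent of the feature dimension.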
no code implementations • 25 Jun 2017 • Vatsal Sharan, Kai Sheng Tai, Peter Bailis, Gregory Valiant
What learning algorithms can be run directly on compressively-sensed data?
no code implementations • ICML 2017 • Vatsal Sharan, Gregory Valiant
The popular Alternating Least Squares (ALS) algorithm for tensor decomposition is efficient and easy to implement, but often converges to poor local optima---particularly when the weights of the factors are non-uniform.
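For reference, a minimal numpy sketch of the plain ALS loop for a 3-way CP decomposition; the test tensor and iteration count are assumptions of this sketch, and it illustrates the baseline algorithm discussed above rather than any modification proposed in the paper.

```python
import numpy as np

def als_cp(T, rank, n_iters=100, seed=0):
    """Plain ALS for a rank-`rank` CP decomposition of a 3-way tensor: fix two
    factor matrices and solve a least-squares problem for the third, in turn."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.normal(size=(I, rank))
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))

    def khatri_rao(X, Y):
        # Column-wise Kronecker product, shape (X.shape[0] * Y.shape[0], rank).
        return (X[:, None, :] * Y[None, :, :]).reshape(-1, X.shape[1])

    for _ in range(n_iters):
        A = np.linalg.lstsq(khatri_rao(B, C), T.reshape(I, -1).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C),
                            np.transpose(T, (1, 0, 2)).reshape(J, -1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B),
                            np.transpose(T, (2, 0, 1)).reshape(K, -1).T, rcond=None)[0].T
    return A, B, C

# Recover a planted rank-3 tensor; the relative error should be small.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.normal(size=(10, 3)) for _ in range(3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = als_cp(T, rank=3)
print(np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T))
```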
no code implementations • 8 Dec 2016 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant
For a Hidden Markov Model with $n$ hidden states, the mutual information $I$ between past and future observations is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length-$O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.
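A simplified sketch of the window-based predictor described above: estimate the distribution of the next symbol from the empirical frequencies of what followed the most recent length-$\ell$ window earlier in the sequence (the toy sequence is an assumption of this sketch).

```python
from collections import Counter, defaultdict

def window_frequency_predictor(sequence, window_len):
    """Estimate next-symbol probabilities from empirical frequencies of what
    followed each length-`window_len` window earlier in the sequence."""
    counts = defaultdict(Counter)
    for t in range(window_len, len(sequence)):
        counts[tuple(sequence[t - window_len:t])][sequence[t]] += 1

    def predict(context):
        seen = counts.get(tuple(context[-window_len:]))
        if not seen:
            return None                     # unseen window: no estimate
        total = sum(seen.values())
        return {symbol: c / total for symbol, c in seen.items()}

    return predict

# Toy example: a periodic sequence over the alphabet {a, b, c}.
seq = list("abcabcabc" * 50)
predict = window_frequency_predictor(seq, window_len=2)
print(predict(list("ab")))                  # {'c': 1.0}
```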