no code implementations • 5 Dec 2023 • Spencer Compton, Gregory Valiant

Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean?
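As background for this setting, here is a minimal sketch (mine, not the paper's answer) of the classical plug-in baseline: estimate each variable's variance from its own samples, then take an inverse-variance weighted mean. Whether such plug-in weighting is actually the best algorithm is the kind of question the paper studies; all group sizes and parameters below are illustrative.

```python
import numpy as np

def weighted_common_mean(groups):
    """Inverse-variance weighted estimate of a common mean.

    groups: list of 1-D arrays, each holding samples from N(mu, sigma_i^2)
    with a shared mu but a different, unknown sigma_i.
    """
    means = np.array([g.mean() for g in groups])
    # Plug-in estimate of the variance of each group's sample mean.
    var_of_mean = np.array([g.var(ddof=1) / len(g) for g in groups])
    w = 1.0 / var_of_mean
    return float(np.sum(w * means) / np.sum(w))

rng = np.random.default_rng(0)
mu = 2.0
sigmas = rng.uniform(0.5, 5.0, size=50)          # unknown, heterogeneous scales
groups = [rng.normal(mu, s, size=20) for s in sigmas]
est = weighted_common_mean(groups)
```

Because low-variance groups get large weights, this estimator typically beats the plain pooled mean when the scales differ widely.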

no code implementations • 19 Nov 2023 • Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant

From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance.
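To make the $c=1$ learning claim concrete, a toy sketch (my own, with illustrative sizes): pooling one sample from each of $m$ distributions and taking the empirical histogram gives an unbiased estimate of $\textbf{p}_{\mathrm{avg}}$, with TV error shrinking as $m$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 200_000, 5                       # number of distributions, support size

# Random distributions p_1, ..., p_m over k symbols, and their average.
P = rng.dirichlet(np.ones(k), size=m)
p_avg = P.mean(axis=0)

# One sample from each distribution (c = 1), drawn via inverse-CDF.
cum = P.cumsum(axis=1)
u = rng.random(m)
samples = np.minimum((u[:, None] > cum).sum(axis=1), k - 1)

# Pooled empirical histogram and its TV distance to the true average.
p_hat = np.bincount(samples, minlength=k) / m
tv = 0.5 * np.abs(p_hat - p_avg).sum()
```

Each pooled sample is marginally distributed as $\textbf{p}_{\mathrm{avg}}$, so this reduces to ordinary density estimation from $m$ draws.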

no code implementations • 6 Jun 2023 • Steven Cao, Percy Liang, Gregory Valiant

We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns.

1 code implementation • 1 Aug 2022 • Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant

To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class?
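A sketch of how such in-context data can be constructed for the linear-function class (data generation only; the model training itself lives in the paper's released code, and the dimensions below are illustrative):

```python
import numpy as np

def sample_linear_prompt(rng, d=8, n_points=16):
    """One in-context prompt: pairs (x_i, w.x_i) for a random hidden linear function."""
    w = rng.standard_normal(d)              # the hidden function f(x) = w.x
    X = rng.standard_normal((n_points, d))  # in-context inputs
    y = X @ w                               # their labels under f
    return X, y, w

rng = np.random.default_rng(0)
X, y, w = sample_linear_prompt(rng)
# A model trained on many such prompts should, given the first j pairs and a
# query x_{j+1}, predict y_{j+1} -- i.e., in-context learn the hidden w.
```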

no code implementations • 29 Mar 2022 • Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant

We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - \delta}$ bits of memory must make at least $\tilde{\Omega}(d^{1 + (4/3)\delta})$ first-order queries (for any constant $\delta \in [0, 1/4]$).

no code implementations • 12 Jan 2022 • Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant

In this work, we place the sample amplification problem on a firm statistical foundation by deriving generally applicable amplification procedures, lower bound techniques and connections to existing statistical notions.

no code implementations • 4 Nov 2021 • Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan

We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square-root of the condition numbers of the components.

no code implementations • ACL 2021 • Kartik Chandra, Chuma Kabaghe, Gregory Valiant

Our results suggest that polyperceivable examples are surprisingly prevalent in natural language, existing for >2% of English words.

no code implementations • 29 Jun 2021 • Mingda Qiao, Gregory Valiant

We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time.

1 code implementation • 17 Feb 2021 • Kai Sheng Tai, Peter Bailis, Gregory Valiant

Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training.
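A minimal self-training loop, sketching the standard approach this snippet describes (this is a generic illustration, not the paper's method): a nearest-centroid base learner pseudo-labels unlabeled points whose prediction margin clears a threshold, then refits. All data and parameters are illustrative.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_with_margin(centroids, X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    order = np.sort(d, axis=1)
    margin = order[:, 1] - order[:, 0]      # confidence proxy: distance gap
    return d.argmin(axis=1), margin

def self_train(X_lab, y_lab, X_unlab, n_classes=2, rounds=5, thresh=1.0):
    X, y = X_lab.copy(), y_lab.copy()
    for _ in range(rounds):
        c = fit_centroids(X, y, n_classes)
        pred, margin = predict_with_margin(c, X_unlab)
        keep = margin > thresh              # pseudo-label only confident points
        if not keep.any():
            break
        X = np.concatenate([X_lab, X_unlab[keep]])
        y = np.concatenate([y_lab, pred[keep]])
    return fit_centroids(X, y, n_classes)

rng = np.random.default_rng(0)
# Two well-separated clusters; a handful of labels, many unlabeled points.
X0 = rng.normal(-5, 0.5, size=(100, 2))
X1 = rng.normal(+5, 0.5, size=(100, 2))
X_lab = np.vstack([X0[:3], X1[:3]])
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unlab = np.vstack([X0[3:], X1[3:]])
centroids = self_train(X_lab, y_lab, X_unlab)
pred, _ = predict_with_margin(centroids, X_unlab)
acc = (pred == np.array([0] * 97 + [1] * 97)).mean()
```

The confidence threshold is the crux: too low and early mistakes get reinforced, which is exactly the failure mode self-training research worries about.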

no code implementations • 13 Jan 2021 • Annie Marsden, John Duchi, Gregory Valiant

We study probabilistic prediction games when the underlying model is misspecified, investigating the consequences of predicting using an incorrect parametric model.

no code implementations • 7 Dec 2020 • Mingda Qiao, Gregory Valiant

In this paper, we prove an $\Omega(T^{0.528})$ lower bound on the calibration error, which, to the best of our knowledge, is the first super-$\sqrt{T}$ lower bound for this setting.

2 code implementations • ICML 2020 • Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré

We validate our proposed scheme on image and text datasets.

no code implementations • 12 Dec 2019 • Weihao Kong, Gregory Valiant, Emma Brunskill

We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting.

1 code implementation • NeurIPS 2020 • Justin Y. Chen, Gregory Valiant, Paul Valiant

Crucially, we assume that the sets $A$ and $B$ are drawn according to some known distribution $P$ over pairs of subsets of $[n]$.

4 code implementations • NeurIPS 2019 • Antonio Ginart, Melody Y. Guan, Gregory Valiant, James Zou

Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used --- the EU's Right To Be Forgotten regulation is an example of this effort.

no code implementations • 3 Jun 2019 • Melody Y. Guan, Gregory Valiant

Recent work on adversarial examples has demonstrated that most natural inputs can be perturbed to fool even state-of-the-art machine learning systems.

no code implementations • ICML 2020 • Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant

In the Gaussian case, we show that an $\left(n, n+\Theta(n/\sqrt{d})\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples.

no code implementations • 19 Apr 2019 • Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant

We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term corresponding to the sum over the data points, of the squared $\ell_2$ norm of the gradient of the model with respect to the parameter vector, evaluated at each data point.

no code implementations • 18 Apr 2019 • Vatsal Sharan, Aaron Sidford, Gregory Valiant

We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints.

no code implementations • 12 Feb 2019 • Mingda Qiao, Gregory Valiant

The algorithm is allowed to choose when to make the prediction as well as the length of the prediction window, possibly depending on the observations so far.

no code implementations • 12 Feb 2019 • Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade

Precisely, for sufficiently large $N$, the MLE achieves the information-theoretically optimal error bound of $\mathcal{O}(\frac{1}{t})$ for $t < c\log{N}$ with respect to the earth mover's distance between the estimated and true distributions.

3 code implementations • 25 Jan 2019 • Kai Sheng Tai, Peter Bailis, Gregory Valiant

How can prior knowledge on the transformation invariances of a domain be incorporated into the architecture of a neural network?

no code implementations • NeurIPS 2018 • Shivam Garg, Vatsal Sharan, Brian Hu Zhang, Gregory Valiant

This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset.

no code implementations • NeurIPS 2018 • Weihao Kong, Gregory Valiant

In this setting, we show that with $O(\sqrt{d})$ samples, one can accurately estimate the fraction of the variance of the label that can be explained via the best linear function of the data.

no code implementations • 22 Nov 2017 • Mingda Qiao, Gregory Valiant

Specifically, we consider the setting where there is some underlying distribution, $p$, and each data source provides a batch of $\ge k$ samples, with the guarantee that at least a $(1-\epsilon)$ fraction of the sources draw their samples from a distribution with total variation distance at most $\eta$ from $p$.

no code implementations • NeurIPS 2017 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.

1 code implementation • 7 Nov 2017 • Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant

We introduce a new sub-linear space sketch---the Weight-Median Sketch---for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model.

no code implementations • NeurIPS 2017 • Kevin Tian, Weihao Kong, Gregory Valiant

Consider the following estimation problem: there are $n$ entities, each with an unknown parameter $p_i \in [0, 1]$, and we observe $n$ independent random variables, $X_1, \ldots, X_n$, with $X_i \sim \mathrm{Binomial}(t, p_i)$.
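As a concrete illustration of what is recoverable in this setting, a sketch (mine, not the paper's full method) of the standard unbiased moment estimates: $\overline{X}/t$ estimates $\mathbb{E}[p]$, and the factorial moment $\overline{X(X-1)}/(t(t-1))$ estimates $\mathbb{E}[p^2]$, so the variance of the hidden $p_i$'s can be estimated even when $t$ is small.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 100_000, 5
p = rng.uniform(0.0, 1.0, size=n)           # hidden parameters p_i
X = rng.binomial(t, p)                      # one Binomial(t, p_i) per entity

m1 = X.mean() / t                           # unbiased for E[p]
m2 = (X * (X - 1)).mean() / (t * (t - 1))   # unbiased for E[p^2]
var_p = m2 - m1 ** 2                        # estimate of Var(p)
# With p ~ Uniform[0,1]: E[p] = 1/2, Var(p) = 1/12, despite only t = 5 trials each.
```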

no code implementations • 9 Aug 2017 • Michela Meister, Gregory Valiant

This setting can be viewed as an instance of the semi-verified learning model introduced in [CSV17], which explores the tradeoff between the number of items evaluated by each worker and the fraction of good evaluators.

no code implementations • 25 Jun 2017 • Vatsal Sharan, Kai Sheng Tai, Peter Bailis, Gregory Valiant

What learning algorithms can be run directly on compressively-sensed data?

no code implementations • 15 Mar 2017 • Jacob Steinhardt, Moses Charikar, Gregory Valiant

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data.

no code implementations • ICML 2017 • Vatsal Sharan, Gregory Valiant

The popular Alternating Least Squares (ALS) algorithm for tensor decomposition is efficient and easy to implement, but often converges to poor local optima---particularly when the weights of the factors are non-uniform.
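A compact sketch of plain CP-ALS for a 3-way tensor, i.e., the baseline algorithm this snippet refers to, in generic form (rank and sizes are illustrative). Each factor update is an exact least-squares solve, so the fit error is non-increasing, yet the final point may still be a poor local optimum, which is the failure mode highlighted above.

```python
import numpy as np

def cp_als(T, rank, iters=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor via alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(iters):
        # Each update is the exact least-squares solution for that factor,
        # holding the other two fixed (normal equations with a Hadamard Gram matrix).
        A = np.einsum('ijk,jr,kr->ir', T, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', T, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', T, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

def cp_reconstruct(A, B, C):
    return np.einsum('ir,jr,kr->ijk', A, B, C)

rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((s, 3)) for s in (10, 11, 12))
T = cp_reconstruct(A0, B0, C0)              # an exactly rank-3 tensor
A, B, C = cp_als(T, rank=3)
err = np.linalg.norm(T - cp_reconstruct(A, B, C)) / np.linalg.norm(T)
```

On this well-conditioned random instance ALS does fine; trouble arises when factor weights are highly non-uniform or factors are correlated.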

no code implementations • 8 Dec 2016 • Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

For a Hidden Markov Model with $n$ hidden states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length $O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.
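The "trivial" predictor in question can be sketched as a plain empirical model over length-$\ell$ windows: record which symbol follows each observed window and predict the most frequent continuation. This is a generic sketch; the toy sequence below is deterministic rather than actual HMM output.

```python
from collections import Counter, defaultdict

def fit_windows(seq, ell):
    """Empirical frequencies of the symbol following each length-ell window."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - ell):
        counts[tuple(seq[i:i + ell])][seq[i + ell]] += 1
    return counts

def predict_next(counts, context, ell):
    """Most frequent continuation of the last ell symbols, or None if unseen."""
    c = counts.get(tuple(context[-ell:]))
    return c.most_common(1)[0][0] if c else None

seq = list("abcabcabcabcabc")
counts = fit_windows(seq, ell=2)
nxt = predict_next(counts, list("ab"), ell=2)   # context "ab" is always followed by "c"
```

The result above says the window length (here $\ell = 2$ for illustration; $O(\log n/\epsilon)$ in the theorem) can be chosen independently of the mixing time.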

no code implementations • 7 Nov 2016 • Moses Charikar, Jacob Steinhardt, Gregory Valiant

For example, given a dataset of $n$ points for which an unknown subset of $\alpha n$ points are drawn from a distribution of interest, and no assumptions are made about the remaining $(1-\alpha)n$ points, is it possible to return a list of $\operatorname{poly}(1/\alpha)$ answers, one of which is correct?

no code implementations • NeurIPS 2016 • Jacob Steinhardt, Gregory Valiant, Moses Charikar

We consider a crowdsourcing model in which $n$ workers are asked to rate the quality of $n$ items previously generated by other workers.

no code implementations • 21 Feb 2016 • Qingqing Huang, Sham M. Kakade, Weihao Kong, Gregory Valiant

When can accurate reconstruction be accomplished in the sparse data regime?

1 code implementation • 30 Jan 2016 • Weihao Kong, Gregory Valiant

We consider this fundamental recovery problem in the regime where the number of samples is comparable to, or even sublinear in, the dimensionality of the distribution in question.

no code implementations • 21 Apr 2015 • Gregory Valiant, Paul Valiant

One conceptual implication of this result is that for large samples, Bayesian assumptions on the "shape" or bounds on the tail probabilities of a distribution over discrete support are not helpful for the task of learning the distribution.

no code implementations • NeurIPS 2015 • Bhaswar B. Bhattacharya, Gregory Valiant

We consider the problem of closeness testing for two discrete distributions in the practically relevant setting of \emph{unequal} sized samples drawn from each of them.

no code implementations • NeurIPS 2013 • Paul Valiant, Gregory Valiant

Recently, [Valiant and Valiant] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a \emph{sublinear}-sized sample.

no code implementations • 7 Oct 2013 • Alekh Agarwal, Sham M. Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant

This work provides simple algorithms for multi-class (and multi-label) prediction in settings where both the number of examples $n$ and the data dimension $d$ are relatively large.

no code implementations • 19 Aug 2013 • Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, Paul Valiant

We study the question of closeness testing for two discrete distributions.
