Search Results for author: Gregory Valiant

Found 43 papers, 8 papers with code

Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances

no code implementations5 Dec 2023 Spencer Compton, Gregory Valiant

Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean?
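
The gap this paper targets is easy to see in simulation. Below is a minimal numpy sketch (not the paper's algorithm) comparing the plain average against the inverse-variance-weighted mean one could compute if the variances were known:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 500, 2.0
sigmas = rng.uniform(0.1, 10.0, size=n)        # unknown, wildly different scales
x = rng.normal(mu, sigmas)                     # one observation per variance

plain = x.mean()                               # ignores the heteroskedasticity
oracle = np.average(x, weights=1 / sigmas**2)  # optimal if variances were known
print(f"plain mean: {plain:.3f}, oracle weighted mean: {oracle:.3f}")
```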

Testing with Non-identically Distributed Samples

no code implementations19 Nov 2023 Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant

From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance.

One-sided Matrix Completion from Two Observations Per Row

no code implementations6 Jun 2023 Steven Cao, Percy Liang, Gregory Valiant

We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns.

Matrix Completion
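
As a toy illustration, the following numpy sketch implements a naive unbiased moment estimator of $X^TX$ from two revealed entries per row (a simplification; the paper's algorithm additionally exploits the rank-$r$ structure when imputing missing values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 50_000, 10, 2
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, d))  # a rank-r matrix

G_hat = np.zeros((d, d))
for x in X:
    j, k = rng.choice(d, size=2, replace=False)      # two revealed entries per row
    G_hat[j, k] += x[j] * x[k] * d * (d - 1) / 2     # unbiased off-diagonal terms
    G_hat[k, j] += x[j] * x[k] * d * (d - 1) / 2
    G_hat[j, j] += x[j] ** 2 * d / 2                 # unbiased diagonal terms
    G_hat[k, k] += x[k] ** 2 * d / 2

rel_err = np.linalg.norm(G_hat - X.T @ X) / np.linalg.norm(X.T @ X)
print(f"relative error of the estimated Gram matrix: {rel_err:.3f}")
```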

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

2 code implementations1 Aug 2022 Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant

To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class?

In-Context Learning
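
Below is a minimal PyTorch sketch of this training setup (a simplification, not the authors' released code): each token carries $(x_i, y_{i-1})$, and a small causal transformer is trained to predict $y_i$ from the preceding pairs, with a fresh random linear function drawn for every prompt.

```python
import torch
import torch.nn as nn

d, n_points, batch = 8, 16, 64  # input dim, prompt length, batch size

class InContextRegressor(nn.Module):
    def __init__(self, d, width=128, layers=4, heads=4):
        super().__init__()
        self.embed = nn.Linear(d + 1, width)  # embed (x_i, y_{i-1}) jointly
        layer = nn.TransformerEncoderLayer(width, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, layers)
        self.readout = nn.Linear(width, 1)

    def forward(self, xs, ys):
        # token i sees x_i and the previous label; the causal mask hides the future
        prev_y = torch.cat([torch.zeros_like(ys[:, :1]), ys[:, :-1]], dim=1)
        tokens = self.embed(torch.cat([xs, prev_y.unsqueeze(-1)], dim=-1))
        mask = nn.Transformer.generate_square_subsequent_mask(xs.shape[1])
        return self.readout(self.backbone(tokens, mask=mask)).squeeze(-1)

model = InContextRegressor(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(1000):
    w = torch.randn(batch, d, 1)               # a fresh linear function per prompt
    xs = torch.randn(batch, n_points, d)
    ys = (xs @ w).squeeze(-1)
    loss = ((model(xs, ys) - ys) ** 2).mean()  # squared error at every position
    opt.zero_grad(); loss.backward(); opt.step()
```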

Efficient Convex Optimization Requires Superlinear Memory

no code implementations29 Mar 2022 Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant

We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - \delta}$ bits of memory must make at least $\tilde{\Omega}(d^{1 + (4/3)\delta})$ first-order queries (for any constant $\delta \in [0, 1/4]$).

On the Statistical Complexity of Sample Amplification

no code implementations12 Jan 2022 Brian Axelrod, Shivam Garg, Yanjun Han, Vatsal Sharan, Gregory Valiant

In this work, we place the sample amplification problem on a firm statistical foundation by deriving generally applicable amplification procedures, lower bound techniques and connections to existing statistical notions.

Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales

no code implementations4 Nov 2021 Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan

We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions, and we provide a method that solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square roots of the condition numbers of the components.

Beyond Laurel/Yanny: An Autoencoder-Enabled Search for Polyperceivable Audio

no code implementations ACL 2021 Kartik Chandra, Chuma Kabaghe, Gregory Valiant

Our results suggest that polyperceivable examples are surprisingly prevalent in natural language, existing for >2% of English words.

Exponential Weights Algorithms for Selective Learning

no code implementations29 Jun 2021 Mingda Qiao, Gregory Valiant

We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time.

Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training

1 code implementation17 Feb 2021 Kai Sheng Tai, Peter Bailis, Gregory Valiant

Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training.

Classification, General Classification
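
The Sinkhorn component can be sketched as entropic optimal transport: allocate pseudo-labels so that each example's row is a distribution over classes while column totals match assumed class fractions. The function below is a hedged sketch of that idea only; it omits the paper's annealing schedule and exact formulation:

```python
import numpy as np

def sinkhorn_pseudo_labels(probs, class_frac, eps=0.1, n_iter=200):
    """Turn model predictions `probs` (n x k, rows summing to 1) into soft
    pseudo-labels whose column sums match `class_frac` via Sinkhorn
    iterations (a sketch of the idea, not the paper's exact procedure)."""
    n = probs.shape[0]
    K = (probs + 1e-12) ** (1.0 / eps)        # kernel exp(-cost/eps), cost = -log p
    u = np.ones(n)
    for _ in range(n_iter):
        v = class_frac / (K.T @ u)            # match column (class) marginals
        u = (1.0 / n) / (K @ v)               # match row (example) marginals
    return n * (u[:, None] * K * v[None, :])  # each row sums to 1
```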

On Misspecification in Prediction Problems and Robustness via Improper Learning

no code implementations13 Jan 2021 Annie Marsden, John Duchi, Gregory Valiant

We study probabilistic prediction games when the underlying model is misspecified, investigating the consequences of predicting using an incorrect parametric model.

Stronger Calibration Lower Bounds via Sidestepping

no code implementations7 Dec 2020 Mingda Qiao, Gregory Valiant

In this paper, we prove an $\Omega(T^{0.528})$ bound on the calibration error, which, to the best of our knowledge, is the first super-$\sqrt{T}$ lower bound for this setting.

Sublinear Optimal Policy Value Estimation in Contextual Bandits

no code implementations12 Dec 2019 Weihao Kong, Gregory Valiant, Emma Brunskill

We study the problem of estimating the expected reward of the optimal policy in the stochastic disjoint linear bandit setting.

Multi-Armed Bandits

Worst-Case Analysis for Randomly Collected Data

1 code implementation NeurIPS 2020 Justin Y. Chen, Gregory Valiant, Paul Valiant

Crucially, we assume that the sets $A$ and $B$ are drawn according to some known distribution $P$ over pairs of subsets of $[n]$.

Making AI Forget You: Data Deletion in Machine Learning

4 code implementations NeurIPS 2019 Antonio Ginart, Melody Y. Guan, Gregory Valiant, James Zou

Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used; the EU's Right To Be Forgotten regulation is an example of this effort.

BIG-bench Machine Learning, Clustering

A Surprising Density of Illusionable Natural Speech

no code implementations3 Jun 2019 Melody Y. Guan, Gregory Valiant

Recent work on adversarial examples has demonstrated that most natural inputs can be perturbed to fool even state-of-the-art machine learning systems.

Sample Amplification: Increasing Dataset Size even when Learning is Impossible

no code implementations ICML 2020 Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant

In the Gaussian case, we show that an $\left(n, n+\Theta(\frac{n}{\sqrt{d}})\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples.

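A naive amplifier for this Gaussian setting is sketched below purely as a baseline (the paper's procedures are more careful, in particular about how the returned samples depend on the fitted mean):

```python
import numpy as np

def naive_gaussian_amplifier(samples, m_extra, rng):
    """Return the n input samples plus m_extra draws from N(mean_hat, I).
    A hedged baseline sketch, not the paper's amplifier; per the abstract,
    m_extra = Theta(n / sqrt(d)) additional samples are achievable."""
    mean_hat = samples.mean(axis=0)
    extra = rng.normal(loc=mean_hat, scale=1.0,
                       size=(m_extra, samples.shape[1]))
    return np.concatenate([samples, extra])
```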

Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process

no code implementations19 Apr 2019 Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant

We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term corresponding to the sum over the data points, of the squared $\ell_2$ norm of the gradient of the model with respect to the parameter vector, evaluated at each data point.
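
The regularizer itself is straightforward to evaluate. A short PyTorch sketch, assuming a scalar-output model:

```python
import torch

def implicit_reg(model, xs):
    """Sum, over the data points, of the squared l2 norm of the gradient of
    the model output with respect to the parameters (the implicit
    regularization term described above), for a scalar-output model."""
    total = torch.zeros(())
    for x in xs:
        out = model(x.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, list(model.parameters()))
        total = total + sum(g.pow(2).sum() for g in grads)
    return total
```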

Memory-Sample Tradeoffs for Linear Regression with Small Error

no code implementations18 Apr 2019 Vatsal Sharan, Aaron Sidford, Gregory Valiant

We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints.

regression

A Theory of Selective Prediction

no code implementations12 Feb 2019 Mingda Qiao, Gregory Valiant

The algorithm is allowed to choose when to make the prediction as well as the length of the prediction window, possibly depending on the observations so far.

Open-Ended Question Answering

Maximum Likelihood Estimation for Learning Populations of Parameters

no code implementations12 Feb 2019 Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham M. Kakade

Precisely, for sufficiently large $N$, the MLE achieves the information-theoretically optimal error bound of $\mathcal{O}(\frac{1}{t})$ for $t < c\log{N}$, with respect to the earth mover's distance between the estimated and true distributions.

Equivariant Transformer Networks

3 code implementations25 Jan 2019 Kai Sheng Tai, Peter Bailis, Gregory Valiant

How can prior knowledge on the transformation invariances of a domain be incorporated into the architecture of a neural network?

General Classification, Image Classification

A Spectral View of Adversarially Robust Features

no code implementations NeurIPS 2018 Shivam Garg, Vatsal Sharan, Brian Hu Zhang, Gregory Valiant

This connection can be leveraged to provide both robust features, and a lower bound on the robustness of any function that has significant variance across the dataset.

Estimating Learnability in the Sublinear Data Regime

no code implementations NeurIPS 2018 Weihao Kong, Gregory Valiant

In this setting, we show that with $O(\sqrt{d})$ samples, one can accurately estimate the fraction of the variance of the label that can be explained via the best linear function of the data.

Binary Classification

Learning Discrete Distributions from Untrusted Batches

no code implementations22 Nov 2017 Mingda Qiao, Gregory Valiant

Specifically, we consider the setting where there is some underlying distribution, $p$, and each data source provides a batch of $\ge k$ samples, with the guarantee that at least a $(1-\epsilon)$ fraction of the sources draw their samples from a distribution with total variation distance at most $\eta$ from $p$.

Learning Overcomplete HMMs

no code implementations NeurIPS 2017 Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet and whose transition matrices are random regular graphs with large degree.

Sketching Linear Classifiers over Data Streams

1 code implementation7 Nov 2017 Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant

We introduce a new sub-linear space sketch, the Weight-Median Sketch, for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model.

feature selection
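
The data structure underneath the Weight-Median Sketch is a count-sketch over the weight vector. The sketch below shows that structure alone, with recovery by signed medians; it omits the gradient updates and heavy-hitter tracking of the full method:

```python
import numpy as np

class CountSketch:
    """Count-sketch over a d-dimensional weight vector: each coordinate is
    hashed into one signed counter per row, and a weight is recovered as
    the median of its signed counters."""
    def __init__(self, d, rows=5, width=256, seed=0):
        rng = np.random.default_rng(seed)
        self.h = rng.integers(0, width, size=(rows, d))   # bucket hashes
        self.s = rng.choice([-1.0, 1.0], size=(rows, d))  # sign hashes
        self.table = np.zeros((rows, width))
        self.rows = rows

    def update(self, i, delta):  # e.g. a gradient step on coordinate i
        self.table[np.arange(self.rows), self.h[:, i]] += self.s[:, i] * delta

    def query(self, i):
        return np.median(self.s[:, i] *
                         self.table[np.arange(self.rows), self.h[:, i]])
```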

Learning Populations of Parameters

no code implementations NeurIPS 2017 Kevin Tian, Weihao Kong, Gregory Valiant

Consider the following estimation problem: there are $n$ entities, each with an unknown parameter $p_i \in [0, 1]$, and we observe $n$ independent random variables, $X_1,\ldots, X_n$, with $X_i \sim $ Binomial$(t, p_i)$.

Sports Analytics
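
The setup is easy to simulate. A small numpy sketch of the model, together with the naive per-entity plug-in estimate $X_i/t$ that this line of work improves upon:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 100_000, 5
p = rng.beta(2, 2, size=n)   # unknown per-entity parameters p_i
X = rng.binomial(t, p)       # X_i ~ Binomial(t, p_i)

plug_in = X / t              # naive estimate; noisy when t is small
# 1-d earth mover's distance between the plug-in and true populations
emd = np.abs(np.sort(plug_in) - np.sort(p)).mean()
print(f"EMD of the naive plug-in estimate: {emd:.3f}")
```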

A Data Prism: Semi-Verified Learning in the Small-Alpha Regime

no code implementations9 Aug 2017 Michela Meister, Gregory Valiant

This setting can be viewed as an instance of the semi-verified learning model introduced in [CSV17], which explores the tradeoff between the number of items evaluated by each worker and the fraction of good evaluators.

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

no code implementations15 Mar 2017 Jacob Steinhardt, Moses Charikar, Gregory Valiant

We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data.

Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use

no code implementations ICML 2017 Vatsal Sharan, Gregory Valiant

The popular Alternating Least Squares (ALS) algorithm for tensor decomposition is efficient and easy to implement, but often converges to poor local optima, particularly when the weights of the factors are non-uniform.

Tensor Decomposition, Word Embeddings
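
A hedged sketch of the orthogonalization idea for a symmetric 3-way tensor (the paper also treats asymmetric tensors and analyzes when orthogonalization should stop): with orthonormal factor estimates, the usual ALS pseudo-inverse term is the identity, so each step reduces to a contraction of the tensor against the current factors.

```python
import numpy as np

def orth_als(T, r, n_iter=50, seed=0):
    """Orthogonalized ALS sketch for a symmetric 3-way tensor T: QR-
    orthonormalize the factor estimates before each ALS step, which keeps
    recovered factors from collapsing onto the same local optimum."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(T.shape[0], r))
    for _ in range(n_iter):
        A, _ = np.linalg.qr(A)                   # the orthogonalization step
        A = np.einsum('ijk,jr,kr->ir', T, A, A)  # ALS update with orthonormal A
    weights = np.linalg.norm(A, axis=0)
    return A / weights, weights
```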

Prediction with a Short Memory

no code implementations8 Dec 2016 Vatsal Sharan, Sham Kakade, Percy Liang, Gregory Valiant

For a Hidden Markov Model with $n$ hidden states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time, and we show that the trivial prediction algorithm based on the empirical frequencies of length $O(\log n/\epsilon)$ windows of observations achieves this error, provided the length of the sequence is $d^{\Omega(\log n/\epsilon)}$, where $d$ is the size of the observation alphabet.
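
The "trivial" predictor referred to here is just a frequency table over length-$\ell$ windows; a minimal sketch:

```python
from collections import Counter, defaultdict

def window_predictor(seq, ell):
    """Predict the next symbol from the empirical frequencies of what
    followed each length-ell window in the training sequence (the trivial
    algorithm above, with ell = O(log n / epsilon))."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - ell):
        counts[tuple(seq[i:i + ell])][seq[i + ell]] += 1

    def predict(context):
        c = counts[tuple(context[-ell:])]
        return c.most_common(1)[0][0] if c else None

    return predict
```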

Learning from Untrusted Data

no code implementations7 Nov 2016 Moses Charikar, Jacob Steinhardt, Gregory Valiant

For example, given a dataset of $n$ points for which an unknown subset of $\alpha n$ points are drawn from a distribution of interest, and no assumptions are made about the remaining $(1-\alpha)n$ points, is it possible to return a list of $\operatorname{poly}(1/\alpha)$ answers, one of which is correct?

Stochastic Optimization

Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction

no code implementations NeurIPS 2016 Jacob Steinhardt, Gregory Valiant, Moses Charikar

We consider a crowdsourcing model in which $n$ workers are asked to rate the quality of $n$ items previously generated by other workers.

Spectrum Estimation from Samples

1 code implementation30 Jan 2016 Weihao Kong, Gregory Valiant

We consider this fundamental recovery problem in the regime where the number of samples is comparable to, or even sublinear in, the dimensionality of the distribution in question.

Instance Optimal Learning

no code implementations21 Apr 2015 Gregory Valiant, Paul Valiant

One conceptual implication of this result is that for large samples, Bayesian assumptions on the "shape" or bounds on the tail probabilities of a distribution over discrete support are not helpful for the task of learning the distribution.

Testing Closeness With Unequal Sized Samples

no code implementations NeurIPS 2015 Bhaswar B. Bhattacharya, Gregory Valiant

We consider the problem of closeness testing for two discrete distributions in the practically relevant setting of \emph{unequal} sized samples drawn from each of them.

Estimating the Unseen: Improved Estimators for Entropy and other Properties

no code implementations NeurIPS 2013 Paul Valiant, Gregory Valiant

Recently, [Valiant and Valiant] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear-sized sample.

Least Squares Revisited: Scalable Approaches for Multi-class Prediction

no code implementations7 Oct 2013 Alekh Agarwal, Sham M. Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant

This work provides simple algorithms for multi-class (and multi-label) prediction in settings where both the number of examples $n$ and the data dimension $d$ are relatively large.
