no code implementations • 11 Feb 2024 • Itay Safran, Daniel Reichman, Paul Valiant
We prove an exponential separation between depth 2 and depth 3 neural networks, when approximating an $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution with support in $[0, 1]^{d}$, assuming exponentially bounded weights.
no code implementations • 21 Nov 2023 • Trung Dang, Jasper C. H. Lee, Maoyuan Song, Paul Valiant
The state-of-the-art results for mean estimation in $\mathbb{R}$ are (1) the optimal sub-Gaussian mean estimator of [LV22], which achieves the tight sub-Gaussian constant for all distributions with finite but unknown variance, and (2) the analysis of the median-of-means algorithm by [BCL13] together with a lower bound by [DLLO16], which characterize the big-O-optimal errors for distributions possessing only a $1+\alpha$ moment, for $\alpha \in (0, 1)$.
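The median-of-means algorithm referenced above is simple enough to state in a few lines; here is a minimal sketch (the function name, group count, and heavy-tailed test distribution are our illustrative choices):

```python
import numpy as np

def median_of_means(samples, k):
    """Median-of-means: split the samples into k groups and return the
    median of the k group means. With k ~ log(1/delta) groups this is
    robust to heavy tails whenever the variance is finite."""
    samples = np.asarray(samples, dtype=float)
    groups = np.array_split(samples, k)
    return float(np.median([g.mean() for g in groups]))

# Example: heavy-tailed Pareto samples with shape 1.5 (finite mean 3,
# infinite variance), where the plain sample mean is far less stable.
rng = np.random.default_rng(0)
x = rng.pareto(1.5, size=10_000) + 1.0
print(median_of_means(x, k=20), x.mean())
```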
no code implementations • 18 Jul 2023 • Itay Safran, Daniel Reichman, Paul Valiant
Our depth separation results are facilitated by a new lower bound for depth 2 networks approximating the maximum function over the uniform distribution, assuming an exponential upper bound on the size of the weights.
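The maximum function is a natural benchmark here because pairwise maxima are exactly computable by tiny ReLU networks via the identity $\max(a, b) = b + \mathrm{ReLU}(a - b)$, so composing them in a balanced tree computes the max of $d$ inputs at depth $O(\log d)$; the separation results ask how much width a fixed shallow depth needs instead. A small numeric sketch of this identity (the helper names are ours):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max2(a, b):
    # Exact identity: max(a, b) = b + relu(a - b), i.e., a pairwise max
    # is a one-hidden-layer ReLU computation.
    return b + relu(a - b)

def max_d(xs):
    # Balanced tree of pairwise maxes: depth O(log d) in the number of
    # inputs. The depth-separation results concern how much width a
    # *fixed* small depth (2 vs. 3) needs to approximate this instead.
    xs = list(xs)
    while len(xs) > 1:
        xs = [max2(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)] + \
             (xs[-1:] if len(xs) % 2 else [])
    return xs[0]

print(max_d([0.3, 0.9, 0.1, 0.7]))  # 0.9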
no code implementations • 6 Jun 2022 • Shivam Gupta, Jasper C. H. Lee, Eric Price, Paul Valiant
We consider 1-dimensional location estimation, where we estimate a parameter $\lambda$ from $n$ samples $\lambda + \eta_i$, with each $\eta_i$ drawn i.i.d. from a known distribution.
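A minimal simulation of this setting, with Laplace noise and a grid-search maximum-likelihood estimate as illustrative stand-ins (not the paper's estimator); for Laplace noise the MLE coincides with the sample median:

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true = 2.5
n = 500
# Known noise distribution: standard Laplace (an illustrative choice).
samples = lam_true + rng.laplace(0.0, 1.0, size=n)

def laplace_log_lik(lam, x):
    # Log-likelihood of location lam under Laplace(0, 1) noise.
    return -np.sum(np.abs(x - lam))

# Maximum-likelihood estimate via grid search; for Laplace noise the
# MLE is the sample median, which the grid search recovers.
grid = np.linspace(samples.min(), samples.max(), 10_001)
lam_hat = grid[np.argmax([laplace_log_lik(g, samples) for g in grid])]
print(lam_hat, np.median(samples))
```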
no code implementations • 17 Nov 2020 • Jasper C. H. Lee, Paul Valiant
We revisit the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence: intuitively, "our estimator, on any distribution, is as accurate as the sample mean is for the Gaussian distribution of matching variance."
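One way to read the benchmark: for a Gaussian of variance $\sigma^2$, the sample mean deviates from the true mean by more than $\sigma\sqrt{2\ln(2/\delta)/n}$ with probability at most $\delta$. A hedged simulation measuring how the plain sample mean fares against that radius on heavy-tailed data (the Student-$t$ test distribution and all parameters are our choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, delta = 200, 20_000, 0.01

# Gaussian benchmark: for N(mu, sigma^2), the sample mean misses mu by
# more than sigma * sqrt(2 * log(2/delta) / n) with probability <= delta.
sigma = np.sqrt(2.0)  # matches the variance of the t(4) data below
benchmark = sigma * np.sqrt(2 * np.log(2 / delta) / n)

# Heavy-tailed data: Student-t with 4 degrees of freedom (mean 0,
# variance nu/(nu-2) = 2). The plain sample mean need not match the
# Gaussian benchmark here; a sub-Gaussian estimator is required to.
errs = np.abs(rng.standard_t(4, size=(trials, n)).mean(axis=1))
print("benchmark radius:", benchmark)
print("empirical exceedance prob:", (errs > benchmark).mean(), "vs delta =", delta)
```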
1 code implementation • NeurIPS 2020 • Justin Y. Chen, Gregory Valiant, Paul Valiant
Crucially, we assume that the sets $A$ and $B$ are drawn according to some known distribution $P$ over pairs of subsets of $[n]$.
no code implementations • 19 Apr 2019 • Jasper C. H. Lee, Paul Valiant
Given a mixture between two populations of coins, "positive" coins that each have -- unknown and potentially different -- bias $\geq\frac{1}{2}+\Delta$ and "negative" coins with bias $\leq\frac{1}{2}-\Delta$, we consider the task of estimating the fraction $\rho$ of positive coins to within additive error $\epsilon$.
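A minimal simulation of the setting, using a naive flip-each-coin-many-times-and-take-a-majority baseline (an illustrative strawman, not the paper's sample-efficient algorithm):

```python
import numpy as np

rng = np.random.default_rng(3)
m, Delta, rho = 2_000, 0.1, 0.3  # coins, bias gap, positive fraction

# Mixture: "positive" coins have bias >= 1/2 + Delta, "negative" coins
# have bias <= 1/2 - Delta (here all biases sit exactly at the gap).
is_pos = rng.random(m) < rho
bias = np.where(is_pos, 0.5 + Delta, 0.5 - Delta)

# Naive baseline: flip each coin k times and classify by majority.
# Achieving additive error eps this way costs ~ m * k flips in total;
# the paper studies how few flips suffice, including adaptively.
k = 101
heads = rng.binomial(k, bias)
rho_hat = np.mean(heads > k / 2)
print(rho_hat, "vs true", rho)
```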
no code implementations • 19 Apr 2019 • Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant
We characterize the behavior of the training dynamics near any parameter vector achieving zero training error, in terms of an implicit regularization term: the sum, over the data points, of the squared $\ell_2$ norm of the gradient of the model with respect to the parameter vector, evaluated at each data point.
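A sketch of computing this regularizer for a toy model (the two-layer $\tanh$ model and the finite-difference gradient are our illustrative choices):

```python
import numpy as np

def model(theta, x):
    # Tiny illustrative model: f(theta, x) = w2 . tanh(w1 * x).
    w1, w2 = theta[:3], theta[3:]
    return float(w2 @ np.tanh(w1 * x))

def grad_theta(theta, x, h=1e-6):
    # Central finite-difference gradient of the model output w.r.t. theta.
    g = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta); e[j] = h
        g[j] = (model(theta + e, x) - model(theta - e, x)) / (2 * h)
    return g

def implicit_reg(theta, xs):
    # The regularizer from the result above: the sum over data points of
    # the squared l2 norm of grad_theta f(theta, x_i).
    return sum(np.dot(g, g) for g in (grad_theta(theta, x) for x in xs))

theta = np.array([0.5, -1.0, 2.0, 1.0, 0.3, -0.7])
xs = [0.1, -0.4, 0.9]
print(implicit_reg(theta, xs))
```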
no code implementations • 21 Apr 2015 • Gregory Valiant, Paul Valiant
One conceptual implication of this result is that for large samples, Bayesian assumptions on the "shape" or bounds on the tail probabilities of a distribution over discrete support are not helpful for the task of learning the distribution.
no code implementations • NeurIPS 2013 • Paul Valiant, Gregory Valiant
Recently, [Valiant and Valiant] showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear-sized sample.
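For contrast, a sketch of the naive empirical plug-in estimator for entropy, whose bias is severe precisely in the sublinear-sample regime these results address (the uniform test distribution and parameters are our choices):

```python
import numpy as np

def plugin_entropy(samples):
    # Naive "plug-in" estimator: the entropy of the empirical frequency
    # distribution. It is badly biased when the sample is much smaller
    # than the support, which is exactly the regime the sublinear
    # estimators above are designed for.
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(4)
support = 100_000                        # uniform over a large support
x = rng.integers(support, size=10_000)   # sublinear sample: n << support
print(plugin_entropy(x), "vs true", np.log(support))
```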
no code implementations • 19 Aug 2013 • Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, Paul Valiant
We study the question of closeness testing for two discrete distributions.
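A hedged sketch of an $\ell_2$-flavored collision statistic in the spirit of this line of work, under Poissonized sampling (this is an illustration, not the paper's exact $\ell_1$ tester):

```python
import numpy as np

def closeness_stat(x_counts, y_counts):
    # Poissonized collision-style statistic: with X_i ~ Poi(m * p_i) and
    # Y_i ~ Poi(m * q_i) independent, each term (X_i - Y_i)^2 - X_i - Y_i
    # has expectation m^2 * (p_i - q_i)^2, so the sum is an unbiased
    # estimate of m^2 * ||p - q||_2^2. Large values suggest p != q.
    d = x_counts - y_counts
    return float(np.sum(d * d - x_counts - y_counts))

rng = np.random.default_rng(5)
k, m = 1_000, 5_000
p = np.full(k, 1 / k)
q = np.where(np.arange(k) < k // 2, 1.5 / k, 0.5 / k)  # perturbed copy of p
x = rng.poisson(m * p)
y_same = rng.poisson(m * p)   # samples from the same distribution p
y_diff = rng.poisson(m * q)   # samples from q != p
print(closeness_stat(x, y_same), closeness_stat(x, y_diff))
```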