Making Progress Based on False Discoveries

19 Apr 2022 · Roi Livni ·

The study of adaptive data analysis examines how many statistical queries can be answered accurately using a fixed dataset while avoiding false discoveries (statistically inaccurate answers). In this paper, we tackle a question that precedes the field of study: Is data only valuable when it provides accurate answers to statistical queries? To answer this question, we use Stochastic Convex Optimization as a case study. In this model, algorithms are considered as analysts who query an estimate of the gradient of a noisy function at each iteration and move towards its minimizer. It is known that $O(1/\epsilon^2)$ examples can be used to minimize the objective function, but none of the existing methods depend on the accuracy of the estimated gradients along the trajectory. Therefore, we ask: How many samples are needed to minimize a noisy convex function if we require $\epsilon$-accurate estimates of $O(1/\epsilon^2)$ gradients? Or, might it be that inaccurate gradient estimates are \emph{necessary} for finding the minimum of a stochastic convex function at an optimal statistical rate? We provide two partial answers to this question. First, we show that a general analyst (queries that may be maliciously chosen) requires $\Omega(1/\epsilon^3)$ samples, ruling out the possibility of a foolproof mechanism. Second, we show that, under certain assumptions on the oracle, $\tilde \Omega(1/\epsilon^{2.5})$ samples are necessary for gradient descent to interact with the oracle. Our results are in contrast to classical bounds that show that $O(1/\epsilon^2)$ samples can optimize the population risk to an accuracy of $O(\epsilon)$, but with spurious gradients.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Datasets

Add Datasets introduced or used in this paper

Results from the Paper

Add Remove

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

NON

Edit Social Preview

Making Progress Based on False Discoveries

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove