Search Results for author: Aaron Sidford

Found 51 papers, 8 papers with code

Matrix Completion in Almost-Verification Time

no code implementations7 Aug 2023 Jonathan A. Kelner, Jerry Li, Allen Liu, Aaron Sidford, Kevin Tian

In the well-studied setting where $\mathbf{M}$ has incoherent row and column spans, our algorithms complete $\mathbf{M}$ to high precision from $mr^{2+o(1)}$ observations in $mr^{3 + o(1)}$ time (omitting logarithmic factors in problem parameters), improving upon the prior state-of-the-art [JN15] which used $\approx mr^5$ samples and $\approx mr^7$ time.

Low-Rank Matrix Completion

Moments, Random Walks, and Limits for Spectrum Approximation

no code implementations2 Jul 2023 Yujia Jin, Christopher Musco, Aaron Sidford, Apoorv Vikram Singh

We study lower bounds for the problem of approximating a one dimensional distribution given (noisy) measurements of its moments.

ReSQueing Parallel and Private Stochastic Convex Optimization

no code implementations1 Jan 2023 Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator.

On the Efficient Implementation of High Accuracy Optimality of Profile Maximum Likelihood

no code implementations13 Oct 2022 Moses Charikar, Zhihao Jiang, Kirankumar Shiragur, Aaron Sidford

We provide an efficient unified plug-in approach for estimating symmetric properties of distributions given $n$ independent samples.

RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

1 code implementation17 Jun 2022 Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

The accelerated proximal point algorithm (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization).
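
For a concrete picture of the reduction, here is a minimal NumPy sketch of a Catalyst/APPA-style outer loop; the inner solver approx_prox, its accuracy, and the FISTA-style momentum schedule are illustrative assumptions, not the refined scheme proposed in the paper.

import numpy as np

def catalyst_outer_loop(approx_prox, x0, iters):
    """Accelerated proximal-point wrapper (sketch).

    approx_prox(center) is assumed to approximately solve the regularized
    subproblem argmin_x f(x) + (kappa/2) * ||x - center||^2.
    """
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()
    alpha = 1.0
    for _ in range(iters):
        # Approximate proximal point computation at the extrapolated center.
        x = approx_prox(y)
        # Nesterov-style momentum on the proximal-point iterates.
        alpha_next = (1.0 + np.sqrt(1.0 + 4.0 * alpha ** 2)) / 2.0
        y = x + ((alpha - 1.0) / alpha_next) * (x - x_prev)
        x_prev, alpha = x, alpha_next
    return x_prev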

Efficient Convex Optimization Requires Superlinear Memory

no code implementations29 Mar 2022 Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant

We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, $1$-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25 - \delta}$ bits of memory must make at least $\tilde{\Omega}(d^{1 + (4/3)\delta})$ first-order queries (for any constant $\delta \in [0, 1/4]$).

Semi-Random Sparse Recovery in Nearly-Linear Time

no code implementations8 Mar 2022 Jonathan A. Kelner, Jerry Li, Allen Liu, Aaron Sidford, Kevin Tian

We design a new iterative method tailored to the geometry of sparse recovery which is provably robust to our semi-random model.

Sharper Rates for Separable Minimax and Finite Sum Optimization via Primal-Dual Extragradient Methods

no code implementations9 Feb 2022 Yujia Jin, Aaron Sidford, Kevin Tian

We generalize our algorithms for minimax and finite sum optimization to solve a natural family of minimax finite sum optimization problems at an accelerated rate, encapsulating both above results up to a logarithmic factor.

Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales

no code implementations4 Nov 2021 Jonathan Kelner, Annie Marsden, Vatsal Sharan, Aaron Sidford, Gregory Valiant, Honglin Yuan

We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions and provide a method which solves this problem with a number of gradient evaluations that scales (up to logarithmic factors) as the product of the square roots of the condition numbers of the components.
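
In symbols (our paraphrase of that sentence, with constants and logarithmic factors suppressed): writing $\kappa_i$ for the condition number of the $i$-th unknown component, the stated gradient-evaluation count is

\[ \tilde{O}\left( \prod_{i=1}^{m} \sqrt{\kappa_i} \right). \]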

Stochastic Bias-Reduced Gradient Methods

no code implementations NeurIPS 2021 Hilal Asi, Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function.

Stochastic Optimization

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs

no code implementations13 Jun 2021 Yujia Jin, Aaron Sidford

We prove new upper and lower bounds for sample complexity of finding an $\epsilon$-optimal policy of an infinite-horizon average-reward Markov decision process (MDP) given access to a generative model.

Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss

no code implementations4 May 2021 Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots, f_N$.

Minimum Cost Flows, MDPs, and $\ell_1$-Regression in Nearly Linear Time for Dense Instances

no code implementations14 Jan 2021 Jan van den Brand, Yin Tat Lee, Yang P. Liu, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, Di Wang

In the special case of the minimum cost flow problem on $n$-vertex $m$-edge graphs with integer polynomially-bounded costs and capacities we obtain a randomized method which solves the problem in $\tilde{O}(m+n^{1.5})$ time.

Data Structures and Algorithms; Optimization and Control

Relative Lipschitzness in Extragradient Methods and a Direct Recipe for Acceleration

no code implementations12 Nov 2020 Michael B. Cohen, Aaron Sidford, Kevin Tian

We show that standard extragradient methods (i.e., mirror prox and dual extrapolation) recover optimal accelerated rates for first-order minimization of smooth convex functions.

regression
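
For readers unfamiliar with the method named above, here is a minimal Euclidean sketch of the basic extragradient step for minimizing a smooth convex $f$; the accelerated rates in the paper come from the relative-Lipschitzness analysis, not from this plain update, and the step size below is a placeholder.

import numpy as np

def extragradient(grad, x0, step, iters):
    """Plain (Euclidean) extragradient / mirror-prox iteration: take a gradient
    step to a lookahead point, then update from the original point using the
    gradient evaluated at the lookahead."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_half = x - step * grad(x)        # extrapolation (lookahead) step
        x = x - step * grad(x_half)        # update using the lookahead gradient
    return x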

Instance Based Approximations to Profile Maximum Likelihood

no code implementations NeurIPS 2020 Nima Anari, Moses Charikar, Kirankumar Shiragur, Aaron Sidford

In this paper we provide a new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution, a prominent quantity in symmetric property estimation.

Large-Scale Methods for Distributionally Robust Optimization

1 code implementation NeurIPS 2020 Daniel Levy, Yair Carmon, John C. Duchi, Aaron Sidford

We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets.
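
As a concrete (and heavily simplified) illustration of the CVaR objective these algorithms target: the empirical CVaR at level $\alpha$ is the average of the worst $\alpha$-fraction of losses. The sketch below computes that quantity and the corresponding worst-case weights; it is not the paper's algorithm, only the objective it optimizes, and it ignores the fractional-weight correction when $\alpha n$ is not an integer.

import numpy as np

def empirical_cvar(losses, alpha):
    """Average of the worst alpha-fraction of losses (empirical CVaR),
    plus the distributionally robust weights that attain it."""
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    k = max(1, int(np.ceil(alpha * n)))      # number of worst samples
    worst = np.argsort(losses)[-k:]          # indices of the k largest losses
    weights = np.zeros(n)
    weights[worst] = 1.0 / k                 # uniform weight over the worst k
    return losses[worst].mean(), weights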

Coordinate Methods for Matrix Games

no code implementations17 Sep 2020 Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

For linear regression with an elementwise nonnegative matrix, our guarantees improve on exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/(m+n)}$.

regression

Efficiently Solving MDPs with Stochastic Mirror Descent

no code implementations ICML 2020 Yujia Jin, Aaron Sidford

We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model.

Fast and Near-Optimal Diagonal Preconditioning

no code implementations4 Aug 2020 Arun Jambulapati, Jerry Li, Christopher Musco, Aaron Sidford, Kevin Tian

In this paper, we revisit the decades-old problem of how to best improve $\mathbf{A}$'s condition number by left or right diagonal rescaling.

The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications for Profile Maximum Likelihood

no code implementations6 Apr 2020 Nima Anari, Moses Charikar, Kirankumar Shiragur, Aaron Sidford

For each problem we provide polynomial time algorithms that, given $n$ i.i.d. samples from a discrete distribution, achieve an approximation factor of $\exp\left(-O(\sqrt{n} \log n) \right)$, improving upon the previous best-known bound achievable in polynomial time of $\exp(-O(n^{2/3} \log n))$ (Charikar, Shiragur and Sidford, 2019).

A General Framework for Symmetric Property Estimation

1 code implementation NeurIPS 2019 Moses Charikar, Kirankumar Shiragur, Aaron Sidford

In this paper we provide a general framework for estimating symmetric properties of distributions from i.i.d. samples.

A Direct $\tilde{O}(1/\epsilon)$ Iteration Parallel Algorithm for Optimal Transport

no code implementations NeurIPS 2019 Arun Jambulapati, Aaron Sidford, Kevin Tian

Optimal transportation, or computing the Wasserstein or "earth mover's" distance between two $n$-dimensional distributions, is a fundamental primitive which arises in many learning and statistical settings.
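
For reference, the standard linear-programming formulation of the discrete problem, with cost matrix $C \in \mathbb{R}^{n \times n}$ and marginals $r, c$, is

\[ \min_{X \in \mathbb{R}^{n \times n}_{\ge 0}} \; \langle C, X \rangle \quad \text{subject to} \quad X \mathbf{1} = r, \quad X^\top \mathbf{1} = c. \]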

Principal Component Projection and Regression in Nearly Linear Time through Asymmetric SVRG

no code implementations NeurIPS 2019 Yujia Jin, Aaron Sidford

Given a data matrix $\mathbf{A} \in \mathbb{R}^{n \times d}$, principal component projection (PCP) and principal component regression (PCR), i.e., projection and regression restricted to the top eigenspace of $\mathbf{A}$, are fundamental problems in machine learning, optimization, and numerical analysis.

regression

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

no code implementations29 Aug 2019 Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye

In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors.

Q-Learning

Variance Reduction for Matrix Games

no code implementations NeurIPS 2019 Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)n}/\epsilon$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries.

Near-Optimal Methods for Minimizing Star-Convex Functions and Beyond

1 code implementation27 Jun 2019 Oliver Hinder, Aaron Sidford, Nimit S. Sohoni

This function class, which we call the class of smooth quasar-convex functions, is parameterized by a constant $\gamma \in (0, 1]$, where $\gamma = 1$ encompasses the classes of smooth convex and star-convex functions, and smaller values of $\gamma$ indicate that the function can be "more nonconvex."

Complexity of Highly Parallel Non-Smooth Convex Optimization

no code implementations NeurIPS 2019 Sébastien Bubeck, Qijia Jiang, Yin Tat Lee, Yuanzhi Li, Aaron Sidford

Namely, we consider optimization algorithms interacting with a highly parallel gradient oracle, that is, one that can answer $\mathrm{poly}(d)$ gradient queries in parallel.

A Direct $\tilde{O}(1/ε)$ Iteration Parallel Algorithm for Optimal Transport

no code implementations3 Jun 2019 Arun Jambulapati, Aaron Sidford, Kevin Tian

Optimal transportation, or computing the Wasserstein or "earth mover's" distance between two distributions, is a fundamental primitive which arises in many learning and statistical settings.

Efficient Profile Maximum Likelihood for Universal Symmetric Property Estimation

no code implementations21 May 2019 Moses Charikar, Kirankumar Shiragur, Aaron Sidford

Generalizing work of Acharya et al. 2016 on the utility of approximate PML, we show that our algorithm provides a nearly linear time universal plug-in estimator for all symmetric functions up to accuracy $\epsilon = \Omega(n^{-0.166})$.

Memory-Sample Tradeoffs for Linear Regression with Small Error

no code implementations18 Apr 2019 Vatsal Sharan, Aaron Sidford, Gregory Valiant

We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints.

regression

A Rank-1 Sketch for Matrix Multiplicative Weights

no code implementations7 Mar 2019 Yair Carmon, John C. Duchi, Aaron Sidford, Kevin Tian

We show that a simple randomized sketch of the matrix multiplicative weight (MMW) update enjoys (in expectation) the same regret bounds as MMW, up to a small constant factor.
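
To make the idea concrete, here is a minimal sketch of one natural rank-1 sketch of the MMW iterate: replace the Gibbs matrix $\exp(-\eta Y)/\mathrm{tr}\exp(-\eta Y)$ with $vv^\top/\|v\|^2$ for $v = \exp(-\eta Y/2)u$ and Gaussian $u$. This is our reading of the construction, and details of the paper's estimator and analysis may differ; forming the matrix exponential explicitly below is only for illustration, since in practice its action on $u$ would be approximated (e.g. by a Krylov method).

import numpy as np
from scipy.linalg import expm

def rank1_mmw_iterate(Y, eta, rng):
    """Rank-1 sketch of the MMW/Gibbs iterate exp(-eta*Y)/tr(exp(-eta*Y)):
    returns v v^T / ||v||^2 with v = exp(-eta*Y/2) u, u standard Gaussian."""
    d = Y.shape[0]
    u = rng.standard_normal(d)
    v = expm(-0.5 * eta * Y) @ u       # explicit expm only for illustration
    return np.outer(v, v) / (v @ v)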

Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model

no code implementations NeurIPS 2018 Aaron Sidford, Mengdi Wang, Xian Wu, Lin Yang, Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time.

Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

1 code implementation5 Jun 2018 Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time.

Optimization and Control

Leverage Score Sampling for Faster Accelerated Regression and ERM

no code implementations22 Nov 2017 Naman Agarwal, Sham Kakade, Rahul Kidambi, Yin Tat Lee, Praneeth Netrapalli, Aaron Sidford

Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{d}$, we show how to compute an $\epsilon$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1}) $ where $\kappa_{\text{sum}}=\mathrm{tr}\left(\mathbf{A}^{\top}\mathbf{A}\right)/\lambda_{\min}(\mathbf{A}^{\top}\mathbf{A})$ and $s$ is the maximum number of non-zero entries in a row of $\mathbf{A}$.

regression
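
As background on the sampling primitive in the title, below is a minimal dense, unregularized sketch of row leverage scores and leverage-score row sampling; the paper's contribution lies in how such sampling accelerates regression and ERM solvers, not in this basic computation.

import numpy as np

def leverage_scores(A):
    """Row leverage scores l_i = a_i^T (A^T A)^{-1} a_i, computed via a thin QR:
    they equal the squared row norms of Q."""
    Q, _ = np.linalg.qr(A)               # A = Q R with orthonormal columns in Q
    return np.sum(Q ** 2, axis=1)

def leverage_score_sample(A, k, rng):
    """Sample k rows with probability proportional to leverage score and rescale
    so that the sampled sketch is unbiased for A^T A."""
    scores = leverage_scores(A)
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=k, p=probs)
    return A[idx] / np.sqrt(k * probs[idx])[:, None]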

Variance Reduced Value Iteration and Faster Algorithms for Solving Markov Decision Processes

1 code implementation27 Oct 2017 Aaron Sidford, Mengdi Wang, Xian Wu, Yinyu Ye

Given a discounted Markov Decision Process (DMDP) with $|S|$ states, $|A|$ actions, discount factor $\gamma\in(0, 1)$, and rewards in the range $[-M, M]$, we show how to compute an $\epsilon$-optimal policy with probability $1 - \delta$ in time $\tilde{O}\left( \left(|S|^2 |A| + \frac{|S| |A|}{(1 - \gamma)^3} \right) \log\left( \frac{M}{\epsilon} \right) \log\left( \frac{1}{\delta} \right) \right)$.

A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

no code implementations25 Oct 2017 Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, Aaron Sidford

This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares.

Stability of the Lanczos Method for Matrix Function Approximation

1 code implementation25 Aug 2017 Cameron Musco, Christopher Musco, Aaron Sidford

In exact arithmetic, the method's error after $k$ iterations is bounded by the error of the best degree-$k$ polynomial uniformly approximating $f(x)$ on the range $[\lambda_{\min}(A), \lambda_{\max}(A)]$.

Data Structures and Algorithms; Numerical Analysis
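
For context, a minimal NumPy sketch of the Lanczos approximation to $f(A)b$ that the stability analysis concerns: run $k$ Lanczos steps to obtain an orthonormal basis $Q$ and tridiagonal $T$, then approximate $f(A)b \approx \|b\|\, Q\, f(T) e_1$. This textbook version uses no reorthogonalization, which is precisely the finite-precision behavior the paper studies; f is assumed to act elementwise on a vector of eigenvalues (e.g. np.exp).

import numpy as np

def lanczos_fA_b(A, b, f, k):
    """Approximate f(A) b with k Lanczos iterations (no reorthogonalization)."""
    n = b.size
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k)
    q = b / np.linalg.norm(b)
    q_prev, beta_prev = np.zeros(n), 0.0
    for j in range(k):
        Q[:, j] = q
        w = A @ q - beta_prev * q_prev
        alpha[j] = q @ w
        w = w - alpha[j] * q
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-14:              # Krylov space exhausted; stop early
            k = j + 1
            Q, alpha, beta = Q[:, :k], alpha[:k], beta[:k]
            break
        q_prev, q, beta_prev = q, w / beta[j], beta[j]
    # Tridiagonal T from the recurrence; apply f via its eigendecomposition.
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    evals, evecs = np.linalg.eigh(T)
    fT_e1 = evecs @ (f(evals) * evecs[0])
    return np.linalg.norm(b) * (Q @ fT_e1)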

“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions

no code implementations ICML 2017 Yair Carmon, John C. Duchi, Oliver Hinder, Aaron Sidford

We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions.

Accelerating Stochastic Gradient Descent For Least Squares Regression

no code implementations26 Apr 2017 Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

There is widespread sentiment that it is not possible to effectively utilize fast gradient methods (e.g., Nesterov's acceleration, conjugate gradient, heavy ball) for the purposes of stochastic optimization due to their instability and error accumulation, a notion made precise in d'Aspremont 2008 and Devolder, Glineur, and Nesterov 2014.

regression; Stochastic Optimization

Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness

no code implementations13 Apr 2017 Cameron Musco, Praneeth Netrapalli, Aaron Sidford, Shashanka Ubaru, David P. Woodruff

We thus effectively compute a histogram of the spectrum, which can stand in for the true singular values in many applications.

Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification

1 code implementation12 Oct 2016 Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient to both reduce the variance of the stochastic gradient estimate and parallelize SGD, and (2) tail-averaging, a method involving averaging the final few iterates of SGD to decrease the variance in SGD's final iterate.

regression
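
A minimal sketch of the two ingredients being analyzed, for least squares over a stream of examples $(a_t, b_t)$: average a mini-batch of stochastic gradients per step, and report the average of the final iterates (tail-averaging). The step size, batch size, and tail fraction below are placeholders, not the paper's prescriptions.

import numpy as np

def minibatch_tail_averaged_sgd(stream, dim, step, batch, iters, tail_frac=0.5):
    """SGD for least squares 0.5*(a^T x - b)^2 with mini-batching and
    tail-averaging of the last tail_frac fraction of iterates."""
    x = np.zeros(dim)
    tail_start = int((1.0 - tail_frac) * iters)
    tail_sum, tail_count = np.zeros(dim), 0
    for t in range(iters):
        # Mini-batching: average several stochastic gradients to cut variance.
        g = np.zeros(dim)
        for _ in range(batch):
            a, b = next(stream)                  # one streamed example (a_t, b_t)
            g += (a @ x - b) * a
        x = x - step * (g / batch)
        if t >= tail_start:                      # tail-averaging of final iterates
            tail_sum += x
            tail_count += 1
    return tail_sum / tail_count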

Faster Eigenvector Computation via Shift-and-Invert Preconditioning

no code implementations26 May 2016 Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford

We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $\Sigma$ -- i.e., computing a unit vector $x$ such that $x^\top \Sigma x \ge (1-\epsilon)\lambda_1(\Sigma)$. Offline Eigenvector Estimation: given an explicit $A \in \mathbb{R}^{n \times d}$ with $\Sigma = A^\top A$, we show how to compute an $\epsilon$-approximate top eigenvector in time $\tilde{O}\left(\left(\mathrm{nnz}(A) + \frac{d \cdot \mathrm{sr}(A)}{\mathrm{gap}^2}\right) \log(1/\epsilon)\right)$ and $\tilde{O}\left(\frac{\mathrm{nnz}(A)^{3/4} (d \cdot \mathrm{sr}(A))^{1/4}}{\sqrt{\mathrm{gap}}} \log(1/\epsilon)\right)$.

Stochastic Optimization
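
For readers unfamiliar with the shift-and-invert idea, a minimal dense sketch: pick a shift $\lambda$ slightly above $\lambda_1(\Sigma)$ and run power iteration on $(\lambda I - \Sigma)^{-1}$, whose top of the spectrum is strongly separated. The papers' contribution is solving the resulting linear systems with fast stochastic solvers; the explicit factorization below is only illustrative.

import numpy as np

def shift_and_invert_top_eigvec(Sigma, shift, iters, rng):
    """Power iteration on (shift*I - Sigma)^{-1} to estimate the top eigenvector
    of Sigma; assumes shift > lambda_1(Sigma)."""
    d = Sigma.shape[0]
    M = shift * np.eye(d) - Sigma
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = np.linalg.solve(M, x)      # apply the shifted inverse
        x /= np.linalg.norm(x)
    return x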

Principal Component Projection Without Principal Component Analysis

no code implementations22 Feb 2016 Roy Frostig, Cameron Musco, Christopher Musco, Aaron Sidford

To achieve our results, we first observe that ridge regression can be used to obtain a "smooth projection" onto the top principal components.

regression
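
The "smooth projection" observation can be made concrete in a few lines: applying $(A^\top A)(A^\top A + \lambda I)^{-1}$ to a vector keeps directions whose eigenvalues are well above $\lambda$ and damps those well below it, and each application amounts to a ridge-regression solve. The dense sketch below forms the matrices explicitly only for illustration; it is the basic observation, not the paper's full algorithm.

import numpy as np

def smooth_projection(A, x, lam):
    """Approximate projection of x onto the top eigenspace of A^T A via the
    ridge-regression operator (A^T A)(A^T A + lam*I)^{-1}: eigen-directions with
    eigenvalue >> lam pass nearly unchanged, those << lam are nearly annihilated."""
    AtA = A.T @ A
    d = AtA.shape[0]
    y = np.linalg.solve(AtA + lam * np.eye(d), x)   # one ridge solve
    return AtA @ y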

Robust Shift-and-Invert Preconditioning: Faster and More Sample Efficient Algorithms for Eigenvector Computation

no code implementations29 Oct 2015 Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford

Combining our algorithm with previous work to initialize $x_0$, we obtain a number of improved sample complexity and runtime results.

Stochastic Optimization

Competing with the Empirical Risk Minimizer in a Single Pass

no code implementations20 Dec 2014 Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or the $M$-estimator -- is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.

Uniform Sampling for Matrix Approximation

no code implementations21 Aug 2014 Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, Aaron Sidford

In addition to an improved understanding of uniform sampling, our main proof introduces a structural result of independent interest: we show that every matrix can be made to have low coherence by reweighting a small subset of its rows.

regression
