Search Results for author: Blake Woodworth

Found 26 papers, 6 papers with code

Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

no code implementations · 7 Feb 2023 · Blake Woodworth, Konstantin Mishchenko, Francis Bach

We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy.

Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

1 code implementation · 15 Jun 2022 · Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth

The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay.
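The following is a minimal sketch of the kind of asynchronous SGD update this abstract refers to. At step $t$ the update applies a gradient evaluated at a stale iterate $x_{t-\tau_t}$. The scalar quadratic, the delay schedule, and the name `async_sgd` are illustrative assumptions. The sketch does not reproduce the paper's analysis or its step-size choices.

```python
import random


def async_sgd(grad, x0, lr, delays, noise=0.0, seed=0):
    """Asynchronous SGD sketch: step t uses a gradient evaluated at the
    stale iterate x_{t - delays[t]} (clamped to x_0 at the start)."""
    rng = random.Random(seed)
    history = [x0]  # history[s] holds the iterate x_s
    x = x0
    for t, tau in enumerate(delays):
        stale = history[max(0, t - tau)]
        # Optional additive Gaussian noise models a stochastic gradient.
        g = grad(stale) + noise * rng.gauss(0.0, 1.0)
        x = x - lr * g
        history.append(x)
    return x


if __name__ == "__main__":
    # Hypothetical schedule: most updates are 1 step stale, every 7th is 5 steps stale.
    delays = [5 if t % 7 == 0 else 1 for t in range(3000)]
    # Minimize f(x) = 0.5 * (x - 3)^2, whose gradient is x - 3.
    print(async_sgd(lambda x: x - 3.0, x0=0.0, lr=0.05, delays=delays))
```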

Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares

no code implementations · 11 Apr 2022 · Blake Woodworth, Francis Bach, Alessandro Rudi

We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized.

A Stochastic Newton Algorithm for Distributed Convex Optimization

no code implementations · NeurIPS 2021 · Brian Bullins, Kumar Kshitij Patel, Ohad Shamir, Nathan Srebro, Blake Woodworth

We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products (products of an independent unbiased estimator of the Hessian of the population objective with arbitrary vectors), with many such stochastic computations performed between rounds of communication.


The Minimax Complexity of Distributed Optimization

no code implementations · 1 Sep 2021 · Blake Woodworth

I analyze the theoretical properties of the popular Local Stochastic Gradient Descent (SGD) algorithm for convex objectives, in both the homogeneous and heterogeneous settings.

Distributed Optimization

An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning

no code implementations · NeurIPS 2021 · Blake Woodworth, Nathan Srebro

We present and analyze an algorithm for optimizing smooth and convex or strongly convex objectives using minibatch stochastic gradient estimates.

Stochastic Optimization

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

no code implementations · 19 Feb 2021 · Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry

Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.

Inductive Bias

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

no code implementations · 2 Feb 2021 · Blake Woodworth, Brian Bullins, Ohad Shamir, Nathan Srebro

We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objective, and during each round of communication, each machine may sequentially compute $K$ stochastic gradient estimates.

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy

no code implementations · NeurIPS 2020 · Edward Moroshko, Suriya Gunasekar, Blake Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry

We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks".

General Classification

Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent

no code implementations · 2 Apr 2020 · Suriya Gunasekar, Blake Woodworth, Nathan Srebro

We present a primal-only derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential.
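The following is a minimal sketch of the standard discrete mirror descent update, not the paper's primal-only derivation. It assumes the negative-entropy potential $\psi(x) = \sum_i x_i \log x_i$ on the probability simplex, for which the mirror step reduces to the exponentiated-gradient update. The objective $f(x) = \frac{1}{2}\|x - p\|^2$ and the name `exponentiated_gradient` are illustrative assumptions.

```python
import math


def exponentiated_gradient(grad, x0, lr, steps):
    """Mirror descent on the probability simplex with the negative-entropy
    potential psi(x) = sum_i x_i * log(x_i).

    Since grad psi(x) = log(x) + 1, the mirror step
        grad psi(y) = grad psi(x) - lr * grad f(x)
    gives y_i = x_i * exp(-lr * g_i); the Bregman (KL) projection back onto
    the simplex is then a plain renormalization.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        y = [xi * math.exp(-lr * gi) for xi, gi in zip(x, g)]
        total = sum(y)
        x = [yi / total for yi in y]
    return x


if __name__ == "__main__":
    # Hypothetical objective f(x) = 0.5 * ||x - p||^2 with p on the simplex.
    p = [0.2, 0.5, 0.3]
    grad_f = lambda x: [xi - pi for xi, pi in zip(x, p)]
    print(exponentiated_gradient(grad_f, [1 / 3] * 3, lr=0.5, steps=1000))
```

For this potential, $\nabla^2\psi(x) = \mathrm{diag}(1/x_i)$. That is the metric tensor the Riemannian gradient-flow view in the abstract would use.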

Kernel and Rich Regimes in Overparametrized Models

1 code implementation · 20 Feb 2020 · Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Edward Moroshko, Pedro Savarese, Itay Golan, Daniel Soudry, Nathan Srebro

We provide a complete and detailed analysis for a family of simple depth-$D$ models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
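The following is a minimal sketch of the kernel-to-rich transition under several stated assumptions:

- The model is the depth-2 case of a "diagonal" model with effective weights $w = u \odot u - v \odot v$, initialized at $u = v = \alpha \mathbf{1}$. This parametrization is assumed; the paper's exact form may differ.
- The data is a single underdetermined example, $x = (1, 2)$ and $y = 2$. Its minimum-$\ell_2$ interpolator is $(0.4, 0.8)$ and its minimum-$\ell_1$ interpolator is $(0, 1)$.
- The step sizes and the name `diagonal_net_gd` are hypothetical.

Under these assumptions, small $\alpha$ should end near the sparse solution and large $\alpha$ near the $\ell_2$ one.

```python
def diagonal_net_gd(X, y, alpha, lr, steps):
    """Gradient descent on the squared loss of a depth-2 'diagonal' linear model
    with effective weights w = u*u - v*v (elementwise), initialized at u = v = alpha."""
    d = len(X[0])
    u = [alpha] * d
    v = [alpha] * d
    for _ in range(steps):
        w = [ui * ui - vi * vi for ui, vi in zip(u, v)]
        # Residuals r_n = <w, x_n> - y_n of the loss 0.5 * sum_n r_n^2.
        r = [sum(wi * xi for wi, xi in zip(w, xn)) - yn for xn, yn in zip(X, y)]
        # dL/dw_i = sum_n r_n * x_{n,i}; chain rule: dw_i/du_i = 2u_i, dw_i/dv_i = -2v_i.
        g = [sum(rn * xn[i] for rn, xn in zip(r, X)) for i in range(d)]
        u = [ui - lr * 2.0 * gi * ui for ui, gi in zip(u, g)]
        v = [vi + lr * 2.0 * gi * vi for vi, gi in zip(v, g)]
    return [ui * ui - vi * vi for ui, vi in zip(u, v)]


if __name__ == "__main__":
    X, y = [[1.0, 2.0]], [2.0]  # one example, two unknowns: underdetermined
    # Small initialization (rich regime): expected near the min-l1 solution (0, 1).
    print(diagonal_net_gd(X, y, alpha=0.01, lr=0.01, steps=5000))
    # Large initialization (kernel regime): expected near the min-l2 solution (0.4, 0.8).
    print(diagonal_net_gd(X, y, alpha=10.0, lr=1e-4, steps=5000))
```

The step size is reduced for large $\alpha$ because the curvature of the loss in $(u, v)$ grows roughly like $\alpha^2$.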

Is Local SGD Better than Minibatch SGD?

no code implementations · ICML 2020 · Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method.

Distributed Optimization
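The following is a minimal sketch contrasting local SGD and minibatch SGD under the same intermittent-communication budget: $M$ machines, $R$ rounds, and $K$ stochastic gradients per machine per round. It assumes a scalar quadratic $f(x) = \frac{1}{2}(x - 3)^2$ with Gaussian gradient noise. The oracle `stoch_grad` and all parameter values are hypothetical. The sketch only shows how each method spends its $M K R$ gradient evaluations; it does not show which method performs better.

```python
import random


def stoch_grad(x, rng, noise):
    """Hypothetical oracle: unbiased stochastic gradient of f(x) = 0.5 * (x - 3)^2."""
    return (x - 3.0) + noise * rng.gauss(0.0, 1.0)


def minibatch_sgd(x0, lr, M, K, R, noise=1.0, seed=0):
    """Each round, the M machines compute K gradients apiece at the same
    iterate; the M*K gradients are averaged into a single step."""
    rng = random.Random(seed)
    x = x0
    for _ in range(R):
        grads = [stoch_grad(x, rng, noise) for _ in range(M * K)]
        x -= lr * sum(grads) / len(grads)
    return x


def local_sgd(x0, lr, M, K, R, noise=1.0, seed=0):
    """Each round, every machine runs K sequential SGD steps starting from the
    shared iterate; the M local iterates are then averaged."""
    rng = random.Random(seed)
    x = x0
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            z = x
            for _ in range(K):
                z -= lr * stoch_grad(z, rng, noise)
            local_iterates.append(z)
        x = sum(local_iterates) / M
    return x


if __name__ == "__main__":
    # Same budget for both: M * K * R = 4 * 10 * 20 = 800 stochastic gradients.
    print("minibatch SGD:", minibatch_sgd(0.0, lr=0.5, M=4, K=10, R=20))
    print("local SGD:    ", local_sgd(0.0, lr=0.1, M=4, K=10, R=20))
```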

Lower Bounds for Non-Convex Stochastic Optimization

no code implementations · 5 Dec 2019 · Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.

Stochastic Optimization

The gradient complexity of linear regression

no code implementations · 6 Nov 2019 · Mark Braverman, Elad Hazan, Max Simchowitz, Blake Woodworth

We investigate the computational complexity of several basic linear algebra primitives, including largest eigenvector computation and linear regression, in the computational model that allows access to the data via a matrix-vector product oracle.
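In the matrix-vector product oracle model, an algorithm may touch the matrix only through products $v \mapsto Av$. Conjugate gradient is a standard algorithm of this kind. The sketch below is an illustration, not the paper's construction. It solves $Ax = b$ for a symmetric positive-definite $A$ and counts oracle calls. For linear regression, one could instead pass the normal-equations product $v \mapsto M^\top(Mv)$. The names `conjugate_gradient` and `dot` are assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, accessing A only
    through the oracle matvec(v) = A v. Returns (x, number_of_oracle_calls)."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = [0.0] * n
    r = list(b)  # residual b - A x for the starting point x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    calls = 0
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        calls += 1
        step = rs_old / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # The new direction is A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, calls


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    oracle = lambda v: [dot(row, v) for row in A]
    print(conjugate_gradient(oracle, [1.0, 2.0]))  # exact solution is (1/11, 7/11)
```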


Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory

no code implementations · 1 Jul 2019 · Blake Woodworth, Nathan Srebro

We note that known methods achieving the optimal oracle complexity for first-order convex optimization require quadratic memory, and ask whether this is necessary, and more broadly seek to characterize the minimax number of first-order queries required to optimize a convex Lipschitz function subject to a memory constraint.

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

1 code implementation · 21 Jun 2019 · Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, Blake Woodworth

We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates.


Kernel and Rich Regimes in Overparametrized Models

1 code implementation · 13 Jun 2019 · Blake Woodworth, Suriya Gunasekar, Pedro Savarese, Edward Moroshko, Itay Golan, Jason Lee, Daniel Soudry, Nathan Srebro

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e., when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution.

The Complexity of Making the Gradient Small in Stochastic Convex Optimization

no code implementations · 13 Feb 2019 · Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.

Stochastic Optimization

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints

1 code implementation · 29 Jun 2018 · Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or meet other policy goals.


Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

no code implementations · NeurIPS 2018 · Blake Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, Nathan Srebro

We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph.

Stochastic Optimization

The Everlasting Database: Statistical Validity at a Fair Price

no code implementations · NeurIPS 2018 · Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nathan Srebro

The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries.

Implicit Regularization in Matrix Factorization

no code implementations · NeurIPS 2017 · Suriya Gunasekar, Blake Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, Nathan Srebro

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix $X$ with gradient descent on a factorization of $X$.

Learning Non-Discriminatory Predictors

no code implementations · 20 Feb 2017 · Blake Woodworth, Suriya Gunasekar, Mesrob I. Ohannessian, Nathan Srebro

We consider learning a predictor which is non-discriminatory with respect to a "protected attribute" according to the notion of "equalized odds" proposed by Hardt et al. [2016].
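Under equalized odds, a binary predictor $\hat{Y}$ must have the same true-positive and false-positive rates in every group of the protected attribute $A$. That is, $P(\hat{Y} = 1 \mid Y = y, A = a)$ may not depend on $a$ for $y \in \{0, 1\}$. The following is a minimal sketch that measures the largest violation on a finite sample. The function name and the 0/1 encoding are assumptions. It only checks the constraint and is not the paper's learning procedure.

```python
from collections import defaultdict


def equalized_odds_gap(y_true, y_pred, group):
    """Largest difference, across groups, in P(pred = 1 | Y = y, A = a), maximized
    over labels y; 0 means equal TPRs and FPRs on this sample (labels in {0, 1})."""
    # counts[(y, a)] = [predicted positives, total] among examples with label y in group a
    counts = defaultdict(lambda: [0, 0])
    for t, p, a in zip(y_true, y_pred, group):
        counts[(t, a)][0] += p
        counts[(t, a)][1] += 1
    gap = 0.0
    for label in {t for t, _ in counts}:
        rates = [pos / tot for (t, _), (pos, tot) in counts.items() if t == label]
        gap = max(gap, max(rates) - min(rates))
    return gap


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
    group = ["a", "a", "a", "a", "b", "b", "b", "b"]
    # Group a: TPR 0.5, FPR 0.0; group b: TPR 1.0, FPR 0.5.
    print(equalized_odds_gap(y_true, y_pred, group))  # 0.5
```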


Tight Complexity Bounds for Optimizing Composite Objectives

no code implementations · NeurIPS 2016 · Blake Woodworth, Nathan Srebro

We provide tight upper and lower bounds on the complexity of minimizing the average of $m$ convex functions using gradient and prox oracles of the component functions.
