Search Results for author: Aaron Defazio

Found 31 papers, 16 papers with code

The Road Less Scheduled

1 code implementation • 24 May 2024 • Aaron Defazio, Xingyu Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

Existing learning rate schedules that do not require specifying the optimization stopping step T are greatly outperformed by schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing schedules entirely, while exhibiting state-of-the-art performance relative to schedules across a wide family of problems, from convex optimization to large-scale deep learning.
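The core idea can be illustrated with a minimal sketch in the spirit of the paper's schedule-free SGD, simplified here (uniform averaging weight, no warmup; the quadratic objective and all constants are illustrative): gradients are evaluated at an interpolation y between a fast SGD-like sequence z and a running average x, and x is what you evaluate.

```python
import numpy as np

def schedule_free_sgd(grad, x0, lr=0.1, beta=0.9, steps=1000):
    """Simplified schedule-free update: interpolate between an SGD-like
    iterate z and a running average x, take the gradient at the
    interpolation point y, and return the averaged sequence x."""
    z = x0.copy()                       # "fast" SGD-like sequence
    x = x0.copy()                       # averaged sequence (evaluated)
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x   # gradient evaluation point
        z = z - lr * grad(y)            # plain SGD step on z
        c = 1.0 / t                     # uniform averaging weight
        x = (1 - c) * x + c * z         # maintain the running average
    return x

# Minimal usage: minimize f(x) = ||x||^2 / 2, whose gradient is x.
x_star = schedule_free_sgd(lambda v: v, np.ones(3))
```

Note that no learning rate schedule (and hence no stopping step T) appears anywhere in the loop; the averaging plays the role that a decaying schedule normally would.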


Directional Smoothness and Gradient Methods: Convergence and Adaptivity

no code implementations • 6 Mar 2024 • Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower

We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants.

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

1 code implementation • 11 Oct 2023 • Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko

To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task.


Prodigy: An Expeditiously Adaptive Parameter-Free Learner

1 code implementation • 9 Jun 2023 • Konstantin Mishchenko, Aaron Defazio

We consider the problem of estimating the learning rate in adaptive methods, such as AdaGrad and Adam.

MoMo: Momentum Models for Adaptive Learning Rates

1 code implementation • 12 May 2023 • Fabian Schaipp, Ruben Ohana, Michael Eickenberg, Aaron Defazio, Robert M. Gower

MoMo uses momentum estimates of the losses and gradients sampled at each iteration to build a model of the loss function.

Recommendation Systems • Stochastic Optimization

Learning-Rate-Free Learning by D-Adaptation

1 code implementation • 18 Jan 2023 • Aaron Defazio, Konstantin Mishchenko

D-Adaptation is an approach to automatically setting the learning rate which asymptotically achieves the optimal rate of convergence for minimizing convex Lipschitz functions, with no back-tracking or line searches, and no additional function value or gradient evaluations per step.

Grad-GradaGrad? A Non-Monotone Adaptive Stochastic Gradient Method

no code implementations • 14 Jun 2022 • Aaron Defazio, Baoyu Zhou, Lin Xiao

The classical AdaGrad method adapts the learning rate by dividing by the square root of a sum of squared gradients.
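For reference, a minimal sketch of the classical diagonal AdaGrad update described above (the quadratic objective and constants are illustrative). Because the accumulator of squared gradients is monotone, the effective stepsize can only shrink; relaxing that monotonicity is the point of the paper.

```python
import numpy as np

def adagrad(grad, x0, lr=1.0, eps=1e-8, steps=500):
    """Classical diagonal AdaGrad: divide each step, coordinate-wise,
    by the square root of the running sum of squared gradients."""
    x = x0.copy()
    g_sq = np.zeros_like(x0)            # monotone accumulator
    for _ in range(steps):
        g = grad(x)
        g_sq += g * g
        x = x - lr * g / (np.sqrt(g_sq) + eps)
    return x

# Minimize f(x) = ||x||^2 / 2; the gradient is x itself.
x_min = adagrad(lambda v: v, np.array([2.0, -3.0]))
```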

Stochastic Polyak Stepsize with a Moving Target

no code implementations • 22 Jun 2021 • Robert M. Gower, Aaron Defazio, Michael Rabbat

MOTAPS can be seen as a variant of the Stochastic Polyak (SP) method, which also uses loss values to adjust the stepsize.

Image Classification • Translation
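As background for the moving-target idea, here is a minimal sketch of the classical Polyak stepsize, which requires knowing the optimal value f* (the quantity that MOTAPS replaces with a moving target; the quadratic below is illustrative):

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star=0.0, steps=200):
    """Polyak stepsize: step = (f(x) - f*) / ||grad f(x)||^2.
    Adapts to the local loss gap but needs the optimal value f*."""
    x = x0.copy()
    for _ in range(steps):
        g = grad(x)
        gnorm2 = float(g @ g)
        if gnorm2 == 0.0:               # already stationary
            break
        step = (f(x) - f_star) / gnorm2
        x = x - step * g
    return x

# Minimize f(x) = ||x||^2 / 2, where f* = 0 is known exactly.
x_opt = polyak_gd(lambda v: 0.5 * float(v @ v), lambda v: v,
                  np.array([3.0, -1.0]))
```

On this quadratic the Polyak step is always 1/2, so the iterates halve each step; the interesting (and hard) case is when f* is unknown, which is what motivates the moving target.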

Dual Averaging is Surprisingly Effective for Deep Learning Optimization

no code implementations • 20 Oct 2020 • Samy Jelassi, Aaron Defazio

First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks.

Stochastic Optimization

Momentum via Primal Averaging: Theoretical Insights and Learning Rate Schedules for Non-Convex Optimization

1 code implementation • 1 Oct 2020 • Aaron Defazio

Momentum methods are now used pervasively within the machine learning community for training non-convex models such as deep neural networks.

BIG-bench Machine Learning • Single Particle Analysis
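The equivalence at the heart of this paper can be checked numerically: SGD with heavy-ball momentum produces the same iterates as a plain gradient step on an auxiliary sequence z with the larger stepsize lr/(1-beta), followed by an exponential average into x. A minimal sketch (the objective and constants are illustrative):

```python
import numpy as np

def run_momentum(grad, x0, lr, beta, steps):
    """Standard SGD with heavy-ball momentum."""
    x, m = x0.copy(), np.zeros_like(x0)
    traj = []
    for _ in range(steps):
        m = beta * m + grad(x)
        x = x - lr * m
        traj.append(x.copy())
    return traj

def run_primal_averaging(grad, x0, lr, beta, steps):
    """Primal-averaging form: gradient step on z with stepsize
    lr/(1-beta), then an exponential average into x."""
    x, z = x0.copy(), x0.copy()
    traj = []
    for _ in range(steps):
        z = z - (lr / (1 - beta)) * grad(x)
        x = (1 - beta) * z + beta * x
        traj.append(x.copy())
    return traj

grad = lambda v: v                      # gradient of f(x) = ||x||^2 / 2
a = run_momentum(grad, np.array([1.0, -2.0]), 0.1, 0.9, 50)
b = run_primal_averaging(grad, np.array([1.0, -2.0]), 0.1, 0.9, 50)
```

The two trajectories coincide step for step, which is what lets averaging-based analyses transfer to momentum methods.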

Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball

no code implementations • 14 Jun 2020 • Othmane Sebbouh, Robert M. Gower, Aaron Defazio

We show that these results still hold when using stochastic line search and stochastic Polyak stepsizes, thereby giving the first proof of convergence of these methods in the non-overparametrized regime.

The Power of Factorial Powers: New Parameter settings for (Stochastic) Optimization

no code implementations • 1 Jun 2020 • Aaron Defazio, Robert M. Gower

The convergence rates for convex and non-convex optimization methods depend on the choice of a host of constants, including step sizes, Lyapunov function constants and momentum constants.

Stochastic Optimization

End-to-End Variational Networks for Accelerated MRI Reconstruction

3 code implementations • 14 Apr 2020 • Anuroop Sriram, Jure Zbontar, Tullie Murrell, Aaron Defazio, C. Lawrence Zitnick, Nafissa Yakubova, Florian Knoll, Patricia Johnson

The slow acquisition speed of magnetic resonance imaging (MRI) has led to the development of two complementary methods: acquiring multiple views of the anatomy simultaneously (parallel imaging) and acquiring fewer samples than necessary for traditional signal processing methods (compressed sensing).

Anatomy • MRI Reconstruction

MRI Banding Removal via Adversarial Training

1 code implementation • NeurIPS 2020 • Aaron Defazio, Tullie Murrell, Michael P. Recht

MRI images reconstructed from sub-sampled Cartesian data using deep learning techniques often show a characteristic banding (sometimes described as streaking), which is particularly strong in low signal-to-noise regions of the reconstructed image.

Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge

1 code implementation • 6 Jan 2020 • Florian Knoll, Tullie Murrell, Anuroop Sriram, Nafissa Yakubova, Jure Zbontar, Michael Rabbat, Aaron Defazio, Matthew J. Muckley, Daniel K. Sodickson, C. Lawrence Zitnick, Michael P. Recht

Conclusion: The challenge led to new developments in machine learning for image reconstruction, provided insight into the current state of the art in the field, and highlighted remaining hurdles for clinical adoption.

BIG-bench Machine Learning • Image Reconstruction

Scaling Laws for the Principled Design, Initialization, and Preconditioning of ReLU Networks

no code implementations • ICLR 2020 • Aaron Defazio, Léon Bottou

In this work, we describe a set of rules for the design and initialization of well-conditioned neural networks, guided by the goal of naturally balancing the diagonal blocks of the Hessian at the start of training.

Offset Sampling Improves Deep Learning based Accelerated MRI Reconstructions by Exploiting Symmetry

2 code implementations • 2 Dec 2019 • Aaron Defazio

Deep learning approaches to accelerated MRI take a matrix of sampled Fourier-space lines as input and produce a spatial image as output.

GrappaNet: Combining Parallel Imaging with Deep Learning for Multi-Coil MRI Reconstruction

1 code implementation • CVPR 2020 • Anuroop Sriram, Jure Zbontar, Tullie Murrell, C. Lawrence Zitnick, Aaron Defazio, Daniel K. Sodickson

In this paper, we present a novel method to integrate traditional parallel imaging methods into deep neural networks that is able to generate high quality reconstructions even for high acceleration factors.

MRI Reconstruction

Beyond Folklore: A Scaling Calculus for the Design and Initialization of ReLU Networks

no code implementations • 10 Jun 2019 • Aaron Defazio, Léon Bottou

We propose a system for calculating a "scaling constant" for layers and weights of neural networks.


Controlling Covariate Shift using Balanced Normalization of Weights

no code implementations • ICLR 2019 • Aaron Defazio, Léon Bottou

We introduce a new normalization technique that exhibits the fast convergence properties of batch normalization using a transformation of layer weights instead of layer outputs.

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

1 code implementation • ICLR 2019 • Aaron Defazio, Léon Bottou

The applicability of these techniques to the hard non-convex optimization problems encountered during training of modern deep neural networks is an open problem.

On the Curved Geometry of Accelerated Optimization

no code implementations • NeurIPS 2019 • Aaron Defazio

In this work we propose a differential geometric motivation for Nesterov's accelerated gradient method (AGM) for strongly-convex problems.

A Simple Practical Accelerated Method for Finite Sums

1 code implementation • NeurIPS 2016 • Aaron Defazio

We describe a novel optimization method for finite sums (such as empirical risk minimization problems) building on the recently introduced SAGA method.

New Optimisation Methods for Machine Learning

no code implementations • 9 Oct 2015 • Aaron Defazio

For problems where the structure is known but the parameters unknown, we introduce an approximate maximum likelihood learning algorithm that is capable of learning a useful subclass of Gaussian graphical models.

BIG-bench Machine Learning • Philosophy

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

5 code implementations • NeurIPS 2014 • Aaron Defazio, Francis Bach, Simon Lacoste-Julien

In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates.
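The SAGA update itself is compact: keep a table of the last gradient seen for each component function; each step uses the fresh gradient, minus its stale table entry, plus the table average, which removes most of the sampling variance. A minimal dense-gradient sketch (the two-component objective below is illustrative):

```python
import numpy as np

def saga(grads, x0, lr=0.1, epochs=200, seed=0):
    """SAGA for a finite sum f(x) = (1/n) * sum_i f_i(x), storing one
    gradient per component and maintaining their running average."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    x = x0.copy()
    table = np.array([g(x0) for g in grads])   # stored gradients
    avg = table.mean(axis=0)                   # table average
    for _ in range(n * epochs):
        j = int(rng.integers(n))
        g_new = grads[j](x)
        x = x - lr * (g_new - table[j] + avg)  # variance-reduced step
        avg = avg + (g_new - table[j]) / n     # update the average
        table[j] = g_new                       # refresh the table entry
    return x

# Minimize the average of f_1 = (x - 1)^2 / 2 and f_2 = (x + 3)^2 / 2;
# the minimizer is x = -1.
grads = [lambda v: v - 1.0, lambda v: v + 3.0]
x_hat = saga(grads, np.array([0.0]))
```

Unlike SVRG, no full-gradient passes are needed; the cost is O(n) memory for the gradient table.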

A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

no code implementations • NeurIPS 2012 • Aaron Defazio, Tibério S. Caetano

We consider the case where the structure of the graph to be reconstructed is known to be scale-free.
