Search Results for author: Ilya Loshchilov

Found 15 papers, 7 papers with code

SGDR: Stochastic Gradient Descent with Warm Restarts

17 code implementations • 13 Aug 2016 • Ilya Loshchilov, Frank Hutter

Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions.

EEG • Stochastic Optimization
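The schedule itself is simple enough to state in a few lines. A minimal sketch of SGDR-style cosine annealing with warm restarts, assuming restart periods that grow by a fixed factor; the constants are illustrative, not the paper's experimental settings.

```python
import math

def sgdr_lr(step, eta_min=1e-5, eta_max=0.1, t_0=10, t_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR-style).

    step   -- global epoch (or iteration) counter, starting at 0
    t_0    -- length of the first restart period
    t_mult -- factor by which each period grows after a restart
    """
    t_i, t_cur = t_0, step
    # Walk through completed periods to locate the position in the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

# The learning rate resets to eta_max at epochs 0, 10, 30, 70, ...
schedule = [sgdr_lr(e) for e in range(100)]
```

PyTorch ships an equivalent schedule as torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.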

Online Batch Selection for Faster Training of Neural Networks

1 code implementation • 19 Nov 2015 • Ilya Loshchilov, Frank Hutter

We investigate online batch selection strategies for two state-of-the-art methods of stochastic gradient-based optimization, AdaDelta and Adam.
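A minimal sketch of rank-based online batch selection, assuming sampling probabilities that decay exponentially with an example's loss rank; the exact decay schedule and loss bookkeeping in the paper may differ.

```python
import numpy as np

def select_batch(latest_losses, batch_size, s_e=100.0):
    """Sample a batch, favouring examples with a high recorded loss.

    latest_losses -- last known loss per training example
    s_e           -- selection pressure: ratio between the sampling probability
                     of the highest- and the lowest-loss example
    """
    n = len(latest_losses)
    order = np.argsort(-latest_losses)        # indices, highest loss first
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(n)               # rank 0 = highest loss
    probs = np.exp(-ranks * np.log(s_e) / n)  # exponential decay over ranks
    probs /= probs.sum()
    return np.random.choice(n, size=batch_size, replace=False, p=probs)
```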

Decoupled Weight Decay Regularization

20 code implementations • ICLR 2019 • Ilya Loshchilov, Frank Hutter

L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam.

Image Classification
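A minimal numpy sketch of the difference, assuming standard Adam moment updates: with L$_2$ regularization the decay term is folded into the gradient and rescaled by the adaptive denominator, whereas the decoupled variant (AdamW) shrinks the weights directly. Scaling the decoupled term by the learning rate follows common implementations, not necessarily the paper's exact parameterization.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              wd=1e-2, decoupled=True):
    """One Adam step with either L2-style or decoupled weight decay."""
    if not decoupled:
        grad = grad + wd * theta            # L2: decay enters the moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        theta = theta - lr * wd * theta     # AdamW: decay applied to the weights directly
    return theta, m, v
```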

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

1 code implementation • 24 Feb 2018 • Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter

Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep RL problems, including Atari games and MuJoCo humanoid locomotion benchmarks.

Atari Games • Benchmarking • +1
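A minimal sketch of the kind of canonical (mu, lambda) evolution strategy benchmarked in the paper, assuming log-linear recombination weights and a fixed mutation step size; the Atari-specific machinery (policy network parameterization, virtual batch normalization, parallel evaluation) is omitted.

```python
import numpy as np

def canonical_es(f, x0, sigma=0.05, lam=40, mu=10, iters=200, seed=0):
    """Maximise f with a simple (mu, lambda) evolution strategy."""
    rng = np.random.default_rng(seed)
    mean = np.array(x0, dtype=float)
    # Log-linear recombination weights over the mu best offspring.
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    for _ in range(iters):
        eps = rng.standard_normal((lam, mean.size))
        candidates = mean + sigma * eps
        fitness = np.array([f(c) for c in candidates])
        best = np.argsort(-fitness)[:mu]          # indices of the mu best offspring
        mean = mean + sigma * (w @ eps[best])     # weighted recombination step
    return mean
```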

Self-Adaptive Surrogate-Assisted Covariance Matrix Adaptation Evolution Strategy

1 code implementation • 11 Apr 2012 • Ilya Loshchilov, Marc Schoenauer, Michèle Sebag

The resulting algorithm, saACM-ES, adjusts online the lifelength of the current surrogate model (the number of CMA-ES generations before learning a new surrogate) and the surrogate hyper-parameters.
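A minimal sketch of the lifelength-control idea, assuming surrogate quality is scored by the rank correlation between surrogate and true fitness on freshly evaluated points; the paper's actual error measure and update rule may differ.

```python
import numpy as np

def adjust_lifelength(surrogate_values, true_values, n_max=20):
    """Adapt the number of CMA-ES generations run on the surrogate alone."""
    # Spearman-style rank correlation between surrogate and true fitness.
    r_s = np.argsort(np.argsort(surrogate_values)).astype(float)
    r_t = np.argsort(np.argsort(true_values)).astype(float)
    quality = max(np.corrcoef(r_s, r_t)[0, 1], 0.0)
    # Trust a good surrogate for more generations before re-learning it.
    return int(max(1, min(n_max, round(quality * n_max))))
```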

Limited-Memory Matrix Adaptation for Large Scale Black-box Optimization

2 code implementations • 18 May 2017 • Ilya Loshchilov, Tobias Glasmachers, Hans-Georg Beyer

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a popular method to deal with nonconvex and/or stochastic optimization problems when the gradient information is not available.

Stochastic Optimization
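A heavily simplified sketch of the covariance-adaptation idea, assuming only a rank-mu update of the covariance from the best offspring; full CMA-ES (and the paper's limited-memory matrix adaptation variant) additionally uses evolution paths and step-size control.

```python
import numpy as np

def simple_cma(f, x0, sigma=0.5, lam=20, mu=5, iters=100, lr_c=0.2, seed=0):
    """Minimise f with a toy rank-mu covariance-adapting evolution strategy."""
    rng = np.random.default_rng(seed)
    mean = np.array(x0, dtype=float)
    n = mean.size
    C = np.eye(n)                                  # covariance of the search distribution
    for _ in range(iters):
        A = np.linalg.cholesky(C)
        z = rng.standard_normal((lam, n))
        y = z @ A.T                                # correlated steps with Cov(y) = C
        candidates = mean + sigma * y
        fitness = np.array([f(c) for c in candidates])
        best = np.argsort(fitness)[:mu]
        y_best = y[best]
        mean = mean + sigma * y_best.mean(axis=0)  # move the mean towards good steps
        C_mu = (y_best[:, :, None] * y_best[:, None, :]).mean(axis=0)
        C = (1 - lr_c) * C + lr_c * C_mu           # rank-mu covariance update
    return mean
```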

Anytime Bi-Objective Optimization with a Hybrid Multi-Objective CMA-ES (HMO-CMA-ES)

no code implementations • 9 May 2016 • Ilya Loshchilov, Tobias Glasmachers

We propose a multi-objective optimization algorithm aimed at achieving good anytime performance over a wide range of problems.

Benchmarking

CMA-ES for Hyperparameter Optimization of Deep Neural Networks

no code implementations • 25 Apr 2016 • Ilya Loshchilov, Frank Hutter

Hyperparameters of deep neural networks are often optimized by grid search, random search or Bayesian optimization.

Bayesian Optimization • Hyperparameter Optimization
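A minimal sketch of CMA-ES-driven hyperparameter search, assuming the third-party pycma package (`pip install cma`) and a hypothetical train_and_validate function; hyperparameters are encoded on log scales, which is common practice but not necessarily the paper's exact setup.

```python
import numpy as np
import cma  # third-party pycma package

def objective(x):
    """Map a continuous vector to hyperparameters and return the validation error."""
    lr = 10 ** x[0]                       # learning rate, log10 scale
    weight_decay = 10 ** x[1]             # weight decay, log10 scale
    batch_size = int(round(2 ** x[2]))    # batch size, log2 scale
    # train_and_validate is a hypothetical user-supplied training routine.
    return train_and_validate(lr=lr, weight_decay=weight_decay, batch_size=batch_size)

es = cma.CMAEvolutionStrategy([-3.0, -4.0, 6.0], 1.0)   # initial point, initial step size
while not es.stop():
    solutions = es.ask()
    es.tell(solutions, [objective(np.asarray(s)) for s in solutions])
best_vector = es.result.xbest
```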

LM-CMA: an Alternative to L-BFGS for Large Scale Black-box Optimization

no code implementations • 1 Nov 2015 • Ilya Loshchilov

The invariance properties of the algorithm do not prevent it from achieving performance comparable to L-BFGS on non-trivial large-scale smooth and nonsmooth optimization problems.
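The invariance argument rests on the algorithm being comparison-based: only the ranking of fitness values enters the update, so any strictly increasing transformation of the objective leaves the search unchanged. A tiny illustration, with an arbitrary monotone transform chosen for the example:

```python
import numpy as np

f_values = np.array([3.2, -1.0, 7.5, 0.4])
g_values = np.exp(f_values) + 10.0   # strictly increasing transform of f

# Rank-based selection sees exactly the same ordering in both cases.
assert np.array_equal(np.argsort(f_values), np.argsort(g_values))
```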

Maximum Likelihood-based Online Adaptation of Hyper-parameters in CMA-ES

no code implementations • 10 Jun 2014 • Ilya Loshchilov, Marc Schoenauer, Michèle Sebag, Nikolaus Hansen

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is widely accepted as a robust derivative-free continuous optimization algorithm for non-linear and non-convex optimization problems.

A Computationally Efficient Limited Memory CMA-ES for Large Scale Optimization

no code implementations • 21 Apr 2014 • Ilya Loshchilov

We propose a computationally efficient limited memory Covariance Matrix Adaptation Evolution Strategy for large scale optimization, which we call the LM-CMA-ES.
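The limited-memory idea can be illustrated by storing the covariance implicitly as the identity plus a few rank-one terms, so memory and per-sample cost grow with m·n rather than n². The sketch below makes that assumption and is not the paper's actual Cholesky-factor reconstruction.

```python
import numpy as np

def sample_limited_memory(mean, sigma, directions, betas, rng):
    """Draw x ~ N(mean, sigma^2 * C) with C = I + sum_j betas[j] * p_j p_j^T,
    without ever forming the n x n matrix C.

    directions -- array of shape (m, n): the m stored direction vectors p_j
    betas      -- m non-negative weights
    """
    n = mean.size
    m = len(betas)
    y = rng.standard_normal(n)                        # isotropic part
    coeffs = np.sqrt(betas) * rng.standard_normal(m)  # one scalar per stored direction
    y = y + coeffs @ directions                       # add the low-rank part
    return mean + sigma * y
```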

KL-based Control of the Learning Schedule for Surrogate Black-Box Optimization

no code implementations • 12 Aug 2013 • Ilya Loshchilov, Marc Schoenauer, Michèle Sebag

This weakness is commonly addressed through surrogate optimization: learning an estimate of the objective function, a.k.a. a surrogate model.
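A minimal sketch of the surrogate loop: fit a cheap regression model to already-evaluated points and spend true objective evaluations only on the candidates it ranks best. The model choice and screening rule here are illustrative; work in this line typically uses rank-based surrogates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def surrogate_prescreen(f, archive_x, archive_y, candidates, n_true_evals=3):
    """Evaluate only the candidates the surrogate ranks best on the true f.

    archive_x, archive_y -- points evaluated so far and their true objective values
    candidates           -- array of new candidate points, shape (k, n)
    """
    model = RandomForestRegressor(n_estimators=100).fit(archive_x, archive_y)
    predicted = model.predict(candidates)
    keep = np.argsort(predicted)[:n_true_evals]   # lowest predicted value = most promising
    new_y = np.array([f(c) for c in candidates[keep]])
    # Grow the archive so the surrogate improves over time.
    archive_x = np.vstack([archive_x, candidates[keep]])
    archive_y = np.concatenate([archive_y, new_y])
    return archive_x, archive_y
```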

Fixing Weight Decay Regularization in Adam

no code implementations • ICLR 2018 • Ilya Loshchilov, Frank Hutter

We note that common implementations of adaptive gradient algorithms, such as Adam, limit the potential benefit of weight decay regularization, because the weights do not decay multiplicatively (as would be expected for standard weight decay) but by an additive constant factor.

Image Classification

Weight Norm Control

no code implementations • 19 Nov 2023 • Ilya Loshchilov

We note that decoupled weight decay regularization is a particular case of weight norm control where the target norm of weights is set to 0.
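One reading of that statement: instead of shrinking weights toward zero, decay them toward a rescaled copy with a prescribed target norm, so that a target norm of 0 recovers ordinary decoupled weight decay. A minimal sketch under that assumption; the paper's exact update may differ.

```python
import numpy as np

def weight_norm_control_step(theta, lr, wd, target_norm=0.0, eps=1e-12):
    """Decay weights toward the closest vector with the target norm.

    With target_norm == 0 this reduces to theta <- (1 - lr * wd) * theta,
    i.e. ordinary decoupled weight decay.
    """
    direction = theta / (np.linalg.norm(theta) + eps)
    target = target_norm * direction
    return theta - lr * wd * (theta - target)
```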
