no code implementations • 19 Nov 2023 • Ilya Loshchilov
We note that decoupled weight decay regularization is a particular case of weight norm control where the target norm of weights is set to 0.
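A minimal sketch of that relationship, assuming a simple illustrative update rule rather than the paper's exact formulation: after the gradient step, nudge the weight norm toward a target; setting the target to 0 recovers decoupled weight decay.

```python
import numpy as np

def norm_control_step(w, grad, lr=1e-3, lam=0.01, target_norm=0.0):
    """Gradient step, then pull ||w|| toward target_norm.

    With target_norm == 0 the pull equals -lr * lam * w, i.e. ordinary
    decoupled (AdamW-style) weight decay, illustrating the claim above.
    """
    w = w - lr * grad                                  # plain gradient step
    norm = np.linalg.norm(w)
    if norm > 0:
        w = w + lr * lam * (target_norm - norm) * (w / norm)
    return w
```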
1 code implementation • 24 Feb 2018 • Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter
Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep RL problems, including Atari games and MuJoCo humanoid locomotion benchmarks.
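A compact sketch of a canonical (mu, lambda)-ES of the kind benchmarked in this line of work, with illustrative hyperparameters (the paper's exact configuration may differ):

```python
import numpy as np

def canonical_es(f, theta, sigma=0.05, lam=20, mu=10, iters=200):
    """(mu, lambda)-ES with log-rank recombination weights; maximizes f."""
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                   # weights for the top-mu offspring
    for _ in range(iters):
        eps = np.random.randn(lam, theta.size)     # perturbation directions
        rewards = np.array([f(theta + sigma * e) for e in eps])
        top = np.argsort(-rewards)[:mu]            # indices of the best offspring
        theta = theta + sigma * (w @ eps[top])     # weighted recombination
    return theta

# toy usage: maximize a concave quadratic
best = canonical_es(lambda x: -np.sum(x ** 2), theta=np.ones(5))
```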
no code implementations • ICLR 2018 • Ilya Loshchilov, Frank Hutter
We note that common implementations of adaptive gradient algorithms, such as Adam, limit the potential benefit of weight decay regularization, because the weights do not decay multiplicatively (as would be expected for standard weight decay) but by an additive constant factor.
20 code implementations • ICLR 2019 • Ilya Loshchilov, Frank Hutter
L2 regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but, as we demonstrate, this is not the case for adaptive gradient algorithms such as Adam.
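The distinction is easy to make concrete. In a minimal single-tensor sketch, L2 regularization folds lam * w into the gradient before Adam's moment estimates, while decoupled weight decay subtracts lr * lam * w from the weights directly:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, lam=0.01, decoupled=True):
    """One Adam step with either L2 regularization or decoupled weight decay."""
    if not decoupled:
        grad = grad + lam * w              # L2: decay term passes through the moments
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * lam * w               # AdamW: decay applied to weights directly
    return w, m, v
```

Because Adam divides by sqrt(v_hat), the L2 variant's decay is rescaled per parameter, which is exactly the coupling the decoupled variant avoids.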
7 code implementations • 27 Jul 2017 • Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter
The original ImageNet dataset is a popular large-scale benchmark for training Deep Neural Networks.
Ranked #1 on Image Classification on ImageNet-32
2 code implementations • 18 May 2017 • Ilya Loshchilov, Tobias Glasmachers, Hans-Georg Beyer
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a popular method to deal with nonconvex and/or stochastic optimization problems when the gradient information is not available.
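Full CMA-ES also maintains evolution paths and step-size adaptation; the sketch below keeps only the core loop (sample from a Gaussian, rank, re-estimate the mean and covariance from the best offspring), so it conveys the flavor rather than the actual algorithm:

```python
import numpy as np

def gaussian_adaptation_es(f, mean, sigma=0.5, lam=30, mu=15, iters=100):
    """Toy Gaussian-adaptation ES; omits CMA-ES's paths and step-size control."""
    cov = np.eye(mean.size)
    for _ in range(iters):
        X = np.random.multivariate_normal(mean, sigma ** 2 * cov, size=lam)
        best = X[np.argsort([f(x) for x in X])[:mu]]   # keep the mu best (minimization)
        diffs = best - mean                            # deviations from the old mean
        cov = 0.8 * cov + 0.2 * (diffs.T @ diffs) / (mu * sigma ** 2)
        mean = best.mean(axis=0)
    return mean
```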
17 code implementations • 13 Aug 2016 • Ilya Loshchilov, Frank Hutter
Partial warm restarts are also gaining popularity in gradient-based optimization, where they improve the rate of convergence of accelerated gradient schemes on ill-conditioned functions.
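The SGDR schedule itself is simple to state: within the i-th restart period of length T_i, eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)), with T_cur reset to 0 at each restart and T_i typically multiplied by a factor T_mult. A minimal sketch:

```python
import math

def sgdr_lr(epoch, eta_min=1e-5, eta_max=0.1, T_0=10, T_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR-style schedule)."""
    T_i, t = T_0, epoch
    while t >= T_i:           # locate the current restart period
        t -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / T_i))
```

PyTorch ships an equivalent scheduler as torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.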
no code implementations • 9 May 2016 • Ilya Loshchilov, Tobias Glasmachers
We propose a multi-objective optimization algorithm aimed at achieving good anytime performance over a wide range of problems.
no code implementations • 25 Apr 2016 • Ilya Loshchilov, Frank Hutter
Hyperparameters of deep neural networks are often optimized by grid search, random search or Bayesian optimization.
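The paper instead applies CMA-ES to this problem. A hedged sketch of how such a search might look with the pycma package, optimizing hyperparameters in log space (train_and_score is a hypothetical user-supplied function returning validation error):

```python
import cma  # pip install cma

def objective(x):
    # hypothetical decoding of log10-scaled hyperparameters
    lr, weight_decay = 10.0 ** x[0], 10.0 ** x[1]
    return train_and_score(lr=lr, weight_decay=weight_decay)  # user-supplied

es = cma.CMAEvolutionStrategy(x0=[-3.0, -4.0], sigma0=1.0)
while not es.stop():
    xs = es.ask()                          # sample candidate hyperparameter vectors
    es.tell(xs, [objective(x) for x in xs])
print(es.result.xbest)
```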
1 code implementation • 19 Nov 2015 • Ilya Loshchilov, Frank Hutter
We investigate online batch selection strategies for two state-of-the-art methods of stochastic gradient-based optimization, AdaDelta and Adam.
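A simplified sketch of rank-based online batch selection (the paper's exact selection-pressure schedule differs): keep each datapoint's most recently observed loss, rank by it, and sample with probability decaying exponentially in rank so that the top-ranked datapoint is roughly s times more likely to be picked than the bottom-ranked one.

```python
import numpy as np

def select_batch(latest_losses, batch_size, s=100.0):
    """Sample indices with probability decaying exponentially in loss rank."""
    losses = np.asarray(latest_losses)
    N = len(losses)
    order = np.argsort(-losses)                    # highest loss ranked first
    p = np.exp(-np.log(s) * np.arange(N) / N)      # rank 0 is ~s times likelier than rank N-1
    p /= p.sum()
    return order[np.random.choice(N, size=batch_size, replace=False, p=p)]
```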
no code implementations • 1 Nov 2015 • Ilya Loshchilov
The algorithm's invariance properties do not prevent it from demonstrating performance comparable to L-BFGS on non-trivial large-scale smooth and nonsmooth optimization problems.
no code implementations • 10 Jun 2014 • Ilya Loshchilov, Marc Schoenauer, Michèle Sebag, Nikolaus Hansen
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is widely accepted as a robust derivative-free continuous optimization algorithm for non-linear and non-convex optimization problems.
no code implementations • 21 Apr 2014 • Ilya Loshchilov
We propose a computationally efficient limited-memory Covariance Matrix Adaptation Evolution Strategy for large-scale optimization, which we call LM-CMA-ES.
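The core trick can be illustrated in a few lines (a toy version; LM-CMA-ES's actual coefficients and vector-selection scheme are more involved): instead of storing an n-by-n covariance matrix, store m direction vectors and apply the sampling transform as a product of rank-one updates in O(m*n) time and memory.

```python
import numpy as np

def apply_limited_memory_transform(z, vectors, a=0.9, b=0.4):
    """Compute (prod_j (a*I + b * v_j v_j^T)) z without forming any matrix."""
    x = z.copy()
    for v in vectors:                  # m stored direction vectors, m << n
        x = a * x + b * v * (v @ x)    # one rank-one update, O(n)
    return x

# sampling a candidate: x = mean + sigma * A z with z ~ N(0, I)
n, m = 1000, 10
vectors = [np.random.randn(n) / np.sqrt(n) for _ in range(m)]
candidate = np.zeros(n) + 0.5 * apply_limited_memory_transform(np.random.randn(n), vectors)
```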
no code implementations • 12 Aug 2013 • Ilya Loshchilov, Marc Schoenauer, Michèle Sebag
This weakness is commonly addressed through surrogate optimization: learning an estimate of the objective function, a.k.a. a surrogate model, and using it in place of expensive true evaluations.
1 code implementation • 11 Apr 2012 • Ilya Loshchilov, Marc Schoenauer, Michèle Sebag
The resulting algorithm, saACM-ES, adjusts online the lifelength of the current surrogate model (the number of CMA-ES generations before learning a new surrogate) and the surrogate hyper-parameters.
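A hedged sketch of the lifelength-adaptation idea (illustrative only; the actual saACM-ES learns a ranking-SVM surrogate and tunes its hyper-parameters with a second CMA-ES): after running CMA-ES on the surrogate for n_hat generations and one generation on the true objective, grow or shrink n_hat according to how well the surrogate ranks the fresh true evaluations.

```python
from scipy.stats import kendalltau

def adapt_lifelength(n_hat, true_f, surrogate_f, population, n_max=20):
    """Adjust the surrogate lifelength by rank agreement with the true objective."""
    f_true = [true_f(x) for x in population]     # one true-objective generation
    f_surr = [surrogate_f(x) for x in population]
    tau, _ = kendalltau(f_true, f_surr)          # rank correlation in [-1, 1]
    if tau > 0.5:
        n_hat = min(n_hat + 1, n_max)            # surrogate ranks well: trust it longer
    else:
        n_hat = max(n_hat - 1, 0)                # poor ranking: retrain sooner
    return n_hat, f_true
```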