no code implementations • 23 Feb 2024 • Kento Imaizumi, Hideaki Iiduka
In particular, previous numerical results indicated that, for SGD with a constant learning rate, the number of iterations needed for training decreases as the batch size increases, while the stochastic first-order oracle (SFO) complexity needed for training is minimized at a critical batch size and increases once the batch size exceeds that size.
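To make the shape of these curves concrete, here is a minimal sketch under an assumed step-count model N(b) = A·b/(b − B), which is decreasing and convex in the batch size b; the constants A and B and the search range are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical step-count model (illustrative constants, not the paper's):
# N(b) = A * b / (b - B) steps to train with batch size b > B.
A, B = 1000.0, 8.0
b = np.arange(9, 513)          # candidate batch sizes
steps = A * b / (b - B)        # monotone decreasing and convex in b
sfo = b * steps                # SFO complexity = batch size x steps

print(b[np.argmin(sfo)])       # critical batch size; equals 2 * B = 16 here
```

For this model the SFO complexity b·N(b) = A·b²/(b − B) is minimized at b = 2B, so the step count keeps falling with larger batches while the SFO cost bottoms out at the critical batch size and rises beyond it.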
no code implementations • 4 Feb 2024 • Naoki Sato, Hideaki Iiduka
While stochastic gradient descent (SGD) with momentum has fast convergence and excellent generalizability, a theoretical explanation for this is lacking.
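As a reference point for the method being analyzed, a minimal heavy-ball sketch of SGD with momentum follows; the toy objective, noise level, and hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])      # parameters of a toy problem
v = np.zeros_like(w)           # velocity buffer
lr, beta = 0.1, 0.9            # illustrative hyperparameters

for _ in range(200):
    g = w + 0.1 * rng.standard_normal(2)  # noisy gradient of f(w) = ||w||^2 / 2
    v = beta * v + g                      # heavy-ball momentum accumulation
    w = w - lr * v                        # parameter update
print(w)                                  # near the minimizer (0, 0), up to noise
```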
no code implementations • 15 Nov 2023 • Naoki Sato, Hideaki Iiduka
The graduated optimization approach is a heuristic method for finding globally optimal solutions for nonconvex functions and has been theoretically analyzed in several studies.
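A minimal sketch of the graduated-optimization idea, assuming Gaussian smoothing with a gradually sharpened noise schedule and a hypothetical one-dimensional nonconvex objective; it is not the construction analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy nonconvex objective f(x) = x**2 + 3*sin(2*x); its derivative:
f_grad = lambda x: 2 * x + 6 * np.cos(2 * x)

x = 3.0                                   # starts in the basin of a poor local minimum
for sigma in [2.0, 1.0, 0.5, 0.25, 0.0]:  # decreasing smoothing schedule
    for _ in range(300):
        u = sigma * rng.standard_normal(64)
        g = np.mean(f_grad(x + u))        # Monte Carlo gradient of the smoothed function
        x -= 0.05 * g
print(x)  # ends near the global minimizer (about -0.66), not the local one near x = 2
```

The heavily smoothed early stages see an almost-convex landscape and pull the iterate toward the global basin; each later stage refines the solution on a less smoothed surrogate.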
no code implementations • 25 Jul 2023 • Yuki Tsukada, Hideaki Iiduka
Next, we show that, for SGD with the Armijo-line-search learning rate, the number of steps needed for nonconvex optimization is a monotone decreasing convex function of the batch size; that is, it decreases as the batch size increases.
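A minimal sketch of the Armijo backtracking rule referenced above, on an assumed toy loss; in the SGD setting, `loss` and the gradient would be evaluated on the same mini-batch.

```python
import numpy as np

def armijo_lr(loss, w, g, lr0=1.0, c=1e-4, rho=0.5):
    # Backtrack until the sufficient-decrease (Armijo) condition holds:
    #   loss(w - lr * g) <= loss(w) - c * lr * ||g||^2
    lr, fw, gg = lr0, loss(w), g @ g
    while loss(w - lr * g) > fw - c * lr * gg:
        lr *= rho
    return lr

# Toy usage on an assumed mini-batch loss f(w) = 0.5 * ||w||^2.
loss = lambda w: 0.5 * (w @ w)
w = np.array([4.0, -2.0])
for _ in range(20):
    g = w                                  # gradient of the toy loss
    w = w - armijo_lr(loss, w, g) * g
print(w)                                   # near the minimizer (0, 0)
```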
no code implementations • 21 Aug 2022 • Hideaki Iiduka
That is, the numerical results indicate that Adam using a small constant learning rate, hyperparameters close to one, and the critical batch size that minimizes SFO complexity converges faster than Momentum and stochastic gradient descent (SGD).
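For reference, a standard Adam update in the kind of setting described above (small constant learning rate, beta parameters close to one); the toy problem and values are illustrative assumptions.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Small constant learning rate and betas close to one (illustrative values).
    m = b1 * m + (1 - b1) * g              # first-moment estimate
    v = b2 * v + (1 - b2) * g * g          # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w, m, v = np.array([1.0, -1.0]), np.zeros(2), np.zeros(2)
for t in range(1, 2001):
    w, m, v = adam_step(w, w, m, v, t)
print(w)                                   # near the minimizer (0, 0)
```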
no code implementations • 27 Jun 2022 • Hideaki Iiduka
Since computing the Lipschitz constant is NP-hard, the Lipschitz smoothness condition may be unrealistic in practice.
1 code implementation • 28 Mar 2022 • Hiroki Naganuma, Hideaki Iiduka
Since the data distribution is unknown, generative adversarial networks (GANs) formulate this problem as a game between two models, a generator and a discriminator.
1 code implementation • 28 Jan 2022 • Naoki Sato, Hideaki Iiduka
Previous results have shown that a two time-scale update rule (TTUR) using different learning rates, such as different constant rates or different decaying rates, is useful for training generative adversarial networks (GANs) in theory and in practice.
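A minimal TTUR sketch in PyTorch, assuming a toy one-dimensional data distribution and illustrative constant learning rates (larger for the discriminator than for the generator); it shows only the two-time-scale update pattern, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

# Tiny generator and discriminator for a toy 1-D data distribution.
G = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

# Two time scales: distinct constant learning rates (illustrative values).
opt_D = torch.optim.Adam(D.parameters(), lr=4e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
bce = nn.BCELoss()

for step in range(200):
    real = 0.5 * torch.randn(64, 1) + 2.0       # toy "real" samples
    fake = G(torch.randn(64, 2))                # generator samples

    # Discriminator update (faster time scale).
    opt_D.zero_grad()
    loss_D = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    loss_D.backward()
    opt_D.step()

    # Generator update (slower time scale).
    opt_G.zero_grad()
    loss_G = bce(D(fake), torch.ones(64, 1))
    loss_G.backward()
    opt_G.step()
```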
no code implementations • 14 Dec 2021 • Hideaki Iiduka
Numerical evaluations have definitively shown that, for deep learning optimizers such as stochastic gradient descent, momentum, and adaptive methods, the number of steps needed to train a deep neural network halves for each doubling of the batch size and that there is a region of diminishing returns beyond the critical batch size.
no code implementations • 26 Aug 2021 • Hideaki Iiduka
In particular, it is shown theoretically that momentum and Adam-type optimizers can exploit larger optimal batch sizes and reduce the minimum number of steps needed for nonconvex optimization further than the stochastic gradient descent optimizer can.
no code implementations • 2 Apr 2020 • Hiroyuki Sakai, Hideaki Iiduka
This paper proposes a Riemannian adaptive optimization algorithm to optimize the parameters of deep neural networks.
Stochastic Optimization • Optimization and Control • MSC classes: 65K05, 90C25, 57R25 • ACM class: G.1.6
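The paper's algorithm is an adaptive method, which the following does not reproduce; this is only a minimal sketch of the basic Riemannian ingredients (tangent-space projection and retraction) on the unit sphere, with an assumed toy problem.

```python
import numpy as np

def sphere_step(x, egrad, lr):
    rgrad = egrad - (egrad @ x) * x     # project the Euclidean gradient onto T_x S
    y = x - lr * rgrad                  # step in the tangent direction
    return y / np.linalg.norm(y)        # retract back onto the unit sphere

# Toy usage: minimize f(x) = x^T A x over the unit sphere, whose minimizer
# is an eigenvector for the smallest eigenvalue of A.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M + M.T                             # random symmetric matrix
x = rng.standard_normal(5)
x = x / np.linalg.norm(x)
for _ in range(1000):
    x = sphere_step(x, 2 * A @ x, lr=0.05)
print(x @ A @ x)                        # approaches the smallest eigenvalue of A
```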
1 code implementation • 29 Feb 2020 • Yu Kobayashi, Hideaki Iiduka
This paper proposes a conjugate-gradient-based Adam algorithm that blends Adam with nonlinear conjugate gradient methods, and it provides a convergence analysis.
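A hypothetical sketch of such a blend, building on the standard Adam update sketched earlier: a Fletcher-Reeves-style conjugate direction is fed into Adam's moment estimates in place of the raw gradient. This is an illustrative reconstruction, not the paper's exact algorithm; the function name `cg_adam_step` and all constants are assumptions.

```python
import numpy as np

def cg_adam_step(w, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Hypothetical blend (not the paper's exact method): a Fletcher-Reeves
    # style conjugate direction replaces the raw gradient in Adam's moments.
    m, v, d_prev, g_prev, t = state
    if g_prev is None:
        d = g
    else:
        gamma = (g @ g) / (g_prev @ g_prev + eps)   # Fletcher-Reeves coefficient
        d = g + gamma * d_prev                      # conjugate-gradient-like direction
    t += 1
    m = b1 * m + (1 - b1) * d
    v = b2 * v + (1 - b2) * d * d
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, (m, v, d, g, t)

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.ones(3)
state = (np.zeros(3), np.zeros(3), None, None, 0)
for _ in range(2000):
    w, state = cg_adam_step(w, w, state)
print(w)                                            # near the minimizer (0, 0, 0)
```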