Search Results for author: Hideaki Iiduka

Found 12 papers, 3 papers with code

Iteration and Stochastic First-order Oracle Complexities of Stochastic Gradient Descent using Constant and Decaying Learning Rates

no code implementations · 23 Feb 2024 · Kento Imaizumi, Hideaki Iiduka

In particular, previous numerical results indicated that, for SGD using a constant learning rate, the number of iterations needed for training decreases as the batch size increases, while the SFO complexity needed for training is minimized at a critical batch size and increases once the batch size exceeds that size.
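
In these papers, the SFO complexity is the total number of stochastic first-order oracle calls, i.e. the batch size b times the number of iterations K(b) needed for training. A minimal sketch of how the critical batch size arises, using purely hypothetical iteration counts:

    # Hypothetical iteration counts K(b) needed to reach a target training loss;
    # the actual values depend on the model, data set, and learning rate.
    K = {16: 20000, 32: 9500, 64: 4500, 128: 2300, 256: 1300}

    # SFO complexity = number of stochastic gradient computations = b * K(b).
    sfo = {b: b * k for b, k in K.items()}

    critical_b = min(sfo, key=sfo.get)
    print(sfo)                                  # falls, then rises, as b increases
    print("critical batch size:", critical_b)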

Role of Momentum in Smoothing Objective Function in Implicit Graduated Optimization

no code implementations · 4 Feb 2024 · Naoki Sato, Hideaki Iiduka

While stochastic gradient descent (SGD) with momentum has fast convergence and excellent generalizability, a theoretical explanation for this is lacking.
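
The smoothing viewpoint behind these implicit graduated-optimization analyses is usually stated in terms of a smoothed surrogate of the objective f. A standard definition, written here with Gaussian noise purely for concreteness (the papers may use a different noise distribution), is

    \hat{f}_{\delta}(x) = \mathbb{E}_{u \sim \mathcal{N}(0, I)}\bigl[ f(x + \delta u) \bigr],

where the smoothing level \delta > 0 controls how strongly local minima are washed out; graduated optimization minimizes \hat{f}_{\delta_t} along a decreasing schedule \delta_1 > \delta_2 > \dots > 0.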

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling

no code implementations · 15 Nov 2023 · Naoki Sato, Hideaki Iiduka

The graduated optimization approach is a heuristic method for finding globally optimal solutions for nonconvex functions and has been theoretically analyzed in several studies.

Tasks: Image Classification, Scheduling
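
For reference, a toy sketch of explicit graduated optimization with a Gaussian-smoothed surrogate and a fixed coarse-to-fine noise schedule; the function names, schedule, and step size are illustrative assumptions, not the schedule analyzed in the paper:

    import numpy as np

    rng = np.random.default_rng(0)

    def smoothed_grad(f, x, delta, n_samples=64):
        """Monte-Carlo gradient of the smoothed surrogate E[f(x + delta*u)],
        u ~ N(0, I), using an antithetic two-point estimator."""
        g = np.zeros_like(x)
        for _ in range(n_samples):
            u = rng.standard_normal(x.shape)
            g += (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u
        return g / n_samples

    def graduated_descent(f, x0, deltas=(1.0, 0.5, 0.1, 0.01), lr=0.05, steps=200):
        """Minimize increasingly less-smoothed surrogates of f (coarse to fine)."""
        x = x0
        for delta in deltas:
            for _ in range(steps):
                x = x - lr * smoothed_grad(f, x, delta)
        return x

    # Toy usage: a nonconvex objective with many local minima.
    f = lambda z: float(np.sum(z**2 + 2.0 * np.sin(5.0 * z)))
    x_star = graduated_descent(f, x0=np.array([3.0]))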

Relationship between Batch Size and Number of Steps Needed for Nonconvex Optimization of Stochastic Gradient Descent using Armijo Line Search

no code implementations · 25 Jul 2023 · Yuki Tsukada, Hideaki Iiduka

Next, we show that, for SGD with the Armijo-line-search learning rate, the number of steps needed for nonconvex optimization is a monotone decreasing convex function of the batch size; that is, it decreases as the batch size increases.
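
A minimal sketch of one SGD step whose learning rate is chosen by backtracking Armijo line search on the current mini-batch loss; the callables loss and grad are assumptions for illustration (both evaluated on the same mini-batch), not the paper's code:

    import numpy as np

    def armijo_sgd_step(loss, grad, x, eta0=1.0, c=1e-4, rho=0.5, max_backtracks=30):
        """One SGD step with a backtracking Armijo (sufficient-decrease) line search."""
        g = grad(x)
        f0 = loss(x)
        eta = eta0
        for _ in range(max_backtracks):
            # Armijo condition for the candidate step x - eta * g.
            if loss(x - eta * g) <= f0 - c * eta * np.dot(g, g):
                break
            eta *= rho                  # backtrack: shrink the learning rate
        return x - eta * g, eta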

Critical Batch Size Minimizes Stochastic First-Order Oracle Complexity of Deep Learning Optimizer using Hyperparameters Close to One

no code implementations · 21 Aug 2022 · Hideaki Iiduka

That is, the numerical results indicate that Adam using a small constant learning rate, hyperparameters close to one, and the critical batch size minimizing SFO complexity has faster convergence than Momentum and stochastic gradient descent (SGD).
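
For reference, the hyperparameter regime described above (a small constant learning rate with beta1 and beta2 close to one) can be set directly in a standard Adam implementation; the concrete values below are purely illustrative and are not taken from the paper:

    import torch

    model = torch.nn.Linear(10, 2)      # placeholder model for illustration
    # Small constant learning rate; beta1, beta2 close to one.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.99, 0.999))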

Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness

no code implementations · 27 Jun 2022 · Hideaki Iiduka

Since computing the Lipschitz constant is NP-hard, the Lipschitz smoothness condition can be unrealistic in practice.

Conjugate Gradient Method for Generative Adversarial Networks

1 code implementation · 28 Mar 2022 · Hiroki Naganuma, Hideaki Iiduka

Since the data distribution is unknown, generative adversarial networks (GANs) formulate this problem as a game between two models, a generator and a discriminator.
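
The two-model game referred to above is the standard GAN minimax objective,

    \min_{G}\,\max_{D}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_{z}}[\log(1 - D(G(z)))],

where the generator G maps latent noise z to samples and the discriminator D estimates the probability that its input came from the data distribution.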

Existence and Estimation of Critical Batch Size for Training Generative Adversarial Networks with Two Time-Scale Update Rule

1 code implementation · 28 Jan 2022 · Naoki Sato, Hideaki Iiduka

Previous results have shown that a two time-scale update rule (TTUR) using different learning rates, such as different constant rates or different decaying rates, is useful for training generative adversarial networks (GANs) in theory and in practice.
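
A minimal sketch of such a two time-scale setup: the generator and discriminator are given different constant learning rates (the placeholder models and the specific rates below are illustrative assumptions, not taken from the paper):

    import torch

    G = torch.nn.Linear(64, 128)        # placeholder generator
    D = torch.nn.Linear(128, 1)         # placeholder discriminator

    # TTUR: two different constant learning rates, one per model.
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_D = torch.optim.Adam(D.parameters(), lr=4e-4)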

Minimization of Stochastic First-order Oracle Complexity of Adaptive Methods for Nonconvex Optimization

no code implementations · 14 Dec 2021 · Hideaki Iiduka

Numerical evaluations have definitively shown that, for deep learning optimizers such as stochastic gradient descent, momentum, and adaptive methods, the number of steps needed to train a deep neural network halves for each doubling of the batch size and that there is a region of diminishing returns beyond the critical batch size.
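
The two regimes described above can be written schematically as

    K(2b) \approx \tfrac{1}{2} K(b) \quad (b \ll b^{\ast}), \qquad K(b) \approx K(b^{\ast}) \quad (b \gg b^{\ast}),

where K(b) is the number of steps needed with batch size b and b^{\ast} is the critical batch size; this is a restatement of the empirical observation, not the paper's formal bound.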

The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size

no code implementations · 26 Aug 2021 · Hideaki Iiduka

In particular, it is shown theoretically that momentum and Adam-type optimizers can exploit larger optimal batch sizes than the stochastic gradient descent optimizer and thereby further reduce the minimum number of steps needed for nonconvex optimization.

Riemannian Adaptive Optimization Algorithm and Its Application to Natural Language Processing

no code implementations · 2 Apr 2020 · Hiroyuki Sakai, Hideaki Iiduka

This paper proposes a Riemannian adaptive optimization algorithm to optimize the parameters of deep neural networks.

Tasks: Stochastic Optimization · Subjects: Optimization and Control · MSC: 65K05, 90C25, 57R25 · ACM: G.1.6
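
For the Riemannian adaptive method in the entry above, the basic building block is a Riemannian gradient step: project the Euclidean gradient onto the tangent space, move, and retract back onto the manifold. A toy sketch on the unit sphere (not the paper's adaptive algorithm, which targets more general manifolds):

    import numpy as np

    def sphere_rgd_step(x, egrad, lr=0.1):
        """One Riemannian gradient-descent step on the unit sphere."""
        g = egrad(x)
        rgrad = g - np.dot(x, g) * x    # project onto the tangent space at x
        y = x - lr * rgrad              # move in the tangent space
        return y / np.linalg.norm(y)    # retraction: normalize back onto the sphere

    # Toy usage: minimize x^T A x on the sphere (converges to the smallest eigenvector).
    A = np.diag([3.0, 2.0, 1.0])
    x = np.ones(3) / np.sqrt(3.0)
    for _ in range(200):
        x = sphere_rgd_step(x, lambda v: 2.0 * A @ v, lr=0.1)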

Conjugate-gradient-based Adam for stochastic optimization and its application to deep learning

1 code implementation · 29 Feb 2020 · Yu Kobayashi, Hideaki Iiduka

This paper proposes a conjugate-gradient-based Adam algorithm that blends Adam with nonlinear conjugate gradient methods and provides a convergence analysis; a generic sketch of one such blend appears below.

Tasks: General Classification, Image Classification, +3 more
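
For the conjugate-gradient-based Adam entry above, the following is a generic sketch of one way to feed a nonlinear conjugate-gradient direction (Fletcher-Reeves, chosen here as an assumption) into Adam-style moment estimates; it is illustrative only and not a reproduction of the paper's algorithm:

    import numpy as np

    def cg_adam_step(x, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        """One update blending a conjugate-gradient direction with Adam moments."""
        g = grad(x)
        if "d" not in state:                        # first step: steepest descent
            d = -g
        else:
            fr = np.dot(g, g) / (np.dot(state["g_prev"], state["g_prev"]) + eps)
            d = -g + fr * state["d"]                # Fletcher-Reeves CG direction
        t = state.get("t", 0) + 1
        m = beta1 * state.get("m", np.zeros_like(g)) + (1 - beta1) * d
        v = beta2 * state.get("v", np.zeros_like(g)) + (1 - beta2) * d * d
        state.update(t=t, m=m, v=v, d=d, g_prev=g)
        m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
        return x + lr * m_hat / (np.sqrt(v_hat) + eps)   # d already points downhill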
