Search Results for author: Atsushi Nitanda

Found 26 papers, 2 papers with code

Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction

no code implementations 12 Jun 2023 Taiji Suzuki, Denny Wu, Atsushi Nitanda

Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.
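
For intuition, here is a minimal sketch of the kind of scheme the title refers to: a finite-particle, minibatch Euler discretization of mean-field Langevin dynamics on a toy two-layer network. Everything below (model, loss, constants) is an illustrative assumption, not the paper's implementation.

```python
# Hypothetical sketch: finite-particle, minibatch (stochastic-gradient) Euler
# discretization of mean-field Langevin dynamics for f(x) = mean_j tanh(theta_j . x).
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 5, 512              # data size, input dim, number of particles
eta, lam, reg = 0.1, 1e-3, 1e-3    # step size, entropic temperature, L2 weight

X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])               # toy labels
theta = rng.standard_normal((m, d))

for step in range(1000):
    idx = rng.choice(n, size=32, replace=False)   # minibatch -> stochastic gradient
    Xb, yb = X[idx], y[idx]
    act = np.tanh(Xb @ theta.T)                   # shape (batch, m)
    resid = act.mean(axis=1) - yb                 # squared-loss residual of the mean-field output
    # gradient of the first variation of the regularized risk at each particle
    grad = ((resid[:, None] * (1 - act ** 2)).T @ Xb) / len(idx) + reg * theta
    theta -= eta * grad                                                  # time discretization (Euler step)
    theta += np.sqrt(2 * eta * lam) * rng.standard_normal(theta.shape)   # Langevin noise
```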

Tight and fast generalization error bound of graph embedding in metric space

no code implementations 13 May 2023 Atsushi Suzuki, Atsushi Nitanda, Taiji Suzuki, Jing Wang, Feng Tian, Kenji Yamanishi

However, recent theoretical analyses have shown a much higher upper bound on the generalization error of non-Euclidean graph embedding than of its Euclidean counterpart; a high generalization error indicates that incompleteness and noise in the data can significantly damage learning performance.

Graph Embedding

Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems

no code implementations 6 Mar 2023 Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki

The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures -- such an objective naturally arises in the optimization of a two-layer neural network in the mean-field regime.
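
To make the objective in this excerpt concrete, a schematic form of an entropy-regularized finite-sum problem over measures is shown below; the notation and the two-layer parameterization are illustrative assumptions, not taken verbatim from the paper. EFP minimizes such an objective in the space of measures rather than in parameter space.

```latex
\min_{\mu \in \mathcal{P}(\mathbb{R}^p)} \; F(\mu)
  \;=\; \underbrace{\frac{1}{n}\sum_{i=1}^{n}
        \ell\!\Big(\textstyle\int h_{\theta}(x_i)\, d\mu(\theta),\, y_i\Big)}_{\text{convex functional (finite sum)}}
  \;+\; \lambda \underbrace{\int \log\frac{d\mu}{d\theta}\, d\mu}_{\text{entropic regularizer}}
```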

Image Generation

Parameter Averaging for SGD Stabilizes the Implicit Bias towards Flat Regions

no code implementations 18 Feb 2023 Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda

Stochastic gradient descent is a workhorse for training deep neural networks due to its excellent generalization performance.

Koopman-based generalization bound: New aspect for full-rank weights

no code implementations 12 Feb 2023 Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Atsushi Nitanda, Taiji Suzuki

Our bound is tighter than existing norm-based bounds when the condition numbers of weight matrices are small.

Convex Analysis of the Mean Field Langevin Dynamics

no code implementations 25 Jan 2022 Atsushi Nitanda, Denny Wu, Taiji Suzuki

In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings.
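
For reference, the dynamics analyzed here are usually written as the McKean–Vlasov SDE below, whose stationary measure minimizes the entropy-regularized objective (standard notation; the specific symbols are assumptions):

```latex
dX_t \;=\; -\nabla \frac{\delta F_0}{\delta \mu}(\mu_t)(X_t)\, dt \;+\; \sqrt{2\lambda}\, dB_t,
\qquad \mu_t = \mathrm{Law}(X_t),
```

where \(F_0\) is the convex objective over measures, \(\lambda\) is the entropic regularization strength, and \(B_t\) is a standard Brownian motion; the discrete-time setting replaces the SDE with its Euler scheme.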

Generalization Bounds for Graph Embedding Using Negative Sampling: Linear vs Hyperbolic

no code implementations NeurIPS 2021 Atsushi Suzuki, Atsushi Nitanda, Jing Wang, Linchuan Xu, Kenji Yamanishi, Marc Cavazza

Graph embedding, which represents real-world entities in a mathematical space, has enabled numerous applications such as analyzing natural languages, social networks, biochemical networks, and knowledge bases. It has been experimentally shown that graph embedding in hyperbolic space can represent hierarchical tree-like data more effectively than embedding in linear space, owing to hyperbolic space's exponential growth property.
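
As a small illustration of the hyperbolic-space machinery these bounds concern, the distance in the Poincaré ball model (one common model for hyperbolic graph embedding; the choice of model here is an assumption) can be computed as follows:

```python
# Geodesic distance in the Poincare ball model of hyperbolic space; the volume
# of a ball grows exponentially with this distance, which is what makes
# hyperbolic embeddings well suited to hierarchical, tree-like data.
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance between points u, v strictly inside the unit ball."""
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / (denom + eps))
```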

Generalization Bounds · Graph Embedding

Particle Stochastic Dual Coordinate Ascent: Exponential convergent algorithm for mean field neural network optimization

no code implementations ICLR 2022 Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu

We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean field regime that achieves exponential convergence rate in regularized empirical risk minimization.

Generalization Error Bound for Hyperbolic Ordinal Embedding

no code implementations 21 May 2021 Atsushi Suzuki, Atsushi Nitanda, Jing Wang, Linchuan Xu, Marc Cavazza, Kenji Yamanishi

Hyperbolic ordinal embedding (HOE) represents entities as points in hyperbolic space so that they agree as well as possible with given constraints of the form "entity i is more similar to entity j than to entity k." It has been experimentally shown that HOE can effectively obtain representations of hierarchical data such as knowledge bases and citation networks, owing to hyperbolic space's exponential growth property.

BODAME: Bilevel Optimization for Defense Against Model Extraction

no code implementations 11 Mar 2021 Yuto Mori, Atsushi Nitanda, Akiko Takeda

Model extraction attacks have become serious issues for service providers using machine learning.

Bilevel Optimization · Model extraction

Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

no code implementations NeurIPS 2021 Atsushi Nitanda, Denny Wu, Taiji Suzuki

An important application of the proposed method is the optimization of neural networks in the mean-field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which a quantitative convergence rate can be challenging to obtain.

Online Robust and Adaptive Learning from Data Streams

1 code implementation 23 Jul 2020 Shintaro Fukushima, Atsushi Nitanda, Kenji Yamanishi

We address the relation between the two parameters: one is the step size of the stochastic approximation, and the other is the threshold parameter of the norm of the stochastic update.
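
The paper's algorithm is not reproduced here, but the interplay the excerpt describes can be illustrated with a generic norm-thresholded stochastic update, where eta is the step size and b the threshold on the update norm (both names are placeholders):

```python
import numpy as np

def thresholded_sgd_step(w, grad, eta, b):
    """One stochastic-approximation step whose update norm is capped at b."""
    update = eta * grad
    norm = np.linalg.norm(update)
    if norm > b:              # suspiciously large (e.g. outlier-driven) update
        update *= b / norm    # shrink it back to the threshold
    return w - update
```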

Attribute

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

no code implementations ICLR 2021 Atsushi Nitanda, Taiji Suzuki

In this study, we show that the averaged stochastic gradient descent can achieve the minimax optimal convergence rate, with the global convergence guarantee, by exploiting the complexities of the target function and the RKHS associated with the NTK.
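
A minimal sketch of the estimator in question, averaged SGD: plain SGD whose reported solution is the running average of the iterates (the stochastic gradient oracle grad_fn is a placeholder):

```python
import numpy as np

def averaged_sgd(grad_fn, w0, eta, T, rng):
    """Run SGD for T steps and return the Polyak average of the iterates."""
    w = w0.copy()
    w_bar = np.zeros_like(w0)
    for t in range(T):
        w = w - eta * grad_fn(w, rng)     # stochastic gradient step
        w_bar += (w - w_bar) / (t + 1)    # running average of iterates
    return w_bar
```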

When Does Preconditioning Help or Hurt Generalization?

no code implementations ICLR 2021 Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.
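
For context, natural gradient descent is the preconditioned update below, with the Fisher information matrix F as the preconditioner (standard form, not specific to this paper; a generic preconditioner P plays the same role):

```latex
\theta_{t+1} \;=\; \theta_t \;-\; \eta\, F(\theta_t)^{-1} \nabla L(\theta_t)
```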

regression · Second-order methods

Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features

no code implementations 13 Nov 2019 Shingo Yashima, Atsushi Nitanda, Taiji Suzuki

To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms.
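
As a concrete instance of the random-features-plus-SGD recipe the excerpt mentions, random Fourier features approximating an RBF kernel can be generated as below; a linear model is then trained on these features with SGD (dimensions and the kernel choice are illustrative assumptions):

```python
import numpy as np

def random_fourier_features(X, D, gamma, rng):
    """Map X of shape (n, d) to D features approximating exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```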

Binary Classification · Classification +1

Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space

no code implementations NeurIPS 2021 Taiji Suzuki, Atsushi Nitanda

The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.

Data Cleansing for Models Trained with SGD

1 code implementation NeurIPS 2019 Satoshi Hara, Atsushi Nitanda, Takanori Maehara

Data cleansing is a typical approach to improving the accuracy of machine learning models; however, it requires extensive domain knowledge to identify the influential instances that affect the models.

BIG-bench Machine Learning

Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems

no code implementations 23 May 2019 Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki

Most studies, with a few exceptions, have focused on regression problems with the squared loss function, and the importance of the positivity of the neural tangent kernel has been pointed out.

General Classification · Generalization Bounds

Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors

no code implementations 14 Jun 2018 Atsushi Nitanda, Taiji Suzuki

In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions.

Binary Classification · Classification +1

Functional Gradient Boosting based on Residual Network Perception

no code implementations ICML 2018 Atsushi Nitanda, Taiji Suzuki

Residual Networks (ResNets) have become state-of-the-art models in deep learning and several theoretical studies have been devoted to understanding why ResNet works so well.

Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models

no code implementations 7 Jan 2018 Atsushi Nitanda, Taiji Suzuki

In this paper, this phenomenon is explained from the functional gradient method perspective of the gradient layer.

Accelerated Stochastic Gradient Descent for Minimizing Finite Sums

no code implementations 9 Jun 2015 Atsushi Nitanda

We propose an optimization method for minimizing the finite sums of smooth convex functions.

Stochastic Proximal Gradient Descent with Acceleration Techniques

no code implementations NeurIPS 2014 Atsushi Nitanda

Accelerated proximal gradient descent (APG) and proximal stochastic variance reduction gradient (Prox-SVRG) are in a trade-off relationship.
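
For context, the Prox-SVRG update that the excerpt contrasts with APG takes the standard form below, with snapshot point \(\tilde{x}\), step size \(\eta\), and regularizer \(R\) handled by the proximal operator (standard notation, not specific to this paper):

```latex
v_k \;=\; \nabla f_{i_k}(x_k) \;-\; \nabla f_{i_k}(\tilde{x}) \;+\; \nabla f(\tilde{x}),
\qquad
x_{k+1} \;=\; \mathrm{prox}_{\eta R}\big(x_k - \eta\, v_k\big),
```

where \(f = \frac{1}{n}\sum_{i=1}^{n} f_i\) and the index \(i_k\) is sampled uniformly at random; APG instead uses the full gradient with Nesterov momentum.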
