Search Results for author: Ryo Karakida

Found 16 papers, 3 papers with code

Self-attention Networks Localize When QK-eigenspectrum Concentrates

no code implementations • 3 Feb 2024 • Han Bao, Ryuichiro Hataya, Ryo Karakida

To this end, we characterize the notion of attention localization by the eigenspectrum of the query-key parameter matrices and reveal that a small eigenspectrum variance leads to localized attention.
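As a rough, illustrative sketch of the quantity described above (not the authors' code), the snippet below forms a query-key parameter matrix and computes the variance of its eigenvalues; by the excerpt's claim, a smaller variance corresponds to more localized attention. The dimensions, the random Gaussian initialization, and the use of the real parts of the eigenvalues are assumptions made only for this illustration.

```python
import numpy as np

# Hypothetical sizes and random Gaussian initialization (illustration only).
d_model = 64
rng = np.random.default_rng(0)
W_Q = rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
W_K = rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))

# Combined query-key parameter matrix that defines the attention logits.
W_QK = W_Q @ W_K.T

# Eigenspectrum of W_QK and its variance (real parts, as a simplification).
eigvals = np.linalg.eigvals(W_QK).real
print("eigenvalue variance:", eigvals.var())
```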

On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

no code implementations • 19 Dec 2023 • Satoki Ishikawa, Ryo Karakida

Second-order optimization has been developed to accelerate the training of deep neural networks, and it is being applied to increasingly large-scale models.

MLP-Mixer as a Wide and Sparse MLP

no code implementations • 2 Jun 2023 • Tomohiro Hayase, Ryo Karakida

The multi-layer perceptron (MLP) is a fundamental component of deep learning and has been employed extensively for various problems.

Attention in a family of Boltzmann machines emerging from modern Hopfield networks

1 code implementation • 9 Dec 2022 • Toshihiro Ota, Ryo Karakida

Hopfield networks and Boltzmann machines (BMs) are fundamental energy-based neural network models.

Tasks: Denoising

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

no code implementations • 6 Oct 2022 • Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa

In this study, we first reveal that a specific finite-difference computation, composed of both gradient ascent and descent steps, reduces the computational cost of gradient regularization (GR).
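A minimal sketch of the general idea in that sentence, assuming GR means penalizing the squared gradient norm of the loss; this is a generic finite-difference construction on a toy objective, not the paper's exact scheme, and all names and constants here are hypothetical.

```python
import numpy as np

# Toy objective and its analytic gradient (illustration only).
def grad(theta):
    return theta ** 3  # gradient of L(theta) = 0.25 * sum(theta**4)

def gr_step(theta, lr=0.1, lam=0.01, eps=1e-3):
    """One descent step on L(theta) + (lam/2) * ||grad L(theta)||^2.

    The regularizer's gradient is a Hessian-vector product, approximated here
    by a finite difference between the gradient at theta (descent direction)
    and the gradient at a slightly ascended point theta + eps * grad L(theta).
    """
    g = grad(theta)
    g_ascended = grad(theta + eps * g)
    hvp_approx = (g_ascended - g) / eps  # ~ Hessian(theta) @ grad L(theta)
    return theta - lr * (g + lam * hvp_approx)

theta = np.array([1.0, -2.0, 0.5])
for _ in range(100):
    theta = gr_step(theta)
print(theta)  # shrinks toward the minimizer at zero
```

The point of the construction is that the regularized gradient is obtained from two ordinary gradient evaluations, avoiding an explicit second-order (Hessian) computation.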

Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel

1 code implementation • 10 Feb 2022 • Kaito Watanabe, Kotaro Sakamoto, Ryo Karakida, Sho Sonoda, Shun-ichi Amari

In this paper, we investigate such neural fields in a multilayer architecture to study the supervised learning of the fields.

Tasks: Learning Theory

Self-paced Data Augmentation for Training Neural Networks

no code implementations • 29 Oct 2020 • Tomoumi Takase, Ryo Karakida, Hideki Asoh

A typical method that applies data augmentation to all training samples disregards sample suitability, which may reduce classifier performance.

Tasks: Data Augmentation, Single Particle Analysis

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

1 code implementation • NeurIPS 2020 • Ryo Karakida, Kazuki Osawa

In this work, we reveal that, under specific conditions, NGD with approximate Fisher information achieves the same fast convergence to global minima as exact NGD.

The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry

no code implementations • 14 Jun 2020 • Tomohiro Hayase, Ryo Karakida

We investigate the spectral distribution of the conditional FIM, which is the FIM given a single sample, by focusing on fully-connected networks achieving dynamical isometry.

Pathological spectra of the Fisher information metric and its variants in deep neural networks

no code implementations • 14 Oct 2019 • Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions.
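For readers skimming this list, the standard definition being referred to (a textbook formula, not a result of this paper) is the expected outer product of the score function:

```latex
F(\theta) = \mathbb{E}_{x \sim p_\theta}\left[ \nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top} \right]
```

Under standard regularity conditions this equals the expected Hessian of the negative log-likelihood, which is the sense in which the FIM appears as a component of the Hessian of the loss.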

The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

no code implementations • NeurIPS 2019 • Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

Thus, we can conclude that batch normalization in the last layer significantly contributes to decreasing the sharpness induced by the FIM.

Fisher Information and Natural Gradient Learning of Random Deep Networks

no code implementations • 22 Aug 2018 • Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

The natural gradient method uses the steepest-descent direction in a Riemannian manifold, so it is effective for learning and avoids plateaus.
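Concretely, the standard natural gradient update referenced here preconditions the loss gradient by the inverse FIM (again a textbook formula, not a claim specific to this paper):

```latex
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t)
```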

Statistical Neurodynamics of Deep Networks: Geometry of Signal Spaces

no code implementations • 22 Aug 2018 • Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

The manifold of input signals is embedded in a higher-dimensional manifold of the next layer as a curved submanifold, provided that the number of neurons is larger than the number of inputs.

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach

no code implementations • 4 Jun 2018 • Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

The Fisher information matrix (FIM) is a fundamental quantity to represent the characteristics of a stochastic model, including deep neural networks (DNNs).

Concept Formation and Dynamics of Repeated Inference in Deep Generative Models

no code implementations • 12 Dec 2017 • Yoshihiro Nagano, Ryo Karakida, Masato Okada

Our study demonstrated that the transient dynamics of inference first approaches a concept and then moves close to a memory.

Tasks: Image Generation
