Search Results for author: Ryo Karakida

Found 16 papers, 3 papers with code

Self-attention Networks Localize When QK-eigenspectrum Concentrates

no code implementations • 3 Feb 2024 • Han Bao, Ryuichiro Hataya, Ryo Karakida

To this end, we characterize the notion of attention localization by the eigenspectrum of the query-key parameter matrices and reveal that a small eigenspectrum variance leads to localized attention.
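As a rough, illustrative sketch of the quantity described above (not the authors' code), the snippet below forms a query-key parameter matrix and computes the variance of its eigenvalues; by the excerpt's claim, a smaller variance corresponds to more localized attention. The dimensions, the random Gaussian initialization, and the use of the real parts of the eigenvalues are assumptions made only for this illustration.

```python
import numpy as np

# Hypothetical sizes and random Gaussian initialization (illustration only).
d_model = 64
rng = np.random.default_rng(0)
W_Q = rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
W_K = rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))

# Combined query-key parameter matrix that defines the attention logits.
W_QK = W_Q @ W_K.T

# Eigenspectrum of W_QK and its variance (real parts, as a simplification).
eigvals = np.linalg.eigvals(W_QK).real
print("eigenvalue variance:", eigvals.var())
```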

On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width

no code implementations • 19 Dec 2023 • Satoki Ishikawa, Ryo Karakida

Second-order optimization has been developed to accelerate the training of deep neural networks, and it is being applied to increasingly large-scale models.

MLP-Mixer as a Wide and Sparse MLP

no code implementations • 2 Jun 2023 • Tomohiro Hayase, Ryo Karakida

The multi-layer perceptron (MLP) is a fundamental component of deep learning and has been employed extensively for various problems.

Attention in a family of Boltzmann machines emerging from modern Hopfield networks

1 code implementation • 9 Dec 2022 • Toshihiro Ota, Ryo Karakida

Hopfield networks and Boltzmann machines (BMs) are fundamental energy-based neural network models.

Tasks: Denoising

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

no code implementations • 6 Oct 2022 • Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa

In this study, we first reveal that a specific finite-difference computation, composed of both gradient ascent and descent steps, reduces the computational cost of gradient regularization (GR).
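A minimal sketch of the general idea in that sentence, assuming GR means penalizing the squared gradient norm of the loss; this is a generic finite-difference construction on a toy objective, not the paper's exact scheme, and all names and constants here are hypothetical.

```python
import numpy as np

# Toy objective and its analytic gradient (illustration only).
def grad(theta):
    return theta ** 3  # gradient of L(theta) = 0.25 * sum(theta**4)

def gr_step(theta, lr=0.1, lam=0.01, eps=1e-3):
    """One descent step on L(theta) + (lam/2) * ||grad L(theta)||^2.

    The regularizer's gradient is a Hessian-vector product, approximated here
    by a finite difference between the gradient at theta (descent direction)
    and the gradient at a slightly ascended point theta + eps * grad L(theta).
    """
    g = grad(theta)
    g_ascended = grad(theta + eps * g)
    hvp_approx = (g_ascended - g) / eps  # ~ Hessian(theta) @ grad L(theta)
    return theta - lr * (g + lam * hvp_approx)

theta = np.array([1.0, -2.0, 0.5])
for _ in range(100):
    theta = gr_step(theta)
print(theta)  # shrinks toward the minimizer at zero
```

The point of the construction is that the regularized gradient is obtained from two ordinary gradient evaluations, avoiding an explicit second-order (Hessian) computation.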

Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel

1 code implementation • 10 Feb 2022 • Kaito Watanabe, Kotaro Sakamoto, Ryo Karakida, Sho Sonoda, Shun-ichi Amari

In this paper, we investigate such neural fields in a multilayer architecture to study the supervised learning of the fields.

Tasks: Learning Theory

Self-paced Data Augmentation for Training Neural Networks

no code implementations • 29 Oct 2020 • Tomoumi Takase, Ryo Karakida, Hideki Asoh

A typical method that applies data augmentation to all training samples disregards sample suitability, which may reduce classifier performance.

Tasks: Data Augmentation, Single Particle Analysis

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

1 code implementation • NeurIPS 2020 • Ryo Karakida, Kazuki Osawa

In this work, we reveal that, under specific conditions, NGD with approximate Fisher information achieves the same fast convergence to global minima as exact NGD.

The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry

no code implementations • 14 Jun 2020 • Tomohiro Hayase, Ryo Karakida

We investigate the spectral distribution of the conditional FIM, which is the FIM given a single sample, by focusing on fully-connected networks achieving dynamical isometry.

Pathological spectra of the Fisher information metric and its variants in deep neural networks

no code implementations • 14 Oct 2019 • Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions.
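For readers skimming this list, the standard definition being referred to (a textbook formula, not a result of this paper) is the expected outer product of the score function:

```latex
F(\theta) = \mathbb{E}_{x \sim p_\theta}\left[ \nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top} \right]
```

Under standard regularity conditions this equals the expected Hessian of the negative log-likelihood, which is the sense in which the FIM appears as a component of the Hessian of the loss.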

The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

no code implementations • NeurIPS 2019 • Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

Thus, we can conclude that batch normalization in the last layer significantly contributes to decreasing the sharpness induced by the FIM.

Fisher Information and Natural Gradient Learning of Random Deep Networks

no code implementations • 22 Aug 2018 • Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

The natural gradient method uses the steepest-descent direction in a Riemannian manifold, so it is effective for learning and avoids plateaus.
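Concretely, the standard natural gradient update referenced here preconditions the loss gradient by the inverse FIM (again a textbook formula, not a claim specific to this paper):

```latex
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t)
```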

Statistical Neurodynamics of Deep Networks: Geometry of Signal Spaces

no code implementations • 22 Aug 2018 • Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi

The manifold of input signals is embedded in a higher-dimensional manifold of the next layer as a curved submanifold, provided that the number of neurons is larger than the number of inputs.

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach

no code implementations • 4 Jun 2018 • Ryo Karakida, Shotaro Akaho, Shun-ichi Amari

The Fisher information matrix (FIM) is a fundamental quantity to represent the characteristics of a stochastic model, including deep neural networks (DNNs).

Concept Formation and Dynamics of Repeated Inference in Deep Generative Models

no code implementations • 12 Dec 2017 • Yoshihiro Nagano, Ryo Karakida, Masato Okada

Our study demonstrated that the transient dynamics of inference first approaches a concept and then moves close to a memory.

Tasks: Image Generation
