no code implementations • 3 Feb 2024 • Han Bao, Ryuichiro Hataya, Ryo Karakida
To this end, we characterize the notion of attention localization by the eigenspectrum of query-key parameter matrices and reveal that a small eigenspectrum variance causes attention to localize.
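A minimal sketch of the quantity in question, assuming the combined query-key matrix W = W_Q W_K^T and a plain eigenvalue-variance measure; the paper's exact parameterization and normalization may differ:

```python
import numpy as np

# Hypothetical illustration: eigenspectrum variance of a query-key
# parameter matrix. A small variance is the condition the paper links
# to localized attention.
rng = np.random.default_rng(0)
d = 64
W_Q = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d))
W_K = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d))

W = W_Q @ W_K.T                  # combined query-key parameter matrix
eigvals = np.linalg.eigvals(W)   # generally complex for non-symmetric W
print("eigenvalue variance:", np.var(eigvals.real))
```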
no code implementations • 19 Dec 2023 • Satoki Ishikawa, Ryo Karakida
Second-order optimization has been developed to accelerate the training of deep neural networks, and it is being applied to increasingly large-scale models.
no code implementations • 2 Jun 2023 • Tomohiro Hayase, Ryo Karakida
The multi-layer perceptron (MLP) is a fundamental component of deep learning that has been extensively employed for a variety of problems.
1 code implementation • 9 Dec 2022 • Toshihiro Ota, Ryo Karakida
Hopfield networks and Boltzmann machines (BMs) are fundamental energy-based neural network models.
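As background (standard facts about these models, not the paper's specific construction): a classical Hopfield network minimizes a quadratic energy, while the modern continuous Hopfield update retrieves stored patterns through a softmax, which is where the connection to attention arises.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hopfield_energy(x, W):
    # Classical Hopfield energy E(x) = -1/2 x^T W x (standard form).
    return -0.5 * x @ W @ x

rng = np.random.default_rng(1)
X = rng.normal(size=(16, 8))   # 16 stored patterns, dimension 8
xi = rng.normal(size=8)        # probe state
beta = 2.0                     # inverse temperature

# Modern Hopfield retrieval step: a softmax-weighted mixture of the
# stored patterns, formally identical to one attention read-out.
xi_new = X.T @ softmax(beta * X @ xi)
```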
no code implementations • 6 Oct 2022 • Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa
In this study, we first reveal that a specific finite-difference computation, composed of both gradient ascent and descent steps, reduces the computational cost of gradient regularization (GR).
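A minimal sketch of the idea, assuming the GR objective L(θ) + (γ/2)‖∇L(θ)‖²: the Hessian-vector product in its gradient can be approximated by differencing two gradients, one taken after a small gradient-ascent step. The paper's exact scheme may differ; `loss_fn` and the step sizes here are illustrative.

```python
import torch

def gr_gradient(loss_fn, theta, gamma=0.01, eps=1e-3):
    # Gradient of L(theta) + (gamma/2) * ||grad L(theta)||^2, with the
    # Hessian-vector product H g approximated by a finite difference.
    g = torch.autograd.grad(loss_fn(theta), theta)[0]
    theta_up = (theta + eps * g).detach().requires_grad_()  # ascent step
    g_up = torch.autograd.grad(loss_fn(theta_up), theta_up)[0]
    hvp = (g_up - g) / eps                                  # ≈ H(theta) g
    return g + gamma * hvp                                  # descent direction

theta = torch.randn(10, requires_grad=True)
direction = gr_gradient(lambda p: (p ** 4).sum(), theta)    # toy loss
```

The point of the scheme is that it needs only two gradient evaluations, avoiding an explicit second-order (Hessian) computation.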
1 code implementation • 10 Feb 2022 • Kaito Watanabe, Kotaro Sakamoto, Ryo Karakida, Sho Sonoda, Shun-ichi Amari
In this paper, we study such neural fields in a multilayer architecture to investigate supervised learning of the fields.
no code implementations • ICLR 2022 • Ryo Karakida, Shotaro Akaho
Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task.
no code implementations • 29 Oct 2020 • Tomoumi Takase, Ryo Karakida, Hideki Asoh
A typical method that applies data augmentation to all training samples disregards sample suitability, which may reduce classifier performance.
1 code implementation • NeurIPS 2020 • Ryo Karakida, Kazuki Osawa
In this work, we reveal that, under specific conditions, NGD with approximate Fisher information achieves the same fast convergence to global minima as exact NGD.
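A minimal sketch, assuming a simple damped diagonal empirical-Fisher approximation; the paper analyzes richer approximations (e.g. block-wise or unit-wise Fisher), so this stand-in only illustrates the form of the update:

```python
import torch

def approx_ngd_step(theta, per_sample_grads, lr=0.1, damping=1e-3):
    # per_sample_grads: (N, P) matrix of gradients, one row per sample.
    g = per_sample_grads.mean(dim=0)                   # mini-batch gradient
    fisher_diag = (per_sample_grads ** 2).mean(dim=0)  # diagonal empirical Fisher
    return theta - lr * g / (fisher_diag + damping)    # preconditioned step
```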
no code implementations • 14 Jun 2020 • Tomohiro Hayase, Ryo Karakida
We investigate the spectral distribution of the conditional FIM, which is the FIM given a single sample, by focusing on fully-connected networks achieving dynamical isometry.
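To make the definition concrete, here is a toy sketch of a conditional FIM: for a regression model p(y|x) = N(f_θ(x), 1), the FIM given a single input x reduces to J(x)ᵀJ(x), where J is the Jacobian of the output with respect to the parameters (a standard fact; the networks and spectra analyzed in the paper are of course richer).

```python
import numpy as np

# Conditional FIM of a single-output Gaussian model at one sample x:
# F(x) = J(x)^T J(x), where J(x) = d f_theta(x) / d theta, shape (1, P).
rng = np.random.default_rng(2)
J = rng.normal(size=(1, 100))   # stand-in for the parameter Jacobian
F_x = J.T @ J                   # conditional FIM; rank at most 1 here
eigvals = np.linalg.eigvalsh(F_x)
print("largest eigenvalue:", eigvals[-1])  # the single nonzero mode
```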
no code implementations • 14 Oct 2019 • Ryo Karakida, Shotaro Akaho, Shun-ichi Amari
The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions.
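For reference, the standard definition used throughout this line of work (not specific to this paper):

```latex
F(\theta) = \mathbb{E}_{(x,y)\sim p_\theta}\!\left[
  \nabla_\theta \log p_\theta(y \mid x)\,
  \nabla_\theta \log p_\theta(y \mid x)^{\top}
\right]
```

Under standard regularity conditions this equals the expected Hessian of the negative log-likelihood, which is why the FIM appears both as a Riemannian metric and as a component of the Hessian of the loss.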
no code implementations • NeurIPS 2019 • Ryo Karakida, Shotaro Akaho, Shun-ichi Amari
Thus, we can conclude that batch normalization in the last layer significantly contributes to decreasing the sharpness induced by the FIM.
no code implementations • 22 Aug 2018 • Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi
The manifold of input signals is embedded in a higher dimensional manifold of the next layer as a curved submanifold, provided the number of neurons is larger than that of inputs.
no code implementations • 22 Aug 2018 • Shun-ichi Amari, Ryo Karakida, Masafumi Oizumi
The natural gradient method uses the steepest descent direction in a Riemannian manifold, making it effective for learning because it avoids plateaus.
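In its standard form (a textbook sketch, not the paper's specific algorithm), the natural gradient update preconditions the ordinary gradient by the inverse Fisher information, i.e. the steepest descent direction under the Fisher metric:

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-4):
    # Steepest descent under the Fisher metric: theta <- theta - lr * F^{-1} g.
    # Damping keeps the (possibly singular) Fisher matrix invertible.
    F = fisher + damping * np.eye(theta.shape[0])
    return theta - lr * np.linalg.solve(F, grad)
```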
no code implementations • 4 Jun 2018 • Ryo Karakida, Shotaro Akaho, Shun-ichi Amari
The Fisher information matrix (FIM) is a fundamental quantity for representing the characteristics of a stochastic model, including deep neural networks (DNNs).
no code implementations • 12 Dec 2017 • Yoshihiro Nagano, Ryo Karakida, Masato Okada
Our study demonstrates that the transient dynamics of inference first approach a concept and then move close to a memory.