Search Results for author: Kazuki Osawa

Found 9 papers, 7 papers with code

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch

2 code implementations • 8 May 2023 • Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler

Gradient preconditioning is a key technique for integrating second-order information into gradients, improving and extending gradient-based learning algorithms.
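As a rough illustration of the idea (not ASDL's actual API), gradient preconditioning multiplies the raw gradient by the inverse of a damped curvature matrix, such as a Fisher or Gauss-Newton approximation, so that each update direction reflects second-order information. All names below are hypothetical:

```python
import numpy as np

def precondition(grad, curvature, damping=1e-3):
    """Return (C + damping * I)^-1 @ grad -- a generic preconditioned gradient.

    `curvature` stands in for any second-order matrix (Fisher, Gauss-Newton,
    Hessian approximation); `damping` keeps the solve well-conditioned.
    """
    dim = curvature.shape[0]
    return np.linalg.solve(curvature + damping * np.eye(dim), grad)

grad = np.array([1.0, 2.0])
curvature = np.diag([4.0, 1.0])
update = precondition(grad, curvature, damping=0.0)
# With diagonal curvature [4, 1], each coordinate is rescaled by 1/curvature,
# giving [0.25, 2.0]: steep directions are shrunk, flat ones amplified.
```

In practice, libraries like ASDL avoid forming and inverting the full curvature matrix, using structured approximations (e.g. block-diagonal or Kronecker-factored) instead; the dense solve above is only for clarity.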

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices

1 code implementation • 25 Nov 2022 • Kazuki Osawa, Shigang Li, Torsten Hoefler

Pipeline parallelism enables efficient training of Large Language Models (LLMs) on large-scale distributed accelerator clusters.

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias

no code implementations • 6 Oct 2022 • Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa

In this study, we first reveal that a specific finite-difference computation, composed of both gradient ascent and descent steps, reduces the computational cost of GR.
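The ascent-plus-descent idea can be sketched in a few lines. This is a minimal finite-difference sketch under the standard definition of gradient regularization, R(w) = ||∇L(w)||²/2, whose gradient ∇R = H∇L can be approximated with one extra gradient evaluation at a small ascent point, with no Hessian computation; it is an illustration, not the paper's exact scheme:

```python
import numpy as np

def grad_of_gr(grad_fn, w, eps=1e-4):
    """Approximate the gradient of R(w) = ||grad L(w)||^2 / 2 by finite differences.

    One gradient at w (the descent direction) and one at the ascent point
    w + eps * g together approximate H(w) @ g without forming the Hessian.
    """
    g = grad_fn(w)                   # gradient at the current point
    g_ascent = grad_fn(w + eps * g)  # gradient after a small ascent step
    return (g_ascent - g) / eps      # finite-difference estimate of H @ g

# Sanity check on a quadratic loss L(w) = w^T A w / 2, where grad L = A w
# and the Hessian is A, so the exact answer is A^2 w.
A = np.diag([2.0, 3.0])
grad_fn = lambda w: A @ w
w = np.array([1.0, 1.0])
approx = grad_of_gr(grad_fn, w)      # ≈ A @ A @ w = [4.0, 9.0]
```

Because the gradient of a quadratic loss is linear, the finite difference is exact here; for general losses the error is O(eps).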

Neural Graph Databases

no code implementations • 20 Sep 2022 • Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler

In general, LPG2vec enables combining the predictive power of the most powerful GNNs with the full scope of information encoded in the LPG model, paving the way for neural graph databases: a class of systems in which the vast complexity of the maintained data benefits from modern and future graph machine learning methods.

Efficient Quantized Sparse Matrix Operations on Tensor Cores

1 code implementation • 14 Sep 2022 • Shigang Li, Kazuki Osawa, Torsten Hoefler

We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor cores.
Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

1 code implementation • NeurIPS 2020 • Ryo Karakida, Kazuki Osawa

In this work, we reveal that, under specific conditions, NGD with approximate Fisher information achieves the same fast convergence to global minima as exact NGD.
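For reference, a natural gradient descent step applies the (damped) inverse Fisher to the gradient, w ← w − lr · F⁻¹g; approximate-Fisher variants replace F with a cheaper surrogate such as a block-diagonal or unit-wise matrix. The sketch below uses a dense stand-in Fisher and hypothetical names, purely to illustrate the update rule:

```python
import numpy as np

def ngd_step(w, grad, fisher, lr=0.1, damping=1e-3):
    """One natural gradient descent update: w - lr * (F + damping*I)^-1 @ grad."""
    nat_grad = np.linalg.solve(fisher + damping * np.eye(len(w)), grad)
    return w - lr * nat_grad

w = np.array([1.0, 1.0])
grad = np.array([2.0, 4.0])
# With an identity Fisher and no damping, NGD reduces to plain gradient descent.
w_new = ngd_step(w, grad, np.eye(2), lr=0.1, damping=0.0)
```

The result, showing that exact and approximate Fisher variants can converge equally fast in wide networks, is what justifies the cheaper surrogates used in practice.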

Scalable and Practical Natural Gradient for Large-Scale Deep Learning

1 code implementation • 13 Feb 2020 • Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota

Large-scale distributed training of deep neural networks yields models with worse generalization performance due to the increased effective mini-batch size.


Practical Deep Learning with Bayesian Principles

1 code implementation • NeurIPS 2019 • Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan

Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted.

