2 code implementations • 8 May 2023 • Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler
Gradient preconditioning is a key technique for integrating second-order information into gradients, improving and extending gradient-based learning algorithms.
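For orientation, a minimal NumPy sketch of the general idea of gradient preconditioning, here using a diagonal empirical Fisher estimate as the preconditioner; the toy problem, constants, and variable names are illustrative and not the interface proposed in the paper.

```python
# Illustrative only: precondition the gradient with a diagonal curvature
# estimate (diagonal empirical Fisher) instead of using the raw gradient.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(128, 10)), rng.normal(size=128)
w = np.zeros(10)

def per_example_grads(w):
    # Gradient of the squared error for each example: (x.w - y) * x
    resid = X @ w - y
    return resid[:, None] * X                      # shape (n, d)

lr, damping = 0.1, 1e-3
for _ in range(100):
    g_all = per_example_grads(w)
    grad = g_all.mean(axis=0)
    fisher_diag = (g_all ** 2).mean(axis=0)        # diagonal second-order estimate
    w -= lr * grad / (fisher_diag + damping)       # preconditioned update
```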
1 code implementation • 25 Nov 2022 • Kazuki Osawa, Shigang Li, Torsten Hoefler
Pipeline parallelism enables efficient training of Large Language Models (LLMs) on large-scale distributed accelerator clusters.
no code implementations • 6 Oct 2022 • Ryo Karakida, Tomoumi Takase, Tomohiro Hayase, Kazuki Osawa
In this study, we first reveal that a specific finite-difference computation, composed of both gradient ascent and descent steps, reduces the computational cost of gradient regularization (GR).
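As a hedged illustration of that idea (the toy quadratic loss, step sizes, and symbols below are my own choices, not taken from the paper): the GR penalty (gamma/2)*||grad L||^2 contributes a Hessian-vector product to the update, which can be estimated by comparing the gradient at the current point with the gradient after a small ascent step along it.

```python
# Minimal sketch of gradient regularization (GR) via finite differences:
# H @ g is approximated from one extra gradient evaluation at an ascended point.
import numpy as np

A = np.diag([1.0, 10.0])                 # toy quadratic loss L(x) = 0.5 * x^T A x
grad = lambda x: A @ x

x = np.array([1.0, 1.0])
lr, gamma, eps = 0.05, 0.1, 1e-3         # learning rate, GR strength, FD step size

for _ in range(200):
    g = grad(x)                           # descent gradient at x
    g_ascent = grad(x + eps * g)          # gradient after a small ascent step along g
    hvp = (g_ascent - g) / eps            # finite-difference estimate of H @ g
    x -= lr * (g + gamma * hvp)           # update for L(x) + (gamma/2)*||grad L(x)||^2
```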
no code implementations • 20 Sep 2022 • Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler
In general, LPG2vec combines the predictive power of the most powerful GNNs with the full scope of information encoded in the LPG model, paving the way for neural graph databases: a class of systems in which the complexity of the maintained data can benefit from modern and future graph machine learning methods.
1 code implementation • 14 Sep 2022 • Shigang Li, Kazuki Osawa, Torsten Hoefler
We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor cores.
1 code implementation • NeurIPS 2020 • Ryo Karakida, Kazuki Osawa
In this work, we reveal that, under specific conditions, natural gradient descent (NGD) with approximate Fisher information achieves the same fast convergence to global minima as exact NGD.
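For reference only (the least-squares toy problem and damping constant below are my own assumptions, not the paper's analysis setting), the generic NGD update replaces the plain gradient with F^{-1} grad, where F may be an approximate Fisher matrix:

```python
# Illustrative natural gradient descent (NGD) on a toy least-squares problem,
# using an empirical Fisher built from per-example gradients plus damping.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 5)), rng.normal(size=64)
w = np.zeros(5)
lr, damping = 1.0, 1e-2

for _ in range(50):
    resid = X @ w - y
    g_all = resid[:, None] * X                        # per-example gradients, (n, d)
    grad = g_all.mean(axis=0)
    fisher = g_all.T @ g_all / len(y)                 # empirical Fisher approximation
    nat_grad = np.linalg.solve(fisher + damping * np.eye(5), grad)
    w -= lr * nat_grad                                # NGD step: w <- w - lr * F^{-1} grad
```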
1 code implementation • 13 Feb 2020 • Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota
Large-scale distributed training of deep neural networks produces models with worse generalization performance, a consequence of the increase in the effective mini-batch size.
1 code implementation • NeurIPS 2019 • Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan
Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted.
3 code implementations • CVPR 2019 • Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Rio Yokota, Satoshi Matsuoka
Large-scale distributed training of deep neural networks suffers from the generalization gap caused by the increase in the effective mini-batch size.