Search Results for author: Rio Yokota

Found 31 papers, 16 papers with code

Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs

no code implementations 19 Dec 2024 Koshiro Saito, Sakae Mizuki, Masanari Ohi, Taishi Nakamura, Taihei Shiotani, Koki Maeda, Youmi Ma, Kakeru Hattori, Kazuki Fujii, Takumi Okamoto, Shigeki Ishida, Hiroya Takamura, Rio Yokota, Naoaki Okazaki

In contrast, training on Japanese text could improve question-answering tasks about Japanese knowledge and English-Japanese translation, which indicates that abilities for solving these two tasks can be regarded as "Japanese abilities" for LLMs.

Arithmetic Reasoning · Code Generation +2

Lion Cub: Minimizing Communication Overhead in Distributed Lion

no code implementations 25 Nov 2024 Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden

Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottleneck.
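
As a rough illustration of why sign-based optimizers such as Lion are attractive in this setting (a hedged sketch, not Lion Cub's actual scheme), the snippet below takes Lion's sign update locally and combines workers by a majority vote, so each element costs one bit (or one int8 value) on the wire rather than a 32-bit float.

```python
import torch

# Hedged illustration (not Lion Cub itself): Lion's update direction is a
# sign vector, so each worker can ship 1 bit per element instead of 32,
# and workers can be combined by a majority vote on the signs.
def lion_local_direction(momentum: torch.Tensor, grad: torch.Tensor,
                         beta1: float = 0.9) -> torch.Tensor:
    # Lion interpolates momentum and gradient, then keeps only the sign.
    return torch.sign(beta1 * momentum + (1.0 - beta1) * grad)

def majority_vote(sign_updates: list[torch.Tensor]) -> torch.Tensor:
    # Sum the +/-1 votes from all workers and keep the winning sign; in a
    # real run this sum would be an all-reduce over int8 (or packed bits).
    return torch.sign(torch.stack(sign_updates).sum(dim=0))

# Toy example with 4 "workers" holding different gradients for one tensor.
momentum = torch.zeros(8)
grads = [torch.randn(8) for _ in range(4)]
update = majority_vote([lion_local_direction(momentum, g) for g in grads])
print(update)  # entries in {-1, 0, +1}
```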

Quantization

Accelerating Large Language Model Training with 4D Parallelism and Memory Consumption Estimator

no code implementations 10 Nov 2024 Kazuki Fujii, Kohei Watanabe, Rio Yokota

In large language model (LLM) training, several parallelization strategies, including Tensor Parallelism (TP), Pipeline Parallelism (PP), Data Parallelism (DP), as well as Sequence Parallelism (SP) and Context Parallelism (CP), are employed to distribute model parameters, activations, and optimizer states across devices.
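
As a back-of-the-envelope illustration of how these degrees of parallelism shrink per-device memory (a hedged sketch, not the paper's estimator; the byte counts assume bf16 weights and gradients plus fp32 master weights and Adam moments, and activations and framework overheads are ignored), one can divide the parameter-related memory by the tensor- and pipeline-parallel degrees and, with ZeRO-1-style sharding, divide the optimizer states by the data-parallel degree as well:

```python
# Hedged back-of-the-envelope sketch (not the paper's estimator). Assumes
# mixed-precision Adam: bf16 weights (2 B) + bf16 grads (2 B) + fp32 master
# weights, momentum and variance (12 B) per parameter, with the model sharded
# evenly by tensor (TP) and pipeline (PP) parallelism, and the optimizer
# states optionally sharded across data parallelism (ZeRO-1).
def estimate_param_memory_gib(num_params: float, tp: int = 1, pp: int = 1,
                              dp: int = 1, zero1: bool = False) -> float:
    per_device = num_params / (tp * pp)
    weights_and_grads = per_device * 4.0    # bf16 weights + bf16 grads
    optimizer_states = per_device * 12.0    # fp32 master weights + Adam moments
    if zero1:
        optimizer_states /= dp              # ZeRO-1 shards these over DP ranks
    return (weights_and_grads + optimizer_states) / 2**30

# Example: a 70B-parameter model on TP=8, PP=4, DP=16 with ZeRO-1.
print(f"{estimate_param_memory_gib(70e9, tp=8, pp=4, dp=16, zero1=True):.1f} GiB per device")
```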

Language Modeling · Language Modelling +1

Variational Low-Rank Adaptation Using IVON

1 code implementation 7 Nov 2024 Bai Cong, Nico Daheim, Yuesong Shen, Daniel Cremers, Rio Yokota, Mohammad Emtiyaz Khan, Thomas Möllenhoff

We replace AdamW by the Improved Variational Online Newton (IVON) algorithm to finetune large language models.
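
A minimal sketch of such a drop-in swap in a PyTorch fine-tuning step is shown below, assuming a released IVON optimizer with a torch.optim-style interface and a sampled_params context for drawing weight samples; the package name, constructor arguments, and context-manager API are assumptions here, not taken from this listing.

```python
import torch
# Hypothetical import: an IVON package is assumed to expose a torch.optim-style
# optimizer; the exact module name and API may differ.
from ivon import IVON  # assumption

model = torch.nn.Linear(768, 2)                    # stand-in for an LLM head/adapter
opt = IVON(model.parameters(), lr=1e-4, ess=1e5)   # 'ess' (effective sample size) is an assumed argument

def finetune_step(batch_x, batch_y):
    # IVON is variational: each step draws a weight sample from the current
    # Gaussian posterior before computing the loss and gradients.
    with opt.sampled_params(train=True):           # assumed API
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(batch_x), batch_y)
        loss.backward()
    opt.step()
    return loss.item()
```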

Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation

no code implementations 4 Nov 2024 Satoki Ishikawa, Rio Yokota, Ryo Karakida

We demonstrate that, in specific standard settings, PC in the infinite-width limit behaves more similarly to the first-order gradient.

Formula-Supervised Visual-Geometric Pre-training

no code implementations 20 Sep 2024 Ryosuke Yamada, Kensho Hara, Hirokatsu Kataoka, Koshi Makihara, Nakamasa Inoue, Rio Yokota, Yutaka Satoh

Throughout the history of computer vision, research has explored the integration of images (visual) and point clouds (geometric), yet many advancements in image and 3D object recognition have tended to process these modalities separately.

3D Object Classification · 3D Object Recognition +2

Scaling Backwards: Minimal Synthetic Pre-training?

1 code implementation 1 Aug 2024 Ryo Nakamura, Ryu Tadokoro, Ryosuke Yamada, Yuki M. Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

To this end, we search for a minimal, purely synthetic pre-training dataset that allows us to achieve performance similar to the 1 million images of ImageNet-1k.

Transfer Learning

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

no code implementations 4 Jul 2024 LLM-jp, Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, Atsushi Keyaki, Keisuke Kiryu, Hirokazu Kiyomaru, Takashi Kodama, Takahiro Kubo, Yohei Kuga, Ryoma Kumon, Shuhei Kurita, Sadao Kurohashi, Conglong Li, Taiki Maekawa, Hiroshi Matsuda, Yusuke Miyao, Kentaro Mizuki, Sakae Mizuki, Yugo Murawaki, Akim Mousterou, Ryo Nakamura, Taishi Nakamura, Kouta Nakayama, Tomoka Nakazato, Takuro Niitsuma, Jiro Nishitoba, Yusuke Oda, Hayato Ogawa, Takumi Okamoto, Naoaki Okazaki, Yohei Oseki, Shintaro Ozaki, Koki Ryu, Rafal Rzepka, Keisuke Sakaguchi, Shota Sasaki, Satoshi Sekine, Kohei Suda, Saku Sugawara, Issa Sugiura, Hiroaki Sugiyama, Hisami Suzuki, Jun Suzuki, Toyotaro Suzumura, Kensuke Tachibana, Yu Takagi, Kyosuke Takami, Koichi Takeda, Masashi Takeshita, Masahiro Tanaka, Kenjiro Taura, Arseny Tolmachev, Nobuhiro Ueda, Zhen Wan, Shuntaro Yada, Sakiko Yahata, Yuya Yamamoto, Yusuke Yamauchi, Hitomi Yanaka, Rio Yokota, Koichiro Yoshino

This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs).

Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities

no code implementations 27 Apr 2024 Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, Naoaki Okazaki

The results showed that the efficiency gained through vocabulary expansion had no negative impact on performance, except for the summarization task, and that the combined use of parallel corpora enhanced translation ability.
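
For orientation, vocabulary expansion of this kind typically boils down to adding tokens to the tokenizer and resizing the model's embedding matrix before continual pre-training; the Hugging Face-style sketch below is illustrative only, and the model name and added tokens are placeholders rather than the paper's setup.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hedged sketch of a vocabulary-expansion step (placeholder model and tokens,
# not the paper's configuration): add Japanese tokens to an existing tokenizer,
# then resize the embedding matrix so the model can be continually pre-trained
# on Japanese text with the enlarged vocabulary.
base = "meta-llama/Llama-2-7b-hf"                  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

new_tokens = ["日本", "東京", "研究"]               # illustrative additions only
num_added = tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))      # new embedding rows are freshly initialized
print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```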

Question Answering

Building a Large Japanese Web Corpus for Large Language Models

no code implementations 27 Apr 2024 Naoaki Okazaki, Kakeru Hattori, Hirai Shota, Hiroki Iida, Masanari Ohi, Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Rio Yokota, Sakae Mizuki

Open Japanese large language models (LLMs) have been trained on the Japanese portions of corpora such as CC-100, mC4, and OSCAR.

Variational Learning is Effective for Large Deep Networks

1 code implementation 27 Feb 2024 Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff

We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks.

SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

1 code implementation ICCV 2023 Risa Shinoda, Ryo Hayamizu, Kodai Nakashima, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

SegRCDB has a high potential to contribute to semantic segmentation pre-training and investigation by enabling the creation of large datasets without manual annotation.

Segmentation · Semantic Segmentation

Pre-training Vision Transformers with Very Limited Synthesized Images

1 code implementation ICCV 2023 Ryo Nakamura, Hirokatsu Kataoka, Sora Takashima, Edgar Josafat Martinez Noriega, Rio Yokota, Nakamasa Inoue

Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks.

Data Augmentation

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch

2 code implementations 8 May 2023 Kazuki Osawa, Satoki Ishikawa, Rio Yokota, Shigang Li, Torsten Hoefler

Gradient preconditioning is a key technique to integrate the second-order information into gradients for improving and extending gradient-based learning algorithms.
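
The sketch below shows the idea in its simplest, diagonal form (this is a generic illustration, not ASDL's interface): a running estimate of squared gradients serves as a cheap curvature proxy, and the raw gradient is rescaled by it before the parameter update.

```python
import torch

# Minimal sketch of diagonal gradient preconditioning (not the ASDL API):
# keep an exponential moving average of squared gradients as a curvature
# proxy and divide the gradient by its square root before the update.
def preconditioned_sgd_step(params, curvature_state, lr=1e-2, beta=0.95, damping=1e-8):
    for p in params:
        if p.grad is None:
            continue
        state = curvature_state.setdefault(p, torch.zeros_like(p))
        state.mul_(beta).addcmul_(p.grad, p.grad, value=1.0 - beta)  # EMA of g*g
        precond_grad = p.grad / (state.sqrt() + damping)             # P^{-1} g
        p.data.add_(precond_grad, alpha=-lr)

# Usage: call once per step after loss.backward(), passing a persistent dict.
model = torch.nn.Linear(4, 1)
curvature = {}
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
preconditioned_sgd_step(model.parameters(), curvature)
```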

Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

no code implementations CVPR 2023 Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota

Unlike JFT-300M, which is a static dataset, the quality of synthetic datasets will continue to improve, and the current work is a testament to this possibility.
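
As a toy illustration of the kind of formula-driven image such work relies on (the parameters and rendering here are illustrative, not the paper's generator), one can rasterize a closed contour whose radius is a sum of sinusoids:

```python
import numpy as np

# Toy illustration only (not the Visual Atoms generator): a closed contour
# whose radius is a random sum of sinusoids, rasterized as bright points on
# a black canvas, in the spirit of formula-driven synthetic images.
def sinusoidal_contour(n_points=2000, n_waves=4, size=256, seed=0):
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    radius = 1.0 + sum(rng.uniform(0.05, 0.3) * np.sin(k * t + rng.uniform(0, 2 * np.pi))
                       for k in range(2, 2 + n_waves))
    x = (size / 2 + size / 4 * radius * np.cos(t)).astype(int).clip(0, size - 1)
    y = (size / 2 + size / 4 * radius * np.sin(t)).astype(int).clip(0, size - 1)
    img = np.zeros((size, size), dtype=np.uint8)
    img[y, x] = 255
    return img

img = sinusoidal_contour()
print(img.shape, int(img.max()))  # (256, 256) 255
```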

Informative Sample-Aware Proxy for Deep Metric Learning

no code implementations 18 Nov 2022 Aoyu Li, Ikuro Sato, Kohta Ishikawa, Rei Kawakami, Rio Yokota

Among various supervised deep metric learning methods, proxy-based approaches have achieved high retrieval accuracies.
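
For orientation, a minimal Proxy-NCA-style loss is sketched below: each class owns a learnable proxy vector, and embeddings are classified by their distances to the proxies. This is only the generic proxy-based baseline, not the informative-sample-aware proxy proposed in the paper.

```python
import torch
import torch.nn.functional as F

# Minimal Proxy-NCA-style loss sketch (generic baseline, not the paper's
# informative-sample-aware proxy): embeddings are pulled toward their class
# proxy and pushed from the others via a softmax over negative squared distances.
class ProxyNCALoss(torch.nn.Module):
    def __init__(self, num_classes: int, embed_dim: int):
        super().__init__()
        self.proxies = torch.nn.Parameter(torch.randn(num_classes, embed_dim) * 0.01)

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        e = F.normalize(embeddings, dim=1)
        p = F.normalize(self.proxies, dim=1)
        dists = torch.cdist(e, p) ** 2          # (batch, num_classes)
        return F.cross_entropy(-dists, labels)  # softmax over negative distances

# Usage: combine with any embedding network and optimize both jointly.
criterion = ProxyNCALoss(num_classes=10, embed_dim=128)
loss = criterion(torch.randn(32, 128), torch.randint(0, 10, (32,)))
```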

Metric Learning · Retrieval

Replacing Labeled Real-image Datasets with Auto-generated Contours

no code implementations CVPR 2022 Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota

In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human-, and self-supervision during the pre-training of Vision Transformers (ViTs).

Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime

no code implementations 29 Sep 2021 Hiroki Naganuma, Taiji Suzuki, Rio Yokota, Masahiro Nomura, Kohta Ishikawa, Ikuro Sato

Generalization measures are intensively studied in the machine learning community for better modeling of generalization gaps.
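
As a reminder of the quantity in the title, Takeuchi's Information Criterion corrects the training loss with a trace penalty, TIC = train loss + tr(J^{-1} I)/n, where J is the Hessian of the average loss and I is the covariance of per-sample gradients at the fitted parameters; the toy least-squares sketch below uses these textbook definitions and is not the paper's estimator.

```python
import numpy as np

# Toy sketch of Takeuchi's Information Criterion for linear least squares
# (textbook definitions, not the paper's estimator):
#   TIC = train_loss + trace(J^{-1} I) / n
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

w = np.linalg.lstsq(X, y, rcond=None)[0]          # fitted parameters
resid = X @ w - y
train_loss = 0.5 * np.mean(resid ** 2)

per_sample_grads = X * resid[:, None]             # gradient of 0.5*(x.w - y)^2 per sample
I = per_sample_grads.T @ per_sample_grads / n     # covariance of per-sample gradients
J = X.T @ X / n                                   # Hessian of the average loss
tic = train_loss + np.trace(np.linalg.solve(J, I)) / n
print(f"train loss {train_loss:.4f}, TIC {tic:.4f}")
```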

Hyperparameter Optimization

RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering

1 code implementation ICCV 2021 Shun Iwase, Xingyu Liu, Rawal Khirodkar, Rio Yokota, Kris M. Kitani

Furthermore, we utilize differentiable Levenberg-Marquardt (LM) optimization to refine a pose quickly and accurately by minimizing the feature-metric error between the input and rendered image representations, without the need for zooming in.
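
For reference, one damped Gauss-Newton (Levenberg-Marquardt) step solves (J^T J + lambda I) delta = -J^T r for the update delta, given residuals r and their Jacobian J; the sketch below implements just this linear-algebra step in PyTorch, while RePOSE's renderer and feature-metric residuals are not reproduced.

```python
import torch

# Minimal sketch of one Levenberg-Marquardt step (not RePOSE's renderer or
# feature extractor): given residuals r and their Jacobian J, the damped
# Gauss-Newton update is  delta = -(J^T J + lambda * I)^{-1} J^T r.
def lm_step(residuals: torch.Tensor, jacobian: torch.Tensor, damping: float = 1e-3) -> torch.Tensor:
    jtj = jacobian.T @ jacobian
    jtr = jacobian.T @ residuals
    identity = torch.eye(jtj.shape[0], dtype=jtj.dtype, device=jtj.device)
    return -torch.linalg.solve(jtj + damping * identity, jtr)

# Toy usage: refine a 6-dof pose-like vector against linear residuals.
J = torch.randn(100, 6)
x = torch.zeros(6)
r = J @ x - torch.randn(100)       # residuals at the current estimate
x = x + lm_step(r, J)
print(x.shape)                     # torch.Size([6])
```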

6D Pose Estimation · 6D Pose Estimation using RGB +1

Epipolar-Guided Deep Object Matching for Scene Change Detection

no code implementations 30 Jul 2020 Kento Doi, Ryuhei Hamaguchi, Shun Iwase, Rio Yokota, Yutaka Matsuo, Ken Sakurada

To cope with this difficulty, we introduce a deep graph matching network that establishes object correspondence between an image pair.

Change Detection · Graph Matching +2

Scalable and Practical Natural Gradient for Large-Scale Deep Learning

1 code implementation 13 Feb 2020 Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota

Large-scale distributed training of deep neural networks results in models with worse generalization performance due to the increase in the effective mini-batch size.

Deep Learning · Image Classification

Effect of Mixed Precision Computing on H-Matrix Vector Multiplication in BEM Analysis

1 code implementation 30 Oct 2019 Rise Ooi, Takeshi Iwashita, Takeshi Fukaya, Akihiro Ida, Rio Yokota

The Hierarchical Matrix (H-matrix) is an approximation technique that splits a target dense matrix into multiple submatrices and approximates a selected portion of those submatrices with low-rank factorizations.
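
A toy sketch of the two ingredients named above, low-rank approximation of a well-separated block and reduced-precision storage of its factors, is given below; it is illustrative only, not the paper's H-matrix implementation.

```python
import numpy as np

# Hedged sketch (not the paper's H-matrix code): low-rank approximate an
# off-diagonal block with a truncated SVD, then store the factors in fp16
# to cut memory and bandwidth, accumulating the matvec in fp64.
def low_rank_block(block: np.ndarray, rank: int, dtype=np.float16):
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    U = (u[:, :rank] * s[:rank]).astype(dtype)   # m x k factor
    V = vt[:rank].astype(dtype)                  # k x n factor
    return U, V

def block_matvec(U: np.ndarray, V: np.ndarray, x: np.ndarray) -> np.ndarray:
    # Matrix-vector product through the factors; higher-precision accumulation
    # limits the error introduced by the low-precision storage.
    return U.astype(np.float64) @ (V.astype(np.float64) @ x)

# Toy example: a smooth kernel block between well-separated clusters of points
# admits a small-rank approximation.
pts_i = np.linspace(0.0, 1.0, 64)
pts_j = np.linspace(5.0, 6.0, 64)
K = 1.0 / np.abs(pts_i[:, None] - pts_j[None, :])
U, V = low_rank_block(K, rank=6)
x = np.random.default_rng(1).normal(size=64)
print(np.linalg.norm(K @ x - block_matvec(U, V, x)) / np.linalg.norm(K @ x))
```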

Mathematical Software · Distributed, Parallel, and Cluster Computing

Practical Deep Learning with Bayesian Principles

1 code implementation NeurIPS 2019 Kazuki Osawa, Siddharth Swaroop, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota, Mohammad Emtiyaz Khan

Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted.

Continual Learning · Data Augmentation +2

Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering

1 code implementation 27 Mar 2018 Mustafa Abduljabbar, Mohammed Al Farhan, Noha Al-Harthi, Rui Chen, Rio Yokota, Hakan Bagci, David Keyes

With distributed memory optimizations, on the other hand, we report near-optimal efficiency in the weak scalability study with respect to both the logarithmic communication complexity and the theoretical scaling complexity of FMM.

Performance · Computational Engineering, Finance, and Science · Mathematical Software

Asynchronous Execution of the Fast Multipole Method Using Charm++

1 code implementation 29 May 2014 Mustafa AbdulJabbar, Rio Yokota, David Keyes

Fast multipole methods (FMM) on distributed memory have traditionally used a bulk-synchronous model of communicating the local essential tree (LET) and overlapping it with computation of the local data.

Distributed, Parallel, and Cluster Computing · 70F10 · D.1.2; D.1.3; G.1.0; G.1.2
