Search Results for author: Satoshi Matsuoka

Found 14 papers, 8 papers with code

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

1 code implementation • 13 Apr 2018 • Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning.
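The snippet above only introduces cuDNN; the paper's contribution is transparently splitting each mini-batch into micro-batches so that faster but more workspace-hungry convolution algorithms can still be used. Below is a rough, framework-agnostic sketch of that micro-batching idea; conv_layer is a placeholder for a real cuDNN convolution call, and nothing here reproduces the μ-cuDNN library itself.

```python
# Conceptual sketch of micro-batching (not the mu-cuDNN implementation):
# a mini-batch is split into smaller micro-batches so that each convolution
# call fits a given workspace/memory budget, and the outputs are concatenated.
import numpy as np

def conv_layer(x):
    """Stand-in for a cuDNN convolution call; here just a toy transform."""
    return x * 2.0  # placeholder computation

def forward_microbatched(batch, micro_batch_size):
    """Run the layer micro-batch by micro-batch and stitch the results together."""
    outputs = []
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        outputs.append(conv_layer(micro))
    return np.concatenate(outputs, axis=0)

x = np.random.rand(256, 3, 32, 32).astype(np.float32)  # mini-batch of 256 images
y = forward_microbatched(x, micro_batch_size=64)        # processed as four micro-batches
assert y.shape[0] == 256
```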

Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks

1 code implementation • 27 Mar 2019 • Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka

We evaluated the performance of the GCN application on TSUBAME3.0, which is equipped with NVIDIA Tesla P100 GPUs, and our batched approach shows significant speedups of up to 1.59x in training and 1.37x in inference.

Distributed, Parallel, and Cluster Computing
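As an illustration of the batched sparse-dense products that drive GCN layers over many small graphs, here is a minimal CPU-side sketch using SciPy; the paper's batched GPU kernels and their scheduling are not reproduced, and the sizes below are arbitrary.

```python
# Illustrative batched sparse-times-dense multiplication for GCN propagation
# (A_i @ X_i @ W over a batch of small graphs), using SciPy on the CPU.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
num_graphs, num_nodes, in_feats, out_feats = 8, 16, 4, 2

adjacencies = [sp.random(num_nodes, num_nodes, density=0.2, format="csr", random_state=0)
               for _ in range(num_graphs)]
features = [rng.random((num_nodes, in_feats)) for _ in range(num_graphs)]
weight = rng.random((in_feats, out_feats))

# One GCN layer applied graph-by-graph; a batched kernel would fuse this loop.
outputs = [adj @ (feat @ weight) for adj, feat in zip(adjacencies, features)]
print(outputs[0].shape)  # (16, 2)
```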

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

1 code implementation • 1 Feb 2018 • Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

Furthermore, we estimate that the upcoming Stratix 10 devices can achieve a performance of up to 3.5 TFLOP/s and 1.6 TFLOP/s for 2D and 3D stencil computation, respectively.

Distributed, Parallel, and Cluster Computing • Hardware Architecture

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

1 code implementation • 14 Feb 2020 • Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective.

Distributed, Parallel, and Cluster Computing
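To make the combined spatial and temporal blocking idea concrete, here is a minimal 1D, first-order sketch in Python: each spatial block is loaded with a halo as wide as the number of fused time steps and advanced locally before its interior is written back. This is only the algorithmic skeleton; the paper's OpenCL/FPGA pipelines and high-order stencils are not modeled.

```python
import numpy as np

def step(u):
    """One global time step of a 3-point averaging stencil with fixed endpoints."""
    v = u.copy()
    v[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return v

def blocked(u, t_steps, block):
    """Combined spatial/temporal blocking: each spatial block is loaded with a
    halo of width t_steps, advanced t_steps time steps locally, and only its
    interior is written back. Stale values at the local edges never reach the
    interior because errors spread by at most one cell per step."""
    n = len(u)
    out = np.empty_like(u)
    for s in range(0, n, block):
        e = min(s + block, n)
        lo, hi = max(0, s - t_steps), min(n, e + t_steps)
        local = u[lo:hi].copy()
        for _ in range(t_steps):
            local = step(local)   # reuse the same kernel on the extended block
        out[s:e] = local[s - lo:e - lo]
    return out

u0 = np.random.rand(64)
ref = u0.copy()
for _ in range(4):
    ref = step(ref)
assert np.allclose(blocked(u0, t_steps=4, block=16), ref)
```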

Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen

no code implementations • COLING 2016 • Aleksandr Drozd, Anna Gladkova, Satoshi Matsuoka

Solving word analogies became one of the most popular benchmarks for word embeddings on the assumption that linear relations between word pairs (such as king:man :: woman:queen) are indicative of the quality of the embedding.

BIG-bench Machine Learning • Morphological Analysis • +3
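For readers unfamiliar with the benchmark, the linear-offset method it builds on can be sketched in a few lines of NumPy: the answer to a:b :: c:? is taken to be the vocabulary word closest, by cosine similarity, to vec(a) - vec(b) + vec(c). The toy vectors below are invented purely for illustration and are unrelated to the paper's embeddings.

```python
# Minimal sketch of the vector-offset ("3CosAdd") analogy method on toy
# embeddings: queen is predicted as the word closest to king - man + woman.
import numpy as np

embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.0]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 1.0]),
    "apple": np.array([0.0, 0.9, 0.2]),
}

def analogy(a, b, c, vocab):
    """Return the word (other than a, b, c) most cosine-similar to vec(a) - vec(b) + vec(c)."""
    target = vocab[a] - vocab[b] + vocab[c]
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("king", "man", "woman", embeddings))  # expected: "queen"
```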

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

no code implementations • 9 Apr 2020 • Artur Podobas, Kentaro Sano, Satoshi Matsuoka

With the end of both Dennard's scaling and Moore's law, computer users and researchers are aggressively exploring alternative forms of computing in order to continue the performance scaling that we have come to enjoy.

Hardware Architecture • A.1; B.0; C.1; C.3

High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

1 code implementation • 5 Apr 2018 • Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydın Buluç

Our hash-table- and heap-based algorithms show significant speedups over existing libraries in the majority of cases, while different algorithms dominate in the remaining scenarios depending on matrix size, sparsity, compression factor, and operation type.

Distributed, Parallel, and Cluster Computing
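As a rough illustration of what a hash-table-based SpGEMM kernel does, here is a pure-Python sketch of Gustavson's row-by-row algorithm with a dict standing in for the per-row hash accumulator; the paper's vectorized KNL/multicore implementations and the heap-based variant are not reproduced.

```python
# Row-wise SpGEMM (Gustavson's algorithm) with a hash-table accumulator.
def spgemm_hash(a_indptr, a_indices, a_data, b_indptr, b_indices, b_data, n_rows):
    """Multiply two CSR matrices A @ B, accumulating each output row in a dict."""
    c_indptr, c_indices, c_data = [0], [], []
    for i in range(n_rows):
        accumulator = {}                      # column index -> partial sum
        for jj in range(a_indptr[i], a_indptr[i + 1]):
            j, a_ij = a_indices[jj], a_data[jj]
            for kk in range(b_indptr[j], b_indptr[j + 1]):
                k = b_indices[kk]
                accumulator[k] = accumulator.get(k, 0.0) + a_ij * b_data[kk]
        for k in sorted(accumulator):
            c_indices.append(k)
            c_data.append(accumulator[k])
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data

# A = [[1, 2], [0, 3]] and B = [[0, 1], [1, 0]] in CSR form; A @ B = [[2, 1], [3, 0]].
print(spgemm_hash([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0],
                  [0, 1, 2], [1, 0], [1.0, 1.0], 2))
```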

Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs

1 code implementation • 21 Dec 2018 • Hiroki Kanezashi, Toyotaro Suzumura, Dario Garcia-Gasulla, Min-hwan Oh, Satoshi Matsuoka

We propose an incremental graph pattern matching algorithm to deal with time-evolving graph data and also propose an adaptive optimization system based on reinforcement learning to recompute vertices in the incremental process more efficiently.

Databases
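The incremental idea can be illustrated with the simplest possible pattern, a triangle: when an edge (u, v) is inserted, only the common neighbors of u and v need to be inspected, rather than re-matching the whole graph. The sketch below shows just that incremental step; the reinforcement-learning policy the paper uses to decide which vertices to recompute is not modeled.

```python
# Toy illustration of incremental pattern matching on a time-evolving graph:
# inserting edge (u, v) only requires inspecting the common neighbors of u and v.
from collections import defaultdict

adjacency = defaultdict(set)
triangles = set()

def insert_edge(u, v):
    """Add edge (u, v) and record only the new triangle matches it creates."""
    new_matches = {tuple(sorted((u, v, w))) for w in adjacency[u] & adjacency[v]}
    triangles.update(new_matches)
    adjacency[u].add(v)
    adjacency[v].add(u)
    return new_matches

for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]:
    created = insert_edge(*edge)
    if created:
        print(f"edge {edge} created triangles: {created}")
```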

Myths and Legends in High-Performance Computing

no code implementations • 6 Jan 2023 • Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler

While some laws end, new directions are emerging, such as algorithmic scaling or novel architecture research.

