Search Results for author: Satoshi Matsuoka

Found 14 papers, 8 papers with code

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

1 code implementation • 13 Apr 2018 • Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning.
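The snippet above only introduces cuDNN; the paper's contribution is transparently splitting each mini-batch into micro-batches so that faster but more workspace-hungry convolution algorithms can still be used. Below is a rough, framework-agnostic sketch of that micro-batching idea; conv_layer is a placeholder for a real cuDNN convolution call, and nothing here reproduces the μ-cuDNN library itself.

```python
# Conceptual sketch of micro-batching (not the mu-cuDNN implementation):
# a mini-batch is split into smaller micro-batches so that each convolution
# call fits a given workspace/memory budget, and the outputs are concatenated.
import numpy as np

def conv_layer(x):
    """Stand-in for a cuDNN convolution call; here just a toy transform."""
    return x * 2.0  # placeholder computation

def forward_microbatched(batch, micro_batch_size):
    """Run the layer micro-batch by micro-batch and stitch the results together."""
    outputs = []
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        outputs.append(conv_layer(micro))
    return np.concatenate(outputs, axis=0)

x = np.random.rand(256, 3, 32, 32).astype(np.float32)  # mini-batch of 256 images
y = forward_microbatched(x, micro_batch_size=64)        # processed as four micro-batches
assert y.shape[0] == 256
```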

Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks

1 code implementation • 27 Mar 2019 • Yusuke Nagasaka, Akira Nukada, Ryosuke Kojima, Satoshi Matsuoka

We evaluated the performance of the GCN application on TSUBAME3.0, which is equipped with NVIDIA Tesla P100 GPUs, and our batched approach shows significant speedups of up to 1.59x in training and 1.37x in inference.

Distributed, Parallel, and Cluster Computing
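As an illustration of the batched sparse-dense products that drive GCN layers over many small graphs, here is a minimal CPU-side sketch using SciPy; the paper's batched GPU kernels and their scheduling are not reproduced, and the sizes below are arbitrary.

```python
# Illustrative batched sparse-times-dense multiplication for GCN propagation
# (A_i @ X_i @ W over a batch of small graphs), using SciPy on the CPU.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
num_graphs, num_nodes, in_feats, out_feats = 8, 16, 4, 2

adjacencies = [sp.random(num_nodes, num_nodes, density=0.2, format="csr", random_state=0)
               for _ in range(num_graphs)]
features = [rng.random((num_nodes, in_feats)) for _ in range(num_graphs)]
weight = rng.random((in_feats, out_feats))

# One GCN layer applied graph-by-graph; a batched kernel would fuse this loop.
outputs = [adj @ (feat @ weight) for adj, feat in zip(adjacencies, features)]
print(outputs[0].shape)  # (16, 2)
```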

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

1 code implementation • 1 Feb 2018 • Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

Furthermore, we estimate that the upcoming Stratix 10 devices can achieve a performance of up to 3.5 TFLOP/s and 1.6 TFLOP/s for 2D and 3D stencil computation, respectively.

Distributed, Parallel, and Cluster Computing • Hardware Architecture

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

1 code implementation • 14 Feb 2020 • Hamid Reza Zohouri, Artur Podobas, Satoshi Matsuoka

We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective.

Distributed, Parallel, and Cluster Computing
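To make the combined spatial and temporal blocking idea concrete, here is a minimal 1D, first-order sketch in Python: each spatial block is loaded with a halo as wide as the number of fused time steps and advanced locally before its interior is written back. This is only the algorithmic skeleton; the paper's OpenCL/FPGA pipelines and high-order stencils are not modeled.

```python
import numpy as np

def step(u):
    """One global time step of a 3-point averaging stencil with fixed endpoints."""
    v = u.copy()
    v[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return v

def blocked(u, t_steps, block):
    """Combined spatial/temporal blocking: each spatial block is loaded with a
    halo of width t_steps, advanced t_steps time steps locally, and only its
    interior is written back. Stale values at the local edges never reach the
    interior because errors spread by at most one cell per step."""
    n = len(u)
    out = np.empty_like(u)
    for s in range(0, n, block):
        e = min(s + block, n)
        lo, hi = max(0, s - t_steps), min(n, e + t_steps)
        local = u[lo:hi].copy()
        for _ in range(t_steps):
            local = step(local)   # reuse the same kernel on the extended block
        out[s:e] = local[s - lo:e - lo]
    return out

u0 = np.random.rand(64)
ref = u0.copy()
for _ in range(4):
    ref = step(ref)
assert np.allclose(blocked(u0, t_steps=4, block=16), ref)
```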

Word Embeddings, Analogies, and Machine Learning: Beyond king - man + woman = queen

no code implementations • COLING 2016 • Aleksandr Drozd, Anna Gladkova, Satoshi Matsuoka

Solving word analogies became one of the most popular benchmarks for word embeddings on the assumption that linear relations between word pairs (such as king:man :: woman:queen) are indicative of the quality of the embedding.

BIG-bench Machine Learning • Morphological Analysis • +3
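For readers unfamiliar with the benchmark, the linear-offset method it builds on can be sketched in a few lines of NumPy: the answer to a:b :: c:? is taken to be the vocabulary word closest, by cosine similarity, to vec(a) - vec(b) + vec(c). The toy vectors below are invented purely for illustration and are unrelated to the paper's embeddings.

```python
# Minimal sketch of the vector-offset ("3CosAdd") analogy method on toy
# embeddings: queen is predicted as the word closest to king - man + woman.
import numpy as np

embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.0]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 1.0]),
    "apple": np.array([0.0, 0.9, 0.2]),
}

def analogy(a, b, c, vocab):
    """Return the word (other than a, b, c) most cosine-similar to vec(a) - vec(b) + vec(c)."""
    target = vocab[a] - vocab[b] + vocab[c]
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("king", "man", "woman", embeddings))  # expected: "queen"
```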

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

no code implementations • 9 Apr 2020 • Artur Podobas, Kentaro Sano, Satoshi Matsuoka

With the end of both Dennard's scaling and Moore's law, computer users and researchers are aggressively exploring alternative forms of computing in order to continue the performance scaling that we have come to enjoy.

Hardware Architecture • A.1; B.0; C.1; C.3

High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

1 code implementation • 5 Apr 2018 • Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydın Buluç

Our hash-table- and heap-based algorithms show significant speedups over existing libraries in the majority of cases, while different algorithms dominate in the remaining scenarios depending on matrix size, sparsity, compression factor, and operation type.

Distributed, Parallel, and Cluster Computing
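As a rough illustration of what a hash-table-based SpGEMM kernel does, here is a pure-Python sketch of Gustavson's row-by-row algorithm with a dict standing in for the per-row hash accumulator; the paper's vectorized KNL/multicore implementations and the heap-based variant are not reproduced.

```python
# Row-wise SpGEMM (Gustavson's algorithm) with a hash-table accumulator.
def spgemm_hash(a_indptr, a_indices, a_data, b_indptr, b_indices, b_data, n_rows):
    """Multiply two CSR matrices A @ B, accumulating each output row in a dict."""
    c_indptr, c_indices, c_data = [0], [], []
    for i in range(n_rows):
        accumulator = {}                      # column index -> partial sum
        for jj in range(a_indptr[i], a_indptr[i + 1]):
            j, a_ij = a_indices[jj], a_data[jj]
            for kk in range(b_indptr[j], b_indptr[j + 1]):
                k = b_indices[kk]
                accumulator[k] = accumulator.get(k, 0.0) + a_ij * b_data[kk]
        for k in sorted(accumulator):
            c_indices.append(k)
            c_data.append(accumulator[k])
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data

# A = [[1, 2], [0, 3]] and B = [[0, 1], [1, 0]] in CSR form; A @ B = [[2, 1], [3, 0]].
print(spgemm_hash([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0],
                  [0, 1, 2], [1, 0], [1.0, 1.0], 2))
```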

Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs

1 code implementation • 21 Dec 2018 • Hiroki Kanezashi, Toyotaro Suzumura, Dario Garcia-Gasulla, Min-hwan Oh, Satoshi Matsuoka

We propose an incremental graph pattern matching algorithm to deal with time-evolving graph data and also propose an adaptive optimization system based on reinforcement learning to recompute vertices in the incremental process more efficiently.

Databases
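The incremental idea can be illustrated with the simplest possible pattern, a triangle: when an edge (u, v) is inserted, only the common neighbors of u and v need to be inspected, rather than re-matching the whole graph. The sketch below shows just that incremental step; the reinforcement-learning policy the paper uses to decide which vertices to recompute is not modeled.

```python
# Toy illustration of incremental pattern matching on a time-evolving graph:
# inserting edge (u, v) only requires inspecting the common neighbors of u and v.
from collections import defaultdict

adjacency = defaultdict(set)
triangles = set()

def insert_edge(u, v):
    """Add edge (u, v) and record only the new triangle matches it creates."""
    new_matches = {tuple(sorted((u, v, w))) for w in adjacency[u] & adjacency[v]}
    triangles.update(new_matches)
    adjacency[u].add(v)
    adjacency[v].add(u)
    return new_matches

for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]:
    created = insert_edge(*edge)
    if created:
        print(f"edge {edge} created triangles: {created}")
```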

Myths and Legends in High-Performance Computing

no code implementations • 6 Jan 2023 • Satoshi Matsuoka, Jens Domke, Mohamed Wahib, Aleksandr Drozd, Torsten Hoefler

While some laws end, new directions are emerging, such as algorithmic scaling or novel architecture research.

