no code implementations • 19 Sep 2022 • Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu
Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs.
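For reference, the general matrix-vector multiplication (GEMV) workload mentioned above is the classic memory-bound kernel sketched below. This is only an illustrative scalar C version, not the UPMEM or GPU implementation evaluated in the paper: each matrix element is loaded once and never reused, which is why the kernel is limited by memory bandwidth rather than compute.

```c
#include <stddef.h>

/* Illustrative GEMV: y = A * x, with A stored row-major.
 * Every element of A is read exactly once, so performance is
 * bound by memory bandwidth, not arithmetic throughput. */
void gemv(const float *A, const float *x, float *y,
          size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < cols; j++)
            acc += A[i * cols + j] * x[j];
        y[i] = acc;
    }
}
```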
no code implementations • 16 Jul 2022 • Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu
Our K-Means clustering on PIM is $2.8\times$ and $3.2\times$ faster than state-of-the-art CPU and GPU versions, respectively.
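As a point of reference, the per-iteration core of K-Means is the assignment step sketched below in plain C: label each point with its nearest centroid by squared Euclidean distance. This is the data-parallel, memory-bound loop that a PIM system can execute close to the data; how the paper's version distributes it across DRAM processing units is not shown here.

```c
#include <stddef.h>
#include <float.h>

/* Illustrative K-Means assignment step: for each point, find the index
 * of the nearest centroid (squared Euclidean distance) and record it. */
void assign_clusters(const float *points, const float *centroids,
                     int *labels, size_t n_points, size_t n_clusters,
                     size_t dim)
{
    for (size_t p = 0; p < n_points; p++) {
        float best = FLT_MAX;
        int best_k = 0;
        for (size_t k = 0; k < n_clusters; k++) {
            float dist = 0.0f;
            for (size_t d = 0; d < dim; d++) {
                float diff = points[p * dim + d] - centroids[k * dim + d];
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                best_k = (int)k;
            }
        }
        labels[p] = best_k;
    }
}
```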
no code implementations • 13 Jun 2022 • Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu
Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate machine learning training.
no code implementations • 29 May 2022 • Geraldo F. Oliveira, Amirali Boroumand, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu
One promising execution paradigm that alleviates the data movement bottleneck in modern and emerging applications is processing-in-memory (PIM), where the cost of data movement to/from main memory is reduced by placing computation capabilities close to memory.
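As a back-of-the-envelope illustration (not a figure from the paper): a GEMV over an $N \times N$ matrix of 4-byte floats moves roughly $4N^2$ bytes from main memory to perform roughly $2N^2$ floating-point operations, an arithmetic intensity of about 0.5 flop/byte, so off-chip bandwidth rather than compute sets the performance ceiling; placing computation capabilities near the memory arrays removes much of that off-chip transfer.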
no code implementations • 29 Sep 2021 • Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu
To understand how edge ML accelerators perform, we characterize the performance of a commercial Google Edge TPU, using 24 Google edge NN models (which span a wide range of NN model types) and analyzing each NN layer within each model.
no code implementations • 1 Mar 2021 • Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu
We comprehensively study the characteristics of each NN layer in all of the Google edge models, and find that these shortcomings arise from the accelerator's one-size-fits-all design, as key layer characteristics vary greatly both across different models and across different layers within the same model.
no code implementations • 22 Dec 2020 • Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, João Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu
Compared to a CPU and a high-end GPU, SIMDRAM is 257x and 31x more energy-efficient, while providing 93x and 6x higher operation throughput, respectively.
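For intuition on where this in-DRAM throughput comes from: substrates in this class build bulk bitwise operations out of a majority (MAJ) primitive applied across entire DRAM rows at once, with MAJ(a, b, 0) yielding AND and MAJ(a, b, 1) yielding OR. The C sketch below only emulates that idea on 64-bit words to show the logic; it is not the SIMDRAM mechanism itself, which operates on rows of cells inside the DRAM arrays.

```c
#include <stdint.h>
#include <stdio.h>

/* Software emulation of a bitwise 3-input majority: for each bit
 * position, the result is the value held by at least two inputs. */
static uint64_t maj(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & b) | (a & c) | (b & c);
}

int main(void)
{
    uint64_t a = 0xF0F0F0F0F0F0F0F0ULL;
    uint64_t b = 0xFF00FF00FF00FF00ULL;

    /* Fixing the third operand to all-0s or all-1s specializes
     * majority into bulk bitwise AND and OR, respectively. */
    uint64_t and_result = maj(a, b, 0x0000000000000000ULL);
    uint64_t or_result  = maj(a, b, 0xFFFFFFFFFFFFFFFFULL);

    printf("AND: %016llx\n", (unsigned long long)and_result);
    printf("OR:  %016llx\n", (unsigned long long)or_result);
    return 0;
}
```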
Hardware Architecture • Distributed, Parallel, and Cluster Computing • Emerging Technologies