no code implementations • 30 Dec 2018 • Hajar Falahati, Pejman Lotfi-Kamran, Mohammad Sadrosadati, Hamid Sarbazi-Azad
To utilize available bandwidth without violating area and power budgets of logic layer, ORIGAMI comes with a computation-splitting compiler that divides an ML algorithm between in-memory accelerators and an out-of-the-memory platform in a balanced way and with minimum inter-communications.
1 code implementation • 1 Sep 2022 • Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu
To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads.
1 code implementation • 9 Dec 2022 • Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu
However, for many applications, the majority of reads do no match the reference genome of interest (i. e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation.
1 code implementation • 3 Apr 2023 • Maurus Item, Juan Gómez-Luna, Yuxin Guo, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu
In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc.
1 code implementation • 8 Oct 2023 • Joël Lindegger, Can Firtina, Nika Mansouri Ghiasi, Mohammad Sadrosadati, Mohammed Alser, Onur Mutlu
mean) while improving accuracy by 1. 35$\times$ (1. 34$\times$) in terms of F-1 score on average (geo.
no code implementations • 26 Feb 2024 • Christina Giannoula, Peiming Yang, Ivan Fernandez Vega, Jiacheng Yang, Yu Xin Li, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko
Graph Neural Network (GNN) execution involves both compute-intensive and memory-intensive kernels, the latter dominates the total time, being significantly bottlenecked by data movement between memory and processors.
no code implementations • 10 Apr 2024 • Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu
Processor-centric architectures (e. g., CPU, GPU) commonly used for modern ML training workloads are limited by the data movement bottleneck, i. e., due to repeatedly accessing the training dataset.