Search Results for author: Mohammad Sadrosadati

Found 8 papers, 5 papers with code

Analysis of Distributed Optimization Algorithms on a Real Processing-In-Memory System

no code implementations10 Apr 2024 Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu

Processor-centric architectures (e. g., CPU, GPU) commonly used for modern ML training workloads are limited by the data movement bottleneck, i. e., due to repeatedly accessing the training dataset.

Distributed Optimization

Accelerating Graph Neural Networks on Real Processing-In-Memory Systems

no code implementations26 Feb 2024 Christina Giannoula, Peiming Yang, Ivan Fernandez Vega, Jiacheng Yang, Yu Xin Li, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko

Graph Neural Network (GNN) execution involves both compute-intensive and memory-intensive kernels, the latter dominates the total time, being significantly bottlenecked by data movement between memory and processors.

Graph Neural Network

TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems

1 code implementation3 Apr 2023 Maurus Item, Juan Gómez-Luna, Yuxin Guo, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu

In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc.

TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

1 code implementation9 Dec 2022 Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu

However, for many applications, the majority of reads do no match the reference genome of interest (i. e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation.

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

1 code implementation1 Sep 2022 Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu

To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads.

ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning

no code implementations30 Dec 2018 Hajar Falahati, Pejman Lotfi-Kamran, Mohammad Sadrosadati, Hamid Sarbazi-Azad

To utilize available bandwidth without violating area and power budgets of logic layer, ORIGAMI comes with a computation-splitting compiler that divides an ML algorithm between in-memory accelerators and an out-of-the-memory platform in a balanced way and with minimum inter-communications.

Cannot find the paper you are looking for? You can Submit a new open access paper.