1 code implementation • 3 Apr 2023 • Maurus Item, Juan Gómez-Luna, Yuxin Guo, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu
In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc.
1 code implementation • 22 Jan 2023 • Can Firtina, Nika Mansouri Ghiasi, Joel Lindegger, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, Onur Mutlu
RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value.
1 code implementation • 15 Jan 2023 • André Santos, João Dinis Ferreira, Onur Mutlu, Gabriel Falcao
However, the large strides in accuracy obtained by CNNs have been derived from increasing the complexity of network topologies, which incurs sizeable performance and energy penalties in the training and inference of CNNs.
1 code implementation • 9 Dec 2022 • Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu
However, for many applications, the majority of reads do no match the reference genome of interest (i. e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation.
no code implementations • 10 Nov 2022 • Aditya Manglik, Minesh Patel, Haiyu Mao, Behzad Salami, Jisung Park, Lois Orosa, Onur Mutlu
Resistive Random-Access Memory (RRAM) is well-suited to accelerate neural network (NN) workloads as RRAM-based Processing-in-Memory (PIM) architectures natively support highly-parallel multiply-accumulate (MAC) operations that form the backbone of most NN workloads.
no code implementations • 19 Sep 2022 • Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu
Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3. 0x and 3. 1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16. 7x/1. 4x for three binary NNs.
1 code implementation • 1 Sep 2022 • Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu
To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads.
no code implementations • 22 Aug 2022 • Gagandeep Singh, Dionysios Diamantopoulos, Juan Gómez-Luna, Sander Stuijk, Henk Corporaal, Onur Mutlu
The key idea of LEAPER is to transfer an ML-based performance and resource usage model trained for a low-end edge environment to a new, high-end cloud environment to provide fast and accurate predictions for accelerator implementation.
1 code implementation • 20 Jul 2022 • Can Firtina, Kamlesh Pillai, Gurpreet S. Kalsi, Bharathwaj Suresh, Damla Senol Cali, Jeremie Kim, Taha Shahroodi, Meryem Banu Cavlak, Joel Lindegger, Mohammed Alser, Juan Gómez Luna, Sreenivas Subramoney, Onur Mutlu
When we analyze the state-of-the-art works, we find that there is a pressing need for a flexible, high-performant, and energy-efficient hardware-software co-design to efficiently and effectively solve all the major inefficiencies in the Baum-Welch algorithm for pHMMs.
1 code implementation • 16 Jul 2022 • Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu
Our K-Means clustering on PIM is $2. 8\times$ and $3. 2\times$ than state-of-the-art CPU and GPU versions, respectively.
1 code implementation • 14 Jun 2022 • Mohammed Alser, Jeremie S. Kim, Nour Almadhoun Alserr, Stefan W. Tell, Onur Mutlu
We introduce COVIDHunter, a flexible and accurate COVID-19 outbreak simulation model that evaluates the current mitigation measures that are applied to a region, predicts COVID-19 statistics (the daily number of cases, hospitalizations, and deaths), and provides suggestions on what strength the upcoming mitigation measure should be.
no code implementations • 13 Jun 2022 • Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu
Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate machine learning training.
no code implementations • 29 May 2022 • Geraldo F. Oliveira, Amirali Boroumand, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu
One promising execution paradigm that alleviates the data movement bottleneck in modern and emerging applications is processing-in-memory (PIM), where the cost of data movement to/from main memory is reduced by placing computation capabilities close to memory.
1 code implementation • 16 May 2022 • Mohammed Alser, Joel Lindegger, Can Firtina, Nour Almadhoun, Haiyu Mao, Gagandeep Singh, Juan Gomez-Luna, Onur Mutlu
We hope that these efforts and the challenges we discuss provide a foundation for future work in making genome analysis more intelligent.
1 code implementation • 15 May 2022 • Gagandeep Singh, Rakesh Nadig, Jisung Park, Rahul Bera, Nastaran Hajinazar, David Novo, Juan Gómez-Luna, Sander Stuijk, Henk Corporaal, Onur Mutlu
We introduce Sibyl, the first technique that uses reinforcement learning for data placement in hybrid storage systems.
no code implementations • 30 Mar 2022 • Mohammed Alser, Sharon Waymost, Ram Ayyala, Brendan Lawlor, Richard J. Abdill, Neha Rajkumar, Nathan LaPierre, Jaqueline Brito, Andre M. Ribeiro-dos-Santos, Can Firtina, Nour Almadhoun, Varuni Sarwal, Eleazar Eskin, Qiyang Hu, Derek Strong, Byoung-Do, Kim, Malak S. Abedalthagafi, Onur Mutlu, Serghei Mangul
Omics software tools have reshaped the landscape of modern biology and become an essential component of biomedical research.
no code implementations • 17 Feb 2022 • Jisung Park, Jeoggyun Kim, Yeseong Kim, Sungjin Lee, Onur Mutlu
Data reduction in storage systems is becoming increasingly important as an effective solution to minimize the management cost of a data center.
no code implementations • 4 Feb 2022 • Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos Kanellopoulos, Juan Gomez-Luna, Michaela Blott, Kees Vissers, Onur Mutlu
We find that commonly-used low-power CNN inference accelerators based on spatial architectures are not optimized for both of these convolutional kernels.
1 code implementation • 17 Jan 2022 • Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Can Alkan, Onur Mutlu
Also available in Bioconda at: https://anaconda. org/bioconda/fastremap-bio.
no code implementations • 30 Dec 2021 • Nour Almadhoun Alserr, Ozgur Ulusoy, Erman Ayday, Onur Mutlu
While sharing genomic data across researchers is an essential driver of advances in health and biomedical research, the sharing process is often infeasible due to data privacy concerns.
1 code implementation • 16 Dec 2021 • Can Firtina, Jisung Park, Mohammed Alser, Jeremie S. Kim, Damla Senol Cali, Taha Shahroodi, Nika Mansouri Ghiasi, Gagandeep Singh, Konstantinos Kanellopoulos, Can Alkan, Onur Mutlu
We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches.
no code implementations • 29 Sep 2021 • Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu
To understand how edge ML accelerators perform, we characterize the performance of a commercial Google Edge TPU, using 24 Google edge NN models (which span a wide range of NN model types) and analyzing each NN layer within each model.
2 code implementations • 24 Sep 2021 • Rahul Bera, Konstantinos Kanellopoulos, Anant V. Nori, Taha Shahroodi, Sreenivas Subramoney, Onur Mutlu
In this paper, we make a case for designing a holistic prefetch algorithm that learns to prefetch using multiple different types of program context and system-level feedback information inherent to its design.
no code implementations • 8 Sep 2021 • Sherif A. S. Mohamed, Mohammad-Hashem Haghbayan, Antonio Miele, Onur Mutlu, Juha Plosila
We propose an energy-efficient controller to minimize the energy consumption of a mobile robot by dynamically manipulating the mechanical and computational actuators of the robot.
1 code implementation • 27 Mar 2021 • Zülal Bingöl, Mohammed Alser, Onur Mutlu, Ozcan Ozturk, Can Alkan
At the last step of short read mapping, the candidate locations of the reads on the reference genome are verified to compute their differences from the corresponding reference segments using sequence alignment algorithms.
no code implementations • 5 Mar 2021 • Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, Leonardo Schwarz, Grzegorz Kwasniewski, Lukas Gianinazzi, Jakub Beranek, Kacper Janda, Tobias Holenstein, Sebastian Leisinger, Peter Tatkowski, Esref Ozdemir, Adrian Balla, Marcin Copik, Philipp Lindenberger, Pavel Kalvoda, Marek Konieczny, Onur Mutlu, Torsten Hoefler
We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms.
no code implementations • 1 Mar 2021 • Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu
We comprehensively study the characteristics of each NN layer in all of the Google edge models, and find that these shortcomings arise from the one-size-fits-all approach of the accelerator, as there is a high amount of heterogeneity in key layer characteristics both across different models and across different layers in the same model.
1 code implementation • 6 Feb 2021 • Mohammed Alser, Jeremie S. Kim, Nour Almadhoun Alserr, Stefan W. Tell, Onur Mutlu
The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by simulating the average number of new infections caused by an infected person considering the effect of external factors, such as environmental conditions (e. g., climate, temperature, humidity) and mitigation measures.
no code implementations • 4 Jan 2021 • Muhammad Shafique, Mahum Naseer, Theocharis Theocharides, Christos Kyrkou, Onur Mutlu, Lois Orosa, Jungwook Choi
Machine Learning (ML) techniques have been rapidly adopted by smart Cyber-Physical Systems (CPS) and Internet-of-Things (IoT) due to their powerful decision-making capabilities.
no code implementations • 22 Dec 2020 • Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, João Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu
Compared to a CPU and a high-end GPU, SIMDRAM is 257x and 31x more energy-efficient, while providing 93x and 6x higher operation throughput, respectively.
Hardware Architecture Distributed, Parallel, and Cluster Computing Emerging Technologies
no code implementations • 22 Dec 2020 • Jisung Park, Myungsuk Kim, Myoungjun Chun, Lois Orosa, Jihong Kim, Onur Mutlu
Through a detailed analysis of the read mechanism and rigorous characterization of 160 real 3D NAND flash memory chips, we find new opportunities to reduce the read-retry latency by exploiting two advanced features widely adopted in modern NAND flash-based SSDs: 1) the CACHE READ command and 2) strong ECC engine.
Hardware Architecture Distributed, Parallel, and Cluster Computing
1 code implementation • 17 Sep 2020 • Minesh Patel, Jeremie S. Kim, Taha Shahroodi, Hasan Hassan, Onur Mutlu
As a concrete example, we introduce and evaluate BEEP, the first error profiling methodology that uses the known on-die ECC function to recover the number and bit-exact locations of unobservable raw bit errors responsible for observable post-correction errors.
Hardware Architecture
2 code implementations • 16 Sep 2020 • Damla Senol Cali, Gurpreet S. Kalsi, Zülal Bingöl, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu
Unfortunately, it is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data.
Hardware Architecture Genomics
no code implementations • 4 May 2020 • Behzad Salami, Erhan Baturay Onural, Ismail Emir Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal Kestelman, Osman S. Unsal, Hamid Sarbazi-Azad, Onur Mutlu
We evaluate the reliability-power trade-off for such accelerators.
1 code implementation • 13 Apr 2020 • Orestis Zachariadis, Andrea Teatini, Nitin Satpute, Juan Gómez-Luna, Onur Mutlu, Ole Jakob Elle, Joaquín Olivares
In this paper, we introduce a novel GPU implementation of BSI to accelerate the calculation of the deformation field in non-rigid image registration algorithms.
Distributed, Parallel, and Cluster Computing Image and Video Processing
2 code implementations • 28 Feb 2020 • Mohammed Alser, Jeremy Rotman, Kodi Taraszka, Huwenbo Shi, Pelin Icer Baykal, Harry Taegyun Yang, Victor Xue, Sergey Knyazev, Benjamin D. Singer, Brunilda Balliu, David Koslicki, Pavel Skums, Alex Zelikovsky, Can Alkan, Onur Mutlu, Serghei Mangul
We separately discuss how longer read lengths produce unique advantages and limitations to read alignment techniques.
1 code implementation • 18 Dec 2019 • Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Mohammed Alser, Nastaran Hajinazar, Can Alkan, Onur Mutlu
There are several tools that attempt to accelerate the process of updating a read data set from one reference to another (i. e., remapping).
1 code implementation • 20 Oct 2019 • Mohammed Alser, Taha Shahroodi, Juan Gomez-Luna, Can Alkan, Onur Mutlu
The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout.
no code implementations • 12 Oct 2019 • Skanda Koppula, Lois Orosa, Abdullah Giray Yağlıkçı, Roknoddin Azizi, Taha Shahroodi, Konstantinos Kanellopoulos, Onur Mutlu
Based on this observation, we propose EDEN, a general framework that reduces DNN energy consumption and DNN evaluation latency by using approximate DRAM devices, while strictly meeting a user-specified target DNN accuracy.
1 code implementation • ICML 2020 • Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons
Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem.
no code implementations • 26 Jul 2019 • Saugata Ghose, Amirali Boroumand, Jeremie S. Kim, Juan Gómez-Luna, Onur Mutlu
First, we describe our work on systematically identifying opportunities for PIM in real applications, and quantify potential gains for popular emerging applications (e. g., machine learning, data analytics, genome analysis).
Distributed, Parallel, and Cluster Computing Hardware Architecture
no code implementations • 10 Mar 2019 • Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun
This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory is already a key bottleneck as applications become more data-intensive and memory bandwidth and energy do not scale well, (2) energy consumption is a key constraint in especially mobile and server systems, (3) data movement is very expensive in terms of bandwidth, energy and latency, much more so than computation.
Hardware Architecture
1 code implementation • 8 Mar 2019 • Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, Ce Zhang
Learning from the data stored in a database is an important function increasingly available in relational engines.
7 code implementations • 20 Feb 2019 • Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, Onur Mutlu
As a result, the combined DRAM-workload behavior is often difficult to intuitively determine today, which can hinder memory optimizations in both hardware and software.
Hardware Architecture Performance
1 code implementation • 12 Feb 2019 • Can Firtina, Jeremie S. Kim, Mohammed Alser, Damla Senol Cali, A. Ercument Cicek, Can Alkan, Onur Mutlu
Our experiments with real read sets demonstrate that Apollo is the only algorithm that 1) uses reads from any sequencing technology within a single run and 2) scales well to polish large assemblies without splitting the assembly into multiple parts.
no code implementations • 1 Feb 2018 • Saugata Ghose, Kevin Hsieh, Amirali Boroumand, Rachata Ausavarungnirun, Onur Mutlu
This requires efficient mechanisms that can provide logic in DRAM with access to CPU structures without having to communicate frequently with the CPU.
Hardware Architecture
no code implementations • 10 Jan 2018 • Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, Onur Mutlu
Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query-time.
1 code implementation • 2 Nov 2017 • Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu
State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i. e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i. e., sequence alignment) to determine the origin of the read.
1 code implementation • 6 Apr 2016 • Mohammed Alser, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, Can Alkan
The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10.