Search Results for author: Onur Mutlu

Found 57 papers, 33 papers with code

Analysis of Distributed Optimization Algorithms on a Real Processing-In-Memory System

no code implementations10 Apr 2024 Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu

Processor-centric architectures (e. g., CPU, GPU) commonly used for modern ML training workloads are limited by the data movement bottleneck, i. e., due to repeatedly accessing the training dataset.

Distributed Optimization

Accelerating Graph Neural Networks on Real Processing-In-Memory Systems

no code implementations26 Feb 2024 Christina Giannoula, Peiming Yang, Ivan Fernandez Vega, Jiacheng Yang, Yu Xin Li, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko

Graph Neural Network (GNN) execution involves both compute-intensive and memory-intensive kernels, the latter dominates the total time, being significantly bottlenecked by data movement between memory and processors.

Graph Neural Network

Demystifying Chains, Trees, and Graphs of Thoughts

no code implementations25 Jan 2024 Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler

Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph.

Mathematical Reasoning Prompt Engineering

MetaTrinity: Enabling Fast Metagenomic Classification via Seed Counting and Edit Distance Approximation

3 code implementations3 Nov 2023 Arvid E. Gollwitzer, Mohammed Alser, Joel Bergtholdt, Joel Lindegger, Maximilian-David Rumpf, Can Firtina, Serghei Mangul, Onur Mutlu

This dual comparison positions MetaTrinity as a broadly applicable solution for metagenomic classification, combining advantages of both ends of the spectrum: speed and accuracy.

RawHash2: Mapping Raw Nanopore Signals Using Hash-Based Seeding and Adaptive Quantization

1 code implementation11 Sep 2023 Can Firtina, Melina Soysal, Joël Lindegger, Onur Mutlu

Summary: Raw nanopore signals can be analyzed while they are being generated, a process known as real-time analysis.

Quantization

TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems

1 code implementation3 Apr 2023 Maurus Item, Juan Gómez-Luna, Yuxin Guo, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu

In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc.

RawHash: Enabling Fast and Accurate Real-Time Analysis of Raw Nanopore Signals for Large Genomes

1 code implementation22 Jan 2023 Can Firtina, Nika Mansouri Ghiasi, Joel Lindegger, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, Onur Mutlu

RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value.

Quantization

RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs

1 code implementation15 Jan 2023 André Santos, João Dinis Ferreira, Onur Mutlu, Gabriel Falcao

However, the large strides in accuracy obtained by CNNs have been derived from increasing the complexity of network topologies, which incurs sizeable performance and energy penalties in the training and inference of CNNs.

Quantization

TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

1 code implementation9 Dec 2022 Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu

However, for many applications, the majority of reads do no match the reference genome of interest (i. e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation.

NEON: Enabling Efficient Support for Nonlinear Operations in Resistive RAM-based Neural Network Accelerators

no code implementations10 Nov 2022 Aditya Manglik, Minesh Patel, Haiyu Mao, Behzad Salami, Jisung Park, Lois Orosa, Onur Mutlu

Resistive Random-Access Memory (RRAM) is well-suited to accelerate neural network (NN) workloads as RRAM-based Processing-in-Memory (PIM) architectures natively support highly-parallel multiply-accumulate (MAC) operations that form the backbone of most NN workloads.

Compiler Optimization

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

no code implementations19 Sep 2022 Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu

Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3. 0x and 3. 1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16. 7x/1. 4x for three binary NNs.

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

1 code implementation1 Sep 2022 Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu

To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads.

LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning

no code implementations22 Aug 2022 Gagandeep Singh, Dionysios Diamantopoulos, Juan Gómez-Luna, Sander Stuijk, Henk Corporaal, Onur Mutlu

The key idea of LEAPER is to transfer an ML-based performance and resource usage model trained for a low-end edge environment to a new, high-end cloud environment to provide fast and accurate predictions for accelerator implementation.

Design Synthesis Transfer Learning

COVIDHunter: COVID-19 pandemic wave prediction and mitigation via seasonality-aware modeling

1 code implementation14 Jun 2022 Mohammed Alser, Jeremie S. Kim, Nour Almadhoun Alserr, Stefan W. Tell, Onur Mutlu

We introduce COVIDHunter, a flexible and accurate COVID-19 outbreak simulation model that evaluates the current mitigation measures that are applied to a region, predicts COVID-19 statistics (the daily number of cases, hospitalizations, and deaths), and provides suggestions on what strength the upcoming mitigation measure should be.

Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases

no code implementations29 May 2022 Geraldo F. Oliveira, Amirali Boroumand, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

One promising execution paradigm that alleviates the data movement bottleneck in modern and emerging applications is processing-in-memory (PIM), where the cost of data movement to/from main memory is reduced by placing computation capabilities close to memory.

DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression

no code implementations17 Feb 2022 Jisung Park, Jeoggyun Kim, Yeseong Kim, Sungjin Lee, Onur Mutlu

Data reduction in storage systems is becoming increasingly important as an effective solution to minimize the management cost of a data center.

Management

GenShare: Sharing Accurate Differentially-Private Statistics for Genomic Datasets with Dependent Tuples

no code implementations30 Dec 2021 Nour Almadhoun Alserr, Ozgur Ulusoy, Erman Ayday, Onur Mutlu

While sharing genomic data across researchers is an essential driver of advances in health and biomedical research, the sharing process is often infeasible due to data privacy concerns.

Privacy Preserving

BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis

1 code implementation16 Dec 2021 Can Firtina, Jisung Park, Mohammed Alser, Jeremie S. Kim, Damla Senol Cali, Taha Shahroodi, Nika Mansouri Ghiasi, Gagandeep Singh, Konstantinos Kanellopoulos, Can Alkan, Onur Mutlu

We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches.

Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks

no code implementations29 Sep 2021 Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

To understand how edge ML accelerators perform, we characterize the performance of a commercial Google Edge TPU, using 24 Google edge NN models (which span a wide range of NN model types) and analyzing each NN layer within each model.

Edge-computing Face Detection +3

Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning

2 code implementations24 Sep 2021 Rahul Bera, Konstantinos Kanellopoulos, Anant V. Nori, Taha Shahroodi, Sreenivas Subramoney, Onur Mutlu

In this paper, we make a case for designing a holistic prefetch algorithm that learns to prefetch using multiple different types of program context and system-level feedback information inherent to its design.

reinforcement-learning Reinforcement Learning (RL)

Energy-Efficient Mobile Robot Control via Run-time Monitoring of Environmental Complexity and Computing Workload

no code implementations8 Sep 2021 Sherif A. S. Mohamed, Mohammad-Hashem Haghbayan, Antonio Miele, Onur Mutlu, Juha Plosila

We propose an energy-efficient controller to minimize the energy consumption of a mobile robot by dynamically manipulating the mechanical and computational actuators of the robot.

GateKeeper-GPU: Fast and Accurate Pre-Alignment Filtering in Short Read Mapping

1 code implementation27 Mar 2021 Zülal Bingöl, Mohammed Alser, Onur Mutlu, Ozcan Ozturk, Can Alkan

At the last step of short read mapping, the candidate locations of the reads on the reference genome are verified to compute their differences from the corresponding reference segments using sequence alignment algorithms.

Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models

no code implementations1 Mar 2021 Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

We comprehensively study the characteristics of each NN layer in all of the Google edge models, and find that these shortcomings arise from the one-size-fits-all approach of the accelerator, as there is a high amount of heterogeneity in key layer characteristics both across different models and across different layers in the same model.

BIG-bench Machine Learning Edge-computing

COVIDHunter: An Accurate, Flexible, and Environment-Aware Open-Source COVID-19 Outbreak Simulation Model

1 code implementation6 Feb 2021 Mohammed Alser, Jeremie S. Kim, Nour Almadhoun Alserr, Stefan W. Tell, Onur Mutlu

The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by simulating the average number of new infections caused by an infected person considering the effect of external factors, such as environmental conditions (e. g., climate, temperature, humidity) and mitigation measures.

Robust Machine Learning Systems: Challenges, Current Trends, Perspectives, and the Road Ahead

no code implementations4 Jan 2021 Muhammad Shafique, Mahum Naseer, Theocharis Theocharides, Christos Kyrkou, Onur Mutlu, Lois Orosa, Jungwook Choi

Machine Learning (ML) techniques have been rapidly adopted by smart Cyber-Physical Systems (CPS) and Internet-of-Things (IoT) due to their powerful decision-making capabilities.

BIG-bench Machine Learning Decision Making

Reducing Solid-State Drive Read Latency by Optimizing Read-Retry (Extended Abstract)

no code implementations22 Dec 2020 Jisung Park, Myungsuk Kim, Myoungjun Chun, Lois Orosa, Jihong Kim, Onur Mutlu

Through a detailed analysis of the read mechanism and rigorous characterization of 160 real 3D NAND flash memory chips, we find new opportunities to reduce the read-retry latency by exploiting two advanced features widely adopted in modern NAND flash-based SSDs: 1) the CACHE READ command and 2) strong ECC engine.

Hardware Architecture Distributed, Parallel, and Cluster Computing

SIMDRAM: A Framework for Bit-Serial SIMD Processing Using DRAM

no code implementations22 Dec 2020 Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, João Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

Compared to a CPU and a high-end GPU, SIMDRAM is 257x and 31x more energy-efficient, while providing 93x and 6x higher operation throughput, respectively.

Hardware Architecture Distributed, Parallel, and Cluster Computing Emerging Technologies

Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics

1 code implementation17 Sep 2020 Minesh Patel, Jeremie S. Kim, Taha Shahroodi, Hasan Hassan, Onur Mutlu

As a concrete example, we introduce and evaluate BEEP, the first error profiling methodology that uses the known on-die ECC function to recover the number and bit-exact locations of unobservable raw bit errors responsible for observable post-correction errors.

Hardware Architecture

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

2 code implementations16 Sep 2020 Damla Senol Cali, Gurpreet S. Kalsi, Zülal Bingöl, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu

Unfortunately, it is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data.

Hardware Architecture Genomics

Accelerating B-spline Interpolation on GPUs: Application to Medical Image Registration

1 code implementation13 Apr 2020 Orestis Zachariadis, Andrea Teatini, Nitin Satpute, Juan Gómez-Luna, Onur Mutlu, Ole Jakob Elle, Joaquín Olivares

In this paper, we introduce a novel GPU implementation of BSI to accelerate the calculation of the deformation field in non-rigid image registration algorithms.

Distributed, Parallel, and Cluster Computing Image and Video Processing

AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

1 code implementation18 Dec 2019 Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Mohammed Alser, Nastaran Hajinazar, Can Alkan, Onur Mutlu

There are several tools that attempt to accelerate the process of updating a read data set from one reference to another (i. e., remapping).

SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs

1 code implementation20 Oct 2019 Mohammed Alser, Taha Shahroodi, Juan Gomez-Luna, Can Alkan, Onur Mutlu

The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout.

EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM

no code implementations12 Oct 2019 Skanda Koppula, Lois Orosa, Abdullah Giray Yağlıkçı, Roknoddin Azizi, Taha Shahroodi, Konstantinos Kanellopoulos, Onur Mutlu

Based on this observation, we propose EDEN, a general framework that reduces DNN energy consumption and DNN evaluation latency by using approximate DRAM devices, while strictly meeting a user-specified target DNN accuracy.

The Non-IID Data Quagmire of Decentralized Machine Learning

1 code implementation ICML 2020 Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons

Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem.

BIG-bench Machine Learning

A Workload and Programming Ease Driven Perspective of Processing-in-Memory

no code implementations26 Jul 2019 Saugata Ghose, Amirali Boroumand, Jeremie S. Kim, Juan Gómez-Luna, Onur Mutlu

First, we describe our work on systematically identifying opportunities for PIM in real applications, and quantify potential gains for popular emerging applications (e. g., machine learning, data analytics, genome analysis).

Distributed, Parallel, and Cluster Computing Hardware Architecture

Processing Data Where It Makes Sense: Enabling In-Memory Computation

no code implementations10 Mar 2019 Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun

This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory is already a key bottleneck as applications become more data-intensive and memory bandwidth and energy do not scale well, (2) energy consumption is a key constraint in especially mobile and server systems, (3) data movement is very expensive in terms of bandwidth, energy and latency, much more so than computation.

Hardware Architecture

Understanding the Interactions of Workloads and DRAM Types: A Comprehensive Experimental Study

7 code implementations20 Feb 2019 Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, Onur Mutlu

As a result, the combined DRAM-workload behavior is often difficult to intuitively determine today, which can hinder memory optimizations in both hardware and software.

Hardware Architecture Performance

Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm

1 code implementation12 Feb 2019 Can Firtina, Jeremie S. Kim, Mohammed Alser, Damla Senol Cali, A. Ercument Cicek, Can Alkan, Onur Mutlu

Our experiments with real read sets demonstrate that Apollo is the only algorithm that 1) uses reads from any sequencing technology within a single run and 2) scales well to polish large assemblies without splitting the assembly into multiple parts.

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions

no code implementations1 Feb 2018 Saugata Ghose, Kevin Hsieh, Amirali Boroumand, Rachata Ausavarungnirun, Onur Mutlu

This requires efficient mechanisms that can provide logic in DRAM with access to CPU structures without having to communicate frequently with the CPU.

Hardware Architecture

Focus: Querying Large Video Datasets with Low Latency and Low Cost

no code implementations10 Jan 2018 Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, Onur Mutlu

Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query-time.

GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies

1 code implementation2 Nov 2017 Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i. e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i. e., sequence alignment) to determine the origin of the read.

GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping

1 code implementation6 Apr 2016 Mohammed Alser, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, Can Alkan

The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10.

Cannot find the paper you are looking for? You can Submit a new open access paper.