Search Results for author: Onur Mutlu

Found 56 papers, 32 papers with code

Analysis of Distributed Optimization Algorithms on a Real Processing-In-Memory System

no code implementations • 10 Apr 2024 • Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu

Processor-centric architectures (e. g., CPU, GPU) commonly used for modern ML training workloads are limited by the data movement bottleneck, i. e., due to repeatedly accessing the training dataset.

Distributed Optimization

Paper
Add Code

Accelerating Graph Neural Networks on Real Processing-In-Memory Systems

no code implementations • 26 Feb 2024 • Christina Giannoula, Peiming Yang, Ivan Fernandez Vega, Jiacheng Yang, Yu Xin Li, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko

Graph Neural Network (GNN) execution involves both compute-intensive and memory-intensive kernels, the latter dominates the total time, being significantly bottlenecked by data movement between memory and processors.

Paper
Add Code

Demystifying Chains, Trees, and Graphs of Thoughts

no code implementations • 25 Jan 2024 • Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler

Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph.

Mathematical Reasoning Prompt Engineering

Paper
Add Code

MetaTrinity: Enabling Fast Metagenomic Classification via Seed Counting and Edit Distance Approximation

3 code implementations • 3 Nov 2023 • Arvid E. Gollwitzer, Mohammed Alser, Joel Bergtholdt, Joel Lindegger, Maximilian-David Rumpf, Can Firtina, Serghei Mangul, Onur Mutlu

This dual comparison positions MetaTrinity as a broadly applicable solution for metagenomic classification, combining advantages of both ends of the spectrum: speed and accuracy.

Paper
Code

SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences

1 code implementation • 25 Oct 2023 • Maximilian-David Rumpf, Mohammed Alser, Arvid E. Gollwitzer, Joel Lindegger, Nour Almadhoun, Can Firtina, Serghei Mangul, Onur Mutlu

Comparing genomic sequences is one of the most fundamental computational steps in most genomic analyses.

Paper
Code

RawAlign: Accurate, Fast, and Scalable Raw Nanopore Signal Mapping via Combining Seeding and Alignment

1 code implementation • 8 Oct 2023 • Joël Lindegger, Can Firtina, Nika Mansouri Ghiasi, Mohammad Sadrosadati, Mohammed Alser, Onur Mutlu

mean) while improving accuracy by 1. 35$\times$ (1. 34$\times$) in terms of F-1 score on average (geo.

Paper
Code

RawHash2: Accurate and Fast Mapping of Raw Nanopore Signals using a Hash-based Seeding Mechanism

1 code implementation • 11 Sep 2023 • Can Firtina, Melina Soysal, Joël Lindegger, Onur Mutlu

Summary: Raw nanopore signals can be analyzed while they are being generated, a process known as real-time analysis.

Paper
Code

TransPimLib: A Library for Efficient Transcendental Functions on Processing-in-Memory Systems

1 code implementation • 3 Apr 2023 • Maurus Item, Juan Gómez-Luna, Yuxin Guo, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu

In order to provide support for transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc.

Paper
Code

RawHash: Enabling Fast and Accurate Real-Time Analysis of Raw Nanopore Signals for Large Genomes

1 code implementation • 22 Jan 2023 • Can Firtina, Nika Mansouri Ghiasi, Joel Lindegger, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, Onur Mutlu

RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value.

Quantization

Paper
Code

RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs

1 code implementation • 15 Jan 2023 • André Santos, João Dinis Ferreira, Onur Mutlu, Gabriel Falcao

However, the large strides in accuracy obtained by CNNs have been derived from increasing the complexity of network topologies, which incurs sizeable performance and energy penalties in the training and inference of CNNs.

Quantization

Paper
Code

TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering

1 code implementation • 9 Dec 2022 • Meryem Banu Cavlak, Gagandeep Singh, Mohammed Alser, Can Firtina, Joël Lindegger, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Can Alkan, Onur Mutlu

However, for many applications, the majority of reads do no match the reference genome of interest (i. e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation.

Paper
Code

NEON: Enabling Efficient Support for Nonlinear Operations in Resistive RAM-based Neural Network Accelerators

no code implementations • 10 Nov 2022 • Aditya Manglik, Minesh Patel, Haiyu Mao, Behzad Salami, Jisung Park, Lois Orosa, Onur Mutlu

Resistive Random-Access Memory (RRAM) is well-suited to accelerate neural network (NN) workloads as RRAM-based Processing-in-Memory (PIM) architectures natively support highly-parallel multiply-accumulate (MAC) operations that form the backbone of most NN workloads.

Compiler Optimization

Paper
Add Code

Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

no code implementations • 19 Sep 2022 • Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu

Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3. 0x and 3. 1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16. 7x/1. 4x for three binary NNs.

Paper
Add Code

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

1 code implementation • 1 Sep 2022 • Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, Onur Mutlu

To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads.

Paper
Code

LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning

no code implementations • 22 Aug 2022 • Gagandeep Singh, Dionysios Diamantopoulos, Juan Gómez-Luna, Sander Stuijk, Henk Corporaal, Onur Mutlu

The key idea of LEAPER is to transfer an ML-based performance and resource usage model trained for a low-end edge environment to a new, high-end cloud environment to provide fast and accurate predictions for accelerator implementation.

Design Synthesis Transfer Learning

Paper
Add Code

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis

1 code implementation • 20 Jul 2022 • Can Firtina, Kamlesh Pillai, Gurpreet S. Kalsi, Bharathwaj Suresh, Damla Senol Cali, Jeremie Kim, Taha Shahroodi, Meryem Banu Cavlak, Joel Lindegger, Mohammed Alser, Juan Gómez Luna, Sreenivas Subramoney, Onur Mutlu

We identify an urgent need for a flexible, high-performance, and energy-efficient HW/SW co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs.

Multiple Sequence Alignment

Paper
Code

An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

1 code implementation • 16 Jul 2022 • Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu

Our K-Means clustering on PIM is $2. 8\times$ and $3. 2\times$ than state-of-the-art CPU and GPU versions, respectively.

Clustering regression

Paper
Code

COVIDHunter: COVID-19 pandemic wave prediction and mitigation via seasonality-aware modeling

1 code implementation • 14 Jun 2022 • Mohammed Alser, Jeremie S. Kim, Nour Almadhoun Alserr, Stefan W. Tell, Onur Mutlu

We introduce COVIDHunter, a flexible and accurate COVID-19 outbreak simulation model that evaluates the current mitigation measures that are applied to a region, predicts COVID-19 statistics (the daily number of cases, hospitalizations, and deaths), and provides suggestions on what strength the upcoming mitigation measure should be.

Paper
Code

Machine Learning Training on a Real Processing-in-Memory System

no code implementations • 13 Jun 2022 • Juan Gómez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, Onur Mutlu

Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate machine learning training.

BIG-bench Machine Learning regression

Paper
Add Code

Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases

no code implementations • 29 May 2022 • Geraldo F. Oliveira, Amirali Boroumand, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

One promising execution paradigm that alleviates the data movement bottleneck in modern and emerging applications is processing-in-memory (PIM), where the cost of data movement to/from main memory is reduced by placing computation capabilities close to memory.

Paper
Add Code

Going From Molecules to Genomic Variations to Scientific Discovery: Intelligent Algorithms and Architectures for Intelligent Genome Analysis

1 code implementation • 16 May 2022 • Mohammed Alser, Joel Lindegger, Can Firtina, Nour Almadhoun, Haiyu Mao, Gagandeep Singh, Juan Gomez-Luna, Onur Mutlu

We hope that these efforts and the challenges we discuss provide a foundation for future work in making genome analysis more intelligent.

Paper
Code

Sibyl: Adaptive and Extensible Data Placement in Hybrid Storage Systems Using Online Reinforcement Learning

1 code implementation • 15 May 2022 • Gagandeep Singh, Rakesh Nadig, Jisung Park, Rahul Bera, Nastaran Hajinazar, David Novo, Juan Gómez-Luna, Sander Stuijk, Henk Corporaal, Onur Mutlu

We introduce Sibyl, the first technique that uses reinforcement learning for data placement in hybrid storage systems.

Paper
Code

Packaging, containerization, and virtualization of computational omics methods: Advances, challenges, and opportunities

no code implementations • 30 Mar 2022 • Mohammed Alser, Sharon Waymost, Ram Ayyala, Brendan Lawlor, Richard J. Abdill, Neha Rajkumar, Nathan LaPierre, Jaqueline Brito, Andre M. Ribeiro-dos-Santos, Can Firtina, Nour Almadhoun, Varuni Sarwal, Eleazar Eskin, Qiyang Hu, Derek Strong, Byoung-Do, Kim, Malak S. Abedalthagafi, Onur Mutlu, Serghei Mangul

Omics software tools have reshaped the landscape of modern biology and become an essential component of biomedical research.

Paper
Add Code

DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression

no code implementations • 17 Feb 2022 • Jisung Park, Jeoggyun Kim, Yeseong Kim, Sungjin Lee, Onur Mutlu

Data reduction in storage systems is becoming increasingly important as an effective solution to minimize the management cost of a data center.

Management

Paper
Add Code

EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators

1 code implementation • 4 Feb 2022 • Lois Orosa, Skanda Koppula, Yaman Umuroglu, Konstantinos Kanellopoulos, Juan Gomez-Luna, Michaela Blott, Kees Vissers, Onur Mutlu

We find that commonly-used low-power CNN inference accelerators based on spatial architectures are not optimized for both of these convolutional kernels.

Generative Adversarial Network Image Generation +2

Paper
Code

FastRemap: A Tool for Quickly Remapping Reads between Genome Assemblies

1 code implementation • 17 Jan 2022 • Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Can Alkan, Onur Mutlu

Also available in Bioconda at: https://anaconda. org/bioconda/fastremap-bio.

Paper
Code

GenShare: Sharing Accurate Differentially-Private Statistics for Genomic Datasets with Dependent Tuples

no code implementations • 30 Dec 2021 • Nour Almadhoun Alserr, Ozgur Ulusoy, Erman Ayday, Onur Mutlu

While sharing genomic data across researchers is an essential driver of advances in health and biomedical research, the sharing process is often infeasible due to data privacy concerns.

Privacy Preserving

Paper
Add Code

BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis

1 code implementation • 16 Dec 2021 • Can Firtina, Jisung Park, Mohammed Alser, Jeremie S. Kim, Damla Senol Cali, Taha Shahroodi, Nika Mansouri Ghiasi, Gagandeep Singh, Konstantinos Kanellopoulos, Can Alkan, Onur Mutlu

We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches.

Paper
Code

Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks

no code implementations • 29 Sep 2021 • Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

To understand how edge ML accelerators perform, we characterize the performance of a commercial Google Edge TPU, using 24 Google edge NN models (which span a wide range of NN model types) and analyzing each NN layer within each model.

Edge-computing Face Detection +3

Paper
Add Code

Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning

2 code implementations • 24 Sep 2021 • Rahul Bera, Konstantinos Kanellopoulos, Anant V. Nori, Taha Shahroodi, Sreenivas Subramoney, Onur Mutlu

In this paper, we make a case for designing a holistic prefetch algorithm that learns to prefetch using multiple different types of program context and system-level feedback information inherent to its design.

reinforcement-learning Reinforcement Learning (RL)

103

Paper
Code

Energy-Efficient Mobile Robot Control via Run-time Monitoring of Environmental Complexity and Computing Workload

no code implementations • 8 Sep 2021 • Sherif A. S. Mohamed, Mohammad-Hashem Haghbayan, Antonio Miele, Onur Mutlu, Juha Plosila

We propose an energy-efficient controller to minimize the energy consumption of a mobile robot by dynamically manipulating the mechanical and computational actuators of the robot.

Paper
Add Code

GateKeeper-GPU: Fast and Accurate Pre-Alignment Filtering in Short Read Mapping

1 code implementation • 27 Mar 2021 • Zülal Bingöl, Mohammed Alser, Onur Mutlu, Ozcan Ozturk, Can Alkan

At the last step of short read mapping, the candidate locations of the reads on the reference genome are verified to compute their differences from the corresponding reference segments using sequence alignment algorithms.

Paper
Code

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra

no code implementations • 5 Mar 2021 • Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, Leonardo Schwarz, Grzegorz Kwasniewski, Lukas Gianinazzi, Jakub Beranek, Kacper Janda, Tobias Holenstein, Sebastian Leisinger, Peter Tatkowski, Esref Ozdemir, Adrian Balla, Marcin Copik, Philipp Lindenberger, Pavel Kalvoda, Marek Konieczny, Onur Mutlu, Torsten Hoefler

We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms.

Benchmarking Graph Mining

Paper
Add Code

Mitigating Edge Machine Learning Inference Bottlenecks: An Empirical Study on Accelerating Google Edge Models

no code implementations • 1 Mar 2021 • Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu

We comprehensively study the characteristics of each NN layer in all of the Google edge models, and find that these shortcomings arise from the one-size-fits-all approach of the accelerator, as there is a high amount of heterogeneity in key layer characteristics both across different models and across different layers in the same model.

BIG-bench Machine Learning Edge-computing

Paper
Add Code

COVIDHunter: An Accurate, Flexible, and Environment-Aware Open-Source COVID-19 Outbreak Simulation Model

1 code implementation • 6 Feb 2021 • Mohammed Alser, Jeremie S. Kim, Nour Almadhoun Alserr, Stefan W. Tell, Onur Mutlu

The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by simulating the average number of new infections caused by an infected person considering the effect of external factors, such as environmental conditions (e. g., climate, temperature, humidity) and mitigation measures.

Paper
Code

Robust Machine Learning Systems: Challenges, Current Trends, Perspectives, and the Road Ahead

no code implementations • 4 Jan 2021 • Muhammad Shafique, Mahum Naseer, Theocharis Theocharides, Christos Kyrkou, Onur Mutlu, Lois Orosa, Jungwook Choi

Machine Learning (ML) techniques have been rapidly adopted by smart Cyber-Physical Systems (CPS) and Internet-of-Things (IoT) due to their powerful decision-making capabilities.

BIG-bench Machine Learning Decision Making

Paper
Add Code

SIMDRAM: A Framework for Bit-Serial SIMD Processing Using DRAM

no code implementations • 22 Dec 2020 • Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, João Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

Compared to a CPU and a high-end GPU, SIMDRAM is 257x and 31x more energy-efficient, while providing 93x and 6x higher operation throughput, respectively.

Hardware Architecture Distributed, Parallel, and Cluster Computing Emerging Technologies

Paper
Add Code

Reducing Solid-State Drive Read Latency by Optimizing Read-Retry (Extended Abstract)

no code implementations • 22 Dec 2020 • Jisung Park, Myungsuk Kim, Myoungjun Chun, Lois Orosa, Jihong Kim, Onur Mutlu

Through a detailed analysis of the read mechanism and rigorous characterization of 160 real 3D NAND flash memory chips, we find new opportunities to reduce the read-retry latency by exploiting two advanced features widely adopted in modern NAND flash-based SSDs: 1) the CACHE READ command and 2) strong ECC engine.

Hardware Architecture Distributed, Parallel, and Cluster Computing

Paper
Add Code

Bit-Exact ECC Recovery (BEER): Determining DRAM On-Die ECC Functions by Exploiting DRAM Data Retention Characteristics

1 code implementation • 17 Sep 2020 • Minesh Patel, Jeremie S. Kim, Taha Shahroodi, Hasan Hassan, Onur Mutlu

As a concrete example, we introduce and evaluate BEEP, the first error profiling methodology that uses the known on-die ECC function to recover the number and bit-exact locations of unobservable raw bit errors responsible for observable post-correction errors.

Hardware Architecture

Paper
Code

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

2 code implementations • 16 Sep 2020 • Damla Senol Cali, Gurpreet S. Kalsi, Zülal Bingöl, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, Onur Mutlu

Unfortunately, it is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data.

Hardware Architecture Genomics

Paper
Code

An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

no code implementations • 4 May 2020 • Behzad Salami, Erhan Baturay Onural, Ismail Emir Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal Kestelman, Osman S. Unsal, Hamid Sarbazi-Azad, Onur Mutlu

We evaluate the reliability-power trade-off for such accelerators.

Image Classification Quantization

Paper
Add Code

Accelerating B-spline Interpolation on GPUs: Application to Medical Image Registration

1 code implementation • 13 Apr 2020 • Orestis Zachariadis, Andrea Teatini, Nitin Satpute, Juan Gómez-Luna, Onur Mutlu, Ole Jakob Elle, Joaquín Olivares

In this paper, we introduce a novel GPU implementation of BSI to accelerate the calculation of the deformation field in non-rigid image registration algorithms.

Distributed, Parallel, and Cluster Computing Image and Video Processing

Paper
Code

Technology dictates algorithms: Recent developments in read alignment

2 code implementations • 28 Feb 2020 • Mohammed Alser, Jeremy Rotman, Kodi Taraszka, Huwenbo Shi, Pelin Icer Baykal, Harry Taegyun Yang, Victor Xue, Sergey Knyazev, Benjamin D. Singer, Brunilda Balliu, David Koslicki, Pavel Skums, Alex Zelikovsky, Can Alkan, Onur Mutlu, Serghei Mangul

We separately discuss how longer read lengths produce unique advantages and limitations to read alignment techniques.

Paper
Code

AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

1 code implementation • 18 Dec 2019 • Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Mohammed Alser, Nastaran Hajinazar, Can Alkan, Onur Mutlu

There are several tools that attempt to accelerate the process of updating a read data set from one reference to another (i. e., remapping).

Paper
Code

SneakySnake: A Fast and Accurate Universal Genome Pre-Alignment Filter for CPUs, GPUs, and FPGAs

1 code implementation • 20 Oct 2019 • Mohammed Alser, Taha Shahroodi, Juan Gomez-Luna, Can Alkan, Onur Mutlu

The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout.

Paper
Code

EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM

no code implementations • 12 Oct 2019 • Skanda Koppula, Lois Orosa, Abdullah Giray Yağlıkçı, Roknoddin Azizi, Taha Shahroodi, Konstantinos Kanellopoulos, Onur Mutlu

Based on this observation, we propose EDEN, a general framework that reduces DNN energy consumption and DNN evaluation latency by using approximate DRAM devices, while strictly meeting a user-specified target DNN accuracy.

Paper
Add Code

The Non-IID Data Quagmire of Decentralized Machine Learning

1 code implementation • ICML 2020 • Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons

Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem.

BIG-bench Machine Learning

Paper
Code

A Workload and Programming Ease Driven Perspective of Processing-in-Memory

no code implementations • 26 Jul 2019 • Saugata Ghose, Amirali Boroumand, Jeremie S. Kim, Juan Gómez-Luna, Onur Mutlu

First, we describe our work on systematically identifying opportunities for PIM in real applications, and quantify potential gains for popular emerging applications (e. g., machine learning, data analytics, genome analysis).

Distributed, Parallel, and Cluster Computing Hardware Architecture

Paper
Add Code

Processing Data Where It Makes Sense: Enabling In-Memory Computation

no code implementations • 10 Mar 2019 • Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun

This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory is already a key bottleneck as applications become more data-intensive and memory bandwidth and energy do not scale well, (2) energy consumption is a key constraint in especially mobile and server systems, (3) data movement is very expensive in terms of bandwidth, energy and latency, much more so than computation.

Hardware Architecture

Paper
Add Code

Accelerating Generalized Linear Models with MLWeaving: A One-Size-Fits-All System for Any-precision Learning (Technical Report)

1 code implementation • 8 Mar 2019 • Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, Ce Zhang

Learning from the data stored in a database is an important function increasingly available in relational engines.

Quantization Retrieval

Paper
Code

Understanding the Interactions of Workloads and DRAM Types: A Comprehensive Experimental Study

7 code implementations • 20 Feb 2019 • Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, Onur Mutlu

As a result, the combined DRAM-workload behavior is often difficult to intuitively determine today, which can hinder memory optimizations in both hardware and software.

Hardware Architecture Performance

Paper
Code

Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm

1 code implementation • 12 Feb 2019 • Can Firtina, Jeremie S. Kim, Mohammed Alser, Damla Senol Cali, A. Ercument Cicek, Can Alkan, Onur Mutlu

Our experiments with real read sets demonstrate that Apollo is the only algorithm that 1) uses reads from any sequencing technology within a single run and 2) scales well to polish large assemblies without splitting the assembly into multiple parts.

Paper
Code

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions

no code implementations • 1 Feb 2018 • Saugata Ghose, Kevin Hsieh, Amirali Boroumand, Rachata Ausavarungnirun, Onur Mutlu

This requires efficient mechanisms that can provide logic in DRAM with access to CPU structures without having to communicate frequently with the CPU.

Hardware Architecture

Paper
Add Code

Focus: Querying Large Video Datasets with Low Latency and Low Cost

no code implementations • 10 Jan 2018 • Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, Onur Mutlu

Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query-time.

Paper
Add Code

GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies

1 code implementation • 2 Nov 2017 • Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i. e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i. e., sequence alignment) to determine the origin of the read.

Paper
Code

GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping

1 code implementation • 6 Apr 2016 • Mohammed Alser, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, Can Alkan

The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10.

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.