Search Results for author: Gennady Pekhimenko

Found 27 papers, 16 papers with code

TBD: Benchmarking and Analyzing Deep Neural Network Training

no code implementations · 16 Mar 2018 · Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko

Our primary goal in this work is to break this myopic view by (i) proposing a new benchmark for DNN training, called TBD (short for Training Benchmark for DNNs), that uses a representative set of DNN models covering a wide range of machine learning applications: image classification, machine translation, speech recognition, object detection, adversarial networks, and reinforcement learning; and (ii) performing an extensive performance analysis of training these different applications on three major deep learning frameworks (TensorFlow, MXNet, CNTK) across different hardware configurations (single-GPU, multi-GPU, and multi-machine).

Benchmarking · General Classification · +6

Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training

no code implementations · 22 May 2018 · Bojian Zheng, Abhishek Tiwari, Nandita Vijaykumar, Gennady Pekhimenko

For recomputation of each feature map to be both effective and efficient, its effect on (1) the total memory footprint and (2) the total execution time has to be estimated carefully.

Machine Translation · NMT
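As context for the estimate that Echo has to make, the snippet below is a back-of-the-envelope check of whether dropping and recomputing a single feature map is worthwhile; the helper, threshold, and numbers are hypothetical illustrations, not the paper's compiler analysis.

```python
# Hypothetical illustration only: a crude accept/reject rule for recomputing
# one feature map, trading extra forward time for freed activation memory.
def recompute_is_worthwhile(feature_map_bytes, recompute_time_ms,
                            iteration_time_ms, max_overhead_frac=0.05):
    """Accept recomputation if its runtime cost stays under a small fraction
    of the iteration time; report the memory it would free."""
    overhead = recompute_time_ms / iteration_time_ms
    return feature_map_bytes, overhead, overhead <= max_overhead_frac

# Example: a 256 MiB activation that takes 3 ms to regenerate in a 400 ms step.
freed, overhead, accept = recompute_is_worthwhile(256 * 2**20, 3.0, 400.0)
print(f"frees {freed / 2**20:.0f} MiB at {overhead:.1%} extra time -> recompute={accept}")
```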

StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory

no code implementations · 4 Jan 2019 · Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, Felix Xiaozhu Lin

It dynamically optimizes for both the high bandwidth and limited capacity of HBM, and the limited bandwidth and high capacity of standard DRAM.

Databases

Priority-based Parameter Propagation for Distributed DNN Training

1 code implementation · 10 May 2019 · Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, Gennady Pekhimenko

Data parallel training is widely used for scaling distributed deep neural network (DNN) training.

BPPSA: Scaling Back-propagation by Parallel Scan Algorithm

1 code implementation · 23 Jul 2019 · Shang Wang, Yifan Bai, Gennady Pekhimenko

In an era when the performance of a single compute device plateaus, software must be designed to scale on massively parallel systems for better runtime performance.

Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training

no code implementations · 5 Jun 2020 · Hongyu Zhu, Amar Phanishayee, Gennady Pekhimenko

Modern deep neural network (DNN) training jobs use complex and heterogeneous software/hardware stacks.

Multi-node Bert-pretraining: Cost-efficient Approach

no code implementations · 1 Aug 2020 · Jiahuang Lin, Xin Li, Gennady Pekhimenko

As a result, to train these models within a reasonable time, machine learning (ML) programmers often require advanced hardware setups such as the premium GPU-enabled NVIDIA DGX workstations or specialized accelerators such as Google's TPU Pods.

Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training

1 code implementation · 15 Aug 2020 · Geoffrey X. Yu, Tovi Grossman, Gennady Pekhimenko

Training a state-of-the-art deep neural network (DNN) is a computationally expensive and time-consuming process, which incentivizes deep learning developers to debug their DNNs for computational performance.

TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference

no code implementations · 1 Sep 2020 · Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos

TensorDash is a hardware-level technique for enabling data-parallel MAC units to take advantage of sparsity in their input operand streams.
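As a rough software analogy for what the TensorDash processing elements exploit (not the hardware design itself), the sketch below skips multiply-accumulate steps whose result is provably zero because one of the streamed operands is zero.

```python
# Software analogy of sparsity-aware MAC units: skip ineffectual MACs where
# either streamed operand is zero. Purely illustrative, not the PE design.
def sparse_mac(a_stream, b_stream):
    acc, skipped = 0.0, 0
    for a, b in zip(a_stream, b_stream):
        if a == 0.0 or b == 0.0:   # this MAC cannot change the accumulator
            skipped += 1
            continue
        acc += a * b
    return acc, skipped

acc, skipped = sparse_mac([0.0, 1.5, 0.0, 2.0], [3.0, 0.5, 4.0, 0.0])
print(acc, skipped)  # 0.75 3 -> three of four MACs were ineffectual
```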

FPRaker: A Processing Element For Accelerating Neural Network Training

no code implementations · 15 Oct 2020 · Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos

We demonstrate that FPRaker can be used to compose an accelerator for training and that it can improve performance and energy efficiency compared to using conventional floating-point units under iso-compute area constraints.

Quantization

IOS: Inter-Operator Scheduler for CNN Acceleration

1 code implementation · 2 Nov 2020 · Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han

To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-operator parallelization.
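For contrast, the sketch below hand-codes a small amount of inter-operator parallelism by issuing two independent convolution branches on separate CUDA streams; the shapes are arbitrary placeholders, and this only illustrates the opportunity that IOS's scheduler searches over, not the scheduler itself.

```python
# Inter-operator parallelism by hand: two independent conv branches issued on
# separate CUDA streams so the GPU may overlap them (placeholder shapes).
import torch
import torch.nn as nn

assert torch.cuda.is_available(), "this illustration needs a GPU"
x = torch.randn(1, 64, 56, 56, device="cuda")
branch_a = nn.Conv2d(64, 64, kernel_size=1).cuda()
branch_b = nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
torch.cuda.synchronize()                 # ensure x is ready before both streams use it
with torch.cuda.stream(s1):
    out_a = branch_a(x)                  # branch A on stream 1
with torch.cuda.stream(s2):
    out_b = branch_b(x)                  # branch B may overlap on stream 2
torch.cuda.synchronize()
out = torch.cat([out_a, out_b], dim=1)   # merge as an Inception-style block would
```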

A Runtime-Based Computational Performance Predictor for Deep Neural Network Training

1 code implementation · 31 Jan 2021 · Geoffrey X. Yu, Yubo Gao, Pavel Golikov, Gennady Pekhimenko

Our technique exploits the observation that, because DNN training consists of repetitive compute steps, predicting the execution time of a single iteration is usually enough to characterize the performance of an entire training process.
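A minimal sketch of that observation, assuming a toy model and a hypothetical total step count (the paper's actual predictor does considerably more than this): time a few repetitive iterations and extrapolate to the full job.

```python
# Minimal sketch: because DNN training is repetitive, timing a few iterations
# is enough to extrapolate the whole job. Toy model and step count are placeholders.
import time
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 1024), torch.randn(64, 1024)

def one_step():
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

for _ in range(5):                       # warm-up
    one_step()
start = time.perf_counter()
for _ in range(20):                      # measured iterations
    one_step()
per_iter = (time.perf_counter() - start) / 20

total_steps = 100_000                    # hypothetical job length
print(f"predicted wall-clock time: {per_iter * total_steps / 3600:.1f} h")
```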

Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models

2 code implementations · 3 Feb 2021 · Shang Wang, Peiming Yang, Yuxuan Zheng, Xin Li, Gennady Pekhimenko

Driven by the tremendous effort in researching novel deep learning (DL) algorithms, the training cost of developing new models has increased staggeringly in recent years.

RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads

1 code implementation · 8 Feb 2021 · James Gleeson, Srivatsan Krishnan, Moshe Gabel, Vijay Janapa Reddi, Eyal de Lara, Gennady Pekhimenko

Deep reinforcement learning (RL) has made groundbreaking advancements in robotics, data center management and other applications.

Management · Reinforcement Learning · +1

Optimizing Data Collection in Deep Reinforcement Learning

no code implementations · 15 Jul 2022 · James Gleeson, Daniel Snider, Yvonne Yang, Moshe Gabel, Eyal de Lara, Gennady Pekhimenko

We show that simulator kernel fusion speedups are $11.3\times$ with a simple simulator and increase by up to $1024\times$ as simulator complexity increases in terms of memory bandwidth requirements.

Reinforcement Learning (RL)

Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction

1 code implementation · 19 Oct 2022 · Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko

We implement Tempo and evaluate the throughput, memory usage, and accuracy/loss on the BERT Large pre-training task.

Speeding up Fourier Neural Operators via Mixed Precision

1 code implementation · 27 Jul 2023 · Colin White, Renbo Tu, Jean Kossaifi, Gennady Pekhimenko, Kamyar Azizzadenesheli, Anima Anandkumar

In this work, we (i) profile memory and runtime for FNO with full and mixed-precision training, (ii) conduct a study on the numerical stability of mixed-precision training of FNO, and (iii) devise a training routine which substantially decreases training time and memory usage (up to 34%), with little or no reduction in accuracy, on the Navier-Stokes and Darcy flow equations.
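For readers unfamiliar with the baseline being profiled, here is a generic mixed-precision training step with PyTorch AMP; it shows the standard machinery only, with a placeholder model, and deliberately omits the FNO-specific numerical-stability measures the paper devises.

```python
# Generic mixed-precision step with PyTorch AMP (placeholder model); the
# FNO-specific stability tricks studied in the paper are not reproduced here.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = nn.Sequential(nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 256)).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 256, device=device)
y = torch.randn(32, 256, device=device)

opt.zero_grad()
with torch.autocast(device_type=device, dtype=amp_dtype):
    loss = nn.functional.mse_loss(model(x), y)   # forward in reduced precision
scaler.scale(loss).backward()                    # loss scaling avoids fp16 underflow
scaler.step(opt)
scaler.update()
```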

The Synergy of Speculative Decoding and Batching in Serving Large Language Models

no code implementations · 28 Oct 2023 · Qidong Su, Christina Giannoula, Gennady Pekhimenko

Large Language Models (LLMs) like GPT are state-of-the-art text generation models that provide significant assistance in daily routines.

Text Generation

Minuet: Accelerating 3D Sparse Convolutions on GPUs

1 code implementation · 1 Dec 2023 · Jiacheng Yang, Christina Giannoula, Jun Wu, Mostafa Elhoushi, James Gleeson, Gennady Pekhimenko

Minuet proposes to (i) replace the hash tables used in the Map step with a novel segmented sorting double-traversed binary search algorithm that highly utilizes the on-chip memory hierarchy of GPUs, (ii) use a lightweight scheme to autotune the tile size in the Gather and Scatter operations of the GMaS step, so as to adapt the execution to the particular characteristics of each SC layer, dataset, and GPU architecture, and (iii) employ a padding-efficient GEMM grouping approach that reduces both memory padding and kernel launching overheads.
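To make point (i) concrete, the toy CPU sketch below builds a sparse-convolution kernel map with binary search over sorted coordinates instead of a hash table; the coordinates and offsets are made up, and it omits Minuet's segmented, double-traversed GPU formulation and on-chip-memory tuning entirely.

```python
# Toy CPU illustration of the Map-step idea: match input/output sites for each
# kernel offset via binary search over sorted coordinates rather than hashing.
from bisect import bisect_left

def build_kernel_map(coords, offsets):
    """coords: active 3D sites; offsets: kernel offsets.
    Returns (input_index, output_index, offset_index) triples."""
    order = sorted(range(len(coords)), key=lambda i: coords[i])
    keys = [coords[i] for i in order]            # sorted once up front
    pairs = []
    for out_idx, (x, y, z) in enumerate(coords):
        for k, (dx, dy, dz) in enumerate(offsets):
            q = (x + dx, y + dy, z + dz)
            j = bisect_left(keys, q)             # binary search, no hash table
            if j < len(keys) and keys[j] == q:
                pairs.append((order[j], out_idx, k))
    return pairs

coords = [(0, 0, 0), (0, 1, 0), (1, 1, 0)]
offsets = [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
print(build_kernel_map(coords, offsets))
# [(0, 0, 0), (1, 0, 1), (1, 1, 0), (2, 1, 2), (2, 2, 0)]
```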

Accelerating Graph Neural Networks on Real Processing-In-Memory Systems

no code implementations · 26 Feb 2024 · Christina Giannoula, Peiming Yang, Ivan Fernandez Vega, Jiacheng Yang, Yu Xin Li, Juan Gomez Luna, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko

Graph Neural Network (GNN) execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total time and are significantly bottlenecked by data movement between memory and processors.
