Search Results for author: Gennady Pekhimenko

Found 20 papers, 12 papers with code

RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads

1 code implementation • 8 Feb 2021 • James Gleeson, Srivatsan Krishnan, Moshe Gabel, Vijay Janapa Reddi, Eyal de Lara, Gennady Pekhimenko

Deep reinforcement learning (RL) has made groundbreaking advancements in robotics, data center management and other applications.


Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models

1 code implementation • 3 Feb 2021 • Shang Wang, Peiming Yang, Yuxuan Zheng, Xin Li, Gennady Pekhimenko

Driven by the tremendous effort invested in researching novel deep learning (DL) algorithms, the training cost of developing new models has risen sharply in recent years.

A Runtime-Based Computational Performance Predictor for Deep Neural Network Training

1 code implementation • 31 Jan 2021 • Geoffrey X. Yu, Yubo Gao, Pavel Golikov, Gennady Pekhimenko

Our technique exploits the observation that, because DNN training consists of repetitive compute steps, predicting the execution time of a single iteration is usually enough to characterize the performance of an entire training process.
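The observation above can be illustrated with a toy sketch (not the paper's actual predictor; `predict_training_time` and `toy_step` are hypothetical names): time a few representative iterations after warm-up, then extrapolate linearly to the full run.

```python
import time

def predict_training_time(training_step, n_iterations, n_warmup=3, n_measure=10):
    """Estimate total training time by timing a few representative iterations.

    Because DNN training repeats the same compute step, the mean time of a
    few measured iterations extrapolates to the whole training run.
    """
    for _ in range(n_warmup):          # discard warm-up iterations (caches, lazy init)
        training_step()
    start = time.perf_counter()
    for _ in range(n_measure):
        training_step()
    per_iter = (time.perf_counter() - start) / n_measure
    return per_iter * n_iterations

# Toy "training step": a fixed amount of CPU work standing in for one batch.
def toy_step():
    sum(i * i for i in range(10_000))

estimate = predict_training_time(toy_step, n_iterations=1_000)
```

In a real predictor the measured step would be one forward/backward pass on the target hardware; the extrapolation logic stays the same.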

IOS: Inter-Operator Scheduler for CNN Acceleration

1 code implementation • 2 Nov 2020 • Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han

To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-operator parallelization.
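The intra- vs. inter-operator distinction can be sketched as follows (an illustrative toy, not the IOS scheduler; the branch functions are hypothetical stand-ins for independent CNN operators):

```python
from concurrent.futures import ThreadPoolExecutor

# Two independent "operators" with no data dependency between them,
# like the parallel branches of an Inception-style CNN block.
def conv_branch_a(x):
    return [v * 2 for v in x]

def conv_branch_b(x):
    return [v + 1 for v in x]

def run_sequential(x):
    # Intra-operator-only view: each operator runs one after another.
    return conv_branch_a(x), conv_branch_b(x)

def run_inter_operator(x):
    # Inter-operator parallelism: schedule independent operators concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa = pool.submit(conv_branch_a, x)
        fb = pool.submit(conv_branch_b, x)
        return fa.result(), fb.result()
```

Both schedules produce identical results; the inter-operator schedule simply overlaps work that would otherwise serialize.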

FPRaker: A Processing Element For Accelerating Neural Network Training

no code implementations • 15 Oct 2020 • Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos

We demonstrate that FPRaker can be used to compose an accelerator for training and that it can improve performance and energy efficiency compared to using conventional floating-point units under iso-compute-area constraints.


TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference

no code implementations • 1 Sep 2020 • Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos

TensorDash is a hardware level technique for enabling data-parallel MAC units to take advantage of sparsity in their input operand streams.
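The idea of exploiting sparsity in MAC operand streams can be sketched in software (a toy model, not TensorDash's hardware design; `sparsity_aware_mac` is a hypothetical name): a multiply where either operand is zero contributes nothing, so it can be skipped.

```python
def dense_mac(a, b):
    """Baseline multiply-accumulate over two operand streams."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc

def sparsity_aware_mac(a, b):
    """Skip 'ineffectual' pairs where either operand is zero, mimicking
    hardware that eliminates zero-valued multiplications. Returns the
    accumulated result and the number of multiplies actually performed."""
    acc = 0
    work = 0
    for x, y in zip(a, b):
        if x != 0 and y != 0:
            acc += x * y
            work += 1
    return acc, work

acc, work = sparsity_aware_mac([0, 3, 0, 2], [5, 4, 1, 0])
```

Here only one of the four operand pairs is effectual, so the sparsity-aware version does a quarter of the multiplies while producing the same sum.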

Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training

1 code implementation • 15 Aug 2020 • Geoffrey X. Yu, Tovi Grossman, Gennady Pekhimenko

Training a state-of-the-art deep neural network (DNN) is a computationally expensive and time-consuming process, which incentivizes deep learning developers to debug their DNNs for computational performance.

Multi-node Bert-pretraining: Cost-efficient Approach

no code implementations • 1 Aug 2020 • Jiahuang Lin, Xin Li, Gennady Pekhimenko

As a result, to train these models within a reasonable time, machine learning (ML) programmers often require advanced hardware setups such as the premium GPU-enabled NVIDIA DGX workstations or specialized accelerators such as Google's TPU Pods.

Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training

no code implementations • 5 Jun 2020 • Hongyu Zhu, Amar Phanishayee, Gennady Pekhimenko

Modern deep neural network (DNN) training jobs use complex and heterogeneous software/hardware stacks.

BPPSA: Scaling Back-propagation by Parallel Scan Algorithm

1 code implementation • 23 Jul 2019 • Shang Wang, Yifan Bai, Gennady Pekhimenko

In an era when the performance of a single compute device plateaus, software must be designed to scale on massively parallel systems for better runtime performance.
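The core reformulation behind BPPSA can be sketched as follows (a simplified illustration, not the paper's implementation): back-propagation through a chain of layers multiplies local Jacobians in sequence, and because that operator is associative, the running products form a scan, which parallel-scan algorithms can evaluate in O(log n) parallel steps.

```python
def scan(op, xs):
    """Inclusive scan: out[i] = xs[0] op xs[1] op ... op xs[i].

    Shown here in its straightforward O(n) sequential form; since `op` is
    associative, the same result can be computed in O(log n) parallel steps
    (e.g. with a Blelloch scan), which is the key to scaling back-propagation.
    """
    out = [xs[0]]
    for x in xs[1:]:
        out.append(op(out[-1], x))
    return out

# Back-propagation through a scalar linear chain y_i = a_i * y_{i-1}:
# the gradient at each depth is a running product of local Jacobians a_j,
# i.e. exactly a scan with multiplication as the associative operator.
jacobians = [2.0, 0.5, 3.0, 1.0]
grads = scan(lambda u, v: u * v, jacobians)
```

In the full method the scanned elements are Jacobian matrices rather than scalars, but the associativity argument is identical.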

Priority-based Parameter Propagation for Distributed DNN Training

1 code implementation • 10 May 2019 • Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, Gennady Pekhimenko

Data parallel training is widely used for scaling distributed deep neural network (DNN) training.

StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory

no code implementations • 4 Jan 2019 • Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, Felix Xiaozhu Lin

It dynamically optimizes for both the high bandwidth and limited capacity of HBM, and the limited bandwidth and high capacity of standard DRAM.
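A tiered-memory placement decision of this kind can be sketched as a toy policy (purely illustrative; `place` and its thresholds are hypothetical, not StreamBox-HBM's actual algorithm): bandwidth-hungry data goes to the small, fast HBM tier when it fits, and everything else to capacious DRAM.

```python
def place(buffer_size_mb, is_bandwidth_bound, hbm_free_mb):
    """Toy two-tier placement policy for a hybrid HBM/DRAM system.

    HBM: high bandwidth, limited capacity -> reserve for bandwidth-bound data.
    DRAM: high capacity, limited bandwidth -> default home for everything else.
    """
    if is_bandwidth_bound and buffer_size_mb <= hbm_free_mb:
        return "HBM"
    return "DRAM"
```

A real system would make this decision dynamically as access patterns and free capacity change, which is what "dynamically optimizes" refers to above.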


Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training

no code implementations • 22 May 2018 • Bojian Zheng, Abhishek Tiwari, Nandita Vijaykumar, Gennady Pekhimenko

For each feature map recomputation to be effective and efficient, its effect on (1) the total memory footprint, and (2) the total execution time has to be carefully estimated.
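The estimate described above amounts to a cost model that weighs memory saved against time added. A toy version (not Echo's actual estimator; `worth_recomputing` and its budget parameter are hypothetical) might look like:

```python
def worth_recomputing(memory_saved_mb, recompute_time_ms,
                      max_overhead_ms_per_mb=0.5):
    """Toy recomputation cost model: drop a feature map and recompute it in
    the backward pass only if the time paid per megabyte of memory saved
    stays under a fixed overhead budget."""
    if memory_saved_mb <= 0:
        return False  # recomputing saves nothing, so never worth it
    return recompute_time_ms / memory_saved_mb <= max_overhead_ms_per_mb
```

The real compiler analysis estimates both quantities statically per feature map; the decision structure, however, is this simple trade-off.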

TBD: Benchmarking and Analyzing Deep Neural Network Training

no code implementations • 16 Mar 2018 • Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko

Our primary goal in this work is to break this myopic view by (i) proposing a new benchmark for DNN training, called TBD (short for Training Benchmark for DNNs), that uses a representative set of DNN models covering a wide range of machine learning applications: image classification, machine translation, speech recognition, object detection, adversarial networks, and reinforcement learning; and (ii) performing an extensive performance analysis of training these different applications on three major deep learning frameworks (TensorFlow, MXNet, CNTK) across different hardware configurations (single-GPU, multi-GPU, and multi-machine).
