Search Results for author: Pradeep Dubey

Found 21 papers, 5 papers with code

Game Representations and Extensions of the Shapley Value

no code implementations • 18 Jan 2024 • Pradeep Dubey

This equitable solution turns out to be the Shapley value of the game, and thus provides as an alternative justification of the value.

Paper
Add Code

Microscaling Data Formats for Deep Learning

1 code implementation • 16 Oct 2023 • Bita Darvish Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, Summer Deng, Dhruv Choudhary, Marius Cornea, Eric Dellinger, Kristof Denolf, Stosic Dusan, Venmugil Elango, Maximilian Golub, Alexander Heinecke, Phil James-Roxby, Dharmesh Jani, Gaurav Kolhe, Martin Langhammer, Ada Li, Levi Melnick, Maral Mesmakhosroshahi, Andres Rodriguez, Michael Schulte, Rasoul Shafipour, Lei Shao, Michael Siu, Pradeep Dubey, Paulius Micikevicius, Maxim Naumov, Colin Verrilli, Ralph Wittig, Doug Burger, Eric Chung

Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications.

Friction

Paper
Code

AUTOSPARSE: Towards Automated Sparse Training of Deep Neural Networks

no code implementations • 14 Apr 2023 • Abhisek Kundu, Naveen K. Mellempudi, Dharma Teja Vooturi, Bharat Kaul, Pradeep Dubey

We integrated GA with the latest learnable pruning methods to create an automated sparse training algorithm called AutoSparse, which achieves better accuracy and/or training/inference FLOPS reduction than existing learnable pruning methods for sparse ResNet50 and MobileNetV1 on ImageNet-1K: AutoSparse achieves (2x, 7x) reduction in (training, inference) FLOPS for ResNet50 on ImageNet at 80% sparsity.

Paper
Add Code

FP8 Formats for Deep Learning

2 code implementations • 12 Sep 2022 • Paulius Micikevicius, Dusan Stosic, Neil Burgess, Marius Cornea, Pradeep Dubey, Richard Grisenthwaite, Sangwon Ha, Alexander Heinecke, Patrick Judd, John Kamalu, Naveen Mellempudi, Stuart Oberman, Mohammad Shoeybi, Michael Siu, Hao Wu

FP8 is a natural progression for accelerating deep learning training inference beyond the 16-bit formats common in modern processors.

Quantization

1,402

Paper
Code

Systolic Computing on GPUs for Productive Performance

no code implementations • 29 Oct 2020 • Hongbo Rong, Xiaochen Hao, Yun Liang, Lidong Xu, Hong H Jiang, Pradeep Dubey

We propose a language and compiler to productively build high-performance {\it software systolic arrays} that run on GPUs.

Paper
Add Code

(Updated submission 11/20/2020) MISIM: A Novel Code Similarity System

no code implementations • 28 Sep 2020 • Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy G Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich

First, MISIM uses a novel context-aware semantic structure (CASS), which is designed to aid in lifting semantic meaning from code syntax.

Paper
Add Code

MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure

no code implementations • 5 Jun 2020 • Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich

Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection.

Clone Detection

Paper
Add Code

Context-Aware Parse Trees

no code implementations • 24 Mar 2020 • Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Paul Petersen, Jesmin Jahan Tithi, Tim Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich

The simplified parse tree (SPT) presented in Aroma, a state-of-the-art code recommendation system, is a tree-structured representation used to infer code semantics by capturing program \emph{structure} rather than program \emph{syntax}.

Paper
Add Code

K-TanH: Efficient TanH For Deep Learning

no code implementations • 17 Sep 2019 • Abhisek Kundu, Alex Heinecke, Dhiraj Kalamkar, Sudarshan Srinivasan, Eric C. Qin, Naveen K. Mellempudi, Dipankar Das, Kunal Banerjee, Bharat Kaul, Pradeep Dubey

We propose K-TanH, a novel, highly accurate, hardware efficient approximation of popular activation function TanH for Deep Learning.

Translation

Paper
Add Code

A Study of BFLOAT16 for Deep Learning Training

no code implementations • 29 May 2019 • Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey

In this paper, we discuss the flow of tensors and various key operations in mixed precision training, and delve into details of operations, such as the rounding modes for converting FP32 tensors to BFLOAT16.

Image Classification Language Modelling +3

Paper
Add Code

MLSys: The New Frontier of Machine Learning Systems

no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar

Machine learning (ML) techniques are enjoying rapidly increasing adoption.

BIG-bench Machine Learning

Paper
Add Code

Mixed Precision Training of Convolutional Neural Networks using Integer Operations

no code implementations • ICLR 2018 • Dipankar Das, Naveen Mellempudi, Dheevatsa Mudigere, Dhiraj Kalamkar, Sasikanth Avancha, Kunal Banerjee, Srinivas Sridharan, Karthik Vaidyanathan, Bharat Kaul, Evangelos Georganas, Alexander Heinecke, Pradeep Dubey, Jesus Corbal, Nikita Shustrov, Roma Dubtsov, Evarist Fomenko, Vadim Pirogov

The state-of-the-art (SOTA) for mixed precision training is dominated by variants of low precision floating point operations, and in particular, FP16 accumulating into FP32 Micikevicius et al. (2017).

Paper
Add Code

On Scale-out Deep Learning Training for Cloud and HPC

no code implementations • 24 Jan 2018 • Srinivas Sridharan, Karthikeyan Vaidyanathan, Dhiraj Kalamkar, Dipankar Das, Mikhail E. Smorkalov, Mikhail Shiryaev, Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul, Pradeep Dubey

The exponential growth in use of large deep neural networks has accelerated the need for training these deep neural networks in hours or even minutes.

Philosophy

Paper
Add Code

Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data

no code implementations • 17 Aug 2017 • Thorsten Kurth, Jian Zhang, Nadathur Satish, Ioannis Mitliagkas, Evan Racah, Mostofa Ali Patwary, Tareq Malas, Narayanan Sundaram, Wahid Bhimji, Mikhail Smorkalov, Jack Deslippe, Mikhail Shiryaev, Srinivas Sridharan, Prabhat, Pradeep Dubey

This paper presents the first, 15-PetaFLOP Deep Learning system for solving scientific pattern classification problems on contemporary HPC architectures.

General Classification

Paper
Add Code

Ternary Residual Networks

no code implementations • 15 Jul 2017 • Abhisek Kundu, Kunal Banerjee, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

Aided by such an elegant trade-off between accuracy and compute, the 8-2 model (8-bit activations, ternary weights), enhanced by ternary residual edges, turns out to be sophisticated enough to achieve very high accuracy ($\sim 1\%$ drop from our FP-32 baseline), despite $\sim 1. 6\times$ reduction in model size, $\sim 26\times$ reduction in number of multiplications, and potentially $\sim 2\times$ power-performance gain comparing to 8-8 representation, on the state-of-the-art deep network ResNet-101 pre-trained on ImageNet dataset.

Paper
Add Code

Ternary Neural Networks with Fine-Grained Quantization

no code implementations • 2 May 2017 • Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey

We address this by fine-tuning Resnet-50 with 8-bit activations and ternary weights at $N=64$, improving the Top-1 accuracy to within $4\%$ of the full precision result with $<30\%$ additional training overhead.

Quantization

Paper
Add Code

Parallelizing Word2Vec in Multi-Core and Many-Core Architectures

1 code implementation • 18 Nov 2016 • Shihao Ji, Nadathur Satish, Sheng Li, Pradeep Dubey

Word2vec is a widely used algorithm for extracting low-dimensional vector representations of words.

190

Paper
Code

Faster CNNs with Direct Sparse Convolutions and Guided Pruning

1 code implementation • 4 Aug 2016 • Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, Pradeep Dubey

Pruning CNNs in a way that increase inference speed often imposes specific sparsity structures, thus limiting the achievable sparsity levels.

238

Paper
Code

Parallelizing Word2Vec in Shared and Distributed Memory

no code implementations • 15 Apr 2016 • Shihao Ji, Nadathur Satish, Sheng Li, Pradeep Dubey

In combination, these techniques allow us to scale up the computation near linearly across cores and nodes, and process hundreds of millions of words per second, which is the fastest word2vec implementation to the best of our knowledge.

Machine Translation named-entity-recognition +5

Paper
Add Code

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent

no code implementations • 22 Feb 2016 • Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Karthikeyan Vaidynathan, Srinivas Sridharan, Dhiraj Kalamkar, Bharat Kaul, Pradeep Dubey

We design and implement a distributed multinode synchronous SGD algorithm, without altering hyper parameters, or compressing data, or altering algorithmic behavior.

Paper
Add Code

BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies

1 code implementation • 21 Nov 2015 • Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson, Pradeep Dubey

One way to understand BlackOut is to view it as an extension of the DropOut strategy to the output layer, wherein we use a discriminative training loss and a weighted sampling scheme.

Language Modelling

121

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.