Search Results for author: Sasikanth Avancha

Found 17 papers, 5 papers with code

DistGNN-MB: Distributed Large-Scale Graph Neural Network Training on x86 via Minibatch Sampling

no code implementations • 11 Nov 2022 • Md Vasimuddin, Ramanarayan Mohanty, Sanchit Misra, Sasikanth Avancha

DistGNN-MB trains GraphSAGE and GAT 10x and 17. 2x faster, respectively, as compute nodes scale from 2 to 32.

Paper
Add Code

DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks

no code implementations • 14 Apr 2021 • Vasimuddin Md, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty, Evangelos Georganas, Alexander Heinecke, Dhiraj Kalamkar, Nesreen K. Ahmed, Sasikanth Avancha

Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.

graph partitioning

Paper
Add Code

Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning & HPC Workloads

2 code implementations • 12 Apr 2021 • Evangelos Georganas, Dhiraj Kalamkar, Sasikanth Avancha, Menachem Adelman, Deepti Aggarwal, Cristina Anderson, Alexander Breuer, Jeremy Bruestle, Narendra Chaudhary, Abhisek Kundu, Denise Kutnick, Frank Laub, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty, Hans Pabst, Brian Retford, Barukh Ziv, Alexander Heinecke

The TPP specification is platform-agnostic, thus code expressed via TPPs is portable, whereas the TPP implementation is highly-optimized and platform-specific.

799

Paper
Code

Deep Graph Library Optimizations for Intel(R) x86 Architecture

1 code implementation • 13 Jul 2020 • Sasikanth Avancha, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty

The Deep Graph Library (DGL) was designed as a tool to enable structure learning from graphs, by supporting a core abstraction for graphs, including the popular Graph Neural Networks (GNN).

121

Paper
Code

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

no code implementations • 2 Jul 2020 • Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li

Machine learning (ML) models are widely used in many important domains.

Medical Diagnosis Quantization +1

Paper
Add Code

PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives

1 code implementation • 2 Jun 2020 • Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

However, given the constant emergence of new DNN architectures, creating hand optimized code is expensive, slow and is not scalable.

speech-recognition Speech Recognition +2

Paper
Code

PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives

no code implementations • 6 Feb 2020 • Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

In this paper, we develop a hybrid solution to the development of deep learning kernels that achieves the best of both worlds: the expert coded microkernels are utilized for the innermost loops of kernels and we use the advanced polyhedral technology to automatically tune the outer loops for performance.

Paper
Add Code

SEERL: Sample Efficient Ensemble Reinforcement Learning

no code implementations • 15 Jan 2020 • Rohan Saphal, Balaraman Ravindran, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

However, ensemble methods are relatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved in obtaining a diverse ensemble.

Continuous Control Ensemble Learning +3

Paper
Add Code

High Performance Scalable FPGA Accelerator for Deep Neural Networks

no code implementations • 29 Aug 2019 • Sudarshan Srinivasan, Pradeep Janedula, Saurabh Dhoble, Sasikanth Avancha, Dipankar Das, Naveen Mellempudi, Bharat Daga, Martin Langhammer, Gregg Baeckler, Bharat Kaul

Low-precision is the first order knob for achieving higher Artificial Intelligence Operations (AI-TOPS).

Vocal Bursts Intensity Prediction

Paper
Add Code

High-Performance Deep Learning via a Single Building Block

no code implementations • 15 Jun 2019 • Evangelos Georganas, Kunal Banerjee, Dhiraj Kalamkar, Sasikanth Avancha, Anand Venkat, Michael Anderson, Greg Henry, Hans Pabst, Alexander Heinecke

Deep learning (DL) is one of the most prominent branches of machine learning.

Vocal Bursts Intensity Prediction

Paper
Add Code

A Study of BFLOAT16 for Deep Learning Training

no code implementations • 29 May 2019 • Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey

In this paper, we discuss the flow of tensors and various key operations in mixed precision training, and delve into details of operations, such as the rounding modes for converting FP32 tensors to BFLOAT16.

Image Classification Language Modelling +3

Paper
Add Code

Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures

2 code implementations • 16 Aug 2018 • Evangelos Georganas, Sasikanth Avancha, Kunal Banerjee, Dhiraj Kalamkar, Greg Henry, Hans Pabst, Alexander Heinecke

Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs) which provide state-of-the-art results for tasks like image recognition, neural machine translation and speech recognition.

Distributed, Parallel, and Cluster Computing

799

Paper
Code

Hierarchical Block Sparse Neural Networks

no code implementations • 10 Aug 2018 • Dharma Teja Vooturi, Dheevatsa Mudigere, Sasikanth Avancha

In this work, we jointly address both accuracy and performance of sparse DNNs using our proposed class of sparse neural networks called HBsNN (Hierarchical Block sparse Neural Networks).

Paper
Add Code

Mixed Precision Training of Convolutional Neural Networks using Integer Operations

no code implementations • ICLR 2018 • Dipankar Das, Naveen Mellempudi, Dheevatsa Mudigere, Dhiraj Kalamkar, Sasikanth Avancha, Kunal Banerjee, Srinivas Sridharan, Karthik Vaidyanathan, Bharat Kaul, Evangelos Georganas, Alexander Heinecke, Pradeep Dubey, Jesus Corbal, Nikita Shustrov, Roma Dubtsov, Evarist Fomenko, Vadim Pirogov

The state-of-the-art (SOTA) for mixed precision training is dominated by variants of low precision floating point operations, and in particular, FP16 accumulating into FP32 Micikevicius et al. (2017).

Paper
Add Code

On Scale-out Deep Learning Training for Cloud and HPC

no code implementations • 24 Jan 2018 • Srinivas Sridharan, Karthikeyan Vaidyanathan, Dhiraj Kalamkar, Dipankar Das, Mikhail E. Smorkalov, Mikhail Shiryaev, Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul, Pradeep Dubey

The exponential growth in use of large deep neural networks has accelerated the need for training these deep neural networks in hours or even minutes.

Philosophy

Paper
Add Code

RAIL: Risk-Averse Imitation Learning

1 code implementation • 20 Jul 2017 • Anirban Santara, Abhishek Naik, Balaraman Ravindran, Dipankar Das, Dheevatsa Mudigere, Sasikanth Avancha, Bharat Kaul

Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories.

Autonomous Driving Continuous Control +1

Paper
Code

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent

no code implementations • 22 Feb 2016 • Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Karthikeyan Vaidynathan, Srinivas Sridharan, Dhiraj Kalamkar, Bharat Kaul, Pradeep Dubey

We design and implement a distributed multinode synchronous SGD algorithm, without altering hyper parameters, or compressing data, or altering algorithmic behavior.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.