Search Results for author: Sudarshan Srinivasan

Found 12 papers, 3 papers with code

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

1 code implementation • 24 Mar 2023 • William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna

In this paper, we extend the open-source ASTRA-sim infrastructure and endow it with the capabilities to model state-of-the-art and emerging distributed training models and platforms.

BioADAPT-MRC: Adversarial Learning-based Domain Adaptation Improves Biomedical Machine Reading Comprehension Task

1 code implementation • 26 Feb 2022 • Maria Mahbub, Sudarshan Srinivasan, Edmon Begoli, Gregory D Peterson

We present an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task (BioADAPT-MRC), a neural network-based method to address the discrepancies in the marginal distributions between the general and biomedical domain datasets.

Domain Adaptation • Machine Reading Comprehension • +1
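The core idea of adversarial domain adaptation — training features so a domain discriminator cannot tell general-domain from biomedical-domain inputs — can be sketched with a toy gradient-reversal setup. This is an illustrative stand-in, not the BioADAPT-MRC architecture: both the "feature extractor" and the discriminator here are linear, and the features are updated directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "general-domain" vs. "biomedical-domain" features, initially
# separated by a shift along the first dimension.
n, d = 200, 8
shift = np.zeros(d)
shift[0] = 2.0
feats = np.vstack([rng.normal(0, 1, (n, d)),            # source domain
                   rng.normal(0, 1, (n, d)) + shift])   # target domain
dom = np.concatenate([np.zeros(n), np.ones(n)])         # domain labels

w, b = np.zeros(d), 0.0       # linear domain discriminator
lam, lr = 1.0, 0.1            # reversal strength, learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

gap_before = abs(feats[n:, 0].mean() - feats[:n, 0].mean())
for _ in range(300):
    p = sigmoid(feats @ w + b)
    g = p - dom                         # dBCE/dlogit per sample
    # The discriminator descends on the domain-classification loss ...
    w -= lr * feats.T @ g / (2 * n)
    b -= lr * g.mean()
    # ... while each sample's features ascend their own loss gradient
    # (gradient reversal), pushing the two domains together.
    feats += lr * lam * g[:, None] * w
gap_after = abs(feats[n:, 0].mean() - feats[:n, 0].mean())
```

After training, the mean gap between the two domains along the shifted dimension shrinks, i.e. the marginal feature distributions become harder to tell apart — the alignment effect the paper exploits for biomedical MRC.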

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models

no code implementations • 9 Oct 2021 • Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, Tushar Krishna

Distributed training is a solution to reduce DNN training time by splitting the task across multiple NPUs (e.g., GPUs/TPUs).
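At its simplest, that splitting is data parallelism: each NPU computes gradients on its own batch shard, and an all-reduce averages them so every replica applies the same update. A minimal simulation, with plain arrays standing in for devices and the collective (real systems would use a collective-communication library over the interconnect):

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(4)                            # replicated model weights
X = rng.normal(size=(8, 4))                # global batch
y = X @ np.array([1.0, -2.0, 0.5, 3.0])    # targets from a known model

shards = np.array_split(np.arange(8), 4)   # 4 "NPUs", 2 samples each
for _ in range(2000):
    local_grads = []
    for idx in shards:                     # each NPU: local fwd/bwd pass
        err = X[idx] @ w - y[idx]
        local_grads.append(X[idx].T @ err / len(idx))
    g = np.mean(local_grads, axis=0)       # the all-reduce (average) step
    w -= 0.2 * g                           # identical update on every replica
```

The all-reduce in the averaging line is exactly the communication step whose cost collective-scheduling policies such as Themis aim to hide or minimize.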


Exploring Multi-dimensional Hierarchical Network Topologies for Efficient Distributed Training of Trillion Parameter DL Models

no code implementations • 24 Sep 2021 • William Won, Saeed Rashidi, Sudarshan Srinivasan, Tushar Krishna

High-performance distributed training platforms should leverage multi-dimensional hierarchical networks, which interconnect accelerators through different levels of the network, to dramatically reduce the number of expensive NICs required for the scale-out network.

K-TanH: Efficient TanH For Deep Learning

no code implementations • 17 Sep 2019 • Abhisek Kundu, Alex Heinecke, Dhiraj Kalamkar, Sudarshan Srinivasan, Eric C. Qin, Naveen K. Mellempudi, Dipankar Das, Kunal Banerjee, Bharat Kaul, Pradeep Dubey

We propose K-TanH, a novel, highly accurate, hardware-efficient approximation of the popular TanH activation function for deep learning.
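K-TanH itself works with integer table lookups on the bit fields of low-precision operands; as a rough illustration of the same table-driven idea (not the paper's actual algorithm), here is a piecewise-linear tanh built from a small precomputed slope/intercept table:

```python
import numpy as np

# Illustrative only: approximate tanh on |x| in [0, 4) with 16 linear
# segments, each the tangent line at the segment midpoint, and saturate
# beyond 4. K-TanH instead manipulates exponent/mantissa bits directly.
BREAKS = np.linspace(0.0, 4.0, 17)           # segment boundaries on |x|
MIDS = 0.5 * (BREAKS[:-1] + BREAKS[1:])
SLOPES = 1.0 - np.tanh(MIDS) ** 2            # tanh'(m) at midpoints
INTERCEPTS = np.tanh(MIDS) - SLOPES * MIDS

def tanh_approx(x):
    a = np.abs(x)
    idx = np.clip(np.searchsorted(BREAKS, a, side="right") - 1,
                  0, len(MIDS) - 1)          # table lookup by segment
    y = SLOPES[idx] * a + INTERCEPTS[idx]
    y = np.where(a >= BREAKS[-1], np.tanh(BREAKS[-1]), y)  # saturate tail
    return np.sign(x) * y                    # tanh is odd: mirror sign
```

With 0.25-wide segments the worst-case absolute error stays below 0.01 — far coarser than what the paper reports, but enough to show why a small table can replace a transcendental evaluation.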


Mixed Precision Training With 8-bit Floating Point

no code implementations • 29 May 2019 • Naveen Mellempudi, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul

Reduced precision computation for deep neural networks is one of the key approaches to addressing the widening compute gap driven by exponential growth in model size.
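A common way to prototype such low-precision training is "fake quantization": keep full-precision master values and round them onto an 8-bit floating-point grid during the forward/backward pass. The sketch below quantizes to an E4M3-style layout (4 exponent bits, 3 mantissa bits); the exact format, exponent bias, rounding mode, and loss-scaling recipe in the paper may differ, and special encodings (NaN/Inf, subnormals) are ignored here:

```python
import numpy as np

def quantize_fp8_e4m3(x, exp_bits=4, man_bits=3):
    """Round values onto an 8-bit float grid (sign + 4 exp + 3 mantissa bits).

    Simplified sketch: saturating, round-to-nearest, no special encodings.
    """
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1                     # = 7 for E4M3
    sign = np.sign(x)
    a = np.abs(x)
    # Exponent of each value, clamped at the minimum normal exponent.
    e = np.floor(np.log2(np.maximum(a, 2.0 ** (-bias))))
    scale = 2.0 ** (e - man_bits)                      # spacing of the grid
    q = np.round(a / scale) * scale                    # round the mantissa
    # Saturate at the largest magnitude this simplified format represents.
    max_val = (2 - 2.0 ** (-man_bits)) * 2.0 ** (2 ** exp_bits - 1 - bias)
    return sign * np.minimum(q, max_val)
```

With 3 mantissa bits, the relative rounding error in the normal range is bounded by 2⁻⁴ ≈ 6.25% per value — coarse enough that mixed-precision recipes keep an FP32 master copy of the weights and apply these low-precision values only inside the compute-heavy operations.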

