no code implementations • 15 May 2023 • Maria Mahbub, Ian Goethert, Ioana Danciu, Kathryn Knight, Sudarshan Srinivasan, Suzanne Tamang, Karine Rozenberg-Ben-Dror, Hugo Solares, Susana Martins, Edmon Begoli, Gregory D. Peterson
Injection drug use (IDU) is a dangerous health behavior that increases mortality and morbidity.
no code implementations • 11 Apr 2023 • William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Ajaya Durg, Swati Gupta, Tushar Krishna
Collective communications are an indispensable part of distributed training.
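To make the role of collectives concrete, here is a minimal NumPy simulation of a ring all-reduce, the collective typically used to sum gradients across workers: each of the n workers exchanges only 1/n of the data per step, first reduce-scattering and then all-gathering. This is an illustrative model of the communication pattern, not code from the paper.

```python
import numpy as np

def ring_allreduce(arrays):
    """Simulate a ring all-reduce: every worker ends up with the
    element-wise sum of all workers' arrays."""
    n = len(arrays)
    segs = [np.array_split(a.astype(np.float64), n) for a in arrays]
    # Phase 1: reduce-scatter (n-1 steps). Afterward, worker r holds the
    # fully reduced segment (r + 1) % n.
    for s in range(n - 1):
        for r in range(n):
            k = (r - s) % n  # segment worker r sends to its ring neighbor
            segs[(r + 1) % n][k] = segs[(r + 1) % n][k] + segs[r][k]
    # Phase 2: all-gather (n-1 steps). Reduced segments circulate around
    # the ring until every worker holds the complete result.
    for s in range(n - 1):
        for r in range(n):
            k = (r + 1 - s) % n
            segs[(r + 1) % n][k] = segs[r][k]
    return [np.concatenate(s) for s in segs]

# Every worker's result equals the sum of all workers' gradients.
grads = [np.random.rand(8) for _ in range(4)]
out = ring_allreduce(grads)
assert all(np.allclose(o, sum(grads)) for o in out)
```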
1 code implementation • 24 Mar 2023 • William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna
In this paper, we extend the open-source ASTRA-sim infrastructure and endow it with the capabilities to model state-of-the-art and emerging distributed training models and platforms.
1 code implementation • 26 Feb 2022 • Maria Mahbub, Sudarshan Srinivasan, Edmon Begoli, Gregory D. Peterson
We present BioADAPT-MRC, an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task: a neural network-based method that addresses the discrepancies in marginal distributions between general-domain and biomedical-domain datasets.
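Adversarial domain adaptation is commonly built around a gradient reversal layer (Ganin & Lempitsky, 2015): a domain discriminator learns to tell domains apart while reversed gradients push the encoder toward domain-invariant features. The PyTorch sketch below shows that generic building block with hypothetical layer sizes; it is not the BioADAPT-MRC architecture itself.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scaled, negated gradient in backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient trains the encoder to *fool* the domain head.
        return -ctx.lamb * grad_output, None

encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU())  # hypothetical sizes
domain_head = nn.Linear(256, 2)  # general vs. biomedical domain

x = torch.randn(4, 768)
logits = domain_head(GradReverse.apply(encoder(x), 1.0))
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1, 0, 1]))
loss.backward()  # encoder receives reversed domain-classification gradients
```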
no code implementations • 9 Oct 2021 • Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, Tushar Krishna
Distributed training is a solution to reduce DNN training time by splitting the task across multiple NPUs (e.g., GPUs/TPUs).
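A minimal NumPy sketch of the data-parallel version of that idea: the batch is sharded across simulated NPUs, each computes a local gradient, and averaging the shard gradients (the job of an all-reduce) recovers the full-batch gradient. The model and shapes here are hypothetical.

```python
import numpy as np

def local_grad(w, X, y):
    # Gradient of the mean-squared-error loss 0.5*mean((Xw - y)^2) on one shard.
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(64, 8)), rng.normal(size=64), np.zeros(8)

n_devices = 4  # simulated NPUs; equal-sized shards keep the average exact
shard_grads = [local_grad(w, Xs, ys)
               for Xs, ys in zip(np.array_split(X, n_devices),
                                 np.array_split(y, n_devices))]
g = np.mean(shard_grads, axis=0)  # the all-reduce step, here a plain average
assert np.allclose(g, local_grad(w, X, y))  # matches the full-batch gradient
```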
no code implementations • 24 Sep 2021 • William Won, Saeed Rashidi, Sudarshan Srinivasan, Tushar Krishna
High-performance distributed training platforms should leverage multi-dimensional hierarchical networks, which interconnect accelerators through different levels of the network, to dramatically reduce the number of expensive NICs required for the scale-out network.
no code implementations • 23 Feb 2021 • Jeremiah Duncan, Fabian Fallas, Chris Gropp, Emily Herron, Maria Mahbub, Paula Olaya, Eduardo Ponce, Tabitha K. Samuel, Daniel Schultz, Sudarshan Srinivasan, Maofeng Tang, Viktor Zenkov, Quan Zhou, Edmon Begoli
To this end, and using those established techniques, we first developed an experimental framework for author detection and input perturbations.
2 code implementations • 10 May 2020 • Dhiraj Kalamkar, Evangelos Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, Alexander Heinecke
During the last two years, the goal of many researchers has been to squeeze the last bit of performance out of HPC systems for AI tasks.
no code implementations • 17 Sep 2019 • Abhisek Kundu, Alex Heinecke, Dhiraj Kalamkar, Sudarshan Srinivasan, Eric C. Qin, Naveen K. Mellempudi, Dipankar Das, Kunal Banerjee, Bharat Kaul, Pradeep Dubey
We propose K-TanH, a novel, highly accurate, hardware-efficient approximation of the popular TanH activation function for Deep Learning.
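The paper's K-TanH operates directly on the integer representation of the input using low-cost integer operations; as a loose, hypothetical illustration of the general idea of table-driven activation approximation (not the paper's algorithm or parameters), here is a piecewise-linear float lookup table in NumPy.

```python
import numpy as np

# Piecewise-linear LUT for tanh on [0, 4); odd symmetry covers x < 0 and
# |x| >= 4 saturates. 32 intervals of width 0.125 are arbitrary choices.
KNOTS = np.linspace(0.0, 4.0, 33)
BASES = np.tanh(KNOTS[:-1])
SLOPES = np.diff(np.tanh(KNOTS)) / 0.125

def tanh_lut(x):
    s = np.sign(x)
    a = np.minimum(np.abs(x), 4.0)
    i = np.minimum((a / 0.125).astype(int), 31)  # interval index into the LUT
    return s * (BASES[i] + SLOPES[i] * (a - KNOTS[i]))

x = np.linspace(-6.0, 6.0, 1001)
print(np.abs(tanh_lut(x) - np.tanh(x)).max())  # max error on the order of 1e-3
```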
no code implementations • 29 Aug 2019 • Sudarshan Srinivasan, Pradeep Janedula, Saurabh Dhoble, Sasikanth Avancha, Dipankar Das, Naveen Mellempudi, Bharat Daga, Martin Langhammer, Gregg Baeckler, Bharat Kaul
Low precision is the first-order knob for achieving higher Artificial Intelligence throughput, measured in tera-operations per second (AI-TOPS).
no code implementations • 29 May 2019 • Naveen Mellempudi, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul
Reduced precision computation for deep neural networks is one of the key areas addressing the widening compute gap driven by an exponential growth in model size.
no code implementations • 29 May 2019 • Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey
In this paper, we discuss the flow of tensors and various key operations in mixed precision training, and delve into details of operations, such as the rounding modes for converting FP32 tensors to BFLOAT16.
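For reference, the standard round-to-nearest-even conversion from FP32 to BFLOAT16 keeps the sign, the full 8-bit exponent, and the top 7 mantissa bits, adding a rounding bias before truncation. A NumPy sketch (NaN payloads are not handled specially here):

```python
import numpy as np

def fp32_to_bf16(x: np.ndarray) -> np.ndarray:
    """Round FP32 to BFLOAT16 (stored as uint16) with round-to-nearest-even."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    # Bias is 0x7FFF plus the LSB of the surviving mantissa, which breaks
    # exact ties toward an even (zero) low bit, i.e. round-to-nearest-even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16).astype(np.uint16)

def bf16_to_fp32(b: np.ndarray) -> np.ndarray:
    """Widen BFLOAT16 back to FP32 by shifting into the high half-word."""
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([1.0, 1.00390625, 3.14159], dtype=np.float32)
print(bf16_to_fp32(fp32_to_bf16(x)))  # values rounded to 8 bits of precision
```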