no code implementations • 25 Feb 2024 • Dan Zhao, Siddharth Samsi, Joseph McDonald, Baolin Li, David Bestor, Michael Jones, Devesh Tiwari, Vijay Gadepally
In this paper, we study the aggregate effect of power-capping GPUs on GPU temperature and power draw at a research supercomputing center.
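For readers who want to experiment with the mechanism itself, GPU power limits can be adjusted programmatically through NVML. The sketch below uses the pynvml bindings; the device index and the 60% cap are illustrative assumptions, not the settings studied in the paper.

```python
# Hedged sketch of capping a GPU's power limit with NVML via pynvml.
# The device index and 60% cap are illustrative, not the authors' setup.
# Changing power limits typically requires administrative privileges.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

# Query the allowed power-limit range (values are in milliwatts).
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)

# Cap the card at 60% of its maximum allowed limit, clamped to the valid range.
cap_mw = max(int(0.6 * max_mw), min_mw)
pynvml.nvmlDeviceSetPowerManagementLimit(handle, cap_mw)

print(f"Power limit set to {cap_mw / 1000:.0f} W "
      f"(allowed range {min_mw / 1000:.0f}-{max_mw / 1000:.0f} W)")
pynvml.nvmlShutdown()
```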
no code implementations • 26 Jan 2024 • Mark S. Veillette, James M. Kurdzo, Phillip M. Stepanian, John Y. N. Cho, Siddharth Samsi, Joseph McDonald
A number of ML baselines for tornado detection are developed and compared, including a novel deep learning (DL) architecture capable of processing raw radar imagery without the manual feature extraction required by existing ML algorithms.
1 code implementation • 13 Oct 2023 • Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner
Finally, a brief description of each of the new accelerators that have been added in the survey this year is included.
no code implementations • 4 Oct 2023 • Siddharth Samsi, Dan Zhao, Joseph McDonald, Baolin Li, Adam Michaleas, Michael Jones, William Bergeron, Jeremy Kepner, Devesh Tiwari, Vijay Gadepally
Large language models (LLMs) have exploded in popularity due to their new generative capabilities that go far beyond prior state-of-the-art.
no code implementations • 27 Jan 2023 • Dan Zhao, Nathan C. Frey, Joseph McDonald, Matthew Hubbell, David Bestor, Michael Jones, Andrew Prout, Vijay Gadepally, Siddharth Samsi
As AI research and its applications continue to scale, we are sure to face an ever-mounting energy footprint to sustain these computational budgets, data storage needs, and more.
no code implementations • 12 Oct 2022 • Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari
Online inference is becoming a key service product for many businesses, deployed in cloud platforms to meet customer demands.
no code implementations • 12 Sep 2022 • Matthew L. Weiss, Joseph McDonald, David Bestor, Charles Yee, Daniel Edelman, Michael Jones, Andrew Prout, Andrew Bowne, Lindsey McEvoy, Vijay Gadepally, Siddharth Samsi
Our best performing models achieve a classification accuracy greater than 95%, outperforming previous approaches to multi-channel time series classification with the MIT SuperCloud Dataset by 5%.
1 code implementation • 14 Jul 2022 • Vijay Gadepally, Gregory Angelides, Andrei Barbu, Andrew Bowne, Laura J. Brattain, Tamara Broderick, Armando Cabrera, Glenn Carl, Ronisha Carter, Miriam Cha, Emilie Cowen, Jesse Cummings, Bill Freeman, James Glass, Sam Goldberg, Mark Hamilton, Thomas Heldt, Kuan Wei Huang, Phillip Isola, Boris Katz, Jamie Koerner, Yen-Chen Lin, David Mayo, Kyle McAlpin, Taylor Perron, Jean Piou, Hrishikesh M. Rao, Hayley Reynolds, Kaira Samuel, Siddharth Samsi, Morgan Schmidt, Leslie Shing, Olga Simek, Brandon Swenson, Vivienne Sze, Jonathan Taylor, Paul Tylkin, Mark Veillette, Matthew L Weiss, Allan Wollaber, Sophia Yuditskaya, Jeremy Kepner
Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI.
no code implementations • Findings (NAACL) 2022 • Joseph McDonald, Baolin Li, Nathan Frey, Devesh Tiwari, Vijay Gadepally, Siddharth Samsi
In particular, we focus on techniques to measure energy usage and different hardware and datacenter-oriented settings that can be tuned to reduce energy consumption for training and inference for language models.
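A basic building block for such measurements is sampling GPU power draw while a job runs and integrating over time. Below is a minimal sketch under that assumption, using pynvml and a background polling thread; the helper name measure_gpu_energy, the sampling interval, and the simple trapezoidal integration are illustrative choices, not the paper's instrumentation.

```python
# Hedged sketch: estimate GPU energy for a block of work by polling NVML
# power readings in a background thread and integrating over time.
import threading
import time
import pynvml

def measure_gpu_energy(work_fn, interval_s=0.1, device_index=0):
    """Run work_fn() while sampling GPU power draw; return (result, joules)."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    samples = []          # (timestamp, watts) pairs
    stop = threading.Event()

    def poll():
        while not stop.is_set():
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
            samples.append((time.time(), watts))
            time.sleep(interval_s)

    poller = threading.Thread(target=poll, daemon=True)
    poller.start()
    result = work_fn()
    stop.set()
    poller.join()
    pynvml.nvmlShutdown()

    # Trapezoidal integration of power over time gives energy in joules.
    joules = sum(0.5 * (w0 + w1) * (t1 - t0)
                 for (t0, w0), (t1, w1) in zip(samples, samples[1:]))
    return result, joules
```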
no code implementations • 12 Apr 2022 • Benny J. Tang, Qiqi Chen, Matthew L. Weiss, Nathan Frey, Joseph McDonald, David Bestor, Charles Yee, William Arcand, Chansup Byun, Daniel Edelman, Matthew Hubbell, Michael Jones, Jeremy Kepner, Anna Klein, Adam Michaleas, Peter Michaleas, Lauren Milechin, Julia Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Andrew Bowne, Lindsey McEvoy, Baolin Li, Devesh Tiwari, Vijay Gadepally, Siddharth Samsi
We introduce a labelled dataset that can be used to develop new approaches to workload classification and present initial results based on existing approaches.
no code implementations • 28 Jan 2022 • Nathan C. Frey, Baolin Li, Joseph McDonald, Dan Zhao, Michael Jones, David Bestor, Devesh Tiwari, Vijay Gadepally, Siddharth Samsi
Deep learning (DL) workflows demand an ever-increasing budget of compute and energy in order to achieve outsized gains.
no code implementations • NeurIPS Workshop AI4Scien 2021 • Nathan C. Frey, Siddharth Samsi, Bharath Ramsundar, Connor W. Coley, Vijay Gadepally
Artificial intelligence has not yet revolutionized the design of materials and molecules.
1 code implementation • NeurIPS Workshop AI4Scien 2021 • Nathan C. Frey, Siddharth Samsi, Joseph McDonald, Lin Li, Connor W. Coley, Vijay Gadepally
Deep learning in molecular and materials sciences is limited by the lack of integration between applied science, artificial intelligence, and high-performance computing.
no code implementations • 13 Nov 2021 • Matthew L. Weiss, Nathan C. Frey, Siddharth Samsi, Randy C. Paffenroth, Vijay Gadepally
Traditional frequency-based projection filters, or projection operators (PO), separate signal and noise through a series of transformations that remove the frequencies where noise is present.
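As a concrete illustration of the projection-operator idea (not the hybrid operators studied in the paper), the sketch below removes everything above an assumed cutoff frequency using NumPy's FFT; the cutoff, sampling rate, and test signal are made up for the example.

```python
# Minimal sketch of a frequency-based projection operator: transform to the
# frequency domain, zero out components outside a pass band, transform back.
import numpy as np

def projection_filter(signal, sample_rate, cutoff_hz):
    """Keep frequency components below cutoff_hz and discard the rest."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0          # project out the "noise" band
    return np.fft.irfft(spectrum, n=len(signal))

# Example: a 5 Hz tone buried in broadband noise (illustrative values).
fs = 1000
t = np.arange(0, 1, 1 / fs)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * np.random.randn(t.size)
recovered = projection_filter(noisy, fs, cutoff_hz=20)
print("residual RMSE:", np.sqrt(np.mean((recovered - clean) ** 2)))
```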
1 code implementation • 18 Sep 2021 • Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner
Over the past several years, new machine learning accelerators have been announced and released every month for a variety of applications, ranging from speech recognition and video object detection to assisted driving and data center workloads.
no code implementations • 4 Aug 2021 • Siddharth Samsi, Matthew L Weiss, David Bestor, Baolin Li, Michael Jones, Albert Reuther, Daniel Edelman, William Arcand, Chansup Byun, John Holodnack, Matthew Hubbell, Jeremy Kepner, Anna Klein, Joseph McDonald, Adam Michaleas, Peter Michaleas, Lauren Milechin, Julia Mullen, Charles Yee, Benjamin Price, Andrew Prout, Antonio Rosa, Allan Vanterpool, Lindsey McEvoy, Anson Cheng, Devesh Tiwari, Vijay Gadepally
In this paper, we introduce the MIT Supercloud Dataset, which aims to foster innovative AI/ML approaches to the analysis of large-scale HPC and datacenter/cloud operations.
2 code implementations • NeurIPS 2020 • Mark Veillette, Siddharth Samsi, Chris Mattioli
To help address this problem, we introduce the Storm EVent ImagRy (SEVIR) dataset: a single, rich dataset that combines spatially and temporally aligned data from multiple sensors, along with baseline implementations of deep learning models and evaluation metrics, to accelerate new algorithmic innovations.
Ranked #6 on Weather Forecasting on SEVIR
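For readers exploring the data, the sketch below shows one way to read a SEVIR HDF5 file with h5py. It assumes the documented layout in which each file holds a single image type as a dataset named after that type (here 'vil') plus an 'id' dataset of event identifiers; the file name is a placeholder.

```python
# Hedged sketch of reading frames from one SEVIR HDF5 file with h5py.
# Assumes each file stores a single image type under a dataset named after
# that type ('vil' here) alongside event ids under 'id'; path is a placeholder.
import h5py

path = "SEVIR_VIL_STORMEVENTS_2019_0101_0630.h5"  # placeholder file name
with h5py.File(path, "r") as hf:
    event_ids = hf["id"][:]          # one id per storm event
    vil = hf["vil"]                  # shape ~ (events, height, width, frames)
    print("events:", len(event_ids), "frame tensor shape:", vil.shape)

    # Load the frame sequence for the first event only; lazy slicing avoids
    # reading the whole file into memory.
    first_event = vil[0]
```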
no code implementations • 1 Sep 2020 • Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner
New machine learning accelerators are being announced and released each month for a variety of applications, ranging from speech recognition and video object detection to assisted driving and data center workloads.
no code implementations • 20 Aug 2020 • Matthew Hutchinson, Siddharth Samsi, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Michael Houle, Matthew Hubbell, Michael Jones, Jeremy Kepner, Andrew Kirby, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Albert Reuther, Charles Yee, Vijay Gadepally
Over the past few years, there has been significant interest in video action recognition systems and models.
no code implementations • 18 Aug 2020 • Siddharth Samsi, Michael Jones, Mark M. Veillette
In this paper, we examine the compute, energy, and time costs of training a UNet-based deep neural network for short-term weather prediction (precipitation nowcasting).
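To fix ideas, here is a deliberately tiny UNet-style encoder/decoder in PyTorch that maps a stack of past radar frames to a stack of future frames. The depth, channel counts, frame counts, and single skip connection are illustrative stand-ins, not the network whose costs are measured in the paper.

```python
# Minimal sketch of a UNet-style encoder/decoder for frame-to-frame radar
# prediction in PyTorch. All sizes are placeholders.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_frames=13, out_frames=12):
        super().__init__()
        self.enc = conv_block(in_frames, 32)
        self.down = nn.MaxPool2d(2)
        self.mid = conv_block(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = conv_block(64, 32)          # 64 = upsampled + skip channels
        self.head = nn.Conv2d(32, out_frames, 1)

    def forward(self, x):                      # x: (B, in_frames, H, W)
        skip = self.enc(x)
        mid = self.mid(self.down(skip))
        up = self.up(mid)
        return self.head(self.dec(torch.cat([up, skip], dim=1)))

# Past 13 radar frames in, next 12 frames out (spatial size is a placeholder).
model = TinyUNet()
pred = model(torch.randn(2, 13, 64, 64))
print(pred.shape)  # torch.Size([2, 12, 64, 64])
```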
no code implementations • 18 Aug 2020 • Siddharth Samsi, Andrew Prout, Michael Jones, Andrew Kirby, Bill Arcand, Bill Bergeron, David Bestor, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Charles Yee, Albert Reuther, Jeremy Kepner
The large computational requirements for training deep models have necessitated the development of new methods for faster training.
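One widely used method in this space is synchronous data parallelism. The sketch below shows the standard PyTorch DistributedDataParallel pattern with a placeholder model and data; it illustrates the general technique rather than the specific systems or configurations benchmarked here.

```python
# Hedged sketch of synchronous data-parallel training with PyTorch
# DistributedDataParallel. Model, data, and hyperparameters are placeholders.
# Launch with e.g. `torchrun --nproc_per_node=4 train.py`.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                 # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 10).cuda(local_rank)   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for step in range(100):                         # placeholder data and loop
        x = torch.randn(64, 1024, device=local_rank)
        y = torch.randint(0, 10, (64,), device=local_rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()                             # gradients all-reduced here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```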
no code implementations • 14 Jul 2020 • Andrew C. Kirby, Siddharth Samsi, Michael Jones, Albert Reuther, Jeremy Kepner, Vijay Gadepally
A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable parallelized layer-wise training of neural networks and concurrent computational kernel execution on GPUs.
no code implementations • 18 Mar 2020 • Siddharth Samsi, Jeremy Kepner, Vijay Gadepally, Michael Hurley, Michael Jones, Edward Kao, Sanjeev Mohindra, Albert Reuther, Steven Smith, William Song, Diane Staheli, Paul Monticciolo
In 2017, 2018, and 2019, many triangle counting submissions were received from a wide range of authors and organizations.
Distributed, Parallel, and Cluster Computing • Performance
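The counting kernel at the heart of the triangle challenge has a compact linear-algebraic form: for a simple undirected graph with adjacency matrix A, the triangle count is trace(A^3)/6. A minimal NumPy sketch with a made-up example graph follows; competitive submissions use far more scalable sparse implementations.

```python
# Minimal sketch of the linear-algebraic triangle count: for an undirected
# simple graph with adjacency matrix A, #triangles = trace(A^3) / 6.
import numpy as np

def count_triangles(adj):
    """adj: symmetric 0/1 adjacency matrix with zero diagonal."""
    a = np.asarray(adj, dtype=np.int64)
    return int(np.trace(a @ a @ a) // 6)

# 4-cycle plus one chord -> exactly two triangles.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0]])
print(count_triangles(A))  # 2
```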
no code implementations • 29 Aug 2019 • Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner
Advances in multicore processors and accelerators have opened the floodgates to greater exploration and application of machine learning techniques to a variety of applications.
Performance • B.8; C.4
no code implementations • 29 Aug 2019 • Tao B. Schardl, Siddharth Samsi
This work introduces TapirXLA, a replacement for TensorFlow's XLA compiler that embeds recursive fork-join parallelism into XLA's low-level representation of code.
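For context on where this sits in the stack: in stock TensorFlow, a function is routed through the XLA compiler with tf.function(jit_compile=True), as in the sketch below. This shows ordinary XLA usage only; it does not exercise TapirXLA's modified lowering.

```python
# Standard TensorFlow/XLA usage for context; TapirXLA itself replaces parts
# of XLA's internal lowering and is not exercised by this sketch.
import tensorflow as tf

@tf.function(jit_compile=True)          # compile this graph with XLA
def fused_dense(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([256, 512])
w = tf.random.normal([512, 128])
b = tf.zeros([128])
print(fused_dense(x, w, b).shape)       # (256, 128)
```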
no code implementations • 28 Aug 2019 • Siddharth Samsi, Christopher J. Mattioli, Mark S. Veillette
Effective training of Deep Neural Networks requires massive amounts of data and compute.
no code implementations • 20 Aug 2019 • Andrew Prout, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther, Jeremy Kepner
Federated authentication can drastically reduce the overhead of basic account maintenance while simultaneously improving overall system security.
Distributed, Parallel, and Cluster Computing • Cryptography and Security
1 code implementation • 16 Aug 2019 • Jeffrey Liu, David Strohschein, Siddharth Samsi, Andrew Weinert
Video applications and analytics are routinely projected as a stressing and significant service of the Nationwide Public Safety Broadband Network.
no code implementations • 6 Jul 2019 • Jeremy Kepner, Vijay Gadepally, Lauren Milechin, Siddharth Samsi, William Arcand, David Bestor, William Bergeron, Chansup Byun, Matthew Hubbell, Michael Houle, Michael Jones, Anne Klein, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther
This work describes the design and performance optimization of an implementation of hierarchical associative arrays that reduces memory pressure and dramatically increases the update rate into an associative array.
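The core idea can be sketched with ordinary dictionaries standing in for associative arrays: stream updates into a small, fast layer and fold it into a larger layer only when it fills up, so most updates touch a small structure. The class name, spill threshold, and two-level design below are illustrative simplifications of the paper's multi-level scheme.

```python
# Hedged sketch of hierarchical updates: a small fast layer absorbs streaming
# updates and is periodically merged into a large slow layer. Plain dicts
# stand in for the associative arrays / hypersparse matrices in the paper.
class HierarchicalArray:
    def __init__(self, spill_threshold=1024):
        self.fast = {}                   # small, frequently updated layer
        self.slow = {}                   # large, infrequently merged layer
        self.spill_threshold = spill_threshold

    def add(self, key, value):
        """Accumulate value at key (associative-array style '+=')."""
        self.fast[key] = self.fast.get(key, 0) + value
        if len(self.fast) >= self.spill_threshold:
            self._merge()

    def _merge(self):
        for key, value in self.fast.items():
            self.slow[key] = self.slow.get(key, 0) + value
        self.fast.clear()

    def get(self, key):
        return self.fast.get(key, 0) + self.slow.get(key, 0)

arr = HierarchicalArray(spill_threshold=3)
for k in ["a", "b", "a", "c", "a"]:
    arr.add(k, 1)
print(arr.get("a"))  # 3
```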
no code implementations • 8 May 2019 • Vijay Gadepally, Justin Goodwin, Jeremy Kepner, Albert Reuther, Hayley Reynolds, Siddharth Samsi, Jonathan Su, David Martinez
Artificial Intelligence (AI) has the opportunity to revolutionize the way the United States Department of Defense (DoD) and Intelligence Community (IC) address the challenges of evolving threats, data deluge, and rapid courses of action.
no code implementations • 3 Feb 2019 • Jeremy Kepner, Vijay Gadepally, Lauren Milechin, Siddharth Samsi, William Arcand, David Bestor, William Bergeron, Chansup Byun, Matthew Hubbell, Michael Houle, Michael Jones, Anne Klein, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther
Streaming updates to a large associative array requires a hierarchical implementation to optimize the performance of the memory hierarchy.
Databases • Distributed, Parallel, and Cluster Computing • Data Structures and Algorithms • Networking and Internet Architecture
no code implementations • 23 Aug 2017 • Siddharth Samsi, Vijay Gadepally, Michael Hurley, Michael Jones, Edward Kao, Sanjeev Mohindra, Paul Monticciolo, Albert Reuther, Steven Smith, William Song, Diane Staheli, Jeremy Kepner
The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems.
Distributed, Parallel, and Cluster Computing • Data Structures and Algorithms
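As a small-scale illustration of the underlying primitive (not a challenge-scale solution), the sketch below uses NetworkX's VF2-based matcher to test whether a triangle pattern appears as a subgraph of a toy data graph; both graphs are placeholders.

```python
# Toy subgraph isomorphism check with NetworkX's VF2-based matcher.
import networkx as nx
from networkx.algorithms import isomorphism

# Data graph: a 5-node ring with one chord; pattern: a triangle.
G = nx.cycle_graph(5)
G.add_edge(0, 2)
pattern = nx.complete_graph(3)

matcher = isomorphism.GraphMatcher(G, pattern)
print(matcher.subgraph_is_isomorphic())                  # True
print(list(matcher.subgraph_isomorphisms_iter())[:3])    # sample mappings
```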
no code implementations • 12 Jul 2017 • Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther
Thus, the performance of these applications on KNL systems is of high interest to LLSC users and the broader data analysis and machine learning communities.
Performance • Instrumentation and Methods for Astrophysics • Distributed, Parallel, and Cluster Computing • Computational Physics