Search Results for author: Siddharth Samsi

Found 33 papers, 5 papers with code

Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale

no code implementations • 25 Feb 2024 • Dan Zhao, Siddharth Samsi, Joseph McDonald, Baolin Li, David Bestor, Michael Jones, Devesh Tiwari, Vijay Gadepally

In this paper, we study the aggregate effect of power-capping GPUs on GPU temperature and power draw at a research supercomputing center.
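
The paper's measurements are not reproduced here, but the aggregate effect of a power cap can be illustrated with a minimal, hypothetical model (all numbers and function names below are invented for illustration): each GPU's draw is simply clipped at the cap.

```python
# Hypothetical illustration (not from the paper): aggregate effect of
# clipping each GPU's power draw at a cap, in watts.

def aggregate_power(draws_w, cap_w=None):
    """Sum per-GPU power draws, optionally clipping each at cap_w."""
    if cap_w is None:
        return sum(draws_w)
    return sum(min(d, cap_w) for d in draws_w)

draws = [300, 280, 310, 250]           # example per-GPU draws (W)
uncapped = aggregate_power(draws)      # 1140 W
capped = aggregate_power(draws, 260)   # 250 + 3 * 260 = 1030 W
savings = uncapped - capped            # 110 W across four GPUs
```

At datacenter scale, the paper studies how such caps interact with GPU temperature as well as raw power draw; this sketch only models the clipping arithmetic.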

A Benchmark Dataset for Tornado Detection and Prediction using Full-Resolution Polarimetric Weather Radar Data

no code implementations • 26 Jan 2024 • Mark S. Veillette, James M. Kurdzo, Phillip M. Stepanian, John Y. N. Cho, Siddharth Samsi, Joseph McDonald

A number of ML baselines for tornado detection are developed and compared, including a novel deep learning (DL) architecture capable of processing raw radar imagery without the need for manual feature extraction required for existing ML algorithms.

Feature Engineering

Lincoln AI Computing Survey (LAICS) Update

1 code implementation • 13 Oct 2023 • Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

Finally, a brief description of each of the new accelerators that have been added in the survey this year is included.

A Green(er) World for A.I.

no code implementations • 27 Jan 2023 • Dan Zhao, Nathan C. Frey, Joseph McDonald, Matthew Hubbell, David Bestor, Michael Jones, Andrew Prout, Vijay Gadepally, Siddharth Samsi

As A.I. continues to expand across applications, we are sure to face an ever-mounting energy footprint to sustain these computational budgets, data storage needs, and more.

KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources

no code implementations • 12 Oct 2022 • Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari

Online inference is becoming a key service product for many businesses, deployed on cloud platforms to meet customer demand.

An Evaluation of Low Overhead Time Series Preprocessing Techniques for Downstream Machine Learning

no code implementations • 12 Sep 2022 • Matthew L. Weiss, Joseph McDonald, David Bestor, Charles Yee, Daniel Edelman, Michael Jones, Andrew Prout, Andrew Bowne, Lindsey McEvoy, Vijay Gadepally, Siddharth Samsi

Our best performing models achieve a classification accuracy greater than 95%, outperforming previous approaches to multi-channel time series classification with the MIT SuperCloud Dataset by 5%.

Classification, Time Series, +2

Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models

no code implementations • Findings (NAACL) 2022 • Joseph McDonald, Baolin Li, Nathan Frey, Devesh Tiwari, Vijay Gadepally, Siddharth Samsi

In particular, we focus on techniques to measure energy usage and different hardware and datacenter-oriented settings that can be tuned to reduce energy consumption for training and inference for language models.

Cloud Computing, Language Modelling

Benchmarking Resource Usage for Efficient Distributed Deep Learning

no code implementations • 28 Jan 2022 • Nathan C. Frey, Baolin Li, Joseph McDonald, Dan Zhao, Michael Jones, David Bestor, Devesh Tiwari, Vijay Gadepally, Siddharth Samsi

Deep learning (DL) workflows demand an ever-increasing budget of compute and energy in order to achieve outsized gains.

Benchmarking

Scalable Geometric Deep Learning on Molecular Graphs

1 code implementation • NeurIPS Workshop AI4Science 2021 • Nathan C. Frey, Siddharth Samsi, Joseph McDonald, Lin Li, Connor W. Coley, Vijay Gadepally

Deep learning in molecular and materials sciences is limited by the lack of integration between applied science, artificial intelligence, and high-performance computing.

The Pseudo Projection Operator: Applications of Deep Learning to Projection Based Filtering in Non-Trivial Frequency Regimes

no code implementations • 13 Nov 2021 • Matthew L. Weiss, Nathan C. Frey, Siddharth Samsi, Randy C. Paffenroth, Vijay Gadepally

Traditional frequency based projection filters, or projection operators (PO), separate signal and noise through a series of transformations which remove frequencies where noise is present.

Denoising

AI Accelerator Survey and Trends

1 code implementation • 18 Sep 2021 • Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

Over the past several years, new machine learning accelerators have been announced and released every month for a variety of applications, ranging from speech recognition and video object detection to assisted driving and data center workloads.

Benchmarking, Computational Efficiency, +4

SEVIR: A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology

2 code implementations • NeurIPS 2020 • Mark Veillette, Siddharth Samsi, Chris Mattioli

To help address this problem, we introduce the Storm EVent ImagRy (SEVIR) dataset - a single, rich dataset that combines spatially and temporally aligned data from multiple sensors, along with baseline implementations of deep learning models and evaluation metrics, to accelerate new algorithmic innovations.

Descriptive Weather Forecasting

Survey of Machine Learning Accelerators

no code implementations • 1 Sep 2020 • Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

New machine learning accelerators are being announced and released each month for a variety of applications, ranging from speech recognition and video object detection to assisted driving and data center workloads.

BIG-bench Machine Learning, object-detection, +3

Compute, Time and Energy Characterization of Encoder-Decoder Networks with Automatic Mixed Precision Training

no code implementations • 18 Aug 2020 • Siddharth Samsi, Michael Jones, Mark M. Veillette

In this paper, we examine the compute, energy, and time costs of training a UNet-based deep neural network for the problem of predicting short-term weather forecasts (precipitation nowcasting).

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

no code implementations • 14 Jul 2020 • Andrew C. Kirby, Siddharth Samsi, Michael Jones, Albert Reuther, Jeremy Kepner, Vijay Gadepally

A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs.

GraphChallenge.org Triangle Counting Performance

no code implementations • 18 Mar 2020 • Siddharth Samsi, Jeremy Kepner, Vijay Gadepally, Michael Hurley, Michael Jones, Edward Kao, Sanjeev Mohindra, Albert Reuther, Steven Smith, William Song, Diane Staheli, Paul Monticciolo

In 2017, 2018, and 2019 many triangle counting submissions were received from a wide range of authors and organizations.

Distributed, Parallel, and Cluster Computing; Performance
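
The GraphChallenge submissions above benchmark triangle counting at massive scale; as a small, hedged sketch of the underlying kernel (a naive edge-iterator counter, not any submission's optimized implementation), the idea can be shown in a few lines:

```python
# Illustrative only: naive triangle counting over an undirected edge list,
# the kernel that GraphChallenge submissions optimize at much larger scale.

def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    count = 0
    for u, v in edges:
        # Common neighbors of u and v each close a triangle with edge (u, v).
        # Every triangle is counted once per edge, i.e. three times in total.
        count += len(adj[u] & adj[v])
    return count // 3

# A 4-cycle with one chord contains exactly two triangles.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(count_triangles(edges))  # 2
```

High-performance submissions replace the hash-set intersection with sorted-adjacency merges, linear-algebraic formulations, or GPU kernels, but compute the same quantity.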

Survey and Benchmarking of Machine Learning Accelerators

no code implementations • 29 Aug 2019 • Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, Jeremy Kepner

Advances in multicore processors and accelerators have opened the floodgates to broader exploration and application of machine learning techniques across many domains.

Performance; B.8; C.4

TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir

no code implementations • 29 Aug 2019 • Tao B. Schardl, Siddharth Samsi

This work introduces TapirXLA, a replacement for TensorFlow's XLA compiler that embeds recursive fork-join parallelism into XLA's low-level representation of code.

BIG-bench Machine Learning

Distributed Deep Learning for Precipitation Nowcasting

no code implementations • 28 Aug 2019 • Siddharth Samsi, Christopher J. Mattioli, Mark S. Veillette

Effective training of Deep Neural Networks requires massive amounts of data and compute.

Securing HPC using Federated Authentication

no code implementations • 20 Aug 2019 • Andrew Prout, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther, Jeremy Kepner

Federated authentication can drastically reduce the overhead of basic account maintenance while simultaneously improving overall system security.

Distributed, Parallel, and Cluster Computing; Cryptography and Security

Large Scale Organization and Inference of an Imagery Dataset for Public Safety

no code implementations • 16 Aug 2019 • Jeffrey Liu, David Strohschein, Siddharth Samsi, Andrew Weinert

Video applications and analytics are routinely projected as a stressing and significant service of the Nationwide Public Safety Broadband Network.

Streaming 1.9 Billion Hypersparse Network Updates per Second with D4M

no code implementations • 6 Jul 2019 • Jeremy Kepner, Vijay Gadepally, Lauren Milechin, Siddharth Samsi, William Arcand, David Bestor, William Bergeron, Chansup Byun, Matthew Hubbell, Michael Houle, Michael Jones, Anne Klein, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

This work describes the design and performance optimization of an implementation of hierarchical associative arrays that reduces memory pressure and dramatically increases the update rate into an associative array.
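
D4M's actual design is not reproduced here, but the general idea of a hierarchical associative array (a small, write-optimized top layer that absorbs updates and is periodically flushed into a larger bottom layer) can be sketched minimally; the class and capacity below are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch (not the D4M implementation): a two-level
# hierarchical associative array. Updates accumulate in a small, fast
# top layer; when it fills, it is merged (flushed) into the larger
# bottom layer, reducing pressure on the big structure.

class HierarchicalArray:
    def __init__(self, top_capacity=4):
        self.top = {}                 # small, write-optimized layer
        self.bottom = {}              # large, merged layer
        self.top_capacity = top_capacity

    def add(self, key, value):
        """Accumulate value at key, flushing the top layer when full."""
        self.top[key] = self.top.get(key, 0) + value
        if len(self.top) >= self.top_capacity:
            self.flush()

    def flush(self):
        """Merge the top layer into the bottom layer and clear it."""
        for k, v in self.top.items():
            self.bottom[k] = self.bottom.get(k, 0) + v
        self.top.clear()

    def get(self, key):
        """A key's total value may be split across both layers."""
        return self.top.get(key, 0) + self.bottom.get(key, 0)

h = HierarchicalArray(top_capacity=2)
h.add("a", 1); h.add("a", 2); h.add("b", 5)
print(h.get("a"))  # 3
```

The paper's contribution is making this kind of layering perform at streaming rates of billions of hypersparse updates per second; this toy version only shows the flush-on-full mechanism.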

AI Enabling Technologies: A Survey

no code implementations • 8 May 2019 • Vijay Gadepally, Justin Goodwin, Jeremy Kepner, Albert Reuther, Hayley Reynolds, Siddharth Samsi, Jonathan Su, David Martinez

Artificial Intelligence (AI) has the opportunity to revolutionize the way the United States Department of Defense (DoD) and Intelligence Community (IC) address the challenges of evolving threats, data deluge, and rapid courses of action.

A Billion Updates per Second Using 30,000 Hierarchical In-Memory D4M Databases

no code implementations • 3 Feb 2019 • Jeremy Kepner, Vijay Gadepally, Lauren Milechin, Siddharth Samsi, William Arcand, David Bestor, William Bergeron, Chansup Byun, Matthew Hubbell, Michael Houle, Michael Jones, Anne Klein, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

Streaming updates to a large associative array requires a hierarchical implementation to optimize the performance of the memory hierarchy.

Databases; Distributed, Parallel, and Cluster Computing; Data Structures and Algorithms; Networking and Internet Architecture

Static Graph Challenge: Subgraph Isomorphism

no code implementations • 23 Aug 2017 • Siddharth Samsi, Vijay Gadepally, Michael Hurley, Michael Jones, Edward Kao, Sanjeev Mohindra, Paul Monticciolo, Albert Reuther, Steven Smith, William Song, Diane Staheli, Jeremy Kepner

The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems.

Distributed, Parallel, and Cluster Computing; Data Structures and Algorithms

Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

no code implementations • 12 Jul 2017 • Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther

Thus, the performance of these applications on KNL systems is of high interest to LLSC users and the broader data analysis and machine learning communities.

Performance; Instrumentation and Methods for Astrophysics; Distributed, Parallel, and Cluster Computing; Computational Physics
