Search Results for author: Junqi Yin

Found 18 papers, 4 papers with code

Pretraining Billion-scale Geospatial Foundational Models on Frontier

no code implementations • 17 Apr 2024 • Aristeidis Tsaris, Philipe Ambrozio Dias, Abhishek Potnis, Junqi Yin, Feiyi Wang, Dalton Lunga

Although large FMs have demonstrated significant impact in natural language processing and computer vision, efforts toward FMs for geospatial applications have been restricted to smaller models, as pretraining larger models requires very large computing resources equipped with state-of-the-art hardware accelerators.

The Case for Co-Designing Model Architectures with Hardware

1 code implementation • 25 Jan 2024 • Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda

While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models.

Optimizing Distributed Training on Frontier for Large Language Models

no code implementations • 20 Dec 2023 • Sajal Dash, Isaac Lyngaas, Junqi Yin, Xiao Wang, Romain Egele, Guojing Cong, Feiyi Wang, Prasanna Balaprakash

For training the 175-billion-parameter and 1-trillion-parameter models, we achieved 100% weak scaling efficiency on 1024 and 3072 MI250X GPUs, respectively.

Computational Efficiency
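Weak scaling efficiency here follows the usual definition: the per-GPU workload is held fixed as GPUs are added, and efficiency is the ratio of baseline step time to step time at scale. A minimal Python sketch (the timings below are illustrative, not measurements from the paper):

```python
# Weak scaling: per-GPU workload stays fixed while the GPU count grows,
# so ideal behavior is an unchanged time per training step.
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Ratio of baseline step time to step time at the larger scale."""
    return t_base / t_scaled

# Illustrative numbers only: identical step times at 1024 and 3072 GPUs
# would correspond to the 100% efficiency reported in the abstract.
print(f"{weak_scaling_efficiency(1.8, 1.8):.0%}")  # 100%
```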

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

no code implementations • 6 Oct 2023 • Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi Hanson, Thomas E Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin Aji, Angela Dalton, Michael Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens

In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences.

Atomic structure generation from reconstructing structural fingerprints

1 code implementation • 27 Jul 2022 • Victor Fung, Shuyi Jia, Jiaxin Zhang, Sirui Bi, Junqi Yin, P. Ganesh

These methods would help identify or, in the case of generative models, even create novel crystal structures of materials with a set of specified functional properties, which could then be synthesized or isolated in the laboratory.

BIG-bench Machine Learning

Stable Parallel Training of Wasserstein Conditional Generative Adversarial Neural Networks

no code implementations • 25 Jul 2022 • Massimiliano Lupo Pasini, Junqi Yin

We propose a stable, parallel approach to train Wasserstein Conditional Generative Adversarial Neural Networks (W-CGANs) under the constraint of a fixed computational budget.
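As background on the model class, a W-CGAN critic scores (sample, condition) pairs and is trained with the Wasserstein objective. A minimal PyTorch sketch of that objective follows; the toy critic and dimensions are placeholders, not the paper's architecture, and a practical setup would also add weight clipping or a gradient penalty:

```python
import torch
import torch.nn as nn

# Toy conditional critic: scores a (sample, condition) pair.
class ToyCritic(nn.Module):
    def __init__(self, x_dim=8, y_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))

def critic_loss(critic, real_x, fake_x, y):
    # Wasserstein objective: raise scores on real pairs, lower them on
    # generated pairs (the generator later maximizes the second term).
    return -(critic(real_x, y).mean() - critic(fake_x, y).mean())

critic = ToyCritic()
real, fake = torch.randn(16, 8), torch.randn(16, 8)
cond = torch.randn(16, 3)  # stand-in for class conditions
print(critic_loss(critic, real, fake, cond).item())
```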

Stable Anderson Acceleration for Deep Learning

1 code implementation • 26 Oct 2021 • Massimiliano Lupo Pasini, Junqi Yin, Viktor Reshniak, Miroslav Stoyanov

Anderson acceleration (AA) is an extrapolation technique designed to speed up fixed-point iterations like those arising from the iterative training of DL models.

Image Classification
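For context, here is a minimal NumPy sketch of plain Anderson acceleration on a fixed-point problem x = g(x); this is the textbook AA(m) scheme, not the stabilized variant the paper proposes:

```python
import numpy as np

def anderson(g, x0, m=5, iters=50, tol=1e-10):
    X, G = [], []  # histories of iterates x_j and their images g(x_j)
    x = x0
    for _ in range(iters):
        gx = g(x)
        X.append(x)
        G.append(gx)
        X, G = X[-(m + 1):], G[-(m + 1):]  # keep a window of m+1 terms
        k = len(X)
        if k == 1:
            x = gx  # plain fixed-point step to seed the history
        else:
            # Residuals f_j = g(x_j) - x_j; choose weights alpha summing
            # to 1 that minimize ||sum_j alpha_j f_j|| (least squares).
            F = np.stack([gj - xj for gj, xj in zip(G, X)], axis=1)
            M = F.T @ F + 1e-12 * np.eye(k)  # guard against singularity
            w = np.linalg.solve(M, np.ones(k))
            alpha = w / w.sum()
            x = np.stack(G, axis=1) @ alpha  # mix the g(x_j), not the x_j
        if np.linalg.norm(g(x) - x) < tol:
            return x
    return x

# Example: accelerate g(x) = cos(x) toward its fixed point (~0.739085).
print(anderson(np.cos, np.array([1.0])))
```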

Neural network based order parameter for phase transitions and its applications in high-entropy alloys

no code implementations • 12 Sep 2021 • Junqi Yin, Zongrui Pei, Michael Gao

We propose that the Manhattan distance in the VAE latent space can serve as a generic order parameter for order-disorder phase transitions.
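To make the idea concrete, here is a small sketch of such an order parameter: the L1 (Manhattan) distance between the latent codes of a configuration and of an ordered reference. The linear `encode` stand-in below is hypothetical; in the paper the codes would come from a trained VAE encoder:

```python
import numpy as np

def order_parameter(encode, config, reference):
    # Manhattan (L1) distance between latent representations.
    return np.abs(encode(config) - encode(reference)).sum()

# Hypothetical stand-in encoder; a real study would use the latent means
# of a VAE trained on lattice configurations.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))
encode = lambda c: c @ W

ordered = np.zeros(16)            # fully ordered reference configuration
disordered = rng.normal(size=16)  # a disordered configuration
print(order_parameter(encode, disordered, ordered))
```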

Scalable Balanced Training of Conditional Generative Adversarial Neural Networks on Image Data

no code implementations • 21 Feb 2021 • Massimiliano Lupo Pasini, Vittorio Gabbi, Junqi Yin, Simona Perotto, Nouamane Laanait

We propose a distributed approach to train deep convolutional conditional generative adversarial network (DC-CGAN) models.

Data optimization for large batch distributed training of deep neural networks

no code implementations • 16 Dec 2020 • Shubhankar Gahlot, Junqi Yin, Mallikarjun Shankar

Distributed training in deep learning (DL) is common practice as data and models grow.

Distributed Training and Optimization Of Neural Networks

no code implementations • 3 Dec 2020 • Jean-Roch Vlimant, Junqi Yin

Deep learning models are achieving increasingly better performance thanks to multiple factors.

Integrating Deep Learning in Domain Sciences at Exascale

no code implementations • 23 Nov 2020 • Rick Archibald, Edmond Chow, Eduardo D'Azevedo, Jack Dongarra, Markus Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols, Stanimire Tomov, Kwai Wong, Junqi Yin

This paper discusses the necessities of an HPC deep learning framework and how those needs can be provided (e.g., as in MagmaDNN) through a deep integration with existing HPC libraries, such as MAGMA and its modular memory management, MPI, cuBLAS, cuDNN, MKL, and HIP.

Management

Exascale Deep Learning for Scientific Inverse Problems

no code implementations • 24 Sep 2019 • Nouamane Laanait, Joshua Romero, Junqi Yin, M. Todd Young, Sean Treichler, Vitalii Starchenko, Albina Borisevich, Alex Sergeev, Michael Matheson

We introduce novel communication strategies in synchronous distributed Deep Learning consisting of decentralized gradient reduction orchestration and computational graph-aware grouping of gradient tensors.

Materials Imaging
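One ingredient named in the abstract above, grouping gradient tensors before collective communication, can be illustrated with a size-based bucketing sketch; the paper's grouping is computational-graph-aware, and the threshold and NumPy stand-ins here are illustrative only:

```python
import numpy as np

def bucket_gradients(grads, bucket_bytes=4 * 1024 * 1024):
    """Fuse many small gradient tensors into fewer flat buffers so each
    collective (e.g. an allreduce) moves more bytes per call."""
    buckets, current, size = [], [], 0
    for g in grads:
        current.append(g)
        size += g.nbytes
        if size >= bucket_bytes:
            buckets.append(np.concatenate([t.ravel() for t in current]))
            current, size = [], 0
    if current:
        buckets.append(np.concatenate([t.ravel() for t in current]))
    return buckets  # each flat buffer would be reduced in a single call

grads = [np.ones(n, dtype=np.float32) for n in (3_000_000, 200_000, 1_000)]
print([b.size for b in bucket_gradients(grads)])  # [3000000, 201000]
```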

Robust data-driven approach for predicting the configurational energy of high entropy alloys

no code implementations • 10 Aug 2019 • Jiaxin Zhang, Xianglin Liu, Sirui Bi, Junqi Yin, Guannan Zhang, Markus Eisenbach

In this study, a robust data-driven framework based on Bayesian approaches is proposed and demonstrated for the accurate and efficient prediction of the configurational energy of high-entropy alloys.

Feature Selection • Small Data Image Classification
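As a generic illustration of a Bayesian surrogate that returns uncertainty alongside predictions (not the paper's framework; the descriptors and energies below are synthetic stand-ins):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic stand-ins: descriptors of alloy configurations and their
# configurational energies drawn from a noisy linear model.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.05 * rng.normal(size=200)

model = BayesianRidge().fit(X, y)
mean, std = model.predict(X[:3], return_std=True)  # predictive uncertainty
print(mean, std)
```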

Defining Big Data Analytics Benchmarks for Next Generation Supercomputers

2 code implementations • 6 Nov 2018 • Drew Schmidt, Junqi Yin, Michael Matheson, Bronson Messer, Mallikarjun Shankar

The design and construction of high performance computing (HPC) systems relies on exhaustive performance analysis and benchmarking.

Performance
