1 code implementation • 28 Jan 2022 • Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zheng, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro
Next, we detail the training process, the design of our training corpus, and our data curation techniques, which we believe are a key ingredient in the success of the model.
Ranked #2 on Language Modelling on LAMBADA
no code implementations • 29 Oct 2019 • Liu Yang, Sean Treichler, Thorsten Kurth, Keno Fischer, David Barajas-Solano, Josh Romero, Valentin Churavy, Alexandre Tartakovsky, Michael Houston, Prabhat, George Karniadakis
Uncertainty quantification for forward and inverse problems is a central challenge across physical and biomedical disciplines.
3 code implementations • 3 Oct 2018 • Thorsten Kurth, Sean Treichler, Joshua Romero, Mayur Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, Michael Houston
The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and parallel efficiency of 79.0%.
Distributed, Parallel, and Cluster Computing
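The scaling claim above reduces to a simple ratio between achieved and ideal aggregate throughput. The sketch below is a minimal illustration of that arithmetic, not code from the paper; the single-GPU sustained baseline of 5 TF/s is a hypothetical value chosen only so that the result is consistent with the reported 21.0 PF/s and 79.0% figures.

```python
# Illustrative sketch: how a parallel-efficiency figure like the reported 79.0%
# can be computed from aggregate and per-GPU throughput.
# The per-GPU baseline below is an assumption, not a number from the paper.

NUM_GPUS = 5300
SUSTAINED_THROUGHPUT_PFLOPS = 21.0   # reported aggregate sustained throughput
SINGLE_GPU_TFLOPS = 5.0              # assumed single-GPU sustained baseline (hypothetical)

# Ideal aggregate throughput if scaling were perfectly linear (TF/s -> PF/s).
ideal_pflops = NUM_GPUS * SINGLE_GPU_TFLOPS / 1000.0

# Parallel efficiency: achieved throughput as a fraction of the ideal.
efficiency = SUSTAINED_THROUGHPUT_PFLOPS / ideal_pflops
print(f"parallel efficiency ~ {efficiency:.1%}")  # ~79.2% with these assumed numbers
```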
Using this approach, we can reduce the memory consumption of deep learning models by nearly 2x.
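The "nearly 2x" figure follows directly from storage sizes if the approach amounts to keeping tensors in 16-bit rather than 32-bit floating point, which is an assumption of this sketch rather than a statement of the paper's method; the snippet only illustrates the arithmetic.

```python
import numpy as np

# Minimal sketch of where a ~2x memory reduction comes from, assuming the
# approach stores model tensors in float16 instead of float32
# (an assumption of this illustration, not the paper's stated method).

num_params = 100_000_000  # hypothetical model size: 100M parameters

fp32_bytes = num_params * np.dtype(np.float32).itemsize  # 4 bytes per value
fp16_bytes = num_params * np.dtype(np.float16).itemsize  # 2 bytes per value

print(f"FP32: {fp32_bytes / 1e9:.1f} GB, FP16: {fp16_bytes / 1e9:.1f} GB")
print(f"reduction: {fp32_bytes / fp16_bytes:.1f}x")  # 2.0x for tensor storage alone
```

In practice the reduction is "nearly" rather than exactly 2x because some quantities, such as full-precision master copies of weights or optimizer state, may still be kept in 32-bit storage.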