no code implementations • 5 Sep 2020 • Drew Schmidt, Bronson Messer, M. Todd Young, Michael Matheson
The application of AI and ML methods to scientific problems is a rapidly evolving and active field.
no code implementations • 24 Sep 2019 • Nouamane Laanait, Joshua Romero, Junqi Yin, M. Todd Young, Sean Treichler, Vitalii Starchenko, Albina Borisevich, Alex Sergeev, Michael Matheson
We introduce novel communication strategies in synchronous distributed Deep Learning consisting of decentralized gradient reduction orchestration and computational graph-aware grouping of gradient tensors.
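Grouping gradient tensors before reduction means each collective call operates on one large fused buffer rather than many small messages, which amortizes per-message latency. Below is a minimal sketch of size-based bucketing and fused reduction; this is a simplification for illustration (the paper's grouping is graph-aware, not purely size-based), and the `reduce_fn` callback stands in for a real MPI/NCCL allreduce.

```python
import numpy as np

def group_gradients(grads, bucket_bytes=64 * 1024):
    """Greedily pack gradient tensors into fusion buckets so that each
    collective reduction covers one large flat buffer instead of many
    small messages (size-based stand-in for graph-aware grouping)."""
    buckets, current, current_bytes = [], [], 0
    for g in grads:
        current.append(g)
        current_bytes += g.nbytes
        if current_bytes >= bucket_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
    if current:
        buckets.append(current)
    return buckets

def fused_allreduce(bucket, reduce_fn):
    """Flatten a bucket, apply one reduction over the fused buffer,
    then unflatten back to the original tensor shapes."""
    flat = np.concatenate([g.ravel() for g in bucket])
    reduced = reduce_fn(flat)  # one collective call per bucket
    out, offset = [], 0
    for g in bucket:
        out.append(reduced[offset:offset + g.size].reshape(g.shape))
        offset += g.size
    return out
```

For example, with `reduce_fn=lambda x: x / world_size` applied to summed gradients, each bucket is averaged across workers in a single call instead of one call per tensor.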
2 code implementations • 6 Nov 2018 • Drew Schmidt, Junqi Yin, Michael Matheson, Bronson Messer, Mallikarjun Shankar
The design and construction of high performance computing (HPC) systems relies on exhaustive performance analysis and benchmarking.
Performance
3 code implementations • 3 Oct 2018 • Thorsten Kurth, Sean Treichler, Joshua Romero, Mayur Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, Michael Houston
The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and a parallel efficiency of 79.0%.
Distributed, Parallel, and Cluster Computing