no code implementations • 15 Mar 2023 • Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda
However, such distributed DL parallelism strategies require a varied mixture of collective and point-to-point communication operations across a broad range of message sizes and scales.
no code implementations • 23 Feb 2022 • Pirah Noor Soomro, Mustafa Abduljabbar, Jeronimo Castrillon, Miquel Pericàs
Shisha targets heterogeneity in compute performance and memory bandwidth and tunes the pipeline schedule through a fast online exploration technique.
1 code implementation • 27 Mar 2018 • Mustafa Abduljabbar, Mohammed Al Farhan, Noha Al-Harthi, Rui Chen, Rio Yokota, Hakan Bagci, David Keyes
With distributed memory optimizations, on the other hand, we report near-optimal efficiency in the weak scalability study with respect to both the logarithmic communication complexity as well as the theoretical scaling complexity of FMM.
Performance Computational Engineering, Finance, and Science Mathematical Software
1 code implementation • 29 May 2014 • Mustafa AbdulJabbar, Rio Yokota, David Keyes
Fast multipole methods (FMM) on distributed mem- ory have traditionally used a bulk-synchronous model of com- municating the local essential tree (LET) and overlapping it with computation of the local data.
Distributed, Parallel, and Cluster Computing 70F10 D.1.2; D.1.3; G.1.0; G.1.2
no code implementations • 19 Aug 2012 • Jing-Yan Wang, Mustafa Abduljabbar
In this paper, we propose a novel idea which engages a Multiple Kernel Learning approach into refining the graph structure that reflects the factorization of the matrix and the new data space.