no code implementations • 27 May 2022 • Ayesha Afzal, Georg Hager, Gerhard Wellein, Stefano Markidis
This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs.
no code implementations • 4 Mar 2021 • Christie Alappat, Nils Meyer, Jan Laukemann, Thomas Gruber, Georg Hager, Gerhard Wellein, Tilo Wettig
We present an architectural analysis of the A64FX used in the Fujitsu FX1000 supercomputer at a level of detail that allows for the construction of Execution-Cache-Memory (ECM) performance models for steady-state loops.
Performance Distributed, Parallel, and Cluster Computing High Energy Physics - Lattice
no code implementations • 4 Mar 2021 • Ayesha Afzal, Georg Hager, Gerhard Wellein
We present a validated analytic model for the propagation velocity of idle waves in MPI-parallel programs with respect to communication parameters and topology, with a special emphasis on sparse communication patterns.
Distributed, Parallel, and Cluster Computing Performance
no code implementations • 30 Aug 2020 • Dimosthenis Pasadakis, Christie Louis Alappat, Olaf Schenk, Gerhard Wellein
We demonstrate the effectiveness and accuracy of our algorithm on various artificial test cases.
1 code implementation • 15 Jul 2019 • Christie L. Alappat, Georg Hager, Olaf Schenk, Jonas Thies, Achim Basermann, Alan R. Bishop, Holger Fehske, Gerhard Wellein
The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications.
Distributed, Parallel, and Cluster Computing Performance
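SymmSpMV exploits symmetry by storing only the upper triangle (including the diagonal) of the matrix. Each stored off-diagonal entry a_ij is then used twice: once for y[i] and once, as its mirrored partner a_ji, for y[j]. The minimal serial sketch below assumes upper-triangular CRS storage and uses hypothetical helper names. It does not reproduce the parallelization scheme of the listed paper. The scattered update into y[j] is what causes write conflicts between threads in a parallel SymmSpMV.

```python
import numpy as np

def dense_to_upper_crs(A):
    """Hypothetical helper: store the upper triangle (incl. diagonal) of A in CRS."""
    n = A.shape[0]
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j in range(i, n):
            if A[i, j] != 0:
                col_idx.append(j)
                vals.append(A[i, j])
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals

def symm_spmv_upper(n, row_ptr, col_idx, vals, x):
    """y = A @ x for symmetric A, given only its upper triangle in CRS."""
    y = np.zeros(n)
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, a = col_idx[k], vals[k]
            y[i] += a * x[j]          # upper-triangle contribution (row i)
            if j != i:
                y[j] += a * x[i]      # mirrored lower-triangle entry a_ji = a_ij
    return y

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 2.0],
              [0.0, 2.0, 5.0]])
x = np.array([1.0, 2.0, 3.0])
rp, ci, v = dense_to_upper_crs(A)
print(symm_spmv_upper(3, rp, ci, v, x))  # [ 6. 13. 19.]
```

Compared with full CRS, this roughly halves the matrix data that has to be loaded, which matters because SpMV is usually memory-bound.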
1 code implementation • 13 Jan 2017 • Julian Hammer, Jan Eitzinger, Georg Hager, Gerhard Wellein
We present Kerncraft, a tool that automatically constructs Roofline and ECM models for loop nests by performing the required code analysis, data transfer analysis, and layer condition (LC) analysis.
Performance
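To show what a Roofline model predicts for a single loop, here is a hand-worked sketch for a STREAM-triad-like loop `a[i] = b[i] + s * c[i]` on double-precision arrays. Kerncraft derives the data volume automatically through its cache analysis. Here it is hard-coded under the assumption of write-allocate caches without nontemporal stores, giving 32 bytes per iteration (three 8-byte streams plus an 8-byte write-allocate). The peak and bandwidth figures are hypothetical placeholders, not measurements.

```python
def roofline_performance(flops_per_iter, bytes_per_iter, peak_gflops, bandwidth_gbs):
    """Roofline estimate: P = min(P_peak, I * b_S), with I in flop/byte."""
    intensity = flops_per_iter / bytes_per_iter
    return min(peak_gflops, intensity * bandwidth_gbs)

# Triad: 1 add + 1 multiply per iteration; 24 B explicit traffic + 8 B write-allocate.
FLOPS_PER_ITER = 2
BYTES_PER_ITER = 3 * 8 + 8

# Hypothetical machine parameters (illustration only).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

p = roofline_performance(FLOPS_PER_ITER, BYTES_PER_ITER, PEAK_GFLOPS, BANDWIDTH_GBS)
print(f"predicted triad performance: {p} GFlop/s")  # 6.25 GFlop/s, memory-bound
```

The ECM model refines this single-bottleneck view by adding up (or overlapping) in-core execution time and the transfer times between each pair of adjacent memory hierarchy levels.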
1 code implementation • 29 Jul 2015 • Moritz Kreutzer, Jonas Thies, Melven Röhrig-Zöllner, Andreas Pieper, Faisal Shahzad, Martin Galgon, Achim Basermann, Holger Fehske, Georg Hager, Gerhard Wellein
Today, heterogeneous compute resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi.
Distributed, Parallel, and Cluster Computing Mathematical Software
no code implementations • 17 Dec 2013 • Johannes Hofmann, Jan Treibig, Georg Hager, Gerhard Wellein
We examine the Xeon Phi, which is based on Intel's Many Integrated Cores architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography.
1 code implementation • 23 Jul 2013 • Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, Alan R. Bishop
We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage (CRS) and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi and Nvidia Tesla K20) for a wide range of test matrices from different application areas.
Mathematical Software Distributed, Parallel, and Cluster Computing
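The SELL-C-σ idea can be sketched as follows. Within windows of σ rows, rows are sorted by descending length. Consecutive groups of C rows form chunks, and each chunk is padded to its longest row and stored column-major, so that C rows can be processed in lockstep (one SIMD or GPU lane per row). The minimal Python sketch below uses hypothetical function names and a dense input matrix for simplicity. For checking purposes it scatters the result back into the original row order. The "fill ratio" it reports (nonzeros divided by stored entries) is a simple measure of the padding overhead.

```python
import numpy as np

def build_sell_c_sigma(A, C, sigma):
    """Convert dense A to a SELL-C-sigma-like layout (hypothetical helper)."""
    n = A.shape[0]
    row_len = np.count_nonzero(A, axis=1)
    perm = np.arange(n)
    # Sort rows by descending length inside each window of sigma rows.
    for start in range(0, n, sigma):
        window = perm[start:start + sigma]
        perm[start:start + sigma] = window[np.argsort(-row_len[window], kind="stable")]
    n_chunks = (n + C - 1) // C
    chunk_ptr, chunk_len, vals, cols = [0], [], [], []
    for c in range(n_chunks):
        rows = perm[c * C:(c + 1) * C]
        width = max(row_len[r] for r in rows)   # pad chunk to its longest row
        chunk_len.append(width)
        entries = [np.nonzero(A[r])[0] for r in rows]
        # Column-major within the chunk: j-th entry of all C rows is contiguous.
        for j in range(width):
            for lane in range(C):
                if lane < len(rows) and j < len(entries[lane]):
                    col = entries[lane][j]
                    vals.append(A[rows[lane], col])
                    cols.append(col)
                else:
                    vals.append(0.0)            # padding entry
                    cols.append(0)
        chunk_ptr.append(len(vals))
    return perm, chunk_ptr, chunk_len, np.array(vals), np.array(cols, dtype=int)

def sell_spmv(n, C, perm, chunk_ptr, chunk_len, vals, cols, x):
    """y = A @ x; the inner update processes C rows at once (SIMD-like)."""
    y = np.zeros(n)
    for c, width in enumerate(chunk_len):
        tmp = np.zeros(C)
        for j in range(width):
            off = chunk_ptr[c] + j * C
            tmp += vals[off:off + C] * x[cols[off:off + C]]
        for lane in range(C):
            if c * C + lane < n:
                y[perm[c * C + lane]] = tmp[lane]   # undo row permutation
    return y

A = np.array([[1, 0, 2, 0, 0, 0],
              [0, 3, 0, 0, 0, 0],
              [4, 5, 6, 7, 0, 0],
              [0, 0, 0, 8, 0, 0],
              [0, 9, 0, 0, 10, 11],
              [0, 0, 0, 0, 0, 12]], dtype=float)
x = np.arange(1, 7, dtype=float)
perm, cp, cl, vals, cols = build_sell_c_sigma(A, C=2, sigma=4)
y = sell_spmv(6, 2, perm, cp, cl, vals, cols, x)
print(y, "fill ratio:", np.count_nonzero(A) / len(vals))
```

With C=1 and σ=1 the layout degenerates to a CRS-like format. Choosing σ equal to the number of rows gives ELLPACK-style chunks with minimal padding, at the cost of a global row permutation.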
1 code implementation • 25 Apr 2005 • Alexander Weisse, Gerhard Wellein, Andreas Alvermann, Holger Fehske
Efficient and stable algorithms for the calculation of spectral quantities and correlation functions are some of the key tools in computational condensed matter physics.
Other Condensed Matter Computational Physics
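As an illustration of this class of algorithms, the sketch below uses a Chebyshev expansion with a Jackson kernel (the kernel polynomial method) to compute the density of states of a small tight-binding chain. Several simplifications are assumed. The Hamiltonian is first rescaled to H̃ = (H − b)/a so that its spectrum lies inside (−1, 1). The moments μ_n = Tr T_n(H̃)/N are computed exactly from dense matrices, whereas large systems would use stochastic trace estimation with sparse matrix-vector products. The function names are hypothetical.

```python
import numpy as np

def kpm_moments(H, num_moments, a, b):
    """Exact Chebyshev moments mu_n = Tr T_n(Ht) / N, with Ht = (H - b) / a."""
    N = H.shape[0]
    Ht = (H - b * np.eye(N)) / a
    mu = np.zeros(num_moments)
    T_prev, T_curr = np.eye(N), Ht.copy()       # T_0, T_1
    mu[0] = np.trace(T_prev) / N
    if num_moments > 1:
        mu[1] = np.trace(T_curr) / N
    for n in range(2, num_moments):
        T_next = 2 * Ht @ T_curr - T_prev       # T_{n+1} = 2 Ht T_n - T_{n-1}
        mu[n] = np.trace(T_next) / N
        T_prev, T_curr = T_curr, T_next
    return mu

def jackson_kernel(M):
    """Jackson damping factors g_n, n = 0..M-1 (damps Gibbs oscillations)."""
    n = np.arange(M)
    q = np.pi / (M + 1)
    return ((M - n + 1) * np.cos(q * n) + np.sin(q * n) / np.tan(q)) / (M + 1)

def kpm_dos(mu, x):
    """Reconstruct the rescaled DOS at points x in (-1, 1)."""
    g = jackson_kernel(len(mu))
    T = np.cos(np.outer(np.arccos(x), np.arange(len(mu))))  # T_n(x)
    series = g[0] * mu[0] + 2 * (T[:, 1:] @ (g[1:] * mu[1:]))
    return series / (np.pi * np.sqrt(1 - x**2))

# 1D chain with hopping -1: spectrum lies in (-2, 2), so a = 2.1, b = 0 suffice.
N, M = 50, 64
H = -(np.eye(N, k=1) + np.eye(N, k=-1))
mu = kpm_moments(H, M, a=2.1, b=0.0)
x = np.linspace(-0.9, 0.9, 7)
print(kpm_dos(mu, x))
```

The damping kernel is what makes the truncated expansion numerically well behaved. Without it (g_n = 1), the reconstructed DOS shows strong Gibbs oscillations.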