Search Results for author: Gerhard Wellein

Found 10 papers, 5 papers with code

Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications

no code implementations • 27 May 2022 • Ayesha Afzal, Georg Hager, Gerhard Wellein, Stefano Markidis

This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs.

Clustering • General Classification

ECM modeling and performance tuning of SpMV and Lattice QCD on A64FX

no code implementations • 4 Mar 2021 • Christie Alappat, Nils Meyer, Jan Laukemann, Thomas Gruber, Georg Hager, Gerhard Wellein, Tilo Wettig

We present an architectural analysis of the A64FX used in the Fujitsu FX1000 supercomputer at a level of detail that allows for the construction of Execution-Cache-Memory (ECM) performance models for steady-state loops.

Performance • Distributed, Parallel, and Cluster Computing • High Energy Physics - Lattice

Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact

no code implementations • 4 Mar 2021 • Ayesha Afzal, Georg Hager, Gerhard Wellein

We present a validated analytic model for the propagation velocity of idle waves with respect to communication parameters and topology, with a special emphasis on sparse communication patterns.

Distributed, Parallel, and Cluster Computing • Performance

Multiway $p$-spectral graph cuts on Grassmann manifolds

no code implementations • 30 Aug 2020 • Dimosthenis Pasadakis, Christie Louis Alappat, Olaf Schenk, Gerhard Wellein

We demonstrate the effectiveness and accuracy of our algorithm on various artificial test cases.

Clustering • Graph Clustering +1

A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication

1 code implementation • 15 Jul 2019 • Christie L. Alappat, Georg Hager, Olaf Schenk, Jonas Thies, Achim Basermann, Alan R. Bishop, Holger Fehske, Gerhard Wellein

The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications.

Distributed, Parallel, and Cluster Computing • Performance

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

1 code implementation • 13 Jan 2017 • Julian Hammer, Jan Eitzinger, Georg Hager, Gerhard Wellein

We present Kerncraft, a tool that can automatically construct Roofline and ECM models for loop nests by performing the required code, data transfer, and layer condition (LC) analysis.

Performance

GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems

1 code implementation • 29 Jul 2015 • Moritz Kreutzer, Jonas Thies, Melven Röhrig-Zöllner, Andreas Pieper, Faisal Shahzad, Martin Galgon, Achim Basermann, Holger Fehske, Georg Hager, Gerhard Wellein

Today, such resources are available as multicore processors, graphics processing units (GPUs), and other accelerators such as the Intel Xeon Phi.

Distributed, Parallel, and Cluster Computing • Mathematical Software

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

no code implementations • 17 Dec 2013 • Johannes Hofmann, Jan Treibig, Georg Hager, Gerhard Wellein

We examine the Xeon Phi, which is based on Intel's Many Integrated Core (MIC) architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography.

Image Reconstruction

A unified sparse matrix data format for efficient general sparse matrix-vector multiply on modern processors with wide SIMD units

1 code implementation • 23 Jul 2013 • Moritz Kreutzer, Georg Hager, Gerhard Wellein, Holger Fehske, Alan R. Bishop

We discuss the advantages of SELL-C-sigma compared to established formats like Compressed Row Storage (CRS) and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi and Nvidia Tesla K20) for a wide range of test matrices from different application areas.

Mathematical Software • Distributed, Parallel, and Cluster Computing

The Kernel Polynomial Method

1 code implementation • 25 Apr 2005 • Alexander Weisse, Gerhard Wellein, Andreas Alvermann, Holger Fehske

Efficient and stable algorithms for the calculation of spectral quantities and correlation functions are some of the key tools in computational condensed matter physics.

Other Condensed Matter • Computational Physics
