Search Results for author: Alessio Netti

Found 5 papers, 1 paper with code

Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC Monitoring Data

no code implementations • 13 Oct 2020 • Alessio Netti, Daniele Tafani, Michael Ott, Martin Schulz

Modern High-Performance Computing (HPC) and data center operators rely more and more on data analytics techniques to improve the efficiency and reliability of their operations.

Descriptive Time Series +1

DCDB Wintermute: Enabling Online and Holistic Operational Data Analytics on HPC Systems

no code implementations • 14 Oct 2019 • Alessio Netti, Micha Mueller, Carla Guillen, Michael Ott, Daniele Tafani, Gence Ozer, Martin Schulz

However, while monitoring is a common reality in HPC, there is no well-stated and comprehensive list of requirements, nor matching frameworks, to support holistic and online ODA.

Management

Online Fault Classification in HPC Systems through Machine Learning

no code implementations • 26 Oct 2018 • Alessio Netti, Zeynep Kiziltan, Ozalp Babaoglu, Alina Sirbu, Andrea Bartolini, Andrea Borghesi

As High-Performance Computing (HPC) systems strive towards the exascale goal, studies suggest that they will experience excessive failure rates.

Distributed, Parallel, and Cluster Computing

FINJ: A Fault Injection Tool for HPC Systems

1 code implementation • 26 Jul 2018 • Alessio Netti, Zeynep Kiziltan, Ozalp Babaoglu, Alina Sirbu, Andrea Bartolini, Andrea Borghesi

We present FINJ, a high-level fault injection tool for High-Performance Computing (HPC) systems, with a focus on the management of complex experiments.

Distributed, Parallel, and Cluster Computing
