17 Jul 2023 • Dongning Ma, Xun Jiao, Fred Lin, Mengshi Zhang, Alban Desmaison, Thomas Sellinger, Daniel Moore, Sriram Sankar
Deep recommendation systems (DRS) heavily depend on specialized HPC hardware and accelerators to optimize energy, efficiency, and recommendation quality.
7 Dec 2022 • Ruixuan Wang, Fred Lin, Daniel Moore, Sriram Sankar, Xun Jiao
Inspired by the inherent algorithmic resilience of DL methods, this paper conducts, for the first time, a large-scale empirical study of GNN resilience, aiming to understand the relationship between hardware faults and GNN accuracy.
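The snippet does not state the fault model. A common choice in hardware-resilience studies is random single-bit flips in stored float32 values, with accuracy measured against the fault-free ("golden") output. The sketch below assumes that model and a toy one-layer mean-aggregation GNN; `flip_bit`, `gnn_predict`, `fault_campaign` and the four-node graph are hypothetical illustrations, not the paper's setup.

```python
import random
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = LSB, 31 = sign) of a float32 and return the result."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return out


def gnn_predict(features, adjacency, weights):
    """Toy GNN: one mean-aggregation message-passing layer, then a sign classifier."""
    preds = []
    for v, neighbors in enumerate(adjacency):
        group = [v] + neighbors  # aggregate over the node and its neighbors
        agg = [sum(features[u][k] for u in group) / len(group)
               for k in range(len(weights))]
        score = sum(w * a for w, a in zip(weights, agg))
        preds.append(score > 0)  # a NaN score compares False, i.e. a flipped label
    return preds


def fault_campaign(features, adjacency, weights, n_faults, trials, seed=0):
    """Mean agreement with fault-free predictions under random bit flips in the weights."""
    rng = random.Random(seed)
    golden = gnn_predict(features, adjacency, weights)
    agreement = []
    for _ in range(trials):
        faulty = list(weights)
        for _ in range(n_faults):
            i = rng.randrange(len(faulty))
            faulty[i] = flip_bit(faulty[i], rng.randrange(32))
        preds = gnn_predict(features, adjacency, faulty)
        agreement.append(sum(p == g for p, g in zip(preds, golden)) / len(golden))
    return sum(agreement) / trials


# Hypothetical 4-node path graph with 2-dimensional node features.
features = [[1.0, 0.5], [-0.5, 1.0], [0.2, -1.0], [-1.0, -0.3]]
adjacency = [[1], [0, 2], [1, 3], [2]]
weights = [0.8, -0.4]

for n in (0, 1, 2):
    print(n, fault_campaign(features, adjacency, weights, n_faults=n, trials=200))
```

Sweeping `n_faults` (and, in a fuller study, which layer or bit position is hit) is one way to relate fault rate to accuracy; flips in exponent bits tend to be far more damaging than flips in low mantissa bits.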
1 Nov 2019 • Fred Lin, Keyur Muzumdar, Nikolay Pavlovich Laptev, Mihai-Valentin Curelea, Seunghak Lee, Sriram Sankar
In this paper we present a fast dimensional analysis framework that automates root cause analysis on structured logs with improved scalability.
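As a rough illustration of what "dimensional analysis" on structured logs can mean, the sketch below ranks attribute-value combinations (for example a region plus a kernel version) by how much they raise the failure rate over the baseline (lift), keeping only combinations that cover a minimum share of failures (support). This is a simplified brute-force enumeration assumed for illustration, not the paper's algorithm; `rank_dimensions`, the record layout and the sample data are hypothetical, and attribute values are assumed to be strings.

```python
from collections import Counter
from itertools import combinations


def rank_dimensions(records, max_size=2, min_support=0.2):
    """Rank attribute-value combinations by lift over the baseline failure rate.

    records: list of (attributes: dict[str, str], failed: bool) pairs.
    Returns (combo, support, lift) tuples, highest lift first.
    """
    total = len(records)
    n_fail = sum(failed for _, failed in records)
    if n_fail == 0:
        return []
    base_rate = n_fail / total
    seen, seen_fail = Counter(), Counter()
    for attrs, failed in records:
        items = sorted(attrs.items())  # canonical order so combos are comparable
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                seen[combo] += 1
                if failed:
                    seen_fail[combo] += 1
    ranked = []
    for combo, fail_count in seen_fail.items():
        support = fail_count / n_fail  # share of all failures this combo covers
        if support < min_support:
            continue
        lift = (fail_count / seen[combo]) / base_rate  # failure rate vs. baseline
        ranked.append((combo, support, lift))
    ranked.sort(key=lambda t: (-t[2], -t[1], t[0]))
    return ranked


# Hypothetical structured-log records: attributes plus a failure flag.
records = [
    ({"region": "eu", "kernel": "5.4", "rack": "r1"}, True),
    ({"region": "eu", "kernel": "5.4", "rack": "r2"}, True),
    ({"region": "eu", "kernel": "5.10", "rack": "r1"}, False),
    ({"region": "us", "kernel": "5.4", "rack": "r3"}, False),
    ({"region": "us", "kernel": "5.10", "rack": "r3"}, False),
    ({"region": "us", "kernel": "5.10", "rack": "r4"}, False),
]

for combo, support, lift in rank_dimensions(records)[:3]:
    print(combo, f"support={support:.2f}", f"lift={lift:.2f}")
```

Exhaustive enumeration grows combinatorially with the number of attributes, which is exactly the scalability problem a dedicated framework has to address; frequent-pattern mining with support pruning is one standard way to cut that search space.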