1 code implementation • 11 Apr 2025 • Aashaka Shah, Abhinav Jangda, Binyang Li, Caio Rocha, Changho Hwang, Jithin Jose, Madan Musuvathi, Olli Saarikivi, Peng Cheng, Qinghua Zhou, Roshan Dathathri, Saeed Maleki, Ziyue Yang
This paper discusses an alternative communication library interface for AI applications that offers both portability and performance by reducing redundant efforts while maintaining flexibility for customization.
no code implementations • 9 Feb 2024 • Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Arvind Krishnamurthy
ForestColl supports any network fabrics, including both switching fabrics and direct accelerator connections.
no code implementations • 26 Nov 2023 • Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang
This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies.
no code implementations • 9 Oct 2023 • Saeed Maleki
AI models are increasing in size and recent advancement in the community has shown that unlike HPC applications where double precision datatype are required, lower-precision datatypes such as fp8 or int4 are sufficient to bring the same model quality both for training and inference.
no code implementations • 21 Jan 2023 • Zhiqi Lin, Youshan Miao, Guodong Liu, Xiaoxiang Shi, Quanlu Zhang, Fan Yang, Saeed Maleki, Yi Zhu, Xu Cao, Cheng Li, Mao Yang, Lintao Zhang, Lidong Zhou
SuperScaler is a system that facilitates the design and generation of highly flexible parallelization plans.
no code implementations • 21 Oct 2022 • Saeed Maleki, John Crassidis, Yang Cheng, Matthias Schmid
First, the optimization framework is formulated for the pose estimation problem with observation vectors extracted from unit vectors from the camera center-of-projection, pointing towards the image features.
2 code implementations • 8 Nov 2021 • Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh
TACCL uses a novel communication sketch abstraction to get crucial information from the designer to significantly reduce the search space and guide the synthesizer towards better algorithms.
2 code implementations • 12 May 2021 • Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Sarikivi
Therefore, we present CoCoNeT, with a DSL to express a program with both computation and communication.
no code implementations • 4 Jun 2020 • Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum
This paper introduces a novel method to combine gradients called Adasum (for adaptive sum) that converges faster than prior work.
no code implementations • 8 Sep 2019 • Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi
This paper presents a distributed training framework for a class of applications that use Skip-gram-like models to generate embeddings.
no code implementations • 1 Oct 2018 • Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz
Just as the hardware ISA interface enabled hardware advances to proceed independent of software advances in the compiler and language runtimes, HISA decouples compiler optimizations and runtimes for supporting FHE applications from advancements in the underlying FHE schemes.
no code implementations • 22 May 2017 • Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz
This paper proposes SYMSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD.