Search Results for author: Saeed Maleki

Found 12 papers, 3 papers with code

MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications

1 code implementation • 11 Apr 2025 • Aashaka Shah, Abhinav Jangda, Binyang Li, Caio Rocha, Changho Hwang, Jithin Jose, Madan Musuvathi, Olli Saarikivi, Peng Cheng, Qinghua Zhou, Roshan Dathathri, Saeed Maleki, Ziyue Yang

This paper discusses an alternative communication library interface for AI applications that offers both portability and performance by reducing redundant efforts while maintaining flexibility for customization.

ForestColl: Throughput-Optimal Collective Communications on Heterogeneous Network Fabrics

no code implementations • 9 Feb 2024 • Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Arvind Krishnamurthy

ForestColl supports any network fabric, including both switching fabrics and direct accelerator connections.

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

no code implementations • 26 Nov 2023 • Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies.

Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM

no code implementations • 9 Oct 2023 • Saeed Maleki

AI models are increasing in size, and recent advances in the community have shown that, unlike HPC applications where double-precision datatypes are required, lower-precision datatypes such as fp8 or int4 are sufficient to achieve the same model quality for both training and inference.

Error-Covariance Analysis of Monocular Pose Estimation Using Total Least Squares

no code implementations • 21 Oct 2022 • Saeed Maleki, John Crassidis, Yang Cheng, Matthias Schmid

First, the optimization framework is formulated for the pose estimation problem with observation vectors extracted from unit vectors from the camera center-of-projection, pointing towards the image features.

Pose Estimation

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

2 code implementations • 8 Nov 2021 • Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh

TACCL uses a novel communication sketch abstraction to get crucial information from the designer to significantly reduce the search space and guide the synthesizer towards better algorithms.

Scaling Distributed Training with Adaptive Summation

no code implementations • 4 Jun 2020 • Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum

This paper introduces a novel method to combine gradients called Adasum (for adaptive sum) that converges faster than prior work.

Distributed Training of Embeddings using Graph Analytics

no code implementations • 8 Sep 2019 • Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi

This paper presents a distributed training framework for a class of applications that use Skip-gram-like models to generate embeddings.

Graph Generation, Word Embeddings

CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs

no code implementations • 1 Oct 2018 • Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

Just as the hardware ISA interface enabled hardware advances to proceed independent of software advances in the compiler and language runtimes, HISA decouples compiler optimizations and runtimes for supporting FHE applications from advancements in the underlying FHE schemes.

Parallel Stochastic Gradient Descent with Sound Combiners

no code implementations • 22 May 2017 • Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

This paper proposes SYMSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD.
