Search Results for author: Saeed Maleki

Found 11 papers, 2 papers with code

ForestColl: Efficient Collective Communications on Heterogeneous Network Fabrics

no code implementations • 9 Feb 2024 • Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Aashaka Shah, Changho Hwang, Arvind Krishnamurthy

ForestColl also outperforms other state-of-the-art schedule generation techniques, producing schedules that are up to 61% more efficient while generating them orders of magnitude faster.

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

no code implementations • 26 Nov 2023 • Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies.

Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM

no code implementations • 9 Oct 2023 • Saeed Maleki

AI models are increasing in size, and recent advances in the community have shown that, unlike HPC applications where double-precision datatypes are required, lower-precision datatypes such as fp8 or int4 are sufficient to deliver the same model quality for both training and inference.

Error-Covariance Analysis of Monocular Pose Estimation Using Total Least Squares

no code implementations • 21 Oct 2022 • Saeed Maleki, John Crassidis, Yang Cheng, Matthias Schmid

First, the optimization framework is formulated for the pose estimation problem with observation vectors extracted from unit vectors from the camera center-of-projection, pointing towards the image features.

Pose Estimation

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

2 code implementations • 8 Nov 2021 • Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh

TACCL uses a novel communication sketch abstraction to get crucial information from the designer to significantly reduce the search space and guide the synthesizer towards better algorithms.

Scaling Distributed Training with Adaptive Summation

no code implementations • 4 Jun 2020 • Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum

This paper introduces a novel method to combine gradients called Adasum (for adaptive sum) that converges faster than prior work.

16k

Distributed Training of Embeddings using Graph Analytics

no code implementations • 8 Sep 2019 • Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi

This paper presents a distributed training framework for a class of applications that use Skip-gram-like models to generate embeddings.

Graph Generation, Word Embeddings

CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs

no code implementations • 1 Oct 2018 • Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

Just as the hardware ISA interface enabled hardware advances to proceed independent of software advances in the compiler and language runtimes, HISA decouples compiler optimizations and runtimes for supporting FHE applications from advancements in the underlying FHE schemes.

Parallel Stochastic Gradient Descent with Sound Combiners

no code implementations • 22 May 2017 • Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

This paper proposes SYMSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD.
