Search Results for author: Saeed Maleki

Found 11 papers, 2 papers with code

ForestColl: Efficient Collective Communications on Heterogeneous Network Fabrics

no code implementations • 9 Feb 2024 • Liangyu Zhao, Saeed Maleki, Ziyue Yang, Hossein Pourreza, Aashaka Shah, Changho Hwang, Arvind Krishnamurthy

ForestColl also outperforms other state-of-the-art schedule generation techniques, producing schedules that are up to 61% more efficient and generating them orders of magnitude faster.

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

no code implementations • 26 Nov 2023 • Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies.

Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM

no code implementations • 9 Oct 2023 • Saeed Maleki

AI models are increasing in size, and recent advances in the community have shown that, unlike HPC applications where double-precision datatypes are required, lower-precision datatypes such as fp8 or int4 are sufficient to deliver the same model quality for both training and inference.

SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction

no code implementations • 21 Jan 2023 • Zhiqi Lin, Youshan Miao, Guodong Liu, Xiaoxiang Shi, Quanlu Zhang, Fan Yang, Saeed Maleki, Yi Zhu, Xu Cao, Cheng Li, Mao Yang, Lintao Zhang, Lidong Zhou

SuperScaler is a system that facilitates the design and generation of highly flexible parallelization plans.

Scheduling

Error-Covariance Analysis of Monocular Pose Estimation Using Total Least Squares

no code implementations • 21 Oct 2022 • Saeed Maleki, John Crassidis, Yang Cheng, Matthias Schmid

First, the optimization framework is formulated for the pose estimation problem with observation vectors extracted from unit vectors from the camera center-of-projection, pointing towards the image features.

Pose Estimation

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

2 code implementations • 8 Nov 2021 • Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh

TACCL uses a novel communication sketch abstraction to get crucial information from the designer to significantly reduce the search space and guide the synthesizer towards better algorithms.

Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads

2 code implementations • 12 May 2021 • Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi

Therefore, we present CoCoNeT, with a DSL to express a program with both computation and communication.

BIG-bench Machine Learning

Scaling Distributed Training with Adaptive Summation

no code implementations • 4 Jun 2020 • Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum

This paper introduces a novel method to combine gradients called Adasum (for adaptive sum) that converges faster than prior work.

Distributed Training of Embeddings using Graph Analytics

no code implementations • 8 Sep 2019 • Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi

This paper presents a distributed training framework for a class of applications that use Skip-gram-like models to generate embeddings.

Graph Generation, Word Embeddings

CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs

no code implementations • 1 Oct 2018 • Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

Just as the hardware ISA interface enabled hardware advances to proceed independent of software advances in the compiler and language runtimes, HISA decouples compiler optimizations and runtimes for supporting FHE applications from advancements in the underlying FHE schemes.

Parallel Stochastic Gradient Descent with Sound Combiners

no code implementations • 22 May 2017 • Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz

This paper proposes SYMSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD.
