Search Results for author: Elton Zheng

Found 2 papers, 1 paper with code

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

2 code implementations · 30 Jun 2022 · Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He

DeepSpeed Inference reduces latency by up to 7.3x over the state of the art for latency-oriented scenarios and increases throughput by over 1.5x for throughput-oriented scenarios.