Search Results for author: Jiarui Fang

Found 11 papers, 5 papers with code

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models

1 code implementation • 6 Feb 2023 • Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You

To address these challenges, we introduce a system that can jointly optimize distributed execution and gradient checkpointing plans.

Scheduling

Elixir: Train a Large Language Model on a Small GPU Cluster

no code implementations • 10 Dec 2022 • Haichen Huang, Jiarui Fang, Hongxin Liu, Shenggui Li, Yang You

People without access to a large number of GPUs resort to heterogeneous training systems that store model parameters in CPU memory.

Language Modelling

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

no code implementations • 6 Sep 2022 • Jiangsu Du, Ziming Liu, Jiarui Fang, Shenggui Li, Yongbin Li, Yutong Lu, Yang You

Although the AI community has expanded the model scale to the trillion parameter level, the practical deployment of 10-100 billion parameter models is still uncertain due to the latency, throughput, and memory constraints.

A Frequency-aware Software Cache for Large Recommendation System Embeddings

1 code implementation • 8 Aug 2022 • Jiarui Fang, Geng Zhang, Jiatong Han, Shenggui Li, Zhengda Bian, Yongbin Li, Jin Liu, Yang You

Deep learning recommendation models (DLRMs) have been widely applied in Internet companies.

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

1 code implementation • 28 Oct 2021 • Shenggui Li, Jiarui Fang, Zhengda Bian, Hongxin Liu, Yuliang Liu, Haichen Huang, Boxiang Wang, Yang You

The success of Transformer models has pushed the deep learning model scale to billions of parameters.

TurboTransformers: An Efficient GPU Serving System For Transformer Models

no code implementations • 9 Oct 2020 • Jiarui Fang, Yang Yu, Chengduo Zhao, Jie Zhou

This paper presents a transformer serving system called TurboTransformers, which consists of a computing runtime and a serving framework to solve the above challenges.

Management

swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

no code implementations • 16 Mar 2019 • Jiarui Fang, Liandeng Li, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang

Second, we propose a set of optimization strategies for redesigning a variety of neural network layers based on Caffe.
