Search Results for author: Jiarui Fang

Found 12 papers, 6 papers with code

RedSync : Reducing Synchronization Traffic for Distributed Deep Learning

no code implementations • ICLR 2019 • Jiarui Fang, Haohuan Fu, Guangwen Yang, Cho-Jui Hsieh

Data parallelism has become a dominant method to scale Deep Neural Network (DNN) training across multiple nodes.

swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

no code implementations • 16 Mar 2019 • Jiarui Fang, Liandeng Li, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang

Second, we propose a set of optimization strategies for redesigning a variety of neural network layers based on Caffe.

TurboTransformers: An Efficient GPU Serving System For Transformer Models

no code implementations • 9 Oct 2020 • Jiarui Fang, Yang Yu, Chengduo Zhao, Jie Zhou

This paper designed a transformer serving system called TurboTransformers, which consists of a computing runtime and a serving framework to solve the above challenges.

Management

PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management

1 code implementation • 12 Aug 2021 • Jiarui Fang, Zilin Zhu, Shenggui Li, Hui Su, Yang Yu, Jie Zhou, Yang You

PatrickStar uses the CPU-GPU heterogeneous memory space to store the model data.

Management

737

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

1 code implementation • 28 Oct 2021 • Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, Yang You

The success of Transformer models has pushed the deep learning model scale to billions of parameters.

37,775

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours

1 code implementation • 2 Mar 2022 • Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Jiarui Fang, Zhongming Yu, Tian Zheng, Ruidong Wu, Xiwen Zhang, Jian Peng, Yang You

In this work, we present FastFold, an efficient implementation of AlphaFold for both training and inference.

Protein Structure Prediction, Translation

540

A Frequency-aware Software Cache for Large Recommendation System Embeddings

1 code implementation • 8 Aug 2022 • Jiarui Fang, Geng Zhang, Jiatong Han, Shenggui Li, Zhengda Bian, Yongbin Li, Jin Liu, Yang You

Deep learning recommendation models (DLRMs) have been widely applied in Internet companies.

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

no code implementations • 6 Sep 2022 • Jiangsu Du, Ziming Liu, Jiarui Fang, Shenggui Li, Yongbin Li, Yutong Lu, Yang You

Although the AI community has expanded the model scale to the trillion parameter level, the practical deployment of 10-100 billion parameter models is still uncertain due to the latency, throughput, and memory constraints.

Blocking

Elixir: Train a Large Language Model on a Small GPU Cluster

2 code implementations • 10 Dec 2022 • Haichen Huang, Jiarui Fang, Hongxin Liu, Shenggui Li, Yang You

To reduce GPU memory usage, memory partitioning and memory offloading have been proposed.

Language Modelling, Large Language Model

37,775

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models

1 code implementation • 6 Feb 2023 • Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You

To address these challenges, we introduce a system that can jointly optimize distributed execution and gradient checkpointing plans.

Scheduling

37,775

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference

no code implementations • 19 Jan 2024 • Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Haotian Zhou, Bin Jia, Ziming Liu, Yang You

The experiments demonstrate that AutoChunk can reduce over 80% of activation memory while maintaining speed loss within 10%, extend max sequence length by 3.2x to 11.7x, and outperform state-of-the-art methods by a large margin.

Code Generation

RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining

no code implementations • ACL 2022 • Hui Su, Weiwei Shi, Xiaoyu Shen, Zhou Xiao, Tuo Ji, Jiarui Fang, Jie Zhou

Large-scale pretrained language models have achieved SOTA results on NLP tasks.

Contrastive Learning
