Search Results for author: Shijie Cao

Found 9 papers, 3 papers with code

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

1 code implementation • 16 Feb 2024 • Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges.

Knowledge Distillation • Quantization
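The entry pairs quantization with knowledge distillation; reading the title, the general idea is that the full-precision model teaches its own sub-4-bit copy during quantization-aware training. The sketch below is a toy illustration of such a self-distillation loss under that assumption (NumPy, no training loop), not BitDistiller's actual objective; the `fake_quantize` helper and the 3-bit setting are invented for the example.

```python
import numpy as np

def fake_quantize(w, n_bits=3):
    """Uniform symmetric fake-quantization: quantize, then dequantize."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax + 1e-12
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(x, w_fp):
    """KL(teacher || student): the full-precision weights act as teacher
    for their own low-bit quantized copy."""
    teacher = softmax(x @ w_fp)                  # full-precision logits
    student = softmax(x @ fake_quantize(w_fp))   # low-bit logits
    return np.mean(np.sum(teacher * (np.log(teacher + 1e-12) - np.log(student + 1e-12)), axis=-1))

x, w = np.random.randn(8, 16), np.random.randn(16, 10)
print(self_distillation_loss(x, w))
```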

AFPQ: Asymmetric Floating Point Quantization for LLMs

1 code implementation • 3 Nov 2023 • Yijia Zhang, Sicheng Zhang, Shijie Cao, Dayou Du, Jianyu Wei, Ting Cao, Ningyi Xu

Large language models (LLMs) show great performance in various tasks, but face deployment challenges from limited memory capacity and bandwidth.

Quantization
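The title contrasts asymmetric with symmetric quantization. As generic background (not the AFPQ scheme itself, which targets floating-point formats), the sketch below compares symmetric and asymmetric 4-bit integer quantization of a skewed weight group: the asymmetric variant adds a zero-point so the grid tracks the tensor's actual [min, max] range. All sizes and the test distribution are illustrative assumptions.

```python
import numpy as np

def quant_symmetric(w, n_bits=4):
    """Symmetric quantization: grid is centered at zero."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                            # dequantized values

def quant_asymmetric(w, n_bits=4):
    """Asymmetric quantization: scale plus zero-point covers [min, max]."""
    qmax = 2 ** n_bits - 1
    scale = (w.max() - w.min()) / qmax
    zero_point = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, qmax)
    return (q - zero_point) * scale

w = np.random.randn(256) * 0.1 + 0.05           # skewed weight group
for name, fn in [("symmetric", quant_symmetric), ("asymmetric", quant_asymmetric)]:
    print(f"{name:10s} MSE: {np.mean((w - fn(w)) ** 2):.3e}")
```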

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

no code implementations • 23 Aug 2023 • Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang

To tackle the high compute requirements of LLMs, the Mixture-of-Experts (MoE) architecture was introduced, which can scale up model size without proportionally scaling up computational requirements.
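The sentence above refers to the standard sparsely activated MoE design: a router picks a small top-k subset of experts per token, so compute grows with k rather than with the total number of experts. A minimal NumPy sketch of that routing idea (not the pre-gating mechanism the paper proposes) follows; the layer sizes and expert weights are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 32, 8, 2

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """Route each token to its top-k experts; only k of n_experts run per token."""
    logits = x @ router_w                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert ids per token
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        gates = logits[t, top[t]]
        gates = np.exp(gates - gates.max()); gates /= gates.sum()
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (token @ experts[e])  # only the selected experts execute
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)                     # (4, 32)
```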

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

no code implementations • 31 May 2023 • Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

In this paper, we propose a novel compression strategy that leverages structured pruning and knowledge distillation to reduce the model size and inference cost of the Conformer model while preserving high recognition performance.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2
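The compression strategy combines structured pruning with knowledge distillation. The snippet below illustrates only the generic structured-pruning half, dropping whole output channels by L2 norm so the surviving weights stay dense; it is not the paper's Conformer-specific criterion, and the shapes and keep ratio are assumptions.

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """Structured pruning: rank output channels by L2 norm and drop whole
    rows, so the remaining matrix stays dense (hardware friendly)."""
    norms = np.linalg.norm(weight, axis=1)            # one score per output channel
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    keep = np.sort(np.argsort(norms)[-n_keep:])       # indices of surviving channels
    return weight[keep], keep

w = np.random.randn(64, 128)                          # (out_channels, in_features)
w_small, kept = prune_channels(w, keep_ratio=0.25)
print(w_small.shape)                                  # (16, 128)
```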

Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

no code implementations • 31 May 2023 • Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu

We find that conventional gradient accumulation reduces activation memory but is incompatible with gradient memory reduction, because the accumulated gradients must be preserved across micro-batches while memory reduction requires releasing them.
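To make the contradiction concrete, the baseline loop below keeps a full-size gradient buffer alive for the whole accumulation step, so gradients can never be released early. This is only a sketch of that conventional behavior, not the paper's Adam Accumulation method; the parameter dictionary and dummy gradient function are toy assumptions.

```python
import numpy as np

def grad_accumulation_step(params, micro_batches, compute_grad, lr=1e-3):
    """Conventional gradient accumulation: the gradient buffer must be
    preserved across all micro-batches, so it cannot be released the way
    gradient-memory-reduction schemes would like."""
    grad_buffer = {k: np.zeros_like(v) for k, v in params.items()}  # alive for the whole step
    for batch in micro_batches:
        g = compute_grad(params, batch)        # activation memory is freed per micro-batch
        for k in params:
            grad_buffer[k] += g[k] / len(micro_batches)
    for k in params:                           # optimizer update only at the very end
        params[k] -= lr * grad_buffer[k]
    return params

params = {"w": np.random.randn(4, 4)}
dummy_grad = lambda p, b: {"w": b * np.ones_like(p["w"])}
grad_accumulation_step(params, micro_batches=[0.1, 0.2, 0.3], compute_grad=dummy_grad)
```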

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models

no code implementations • 21 May 2023 • Yijia Zhang, Lingran Zhao, Shijie Cao, WenQiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu

In this study, we conduct a comparative analysis of INT and FP quantization at the same bit-width, revealing that the optimal quantization format varies across layers due to the complexity and diversity of tensor distributions.

Quantization
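A toy way to see why the better format can differ per layer is to measure round-trip error of a 4-bit integer grid versus an E2M1-style FP4 grid on tensors with different distributions. The grids, absmax scaling rule, and synthetic distributions below are illustrative assumptions, not the paper's evaluation protocol.

```python
import numpy as np

def nearest(grid, x):
    """Quantize each element of x to the nearest value in a fixed grid."""
    grid = np.asarray(grid)
    return grid[np.abs(x[..., None] - grid).argmin(axis=-1)]

int4_grid = np.arange(-8, 8)                                   # 4-bit signed integer levels
fp4_mag = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])   # E2M1-style magnitudes (assumed)
fp4_grid = np.concatenate([-fp4_mag[1:], fp4_mag])

def quant_error(x, grid):
    scale = np.abs(x).max() / np.abs(grid).max()               # simple absmax scaling
    return np.mean((x - nearest(grid, x / scale) * scale) ** 2)

rng = np.random.default_rng(0)
uniform_like = rng.uniform(-1, 1, 4096)        # flat distribution: the uniform INT grid fits well
heavy_tailed = rng.standard_t(df=3, size=4096) # outlier-heavy: the non-uniform FP grid often fits better
for name, t in [("uniform-like", uniform_like), ("heavy-tailed", heavy_tailed)]:
    print(name, "INT4:", quant_error(t, int4_grid), "FP4:", quant_error(t, fp4_grid))
```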

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization

no code implementations • CVPR 2019 • Shijie Cao, Lingxiao Ma, Wencong Xiao, Chen Zhang, Yunxin Liu, Lintao Zhang, Lanshun Nie, Zhi Yang

In this paper, we present a novel and general method to accelerate convolutional neural network (CNN) inference by taking advantage of feature-map sparsity.

Quantization
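The idea as summarized here is to predict feature-map sparsity cheaply with a low-bit pass and then skip the predicted-zero outputs in the expensive pass. The sketch below mimics that flow for a single dense layer with ReLU; the 4-bit predictor and masking strategy are simplifications, not the paper's kernel design, and this toy version still computes the full product for clarity.

```python
import numpy as np

def low_bit(w, n_bits=4):
    """Crude uniform quantization, used only to *predict* output sparsity."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

def sparsity_predicted_layer(x, w):
    pred = x @ low_bit(w)                 # cheap low-bit pass
    mask = pred > 0                       # positions predicted to survive ReLU
    out = np.zeros_like(pred)
    out[mask] = (x @ w)[mask]             # a real kernel would compute only these positions
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
x, w = rng.standard_normal((8, 64)), rng.standard_normal((64, 64))
dense = np.maximum(x @ w, 0.0)
sparse = sparsity_predicted_layer(x, w)
print("predicted-zero rate:", 1 - (sparse > 0).mean(),
      "prediction mismatch:", np.mean((dense > 0) != (sparse > 0)))
```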

Balanced Sparsity for Efficient DNN Inference on GPU

no code implementations • 1 Nov 2018 • Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie

However, it requires hardware customization to speed up practical inference.
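Balanced sparsity, as the title suggests, keeps the same number of nonzero weights in every fixed-size block of a row, so the sparse pattern is regular enough for efficient GPU kernels without custom hardware. The sketch below is a hedged illustration of that block-balanced pruning pattern; the block size, keep count, and function name are assumptions.

```python
import numpy as np

def balanced_prune(w, block_size=16, keep_per_block=4):
    """Keep exactly `keep_per_block` largest-magnitude weights in every
    block of `block_size` along each row: a balanced sparse pattern."""
    out_f, in_f = w.shape
    assert in_f % block_size == 0
    blocks = w.reshape(out_f, in_f // block_size, block_size)
    topk = np.argsort(np.abs(blocks), axis=-1)[..., -keep_per_block:]
    mask = np.zeros_like(blocks, dtype=bool)
    np.put_along_axis(mask, topk, True, axis=-1)      # mark survivors in each block
    return (blocks * mask).reshape(out_f, in_f)

w = np.random.randn(8, 64)
w_sparse = balanced_prune(w)
print("sparsity:", 1 - np.count_nonzero(w_sparse) / w_sparse.size)   # 0.75
```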
