Search Results for author: Shaoguang Yan

Found 1 paper, 1 paper with code

Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

1 code implementation • 26 Jun 2023 • Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang

Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length.

Model Compression
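The core idea named in the title, pruning low-importance tokens so the remaining sequence is shorter and cheaper to process, can be sketched minimally as follows. This is a hypothetical illustration of generic top-k token pruning, not the paper's constraint-aware, ranking-distilled algorithm; the function name `prune_tokens` and the `keep_ratio` parameter are assumptions for the sketch.

```python
import numpy as np

def prune_tokens(hidden_states, importance, keep_ratio=0.5):
    """Keep only the highest-importance tokens (generic sketch, not the
    paper's method). hidden_states: (seq_len, dim); importance: (seq_len,).
    Returns the pruned states and the kept indices in original order."""
    seq_len = hidden_states.shape[0]
    k = max(1, int(seq_len * keep_ratio))
    # Indices of the k most important tokens, re-sorted to preserve order.
    keep = np.sort(np.argsort(importance)[-k:])
    return hidden_states[keep], keep

# Toy example: 8 tokens with 4-dim hidden states.
rng = np.random.default_rng(0)
h = rng.standard_normal((8, 4))
scores = rng.random(8)
pruned, kept = prune_tokens(h, scores, keep_ratio=0.5)
```

Because self-attention cost grows quadratically with sequence length, halving the kept tokens at an intermediate layer roughly quarters the attention cost of the layers that follow.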
