2 code implementations • 8 Nov 2023 • Rocktim Jyoti Das, MingJie Sun, Liqun Ma, Zhiqiang Shen
GBLM-Pruner leverages the first-order term of the Taylor expansion, operating in a training-free manner by harnessing properly normalized gradients from a few calibration samples to determine the pruning metric, and substantially outperforms competitive counterparts like SparseGPT and Wanda in multiple benchmarks.
no code implementations • 19 Sep 2023 • Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, Hongyi Wang, Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing
This paper aims to understand the impacts of various data combinations (e. g., web text, wikipedia, github, books) on the training of large language models using SlimPajama.
2 code implementations • EMNLP 2021 • Haitao Lin, Liqun Ma, Junnan Zhu, Lu Xiang, Yu Zhou, Jiajun Zhang, Chengqing Zong
Therefore, in this paper, we introduce a novel Chinese dataset for Customer Service Dialogue Summarization (CSDS).
no code implementations • 16 Aug 2019 • Jianquan Li, Xiaokang Liu, Wenpeng Yin, Min Yang, Liqun Ma, Yaohong Jin
Multi-Task Learning (MTL) aims at boosting the overall performance of each individual task by leveraging useful information contained in multiple related tasks.