1 code implementation • 3 Apr 2024 • Shwai He, Tianlong Chen
Moreover, while parameter-efficient LoRA fine-tuning has been proposed to recover the performance of sparse models, a significant challenge arises when merging weights: dense LoRA modules are incompatible with sparse models, because merging them destroys the sparsity of the pruned weights.
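To illustrate the incompatibility, here is a minimal PyTorch sketch (not the paper's code; shapes and names are illustrative assumptions): merging a dense low-rank LoRA update into a pruned weight matrix fills in the zeroed entries, so the merged weights lose their sparsity.

```python
# Minimal sketch: a dense LoRA delta (B @ A) merged into pruned weights
# removes the zeros introduced by pruning.
import torch

d, r = 8, 2
W = torch.randn(d, d)
mask = (torch.rand(d, d) > 0.5).float()      # hypothetical pruning mask
W_sparse = W * mask                          # pruned (sparse) weights

A = torch.randn(r, d) * 0.01                 # LoRA down-projection
B = torch.randn(d, r) * 0.01                 # LoRA up-projection
delta = B @ A                                # dense low-rank update

W_merged = W_sparse + delta
print("zero fraction before merge:", (W_sparse == 0).float().mean().item())
print("zero fraction after merge: ", (W_merged == 0).float().mean().item())  # ~0: sparsity destroyed
```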
1 code implementation • 19 Feb 2024 • Run-Ze Fan, Xuefeng Li, Haoyang Zou, Junlong Li, Shwai He, Ethan Chern, Jiewen Hu, PengFei Liu
This paper explores elevating the quality of existing instruction data to better align with human values, introducing a simple and effective approach named ReAlign, which reformats the responses of instruction data into a format that better aligns with pre-established criteria and collated evidence.
2 code implementations • 15 Feb 2024 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Tianyi Zhou
Instruction tuning is critical for large language models (LLMs) to achieve better instruction-following and task-adaptation capabilities, but its success heavily relies on the quality of the training data.
1 code implementation • 1 Feb 2024 • Ming Li, Yong Zhang, Shwai He, Zhitao Li, Hongyu Zhao, Jianzong Wang, Ning Cheng, Tianyi Zhou
Data filtering for instruction tuning has proved important in improving both the efficiency and performance of the tuning process.
2 code implementations • 18 Oct 2023 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Heng Huang, Jiuxiang Gu, Tianyi Zhou
Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation.
1 code implementation • 15 Oct 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao
Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly as the number of activated experts increases, limiting its practical utility.
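As a rough illustration of why compute grows with the number of activated experts, below is a minimal sketch of a generic top-k MoE layer (an assumed standard softmax-router design, not the paper's implementation): every additional routing slot adds one expert forward pass per token.

```python
# Minimal top-k MoE sketch: per-token FLOPs grow roughly linearly with k.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=1):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # each extra slot = one more expert pass per token
            for e, expert in enumerate(self.experts):
                sel = idx[:, slot] == e
                if sel.any():
                    out[sel] += weights[sel, slot].unsqueeze(-1) * expert(x[sel])
        return out

moe = TopKMoE(k=2)
y = moe(torch.randn(16, 64))
```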
1 code implementation • 31 Aug 2023 • Hongtai Jing, Zhengtao Gao, Sheng Xu, Tao Shen, Zhangzhi Peng, Shwai He, Tao You, Shuang Ye, Wei Lin, Siqi Sun
Remarkably, BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold, and OmegaFold on the antibody benchmark, demonstrating significant potential to advance innovative engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials.
no code implementations • 30 Aug 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao
Adapter tuning, which updates only a few parameters, has become a mainstream method for adapting pretrained language models to downstream tasks.
1 code implementation • 10 Nov 2022 • Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, DaCheng Tao
The main contributions of our work are challenging the common assumption behind dynamic networks and proposing a partially dynamic network, namely PAD-Net, which transforms redundant dynamic parameters into static ones.
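A toy sketch of the "partially dynamic" idea is given below (illustrative only, not the PAD-Net code; the layer, mask, and ratio are assumptions): a fixed mask decides which output features use input-conditioned (dynamic) weights and which keep ordinary static weights.

```python
# Toy partially dynamic layer: a subset of outputs uses per-input weights,
# the rest uses shared static weights.
import torch
import torch.nn as nn

class PartiallyDynamicLinear(nn.Module):
    def __init__(self, d_in=32, d_out=32, dynamic_ratio=0.5):
        super().__init__()
        self.static = nn.Linear(d_in, d_out)
        self.dynamic_gen = nn.Linear(d_in, d_out * d_in)     # generates per-input weights
        mask = torch.zeros(d_out)
        mask[: int(d_out * dynamic_ratio)] = 1.0              # which outputs stay dynamic
        self.register_buffer("mask", mask)

    def forward(self, x):                                     # x: (batch, d_in)
        static_out = self.static(x)
        W_dyn = self.dynamic_gen(x).view(x.size(0), -1, x.size(1))   # (batch, d_out, d_in)
        dyn_out = torch.bmm(W_dyn, x.unsqueeze(-1)).squeeze(-1)
        return self.mask * dyn_out + (1 - self.mask) * static_out

layer = PartiallyDynamicLinear()
y = layer(torch.randn(8, 32))
```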
1 code implementation • 9 Oct 2022 • Shwai He, Liang Ding, Daize Dong, Miao Zhang, DaCheng Tao
Adapter Tuning, which freezes the pretrained language model (PLM) and fine-tunes only a few extra modules, has become an appealing, efficient alternative to full-model fine-tuning.
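For context, here is a minimal sketch of a generic bottleneck adapter (a Houlsby-style design assumed here, not necessarily this paper's exact module): the backbone is frozen and only the small down/up projections are trained.

```python
# Minimal bottleneck adapter sketch: frozen backbone, trainable adapter.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, d_model=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))   # residual bottleneck

backbone = nn.Linear(768, 768)                          # stand-in for a frozen PLM layer
for p in backbone.parameters():
    p.requires_grad = False                             # freeze pretrained weights

adapter = Adapter()
h = adapter(backbone(torch.randn(4, 768)))              # only adapter params receive gradients
```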
1 code implementation • 20 Sep 2022 • Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao
In terms of model size, we scale Transformer-Big up to an extremely large model with nearly 4.7 billion parameters to fully enhance the model capacity of our Vega-MT.
Ranked #1 on Machine Translation on WMT 2022 English-Russian
no code implementations • 5 Apr 2022 • Shwai He, Chenbo Jiang, Daize Dong, Liang Ding
Dynamic convolution achieves better performance for efficient CNNs at the cost of a negligible increase in FLOPs.
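For reference, a minimal sketch of dynamic convolution in the common attention-over-kernels formulation (an assumed generic design, not this paper's code): several candidate kernels are mixed with input-dependent weights, and the extra cost beyond a single convolution is dominated by the tiny attention branch.

```python
# Minimal dynamic convolution sketch: mix n_kernels candidate kernels
# per sample with attention weights, then apply one grouped convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    def __init__(self, c_in=16, c_out=16, k=3, n_kernels=4):
        super().__init__()
        self.c_out, self.k = c_out, k
        self.weight = nn.Parameter(torch.randn(n_kernels, c_out, c_in, k, k) * 0.02)
        self.attn = nn.Linear(c_in, n_kernels)

    def forward(self, x):                                        # x: (batch, c_in, H, W)
        b, c_in, h, w = x.shape
        pi = F.softmax(self.attn(x.mean(dim=(2, 3))), dim=-1)    # (batch, n_kernels)
        kernels = torch.einsum("bk,koihw->boihw", pi, self.weight)   # per-sample mixed kernel
        kernels = kernels.reshape(b * self.c_out, c_in, self.k, self.k)
        out = F.conv2d(x.reshape(1, b * c_in, h, w), kernels,
                       padding=self.k // 2, groups=b)
        return out.reshape(b, self.c_out, h, w)

layer = DynamicConv2d()
y = layer(torch.randn(2, 16, 8, 8))
```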
1 code implementation • 27 Dec 2021 • Shwai He, Shi Gu
Traditionally, the prediction of future stock movements is based on historical trading records.