Search Results for author: Shwai He

Found 13 papers, 11 papers with code

RESSA: Repair Sparse Vision-Language Models via Sparse Cross-Modality Adaptation

1 code implementation • 3 Apr 2024 • Shwai He, Tianlong Chen

Moreover, while parameter-efficient LoRA finetuning has been proposed to repair the performance of sparse models, weight merging poses a significant challenge: dense LoRA modules are incompatible with sparse models, since merging them destroys the sparsity of the pruned weights.

Knowledge Distillation
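To illustrate the merging issue described in the abstract, here is a minimal PyTorch sketch (not the RESSA implementation; all tensors are toy stand-ins): merging a dense LoRA update B @ A into a pruned weight fills the zeroed entries back in, and naively re-applying the pruning mask afterwards shows why sparsity-aware adaptation is needed.

```python
import torch

torch.manual_seed(0)
d, r = 8, 2

# A pruned (sparse) weight matrix and its binary pruning mask.
W = torch.randn(d, d)
mask = (torch.rand(d, d) > 0.5).float()
W_sparse = W * mask

# A standard dense LoRA update: delta = B @ A.
B, A = torch.randn(d, r), torch.randn(r, d)
delta = B @ A

# Naive merging densifies the weight: the zeroed entries are filled in again.
W_merged = W_sparse + delta
print("sparsity before merge:", (W_sparse == 0).float().mean().item())
print("sparsity after merge: ", (W_merged == 0).float().mean().item())

# One naive workaround (illustration only): re-apply the pruning mask after merging.
W_masked_merge = (W_sparse + delta) * mask
print("sparsity after masked merge:", (W_masked_merge == 0).float().mean().item())
```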

Reformatted Alignment

1 code implementation • 19 Feb 2024 • Run-Ze Fan, Xuefeng Li, Haoyang Zou, Junlong Li, Shwai He, Ethan Chern, Jiewen Hu, PengFei Liu

This paper explores elevating the quality of existing instruction data to better align it with human values, introducing a simple and effective approach named ReAlign, which reformats the responses in instruction data so that they better conform to pre-established criteria and collated evidence.

GSM8K • Hallucination +2

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

2 code implementations • 15 Feb 2024 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Tianyi Zhou

Instruction tuning is critical for large language models (LLMs) to achieve better instruction-following and task-adaptation capabilities, but its success heavily relies on the quality of the training data.

Data Augmentation • Instruction Following

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

1 code implementation • 1 Feb 2024 • Ming Li, Yong Zhang, Shwai He, Zhitao Li, Hongyu Zhao, Jianzong Wang, Ning Cheng, Tianyi Zhou

Data filtering for instruction tuning has proved important in improving both the efficiency and performance of the tuning process.

Language Modelling

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

2 code implementations • 18 Oct 2023 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Heng Huang, Jiuxiang Gu, Tianyi Zhou

Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation.

Natural Language Understanding

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

1 code implementation • 15 Oct 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao

Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly as the number of activated experts increases, limiting its practical utility.

Computational Efficiency
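A toy sketch of the idea suggested by the title, assuming purely linear experts and a softmax router (so this is not the paper's implementation): running k selected experts separately costs k matrix multiplications, whereas mixing their weights with the gating scores first requires only one pass, and for linear experts the two outputs coincide.

```python
import torch

torch.manual_seed(0)
d, n_experts, k = 16, 4, 2

x = torch.randn(d)                                 # a single token
experts_W = torch.randn(n_experts, d, d)           # one linear expert per slice
router = torch.randn(n_experts, d)

gates = torch.softmax(router @ x, dim=0)           # routing scores
topk = torch.topk(gates, k)

# Standard sparse MoE: one matmul per activated expert (cost grows with k).
y_moe = sum(topk.values[i] * (experts_W[topk.indices[i]] @ x) for i in range(k))

# Merging the selected experts into one: mix the weights first, then a single matmul.
W_merged = (topk.values[:, None, None] * experts_W[topk.indices]).sum(dim=0)
y_merged = W_merged @ x

print(torch.allclose(y_moe, y_merged, atol=1e-5))  # identical only because experts are linear
```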

Accurate Prediction of Antibody Function and Structure Using Bio-Inspired Antibody Language Model

1 code implementation • 31 Aug 2023 • Hongtai Jing, Zhengtao Gao, Sheng Xu, Tao Shen, Zhangzhi Peng, Shwai He, Tao You, Shuang Ye, Wei Lin, Siqi Sun

Remarkably, BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold, and OmegaFold on the antibody benchmark, demonstrating significant potential to advance innovative engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials.

Language Modelling

MerA: Merging Pretrained Adapters For Few-Shot Learning

no code implementations • 30 Aug 2023 • Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, DaCheng Tao

Adapter tuning, which updates only a few parameters, has become a mainstream method for adapting pretrained language models to downstream tasks.

Few-Shot Learning • MRPC
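The abstract excerpt above only introduces adapter tuning; as a rough sketch of what merging pretrained adapters could look like, the snippet below simply averages the parameters of several bottleneck adapters into one (an assumed, illustrative merging rule, not necessarily MerA's actual procedure).

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """A standard bottleneck adapter: down-projection, nonlinearity, up-projection."""
    def __init__(self, d_model=32, d_bottleneck=8):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual adapter

def merge_adapters(adapters):
    """Merge pretrained adapters by averaging their parameters (illustrative only)."""
    merged = Adapter()
    with torch.no_grad():
        for name, param in merged.named_parameters():
            stacked = torch.stack([dict(a.named_parameters())[name] for a in adapters])
            param.copy_(stacked.mean(dim=0))
    return merged

pretrained = [Adapter() for _ in range(3)]    # stand-ins for task-specific adapters
merged = merge_adapters(pretrained)
print(merged(torch.randn(2, 32)).shape)       # torch.Size([2, 32])
```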

PAD-Net: An Efficient Framework for Dynamic Networks

1 code implementation • 10 Nov 2022 • Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, DaCheng Tao

The main contributions of our work are challenging the basic assumption behind dynamic networks and proposing a partially dynamic network, namely PAD-Net, which transforms redundant dynamic parameters into static ones.

Image Classification
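An illustrative sketch of a partially dynamic layer under an assumed formulation (the mask, hypernetwork, and shapes below are hypothetical, not the released PAD-Net code): a fixed binary mask keeps most parameters static and lets only a small subset be generated from the input.

```python
import torch
import torch.nn as nn

class PartiallyDynamicLinear(nn.Module):
    """Illustrative layer: only the masked subset of the weight is input-dependent."""
    def __init__(self, d_in, d_out, dynamic_ratio=0.2):
        super().__init__()
        self.static_weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        # Fixed binary mask marking the small fraction of parameters kept dynamic.
        self.register_buffer("dyn_mask", (torch.rand(d_out, d_in) < dynamic_ratio).float())
        # A tiny hypernetwork that generates the dynamic part from the input.
        self.hyper = nn.Linear(d_in, d_out * d_in)

    def forward(self, x):                        # x: (batch, d_in)
        dyn = self.hyper(x).view(-1, *self.static_weight.shape)
        weight = self.static_weight * (1 - self.dyn_mask) + dyn * self.dyn_mask
        return torch.einsum("boi,bi->bo", weight, x)

layer = PartiallyDynamicLinear(16, 8)
print(layer(torch.randn(4, 16)).shape)           # torch.Size([4, 8])
```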

SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters

1 code implementation • 9 Oct 2022 • Shwai He, Liang Ding, Daize Dong, Miao Zhang, DaCheng Tao

Adapter tuning, which freezes the pretrained language models (PLMs) and fine-tunes only a few extra modules, has become an appealing and efficient alternative to full-model fine-tuning.

Network Pruning
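As a small sketch of the pruning idea suggested by the title and the Network Pruning tag (magnitude pruning is used here as one plausible criterion, not necessarily the paper's), the adapter's weight matrices are sparsified so that fewer adapter parameters remain active.

```python
import torch
import torch.nn as nn

def magnitude_prune_(module, sparsity=0.5):
    """In-place magnitude pruning: zero out the smallest-magnitude weights."""
    for param in module.parameters():
        if param.dim() < 2:
            continue                              # skip biases
        k = int(param.numel() * sparsity)
        threshold = param.abs().flatten().kthvalue(k).values
        param.data.mul_((param.abs() > threshold).float())

# A toy bottleneck adapter, pruned before training.
adapter = nn.Sequential(nn.Linear(64, 8), nn.ReLU(), nn.Linear(8, 64))
magnitude_prune_(adapter, sparsity=0.5)

zeros = sum((p == 0).sum().item() for p in adapter.parameters() if p.dim() >= 2)
total = sum(p.numel() for p in adapter.parameters() if p.dim() >= 2)
print(f"adapter weight sparsity: {zeros / total:.2f}")   # roughly 0.50
```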

Vega-MT: The JD Explore Academy Translation System for WMT22

1 code implementation • 20 Sep 2022 • Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao

As for model size, we scale Transformer-Big up to an extremely large model with nearly 4.7 billion parameters to fully enhance the model capacity for our Vega-MT.

Data Augmentation • Machine Translation +1

SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution

no code implementations • 5 Apr 2022 • Shwai He, Chenbo Jiang, Daize Dong, Liang Ding

Dynamic convolution achieves better performance for efficient CNNs at the cost of a negligible increase in FLOPs.
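For context on the dynamic convolution mentioned above, here is a minimal sketch of the standard mechanism (a simplified, assumed variant, not SD-Conv itself): K parallel kernels are mixed with input-dependent attention scores before a single convolution, so the overhead sits mostly in the small attention branch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Minimal dynamic convolution: attention-weighted mixture of K kernels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, K=4):
        super().__init__()
        self.kernels = nn.Parameter(
            torch.randn(K, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        self.attn = nn.Linear(in_ch, K)              # tiny routing branch

    def forward(self, x):                            # x: (B, C, H, W)
        scores = F.softmax(self.attn(x.mean(dim=(2, 3))), dim=-1)   # (B, K)
        outs = []
        for b in range(x.size(0)):                   # per-sample kernel (loop kept for clarity)
            w = (scores[b][:, None, None, None, None] * self.kernels).sum(dim=0)
            outs.append(F.conv2d(x[b:b + 1], w, padding=1))
        return torch.cat(outs, dim=0)

layer = DynamicConv2d(8, 16)
print(layer(torch.randn(2, 8, 32, 32)).shape)        # torch.Size([2, 16, 32, 32])
```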

Multi-modal Attention Network for Stock Movements Prediction

1 code implementation • 27 Dec 2021 • Shwai He, Shi Gu

Traditionally, the prediction of future stock movements is based on the historical trading record.

Stock Prediction
