no code implementations • 9 Apr 2025 • Longguang Zhong, Fanqi Wan, ZiYi Yang, Guosheng Liang, Tianyuan Shi, Xiaojun Quan
Heterogeneous model fusion enhances the performance of LLMs by integrating the knowledge and capabilities of multiple structurally diverse models.
1 code implementation • 6 Mar 2025 • ZiYi Yang, Fanqi Wan, Longguang Zhong, Canbin Huang, Guosheng Liang, Xiaojun Quan
The FuseChat-3.0 training pipeline consists of two key stages: (1) supervised fine-tuning (SFT) to align the target and source model distributions, and (2) Direct Preference Optimization (DPO) to apply preferences from multiple source LLMs to fine-tune the target model.
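A minimal sketch of the two objectives behind such an SFT-then-DPO pipeline, in plain PyTorch; the shapes, the beta value, and the label convention are illustrative assumptions, not the actual FuseChat-3.0 configuration.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, labels):
    """Stage 1: next-token cross-entropy on target responses (SFT)."""
    # logits: (batch, seq, vocab); labels: (batch, seq) with -100 on prompt tokens
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
        ignore_index=-100,
    )

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Stage 2: Direct Preference Optimization on (chosen, rejected) pairs."""
    # Each argument is the summed log-probability of a full response, shape (batch,).
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```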
1 code implementation • 25 Feb 2025 • Shiping Gao, Fanqi Wan, Jiajian Guo, Xiaojun Quan, Qifan Wang
Instead of directly applying existing alignment techniques to SLMs, we propose to utilize a well-aligned teacher LLM to guide the alignment process for these models, thereby facilitating the transfer of the teacher's knowledge of human preferences to the student model.
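One common way to realize this kind of teacher-guided transfer is a distillation objective that mixes the student's supervised loss with a KL term toward the teacher's token distribution; the sketch below illustrates that general idea under assumed weights and temperature, and is not necessarily the paper's specific objective.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, kd_weight=0.5, temperature=1.0):
    vocab = student_logits.size(-1)
    # Supervised term on the aligned responses.
    ce = F.cross_entropy(student_logits.reshape(-1, vocab),
                         labels.reshape(-1), ignore_index=-100)
    # KL term pulling the student's distribution toward the teacher's.
    s = F.log_softmax(student_logits / temperature, dim=-1).reshape(-1, vocab)
    t = F.softmax(teacher_logits / temperature, dim=-1).reshape(-1, vocab)
    kl = F.kl_div(s, t, reduction="batchmean") * (temperature ** 2)
    return (1 - kd_weight) * ce + kd_weight * kl
```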
4 code implementations • 4 Dec 2024 • ZiYi Yang, Fanqi Wan, Longguang Zhong, Tianyuan Shi, Xiaojun Quan
To address distributional deviations between the source and target LLMs, WRPO introduces a progressive adaptation strategy that gradually shifts reliance on preferred examples from the target LLM to the source LLMs.
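One hedged reading of this progressive adaptation is an interpolation weight on the preferred-side implicit reward that ramps from the target LLM's responses toward the source LLMs' responses over training; the linear schedule, variable names, and beta below are assumptions, not the exact WRPO loss.

```python
import torch
import torch.nn.functional as F

def alpha_schedule(step, total_steps, alpha_max=1.0):
    """Linearly increase reliance on the source-LLM preferred responses."""
    return alpha_max * min(step / max(total_steps, 1), 1.0)

def wrpo_style_loss(logp_src_chosen, logp_tgt_chosen, logp_rejected,
                    ref_src_chosen, ref_tgt_chosen, ref_rejected,
                    alpha, beta=0.1):
    # Implicit rewards (policy log-prob minus reference log-prob), shape (batch,).
    r_src = logp_src_chosen - ref_src_chosen   # preferred response from a source LLM
    r_tgt = logp_tgt_chosen - ref_tgt_chosen   # preferred response from the target LLM
    r_rej = logp_rejected - ref_rejected       # dispreferred response
    chosen_reward = alpha * r_src + (1.0 - alpha) * r_tgt
    return -F.logsigmoid(beta * (chosen_reward - r_rej)).mean()
```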
3 code implementations • 15 Aug 2024 • Fanqi Wan, Longguang Zhong, ZiYi Yang, Ruijun Chen, Xiaojun Quan
In this work, we propose a new framework for the knowledge fusion of chat LLMs through two main stages, resulting in FuseChat.
no code implementations • 9 Aug 2024 • Tianyuan Shi, Fanqi Wan, Canbin Huang, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang
While fusing the capacities and advantages of various large language models (LLMs) offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select advantageous models during training.
no code implementations • 16 Jun 2024 • Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan
In this paper, we introduce self-evolution fine-tuning (SEFT) for policy optimization, with the aim of eliminating the need for annotated samples while retaining the stability and efficiency of SFT.
1 code implementation • 15 Jun 2024 • Longguang Zhong, Fanqi Wan, Ruijun Chen, Xiaojun Quan, Liangzhi Li
While various layer pruning methods have been developed based on this insight, they generally overlook the finer-grained redundancies within the layers themselves.
2 code implementations • 25 Feb 2024 • Fanqi Wan, ZiYi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi
Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training.
1 code implementation • 19 Jan 2024 • Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi
While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination.
3 code implementations • 19 Jan 2024 • Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi
In this paper, we introduce the notion of knowledge fusion for LLMs, aimed at combining the capabilities of existing LLMs and transferring them into a single LLM.
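A simplified sketch of such a fusion objective: the target model is continually trained on the text (causal LM loss) while its distribution is also pulled toward a fused distribution built from the source models' predictions. Token-vocabulary alignment across heterogeneous models is omitted here, and the MinCE-style fusion and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fuse_distributions(source_probs, source_ce):
    """Pick, per example, the source distribution with the lowest CE on the text."""
    # source_probs: (num_sources, batch, seq, vocab); source_ce: (num_sources, batch)
    best = source_ce.argmin(dim=0)                          # (batch,)
    batch_idx = torch.arange(best.size(0), device=best.device)
    return source_probs[best, batch_idx]                    # (batch, seq, vocab)

def fusion_loss(target_logits, fused_probs, labels, lm_weight=0.9):
    vocab = target_logits.size(-1)
    lm = F.cross_entropy(target_logits.reshape(-1, vocab),
                         labels.reshape(-1), ignore_index=-100)
    kl = F.kl_div(F.log_softmax(target_logits, dim=-1).reshape(-1, vocab),
                  fused_probs.reshape(-1, vocab), reduction="batchmean")
    return lm_weight * lm + (1 - lm_weight) * kl
```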
1 code implementation • 31 Oct 2023 • Tao Yang, Tianyuan Shi, Fanqi Wan, Xiaojun Quan, Qifan Wang, Bingzhe Wu, Jiaxiang Wu
Drawing inspiration from Psychological Questionnaires, which are carefully designed by psychologists to evaluate individual personality traits through a series of targeted items, we argue that these items can be regarded as a collection of well-structured chain-of-thought (CoT) processes.
3 code implementations • 13 Oct 2023 • Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi
Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks.
1 code implementation • 13 Oct 2023 • Weizhou Shen, Yingqi Gao, Canbin Huang, Fanqi Wan, Xiaojun Quan, Wei Bi
The results demonstrate that when combined with meta knowledge, the response generator can effectively leverage high-quality knowledge records from the retriever and enhance the quality of generated responses.
1 code implementation • 17 May 2023 • Jinghao Deng, Fanqi Wan, Tao Yang, Xiaojun Quan, Rui Wang
Contrastive learning has been widely studied in sentence representation learning.
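For context, a standard in-batch contrastive (InfoNCE) objective over sentence embeddings looks like the following; it is background for this setting rather than the paper's specific method.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor_emb, positive_emb, temperature=0.05):
    # anchor_emb, positive_emb: (batch, dim); positives sit on the diagonal.
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    logits = a @ p.t() / temperature                 # (batch, batch) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)
```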
1 code implementation • 17 May 2023 • Fanqi Wan, Weizhou Shen, Ke Yang, Xiaojun Quan, Wei Bi
Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses.
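A generic dense-retrieval sketch for this setting: embed the dialogue context and the knowledge-base records, then keep the top-k records by cosine similarity. The embeddings and k are placeholders rather than the paper's actual retriever.

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(context_emb, record_embs, k=3):
    # context_emb: (dim,); record_embs: (num_records, dim)
    scores = F.cosine_similarity(context_emb.unsqueeze(0), record_embs, dim=-1)
    topk = torch.topk(scores, k=min(k, record_embs.size(0)))
    return topk.indices, topk.values
```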