1 code implementation • 12 Mar 2024 • Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng
In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty.
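The following is an illustrative sketch only, not the paper's implementation: one common way to make the number of activated experts depend on input difficulty is a threshold-based (top-p style) router that keeps adding experts until enough routing probability mass is covered, so tokens with flatter (less confident) routing distributions activate more experts. The threshold, expert cap, and length distribution below are hypothetical.

```python
# Illustrative sketch (assumed top-p style dynamic routing, not the paper's exact method):
# activate a variable number of experts per token by accumulating routing probabilities
# until a confidence threshold is reached.
import torch
import torch.nn.functional as F

def dynamic_expert_selection(router_logits: torch.Tensor,
                             threshold: float = 0.5,
                             max_experts: int = 4):
    """router_logits: [num_tokens, num_experts]. Returns sorted expert ids, weights, and an active mask."""
    probs = F.softmax(router_logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep the smallest prefix of experts whose cumulative probability reaches the threshold;
    # the top-1 expert is always kept.
    keep = (cumulative - sorted_probs) < threshold
    keep[..., max_experts:] = False                         # cap the number of activated experts
    weights = torch.where(keep, sorted_probs, torch.zeros_like(sorted_probs))
    weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalise over selected experts
    return sorted_idx, weights, keep

# Example: 3 tokens routed over 8 experts; "harder" tokens end up with more active experts.
logits = torch.randn(3, 8)
_, _, active_mask = dynamic_expert_selection(logits)
print(active_mask.sum(dim=-1))
```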
1 code implementation • 27 Feb 2024 • Mingxu Tao, Quzhe Huang, Kun Xu, Liwei Chen, Yansong Feng, Dongyan Zhao
The advancement of Multimodal Large Language Models (MLLMs) has greatly accelerated the development of applications for jointly understanding text and images.
no code implementations • 26 Feb 2024 • Mingxu Tao, Dongyan Zhao, Yansong Feng
Open-ended question answering requires models to find appropriate evidence to form well-reasoned, comprehensive, and helpful answers.
1 code implementation • 14 Nov 2023 • Chen Zhang, Mingxu Tao, Quzhe Huang, Jiuheng Lin, Zhibin Chen, Yansong Feng
However, existing LLMs exhibit limited abilities in understanding low-resource languages, including the minority languages in China, due to a lack of training data.
1 code implementation • 24 May 2023 • Quzhe Huang, Mingxu Tao, Chen Zhang, Zhenwei An, Cong Jiang, Zhibin Chen, Zirui Wu, Yansong Feng
Specifically, we inject domain knowledge during the continual training stage and teach the model to learn professional skills using properly designed supervised fine-tuning tasks.
no code implementations • 8 May 2023 • Mingxu Tao, Yansong Feng, Dongyan Zhao
Since the embeddings of rear positions are updated fewer times than those of front positions, the rear position embeddings may not be properly trained.
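A minimal sketch of the phenomenon described above, under the assumption of learned absolute position embeddings and a hypothetical length distribution where most training sequences are short: a position embedding only receives a gradient update when a sequence is long enough to reach that position, so rear positions are updated far less often.

```python
# Illustrative sketch only: count how often each absolute position embedding
# would receive a gradient update under an assumed (hypothetical) length distribution.
import random

max_positions = 512
update_counts = [0] * max_positions

random.seed(0)
for _ in range(10_000):
    # Hypothetical: sequence lengths roughly exponential with mean ~80 tokens.
    seq_len = min(max_positions, int(random.expovariate(1 / 80)) + 1)
    for pos in range(seq_len):
        update_counts[pos] += 1

print("updates to position 0:  ", update_counts[0])    # updated by every sequence
print("updates to position 128:", update_counts[128])  # updated only by longer sequences
print("updates to position 511:", update_counts[511])  # rarely updated at all
```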
1 code implementation • 2 Mar 2023 • Mingxu Tao, Yansong Feng, Dongyan Zhao
Large pre-trained language models help achieve state-of-the-art results on a variety of natural language processing (NLP) tasks; nevertheless, they still suffer from forgetting when incrementally learning a sequence of tasks.