1 code implementation • 16 Apr 2025 • Changjiang Gao, Xu Huang, Wenhao Zhu, ShuJian Huang, Lei LI, Fei Yuan
In this paper, we explore the upper bound of harnessing multilingualism in reasoning tasks, suggesting that multilingual reasoning promises a significantly higher upper bound than English-only reasoning (by nearly 10 Acc@$k$ points) and does so robustly, tolerating variations in translation quality and language choice.
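To make the Acc@$k$ figure concrete, here is a minimal sketch of one plausible reading of the metric: a question counts as solved if any of $k$ sampled answers (e.g., drawn across languages or translations) matches the reference. The function and input names are hypothetical, not taken from the paper's code.

```python
def acc_at_k(answers_per_question, gold_answers, k):
    """answers_per_question: list of candidate-answer lists (>= k per question);
    gold_answers: list of reference answers."""
    solved = 0
    for answers, gold in zip(answers_per_question, gold_answers):
        # A question is solved if any of its first k candidates matches the gold answer.
        if any(a.strip() == gold.strip() for a in answers[:k]):
            solved += 1
    return solved / len(gold_answers)

# Toy example: 2 questions, k = 2 -> Acc@2 = 0.5
print(acc_at_k([["7", "8"], ["Paris", "Rome"]], ["8", "Berlin"], k=2))
```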
1 code implementation • 21 Feb 2025 • Wenhao Zhu, Pinzhen Chen, Hanxu Hu, ShuJian Huang, Fei Yuan, Jiajun Chen, Alexandra Birch
Research into modelling long context has focused on how to model position, and there has been little investigation into other important aspects of language modelling, such as instruction tuning.
1 code implementation • 11 Feb 2025 • Xu Huang, Wenhao Zhu, Hanxu Hu, Conghui He, Lei LI, ShuJian Huang, Fei Yuan
Previous multilingual benchmarks focus primarily on simple understanding tasks, but for large language models (LLMs) we emphasize proficiency in instruction following, reasoning, long-context understanding, code generation, and so on.
1 code implementation • 24 Jan 2025 • JIA YU, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, Zhenjiang Jin, Jiantao Qiu, Shasha Wang, Zhongying Tu, Dahua Lin, Yu Wang, Yu Qiao, Yanfeng Wang, Conghui He
This paper introduces the open-source dataset WanJuanSiLu, designed to provide high-quality training corpora for low-resource languages, thereby advancing the research and development of multilingual models.
1 code implementation • 18 Sep 2024 • Yi Lu, Jing Nathan Yan, Songlin Yang, Justin T. Chiu, Siyu Ren, Fei Yuan, Wenting Zhao, Zhiyong Wu, Alexander M. Rush
Broad textual understanding and in-context learning require language models that utilize full document contexts.
1 code implementation • 18 Sep 2024 • Qidan Zhu, Jing Li, Fei Yuan, Jiaojiao Fan, Quan Gan
The current bottleneck in continuous sign language recognition (CSLR) research is that most publicly available datasets are limited to laboratory environments or television program recordings, resulting in a single background with uniform lighting that deviates significantly from the diversity and complexity of real-life scenarios.
1 code implementation • 8 Jul 2024 • Yinquan Lu, Wenhao Zhu, Lei LI, Yu Qiao, Fei Yuan
Large Language Models (LLMs) demonstrate remarkable translation capabilities in high-resource language tasks, yet their performance in low-resource languages is hindered by insufficient multilingual data during pre-training.
1 code implementation • 27 May 2024 • Zixian Huang, Wenhao Zhu, Gong Cheng, Lei LI, Fei Yuan
To better utilize the reasoning and language-understanding capabilities built into LLMs, we propose a new method, MindMerger, which merges LLMs with external language-understanding capabilities from multilingual models to boost multilingual reasoning performance.
no code implementations • 2 May 2024 • Wenhao Zhu, ShuJian Huang, Fei Yuan, Cheng Chen, Jiajun Chen, Alexandra Birch
Bridging the significant gap between a large language model's English and non-English performance presents a great challenge.
2 code implementations • 21 Mar 2024 • Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, XiaoLi Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu
Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains.
1 code implementation • 29 Feb 2024 • Qidan Zhu, Jing Li, Fei Yuan, Quan Gan
Changes in facial expression, head movement, body movement, and gesture are important cues in sign language recognition, yet most current continuous sign language recognition (CSLR) methods focus on static images in video sequences at the frame-level feature extraction stage, ignoring the dynamic changes between images.
no code implementations • 5 Feb 2024 • Fei Yuan, Chang Ma, Shuai Yuan, Qiushi Sun, Lei LI
We further prove theoretically that KS-Lottery can find certified winning tickets in the embedding layer; fine-tuning only the found parameters is guaranteed to perform as well as full fine-tuning.
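As a rough illustration of what fine-tuning only the found parameters could look like in practice, the sketch below freezes an entire toy model and updates only a handful of embedding rows; the row-selection step, which is the actual KS-Lottery contribution, is replaced here by a hypothetical placeholder.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 2))
for p in model.parameters():
    p.requires_grad = False                    # freeze everything

embedding = model[0]
embedding.weight.requires_grad = True          # re-enable grads for the embedding table
winning_rows = torch.tensor([3, 17, 42])       # placeholder: rows a KS-Lottery-style test would select

mask = torch.zeros_like(embedding.weight)
mask[winning_rows] = 1.0
# Zero out gradients of every non-selected row after each backward pass,
# so only the "winning" embedding rows are actually fine-tuned.
embedding.weight.register_hook(lambda grad: grad * mask)

optimizer = torch.optim.AdamW([embedding.weight], lr=1e-4)
```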
1 code implementation • 15 Jan 2024 • Wenhao Zhu, ShuJian Huang, Fei Yuan, Shuaijie She, Jiajun Chen, Alexandra Birch
A typical solution is to translate instruction data into all languages of interest and then train on the resulting multilingual data, an approach known as translate-training.
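A minimal sketch of the translate-training recipe, assuming a generic machine-translation helper; `translate` and the commented-out `finetune` call are stand-ins, not APIs from the paper's released code.

```python
def translate(example: dict, target_lang: str) -> dict:
    # Stand-in for an MT system; in practice this would call a translation model or API.
    return {**example, "lang": target_lang}

def build_translate_training_set(english_data, target_langs):
    multilingual_data = list(english_data)          # keep the original English examples
    for lang in target_langs:
        multilingual_data.extend(translate(ex, lang) for ex in english_data)
    return multilingual_data

# The pooled data would then be used for standard instruction tuning, e.g.:
# finetune(model, build_translate_training_set(instruction_data_en, ["zh", "de", "sw"]))
```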
3 code implementations • 15 Nov 2023 • Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu
Although Large Language Models (LLMs) demonstrate a remarkable ability to process and generate human-like text, they have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of natural language (e.g., chemical molecular formulas).
1 code implementation • 15 Nov 2023 • Fei Yuan, Shuai Yuan, Zhiyong Wu, Lei LI
Large Language Models (LLMs) often show strong performance on English tasks while exhibiting limitations on other languages.
2 code implementations • 9 Aug 2023 • Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, ShuJian Huang, Lingpeng Kong, Jiajun Chen, Lei LI
We start by targeting individual languages, performing cross-lingual instruction tuning (CoIT) on LLaMA, i.e., tuning it with translation task data and cross-lingual general task data to obtain cross-lingual models (x-LLaMAs), and formulate the underlying scaling laws to investigate the advantages of using scalable translation data.
no code implementations • 24 May 2023 • Huang Bojun, Fei Yuan
From this perspective, training the neural network corresponds to a utility learning process.
no code implementations • 22 May 2023 • Bohong Wu, Fei Yuan, Hai Zhao, Lei LI, Jingjing Xu
Considering that encoder-based models have the advantages of efficient generation and self-correction, this paper explores methods to endow multilingual understanding models with generation abilities, yielding a unified model.
1 code implementation • 13 Mar 2023 • Qidan Zhu, Jing Li, Fei Yuan, Quan Gan
It is then combined with cross-resolution knowledge distillation and traditional knowledge distillation methods to form a CSLR model based on cross-resolution knowledge distillation (CRKD).
1 code implementation • 20 Dec 2022 • Fei Yuan, Yinquan Lu, Wenhao Zhu, Lingpeng Kong, Lei LI, Yu Qiao, Jingjing Xu
To address the need to learn representations for all languages in a unified space, we propose a novel, efficient training recipe, upon which we build an effective detachable model, Lego-MT.
1 code implementation • 7 Nov 2022 • Qidan Zhu, Jing Li, Fei Yuan, Quan Gan
The ultimate goal of continuous sign language recognition (CSLR) is to facilitate communication between deaf and hearing people, which requires a certain degree of real-time performance and deployability from the model.
1 code implementation • 3 Jul 2022 • Qidan Zhu, Jing Li, Fei Yuan, Quan Gan
The sparse frame-level features are fused with the features obtained from the two designed branches to reconstruct a dense frame-level feature sequence, and the connectionist temporal classification (CTC) loss is used for training and optimization after the temporal feature extraction part.
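For reference, the CTC objective mentioned above is the standard one available in PyTorch; the sketch below uses illustrative shapes (frames, batch, gloss vocabulary) rather than the paper's actual model.

```python
import torch
import torch.nn as nn

T, N, C = 50, 4, 30                              # frames, batch size, gloss vocabulary (index 0 = blank)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=-1)  # frame-level predictions
targets = torch.randint(1, C, (N, 12), dtype=torch.long)                  # gloss label sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```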
no code implementations • 8 Apr 2022 • Qidan Zhu, Jing Li, Fei Yuan, Quan Gan
The temporal feature extraction part first extracts temporal receptive-field features at different scales using the proposed multi-scale temporal block (MST-block) to improve temporal modeling capability, and then further encodes these multi-scale temporal features with a Transformer module to obtain more accurate temporal features.
no code implementations • 13 Mar 2021 • Fei Yuan, Longtu Zhang, Huang Bojun, Yaobo Liang
In most machine learning tasks, we evaluate a model $M$ on a given data population $S$ by measuring a population-level metric $F(S;M)$.
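As a concrete instance of a population-level metric $F(S; M)$, accuracy aggregates per-sample correctness over the whole population; the toy model and data below are stand-ins.

```python
import numpy as np

def F(S, M):
    """S: list of (x, y) pairs forming the data population; M: callable mapping x to a prediction."""
    per_sample = [float(M(x) == y) for x, y in S]   # per-sample correctness
    return np.mean(per_sample)                      # population-level aggregate

S = [(0, 0), (1, 1), (2, 0), (3, 1)]
M = lambda x: x % 2                                 # toy "model": predict parity
print(F(S, M))                                      # 1.0
```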
no code implementations • 11 Dec 2020 • Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang
When multiple teacher models are available for distillation, state-of-the-art methods assign each teacher model a fixed weight throughout the whole distillation process.
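A minimal sketch of the fixed-weight multi-teacher baseline this work contrasts with: the student matches a weighted mixture of teacher distributions, with one weight per teacher held constant for the entire run; the logits here are random stand-ins for real model outputs.

```python
import torch
import torch.nn.functional as F

temperature = 2.0
weights = [0.5, 0.3, 0.2]                                  # one fixed weight per teacher
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = [torch.randn(8, 10) for _ in weights]

student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
loss = sum(
    w * F.kl_div(student_log_probs,
                 F.softmax(t / temperature, dim=-1),
                 reduction="batchmean")
    for w, t in zip(weights, teacher_logits)
)
loss.backward()
```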
no code implementations • ACL 2020 • Fei Yuan, Linjun Shou, Xuanyu Bai, Ming Gong, Yaobo Liang, Nan Duan, Yan Fu, Daxin Jiang
Multilingual pre-trained models could leverage the training data from a rich source language (such as English) to improve performance on low resource languages.