1 code implementation • 20 Jan 2025 • Haoran Sun, Yekun Chai, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models (LLMs) with human preferences, but often at the cost of reduced output diversity.
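As a rough illustration of the diversity cost mentioned above, the snippet below computes a standard distinct-n metric over sampled completions; this is a generic measurement sketch, not the method proposed in the paper.

```python
from collections import Counter

def distinct_n(samples, n=2):
    """Fraction of unique n-grams across a set of sampled completions.

    A lower value after RLHF fine-tuning would indicate reduced output
    diversity (generic metric, not the paper's proposed method).
    """
    ngrams = Counter()
    total = 0
    for text in samples:
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
            total += 1
    return len(ngrams) / total if total else 0.0

# Example: compare completions sampled before and after alignment tuning.
before = ["the cat sat on the mat", "a dog ran in the park"]
after = ["the cat sat on the mat", "the cat sat on the rug"]
print(distinct_n(before), distinct_n(after))
```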
no code implementations • 7 Dec 2024 • Yilong Chen, Junyuan Shang, Zhengyu Zhang, Jiawei Sheng, Tingwen Liu, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang
MOHD offers a new perspective on model scaling, showcasing the potential of hidden-dimension sparsity to boost efficiency.
1 code implementation • 3 Oct 2024 • Yekun Chai, Haoran Sun, Huang Fang, Shuohuan Wang, Yu Sun, Hua Wu
However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it challenging for the model to discern which actions contributed to successful outcomes.
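To make the credit-assignment issue concrete, here is a minimal sketch of how a single delayed, sequence-level reward spreads over token-level actions via discounted returns; the discounting scheme is an illustrative assumption, not the paper's approach.

```python
def token_returns(num_tokens, final_reward, gamma=0.99):
    """Distribute one delayed sequence-level reward over token positions.

    With only a terminal reward, every token receives a nearly identical
    discounted copy of the same signal, so the model cannot tell which
    tokens actually helped (illustrative sketch only).
    """
    returns = [0.0] * num_tokens
    g = final_reward
    for t in reversed(range(num_tokens)):
        returns[t] = g
        g *= gamma
    return returns

print(token_returns(num_tokens=5, final_reward=1.0))
# [0.9606, 0.9703, 0.9801, 0.99, 1.0] -- almost uniform credit for all tokens
```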
no code implementations • 2 Oct 2024 • Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Yu Sun, Hua Wu, Sen Su
To ensure that each specialized expert in the MoE model works as expected, we select a small amount of seed data at which each expert excels and use it to pre-optimize the router.
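A minimal sketch of the router pre-optimization idea described above: train a small router on seed examples labeled with the expert that excels at them, before joint MoE training. The linear router, tensor shapes, and training loop are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Assumed setup: 4 experts, 32-dim input features, and a small set of seed
# examples labeled with the expert known to excel at them.
num_experts, dim = 4, 32
seed_x = torch.randn(64, dim)                       # seed inputs (placeholder data)
seed_expert = torch.randint(0, num_experts, (64,))  # which expert excels at each

router = nn.Linear(dim, num_experts)                # simple linear routing head
opt = torch.optim.Adam(router.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Pre-optimize the router so that, at the start of MoE training, each seed
# example is already dispatched to the expert that handles it best.
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(router(seed_x), seed_expert)
    loss.backward()
    opt.step()

print("routing accuracy on seed data:",
      (router(seed_x).argmax(-1) == seed_expert).float().mean().item())
```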
2 code implementations • 7 Aug 2024 • Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu
Large Language Models (LLMs) have ignited an innovative surge of AI applications, marking a new era of exciting possibilities equipped with extended context windows.
no code implementations • 3 Jun 2024 • Yilong Chen, Linhao Zhang, Junyuan Shang, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun
Large language models (LLMs) with billions of parameters demonstrate impressive performance.
no code implementations • 29 Apr 2024 • Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Weiran Xu, Yu Sun, Hua Wu
One or more fine-tuning phases have become a necessary step for large language models (LLMs) to unlock various capabilities, enabling LLMs to follow natural language instructions or align with human preferences.
1 code implementation • 16 Apr 2024 • Yekun Chai, Qingyi Liu, Jingwu Xiao, Shuohuan Wang, Yu Sun, Hua Wu
Our extensive evaluation across a wide range of benchmarks shows that incorporating both visual and textual data significantly improves the performance of pixel-based language models.
2 code implementations • 11 Apr 2024 • Yekun Chai, Qingyi Liu, Shuohuan Wang, Yu Sun, Qiwei Peng, Hua Wu
This paper presents GPTfluence, a novel approach that leverages a featurized simulation to assess the impact of training examples on the training dynamics of GPT models.
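To illustrate the general idea of a featurized simulation of training dynamics, the sketch below fits a linear model from hand-crafted training-example features to an observed change in evaluation loss; the features, toy data, and linear form are assumptions for illustration, not GPTfluence's actual parameterization.

```python
import numpy as np

# Assumed toy data: each row is a feature vector describing one training
# example (e.g., length, loss before the step, similarity to the test point),
# and the target is the measured change in eval loss after training on it.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
delta_eval_loss = features @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.05, size=200)

# Fit a simple least-squares "simulator": predict the influence of an example
# on the eval loss from its features alone (illustrative stand-in only).
w, *_ = np.linalg.lstsq(features, delta_eval_loss, rcond=None)

new_example = np.array([1.0, 0.3, -0.5])
print("predicted influence on eval loss:", new_example @ w)
```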
1 code implementation • 2 Oct 2023 • Lei Li, Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Ningyu Zhang, Hua Wu
We validate our approach across a wide range of domains, incorporating seven distinct external tools.
no code implementations • 9 Feb 2023 • Pengfei Zhu, Chao Pang, Yekun Chai, Lei Li, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu
To fill this gap, this paper introduces a text-to-waveform music generation model built on diffusion models.
1 code implementation • 13 Dec 2022 • Yekun Chai, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu
Extensive results show that ERNIE-Code outperforms previous multilingual LLMs for PL or NL across a wide range of end tasks of code intelligence, including multilingual code-to-text, text-to-code, code-to-code, and text-to-text generation.
no code implementations • SemEval (NAACL) 2022 • Yaqian Han, Yekun Chai, Shuohuan Wang, Yu Sun, Hongyi Huang, Guanghao Chen, Yitong Xu, Yang Yang
Detecting sarcasm and verbal irony in people's subjective statements is crucial to understanding their intended meanings, real sentiments, and positions in social scenarios.
no code implementations • SemEval (NAACL) 2022 • Junyuan Shang, Shuohuan Wang, Yu Sun, Yanjun Yu, Yue Zhou, Li Xiang, Guixiu Yang
This paper describes our winning system on SemEval 2022 Task 7: Identifying Plausible Clarifications of Implicit and Underspecified Phrases in Instructional Texts.
no code implementations • 9 Nov 2022 • Bin Shan, Yaqian Han, Weichong Yin, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
Recent cross-lingual cross-modal works attempt to extend Vision-Language Pre-training (VLP) models to non-English inputs and achieve impressive performance.
Ranked #1 on Multimodal Machine Translation on Multi30K
2 code implementations • 7 Nov 2022 • Xiaoran Fan, Chao Pang, Tian Yuan, He Bai, Renjie Zheng, Pengfei Zhu, Shuohuan Wang, Junkun Chen, Zeyu Chen, Liang Huang, Yu Sun, Hua Wu
In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing.
no code implementations • 21 Oct 2022 • Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
Derivative-free prompt learning, which requires only model inference to optimize prompts, has emerged as a lightweight alternative to prompt tuning.
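As a sketch of the derivative-free setting, the snippet below optimizes a continuous prompt vector with simple random search, scoring candidates only through a black-box forward pass; the toy scorer and the search loop are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_pass_score(prompt_vec):
    """Stand-in for a frozen LLM's task metric given a soft prompt.

    In practice this would run model inference only (no gradients);
    here it is a toy quadratic so the example stays self-contained.
    """
    target = np.full_like(prompt_vec, 0.3)
    return -np.sum((prompt_vec - target) ** 2)

# Derivative-free search: perturb the current prompt, keep improvements.
prompt = np.zeros(16)
best = forward_pass_score(prompt)
for _ in range(500):
    candidate = prompt + rng.normal(scale=0.05, size=prompt.shape)
    score = forward_pass_score(candidate)
    if score > best:
        prompt, best = candidate, score

print("best score found without any gradient:", best)
```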
1 code implementation • 19 May 2022 • Yang Xiang, Zhihua Wu, Weibao Gong, Siyu Ding, Xianjie Mo, Yuang Liu, Shuohuan Wang, Peng Liu, Yongshuai Hou, Long Li, Bin Wang, Shaohuai Shi, Yaqian Han, Yue Yu, Ge Li, Yu Sun, Yanjun Ma, Dianhai Yu
We took natural language processing (NLP) as an example to show how Nebula-I works in different training phases, including: a) pre-training a multilingual language model using two remote clusters; and b) fine-tuning a machine translation model using knowledge distilled from pre-trained models, which together cover the most popular paradigm of recent deep learning.
3 code implementations • 23 Dec 2021 • Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, Weibao Gong, Shikun Feng, Junyuan Shang, Yanbin Zhao, Chao Pang, Jiaxiang Liu, Xuyi Chen, Yuxiang Lu, Weixin Liu, Xi Wang, Yangfan Bai, Qiuliang Chen, Li Zhao, Shiyong Li, Peng Sun, Dianhai Yu, Yanjun Ma, Hao Tian, Hua Wu, Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang
A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge-enhanced models; it was used to train a model with 10 billion parameters.
no code implementations • SEMEVAL 2021 • Chao Pang, Xiaoran Fan, Weiyue Su, Xuyi Chen, Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Shikun Feng, Yu Sun
This paper describes the system we entered in Task 7 of SemEval-2021: Detecting and Rating Humor and Offense.
2 code implementations • 5 Jul 2021 • Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Xuan Ouyang, Dianhai Yu, Hao Tian, Hua Wu, Haifeng Wang
We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph.
2 code implementations • EMNLP 2021 • Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
In this paper, we propose ERNIE-M, a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance.
Ranked #14 on Zero-Shot Cross-Lingual Transfer on XTREME
3 code implementations • ACL 2021 • Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
Transformers are not suited for processing long documents, due to their quadratically increasing memory and time consumption.
Evaluated on Text Classification on IMDb
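The quadratic cost noted in this entry is easy to see from the size of the attention score matrix alone; the snippet below simply tabulates that growth as a back-of-the-envelope illustration, independent of the paper's method.

```python
# Memory for the n x n attention score matrix alone (fp16, one head, one layer).
for n in [1_000, 4_000, 16_000, 64_000]:
    bytes_needed = n * n * 2  # 2 bytes per fp16 score
    print(f"sequence length {n:>6}: {bytes_needed / 2**30:.2f} GiB per head per layer")
```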
no code implementations • SEMEVAL 2020 • Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun
In Sub-task A - Offensive Language Identification, we ranked first in terms of average F1 scores in all languages.
no code implementations • SEMEVAL 2020 • Zhengjie Huang, Shikun Feng, Weiyue Su, Xuyi Chen, Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun
This paper describes the system designed by ERNIE Team which achieved the first place in SemEval-2020 Task 10: Emphasis Selection For Written Text in Visual Media.
no code implementations • SEMEVAL 2020 • Jiaxiang Liu, Xuyi Chen, Shikun Feng, Shuohuan Wang, Xuan Ouyang, Yu Sun, Zhengjie Huang, Weiyue Su
Code switching is a linguistic phenomenon that may occur within a multilingual setting where speakers share more than one language.
3 code implementations • 29 Jul 2019 • Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang
Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural language processing.
Ranked #1 on Chinese Sentence Pair Classification on LCQMC Dev
Tasks: Chinese Named Entity Recognition, Chinese Reading Comprehension, +8
no code implementations • SEMEVAL 2019 • Jiaxiang Liu, Shuohuan Wang, Yu Sun
This paper describes the system we entered in Task 9 of SemEval-2019: the task focuses on suggestion mining and aims to classify given sentences into suggestion and non-suggestion classes in domain-specific and cross-domain training settings, respectively.
19 code implementations • 19 Apr 2019 • Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu
We present a novel language representation model enhanced by knowledge called ERNIE (Enhanced Representation through kNowledge IntEgration).
Ranked #3 on Natural Language Inference on XNLI Chinese Dev
Tasks: Chinese Named Entity Recognition, Chinese Sentence Pair Classification, +8
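The knowledge integration behind ERNIE rests on masking whole entities and phrases rather than isolated subword tokens. Below is a minimal sketch of entity-level masking over a tokenized sentence; the example spans, mask token, and masking probability are assumptions for illustration, not the released implementation.

```python
import random

MASK = "[MASK]"

def entity_level_mask(tokens, entity_spans, mask_prob=0.5, seed=0):
    """Mask whole entity spans instead of independent tokens.

    ERNIE-style knowledge masking hides complete units such as named
    entities, forcing the model to recover them from surrounding context.
    (Sketch with assumed spans and token format, not the released code.)
    """
    rng = random.Random(seed)
    masked = list(tokens)
    for start, end in entity_spans:
        if rng.random() < mask_prob:
            for i in range(start, end):
                masked[i] = MASK
    return masked

tokens = ["harry", "potter", "was", "written", "by", "j", "k", "rowling"]
entity_spans = [(0, 2), (5, 8)]  # "harry potter", "j k rowling" (assumed spans)
print(entity_level_mask(tokens, entity_spans))
```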