no code implementations • 30 Nov 2023 • Zhebin Zhang, Xinyu Zhang, Yuanhang Ren, Saijiang Shi, Meng Han, Yongkang Wu, Ruofei Lai, Zhao Cao
In this paper, we propose an Induction-Augmented Generation (IAG) framework that utilizes inductive knowledge along with the retrieved documents for implicit reasoning.
1 code implementation • 2 Oct 2023 • Zijun Wu, Yongkang Wu, Lili Mou
Prompt tuning in natural language processing (NLP) has become an increasingly popular method for adapting large language models to specific tasks.
1 code implementation • 10 Sep 2023 • Zijun Wu, Anup Anand Deshmukh, Yongkang Wu, Jimmy Lin, Lili Mou
Our approach involves a two-stage training process: pretraining with an unsupervised parser and finetuning on downstream NLP tasks.
no code implementations • 14 Sep 2022 • Jiawen Wu, Xinyu Zhang, Yutao Zhu, Zheng Liu, Zikai Guo, Zhaoye Fei, Ruofei Lai, Yongkang Wu, Zhao Cao, Zhicheng Dou
Hyperlinks, which are commonly used in Web pages, have been leveraged for designing pre-training objectives.
no code implementations • COLING 2022 • Zhaoye Fei, Yu Tian, Yongkang Wu, Xinyu Zhang, Yutao Zhu, Zheng Liu, Jiawen Wu, Dejiang Kong, Ruofei Lai, Zhao Cao, Zhicheng Dou, Xipeng Qiu
Our experiments on 13 benchmark datasets across five natural language understanding tasks demonstrate the superiority of our method.
no code implementations • 14 Oct 2021 • Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zikai Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao
To increase the number of activated experts without increasing computational cost, we propose SAM (Switch and Mixture) routing, an efficient hierarchical routing mechanism that activates multiple experts on the same device (GPU).
no code implementations • 14 Sep 2021 • Ruizhi Pu, Xinyu Zhang, Ruofei Lai, Zikai Guo, Yinxia Zhang, Hao Jiang, Yongkang Wu, Yantao Jia, Zhicheng Dou, Zhao Cao
Finally, the supervisory signal in the rear compressor is computed based on conditional probability, and can thus control sample dynamics and further enhance model performance.