1 code implementation • 8 Nov 2022 • Wenhao Zhu, ShuJian Huang, Yunzhe Lv, Xin Zheng, Jiajun Chen
kNN-MT presents a new paradigm for domain adaptation by building an external datastore, which usually saves all target language token occurrences in the parallel corpus.
1 code implementation • 27 Feb 2023 • Wenhao Zhu, Qianfeng Zhao, Yunzhe Lv, ShuJian Huang, Siheng Zhao, Sizhe Liu, Jiajun Chen
Augmenting the base neural model with a token-level symbolic datastore is a novel generation paradigm and has achieved promising results in machine translation (MT).
1 code implementation • 15 Jan 2024 • Wenhao Zhu, ShuJian Huang, Fei Yuan, Shuaijie She, Jiajun Chen, Alexandra Birch
A typical solution is to translate instruction data into all languages of interest, and then train on the resulting multilingual data, which is called translate-training.
2 code implementations • 10 Apr 2023 • Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, ShuJian Huang, Lingpeng Kong, Jiajun Chen, Lei LI
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
2 code implementations • 9 Aug 2023 • Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, ShuJian Huang, Lingpeng Kong, Jiajun Chen, Lei LI
We start from targeting individual languages by performing cross-lingual instruction-tuning (CoIT) on LLaMA, i.e., tuning it with translation task data and cross-lingual general task data to obtain cross-lingual models (x-LLaMAs), and formulate underlying scaling laws to investigate the advantages of using scalable translation data.
1 code implementation • 12 Jan 2024 • Shuaijie She, Wei Zou, ShuJian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen
To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language.
1 code implementation • 10 Jun 2023 • Wenhao Zhu, Jingjing Xu, ShuJian Huang, Lingpeng Kong, Jiajun Chen
We propose an effective training framework, INK, to directly smooth the representation space by adjusting the representations of kNN neighbors with a small number of new parameters.
1 code implementation • 2 Aug 2023 • Kanzhi Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zixuan Zhu, Jianbing Zhang
Considering that Vision-Language Pre-Training (VLP) models acquire a large amount of such knowledge from large-scale web-harvested data, it is promising to utilize the generalizability of VLP models to incorporate knowledge into image descriptions.
1 code implementation • 20 Dec 2022 • Fei Yuan, Yinquan Lu, Wenhao Zhu, Lingpeng Kong, Lei LI, Yu Qiao, Jingjing Xu
To address the needs of learning representations for all languages in a unified space, we propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT.
1 code implementation • LREC 2022 • Wenhao Zhu, ShuJian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen
Previous research on adapting a general neural machine translation (NMT) model to a specific domain usually neglects the diversity of translation within the same domain, which is a core problem for domain adaptation in real-world scenarios.
no code implementations • 16 Nov 2020 • Guannan Hu, Wu Zhang, Hu Ding, Wenhao Zhu
Catastrophic forgetting in continual learning is a common destructive phenomenon in gradient-based neural networks that learn sequential tasks, and it is very different from forgetting in humans, who can learn and accumulate knowledge throughout their whole lives.
no code implementations • 4 May 2023 • Wenhao Zhu, Tianyu Wen, Guojie Song, Xiaojun Ma, Liang Wang
Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning.
no code implementations • 23 May 2023 • Wenhao Zhu, Tianyu Wen, Guojie Song, Liang Wang, Bo Zheng
Graph Transformer has recently received wide attention in the research community with its outstanding performance, yet its structural expressive power has not been well analyzed.
no code implementations • 21 Jun 2023 • Chao Yu, Wenhao Zhu, Chaoming Liu, XiaoYu Zhang, Qiuhong Zhai
This indicates that different downstream tasks have different levels of sensitivity to sentence components.
no code implementations • 7 Oct 2023 • Zhiying Ma, Jie Hou, Wenhao Zhu, Yaxin Peng, Ying Li
It establishes a temporal iteration scheme based on physical model-driven neural networks which effectively combines deep neural networks (DNNs) with interpolation approximation of fractional derivatives.
no code implementations • 8 Apr 2024 • Qi Li, Xianjun Zeng, Shuliang Wang, Wenhao Zhu, Shijie Ruan, Zhimeng Yuan
Missing datasets, in which some objects have missing values in certain dimensions, are prevalent in the real world.