1 code implementation • 15 Jul 2024 • Wenhao Zhu, Sizhe Liu, ShuJian Huang, Shuaijie She, Chris Wendler, Jiajun Chen
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models (LLMs) by contrasting the prediction probabilities of an early-exit output (amateur logits) and the final output (expert logits).
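For intuition, a minimal sketch of the contrastive step in Python/PyTorch, assuming per-layer logits are already available from the model; the function name contrast_logits and the plausibility threshold alpha are illustrative, not the paper's exact API:

import torch
import torch.nn.functional as F

def contrast_logits(expert_logits, amateur_logits, alpha=0.1):
    # Contrast the mature (final-layer) and premature (early-exit) distributions:
    # score = log p_expert - log p_amateur, keeping only tokens whose expert
    # probability is within a factor alpha of the most likely token.
    log_expert = F.log_softmax(expert_logits, dim=-1)
    log_amateur = F.log_softmax(amateur_logits, dim=-1)
    cutoff = torch.log(torch.tensor(alpha)) + log_expert.max(dim=-1, keepdim=True).values
    scores = log_expert - log_amateur
    return scores.masked_fill(log_expert < cutoff, float("-inf"))

The next token is then selected from these contrasted scores rather than from the raw expert distribution.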
1 code implementation • 8 Jul 2024 • Yinquan Lu, Wenhao Zhu, Lei LI, Yu Qiao, Fei Yuan
Large Language Models (LLMs) demonstrate remarkable translation capabilities in high-resource language tasks, yet their performance in low-resource languages is hindered by insufficient multilingual data during pre-training.
1 code implementation • 27 May 2024 • Zixian Huang, Wenhao Zhu, Gong Cheng, Lei LI, Fei Yuan
In order to better utilize the built-in reasoning capability of LLMs together with strong language understanding, we propose a new method, MindMerger, which merges LLMs with the external language understanding capabilities of multilingual models to boost multilingual reasoning performance.
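As one way to picture the merging step, a minimal sketch under the assumption that the multilingual encoder's outputs are mapped into the LLM's embedding space and prepended to its token embeddings; module names and dimensions are illustrative guesses, not the authors' implementation:

import torch
import torch.nn as nn

class Merger(nn.Module):
    def __init__(self, enc_dim, llm_dim):
        super().__init__()
        # Small mapping network bridging the multilingual encoder space
        # to the LLM embedding space.
        self.proj = nn.Sequential(nn.Linear(enc_dim, llm_dim),
                                  nn.GELU(),
                                  nn.Linear(llm_dim, llm_dim))

    def forward(self, enc_states, llm_token_embeds):
        # enc_states: (B, S, enc_dim) from the multilingual encoder
        # llm_token_embeds: (B, T, llm_dim) from the LLM's embedding layer
        mapped = self.proj(enc_states)                       # (B, S, llm_dim)
        return torch.cat([mapped, llm_token_embeds], dim=1)  # sequence fed to the LLM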
2 code implementations • 22 May 2024 • Shimao Zhang, Changjiang Gao, Wenhao Zhu, Jiajun Chen, Xin Huang, Xue Han, Junlan Feng, Chao Deng, ShuJian Huang
Recently, Large Language Models (LLMs) have shown impressive language capabilities.
no code implementations • 6 May 2024 • Wenhao Zhu, Guojie Song, Liang Wang, Shaoguo Liu
Graph Transformers (GTs) have significantly advanced the field of graph representation learning by overcoming the limitations of message-passing graph neural networks (GNNs) and demonstrating promising performance and expressive power.
no code implementations • 2 May 2024 • Wenhao Zhu, ShuJian Huang, Fei Yuan, Cheng Chen, Jiajun Chen, Alexandra Birch
Bridging the significant gap between the English and non-English performance of large language models presents a great challenge.
no code implementations • 8 Apr 2024 • Qi Li, Xianjun Zeng, Shuliang Wang, Wenhao Zhu, Shijie Ruan, Zhimeng Yuan
Missing datasets, in which some objects have missing values in certain dimensions, are prevalent in the real world.
1 code implementation • 15 Jan 2024 • Wenhao Zhu, ShuJian Huang, Fei Yuan, Shuaijie She, Jiajun Chen, Alexandra Birch
A typical solution is to translate instruction data into all languages of interest, and then train on the resulting multilingual data, which is called translate-training.
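A hedged sketch of the translate-training recipe in Python; translate() is a hypothetical placeholder for any MT system, and the field names are illustrative:

def translate(text, target_lang):
    # Hypothetical helper: plug in an actual MT system or API here.
    raise NotImplementedError

def build_translate_training_data(english_examples, target_langs):
    multilingual = list(english_examples)  # keep the original English data
    for lang in target_langs:
        for ex in english_examples:
            multilingual.append({
                "instruction": translate(ex["instruction"], lang),
                "input": translate(ex["input"], lang) if ex["input"] else "",
                "output": translate(ex["output"], lang),
            })
    return multilingual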
1 code implementation • 12 Jan 2024 • Shuaijie She, Wei Zou, ShuJian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen
To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language.
no code implementations • 7 Oct 2023 • Zhiying Ma, Jie Hou, Wenhao Zhu, Yaxin Peng, Ying Li
It establishes a temporal iteration scheme based on physical-model-driven neural networks, which effectively combines deep neural networks (DNNs) with an interpolation approximation of fractional derivatives.
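For reference, one standard interpolation approximation of a fractional derivative is the L1 scheme for the Caputo derivative of order α ∈ (0, 1) on a uniform grid t_n = nΔt (a textbook discretization, not necessarily the exact one used in this work):

\[
{}^{C}\!D_t^{\alpha} u(t_n) \approx \frac{\Delta t^{-\alpha}}{\Gamma(2-\alpha)} \sum_{k=0}^{n-1} b_k \bigl[ u(t_{n-k}) - u(t_{n-k-1}) \bigr], \qquad b_k = (k+1)^{1-\alpha} - k^{1-\alpha}.
\]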
2 code implementations • 9 Aug 2023 • Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, ShuJian Huang, Lingpeng Kong, Jiajun Chen, Lei LI
We start by targeting individual languages: we perform cross-lingual instruction-tuning (CoIT) on LLaMA, i.e., tuning it with translation task data and cross-lingual general task data to obtain cross-lingual models (x-LLaMAs), and we formulate underlying scaling laws to investigate the advantages of using scalable translation data.
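To make the data mixture concrete, a minimal sketch in Python; the prompt template and field names are illustrative assumptions, not the paper's exact format:

def format_translation(src_text, tgt_text, src_lang, tgt_lang):
    # Translation task data rendered as an instruction/response pair.
    return {
        "instruction": f"Translate the following {src_lang} text into {tgt_lang}.",
        "input": src_text,
        "output": tgt_text,
    }

def format_general_task(instruction, response):
    # Cross-lingual general task data: instruction and response involving the
    # target language (one possible reading of the setup).
    return {"instruction": instruction, "input": "", "output": response}

def build_coit_mixture(translation_pairs, general_examples):
    data = [format_translation(*p) for p in translation_pairs]
    data += [format_general_task(*g) for g in general_examples]
    return data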
1 code implementation • 2 Aug 2023 • Kanzhi Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zixuan Zhu, Jianbing Zhang
Considering that Vision-Language Pre-Training (VLP) models acquire a massive amount of such knowledge from large-scale web-harvested data, it is promising to utilize the generalizability of VLP models to incorporate knowledge into image descriptions.
no code implementations • 21 Jun 2023 • Chao Yu, Wenhao Zhu, Chaoming Liu, XiaoYu Zhang, Qiuhong Zhai
This indicates that different downstream tasks have different levels of sensitivity to sentence components.
1 code implementation • 10 Jun 2023 • Wenhao Zhu, Jingjing Xu, ShuJian Huang, Lingpeng Kong, Jiajun Chen
We propose an effective training framework, INK, which directly smooths the representation space by adjusting the representations of kNN neighbors with a small number of new parameters.
no code implementations • 23 May 2023 • Wenhao Zhu, Tianyu Wen, Guojie Song, Liang Wang, Bo Zheng
Graph Transformer has recently received wide attention in the research community with its outstanding performance, yet its structural expressive power has not been well analyzed.
no code implementations • 4 May 2023 • Wenhao Zhu, Tianyu Wen, Guojie Song, Xiaojun Ma, Liang Wang
Graph Transformer is gaining increasing attention in the field of machine learning and has demonstrated state-of-the-art performance on benchmarks for graph representation learning.
2 code implementations • 10 Apr 2023 • Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, ShuJian Huang, Lingpeng Kong, Jiajun Chen, Lei LI
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
1 code implementation • 27 Feb 2023 • Wenhao Zhu, Qianfeng Zhao, Yunzhe Lv, ShuJian Huang, Siheng Zhao, Sizhe Liu, Jiajun Chen
Augmenting the base neural model with a token-level symbolic datastore is a novel generation paradigm and has achieved promising results in machine translation (MT).
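A minimal sketch of this paradigm in Python/PyTorch, assuming access to the decoder hidden state at each target position: the datastore maps hidden states (keys) to the next target token (values), and the retrieval distribution is later interpolated with the base model's distribution. Function names and the temperature value are illustrative:

import torch

def build_datastore(hidden_states, target_tokens):
    # hidden_states: (N, d) decoder states collected by force-decoding a parallel corpus
    # target_tokens: (N,) token ids (LongTensor) aligned with those states
    return hidden_states, target_tokens  # keys, values

def knn_distribution(query, keys, values, vocab_size, k=8, temperature=10.0):
    # Retrieve the k nearest keys and turn their values into a distribution over the vocabulary.
    dists = torch.cdist(query.unsqueeze(0), keys).squeeze(0)   # (N,) distances to all keys
    knn_dists, knn_idx = dists.topk(k, largest=False)
    weights = torch.softmax(-knn_dists / temperature, dim=-1)  # closer neighbors weigh more
    p_knn = torch.zeros(vocab_size)
    p_knn.scatter_add_(0, values[knn_idx], weights)
    return p_knn  # typically combined as p = lam * p_knn + (1 - lam) * p_model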
1 code implementation • 20 Dec 2022 • Fei Yuan, Yinquan Lu, Wenhao Zhu, Lingpeng Kong, Lei LI, Yu Qiao, Jingjing Xu
To address the need to learn representations for all languages in a unified space, we propose a novel, efficient training recipe, upon which we build an effective detachable model, Lego-MT.
1 code implementation • 8 Nov 2022 • Wenhao Zhu, ShuJian Huang, Yunzhe Lv, Xin Zheng, Jiajun Chen
kNN-MT presents a new paradigm for domain adaptation by building an external datastore, which usually stores all target-language token occurrences in the parallel corpus.
1 code implementation • LREC 2022 • Wenhao Zhu, ShuJian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen
Previous research on adapting a general neural machine translation (NMT) model to a specific domain usually neglects the diversity of translations within the same domain, which is a core problem for domain adaptation in real-world scenarios.
no code implementations • 16 Nov 2020 • Guannan Hu, Wu Zhang, Hu Ding, Wenhao Zhu
Catastrophic forgetting in continual learning is a common destructive phenomenon in gradient-based neural networks that learn sequential tasks, and it is quite different from forgetting in humans, who can learn and accumulate knowledge throughout their lives.