1 code implementation • Findings (NAACL) 2022 • Huan Lin, Baosong Yang, Liang Yao, Dayiheng Liu, Haibo Zhang, Jun Xie, Min Zhang, Jinsong Su
Diverse NMT aims at generating multiple diverse yet faithful translations given a source sentence.
no code implementations • WMT (EMNLP) 2021 • Yu Wan, Dayiheng Liu, Baosong Yang, Tianchi Bi, Haibo Zhang, Boxing Chen, Weihua Luo, Derek F. Wong, Lidia S. Chao
After investigating recent advances in trainable metrics, we summarize several aspects of vital importance for obtaining a well-performing metric model: 1) jointly leveraging the advantages of the source-included model and the reference-only model, 2) continually pre-training the model on massive synthetic data pairs, and 3) fine-tuning the model with a data denoising strategy.
no code implementations • Findings (ACL) 2022 • Kexin Yang, Dayiheng Liu, Wenqiang Lei, Baosong Yang, Haibo Zhang, Xue Zhao, Wenqing Yao, Boxing Chen
Under GCPG, we reconstruct the commonly adopted lexical condition (i.e., Keywords) and syntactic conditions (i.e., Part-Of-Speech sequence, Constituent Tree, Masked Template, and Sentential Exemplar) and study combinations of the two types.
1 code implementation • Findings (ACL) 2022 • Xingzhang Ren, Baosong Yang, Dayiheng Liu, Haibo Zhang, Xiaoyu Lv, Liang Yao, Jun Xie
Recognizing the language of ambiguous texts has become a main challenge in language identification (LID).
no code implementations • CL (ACL) 2022 • Yu Wan, Baosong Yang, Derek Fai Wong, Lidia Sam Chao, Liang Yao, Haibo Zhang, Boxing Chen
After empirically investigating the rationale behind this, we summarize two challenges in NMT for STs, associated with the above translation error types respectively: (1) the imbalanced length distribution in the training set intensifies model inference calibration over STs, leading to more over-translation cases on STs; and (2) the lack of contextual information forces NMT to have higher data uncertainty on short sentences, and thus the NMT model is troubled by considerable mistranslation errors.
no code implementations • 10 Jan 2025 • Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou
Previous models for voice interactions are categorized as either native or aligned.
6 code implementations • 19 Dec 2024 • Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, TianHao Li, Tianyi Tang, Tingyu Xia, Xingzhang Ren, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yu Wan, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zihan Qiu
In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio.
Ranked #7 on GPQA
no code implementations • 14 Nov 2024 • Yidan Zhang, Boyi Deng, Yu Wan, Baosong Yang, Haoran Wei, Bowen Yu, Junyang Lin, Fei Huang, Jingren Zhou
Recent advancements in large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
1 code implementation • 9 Nov 2024 • Yikang Liu, Yeting Shen, Hongao Zhu, Lilong Xu, Zhiheng Qian, Siyuan Song, Kejia Zhang, Jialong Tang, Pei Zhang, Baosong Yang, Rui Wang, Hai Hu
Whether and how language models (LMs) acquire the syntax of natural languages has been widely evaluated under the minimal pair paradigm.
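The minimal pair paradigm mentioned here is typically scored by checking whether the LM assigns higher probability to the grammatical member of each pair. A minimal sketch under that assumption, using a generic Hugging Face causal LM (the model name is only a placeholder):

```python
# Minimal-pair evaluation sketch: the LM "passes" a pair if it assigns a
# higher log-probability to the grammatical sentence than to the
# ungrammatical one. Model name is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of token log-probabilities under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Shift so that each position predicts the next token.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    target = ids[:, 1:]
    return log_probs.gather(-1, target.unsqueeze(-1)).sum().item()

good = "The keys to the cabinet are on the table."
bad = "The keys to the cabinet is on the table."
print("pass" if sentence_logprob(good) > sentence_logprob(bad) else "fail")
```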
1 code implementation • 29 Oct 2024 • Suhang Wu, Jialong Tang, Baosong Yang, Ante Wang, Kaidi Jia, Jiawei Yu, Junfeng Yao, Jinsong Su
Experimental results reveal linguistic inequalities: 1) high-resource languages stand out in Monolingual Knowledge Extraction; 2) Indo-European languages lead RALMs to provide answers directly from documents, alleviating the challenge of expressing answers across languages; 3) English benefits from RALMs' selection bias and speaks louder in multilingual knowledge selection.
1 code implementation • 17 Oct 2024 • Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Rui Wang
LLM self-evaluation relies on the LLM's own ability to estimate response correctness, which can greatly improve its deployment reliability.
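One common way to elicit such self-evaluation (purely illustrative here, not necessarily the paper's method) is to ask the model whether its own answer is correct and read off the probability it places on a "Yes" token; the model name below is a placeholder:

```python
# Illustrative self-evaluation baseline: the model judges its own answer and
# the probability mass on "Yes" serves as a confidence score.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def self_eval_confidence(question: str, answer: str) -> float:
    prompt = (f"Question: {question}\nProposed answer: {answer}\n"
              "Is the proposed answer correct? Answer Yes or No.\nAnswer:")
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_logits = model(ids).logits[0, -1]
    probs = torch.softmax(next_logits, dim=-1)
    yes_id = tokenizer(" Yes").input_ids[0]
    no_id = tokenizer(" No").input_ids[0]
    return (probs[yes_id] / (probs[yes_id] + probs[no_id])).item()

print(self_eval_confidence("What is 2 + 2?", "4"))
```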
no code implementations • 3 Oct 2024 • Tianxiang Hu, Pei Zhang, Baosong Yang, Jun Xie, Derek F. Wong, Rui Wang
Achieving consistent high-quality machine translation (MT) across diverse domains remains a significant challenge, primarily due to the limited and imbalanced parallel training data available in various domains.
no code implementations • 29 Jul 2024 • Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, Jialong Tang, Huan Lin, Baosong Yang, Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang
We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained with a native 8192-token context (longer than the 512 tokens of previous multilingual encoders).
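For context, a generic sketch of the rotary position embedding (RoPE) mechanism mentioned above; this is a common half-split variant, not the model's actual implementation:

```python
# Minimal RoPE sketch: query/key vectors are rotated by position-dependent
# angles so relative positions are encoded directly in the dot product.
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (batch, seq_len, dim) with even dim."""
    b, n, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(n, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()            # (n, half)
    x1, x2 = x[..., :half], x[..., half:]
    # Pairwise 2-D rotation of (x1, x2) by the position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(2, 8192, 64)
print(apply_rope(q).shape)  # torch.Size([2, 8192, 64])
```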
6 code implementations • 15 Jul 2024 • An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, TianHao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, Zhihao Fan
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.
Ranked #3 on Arithmetic Reasoning on GSM8K (using extra training data)
no code implementations • 25 Jun 2024 • TianHao Li, Shangjie Li, Binbin Xie, Deyi Xiong, Baosong Yang
The advent of large language models (LLMs) has predominantly catered to high-resource languages, leaving a disparity in performance for low-resource languages.
no code implementations • 17 Jun 2024 • Zhipeng Qian, Pei Zhang, Baosong Yang, Kai Fan, Yiwei Ma, Derek F. Wong, Xiaoshuai Sun, Rongrong Ji
This paper introduces AnyTrans, an all-encompassing framework for the task of Translating AnyText in the Image (TATI), which includes multilingual text translation and text fusion within images.
1 code implementation • 10 Jun 2024 • Yan Gao, Zhiwei Cao, Zhongjian Miao, Baosong Yang, Shiyu Liu, Min Zhang, Jinsong Su
In this paper, we first conduct a preliminary study to reveal two key limitations of $k$NN-MT-AR: 1) the optimization gap leads to inaccurate estimation of $\lambda$ for determining $k$NN retrieval skipping, and 2) using a fixed threshold fails to accommodate the dynamic demands for $k$NN retrieval at different timesteps.
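For reference, a rough sketch of the adaptive interpolation and threshold-based retrieval skipping these limitations refer to; the function names, datastore layout, and fixed threshold value are illustrative assumptions:

```python
# Sketch of adaptive kNN-MT: the NMT distribution is mixed with a kNN
# distribution using a predicted weight lambda, and retrieval is skipped when
# lambda falls below a fixed threshold. Names and values are illustrative.
import numpy as np

def knn_mt_step(p_nmt, hidden, datastore_keys, datastore_tokens,
                predict_lambda, k=8, tau=10.0, skip_threshold=0.25):
    lam = predict_lambda(hidden)            # estimated mixing weight in [0, 1]
    if lam < skip_threshold:                # skip costly kNN retrieval
        return p_nmt
    dists = np.linalg.norm(datastore_keys - hidden, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / tau)
    weights /= weights.sum()
    p_knn = np.zeros_like(p_nmt)
    for w, tok in zip(weights, datastore_tokens[nearest]):
        p_knn[tok] += w                     # aggregate neighbors sharing a token
    return lam * p_knn + (1.0 - lam) * p_nmt
```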
1 code implementation • 22 May 2024 • Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Zhuosheng Zhang, Rui Wang
Real-world data deviating from the independent and identically distributed (i.i.d.)
1 code implementation • 12 Jul 2023 • Xiangpeng Wei, Haoran Wei, Huan Lin, TianHao Li, Pei Zhang, Xingzhang Ren, Mei Li, Yu Wan, Zhiwei Cao, Binbin Xie, Tianxiang Hu, Shangjie Li, Binyuan Hui, Bowen Yu, Dayiheng Liu, Baosong Yang, Fei Huang, Jun Xie
Large language models (LLMs) demonstrate a remarkable ability to comprehend, reason, and generate text following natural language instructions.
1 code implementation • 30 Jun 2023 • Yiming Wang, Zhuosheng Zhang, Pei Zhang, Baosong Yang, Rui Wang
Neural-symbolic methods have demonstrated efficiency in enhancing the reasoning abilities of large language models (LLMs).
1 code implementation • 26 May 2023 • Zhiwei Cao, Baosong Yang, Huan Lin, Suhang Wu, Xiangpeng Wei, Dayiheng Liu, Jun Xie, Min Zhang, Jinsong Su
$k$-Nearest neighbor machine translation ($k$NN-MT) has attracted increasing attention due to its ability to non-parametrically adapt to new translation domains.
no code implementations • 4 May 2023 • Binbin Xie, Jia Song, Liangying Shao, Suhang Wu, Xiangpeng Wei, Baosong Yang, Huan Lin, Jun Xie, Jinsong Su
In this paper, we comprehensively summarize representative studies from the perspectives of dominant models, datasets and evaluation metrics.
no code implementations • 17 Feb 2023 • Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan He, Derek F. Wong, Jun Xie
In this paper, we propose Fine-Grained Translation Error Detection (FG-TED) task, aiming at identifying both the position and the type of translation errors on given source-hypothesis sentence pairs.
1 code implementation • 25 Nov 2022 • Pei Zhang, Baosong Yang, Haoran Wei, Dayiheng Liu, Kai Fan, Luo Si, Jun Xie
The lack of competency awareness makes NMT untrustworthy.
1 code implementation • 13 Nov 2022 • Binbin Xie, Xiangpeng Wei, Baosong Yang, Huan Lin, Jun Xie, Xiaoli Wang, Min Zhang, Jinsong Su
Keyphrase generation aims to automatically generate short phrases summarizing an input document.
1 code implementation • 18 Oct 2022 • Yu Wan, Keqin Bao, Dayiheng Liu, Baosong Yang, Derek F. Wong, Lidia S. Chao, Wenqiang Lei, Jun Xie
In this report, we present our submission to the WMT 2022 Metrics Shared Task.
1 code implementation • 18 Oct 2022 • Keqin Bao, Yu Wan, Dayiheng Liu, Baosong Yang, Wenqiang Lei, Xiangnan He, Derek F. Wong, Jun Xie
In this paper, we present our submission to the sentence-level MQM benchmark at Quality Estimation Shared Task, named UniTE (Unified Translation Evaluation).
no code implementations • 11 Aug 2022 • Kexin Yang, Dayiheng Liu, Wenqiang Lei, Baosong Yang, Qian Qu, Jiancheng Lv
To address this challenge, we explore a new draft-command-edit manner in description generation, leading to the proposed new task: controllable text editing in E-commerce.
1 code implementation • NAACL 2022 • Yiwei Wang, Muhao Chen, Wenxuan Zhou, Yujun Cai, Yuxuan Liang, Dayiheng Liu, Baosong Yang, Juncheng Liu, Bryan Hooi
In this paper, we propose the CORE (Counterfactual Analysis based Relation Extraction) debiasing method that guides the RE models to focus on the main effects of textual context without losing the entity information.
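A generic counterfactual-debiasing sketch in the spirit described above (not necessarily CORE's exact formulation): the entity-only prediction approximates the entity bias and is subtracted from the full-context prediction at inference time. `model` is an assumed callable returning relation logits:

```python
# Counterfactual debiasing sketch for relation extraction: compare the
# factual prediction (full sentence) with a counterfactual one (entities
# without textual context) and remove the estimated entity bias.
import numpy as np

def debiased_logits(model, sentence_tokens, entity_only_tokens, alpha=1.0):
    """model(tokens) -> relation logits; alpha scales the bias correction."""
    full = model(sentence_tokens)          # factual: context + entities
    bias = model(entity_only_tokens)       # counterfactual: entities only
    return full - alpha * bias

def predict_relation(model, sentence_tokens, entity_only_tokens, labels):
    logits = debiased_logits(model, sentence_tokens, entity_only_tokens)
    return labels[int(np.argmax(logits))]
```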
no code implementations • Findings (NAACL) 2022 • Juncheng Liu, Zequn Sun, Bryan Hooi, Yiwei Wang, Dayiheng Liu, Baosong Yang, Xiaokui Xiao, Muhao Chen
We study dangling-aware entity alignment in knowledge graphs (KGs), which is an underexplored but important problem.
1 code implementation • Findings (ACL) 2022 • Yu Wan, Baosong Yang, Dayiheng Liu, Rong Xiao, Derek F. Wong, Haibo Zhang, Boxing Chen, Lidia S. Chao
The attention mechanism has become the dominant module in natural language processing models.
2 code implementations • ACL 2022 • Yu Wan, Dayiheng Liu, Baosong Yang, Haibo Zhang, Boxing Chen, Derek F. Wong, Lidia S. Chao
Translation quality evaluation plays a crucial role in machine translation.
no code implementations • 28 Apr 2022 • Yu Wan, Dayiheng Liu, Baosong Yang, Tianchi Bi, Haibo Zhang, Boxing Chen, Weihua Luo, Derek F. Wong, Lidia S. Chao
After investigating recent advances in trainable metrics, we summarize several aspects of vital importance for obtaining a well-performing metric model: 1) jointly leveraging the advantages of the source-included model and the reference-only model, 2) continually pre-training the model on massive synthetic data pairs, and 3) fine-tuning the model with a data denoising strategy.
no code implementations • 28 Apr 2022 • Kexin Yang, Dayiheng Liu, Wenqiang Lei, Baosong Yang, Mingfeng Xue, Boxing Chen, Jun Xie
We experimentally find that these prompts can simply be concatenated as a whole for multi-attribute CTG without any re-training, yet this raises problems of decreased fluency and position sensitivity.
no code implementations • 1 Mar 2022 • Yidan Zhang, Yu Wan, Dayiheng Liu, Baosong Yang, Zhenan He
Recently, Minimum Bayes Risk (MBR) decoding has been proposed to improve translation quality for NMT; it seeks a consensus translation that is closest on average to the other candidates in the n-best list.
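A standard MBR selection sketch matching this description, using sentence-level BLEU (via sacrebleu) as an illustrative utility; the paper's actual utility function may differ:

```python
# Generic MBR decoding sketch: pick the candidate with the highest average
# utility against the other candidates in the n-best list.
from sacrebleu import sentence_bleu

def mbr_select(candidates):
    best, best_score = None, float("-inf")
    for hyp in candidates:
        others = [c for c in candidates if c is not hyp]
        score = sum(sentence_bleu(hyp, [ref]).score for ref in others) / len(others)
        if score > best_score:
            best, best_score = hyp, score
    return best

nbest = ["the cat sat on the mat", "a cat sits on the mat", "the cat is on the mat"]
print(mbr_select(nbest))
```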
no code implementations • 29 Dec 2021 • Tong Zhang, Wei Ye, Baosong Yang, Long Zhang, Xingzhang Ren, Dayiheng Liu, Jinan Sun, Shikun Zhang, Haibo Zhang, Wen Zhao
Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective.
1 code implementation • 15 Dec 2021 • Xin Liu, Dayiheng Liu, Baosong Yang, Haibo Zhang, Junwei Ding, Wenqing Yao, Weihua Luo, Haiying Zhang, Jinsong Su
Generative commonsense reasoning requires machines to generate sentences describing an everyday scenario given several concepts, which has attracted much attention recently.
no code implementations • 3 Nov 2021 • Linlong Xu, Baosong Yang, Xiaoyu Lv, Tianchi Bi, Dayiheng Liu, Haibo Zhang
Interactive and non-interactive models are the two de facto standard frameworks in vector-based cross-lingual information retrieval (V-CLIR); they embed queries and documents in synchronous and asynchronous fashions, respectively (see the sketch below).
Computational Efficiency • Cross-Lingual Information Retrieval • +5
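A minimal sketch contrasting the two frameworks, assuming a generic multilingual encoder from Hugging Face; the model name is a placeholder and the interactive scorer is deliberately simplified (a real model would add a trained scoring head):

```python
# Non-interactive (dual-encoder) vs. interactive (cross-encoder) scoring:
# the former embeds query and document independently, the latter encodes
# them jointly in one forward pass.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
enc = AutoModel.from_pretrained("bert-base-multilingual-cased").eval()

def embed(text: str, pair: str = None) -> torch.Tensor:
    inputs = tok(text, pair, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return enc(**inputs).last_hidden_state[:, 0]   # [CLS] vector

def non_interactive_score(query: str, doc: str) -> float:
    q, d = embed(query), embed(doc)                    # asynchronous encoding
    return torch.cosine_similarity(q, d).item()

def interactive_score(query: str, doc: str) -> float:
    joint = embed(query, doc)                          # synchronous encoding
    return joint.norm().item()   # placeholder head; a real model trains a scorer

print(non_interactive_score("机器翻译", "machine translation systems"))
```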
no code implementations • ACL 2021 • Xin Liu, Baosong Yang, Dayiheng Liu, Haibo Zhang, Weihua Luo, Min Zhang, Haiying Zhang, Jinsong Su
A well-known limitation of the pretrain-finetune paradigm lies in its inflexibility, caused by its one-size-fits-all vocabulary.
1 code implementation • ACL 2021 • Huan Lin, Liang Yao, Baosong Yang, Dayiheng Liu, Haibo Zhang, Weihua Luo, Degen Huang, Jinsong Su
Furthermore, we contribute the first Chinese-English parallel corpus annotated with user behavior called UDT-Corpus.
no code implementations • NAACL 2021 • Long Zhang, Tong Zhang, Haibo Zhang, Baosong Yang, Wei Ye, Shikun Zhang
Document-level neural machine translation (NMT) has proven to be of profound value for its effectiveness in capturing contextual information.
no code implementations • COLING 2020 • Liang Yao, Baosong Yang, Haibo Zhang, Boxing Chen, Weihua Luo
Query translation (QT) serves as a critical factor in successful cross-lingual information retrieval (CLIR).
no code implementations • 26 Oct 2020 • Tianchi Bi, Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen
Query translation (QT) is a key component in cross-lingual information retrieval (CLIR) systems.
no code implementations • 26 Oct 2020 • Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen
Playing a crucial role in cross-language information retrieval (CLIR), query translation faces three main challenges: 1) the adequacy of translation; 2) the lack of in-domain parallel training data; and 3) the requirement of low latency.
1 code implementation • EMNLP 2020 • Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, Boxing Chen
Recent studies have proven that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans.
no code implementations • ACL 2020 • Yikai Zhou, Baosong Yang, Derek F. Wong, Yu Wan, Lidia S. Chao
We propose uncertainty-aware curriculum learning, which is motivated by the intuition that: 1) the higher the uncertainty in a translation pair, the more complex and rarer the information it contains; and 2) the end of the decline in model uncertainty indicates the completeness of the current training stage.
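An illustrative (not the paper's exact) rendering of these two intuitions: training pairs are ordered by a simple data-uncertainty proxy, and a curriculum stage ends once the decline in model uncertainty levels off. The proxy, window, and threshold below are assumptions:

```python
# Uncertainty-driven curriculum sketch: bin pairs from low to high uncertainty
# (approximated by per-word negative log-likelihood under a language model)
# and advance to the next bin when model uncertainty stops decreasing.
import numpy as np

def order_by_uncertainty(pairs, lm_neg_logprob):
    """Sort (src, tgt) pairs by per-word data uncertainty, easiest first."""
    scores = [lm_neg_logprob(src) / max(len(src.split()), 1) for src, _ in pairs]
    return [pairs[i] for i in np.argsort(scores)]

def stage_finished(model_uncertainties, window=3, eps=1e-3):
    """A stage is complete once the drop in mean model uncertainty over the
    last `window` evaluations falls below eps."""
    if len(model_uncertainties) < window + 1:
        return False
    recent = model_uncertainties[-window:]
    return (model_uncertainties[-window - 1] - np.mean(recent)) < eps
```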
2 code implementations • 11 Dec 2019 • Yu Wan, Baosong Yang, Derek F. Wong, Lidia S. Chao, Haihua Du, Ben C. H. Ao
As a special machine translation task, dialect translation has two main characteristics: 1) a lack of parallel training corpora; and 2) similar grammar on the two sides of the translation.
no code implementations • 22 Nov 2019 • Jian Li, Xing Wang, Baosong Yang, Shuming Shi, Michael R. Lyu, Zhaopeng Tu
Starting from this intuition, we propose a novel approach to compose representations learned by different components in neural machine translation (e.g., multi-layer networks or multi-head attention), based on modeling strong interactions among neurons in the representation vectors.
1 code implementation • ACL 2019 • Mingzhou Xu, Derek F. Wong, Baosong Yang, Yue Zhang, Lidia S. Chao
Self-attention networks have received increasing research attention.
1 code implementation • ACL 2019 • Baosong Yang, Long-Yue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu
Self-attention networks (SANs) have attracted a lot of interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g., machine translation.
no code implementations • NAACL 2019 • Baosong Yang, Long-Yue Wang, Derek Wong, Lidia S. Chao, Zhaopeng Tu
Self-attention networks (SANs) have drawn increasing interest due to their high parallelization in computation and flexibility in modeling dependencies.
no code implementations • NAACL 2019 • Jian Li, Baosong Yang, Zi-Yi Dou, Xing Wang, Michael R. Lyu, Zhaopeng Tu
Multi-head attention is appealing for its ability to jointly extract different types of information from multiple representation subspaces.
no code implementations • NAACL 2019 • Jie Hao, Xing Wang, Baosong Yang, Long-Yue Wang, Jinfeng Zhang, Zhaopeng Tu
In addition to the standard recurrent neural network, we introduce a novel attentive recurrent network to leverage the strengths of both attention and recurrent networks.
no code implementations • 15 Feb 2019 • Baosong Yang, Jian Li, Derek Wong, Lidia S. Chao, Xing Wang, Zhaopeng Tu
Self-attention models have shown their flexibility in parallel computation and their effectiveness in modeling both long- and short-term dependencies.
no code implementations • 31 Oct 2018 • Baosong Yang, Long-Yue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu
Self-attention network (SAN) has recently attracted increasing interest due to its fully parallelized computation and flexibility in modeling dependencies.
no code implementations • EMNLP 2018 • Baosong Yang, Zhaopeng Tu, Derek F. Wong, Fandong Meng, Lidia S. Chao, Tong Zhang
Self-attention networks have proven to be of profound value for their strength in capturing global dependencies.
Ranked #28 on Machine Translation on WMT2014 English-German
no code implementations • EMNLP 2018 • Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, Tong Zhang
Multi-head attention is appealing for the ability to jointly attend to information from different representation subspaces at different positions.
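For context, a reference implementation of standard multi-head attention, on top of which such modifications are typically studied:

```python
# Standard scaled dot-product multi-head attention (self-attention form).
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_head = d_model // n_heads
        self.n_heads = n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        def split(t):  # (b, n, d) -> (b, heads, n, d_head)
            return t.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        ctx = torch.softmax(scores, dim=-1) @ v
        ctx = ctx.transpose(1, 2).reshape(b, n, d)
        return self.out_proj(ctx)

attn = MultiHeadAttention()
print(attn(torch.randn(2, 10, 512)).shape)  # torch.Size([2, 10, 512])
```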
no code implementations • EMNLP 2017 • Baosong Yang, Derek F. Wong, Tong Xiao, Lidia S. Chao, Jingbo Zhu
This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder.