Based on our success in the last WMT, we continued to employ advanced techniques such as large-batch training, data selection, and data filtering.
Across an extensive set of experiments on 10 machine translation tasks, we find that RAN models are competitive with, and in certain scenarios outperform, their Transformer counterparts, with fewer parameters and faster inference.
This paper describes Tencent Neural Machine Translation systems for the WMT 2020 news translation tasks.
In this paper, we propose a novel method to extract multi-granularity features based solely on the original input sentences.
Confidence estimation aims to quantify the confidence of the model prediction, providing an expectation of success.
Pretrained language models (PLMs) trained on large-scale unlabeled corpora are typically fine-tuned on task-specific downstream datasets, a paradigm that has produced state-of-the-art results on various NLP tasks.
The key feature of our approach is that it is sparsely activated, with the activation guided by predefined skills.
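As a rough illustration of such skill-guided sparse activation, the sketch below keeps one parameter block per predefined skill and runs only the blocks relevant to the current input; the skill names, block shapes, and averaging rule are illustrative assumptions rather than the actual architecture.

```python
import torch
import torch.nn as nn

class SkillSparseLayer(nn.Module):
    """Toy layer with one feed-forward block per predefined skill;
    only the blocks whose skills are active for the current task run."""

    def __init__(self, skills, hidden_size=256):
        super().__init__()
        self.blocks = nn.ModuleDict({
            skill: nn.Linear(hidden_size, hidden_size) for skill in skills
        })

    def forward(self, x, active_skills):
        # Sparse activation: only the active skill blocks are evaluated;
        # parameters of inactive skills are untouched in this forward pass.
        outputs = [self.blocks[s](x) for s in active_skills]
        return torch.stack(outputs, dim=0).mean(dim=0)

# Hypothetical usage: four skill blocks, of which only two are activated.
layer = SkillSparseLayer(skills=["ner", "sentiment", "nli", "paraphrase"])
h = torch.randn(8, 256)
out = layer(h, active_skills=["ner", "nli"])
```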
(2) how to predict a word via a cloze test without knowing the number of wordpieces in advance?
In terms of the local view, we first build a graph structure over the document in which phrases are regarded as vertices and edges are weighted by the similarities between vertices.
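A minimal sketch of such a phrase graph, assuming phrase embeddings are available and that cosine similarity with a fixed threshold defines the edges (both assumptions made only for illustration):

```python
import numpy as np

def build_phrase_graph(phrase_vectors, threshold=0.5):
    """Weighted adjacency matrix of a graph whose vertices are phrases and
    whose edge weights are cosine similarities between phrase embeddings;
    pairs below the threshold stay unconnected."""
    vecs = np.asarray(phrase_vectors, dtype=float)
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = unit @ unit.T                       # pairwise cosine similarity
    adjacency = np.where(sims >= threshold, sims, 0.0)
    np.fill_diagonal(adjacency, 0.0)           # no self-loops
    return adjacency

# Tiny usage example with random embeddings for three candidate phrases.
rng = np.random.default_rng(0)
adj = build_phrase_graph(rng.normal(size=(3, 8)))
```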
Attention mechanisms have achieved substantial improvements in neural machine translation by dynamically selecting relevant inputs for different predictions.
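For reference, here is a minimal NumPy sketch of scaled dot-product attention, the generic form of this dynamic selection; the specific attention variants used in the systems discussed here may differ.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: the query is scored against every key,
    and the softmax over those scores decides how much each value
    contributes to the returned context vector."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values, weights

# One decoder step attending over four encoder states of dimension 8.
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 8))
context, w = attention(rng.normal(size=8), enc, enc)
```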
Neural models, which are typically trained on hard labels, have achieved great success on the task of machine reading comprehension (MRC).
Emotion lexicons have been shown effective for emotion classification (Baziotis et al., 2018).
In addition, by transferring knowledge from other kinds of MRC tasks, our model achieves new state-of-the-art results in both single and ensemble settings.
Due to its highly parallelizable architecture, the Transformer is faster to train than RNN-based models and is widely used in machine translation tasks.
In order to avoid such sophisticated alternate optimization, we propose to learn unsupervised word mapping by directly maximizing the mean discrepancy between the distribution of transferred embedding and target embedding.
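As an illustration of the quantity involved, the sketch below computes a kernel mean discrepancy between mapped source embeddings and target embeddings; the Gaussian kernel, single bandwidth, and identity mapping matrix are assumptions made purely for the example.

```python
import numpy as np

def gaussian_mmd(x, y, sigma=1.0):
    """Squared maximum mean discrepancy between two embedding samples under
    a Gaussian kernel: small when the two distributions are well aligned."""
    def kernel(a, b):
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Discrepancy between transferred source embeddings (x @ W.T) and target
# embeddings; W is a hypothetical linear word mapping.
rng = np.random.default_rng(0)
src, tgt = rng.normal(size=(100, 50)), rng.normal(size=(100, 50))
W = np.eye(50)
print(gaussian_mmd(src @ W.T, tgt))  # the mean-discrepancy training signal
```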
Although Neural Machine Translation (NMT) has achieved remarkable progress in the past several years, most NMT systems still suffer from a fundamental shortcoming shared with other sequence generation tasks: errors made early in the generation process are fed back as inputs to the model and can be quickly amplified, harming subsequent sequence generation.
Machine translation has made rapid advances in recent years.
Nowadays, a typical Neural Machine Translation (NMT) model generates translations from left to right as a linear sequence, during which the latent syntactic structures of the target sentences are not explicitly considered.