For the machine translation part, we pretrained three translation models on the WMT21 dataset and fine-tuned them on in-domain corpora.
Mask-Predict CMLM (Ghazvininejad et al., 2019) has achieved impressive performance among non-autoregressive NMT models, but we find that predicting all target words solely from the hidden states of [MASK] tokens is neither effective nor efficient in the initial refinement iterations, resulting in ungrammatical repetitions and slow convergence.
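For reference, the refinement loop in question looks roughly like the sketch below. It is a minimal illustration of Mask-Predict decoding, assuming a hypothetical `cmlm` callable that returns per-position token probabilities; it is not the paper's implementation.

```python
import torch

def mask_predict(cmlm, src, tgt_len, iterations=10, mask_id=0):
    # Start from an all-[MASK] target of the predicted length.
    tokens = torch.full((tgt_len,), mask_id, dtype=torch.long)
    scores = torch.zeros(tgt_len)
    for t in range(iterations):
        probs = cmlm(src, tokens)                 # (tgt_len, vocab) token probabilities
        new_scores, new_tokens = probs.max(dim=-1)
        masked = tokens.eq(mask_id)
        tokens[masked] = new_tokens[masked]       # fill only the currently masked positions
        scores[masked] = new_scores[masked]
        # Linear decay: re-mask the n lowest-confidence tokens for the next pass.
        n = int(tgt_len * (1 - (t + 1) / iterations))
        if n == 0:
            break
        remask = scores.topk(n, largest=False).indices
        tokens[remask] = mask_id
    return tokens
```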
Because multimodal machine translation (MMT) models are built on large-scale pretrained networks yet trained on limited labelled data, their liability to overfit is a critical issue in MMT.
This paper presents our work in the WMT 2020 Word and Sentence-Level Post-Editing Quality Estimation (QE) Shared Task.
Length prediction is a special task in a series of NAT models, where the target length has to be determined before generation.
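A common way to realize this is a classification head over candidate lengths on top of pooled encoder states. The sketch below is illustrative only; the module layout and the `max_len` bound are assumptions, not taken from any specific system above.

```python
import torch
import torch.nn as nn

class LengthPredictor(nn.Module):
    def __init__(self, d_model, max_len=256):
        super().__init__()
        # Treat target-length prediction as classification over possible lengths.
        self.proj = nn.Linear(d_model, max_len)

    def forward(self, enc_out, src_mask):
        # enc_out: (batch, src_len, d_model); src_mask: (batch, src_len), 1 for real tokens.
        mask = src_mask.unsqueeze(-1).float()
        pooled = (enc_out * mask).sum(1) / mask.sum(1).clamp(min=1)  # mean over source tokens
        return self.proj(pooled)                                     # (batch, max_len) length logits
```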
This paper presents our work in the WMT 2021 Quality Estimation (QE) Shared Task.
The paper presents HW-TSC's pipeline and results for the Offline Speech-to-Speech Translation task at IWSLT 2022.
The cascade system is composed of a chunking-based streaming ASR model and the SimulMT model used in the T2T track.
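The data flow of such a cascade is sketched below with hypothetical `streaming_asr` and `simul_mt` callables standing in for the chunking-based ASR and SimulMT components; it only illustrates how the two stages are chained.

```python
def cascade_translate(audio_chunks, streaming_asr, simul_mt):
    translation = []
    for chunk in audio_chunks:
        # The streaming ASR consumes fixed-size audio chunks and emits partial transcripts.
        partial_text = streaming_asr(chunk)
        if partial_text:
            # The SimulMT model translates the incremental source text as it arrives.
            translation.extend(simul_mt(partial_text))
    return " ".join(translation)
```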
The paper presents HW-TSC's submission to the WMT 2020 Automatic Post Editing Shared Task.
Natural Language Inference (NLI) datasets contain examples with highly ambiguous labels due to the subjectivity of the task.
The proliferation of pretrained models, as a result of advancements in pretraining techniques, has led to the emergence of a vast zoo of publicly available models.
In our implementation, we adopt state-of-the-art molecule embedding models from both the supervised learning paradigm and the pretraining paradigm as the molecule representation module of PEMP.
Motivated by this, we design a new group invariant learning method, which constructs groups with statistical independence tests, and reweights samples by group label proportion to meet the criteria.
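The reweighting step can be pictured as follows. This is an illustrative sketch only: it assumes the grouping stage (the statistical independence tests) has already produced `group_ids`, and it simply down-weights samples whose label dominates their group.

```python
from collections import Counter

def group_label_weights(group_ids, labels):
    pair_counts = Counter(zip(group_ids, labels))
    group_counts = Counter(group_ids)
    weights = []
    for g, y in zip(group_ids, labels):
        # p(y | g): proportion of this label within its group.
        p = pair_counts[(g, y)] / group_counts[g]
        # Over-represented (group, label) pairs get smaller weight in the loss.
        weights.append(1.0 / p)
    return weights
```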
In this paper, we aim to close the gap by preserving the original objective of AR and NAR under a unified framework.
no code implementations • 22 Dec 2021 • Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, Yuxia Wang, Zongyao Li, Zhengzhe Yu, Zhanglin Wu, Yimeng Chen, Chang Su, Min Zhang, Lizhi Lei, Shimin Tao, Hao Yang
An effective training strategy for improving AT models is Self-Distillation Mixup (SDM) training, which pre-trains a model on raw data, generates distilled data with the pre-trained model itself, and finally re-trains the model on the combination of raw and distilled data.
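Schematically, the SDM recipe is a three-stage data flow like the sketch below, with hypothetical `train` and `translate` helpers; it illustrates the procedure, not any particular codebase.

```python
def sdm_training(raw_pairs, train, translate):
    # Stage 1: pre-train an AT model on the raw parallel data.
    teacher = train(raw_pairs)
    # Stage 2: re-translate the source side with the pre-trained model to build distilled data.
    distilled_pairs = [(src, translate(teacher, src)) for src, _ in raw_pairs]
    # Stage 3: re-train on the mixture of raw and distilled data.
    return train(raw_pairs + distilled_pairs)
```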
no code implementations • 22 Dec 2021 • Zhengzhe Yu, Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhanglin Wu, Yuxia Wang, Yimeng Chen, Chang Su, Min Zhang, Lizhi Lei, Shimin Tao, Hao Yang
Deep encoders have proven effective in improving neural machine translation (NMT) systems, but translation quality reaches an upper bound once the number of encoder layers exceeds 18.
Ensemble-based debiasing methods have been shown to be effective in mitigating classifiers' reliance on specific dataset biases by exploiting the output of a bias-only model to adjust the learning target.
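One common instantiation of this idea is product-of-experts (PoE) debiasing, where the main model's log-probabilities are combined with those of a frozen bias-only model before computing cross-entropy, so examples the bias-only model already solves contribute smaller gradients. The sketch below shows that loss; shapes and names are illustrative.

```python
import torch
import torch.nn.functional as F

def poe_debias_loss(main_logits, bias_logits, labels):
    # main_logits, bias_logits: (batch, num_classes); labels: (batch,)
    combined = F.log_softmax(main_logits, dim=-1) + F.log_softmax(bias_logits.detach(), dim=-1)
    return F.cross_entropy(combined, labels)
```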