This paper describes Tencent Neural Machine Translation systems for the WMT 2020 news translation tasks.
This paper describes the Tencent AI Lab submission to the WMT2021 shared task on biomedical translation in eight language directions: English<->German, English<->French, English<->Spanish and English<->Russian.
Building on our success at the previous WMT, we continued to employ advanced techniques such as large-batch training, data selection and data filtering.
This paper describes the Tencent AI Lab submission to the WMT2020 shared task on biomedical translation in four language directions: German->English, English->German, Chinese->English and English->Chinese.
This paper describes the Tencent AI Lab’s submission to the WMT 2020 shared task on chat translation in English-German.
This report provides a preliminary evaluation of ChatGPT for machine translation, covering translation prompts, multilingual translation, and translation robustness.
The cosh-attention module reduces the space and time complexity of the attention operation.
Finally, our unconstrained system achieves BLEU scores of 17.0 and 30.4 for English to/from Livonian.
In this paper, to overcome this limitation, we propose a Prompt-based domain text Generation (PGEN) approach to produce large-scale in-domain spoken-language text data.
The approach first classifies the auto-annotated instances into two groups: confident instances and uncertain instances, according to the probabilities predicted by a teacher model.
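The grouping step can be sketched as below. The source says only that the split follows the probabilities predicted by the teacher model; the fixed cut-off (`threshold=0.9`) and the function name `split_by_teacher_confidence` are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

def split_by_teacher_confidence(
    instances: Sequence[str],
    teacher_probs: Sequence[float],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the source
) -> Tuple[List[str], List[str]]:
    """Split auto-annotated instances into (confident, uncertain) groups.

    An instance whose teacher-predicted probability is at least `threshold`
    is treated as confident; every other instance is treated as uncertain.
    """
    if len(instances) != len(teacher_probs):
        raise ValueError("instances and teacher_probs must have the same length")
    confident: List[str] = []
    uncertain: List[str] = []
    for inst, p in zip(instances, teacher_probs):
        (confident if p >= threshold else uncertain).append(inst)
    return confident, uncertain

confident, uncertain = split_by_teacher_confidence(["a", "b", "c"], [0.95, 0.40, 0.91])
print(confident, uncertain)  # ['a', 'c'] ['b']
```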
In this work, we propose a deep-learning-based segmentation method to perform accurate semantic segmentation on fused data from a LiDAR-Camera visual sensor.
Then, the Next Hybrid Strategy (NHS) is designed to stack the Next Convolution Block (NCB) and the Next Transformer Block (NTB) in an efficient hybrid paradigm, which boosts performance on various downstream tasks.
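A minimal sketch of the stacking pattern, assuming (from our reading of the Next-ViT design, not from this text) that each stage repeats a unit of N NCBs followed by one NTB, L times. The function `nhs_stage_layout` is hypothetical and only lists block names; it does not implement the blocks themselves.

```python
from typing import List

def nhs_stage_layout(n_ncb: int, repeats: int) -> List[str]:
    """Block order for one stage under the assumed pattern
    (n_ncb x NCB + 1 x NTB), repeated `repeats` times."""
    if n_ncb < 0 or repeats < 1:
        raise ValueError("need n_ncb >= 0 and repeats >= 1")
    layout: List[str] = []
    for _ in range(repeats):
        layout.extend(["NCB"] * n_ncb)  # convolution blocks for local features
        layout.append("NTB")            # one transformer block for global context
    return layout

print(nhs_stage_layout(n_ncb=4, repeats=2))
# ['NCB', 'NCB', 'NCB', 'NCB', 'NTB', 'NCB', 'NCB', 'NCB', 'NCB', 'NTB']
```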
We revisit existing high-performing Transformers from the perspective of practical application.
A2C-GS consists of three novel components: a verifier that validates the correctness of a generated network topology, a graph neural network (GNN) that efficiently approximates topology ratings, and a DRL actor layer that conducts the topology search.
In this paper, we take a substantial step toward better understanding state-of-the-art (SOTA) sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT).
By carefully designing experiments, we identify two representative characteristics of the data gap in the source: (1) a style gap (i.e., translated vs. natural text style) that leads to poor generalization capability; (2) a content gap that induces the model to produce hallucinated content biased towards the target language.
It is vital for robots to recognise and localise fruits before harvesting in natural orchards.
AMF-STGCN extends GCN by (1) jointly modeling the complex spatial-temporal dependencies in mobile networks, (2) applying attention mechanisms to capture the various receptive fields of heterogeneous base stations, and (3) introducing an extra decoder based on a fully connected deep network to address error propagation in multi-step forecasting.
Active learning is a subfield of machine learning devised for the design and modeling of systems whose samples are highly expensive to obtain.
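To make the cost argument concrete, here is a toy pool-based active-learning loop (an illustration, not taken from the source). For a 1-D threshold classifier with monotone labels, repeatedly querying the most uncertain unlabelled point reduces to binary search, so roughly log2(n) oracle calls replace n labels. The names `active_learn_threshold` and `oracle` are hypothetical; `oracle` stands in for one expensive sampling step.

```python
import math
from typing import Callable, List, Tuple

def active_learn_threshold(
    pool: List[float], oracle: Callable[[float], bool], budget: int
) -> Tuple[float, int]:
    """Pool-based active learning for a 1-D threshold classifier.

    Assumes monotone labels: oracle(x) is False below some threshold and True
    at or above it. Each round queries the point in the middle of the
    still-ambiguous region, i.e. the most uncertain one.
    Returns (estimated threshold, number of oracle queries used).
    """
    xs = sorted(pool)
    lo, hi = 0, len(xs)  # invariant: xs[:lo] known False, xs[hi:] known True
    queries = 0
    while lo < hi and queries < budget:
        mid = (lo + hi) // 2
        queries += 1  # each call represents one expensive sample
        if oracle(xs[mid]):
            hi = mid
        else:
            lo = mid + 1
    # Smallest point confirmed positive; if the budget ran out early,
    # the true threshold lies somewhere in xs[lo:hi + 1].
    threshold = xs[hi] if hi < len(xs) else math.inf
    return threshold, queries

pool = [float(x) for x in range(100)]
threshold, used = active_learn_threshold(pool, lambda x: x >= 37, budget=20)
print(threshold, used)  # threshold 37.0, found with at most 7 queries instead of 100 labels
```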
In this work, we propose to improve the sampling procedure by selecting the most informative monolingual sentences to complement the parallel data.
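One way to make "most informative" operational is sketched below, assuming informativeness means translation uncertainty, estimated as the average entropy of each word's translation distribution in a bilingual dictionary. The dictionary contents, the averaging, and the plain top-k selection are illustrative assumptions rather than the paper's exact procedure.

```python
import math
from typing import Dict, List

# Hypothetical bilingual dictionary: source word -> {target word: translation probability}.
Lexicon = Dict[str, Dict[str, float]]

def translation_entropy(dist: Dict[str, float]) -> float:
    """Entropy (in nats) of one source word's translation distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def sentence_uncertainty(sentence: str, lexicon: Lexicon) -> float:
    """Mean translation entropy over the words the dictionary covers."""
    scores = [translation_entropy(lexicon[w]) for w in sentence.split() if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

def select_informative(sentences: List[str], lexicon: Lexicon, k: int) -> List[str]:
    """Keep the k monolingual sentences with the highest uncertainty."""
    return sorted(sentences, key=lambda s: sentence_uncertainty(s, lexicon), reverse=True)[:k]

lexicon = {"the": {"der": 0.9, "die": 0.1},
           "bank": {"Bank": 0.5, "Ufer": 0.5},
           "river": {"Fluss": 1.0}}
print(select_informative(["the river", "the bank", "river river"], lexicon, k=1))  # ['the bank']
```

The ambiguous word "bank" (two equally likely translations) makes "the bank" the most uncertain sentence, while "river river" has zero entropy.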
Automatic machine translation produces translations very efficiently, yet their quality is not guaranteed.
Spatial-temporal data forecasting is of great importance for industries such as telecom network operation and transportation management.
In this paper, we propose a novel hierarchical representation via message propagation (HRMP) method for robust model fitting, which simultaneously takes advantage of both consensus analysis and preference analysis to estimate the parameters of multiple model instances from data corrupted by outliers.
In this paper, we investigate how different coupled computational models, describing the interaction between an incompressible fluid and two symmetric elastic or poroelastic structures, influence hydrodynamic factors.
In addition, experimental results demonstrate that our Multi-Task NAT is complementary to knowledge distillation, the standard knowledge transfer method for NAT.
This is done by identifying and updating only the most relevant neurons of the neural network for each training sample.
First, we train an identification model on the original training data and use it to distinguish inactive examples from active examples by their sentence-level output probabilities.
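A minimal sketch of per-sample sparse updating for a single linear layer. The source does not say how relevance is measured; the sketch assumes the per-neuron gradient norm and a fixed budget `k`, both hypothetical choices, and the name `sparse_neuron_step` is illustrative.

```python
import numpy as np

def sparse_neuron_step(W, x, y_true, k, lr=0.1):
    """One SGD step for a linear layer y = W @ x on a single sample, with loss
    0.5 * ||y - y_true||^2, updating only the k most relevant neurons (rows)."""
    y = W @ x
    grad_W = np.outer(y - y_true, x)            # dL/dW for this sample
    relevance = np.linalg.norm(grad_W, axis=1)  # assumed relevance: gradient norm per neuron
    top = np.argsort(relevance)[-k:]            # indices of the k largest scores
    W_new = W.copy()
    W_new[top] -= lr * grad_W[top]              # all other neurons stay frozen
    return W_new, np.sort(top)

W = np.zeros((4, 2))
W_new, updated = sparse_neuron_step(W, x=np.array([1.0, 0.0]),
                                    y_true=np.array([1.0, 5.0, 0.1, 3.0]), k=2)
print(updated)  # [1 3]
```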
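The identification step might look like the sketch below. The source only states that examples are separated by sentence-level output probabilities; scoring a sentence by its length-normalised log-probability and labelling a fixed lowest fraction (`inactive_ratio`) as inactive are assumptions. In practice the token probabilities would come from the trained identification model.

```python
import math
from typing import List, Sequence, Tuple

def sentence_score(token_probs: Sequence[float]) -> float:
    """Length-normalised log-probability of one target sentence."""
    if not token_probs:
        raise ValueError("empty sentence")
    return sum(math.log(p) for p in token_probs) / len(token_probs)

def split_inactive(examples: Sequence[str], token_probs: Sequence[Sequence[float]],
                   inactive_ratio: float = 0.1) -> Tuple[List[str], List[str]]:
    """Return (inactive, active): the lowest-scoring fraction is treated as inactive."""
    scores = [sentence_score(tp) for tp in token_probs]
    order = sorted(range(len(examples)), key=scores.__getitem__)  # lowest score first
    n_inactive = round(len(examples) * inactive_ratio)
    inactive_ids = set(order[:n_inactive])
    inactive = [ex for i, ex in enumerate(examples) if i in inactive_ids]
    active = [ex for i, ex in enumerate(examples) if i not in inactive_ids]
    return inactive, active

inactive, active = split_inactive(
    ["e1", "e2", "e3", "e4"],
    [[0.9, 0.8], [0.1, 0.2], [0.7, 0.7], [0.95, 0.9]],
    inactive_ratio=0.25,
)
print(inactive, active)  # ['e2'] ['e1', 'e3', 'e4']
```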
Existing exploration strategies in reinforcement learning (RL) often either ignore the history or feedback of search, or are complicated to implement.
We propose a global hybrid deep learning framework to predict daily prices in the stock market.
In this work, we propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods, particularly in deep Q-networks, where overestimation is exaggerated by function approximation errors.
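The core idea can be sketched in tabular form. The source targets deep Q-networks; this sketch instead uses lookup tables and assumes, as in double Q-learning generalised to K estimators, that action selection and action evaluation use different randomly chosen estimators. The exact rule in the paper may differ, and `cross_q_update` is a hypothetical name. Terminal transitions are not handled.

```python
import random
from collections import defaultdict

def cross_q_update(tables, s, a, r, s_next, actions, alpha=0.5, gamma=0.9, rng=None):
    """One tabular update with K >= 2 Q-estimators (a generalised double Q-learning sketch).

    A randomly chosen estimator i picks the greedy next action, and a different
    estimator j evaluates it, so the noise that made the action look best is
    not reused to score it. Returns the index of the updated table."""
    rng = rng or random.Random()
    if len(tables) < 2:
        raise ValueError("need at least two Q-tables")
    i = rng.randrange(len(tables))
    j = rng.choice([k for k in range(len(tables)) if k != i])
    a_star = max(actions, key=lambda b: tables[i][(s_next, b)])  # select with table i
    target = r + gamma * tables[j][(s_next, a_star)]              # evaluate with table j
    tables[i][(s, a)] += alpha * (target - tables[i][(s, a)])
    return i

tables = [defaultdict(float) for _ in range(3)]
updated = cross_q_update(tables, s=0, a=1, r=1.0, s_next=1, actions=[0, 1], rng=random.Random(0))
print(updated, tables[updated][(0, 1)])  # only the chosen table moves, to 0.5
```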
In this way, PAD-NAS can automatically design the operations for each layer and achieve a trade-off between search space quality and model diversity.
Self-attention networks (SANs) with a selective mechanism have produced substantial improvements in various NLP tasks by concentrating on a subset of input words.
In this paper, we bridge the gap by assessing the bilingual knowledge learned by NMT models with a phrase table, an interpretable table of bilingual lexicons.
Starting from this intuition, we propose a novel approach to compose representations learned by different components in neural machine translation (e.g., multi-layer networks or multi-head attention), based on modeling strong interactions among neurons in the representation vectors.
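The effect of "concentrating on a subset of input words" can be sketched as attention restricted by a selection mask. The source does not say how words are selected; here a hypothetical fixed top-k over random selector scores stands in for a learned selector, and all names are illustrative.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def selective_attention(Q, K, V, select_mask):
    """Scaled dot-product attention over the selected key positions only.

    select_mask: bool array of shape (n_keys,); True marks a word that may be
    attended to. At least one position must be selected."""
    if not select_mask.any():
        raise ValueError("select at least one position")
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(select_mask[None, :], scores, -np.inf)  # hide unselected words
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d = 5, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
selector_scores = rng.standard_normal(n)  # stand-in for a learned selector
mask = np.zeros(n, dtype=bool)
mask[np.argsort(selector_scores)[-2:]] = True  # keep the top-2 words
out, w = selective_attention(Q, K, V, mask)
```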
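Multiplicative (bilinear) pooling is one standard way to let every neuron of one representation interact with every neuron of another. The sketch below assumes a low-rank factorisation to keep the parameter count small; it illustrates the idea and is not necessarily the paper's exact composition function. All names are hypothetical.

```python
import numpy as np

def low_rank_bilinear(x, y, U, V, P):
    """Compose two representations with low-rank bilinear pooling.

    Each output unit o equals x^T W_o y with W_o = U diag(P[:, o]) V^T, so it
    mixes every product x_i * y_j, i.e. every pairwise neuron interaction,
    while storing only U (d_x, r), V (d_y, r) and P (r, d_out)."""
    return P.T @ ((U.T @ x) * (V.T @ y))

rng = np.random.default_rng(0)
d, r, d_out = 6, 3, 4
x, y = rng.standard_normal(d), rng.standard_normal(d)  # e.g. outputs of two heads or layers
U, V, P = rng.standard_normal((d, r)), rng.standard_normal((d, r)), rng.standard_normal((r, d_out))
z = low_rank_bilinear(x, y, U, V, P)
```

Compared with a full bilinear map (d_out matrices of size d x d), the factorised form needs only r(2d + d_out) parameters.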
Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information.
Recent studies have shown that a hybrid of self-attention networks (SANs) and recurrent neural networks (RNNs) outperforms both individual architectures, yet little is known about why the hybrid models work.
Although self-attention networks (SANs) have advanced the state of the art on various NLP tasks, one criticism of SANs is their limited ability to encode the positions of input words (Shaw et al., 2018).
Although neural machine translation (NMT) has advanced the state-of-the-art on various language pairs, the interpretability of NMT remains unsatisfactory.
To ease this process, we consider diverse clusterings embedded in different subspaces and analyze the embedding subspaces to shed light on the structure of each clustering.
In addition to the standard recurrent neural network, we introduce a novel attentive recurrent network to leverage the strengths of both attention and recurrent networks.
Multi-head attention is appealing for its ability to jointly extract different types of information from multiple representation subspaces.
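For reference, a minimal NumPy sketch of standard scaled dot-product multi-head self-attention (biases, masking and dropout omitted). Each head attends within its own d_model / n_heads slice of the projected queries, keys and values, which is what "multiple representation subspaces" refers to. Function names and sizes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Self-attention with n_heads heads; X has shape (n, d_model)."""
    n, d_model = X.shape
    if d_model % n_heads:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads

    def split(M):  # (n, d_model) -> (n_heads, n, d_head): one subspace per head
        return M.reshape(n, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)    # (n_heads, n, n)
    heads = softmax(scores) @ V                             # (n_heads, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)  # join the heads back
    return concat @ Wo

rng = np.random.default_rng(0)
n, d_model = 5, 8
X = rng.standard_normal((n, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads=2)
```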
With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine translation.
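The simplest form of layer aggregation is a softmax-weighted sum of all layer outputs instead of using only the top layer. The sketch below assumes that linear scheme (richer options, such as iterative or hierarchical aggregation, exist); in a trained model the `logits` would be learned parameters, and the name `aggregate_layers` is hypothetical.

```python
import numpy as np

def aggregate_layers(layer_outputs, logits):
    """Fuse L layer outputs of shape (n, d) with softmax-normalised scalar weights."""
    H = np.stack(layer_outputs)            # (L, n, d)
    logits = np.asarray(logits, dtype=float)
    w = np.exp(logits - logits.max())
    w /= w.sum()                           # one weight per layer, summing to 1
    return np.tensordot(w, H, axes=1)      # weighted sum over layers -> (n, d)

rng = np.random.default_rng(0)
layers = [rng.standard_normal((3, 4)) for _ in range(6)]
fused = aggregate_layers(layers, logits=np.zeros(6))  # equal weights give the plain mean
```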
Self-attention models have shown flexibility in parallel computation and effectiveness in modeling both long- and short-term dependencies.
Neural machine translation (NMT) models generally adopt an encoder-decoder architecture for modeling the entire translation process.
Advanced neural machine translation (NMT) models generally implement encoder and decoder as multiple layers, which allows systems to model complex functions and capture complicated linguistic structures.
In this work, we developed a network inference method from incomplete data ("PathInf"), since massive, non-uniformly distributed missing values are a common challenge in practical problems.
Neural Machine Translation (NMT) is a new approach to machine translation that has made great progress in recent years.
Deep neural networks generally involve some layers with millions of parameters, making them difficult to deploy and update on devices with limited resources, such as mobile phones and other smart embedded systems.