In this work, we propose to improve the sampling procedure by selecting the most informative monolingual sentences to complement the parallel data.
Automatic machine translation can produce translations efficiently, yet their quality is not guaranteed.
Spatial-temporal data forecasting is of great importance for industries such as telecom network operation and transportation management.
In this paper, we propose a novel hierarchical representation via message propagation (HRMP) method for robust model fitting, which simultaneously takes advantage of both consensus analysis and preference analysis to estimate the parameters of multiple model instances from data corrupted by outliers.
In this paper, we investigate the influence of different coupled computational models on hydrodynamic factors, where the models describe the interaction between an incompressible fluid and two symmetric elastic or poroelastic structures.
Fluid Dynamics, Numerical Analysis
In addition, experimental results demonstrate that our Multi-Task NAT is complementary to knowledge distillation, the standard knowledge transfer method for NAT.
This is done by identifying and updating only the most relevant neurons of the neural network for each training sample in the data.
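The idea of updating only the most relevant neurons per training sample can be illustrated with a minimal NumPy sketch. This is our own illustrative construction, not the paper's implementation: relevance is scored here by per-neuron gradient magnitude, and only the top-k rows of the weight matrix are updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-layer model y = Wx trained with squared error (illustrative only).
W = rng.normal(size=(8, 4))

def sparse_update(W, x, y_true, k=3, lr=0.1):
    """Update only the k output neurons with the largest gradient magnitude."""
    y = W @ x
    grad = np.outer(y - y_true, x)        # full gradient of 0.5 * ||y - y_true||^2
    relevance = np.abs(grad).sum(axis=1)  # per-neuron relevance score (assumed metric)
    top = np.argsort(relevance)[-k:]      # most relevant neurons for this sample
    W[top] -= lr * grad[top]              # update only those rows
    return W

x = rng.normal(size=4)
y_true = rng.normal(size=8)
before = W.copy()
W = sparse_update(W, x, y_true, k=3)
changed = np.where(np.any(W != before, axis=1))[0]
print(len(changed))  # at most k = 3 rows changed
```

In a full training loop this masking would be applied per sample, leaving all other parameters untouched.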
First, we train an identification model on the original training data, and use it to distinguish inactive examples and active examples by their sentence-level output probabilities.
Existing exploration strategies in reinforcement learning (RL) often either ignore the history or feedback of search, or are complicated to implement.
We propose a global hybrid deep learning framework to predict daily stock market prices.
In this work, we propose a novel cross Q-learning algorithm, aiming to alleviate the well-known overestimation problem in value-based reinforcement learning methods, particularly in deep Q-networks, where overestimation is exaggerated by function approximation errors.
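The overestimation problem, and the cross-estimator remedy, can be demonstrated numerically. The sketch below is not the paper's cross Q-learning algorithm; it shows the underlying bias and the standard Double Q-learning trick of selecting an action with one noisy estimator and evaluating it with an independent one, which cross Q-learning builds on.

```python
import numpy as np

rng = np.random.default_rng(1)

# All actions have true value 0; Q-estimates are corrupted by unit Gaussian noise.
n_actions, n_trials = 10, 10000
noise = lambda: rng.normal(0.0, 1.0, size=(n_trials, n_actions))

qa, qb = noise(), noise()                       # two independent estimators

single_max = qa.max(axis=1).mean()              # max over one noisy estimator: biased upward
best_a = qa.argmax(axis=1)                      # select the action with estimator A ...
cross = qb[np.arange(n_trials), best_a].mean()  # ... but evaluate it with estimator B

print(single_max, cross)  # single_max well above the true value 0; cross near 0
```

Because the noise in the two estimators is independent, the cross evaluation is unbiased even though the single-estimator max is not.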
Self-attention networks (SANs) with a selective mechanism have produced substantial improvements in various NLP tasks by concentrating on a subset of input words.
In this paper, we bridge the gap by assessing the bilingual knowledge learned by NMT models with a phrase table -- an interpretable table of bilingual lexicons.
Starting from this intuition, we propose a novel approach to compose representations learned by different components in neural machine translation (e.g., multi-layer networks or multi-head attention), based on modeling strong interactions among neurons in the representation vectors.
Current state-of-the-art neural machine translation (NMT) uses a deep multi-head self-attention network with no explicit phrase information.
Recent studies have shown that a hybrid of self-attention networks (SANs) and recurrent neural networks (RNNs) outperforms both individual architectures, while not much is known about why the hybrid models work.
Although self-attention networks (SANs) have advanced the state-of-the-art on various NLP tasks, one criticism of SANs is their limited ability to encode the positions of input words (Shaw et al., 2018).
Although neural machine translation (NMT) has advanced the state-of-the-art on various language pairs, the interpretability of NMT remains unsatisfactory.
To ease this process, we consider diverse clusterings embedded in different subspaces, and analyze the embedding subspaces to shed light on the structure of each clustering.
In addition to the standard recurrent neural network, we introduce a novel attentive recurrent network to leverage the strengths of both attention and recurrent networks.
Multi-head attention is appealing for its ability to jointly extract different types of information from multiple representation subspaces.
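Multi-head attention's extraction of information from multiple representation subspaces can be made concrete with a minimal NumPy sketch: each head computes scaled dot-product attention over its own slice of the projected dimensions, and the head outputs are concatenated and mixed by an output projection. All names and shapes here are illustrative assumptions, not any particular system's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def multi_head_attention(X, n_heads=4):
    """Minimal multi-head self-attention: each head attends in its own subspace."""
    T, d = X.shape
    dk = d // n_heads
    Wq, Wk, Wv, Wo = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        s = slice(h * dk, (h + 1) * dk)               # this head's subspace
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dk)    # scaled dot-product scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        heads.append(weights @ V[:, s])
    return np.concatenate(heads, axis=-1) @ Wo        # concatenate and project

X = rng.normal(size=(5, 16))  # 5 tokens, model dimension 16
out = multi_head_attention(X)
print(out.shape)  # (5, 16)
```

Because each head operates on a disjoint slice of the projections, the heads can specialize in different types of dependencies before the output projection recombines them.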
Self-attention models have shown flexibility in parallel computation and effectiveness in modeling both long- and short-term dependencies.
With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine translation.
Neural machine translation (NMT) models generally adopt an encoder-decoder architecture for modeling the entire translation process.
Advanced neural machine translation (NMT) models generally implement encoder and decoder as multiple layers, which allows systems to model complex functions and capture complicated linguistic structures.
In this work, we developed a network inference method from incomplete data ("PathInf"), as massive and non-uniformly distributed missing values are a common challenge in practical problems.
Neural Machine Translation (NMT) is a new approach to machine translation that has made great progress in recent years.
Deep neural networks generally involve some layers with millions of parameters, making them difficult to be deployed and updated on devices with limited resources such as mobile phones and other smart embedded systems.