Specifically, we focus on the OPT family of models ranging from 125M to 66B parameters and rely only on whether an FFN neuron is activated or not.
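As a minimal sketch of what "activated" means for a ReLU-based FFN like OPT's: a neuron fires on a token when its pre-activation is positive, and a neuron that never fires over a corpus can be considered dead. The function and variable names below are illustrative, not taken from the paper's code.

```python
import torch

def ffn_neuron_activity(hidden_states, w_in, b_in):
    """Fraction of tokens on which each FFN neuron fires (ReLU output > 0).

    hidden_states: (num_tokens, d_model) residual-stream inputs to the FFN.
    w_in, b_in: the FFN's first linear layer (d_model -> d_ffn); OPT uses
    ReLU, so "activated" reduces to a positive pre-activation.
    """
    pre_act = hidden_states @ w_in.T + b_in   # (num_tokens, d_ffn)
    fired = pre_act > 0                       # boolean activation mask
    return fired.float().mean(dim=0)          # per-neuron firing frequency

# Toy usage: neurons with firing frequency 0 would count as "dead".
d_model, d_ffn, n_tokens = 16, 64, 1000
h = torch.randn(n_tokens, d_model)
w, b = torch.randn(d_ffn, d_model), torch.randn(d_ffn)
freq = ffn_neuron_activity(h, w, b)
print("dead neurons:", int((freq == 0).sum()))
```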
Hallucinations in machine translation are translations that contain information completely unrelated to the input.
We propose using a method that evaluates the percentage of the source's contribution to a generated translation.
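To illustrate the quantity being measured, the sketch below turns per-token relevance scores, assumed to be precomputed by an attribution method such as layer-wise relevance propagation, into a single source-contribution percentage. The helper name and inputs are hypothetical.

```python
import numpy as np

def source_contribution(attr_src, attr_tgt):
    """Fraction of total contribution coming from the source tokens.

    attr_src: relevance scores for the source tokens.
    attr_tgt: relevance scores for the already-generated target prefix.
    """
    total_src = np.abs(attr_src).sum()
    total_tgt = np.abs(attr_tgt).sum()
    return total_src / (total_src + total_tgt)

# Toy usage: scores for 5 source tokens and a 3-token target prefix.
src = np.array([0.4, 0.1, 0.3, 0.05, 0.15])
tgt = np.array([0.2, 0.1, 0.05])
print(f"source contribution: {source_contribution(src, tgt):.1%}")
```

A hallucinated translation would be expected to show an unusually low source-contribution percentage, since the model generates mostly from its target-side prefix.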
Although the problem of hallucinations in neural machine translation (NMT) has received some attention, research on this highly pathological phenomenon lacks solid ground.
Unlike traditional statistical MT, which decomposes the translation task into distinct, separately learned components, neural machine translation uses a single neural network to model the entire translation process.
In this work, we take a first step toward the unsupervised discovery of interpretable directions in language latent spaces.
We find that models trained with more data tend to rely more on source information and to have sharper token contributions, and that the training process is non-monotonic, passing through several stages of a different nature.
We adopt a recent method that learns a representation of data in the form of a differentiable weighted graph, and use it to modify the GloVe training algorithm.
The dominant approach to sequence generation is to produce a sequence in some predefined order, e.g., left to right.
Subword segmentation is widely used to address the open vocabulary problem in machine translation.
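For readers unfamiliar with the mechanics, here is a toy sketch of BPE-style subword segmentation: given a learned merge table, a word is split into characters and the highest-priority merges are applied greedily. The merge table below is hand-made for illustration, not learned from data.

```python
def bpe_segment(word, merges):
    """Greedily apply BPE merge rules to split a word into subwords.

    merges: list of (left, right) symbol pairs ordered by priority,
    normally learned from a corpus beforehand.
    """
    symbols = list(word) + ["</w>"]  # characters plus an end-of-word marker
    ranks = {pair: i for i, pair in enumerate(merges)}
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # pick the highest-priority (lowest-rank) applicable merge
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break
        i = pairs.index(best)
        symbols[i:i + 2] = [best[0] + best[1]]
    return symbols

# Toy merge table: "lower" -> ["low", "er</w>"]
merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("er", "</w>")]
print(bpe_segment("lower", merges))
```

Because any word can always fall back to a character-level split, the vocabulary is effectively open while the subword inventory stays fixed.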
In this work, we use canonical correlation analysis and mutual information estimators to study how information flows across Transformer layers and how this process depends on the choice of learning objective.
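As a minimal sketch of the kind of comparison involved, the snippet below uses plain linear CCA from scikit-learn to score the similarity between two layers' activations over the same tokens. The paper's exact estimator choices may differ; the names here are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def mean_cca_similarity(acts_a, acts_b, n_components=10):
    """Mean canonical correlation between two layers' activations.

    acts_a, acts_b: (num_tokens, dim) activation matrices collected from
    two Transformer layers on the same tokens.
    """
    cca = CCA(n_components=n_components, max_iter=1000)
    za, zb = cca.fit_transform(acts_a, acts_b)
    corrs = [np.corrcoef(za[:, i], zb[:, i])[0, 1] for i in range(n_components)]
    return float(np.mean(corrs))

# Toy usage: layer2 is a noisy linear transform of layer1, so similarity is high.
rng = np.random.default_rng(0)
layer1 = rng.normal(size=(500, 64))
layer2 = layer1 @ rng.normal(size=(64, 64)) + 0.1 * rng.normal(size=(500, 64))
print(mean_cca_similarity(layer1, layer2))
```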
Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for neural machine translation.
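For concreteness, here is a compact NumPy sketch of standard multi-head self-attention as defined in the Transformer: per-head scaled dot-product attention, followed by concatenation of the heads and an output projection.

```python
import numpy as np

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Standard multi-head self-attention.

    x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model) projections.
    Each head attends over d_model // n_heads dimensions.
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # split into heads: (n_heads, seq_len, d_head)
    split = lambda t: t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)     # (n_heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)               # softmax per query
    heads = weights @ v                                     # (n_heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# Toy usage: 5 positions, d_model=8, 2 heads.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
W = [rng.normal(size=(8, 8)) for _ in range(4)]
out = multi_head_self_attention(x, *W, n_heads=2)
```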
Though machine translation errors caused by the lack of context beyond one sentence have long been acknowledged, the development of context-aware NMT systems is hampered by several problems.
We show that, while gains in BLEU are moderate for those systems, they outperform baselines by a large margin in terms of accuracy on our contrastive test set.
Standard machine translation systems process sentences in isolation and hence ignore extra-sentential information, even though extended context can both prevent mistakes in ambiguous cases and improve translation coherence.