Lucia Specia, Zhenhao Li, Juan Pino, Vishrav Chaudhary, Francisco Guzmán, Graham Neubig, Nadir Durrani, Yonatan Belinkov, Philipp Koehn, Hassan Sajjad, Paul Michel, Xian Li
We report the findings of the second edition of the shared task on improving robustness in Machine Translation (MT).
Invariant Risk Minimization (IRM) is a recently proposed framework for out-of-distribution (o.o.d.) generalization.
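As a rough illustration of the IRM idea, the sketch below implements an IRMv1-style objective for a toy scalar predictor: the usual empirical risk per environment, plus a penalty that is the squared gradient of that risk with respect to a dummy classifier scale held at 1.0. All names and the squared-loss setup are assumptions for this sketch, not the paper's formulation.

```python
# Toy IRMv1-style objective (hedged sketch: scalar predictor, squared loss).
# Penalty per environment = (d risk / d w)^2 evaluated at dummy scale w = 1.0.

def risk(w, xs, ys):
    """Mean squared error of the scaled predictor w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def irm_penalty(xs, ys, eps=1e-4):
    """Finite-difference estimate of (d risk / d w)^2 at w = 1.0."""
    g = (risk(1.0 + eps, xs, ys) - risk(1.0 - eps, xs, ys)) / (2 * eps)
    return g * g

def irm_objective(envs, lam=1.0):
    """Average over environments of empirical risk + lambda * invariance penalty."""
    total = 0.0
    for xs, ys in envs:
        total += risk(1.0, xs, ys) + lam * irm_penalty(xs, ys)
    return total / len(envs)
```

If the fixed predictor is already optimal in every environment, the penalty vanishes; environments where it is suboptimal contribute both risk and penalty, which is the pressure toward invariant features.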
Among these, the common approach is to use an external probe to rank neurons according to their relevance to some linguistic attribute, and to evaluate the obtained ranking using the same probe that produced it.
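The probe-then-rank recipe described above can be sketched as follows: fit a linear probe (here, plain logistic regression trained by gradient descent, an assumed stand-in for whatever probe a given paper uses) on neuron activations, then order neurons by the absolute value of their probe weights.

```python
# Hedged sketch of probe-based neuron ranking: train a logistic-regression
# probe on activations, rank neurons by |learned weight|.
import math

def train_probe(acts, labels, lr=0.5, steps=500):
    """Plain logistic regression via batch gradient descent; one weight per neuron."""
    n = len(acts[0])
    w = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for x, y in zip(acts, labels):
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for i in range(n):
                grad[i] += (p - y) * x[i]
        w = [wi - lr * gi / len(acts) for wi, gi in zip(w, grad)]
    return w

def rank_neurons(w):
    """Neuron indices ordered from most to least relevant by |weight|."""
    return sorted(range(len(w)), key=lambda i: -abs(w[i]))
```

The circularity the sentence points at is visible here: the same probe weights define both the ranking and any evaluation done with that probe.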
Model robustness to bias is often determined by the generalization on carefully designed out-of-distribution datasets.
Many natural language inference (NLI) datasets contain biases that allow models to perform well by only using a biased subset of the input, without considering the remaining features.
Moreover, we show that our VIB model finds sentence representations that are more robust to biases in natural language inference datasets, and thereby obtains better generalization to out-of-domain datasets.
Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement given difficult contexts.
Natural Language Inference (NLI) models are known to learn from biases and artefacts within their training data, impacting how well they generalise to other unseen datasets.
Probing classifiers have emerged as one of the prominent methodologies for interpreting and analyzing deep neural network models of natural language processing.
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task.
As a case study, we apply this methodology to analyzing gender bias in pre-trained Transformer language models.
We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.
We found that small subsets of neurons can predict linguistic tasks, with lower-level tasks (such as morphology) localized in fewer neurons than the higher-level task of predicting syntax.
While deep learning has transformed the natural language processing (NLP) field and impacted the larger computational linguistics community, the rise of neural networks is stained by their opaque nature: It is challenging to interpret the inner workings of neural network models, and explicate their behavior.
The predominant approach to open-domain dialog generation relies on end-to-end training of neural models on chat datasets.
Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common sense reasoning ability.
We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation.
Although neural models have achieved impressive results on several NLP benchmarks, little is understood about the mechanisms they use to perform language tasks.
Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both.
Transformer-based deep NLP models are trained using hundreds of millions of parameters, limiting their applicability in computationally constrained environments.
We introduce three memory-augmented Recurrent Neural Networks (MARNNs) and explore their capabilities on a series of simple language modeling tasks whose solutions require stack-based mechanisms.
We experiment on large-scale natural language inference and fact verification benchmarks, evaluating on out-of-domain datasets that are specifically designed to assess the robustness of models against known biases in the training data.
Popular Natural Language Inference (NLI) datasets have been shown to be tainted by hypothesis-only biases.
In contrast to standard approaches to NLI, our methods predict the probability of a premise given a hypothesis and NLI label, discouraging models from ignoring the premise.
End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions.
We share the findings of the first shared task on improving robustness of Machine Translation (MT).
Visual question answering (VQA) models have been shown to over-rely on linguistic biases in VQA datasets, answering questions "blindly" without considering visual context.
In this paper, we systematically assess the ability of standard recurrent networks to perform dynamic counting and to encode hierarchical representations.
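The "dynamic counting" such evaluations probe can be made concrete with a minimal sketch: recognizing the language a^n b^n only requires a single counter, incremented on `a` and decremented on `b`, that must never go negative and must end at zero. (The function below is an illustrative reference recognizer, not anything from the paper.)

```python
# Minimal counter-based recognizer for a^n b^n, the kind of behavior a
# recurrent network must emulate to "count" (hedged illustration).

def is_anbn(s):
    count = 0
    seen_b = False
    for ch in s:
        if ch == 'a':
            if seen_b:          # an 'a' after a 'b' breaks the a^n b^n shape
                return False
            count += 1
        elif ch == 'b':
            seen_b = True
            count -= 1
            if count < 0:       # more b's than a's so far
                return False
        else:
            return False
    return count == 0           # counts must balance exactly
```

Hierarchical (Dyck-style) languages generalize this from one counter to a stack, which is why stack-augmented models are the natural comparison point.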
The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP tasks.
In this work, we propose a method that improves language modeling by learning to align the given context and the following phrase.
Recent work has shown that contextualized word representations derived from neural machine translation are a viable alternative to those derived from simple word prediction tasks.
Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language.
Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones.
We present a toolkit to facilitate the interpretation and understanding of neural network models.
We further present a comprehensive analysis of neurons with the aim of addressing the following questions: i) how localized or distributed are different linguistic properties in the models?
Neural machine translation (NMT) models learn representations containing substantial linguistic information.
Arabic is a widely-spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties.
We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena.
In this paper, we investigate the representations learned at different layers of NMT encoders.
Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems.
End-to-end training makes the neural machine translation (NMT) architecture simpler and more elegant than traditional statistical machine translation (SMT).
In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss.
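The CTC loss mentioned above implies a simple many-to-one mapping from per-frame outputs to transcriptions, which a greedy decoder applies directly: collapse consecutive repeated symbols, then remove the blank token. A minimal sketch (the blank symbol `"_"` is an assumption of this illustration):

```python
# Hedged sketch of the CTC collapse rule used in greedy decoding:
# collapse consecutive repeats, then drop the blank symbol.

BLANK = "_"  # assumed blank token for this illustration

def ctc_collapse(frame_symbols):
    """Map a per-frame symbol sequence to its CTC output string."""
    out = []
    prev = None
    for s in frame_symbols:
        if s != prev:           # keep only the first of each run
            out.append(s)
        prev = s
    return [s for s in out if s != BLANK]
```

Blanks between repeats are what let CTC emit genuinely doubled letters (e.g. the two l's in "hello"), which is why the collapse step must run before blank removal.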
Word segmentation plays a pivotal role in improving any Arabic NLP application.
Model stacking works best when training begins with the furthest out-of-domain data and the model is incrementally fine-tuned with the next furthest domain and so on.
Neural machine translation (NMT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture.
In real-world data, e.g., from Web forums, text is often contaminated with redundant or irrelevant content, which introduces noise into machine learning algorithms.
Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair.
The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low level prediction tasks, and on the effect of the encoded vector's dimensionality on the resulting representations.
In this paper, we show that word vector representations can yield significant prepositional phrase (PP) attachment performance gains.
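To make the idea concrete, a deliberately toy version of vector-based PP attachment scores the two candidate heads by cosine similarity between (assumed) embeddings; the actual model in such work is richer, and every vector and function name here is hypothetical.

```python
# Toy sketch: decide PP attachment (verb vs. noun head) by comparing the
# cosine similarity of the preposition's vector to each candidate head's
# vector. Purely illustrative; not the paper's model.
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def attach(prep_vec, verb_vec, noun_vec):
    """Return 'verb' or 'noun', whichever candidate head is closer."""
    return "verb" if cos(prep_vec, verb_vec) >= cos(prep_vec, noun_vec) else "noun"
```

Even this crude scorer shows why distributional similarity helps: attachment decisions reduce to geometric comparisons in the embedding space.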