DMU is complementary to domain-targeted augmentation and substantially improves performance on SNLI-hard.
We apply our method to the highly challenging ANLI dataset, where our framework improves the performance of both DeBERTa-base and BERT baselines.
We further improve model performance and span-level decisions by incorporating the e-SNLI explanations during training.
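One way to use span-level explanations during training (a minimal sketch, not the paper's exact formulation) is to add an auxiliary token-level loss that supervises the model's span scores against the e-SNLI highlight annotations, weighted by a hypothetical coefficient `alpha`:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def explanation_supervised_loss(class_logits, labels, span_logits,
                                highlights, alpha=0.5):
    """Combined objective (illustrative): classification cross-entropy
    plus an auxiliary binary cross-entropy pushing per-token span scores
    toward e-SNLI highlight masks. `alpha` is an assumed weighting."""
    # Cross-entropy over the three NLI classes
    probs = softmax(class_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    # Binary cross-entropy between span scores and highlight mask
    p = sigmoid(span_logits)
    bce = -(highlights * np.log(p) + (1 - highlights) * np.log(1 - p)).mean()
    return ce + alpha * bce

# Toy usage: uniform logits, so both loss terms are analytically known
class_logits = np.zeros((2, 3))          # batch of 2, 3 NLI classes
labels = np.array([0, 1])
span_logits = np.zeros((2, 4))           # 4 tokens per example
highlights = np.array([[1., 0., 0., 1.],
                       [0., 1., 1., 0.]])
loss = explanation_supervised_loss(class_logits, labels,
                                   span_logits, highlights)
```

With uniform logits the classification term is log 3 and the span term is log 2, so the combined loss is log 3 + 0.5·log 2. In practice the span scores would come from an additional head on the encoder.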
Natural Language Inference (NLI) models are known to learn from biases and artefacts within their training data, which impacts how well they generalise to unseen datasets.
NLI datasets contain annotation artefacts that result in spurious correlations between the natural language utterances and their entailment labels.