Modal verbs have different interpretations depending on their context: "must", for example, can express epistemic certainty ("she must be home by now") or deontic obligation ("you must be home by ten").
We conclude that not only is the question of robustness in NLP still unresolved, but that even some of the approaches to measuring robustness need to be reassessed.
Contrast set consistency is a robustness metric that measures the rate at which a model correctly answers every instance in a bundle of minimally different examples that rely on the same underlying knowledge.
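As a minimal sketch of this metric (the function and variable names are ours, not taken from the paper), consistency over such bundles can be computed as:

```python
from collections import defaultdict

def contrast_set_consistency(predictions, golds, bundle_ids):
    """Fraction of bundles in which the model answers every instance correctly."""
    bundles = defaultdict(list)
    for pred, gold, bid in zip(predictions, golds, bundle_ids):
        bundles[bid].append(pred == gold)
    return sum(all(hits) for hits in bundles.values()) / len(bundles)

# Example: two bundles; the model gets all of bundle 0 right but misses
# one instance of bundle 1, so consistency is 0.5.
print(contrast_set_consistency(["a", "b", "c", "d"],
                               ["a", "b", "c", "x"],
                               [0, 0, 1, 1]))
```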
We also have workers make three kinds of edits to the passage -- paraphrasing the negated statement, changing the scope of the negation, and reversing the negation -- resulting in clusters of question-answer pairs that are difficult for models to answer by relying on spurious shortcuts.
Specifically, we evaluate how training self-rationalization models with free-text rationales affects robustness to spurious correlations in fine-tuned encoder-decoder and decoder-only models of six different sizes.
Large neural networks can now generate jokes, but do they really "understand" humor?
Combining the visual modality with pretrained language models has been surprisingly effective for simple descriptive tasks such as image captioning.
We identify the right prompting approach by extensively exploring natural language prompts on FEB. Then, by using this prompt and scaling the model size, we demonstrate that making progress on few-shot self-rationalization is possible.
An attention matrix of a transformer self-attention sublayer can provably be decomposed into two components, of which only one (effective attention) contributes to the model output.
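Concretely, here is a sketch of one way such a decomposition can be computed, assuming the non-contributing component is the projection of the attention matrix onto the left null space of the value matrix, so that it vanishes when multiplied by the values:

```python
import numpy as np

def decompose_attention(A, V):
    """Split an attention matrix A (n x n) into an effective part, which
    reaches the sublayer output through the value matrix V (n x d), and a
    null part that V annihilates (A_null @ V == 0)."""
    P = V @ np.linalg.pinv(V)               # projector onto the column space of V
    A_eff = A @ P                           # satisfies A_eff @ V == A @ V
    A_null = A @ (np.eye(A.shape[1]) - P)   # contributes nothing to the output
    return A_eff, A_null
```

Since V @ pinv(V) @ V == V, the null component is exactly the part of A that the value matrix maps to zero, which is why only the effective component can influence the output.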
Finally, we conclude with some recommendations for how to create and document web-scale datasets from a scrape of the internet.
Generating text from structured inputs, such as meaning representations or RDF triples, has often involved the use of specialized graph-encoding neural networks.
Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some other counterfactual event (the contrast case).
In interpretable NLP, we require faithful rationales that reflect the model's decision-making process for an explained instance.
We discuss a model of trust inspired by, but not identical to, sociology's interpersonal trust (i.e., trust between people).
Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights.
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP.
Machine comprehension of texts longer than a single sentence often requires coreference resolution.
For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question "Who expressed what kind of sentiment towards what?".
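To make the target structure concrete, a hypothetical container for one such opinion tuple might look like this (the class and field names are illustrative, not from any specific system):

```python
from dataclasses import dataclass

@dataclass
class OpinionFrame:
    """One opinion-holder-target structure extracted from text."""
    holder: str      # who expressed the sentiment
    expression: str  # the sentiment-bearing phrase
    polarity: str    # e.g., "positive" or "negative"
    target: str      # what the sentiment is about

# "The critics panned the film." ->
frame = OpinionFrame(holder="The critics", expression="panned",
                     polarity="negative", target="the film")
```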
We found model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but still lag behind for pronominal anaphors.
Modal sense classification (MSC) is a special case of word sense disambiguation (WSD) in which the correct sense depends on the meaning of the proposition in the modal's scope.