Despite the success of multilingual pre-trained language models, it remains unclear to what extent these models have human-like generalization capacity across languages.
This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions.
We also find that generalization to unseen combinations is better when the meaning representations have a simpler form.
There has been a widely held view that visual representations (e.g., photographs and illustrations) do not depict negation, such as that expressed by the sentence "the train is not coming".
Despite the recent success of deep neural networks in natural language processing, the extent to which they can demonstrate human-like generalization capacities for natural language understanding remains unclear.
In formal semantics, there are two well-developed semantic frameworks: event semantics, which treats verbs and adverbial modifiers using the notion of event, and degree semantics, which analyzes adjectives and comparatives using the notion of degree.
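The degree-semantics analysis mentioned above can be illustrated with a minimal sketch: a gradable adjective maps an entity to a degree, the positive form compares that degree to a contextual threshold, and a comparative compares two degrees directly. The entities, heights, and threshold below are hypothetical, purely for illustration.

```python
# Toy degree-semantics sketch: a gradable adjective denotes a measure
# function from entities to degrees; "tall(x)" holds iff x's degree
# exceeds a contextual threshold, and "taller(x, y)" compares degrees.
# All values here are hypothetical illustrations.

HEIGHT_CM = {"ann": 180, "bob": 170}  # measure function for "tall"
THRESHOLD_TALL = 175                  # contextual standard of comparison

def tall(x):
    """Positive form: degree exceeds the contextual threshold."""
    return HEIGHT_CM[x] > THRESHOLD_TALL

def taller(x, y):
    """Comparative form: direct comparison of degrees."""
    return HEIGHT_CM[x] > HEIGHT_CM[y]

print(tall("ann"), tall("bob"), taller("ann", "bob"))  # -> True False True
```

Note that the comparative is true regardless of the threshold, which is one way the degree analysis separates comparatives from the positive form.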
Comparative constructions pose a challenge in Natural Language Inference (NLI), which is the task of determining whether a text entails a hypothesis.
This indicates that the generalization ability of neural models is limited to cases where the syntactic structures are nearly the same as those in the training set.
Monotonicity reasoning is an important reasoning skill for any intelligent natural language inference (NLI) model, in that it requires the ability to capture the interaction between lexical and syntactic structures.
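The interaction of lexical and syntactic structure in monotonicity reasoning can be sketched with a toy checker: each quantifier marks its restrictor and scope as upward (+) or downward (-) monotone, and replacing a word with a hypernym in an upward position, or a hyponym in a downward position, preserves truth. The lexicon and monotonicity profiles below are a hypothetical fragment, not from this work.

```python
# Toy monotonicity-reasoning sketch over sentences of the form
# (quantifier, noun, verb), e.g. ("all", "dog", "run") = "All dogs run".
# The tiny hypernym lexicon is a hypothetical illustration.

HYPERNYMS = {"poodle": "dog", "dog": "animal", "run": "move"}

def is_hyponym(a, b):
    """True iff a is b or a (transitive) hyponym of b."""
    while True:
        if a == b:
            return True
        if a not in HYPERNYMS:
            return False
        a = HYPERNYMS[a]

# Monotonicity profile of each quantifier: (restrictor, scope).
MONOTONICITY = {"all": ("-", "+"), "some": ("+", "+"), "no": ("-", "-")}

def entails(premise, hypothesis):
    """Monotonicity-based entailment check between two triples."""
    q1, n1, v1 = premise
    q2, n2, v2 = hypothesis
    if q1 != q2:
        return False
    rest_pol, scope_pol = MONOTONICITY[q1]
    # Downward (-) position: may specialize (hyponym in the hypothesis).
    # Upward (+) position: may generalize (hypernym in the hypothesis).
    noun_ok = is_hyponym(n2, n1) if rest_pol == "-" else is_hyponym(n1, n2)
    verb_ok = is_hyponym(v2, v1) if scope_pol == "-" else is_hyponym(v1, v2)
    return noun_ok and verb_ok

# "All dogs run" entails "All poodles move" (restrictor -, scope +).
print(entails(("all", "dog", "run"), ("all", "poodle", "move")))  # -> True
```

The profiles encode, for example, that "all" licenses specialization in its restrictor but only generalization in its scope, which is the kind of lexical-syntactic interaction the sentence above refers to.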
A large body of research on multimodal inference across text and vision has recently been developed to obtain visually grounded word and sentence representations.
We propose a new domain adaptation method for Combinatory Categorial Grammar (CCG) parsing, based on the idea of automatic generation of CCG corpora exploiting cheaper resources of dependency trees.
The basic idea is to assign the same type to both declarative sentences and interrogative sentences, partly building on the recent proposal in Inquisitive Semantics.
To investigate this issue, we introduce a new dataset, called HELP, for handling entailments with lexical and logical phenomena.
In logic-based approaches to reasoning tasks such as Recognizing Textual Entailment (RTE), it is important for a system to have a large amount of knowledge data.
In this paper, we present a sequence-to-sequence model for generating sentences from logical meaning representations based on event semantics.
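To make the input/output format concrete, here is a minimal rule-based sketch (not the sequence-to-sequence model itself) that maps a Neo-Davidsonian event-semantics formula to a sentence. The formula syntax and predicate names are illustrative assumptions.

```python
import re

# Minimal rule-based sketch of sentence generation from an event-semantics
# logical form. The paper uses a seq2seq model; this only illustrates the
# mapping from formula to sentence. Formula format is an assumption.

def realize(formula):
    """Map e.g. 'exists e. (run(e) & agent(e, john))' to 'john runs'."""
    verb = re.search(r"(\w+)\(e\)", formula).group(1)     # event predicate
    agent = re.search(r"agent\(e,\s*(\w+)\)", formula).group(1)
    return f"{agent} {verb}s"

print(realize("exists e. (run(e) & agent(e, john))"))  # -> john runs
```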
How to identify, extract, and use phrasal knowledge is a crucial problem for the task of Recognizing Textual Entailment (RTE).
In formal logic-based approaches to Recognizing Textual Entailment (RTE), a Combinatory Categorial Grammar (CCG) parser is used to parse input premises and hypotheses to obtain their logical formulas.
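The compositional step from a CCG derivation to a logical formula can be sketched as follows: each lexical item carries a lambda term, and function application along the derivation yields the sentence's logical form. The two-word lexicon is a hypothetical toy, not the output of an actual CCG parser.

```python
# Toy sketch of CCG-style semantic composition: words carry lambda terms,
# and applying them along the derivation builds the logical formula.
# The lexicon below is a hypothetical illustration.

LEXICON = {
    # "John" : NP, semantics is an individual constant.
    "John": "john",
    # "runs" : S\NP, semantics is a function from a subject to a formula.
    "runs": lambda subj: f"run({subj})",
}

def backward_apply(arg, fn):
    """Backward application: NP  S\\NP  =>  S."""
    return fn(arg)

# Derivation for "John runs".
formula = backward_apply(LEXICON["John"], LEXICON["runs"])
print(formula)  # -> run(john)
```

A real pipeline would produce first-order (often event-semantic) formulas from full CCG derivations; the mechanism of composition by application is the same.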
Only a limited portion of the content in this tutorial is drawn from the previous one.
In logic-based approaches to Recognizing Textual Entailment, identifying phrase-to-phrase semantic relations remains an unsolved problem.
Determining semantic textual similarity is a core research subject in natural language processing.
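A common baseline for semantic textual similarity, shown here only as an illustrative sketch and not as a method from this work, is cosine similarity over bag-of-words count vectors:

```python
import math
from collections import Counter

# Baseline sketch: semantic textual similarity as cosine similarity over
# bag-of-words counts. Illustrative only; modern systems use learned
# sentence representations instead.

def cosine_similarity(s1, s2):
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

print(round(cosine_similarity("a dog runs", "a dog sleeps"), 2))  # -> 0.67
```

Such surface-overlap baselines are exactly what grounded or logic-based representations aim to improve on, since they score paraphrases with little word overlap as dissimilar.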
We approach the recognition of textual entailment using logical semantic representations and a theorem prover.
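The entailment check at the core of such a pipeline can be sketched in miniature: treat premise and hypothesis as propositional formulas over named atoms and verify that every model of the premises satisfies the hypothesis. Real systems use first-order representations and an automated theorem prover; the exhaustive model enumeration below is only a toy illustration of the same check.

```python
from itertools import product

# Toy entailment check by model enumeration: premises and hypothesis are
# propositional formulas written as Python boolean expressions over named
# atoms. Illustrative only; actual systems use a first-order theorem prover.

def entails(premises, hypothesis, atoms):
    """True iff every model of all premises also satisfies the hypothesis."""
    for values in product([True, False], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(eval(p, {}, model) for p in premises):
            if not eval(hypothesis, {}, model):
                return False  # counter-model found
    return True

# "It rains and the ground is wet" entails "the ground is wet".
print(entails(["rain and wet"], "wet", ["rain", "wet"]))  # -> True
print(entails(["rain or wet"], "wet", ["rain", "wet"]))   # -> False
```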
This paper proposes a methodology for building a specialized Japanese dataset for recognizing temporal relations and discourse relations.