Despite the success of multilingual pre-trained language models, it remains unclear to what extent these models have human-like generalization capacity across languages.
We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles.
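The idea behind the transformation can be sketched as follows (a toy illustration; the sentence encoding and helper names are hypothetical, not from JSICK itself). Scrambling case-marked phrases preserves the meaning of a Japanese sentence, while swapping the case particles themselves changes who did what to whom:

```python
# Toy sketch of the stress-test transformations (hypothetical encoding):
# a sentence is a list of (noun, particle) pairs followed by the verb.

def scramble(sent):
    """Reorder the ga-marked and o-marked phrases.
    Japanese word order is flexible, so truth conditions are unchanged."""
    subj, obj, verb = sent
    return [obj, subj, verb]

def swap_case_particles(sent):
    """Exchange the particles ga/o between the two noun phrases.
    This reverses the argument roles, so truth conditions change."""
    (s_np, s_prt), (o_np, o_prt), verb = sent
    return [(s_np, o_prt), (o_np, s_prt), verb]

# "Taro-ga Hanako-o mita" (Taro saw Hanako)
orig = [("Taro", "ga"), ("Hanako", "o"), "mita"]
scrambled = scramble(orig)            # "Hanako-o Taro-ga mita": same meaning
swapped = swap_case_particles(orig)   # "Taro-o Hanako-ga mita": roles reversed
```

A model sensitive to case particles should judge the scrambled variant as entailed by the original but not the particle-swapped one.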
This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions.
We also find that generalization to unseen combinations improves when the form of the meaning representations is simpler.
Despite the recent success of deep neural networks in natural language processing, the extent to which they can demonstrate human-like generalization capacities for natural language understanding remains unclear.
This indicates that the generalization ability of neural models is limited to cases where the syntactic structures are nearly the same as those in the training set.
Monotonicity reasoning is an essential skill for any intelligent natural language inference (NLI) model, as it requires capturing the interaction between lexical and syntactic structures.
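The core idea of monotonicity reasoning can be sketched as follows (a toy illustration; the hypernym table and helper are hypothetical, and real systems compute polarity over full parse trees):

```python
# Minimal sketch of monotonicity-based entailment.
# Toy hypernym table (illustrative only).
HYPERNYMS = {"dog": "animal", "run": "move"}

def upward_entail(words, position):
    """Replace a word with its hypernym.
    Valid only in an upward-monotone context, e.g. both arguments of
    'some': 'Some dogs run' entails 'Some animals run'."""
    word = words[position]
    if word not in HYPERNYMS:
        return None  # no known broader term
    out = list(words)
    out[position] = HYPERNYMS[word]
    return out

premise = ["some", "dog", "run"]
hypothesis = upward_entail(premise, 1)  # ["some", "animal", "run"]
```

The same substitution under a downward-monotone operator (e.g. the first argument of "every") would instead break entailment, which is exactly the lexical-syntactic interaction the sentence above refers to.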
A substantial body of research on multimodal inference across text and vision has recently emerged, aiming to obtain visually grounded word and sentence representations.
To investigate this issue, we introduce a new dataset, called HELP, for handling entailments with lexical and logical phenomena.
In this paper, we present a sequence-to-sequence model for generating sentences from logical meaning representations based on event semantics.
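As a rough illustration of the task setup (the meaning-representation format and linearizer below are hypothetical, not the paper's actual representations): the model maps a logical form such as a Neo-Davidsonian event description to a surface sentence.

```python
import re

# Event-semantics style meaning representation (illustrative format)
mr = "exists e. chase(e) AND agent(e, dog) AND patient(e, cat)"

# A simpler, variable-free form of the same meaning
simple_mr = "chase(dog, cat)"

def linearize(simple_form):
    """Toy rule-based stand-in for the seq2seq model: turn a
    variable-free MR like 'chase(dog, cat)' into a template sentence."""
    match = re.match(r"(\w+)\((\w+),\s*(\w+)\)", simple_form)
    pred, agent, patient = match.groups()
    return f"a {agent} {pred}s a {patient}"

linearize("chase(dog, cat)")  # "a dog chases a cat"
```

In the actual work a trained sequence-to-sequence model, not a rule, performs this mapping; the sketch only makes the input-output relation concrete.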
How to identify, extract, and use phrasal knowledge is a crucial problem for the task of Recognizing Textual Entailment (RTE).
Determining semantic textual similarity is a core research subject in natural language processing.