Japanese Realistic Textual Entailment Corpus

LREC 2020  ·  Yuta Hayashibe ·

We perform the textual entailment (TE) corpus construction for the Japanese Language with the following three characteristics: First, the corpus consists of realistic sentences; that is, all sentences are spontaneous or almost equivalent. It does not need manual writing which causes hidden biases. Second, the corpus contains adversarial examples. We collect challenging examples that can not be solved by a recent pre-trained language model. Third, the corpus contains explanations for a part of non-entailment labels. We perform the reasoning annotation where annotators are asked to check which tokens in hypotheses are the reason why the relations are labeled. It makes easy to validate the annotation and analyze system errors. The resulting corpus consists of 48,000 realistic Japanese examples. It is the largest among publicly available Japanese TE corpora. Additionally, it is the first Japanese TE corpus that includes reasons for the annotation as we know. We are planning to distribute this corpus to the NLP community at the time of publication.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here