ChID: A Large-scale Chinese IDiom Dataset for Cloze Test

ACL 2019 · Chujie Zheng, Minlie Huang, Aixin Sun

Cloze-style reading comprehension in Chinese remains limited by a lack of diverse corpora. In this paper we propose ChID, a large-scale Chinese cloze test dataset that studies the comprehension of idioms, a unique language phenomenon in Chinese. In this corpus, the idioms in a passage are replaced by blank symbols, and the correct answer must be chosen from well-designed candidate idioms. We carefully study how the design of candidate idioms and the representation of idioms affect the performance of state-of-the-art models. Results show that machine accuracy is substantially worse than that of humans, indicating ample room for further research.
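The task format described in the abstract (a passage with a blanked-out idiom, a set of candidate idioms, and one correct choice) can be sketched roughly as follows. This is only an illustration: the `ClozeItem` structure, the `BLANK` token, the helper names, and the two toy passages are assumptions made for this example and are not taken from the paper or its released data.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical placeholder marking where an idiom was removed (an assumption).
BLANK = "#idiom#"


@dataclass
class ClozeItem:
    """One cloze question: a passage with a blank, candidates, and the gold index."""
    passage: str
    candidates: List[str]
    answer: int  # index into `candidates` of the correct idiom


def fill_blank(item: ClozeItem, choice: int) -> str:
    """Return the passage with the first blank replaced by the chosen candidate."""
    return item.passage.replace(BLANK, item.candidates[choice], 1)


def accuracy(items: Sequence[ClozeItem],
             predict: Callable[[ClozeItem], int]) -> float:
    """Fraction of items for which `predict` returns the gold index."""
    if not items:
        return 0.0
    correct = sum(1 for item in items if predict(item) == item.answer)
    return correct / len(items)


def first_candidate(item: ClozeItem) -> int:
    """Trivial baseline: always pick the first candidate."""
    return 0


# Toy items invented for illustration; they are not drawn from ChID.
CANDIDATES = ["一丝不苟", "画蛇添足", "守株待兔"]
items = [
    ClozeItem("他做事总是#idiom#，从不马虎。", CANDIDATES, 0),
    ClozeItem("这个方案纯属#idiom#，多此一举。", CANDIDATES, 1),
]

baseline_accuracy = accuracy(items, first_candidate)
```

Any real model would replace `first_candidate` with a function that scores each candidate in context; the accuracy computation stays the same, which is what makes human and machine results directly comparable on this kind of task.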


Datasets


Introduced in the Paper:

ChID

Used in the Paper:

CBT, LAMBADA, Who-did-What, CLOTH, CMRC 2017
