DuReader is a large-scale, open-domain Chinese machine reading comprehension dataset. It consists of 200K questions, 420K answers, and 1M documents. The questions and documents are drawn from Baidu Search and Baidu Zhidao, and the answers are manually generated. The dataset also provides question type annotations: each question was manually labeled as Entity, Description, or YesNo, and as either Fact or Opinion.
39 PAPERS • 4 BENCHMARKS
Delta Reading Comprehension Dataset (DRCD) is an open-domain Traditional Chinese machine reading comprehension (MRC) dataset. It aims to be a standard Chinese MRC dataset that can serve as a source dataset in transfer learning. The dataset contains 10,014 paragraphs from 2,108 Wikipedia articles and more than 30,000 questions generated by annotators.
26 PAPERS • 5 BENCHMARKS