The Stanford Question Answering Dataset (SQuAD) is a collection of question-answer pairs derived from Wikipedia articles. In SQuAD, the correct answer to a question can be any sequence of tokens in the given text. Because the questions and answers are produced by humans through crowdsourcing, SQuAD is more diverse than some other question-answering datasets. SQuAD 1.1 contains 107,785 question-answer pairs on 536 articles. SQuAD 2.0, the latest version, combines the roughly 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to the answerable ones.
1,918 PAPERS • 11 BENCHMARKS
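Because a SQuAD answer is a span of the given text, each answer can be stored as a string plus a character offset into the passage. The sketch below assumes the field names used in the public SQuAD JSON files (`context`, `qas`, `answers`, `answer_start`, and `is_impossible` in 2.0). The passage, questions, IDs, and the helper `extract_spans` are made up for illustration and are not taken from the dataset.

```python
# Illustrative paragraph in the SQuAD 2.0 JSON layout.
# The passage and questions are invented for this sketch, not real SQuAD data.
CONTEXT = "The Amazon rainforest covers much of the Amazon basin of South America."
ANSWER = "much of the Amazon basin"

paragraph = {
    "context": CONTEXT,
    "qas": [
        {
            "id": "example-1",
            "question": "What does the Amazon rainforest cover?",
            "is_impossible": False,
            # answer_start is a character offset into the context
            "answers": [{"text": ANSWER, "answer_start": CONTEXT.index(ANSWER)}],
        },
        {
            "id": "example-2",
            "question": "Who discovered the Amazon rainforest?",
            # Unanswerable questions (added in 2.0) carry no answer span
            "is_impossible": True,
            "answers": [],
        },
    ],
}


def extract_spans(paragraph):
    """Map each question id to its answer span, or None if unanswerable."""
    context = paragraph["context"]
    spans = {}
    for qa in paragraph["qas"]:
        # SQuAD 1.1 records have no is_impossible field, so default to False
        if qa.get("is_impossible", False) or not qa["answers"]:
            spans[qa["id"]] = None
            continue
        answer = qa["answers"][0]
        start = answer["answer_start"]
        end = start + len(answer["text"])
        spans[qa["id"]] = context[start:end]
    return spans


print(extract_spans(paragraph))
```

Slicing the passage with the stored offset, rather than trusting the `text` field alone, is a simple way to check that an annotation really points at a span of the passage.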
ChatHaruhi is a dataset covering 32 Chinese and English TV and anime characters, with over 54k simulated dialogues.
3 PAPERS • NO BENCHMARKS YET
This is a dataset of paraphrases created by ChatGPT.
0 PAPERS • NO BENCHMARKS YET