3 dataset results for art AND Machine Reading Comprehension AND Texts

…Results show that state-of-the-art neural models perform far worse than the human ceiling. The dataset can also serve as a benchmark for reinvestigating logical AI under the deep learning NLP setting.

71 PAPERS • NO BENCHMARKS YET

AdversarialQA

…The adversarial human annotation paradigm ensures that these datasets consist of questions that current state-of-the-art models (at least the ones used as adversaries in the annotation loop) find challenging.

24 PAPERS • 2 BENCHMARKS

Belebele

…While all questions directly relate to the passage, the English dataset on its own proves difficult enough to challenge state-of-the-art language models.

19 PAPERS • NO BENCHMARKS YET
