2 dataset results for art AND Natural Language Understanding

…Results show that state-of-the-art neural models perform far worse than the human ceiling. The dataset can also serve as a benchmark for reinvestigating logical AI in the deep learning NLP setting.

70 PAPERS • 1 BENCHMARK

Belebele

…While all questions directly relate to the passage, the English dataset on its own proves difficult enough to challenge state-of-the-art language models.

17 PAPERS • NO BENCHMARKS YET
