4 dataset results for art AND Text Generation

HellaSwag

HellaSwag is a challenge dataset for evaluating commonsense NLI. It is especially hard for state-of-the-art models, even though its questions are trivial for humans (>95% accuracy).

437 PAPERS • 1 BENCHMARK

LogiQA

…Results show that state-of-the-art neural models perform far worse than the human ceiling. The dataset can also serve as a benchmark for reinvestigating logical AI in the deep learning NLP setting.

71 PAPERS • NO BENCHMARKS YET

LitMind Dictionary

…Incorporating state-of-the-art definition generation models, it supports not only Chinese and English queries but also Chinese-English cross-lingual queries.

1 PAPER • NO BENCHMARKS YET

HarmfulQA

…Biology, Astronomy, Geology, Computer Science, Engineering, Environmental Science, Neuroscience, Robotics; History and Culture: Ancient History, Medieval History, Modern History, World History, Art

6 PAPERS • 1 BENCHMARK