55 dataset results for Natural Language Understanding AND Texts

ExPUNations is a humor dataset with extensive, fine-grained annotations specifically for puns. It is designed for two new tasks: explanation generation to aid pun classification, and keyword-conditioned pun generation.

1 PAPER • NO BENCHMARKS YET

FewGLUE_64_labeled

FewGLUE_64_labeled (A new version of FewGLUE with 64 training examples)

The FewGLUE_64_labeled dataset is a new version of the FewGLUE dataset. It contains a 64-sample training set, a development set (the original SuperGLUE development set), a test set, and an unlabeled set. It is constructed to facilitate research on few-shot learning for natural language understanding tasks.

1 PAPER • NO BENCHMARKS YET

IDK-MRC

IDK-MRC is an Indonesian Machine Reading Comprehension (MRC) dataset consisting of more than 10K questions in total, including over 5K unanswerable questions, with diverse question types.

1 PAPER • NO BENCHMARKS YET

MultiWOZ-coref

MultiWOZ-coref (or MultiWOZ 2.3) is an extension of the MultiWOZ dataset that adds co-reference annotations in addition to corrections of dialogue acts and dialogue states.

1 PAPER • NO BENCHMARKS YET

Phrase-in-Context

Phrase in Context (PiC) is a curated benchmark for phrase understanding and semantic search, consisting of three tasks of increasing difficulty: Phrase Similarity (PS), Phrase Retrieval (PR), and Phrase Sense Disambiguation (PSD). The datasets are annotated by 13 linguistic experts on Upwork and verified by two groups: ~1000 AMT crowdworkers and another set of 5 linguistic experts. The PiC benchmark is distributed under CC-BY-NC 4.0.

1 PAPER • NO BENCHMARKS YET

Pre-trained Transliterated Embeddings for Indian Languages

We release various types of word embeddings for multiple Indian languages. Please note that for the majority of our work we transliterated the corpora to the Devanagari script, so the script differs from the original. The release includes word embedding models built with FastText and ELMo, as well as cross-lingual models based on an orthogonal alignment of monolingual models for all pairs of these languages.

1 PAPER • NO BENCHMARKS YET

bigscience/P3

bigscience/P3 (bigscience/P3, split='ai2_arc_ARC_Challenge_pick_the_most_correct_option')

This dataset consists of challenging reasoning questions in multiple-choice format.

1 PAPER • NO BENCHMARKS YET
