ExPUNations is a humor dataset with such extensive and fine-grained annotations specifically for puns. This dataset is designed for two new tasks namely, explanation generation to aid with pun classification and keyword-conditioned pun generation
1 PAPER • NO BENCHMARKS YET
Introduction The FewGLUE_64_labeled dataset is a new version of FewGLUE dataset. It contains a 64-sample training set, a development set (the original SuperGLUE development set), a test set, and an unlabeled set. It is constructed to facilitate the research of few-shot learning for natural language understanding tasks.
IDK-MRC is an Indonesian Machine Reading Comprehension (MRC) dataset consists of more than 10K questions in total with over 5K unanswerable questions with diverse question types.
MultiWOZ-coref, (or MultiWOZ 2.3) is an extension of the MultiWOZ dataset that adds co-reference annotations in addition to corrections of dialogue acts and dialogue states.
Phrase in Context is a curated benchmark for phrase understanding and semantic search, consisting of three tasks of increasing difficulty: Phrase Similarity (PS), Phrase Retrieval (PR) and Phrase Sense Disambiguation (PSD). The datasets are annotated by 13 linguistic experts on Upwork and verified by two groups: ~1000 AMT crowdworkers and another set of 5 linguistic experts. PiC benchmark is distributed under CC-BY-NC 4.0.
We release various types of word embeddings for multiple Indian languages. Please note that for a majority of our work, we had transliterated the corpora to the Devanagiri script and the script is changed. Word Embedding models using FastText, ElMo, and cross-lingual models based on an orthogonal alignment of monolingual models for all pairs of these languages.
This datasets consists of challenging reasoning questions in multiple choice format.