XWINO is a multilingual collection of Winograd Schemas in six languages that can be used for evaluation of cross-lingual commonsense reasoning capabilities.
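A Winograd schema item pairs an ambiguous pronoun with two candidate antecedents, and a model is scored on whether it picks the right one. A minimal sketch of how such an item might be represented and evaluated (the field names here are illustrative, not XWINO's actual format):

```python
# Illustrative structure of a Winograd schema item (field names are
# hypothetical, not XWINO's actual data format).
item = {
    "sentence": "The trophy didn't fit in the suitcase because it was too big.",
    "pronoun": "it",
    "candidates": ["the trophy", "the suitcase"],
    "answer": 0,  # index of the correct antecedent
}

def accuracy(predictions, items):
    """Fraction of items where the predicted antecedent index is correct."""
    correct = sum(p == it["answer"] for p, it in zip(predictions, items))
    return correct / len(items)

print(accuracy([0], [item]))  # a correct guess scores 1.0
```

Evaluation is the same in every language: the schema pair is constructed so that surface statistics alone cannot resolve the pronoun, which is what makes it a commonsense-reasoning probe.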
3 PAPERS • 1 BENCHMARK
A comprehensive multi-task benchmark for Polish language understanding, accompanied by an online leaderboard. It consists of a diverse set of tasks adapted from existing datasets for named entity recognition, question answering, textual entailment, and others.
2 PAPERS • NO BENCHMARKS YET
A benchmark Arabic dataset for commonsense understanding and validation, along with baseline research and models trained on the same dataset.
ChatLog is a coarse-to-fine temporal dataset consisting of two parts, one updated monthly and one updated daily.
GeoGLUE is a GeoGraphic Language Understanding Evaluation benchmark consisting of six geographic text-related tasks, including geographic textual similarity on recall, geographic elements tagging, geographic composition analysis, geographic where-what cut, and geographic entity alignment. All task datasets are collected from openly released resources.
NusaCrowd is a collaborative initiative to collect and unite existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, the authors have brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their effectiveness has been demonstrated in multiple experiments.
RuMedBench is a benchmark dataset for Russian medical language understanding.
The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million word, 850-hour corpus totals over 1TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.
1 PAPER • NO BENCHMARKS YET
CLUES (Constrained Language Understanding Evaluation Standard) is a benchmark for evaluating the few-shot learning capabilities of NLU models.
The Dialog-based Language Learning dataset is designed to measure how well models can learn as a student given a teacher's textual responses to the student's answers (as well as potentially receiving an external real-valued reward signal).
EPIC30M contains a subset of 26.2 million tweets related to three general diseases, namely Ebola, Cholera, and Swine Flu, and another subset of 4.7 million tweets on six global epidemic outbreaks, including the 2009 H1N1 Swine Flu, 2010 Haiti Cholera, 2012 Middle East Respiratory Syndrome (MERS), 2013 West African Ebola, 2016 Yemen Cholera, and 2018 Kivu Ebola.
ExPUNations is a humor dataset with extensive and fine-grained annotations specifically for puns. It is designed for two new tasks: explanation generation to aid pun classification, and keyword-conditioned pun generation.
The FewGLUE_64_labeled dataset is a new version of the FewGLUE dataset. It contains a 64-sample training set, a development set (the original SuperGLUE development set), a test set, and an unlabeled set. It is constructed to facilitate research on few-shot learning for natural language understanding tasks.
IDK-MRC is an Indonesian Machine Reading Comprehension (MRC) dataset consisting of more than 10K questions in total, including over 5K unanswerable questions of diverse types.
ImagiFilter focuses on photographic and/or natural images, a very common use case in computer vision research. Annotations are provided for a coarse prediction task (photographic vs. non-photographic) and for a smaller fine-grained prediction task in which the non-photographic class is broken down into five classes: maps, drawings, graphs, icons, and sketches.
MultiWOZ-coref (or MultiWOZ 2.3) is an extension of the MultiWOZ dataset that adds co-reference annotations in addition to corrections of dialogue acts and dialogue states.
This project is a collection of three corpora which can be used for evaluating chatbots or other conversational interfaces. Two of the corpora were extracted from StackExchange, one from a Telegram chatbot.
Phrase in Context is a curated benchmark for phrase understanding and semantic search, consisting of three tasks of increasing difficulty: Phrase Similarity (PS), Phrase Retrieval (PR), and Phrase Sense Disambiguation (PSD). The datasets were annotated by 13 linguistic experts on Upwork and verified by two groups: ~1000 AMT crowdworkers and another set of 5 linguistic experts. The PiC benchmark is distributed under CC BY-NC 4.0.
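The Phrase Similarity task asks whether two phrases are semantically equivalent in their contexts; a common baseline scores a candidate pair by cosine similarity of phrase embeddings and thresholds the score. A minimal sketch with toy vectors (the embeddings and threshold below are made up for illustration, not PiC's baseline):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy phrase embeddings -- in practice these would come from an encoder
# run over each phrase in its sentence context.
emb_a = np.array([0.9, 0.1, 0.3])
emb_b = np.array([0.8, 0.2, 0.4])

# Predict "similar" when the score exceeds a threshold tuned on dev data.
THRESHOLD = 0.9
print(cosine(emb_a, emb_b) > THRESHOLD)  # prints True for these toy vectors
```

The harder PSD task cannot be solved this way with context-free embeddings, since the same phrase must be mapped to different senses depending on its surrounding sentence.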
We release various types of word embeddings for multiple Indian languages. Please note that for the majority of our work, the corpora were transliterated to the Devanagari script, so the script differs from the original. We provide word embedding models using FastText and ELMo, as well as cross-lingual models based on an orthogonal alignment of monolingual models for all pairs of these languages.
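The orthogonal alignment of two monolingual embedding spaces is typically solved as an orthogonal Procrustes problem: given paired vectors, the best rotation is recovered in closed form from an SVD. A self-contained sketch with random toy matrices (the dimensions and data are illustrative, not the released models):

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F for row-paired X, Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))                  # "source language" vectors
R, _ = np.linalg.qr(rng.standard_normal((8, 8)))   # hidden true rotation
Y = X @ R                                          # "target language" vectors

W = orthogonal_procrustes(X, Y)
print(np.allclose(X @ W, Y))  # prints True: the rotation is recovered exactly
```

Because the mapping is constrained to be orthogonal, distances and dot products within each monolingual space are preserved, which is why this alignment style is popular for building cross-lingual embeddings from a small bilingual seed lexicon.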
This dataset consists of challenging reasoning questions in multiple-choice format.