61 dataset results for art AND Texts

ART Dataset (Abductive Reasoning in narrative Text)

ART consists of over 20k commonsense narrative contexts and 200k explanations.

9 PAPERS • NO BENCHMARKS YET

SemArt

SemArt is a multi-modal dataset for semantic art understanding. It is a collection of fine-art painting images in which each image is associated with a number of attributes and a textual artistic comment, such as those that appear in art catalogues or museum collections. It contains 21,384 samples that pair fine-art paintings and their attributes with artistic comments.

13 PAPERS • NO BENCHMARKS YET

ArtEmis

ArtEmis is a large-scale dataset aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. This dataset focuses on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. ArtEmis contains 439K emotion attributions and explanations from humans, on 81K artworks from WikiArt. Paper: ArtEmis: Affective Language for Visual Art

28 PAPERS • NO BENCHMARKS YET

CoDraw

…The Teller sees an abstract scene containing multiple clip art pieces in a semantically meaningful configuration, while the Drawer tries to reconstruct the scene on an empty canvas using available clip art pieces.

12 PAPERS • NO BENCHMARKS YET

Robust Summarization Evaluation Benchmark

Robust Summarization Evaluation Benchmark is a large human evaluation dataset consisting of over 22k summary-level annotations over state-of-the-art systems on three datasets.

1 PAPER • NO BENCHMARKS YET

VIST-Edit

…The stories were generated by two state-of-the-art visual storytelling models, each aligned to 5 human-edited versions.

2 PAPERS • NO BENCHMARKS YET

CICERO (Contextualized Commonsense Inference in Dialogues)

…It involves two challenging generative and multi-choice alternative selection tasks for the state-of-the-art NLP models to solve.

12 PAPERS • 4 BENCHMARKS

LogiQA

…Results show that state-of-the-art neural models perform far worse than the human ceiling. The dataset can also serve as a benchmark for reinvestigating logical AI under the deep learning NLP setting.

71 PAPERS • NO BENCHMARKS YET

ANLI (Adversarial NLI)

…In particular, the data is selected to be difficult for state-of-the-art models, including BERT and RoBERTa.

245 PAPERS • 2 BENCHMARKS

FaithDial

…If trained on FaithDial, state-of-the-art dialogue models are significantly more faithful while also enhancing other dialogue aspects like cooperativeness, creativity and engagement.

12 PAPERS • NO BENCHMARKS YET

FIGER (Fine-Grained Entity Recognition)

The FIGER dataset is an entity recognition dataset where entities are labelled using a fine-grained system of 112 tags, such as person/doctor, art/written_work and building/hotel.

96 PAPERS • 2 BENCHMARKS

LitMind Dictionary

…Incorporating state-of-the-art definition generation models, it supports not only Chinese and English, but also Chinese-English cross-lingual queries.

1 PAPER • NO BENCHMARKS YET

CVSS

…CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems

18 PAPERS • 1 BENCHMARK

ec-darkpattern

ec-darkpattern is a dataset for dark pattern detection, with baseline detection performance established using state-of-the-art machine learning methods.

1 PAPER • NO BENCHMARKS YET

SBU-WSD-Corpus

SBU-WSD-Corpus consists of 19 Persian documents in different domains such as Sports, Science, Arts, etc.

1 PAPER • NO BENCHMARKS YET

AM2iCo (Adversarial and Multilingual Meaning in Context)

…It aims to assess the ability of state-of-the-art representation models to reason over cross-lingual lexical-level concept alignment in context for 14 language pairs.

2 PAPERS • NO BENCHMARKS YET

SPOT (Sentiment Polarity Annotations Dataset)

…Annotations have been gathered on 2 levels of granularity: sentences and Elementary Discourse Units (EDUs), i.e. sub-sentence clauses produced by a state-of-the-art RST parser. This dataset is intended to

3 PAPERS • NO BENCHMARKS YET

PatentMatch

We address the computer-assisted search for prior art by creating a training dataset for supervised machine learning called PatentMatch.

2 PAPERS • NO BENCHMARKS YET

ASR-RAMC-BIGCCSC: A CHINESE CONVERSATIONAL SPEECH CORPUS

…It covers 15 topics, including humanities, entertainment, sports, military, finance, religion, family life, politics, education, digital devices, environment, science, professional development, art and

1 PAPER • NO BENCHMARKS YET

InfoSeek (Visual Information Seeking)

…Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2, etc.) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset

17 PAPERS • 2 BENCHMARKS

BoostCLIR

…Japanese-English) corpus of patent abstracts, extracted from the MAREC patent data, and the data from the NTCIR PatentMT workshop collections, accompanied with relevance judgements for the task of patent prior-art

2 PAPERS • NO BENCHMARKS YET

LARC (Language-annotated Abstraction and Reasoning)

…While most ARC tasks are easy for humans, they are challenging for state-of-the-art AI.

4 PAPERS • NO BENCHMARKS YET

WebQuestions

…(bachelor_of_arts, juris_doctor).

201 PAPERS • 3 BENCHMARKS

GitHub Typo Corpus

…Either way, thank you—you contributed to the state-of-the-art in the NLP field.

3 PAPERS • NO BENCHMARKS YET

Hephaestus (Hephaestus: A large scale multitask dataset towards InSAR understanding)

…The goal of this dataset is to boost research on exploitation of interferometric data enabling the application of state-of-the-art computer vision+NLP methods.

2 PAPERS • NO BENCHMARKS YET

Data Science Problems

…See our paper Training and Evaluating a Jupyter Notebook Data Science Assistant for more details about state-of-the-art results and other properties of the dataset.

6 PAPERS • NO BENCHMARKS YET

M3KE (Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark)

…Chinese education system, ranging from the primary school to college, as well as a wide variety of subjects, including humanities, history, politics, law, education, psychology, science, technology, art

13 PAPERS • NO BENCHMARKS YET

TBCOV

…Several state-of-the-art deep learning models are used to enrich the data with important attributes, including sentiment labels, named-entities (e.g., mentions of persons, organizations, locations), user

1 PAPER • NO BENCHMARKS YET

XSum

…News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment and Arts).

27 PAPERS • 6 BENCHMARKS

DivEMT (Post-Editing Effort Across Typologically-diverse Languages)

…Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the multilingual mBART-50 model, on translation productivity.

1 PAPER • NO BENCHMARKS YET

SPARTQA (SPAtial Reasoning on Textual Question Answering)

…textual question answering benchmark for spatial reasoning on natural language text which contains more realistic spatial phenomena not covered by prior datasets and that is challenging for state-of-the-art

8 PAPERS • NO BENCHMARKS YET

OLPBENCH

OLPBENCH is a large Open Link Prediction benchmark, which was derived from the state-of-the-art Open Information Extraction corpus OPIEC (Gashteovski et al., 2019).

4 PAPERS • NO BENCHMARKS YET

TLUnified-NER

…We also conducted extensive empirical evaluation of state-of-the-art methods across supervised and transfer learning settings.

2 PAPERS • NO BENCHMARKS YET

Nordic Language Identification

…This paper presents a machine-learning approach for automatic language identification for the Nordic languages, which often suffer miscategorization by existing state-of-the-art tools.

1 PAPER • 1 BENCHMARK

AdversarialQA

…The adversarial human annotation paradigm ensures that these datasets consist of questions that current state-of-the-art models (at least the ones used as adversaries in the annotation loop) find challenging

24 PAPERS • 2 BENCHMARKS

WikiGraphs

…This makes it easy to benchmark against other state-of-the-art text generative models that are capable of generating long paragraphs of coherent text.

3 PAPERS • 1 BENCHMARK

STEM

…We also add state-of-the-art foundation models such as CLIP and GPT-3.5-Turbo to our benchmark.

1 PAPER • NO BENCHMARKS YET

Xhate999

…We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHate-999 as a comprehensive

3 PAPERS • NO BENCHMARKS YET

Belebele

…While all questions directly relate to the passage, the English dataset on its own proves difficult enough to challenge state-of-the-art language models.

17 PAPERS • NO BENCHMARKS YET

NetHack Learning Environment

…It is procedurally generated, rich in entities and dynamics, and overall an extremely challenging environment for current state-of-the-art RL agents, while being much cheaper to run compared to other challenging

19 PAPERS • 1 BENCHMARK

MPII Human Pose Descriptions

…These annotations are generated by various state-of-the-art language models (LLMs) and include detailed descriptions of the activities being performed, the count of people present, and their specific poses

1 PAPER • NO BENCHMARKS YET

NExT-GQA

…With NExT-GQA, we scrutinize a variety of state-of-the-art VLMs.

8 PAPERS • 1 BENCHMARK

TOPv2 (Task Oriented Parsing v2)

…e.g. event <tab> Art fairs this weekend in Detroit <tab> [IN:GET_EVENT [SL:CATEGORY_EVENT Art fairs ] [SL:DATE_TIME this weekend ] in [SL:LOCATION Detroit ] ] The low-resource splits used in our experiments

25 PAPERS • NO BENCHMARKS YET

WinoGAViL

…We use the game to collect 3.5K instances, finding that they are intuitive for humans (>90% Jaccard index) but challenging for state-of-the-art AI models, where the best model (ViLT) achieves a score of

4 PAPERS • 2 BENCHMARKS

Chilean Waiting List

…This is for a fair comparison with actual state-of-the-art models.

4 PAPERS • 1 BENCHMARK

SIMMC2.0

…Our baseline model, powered by the state-of-the-art language model, shows promising results, and highlights new challenges and directions for the community to study.

12 PAPERS • 2 BENCHMARKS

BenchLMM (BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models)

…Utilizing BenchLMM, we comprehensively evaluate state-of-the-art LMMs and reveal: 1) LMMs generally suffer performance degradation when working with other styles; 2) An LMM performs better than another

9 PAPERS • 1 BENCHMARK

Multimodal Humor Dataset (Multimodal Humor Dataset: Predicting Laughter Tracks for Sitcoms)

…We provide a detailed analysis of the dataset design and further evaluate various state-of-the-art baselines for solving this task.

0 PAPERS • NO BENCHMARKS YET
