ART consists of over 20k commonsense narrative contexts and 200k explanations.
9 PAPERS • NO BENCHMARKS YET
ArtEmis is a large-scale dataset aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. This dataset focuses on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. ArtEmis contains 439K emotion attributions and explanations from humans, on 81K artworks from WikiArt. Paper: ArtEmis: Affective Language for Visual Art
28 PAPERS • NO BENCHMARKS YET
ArtDL is a novel painting data set for iconography classification composed of images collected from online sources. Most of the paintings are from the Renaissance period and depict scenes or characters of Christian art.
1 PAPER • 1 BENCHMARK
…To fill this gap, we build a large dataset, ClimART, with more than 10 million samples from present, pre-industrial, and future climate conditions, based on the Canadian Earth System Model. ClimART poses several methodological challenges for the ML community, such as multiple out-of-distribution test sets, underlying domain physics, and a trade-off between accuracy and inference speed.
4 PAPERS • NO BENCHMARKS YET
We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. ArtBench-10 has several advantages over previous artwork datasets. Firstly, it is class-balanced, while most previous artwork datasets suffer from long-tailed class distributions. Secondly, the images are of high quality with clean annotations. Thirdly, ArtBench-10 is created with standardized data collection, annotation, filtering, and preprocessing procedures.
7 PAPERS • 1 BENCHMARK
Throughout the history of art, the pose, as the holistic abstraction of the human body's expression, has proven to be a constant in numerous studies. However, due to the enormous amount of data that so far had to be processed by hand, its crucial role in the formulaic recapitulation of art-historical motifs since antiquity could only be highlighted. With the Poses of People in Art data set, we introduce the first openly licensed data set for estimating human poses in art and validating human pose estimators. It consists of 2,454 images from 22 art-historical depiction styles, including those that have increasingly turned away from lifelike representations of the body since the 19th century. Each image annotation, in addition to mandatory fields, provides metadata from the art-historical online encyclopedia WikiArt.
3 PAPERS • 1 BENCHMARK
This dataset comprises 1344 expert annotated images of muscle-tendon junctions recorded with 3 ultrasound imaging systems (Aixplorer V6, Esaote MyLab60, Telemed ArtUs), on 2 muscles (Lateral Gastrocnemius
2 PAPERS • NO BENCHMARKS YET
…It involves two challenging generative and multi-choice alternative selection tasks for the state-of-the-art NLP models to solve. Download the dataset using this link.
12 PAPERS • 4 BENCHMARKS
…Results show that state-of-the-art neural models perform far worse than the human ceiling. The dataset can also serve as a benchmark for reinvestigating logical AI under the deep learning NLP setting.
72 PAPERS • NO BENCHMARKS YET
V-D4RL provides pixel-based analogues of the popular D4RL benchmarking tasks, derived from the dm_control suite, along with natural extensions of two state-of-the-art online pixel-based continuous control
…If trained on FaithDial, state-of-the-art dialogue models are significantly more faithful while also enhancing other dialogue aspects like cooperativeness, creativity and engagement.
12 PAPERS • NO BENCHMARKS YET
The FIGER dataset is an entity recognition dataset in which entities are labelled using a fine-grained system of 112 tags, such as person/doctor, art/written_work and building/hotel.
96 PAPERS • 2 BENCHMARKS
The 2021 Kidney and Kidney Tumor Segmentation Challenge. The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 Challenge
…Incorporating state-of-the-art definition generation models, it supports not only Chinese and English, but also Chinese-English cross-lingual queries.
1 PAPER • NO BENCHMARKS YET
…CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems
18 PAPERS • 1 BENCHMARK
…ST models trained with an addition of the corpus obtain new state-of-the-art results on the MuST-C English-German benchmark test set.
5 PAPERS • NO BENCHMARKS YET
…It aims to assess the ability of state-of-the-art representation models to reason over cross-lingual lexical-level concept alignment in context for 14 language pairs.
…locust detection to prevent invasion), and art (e.g., recreational art).
113 PAPERS • 2 BENCHMARKS
The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. The comprehensive benchmark of the proposed dataset utilizing state-of-the-art deepfake detection and localization methods indicates a significant drop in performance compared to previous datasets.
We address the computer-assisted search for prior art by creating a training dataset for supervised machine learning called PatentMatch.
…Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2, etc.) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset
17 PAPERS • 2 BENCHMARKS
…While most ARC tasks are easy for humans, they are challenging for state-of-the-art AI.
…(bachelor_of_arts, juris_doctor).
202 PAPERS • 3 BENCHMARKS
…Japanese-English) corpus of patent abstracts, extracted from the MAREC patent data, and the data from the NTCIR PatentMT workshop collections, accompanied by relevance judgements for the task of patent prior-art
…We also propose a benchmark of experiments using DemogPairs over state-of-the-art deep face recognition models in order to analyze their cross-demographic behavior and potential demographic biases (see
7 PAPERS • NO BENCHMARKS YET
…This dataset poses a significant challenge to state-of-the-art vision models as merely zooming in often fails to improve their ability to classify images correctly.
2 PAPERS • 1 BENCHMARK
…See our paper Training and Evaluating a Jupyter Notebook Data Science Assistant for more details about state-of-the-art results and other properties of the dataset.
6 PAPERS • NO BENCHMARKS YET
…News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment and Arts).
27 PAPERS • 6 BENCHMARKS
…Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the multilingual mBART-50 model, on translation productivity.
…textual question answering benchmark for spatial reasoning on natural language text which contains more realistic spatial phenomena not covered by prior datasets and that is challenging for state-of-the-art
…In contrast to prior efforts, the proposed database contains genuine and replayed recordings of voice commands obtained in realistic usage scenarios and using state-of-the-art voice assistant development
…The adversarial human annotation paradigm ensures that these datasets consist of questions that current state-of-the-art models (at least the ones used as adversaries in the annotation loop) find challenging
24 PAPERS • 2 BENCHMARKS
…This makes it easy to benchmark against other state-of-the-art text generative models that are capable of generating long paragraphs of coherent text.
…We also add state-of-the-art foundation models such as CLIP and GPT-3.5-Turbo to our benchmark.
…Difficulty of exploration, using state-of-the-art algorithms and imitation to generate data for difficult environments. Real-world challenges.
…We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHate-999 as a comprehensive
3 PAPERS • NO BENCHMARKS YET
…dataset is a large-scale image dataset that aims to include a diverse collection of real and synthetic images from multiple categories, including Human/Human Faces, Animal/Animal Faces, Places, Vehicles, Art … including 13 GANs, 7 Diffusion, and 5 miscellaneous generators). Number of sources used for real images: 8. Categories included in the dataset: Human/Human Faces, Animal/Animal Faces, Places, Vehicles, Art.
…In this paper, we have implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines.
…While all questions directly relate to the passage, the English dataset on its own proves difficult enough to challenge state-of-the-art language models.
20 PAPERS • NO BENCHMARKS YET
…These annotations are generated by various state-of-the-art large language models (LLMs) and include detailed descriptions of the activities being performed, the count of people present, and their specific poses
…Moreover, eleven state-of-the-art algorithms are evaluated on the benchmark using two evaluation metrics, with detailed analysis provided for the evaluation results.
…e.g. event <tab> Art fairs this weekend in Detroit <tab> [IN:GET_EVENT [SL:CATEGORY_EVENT Art fairs ] [SL:DATE_TIME this weekend ] in [SL:LOCATION Detroit ] ] The low-resource splits used in our experiments
25 PAPERS • NO BENCHMARKS YET
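The tab-separated example in the entry above (domain, utterance, bracketed parse) can be read with a few lines of Python. This is only a minimal sketch: it assumes each line has exactly three tab-separated fields and that the brackets are well formed and space-separated, as in the example shown. The function names `parse_line`, `parse_tree`, and `slots` are hypothetical and are not part of any released tooling.

```python
def parse_tree(text):
    """Turn a bracketed parse such as '[IN:X [SL:Y word ] ]' into nested dicts.

    Assumes well-formed, space-separated brackets (an assumption of this sketch).
    """
    root = None
    stack = []
    for tok in text.split():
        if tok.startswith("["):
            # Opening bracket carries the intent (IN:) or slot (SL:) label.
            node = {"label": tok[1:], "children": []}
            if stack:
                stack[-1]["children"].append(node)
            stack.append(node)
        elif tok == "]":
            # Closing bracket finishes the most recently opened node.
            root = stack.pop()
        else:
            # Plain word: attach it to the node that is currently open.
            stack[-1]["children"].append(tok)
    return root


def parse_line(line):
    """Split one 'domain <tab> utterance <tab> parse' line into its parts."""
    domain, utterance, parse = line.rstrip("\n").split("\t")
    return domain, utterance, parse_tree(parse)


def slots(node):
    """Collect the top-level slots of an intent node as {slot name: text}."""
    found = {}
    for child in node["children"]:
        if isinstance(child, dict) and child["label"].startswith("SL:"):
            words = [c for c in child["children"] if isinstance(c, str)]
            found[child["label"][3:]] = " ".join(words)
    return found


example = (
    "event\tArt fairs this weekend in Detroit\t"
    "[IN:GET_EVENT [SL:CATEGORY_EVENT Art fairs ] "
    "[SL:DATE_TIME this weekend ] in [SL:LOCATION Detroit ] ]"
)
domain, utterance, tree = parse_line(example)
print(domain, tree["label"], slots(tree))
```

Words outside any slot (here, "in") stay attached to the intent node and are simply skipped by `slots`.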
…We use the game to collect 3.5K instances, finding that they are intuitive for humans (>90% Jaccard index) but challenging for state-of-the-art AI models, where the best model (ViLT) achieves a score of
4 PAPERS • 2 BENCHMARKS
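The Jaccard index mentioned in the entry above is the size of the intersection of two sets divided by the size of their union. The sketch below shows that general definition only; it is not the dataset's own scoring code, and the item names in the usage lines are made up for illustration.

```python
def jaccard_index(selected, gold):
    """Return |selected ∩ gold| / |selected ∪ gold| for two collections."""
    selected, gold = set(selected), set(gold)
    union = selected | gold
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(selected & gold) / len(union)


# Hypothetical example: a player picks three items, two of which are correct.
picked = {"img_a", "img_b", "img_c"}
correct = {"img_b", "img_c", "img_d"}
print(jaccard_index(picked, correct))
```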
…We compare our synthetic dataset to state-of-the-art real-world datasets for omnidirectional images.
…Their utility, however, primarily depends on whether the current state-of-the-art models can generalize across various tasks in the CC domain.
…the potential of the dataset and its labeled subset, we have additionally optimized a deep learning model (1D Fully Convolutional Network), achieving superior performance to the current state of the art
…Our baseline model, powered by a state-of-the-art language model, shows promising results and highlights new challenges and directions for the community to study.
12 PAPERS • 2 BENCHMARKS
…Utilizing BenchLMM, we comprehensively evaluate state-of-the-art LMMs and reveal: 1) LMMs generally suffer performance degradation when working with other styles; 2) An LMM performs better than another
9 PAPERS • 1 BENCHMARK