ART consists of over 20k commonsense narrative contexts and 200k explanations.
9 PAPERS • NO BENCHMARKS YET
ArtEmis is a large-scale dataset aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. This dataset focuses on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. ArtEmis contains 439K emotion attributions and explanations from humans, on 81K artworks from WikiArt. Paper: ArtEmis: Affective Language for Visual Art
28 PAPERS • NO BENCHMARKS YET
…It involves two challenging generative and multi-choice alternative selection tasks for the state-of-the-art NLP models to solve.
11 PAPERS • 4 BENCHMARKS
…Results show that state-of-the-art neural models perform far worse than the human ceiling. The dataset can also serve as a benchmark for reinvestigating logical AI under the deep learning NLP setting.
76 PAPERS • NO BENCHMARKS YET
…If trained on FaithDial, state-of-the-art dialogue models are significantly more faithful while also enhancing other dialogue aspects like cooperativeness, creativity and engagement.
12 PAPERS • NO BENCHMARKS YET
The FIGER dataset is an entity recognition dataset where entities are labelled using a fine-grained system of 112 tags, such as person/doctor, art/written_work and building/hotel.
96 PAPERS • 2 BENCHMARKS
…Incorporating state-of-the-art definition generation models, it supports not only Chinese and English, but also Chinese-English cross-lingual queries.
1 PAPER • NO BENCHMARKS YET
…CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems
19 PAPERS • 1 BENCHMARK
…It aims to assess the ability of state-of-the-art representation models to reason over cross-lingual lexical-level concept alignment in context for 14 language pairs.
2 PAPERS • NO BENCHMARKS YET
We address the computer-assisted search for prior art by creating a training dataset for supervised machine learning called PatentMatch.
…Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2, etc.) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset
17 PAPERS • 2 BENCHMARKS
…(bachelor_of_arts, juris_doctor).
204 PAPERS • 3 BENCHMARKS
…While most ARC tasks are easy for humans, they are challenging for state-of-the-art AI.
3 PAPERS • NO BENCHMARKS YET
…Japanese-English) corpus of patent abstracts, extracted from the MAREC patent data, and the data from the NTCIR PatentMT workshop collections, accompanied with relevance judgements for the task of patent prior-art
…See our paper Training and Evaluating a Jupyter Notebook Data Science Assistant for more details about state-of-the-art results and other properties of the dataset.
6 PAPERS • NO BENCHMARKS YET
…News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment and Arts).
27 PAPERS • 6 BENCHMARKS
…Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the multilingual mBART-50 model, on translation productivity.
…textual question answering benchmark for spatial reasoning on natural language text which contains more realistic spatial phenomena not covered by prior datasets and that is challenging for state-of-the-art
10 PAPERS • NO BENCHMARKS YET
…The adversarial human annotation paradigm ensures that these datasets consist of questions that current state-of-the-art models (at least the ones used as adversaries in the annotation loop) find challenging
24 PAPERS • 2 BENCHMARKS
…This makes it easy to benchmark against other state-of-the-art text generative models that are capable of generating long paragraphs of coherent text.
3 PAPERS • 1 BENCHMARK
…We also add state-of-the-art foundation models such as CLIP and GPT-3.5-Turbo to our benchmark.
…We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHate-999 as a comprehensive
…While all questions directly relate to the passage, the English dataset on its own proves difficult enough to challenge state-of-the-art language models.
23 PAPERS • NO BENCHMARKS YET
…These annotations are generated by various state-of-the-art large language models (LLMs) and include detailed descriptions of the activities being performed, the count of people present, and their specific poses
…This dataset aims to facilitate research in multimodal machine learning for oncology by providing embeddings generated using state-of-the-art models such as GatorTron, SeNMo, and UNI.
…e.g. event <tab> Art fairs this weekend in Detroit <tab> [IN:GET_EVENT [SL:CATEGORY_EVENT Art fairs ] [SL:DATE_TIME this weekend ] in [SL:LOCATION Detroit ] ] The low-resource splits used in our experiments
25 PAPERS • NO BENCHMARKS YET
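The bracketed annotation in the example above (intent labels prefixed `IN:`, slot labels prefixed `SL:`, fields separated by tabs) can be read into a tree with a small stack-based parser. The following is a minimal sketch, assuming tokens are whitespace-separated exactly as in the quoted example; the function names `parse_annotation` and `collect_slots` are hypothetical and not part of any dataset tooling.

```python
def parse_annotation(annotation):
    """Parse a bracketed annotation into nested (label, children) tuples.

    Assumption: every opening bracket is fused to its label ("[IN:GET_EVENT")
    and every closing bracket is a standalone "]" token, as in the example.
    """
    stack = [("ROOT", [])]
    for token in annotation.split():
        if token.startswith("["):
            node = (token[1:], [])
            stack[-1][1].append(node)  # attach to the current parent
            stack.append(node)         # descend into the new node
        elif token == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            stack.pop()
        else:
            stack[-1][1].append(token)  # plain word of the utterance
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return stack[0][1][0]


def collect_slots(node):
    """Return a dict mapping each slot name to the words it directly covers."""
    found = {}
    for child in node[1]:
        if isinstance(child, tuple):
            label, grandchildren = child
            if label.startswith("SL:"):
                words = [w for w in grandchildren if isinstance(w, str)]
                found[label[3:]] = " ".join(words)
            found.update(collect_slots(child))
    return found


# The example line quoted in the listing, with literal tab separators.
line = (
    "event\tArt fairs this weekend in Detroit\t"
    "[IN:GET_EVENT [SL:CATEGORY_EVENT Art fairs ] "
    "[SL:DATE_TIME this weekend ] in [SL:LOCATION Detroit ] ]"
)
domain, utterance, annotation = line.split("\t")
tree = parse_annotation(annotation)
slot_values = collect_slots(tree)
```

Here `tree[0]` holds the top-level intent label and `slot_values` maps each slot name to its text span. Nested intents inside a slot are walked recursively, but only the words directly under a slot are joined into its value.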
…We use the game to collect 3.5K instances, finding that they are intuitive for humans (>90% Jaccard index) but challenging for state-of-the-art AI models, where the best model (ViLT) achieves a score of
4 PAPERS • 2 BENCHMARKS
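The Jaccard index mentioned in the entry above measures the overlap between two sets as the size of their intersection divided by the size of their union. The snippet below is a minimal sketch of that standard definition; how the dataset's authors pair predictions with ground truth is not stated in the listing, and the labels used here are made up for illustration.

```python
def jaccard_index(predicted, reference):
    """Return |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(predicted), set(reference)
    if not a and not b:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical selections: two of four distinct items are shared.
score = jaccard_index({"dog", "cat", "bird"}, {"dog", "cat", "fish"})
```

A score of 1.0 means the two selections match exactly, and 0.0 means they share nothing.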
…Our baseline model, powered by the state-of-the-art language model, shows promising results, and highlights new challenges and directions for the community to study.
12 PAPERS • 2 BENCHMARKS
…Utilizing BenchLMM, we comprehensively evaluate state-of-the-art LMMs and reveal: 1) LMMs generally suffer performance degradation when working with other styles; 2) An LMM performs better than another
9 PAPERS • 1 BENCHMARK
…Qualitative and quantitative experiments demonstrate the metrics' validity, the ground truth data quality, and the baseline's state-of-the-art performance.
39 PAPERS • 1 BENCHMARK
…Furthermore, we provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model, to further advance the field in characterizing the discussions within these communities
…We provide a detailed analysis of the dataset design and further evaluate various state-of-the-art baselines for solving this task.
0 PAPERS • NO BENCHMARKS YET
…Experiments demonstrate that EMAGE generates holistic gestures with state-of-the-art performance and is flexible in accepting predefined spatial-temporal gesture inputs, generating complete, audio-synchronized
8 PAPERS • 2 BENCHMARKS
…playing Go, generating art, ChatGPT, etc. Such a dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills?
…Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new pathology images across 13 diverse patch-level datasets of 8 different sub-pathologies and cross-modal
4 PAPERS • NO BENCHMARKS YET
…Relevant footnotes: - The echo is found in tweets written in multiple languages, particularly in East-Asian languages of which the user base is known for heavy use of ASCII art and kaomoji (McCulloch
…Support in performing linguistic processing is provided in the form of analyses created by various state-of-the-art tools on the dataset texts.
The Dialog State Tracking Challenges 2 & 3 (DSTC2&3) were research challenges focused on improving the state of the art in tracking the state of spoken dialog systems.
29 PAPERS • 2 BENCHMARKS
…Biology, Astronomy, Geology, Computer Science, Engineering, Environmental Science, Neuroscience, Robotics | | History and Culture | Ancient History, Medieval History, Modern History, World History, Art
6 PAPERS • 1 BENCHMARK