7 dataset results for Language Modelling AND Images

SentiCap

The SentiCap dataset contains several thousand images whose captions express positive or negative sentiment. The authors constructed these sentimental captions by rewriting factual descriptions. In total, there are 2000+ sentimental captions.

26 PAPERS • NO BENCHMARKS YET

CASIA-HWDB

CASIA-HWDB is a dataset for handwritten Chinese character recognition. It contains 300 files (240 in HWDB1.1 training set and 60 in HWDB1.1 test set). Each file contains about 3000 isolated gray-scale Chinese character images written by one writer, as well as their corresponding labels.

17 PAPERS • NO BENCHMARKS YET

OVAD benchmark (Open-Vocabulary Attribute Detection)

Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enables open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute dete

13 PAPERS • 3 BENCHMARKS

Open-Platypus

Open-Platypus is the curated dataset used to train Platypus, a family of fine-tuned and merged Large Language Models (LLMs) that, as of the release date of that work, stood at first place in HuggingFace's Open LLM Leaderboard.

8 PAPERS • NO BENCHMARKS YET

Tencent ML-Images

Tencent ML-Images is a large open-source multi-label image database, including 17,609,752 training and 88,739 validation image URLs, which are annotated with up to 11,166 categories.

5 PAPERS • NO BENCHMARKS YET

Kite

The Kite database is a multi-modal dataset for the control of unmanned aerial vehicles (UAVs). There are three modalities present in the dataset:

1 PAPER • NO BENCHMARKS YET

SVLD (Social Vision and Language Dataset)

The Social Vision and Language Dataset (SVLD) is a large-scale multimodal dataset designed for research into social contextual learning.

1 PAPER • NO BENCHMARKS YET
