The IMAGE-CHAT dataset is a large collection of (image, style trait for speaker A, style trait for speaker B, dialogue between A & B) tuples that we collected using crowd-workers. Each dialogue consists of consecutive turns by speakers A and B. No particular constraints are placed on the kinds of utterance; we only ask the speakers to use the provided style trait and to respond to the given image and dialogue history in an engaging way. The goal is not just to build a diagnostic dataset but to provide a basis for training models that humans actually want to engage with.
27 PAPERS • 2 BENCHMARKS
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
26 PAPERS • 6 BENCHMARKS
TripClick is a large-scale dataset of click logs in the health domain, obtained from user interactions with the Trip Database health web search engine.
15 PAPERS • NO BENCHMARKS YET
DIOR-RSVG is a large-scale benchmark dataset for remote sensing visual grounding (RSVG). The task aims to localize the referred objects in remote sensing (RS) images with the guidance of natural language. The dataset includes image/expression/box triplets for training and evaluating visual grounding models.
7 PAPERS • NO BENCHMARKS YET
DialogCC is a large-scale multi-modal dialogue dataset, which covers diverse real-world topics and various images per dialogue. It contains 651k unique images and is designed for image and text retrieval tasks.
1 PAPER • NO BENCHMARKS YET