The WebVid-CoVR dataset is a collection of video-text-video triplets for the task of composed video retrieval (CoVR). CoVR involves searching for videos that match both a query image and a query text, where the text typically specifies the desired modification to the query image.
2 PAPERS • 1 BENCHMARK
CIRCO (Composed Image Retrieval on Common Objects in context) is an open-domain benchmarking dataset for Composed Image Retrieval (CIR) based on real-world images from the COCO 2017 unlabeled set. It is the first CIR dataset with multiple ground truths and aims to address the problem of false negatives in existing datasets. CIRCO comprises a total of 1020 queries, randomly divided into 220 for the validation set and 800 for the test set, with an average of 4.53 ground truths per query.
11 PAPERS • 1 BENCHMARK
Large Scale Composed Image Retrieval (LaSCo) is a new dataset for Composed Image Retrieval (CoIR), 10 times larger than current ones.
Composed Image Retrieval (or Image Retrieval conditioned on Language Feedback) is a relatively new retrieval task, where an input query consists of an image and a short textual description of how to modify the image.
31 PAPERS • 3 BENCHMARKS
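The composed-query setup described above (an image plus a short modification text) can be illustrated with a minimal sketch. It assumes the image, the text, and every gallery item have already been mapped to vectors in a shared embedding space; the toy vectors, the additive fusion in `compose_query`, and the helper names are illustrative assumptions, not the method used by LaSCo or any other dataset listed here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def compose_query(image_emb, text_emb):
    """Fuse image and text embeddings by summing their unit vectors.

    Additive fusion is an assumption made for illustration only.
    """
    return l2_normalize(l2_normalize(image_emb) + l2_normalize(text_emb))


def rank_gallery(query_emb, gallery_embs):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = l2_normalize(gallery_embs) @ query_emb
    return np.argsort(-scores)


# Toy embeddings (hypothetical): axis 0 = "looks like the query image",
# axis 1 = "matches the modification text".
image_emb = np.array([1.0, 0.0, 0.0])
text_emb = np.array([0.0, 1.0, 0.0])
gallery = np.array([
    [1.0, 0.0, 0.0],  # matches the image only
    [1.0, 1.0, 0.0],  # matches both image and text
    [0.0, 0.0, 1.0],  # matches neither
])

ranking = rank_gallery(compose_query(image_emb, text_emb), gallery)
print(ranking)
```

In this toy setup, the gallery item that agrees with both the image and the text ranks first, which is the behaviour a composed retrieval system is meant to produce.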
Fashion IQ supports and advances research on interactive fashion image retrieval. Fashion IQ is the first fashion dataset to provide human-generated captions that distinguish similar pairs of garment images, together with side information consisting of real-world product descriptions and derived visual attribute labels for these images.
65 PAPERS • 5 BENCHMARKS