Composed Video Retrieval

Introduced by Ventura et al. in CoVR: Learning Composed Video Retrieval from Web Video Captions

Composed video retrieval (CoVR) is a new task whose goal is to find a video that matches both a query image and a query text. The query image represents a visual concept the user is interested in, and the query text specifies how that concept should be modified or refined. For example, given an image of a fountain and the text "during show at night", the task is to retrieve a video that shows the fountain during a show at night.
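
The retrieval step described above can be illustrated with a minimal sketch. It assumes the query image, the query text, and each candidate video have already been mapped to embeddings in a shared space. The additive fusion in `compose_query`, the toy 3-dimensional vectors, and every name below are hypothetical choices made for illustration; they are not the method used in the CoVR paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def compose_query(image_emb, text_emb):
    """Fuse the image and text embeddings into one query vector.

    Assumption: simple additive fusion of unit vectors. A trained model
    would typically learn this fusion instead.
    """
    fused = [i + t for i, t in zip(normalize(image_emb), normalize(text_emb))]
    return normalize(fused)


def retrieve(image_emb, text_emb, video_embs, top_k=1):
    """Return the names of the top_k videos most similar to the composed query."""
    query = compose_query(image_emb, text_emb)
    ranked = sorted(
        video_embs.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Toy embeddings; the axes loosely stand for [fountain, night show, daytime].
image_emb = [1.0, 0.0, 0.3]  # query image: a fountain in daylight
text_emb = [0.0, 1.0, 0.0]   # query text: "during show at night"
videos = {
    "fountain_day": [1.0, 0.0, 0.5],
    "fountain_night_show": [1.0, 1.0, 0.0],
    "concert_night": [0.0, 1.0, 0.0],
}

print(retrieve(image_emb, text_emb, videos))
```

With these toy vectors, the video that matches both the visual concept and the textual modification ranks first, ahead of the videos that match only one of the two.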

Source: CoVR: Learning Composed Video Retrieval from Web Video Captions

Tasks

Task	Papers	Share
Composed Video Retrieval (CoVR)	2	18.18%
Retrieval	2	18.18%
Video Retrieval	2	18.18%
Composed Image Retrieval (CoIR)	1	9.09%
Image Retrieval	1	9.09%
Language Modelling	1	9.09%
Large Language Model	1	9.09%
Zero-Shot Composed Image Retrieval (ZS-CIR)	1	9.09%

Categories

Video-Text Retrieval Models