Composed video retrieval (CoVR) is a recently introduced task in which the goal is to find a video that matches both a query image and a query text. The query image represents a visual concept that the user is interested in, and the query text specifies how that concept should be modified or refined. For example, given an image of a fountain and the text "during show at night", the CoVR task is to retrieve a video that shows the fountain during a show at night.
Source: CoVR: Learning Composed Video Retrieval from Web Video Captions
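
To make the task definition concrete, here is a minimal sketch of composed retrieval over precomputed embeddings. It is not the method of the cited paper: the additive fusion in `compose_query`, the helper names, and the three-dimensional toy vectors (axes loosely standing for "fountain", "night", "show") are all assumptions chosen for illustration. A real system would obtain embeddings from trained image, text, and video encoders and would typically learn the fusion step.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def compose_query(image_emb, text_emb):
    """Fuse image and modification-text embeddings (assumed: normalized sum)."""
    return l2_normalize(l2_normalize(image_emb) + l2_normalize(text_emb))


def rank_videos(query_emb, video_embs):
    """Return video indices sorted by descending cosine similarity, plus scores."""
    sims = l2_normalize(video_embs) @ query_emb
    return np.argsort(-sims), sims


# Toy embeddings (hypothetical); axes: [fountain, night, show].
image_emb = np.array([1.0, 0.0, 0.0])   # query image: a fountain
text_emb = np.array([0.0, 1.0, 1.0])    # query text: "during show at night"
video_embs = np.array([
    [1.0, 0.0, 0.0],  # 0: fountain in daylight
    [1.0, 1.0, 1.0],  # 1: fountain during a night show
    [0.0, 1.0, 0.0],  # 2: unrelated night scene
])

query_emb = compose_query(image_emb, text_emb)
ranking, sims = rank_videos(query_emb, video_embs)
print(ranking)
```

In this toy setup the video that matches both the image concept and the text modification (index 1) ranks first, ahead of the videos that match only one of the two.
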
Task | Papers | Share |
---|---|---|
Composed Video Retrieval (CoVR) | 2 | 18.18% |
Retrieval | 2 | 18.18% |
Video Retrieval | 2 | 18.18% |
Composed Image Retrieval (CoIR) | 1 | 9.09% |
Image Retrieval | 1 | 9.09% |
Language Modelling | 1 | 9.09% |
Large Language Model | 1 | 9.09% |
Zero-Shot Composed Image Retrieval (ZS-CIR) | 1 | 9.09% |
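
The Share column is consistent with each task's paper count divided by the sum of the Papers column (11 task tags in total). The short sketch below reproduces the listed percentages under that assumption. The denominator is inferred from the table rather than stated by the source, and it counts task tags, not distinct papers.

```python
# Paper counts copied from the table above.
paper_counts = {
    "Composed Video Retrieval (CoVR)": 2,
    "Retrieval": 2,
    "Video Retrieval": 2,
    "Composed Image Retrieval (CoIR)": 1,
    "Image Retrieval": 1,
    "Language Modelling": 1,
    "Large Language Model": 1,
    "Zero-Shot Composed Image Retrieval (ZS-CIR)": 1,
}

# Assumed denominator: the sum of all counts in the Papers column.
total = sum(paper_counts.values())

# Share as a percentage, rounded to two decimals as in the table.
shares = {task: round(100 * n / total, 2) for task, n in paper_counts.items()}

for task, share in shares.items():
    print(f"{task}: {share:.2f}%")
```
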