VILT is a new benchmark collection of tasks and multimodal video content. The video linking collection includes annotations for 10 (recipe) tasks, which annotators chose from a random subset of the collection of 2,275 high-quality 'Wholefoods' recipes. Of the 189 total recipe steps across these tasks, 61 query steps that contain cooking techniques have linking annotations. As each method results in approximately 10 videos to annotate, the collection consists of 831 linking judgments.
Source: VILT: Video Instructions Linking for Complex Tasks