R2VQ is a dataset designed for testing competence-based machine comprehension over a multimodal collection of text-video aligned recipes.
In total, 51,331 cooking events are annotated, containing 19,201 explicit ingredients, 16,338 implicit ingredients, 12,316 explicit props, and 11,868 implicit props.