no code implementations • 16 Jun 2022 • Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
Manual annotation of questions and answers for videos, however, is tedious and prohibits scalability.
1 code implementation • 10 May 2022 • Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
We use our method to generate the WebVidVQA3M dataset from the WebVid dataset, i.e., videos with alt-text annotations, and show its benefits for training VideoQA models.
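The generation step described above turns video alt-text captions into question-answer pairs with text-only models. The sketch below is a minimal, illustrative version of such a cross-modal annotation pipeline; the stub question generator and the toy caption are placeholders for the trained language models used in the paper, not the authors' actual pipeline.

```python
# Illustrative sketch: turn (video_id, alt-text caption) pairs into
# (video_id, question, answer) triplets with a text-only question generator.

from typing import Callable, List, Tuple

# A question generator maps a caption to (question, answer) pairs.
# In practice this would be a trained transformer; here it is a stub.
QuestionGenerator = Callable[[str], List[Tuple[str, str]]]

def toy_question_generator(caption: str) -> List[Tuple[str, str]]:
    """Placeholder generator: asks about the subject of a short caption."""
    words = caption.split()
    if len(words) < 3:
        return []
    subject = " ".join(words[:2])   # e.g. "a dog"
    rest = " ".join(words[2:])      # e.g. "running on the beach"
    return [(f"Who or what is {rest}?", subject)]

def build_videoqa_dataset(
    alt_text_pairs: List[Tuple[str, str]],   # (video_id, alt-text caption)
    generate: QuestionGenerator,
) -> List[Tuple[str, str, str]]:
    """Expand each captioned video into (video_id, question, answer) triplets."""
    dataset = []
    for video_id, caption in alt_text_pairs:
        for question, answer in generate(caption):
            dataset.append((video_id, question, answer))
    return dataset

if __name__ == "__main__":
    pairs = [("vid_001", "a dog running on the beach")]
    print(build_videoqa_dataset(pairs, toy_question_generator))
    # [('vid_001', 'Who or what is running on the beach?', 'a dog')]
```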
1 code implementation • CVPR 2022 • Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query.
Ranked #1 on Spatio-Temporal Video Grounding on VidSTG
Tasks: Language-Based Temporal Localization, Natural Language Visual Grounding, +5
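For readers unfamiliar with the task above: a spatio-temporal tube is a temporal segment together with one bounding box per frame inside it. The sketch below gives a minimal Python representation of such a tube and a vIoU-style overlap score of the kind commonly used to evaluate spatio-temporal grounding; the field names and metric code are illustrative assumptions, not the paper's evaluation code.

```python
# Illustrative sketch of a spatio-temporal tube and a vIoU-style metric.

from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class SpatioTemporalTube:
    start_frame: int
    end_frame: int            # inclusive
    boxes: Dict[int, Box]     # frame index -> bounding box

def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def viou(pred: SpatioTemporalTube, gt: SpatioTemporalTube) -> float:
    """Average per-frame IoU, normalised by the union of the two temporal spans,
    so both temporal and spatial localisation errors are penalised."""
    frames = set(range(pred.start_frame, pred.end_frame + 1)) | \
             set(range(gt.start_frame, gt.end_frame + 1))
    total = sum(
        box_iou(pred.boxes[f], gt.boxes[f])
        for f in frames if f in pred.boxes and f in gt.boxes
    )
    return total / len(frames) if frames else 0.0
```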
1 code implementation • ICCV 2021 • Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid
In this work, we propose to avoid manual annotation and generate a large-scale training dataset for video question answering, making use of automatic cross-modal supervision.
Ranked #1 on Video Question Answering on iVQA (using extra training data)
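One common way to exploit such automatically generated (video, question, answer) triplets is contrastive training: a joint video-question embedding is matched against answer embeddings, with the other answers in the batch serving as negatives. The PyTorch sketch below shows a generic training step of this kind under that assumption; the tiny encoders and feature dimensions are placeholders, not the architecture from the paper.

```python
# Illustrative sketch: one contrastive training step for a VideoQA model
# trained on automatically generated (video, question, answer) triplets.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVideoQA(nn.Module):
    def __init__(self, video_dim=512, text_dim=300, embed_dim=256):
        super().__init__()
        # Placeholder encoders standing in for real video/text backbones.
        self.video_question_encoder = nn.Sequential(
            nn.Linear(video_dim + text_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.answer_encoder = nn.Linear(text_dim, embed_dim)

    def forward(self, video_feat, question_feat, answer_feat):
        vq = self.video_question_encoder(
            torch.cat([video_feat, question_feat], dim=-1))
        a = self.answer_encoder(answer_feat)
        return F.normalize(vq, dim=-1), F.normalize(a, dim=-1)

def contrastive_step(model, batch, optimizer, temperature=0.07):
    """In-batch answers act as negatives for each video-question pair."""
    vq, a = model(batch["video"], batch["question"], batch["answer"])
    logits = vq @ a.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))   # diagonal entries are positives
    loss = F.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = TinyVideoQA()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    batch = {
        "video": torch.randn(8, 512),
        "question": torch.randn(8, 300),
        "answer": torch.randn(8, 300),
    }
    print(contrastive_step(model, batch, opt))
```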
1 code implementation • ICLR 2020 • Antoine Yang, Pedro M. Esperança, Fabio M. Carlucci
As such, and due to the under-use of ablation studies, there is a lack of clarity regarding why certain methods are more effective than others.
no code implementations • 3 Sep 2019 • Fabio Maria Carlucci, Pedro M. Esperança, Marco Singh, Victor Gabillon, Antoine Yang, Hang Xu, Zewei Chen, Jun Wang
The Neural Architecture Search (NAS) problem is typically formulated as a graph search problem where the goal is to learn the optimal operations over edges in order to maximise a graph-level global objective.
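A concrete instance of this graph formulation is the DARTS-style continuous relaxation, in which every edge of a small DAG mixes a set of candidate operations with learnable architecture weights, and all edges are optimised jointly against a single graph-level loss. The sketch below illustrates that generic formulation only; it is not the multi-agent method proposed in the paper, and the operation set and tensor sizes are arbitrary.

```python
# Illustrative sketch of NAS as a graph search problem (DARTS-style relaxation):
# each edge mixes candidate operations, and one graph-level loss trains
# the architecture weights of every edge.

import torch
import torch.nn as nn
import torch.nn.functional as F

CANDIDATE_OPS = {
    "identity": lambda c: nn.Identity(),
    "conv3x3":  lambda c: nn.Conv2d(c, c, 3, padding=1),
    "conv5x5":  lambda c: nn.Conv2d(c, c, 5, padding=2),
    "avgpool":  lambda c: nn.AvgPool2d(3, stride=1, padding=1),
}

class MixedEdge(nn.Module):
    """One edge of the search graph: a softmax-weighted mixture of candidate ops."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([build(channels) for build in CANDIDATE_OPS.values()])
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))  # arch weights

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

class SearchCell(nn.Module):
    """A tiny DAG with nodes 0 -> 1 -> 2 and a skip edge 0 -> 2."""
    def __init__(self, channels=16):
        super().__init__()
        self.edge_01 = MixedEdge(channels)
        self.edge_02 = MixedEdge(channels)
        self.edge_12 = MixedEdge(channels)

    def forward(self, x):
        n1 = self.edge_01(x)
        n2 = self.edge_02(x) + self.edge_12(n1)
        return n2  # graph-level output; a loss on it trains all edges jointly

if __name__ == "__main__":
    cell = SearchCell()
    out = cell(torch.randn(2, 16, 8, 8))
    loss = out.mean()   # stand-in for a graph-level global objective
    loss.backward()     # gradients reach every edge's architecture weights
    print(out.shape)
```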