2 code implementations • 24 May 2023 • Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, Yu-Gang Jiang
We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues.
no code implementations • 5 Oct 2022 • Tianwen Qian, Ran Cui, Jingjing Chen, Pai Peng, Xiaowei Guo, Yu-Gang Jiang
Considering the fact that the question often remains concentrated in a short temporal range, we propose to first locate the question to a segment in the video and then infer the answer using the located segment only.
1 code implementation • 20 Apr 2022 • Ran Cui, Tianwen Qian, Pai Peng, Elena Daskalaki, Jingjing Chen, Xiaowei Guo, Huyang Sun, Yu-Gang Jiang
Weakly supervised methods only rely on the paired video and query, but the performance is relatively poor.