Search Results for author: Junbin Xiao

Found 12 papers, 11 with code

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

no code implementations • 1 Mar 2024 • Jianwu Fang, Lei-Lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua

This model involves a contrastive interaction loss to learn the pair co-occurrence of normal, near-accident, and accident frames with the corresponding text descriptions, such as accident reasons, prevention advice, and accident categories.

Object Detection +3
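A minimal sketch of the kind of contrastive pairing loss the entry above describes, assuming a symmetric InfoNCE-style formulation; the function name, shapes, and temperature below are illustrative, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive_interaction_loss(frame_emb, text_emb, temperature=0.07):
    # frame_emb, text_emb: (B, d); row i of each is a matched pair,
    # e.g. a near-accident frame and its accident-reason description.
    # (Hypothetical formulation, not the paper's released code.)
    f = F.normalize(frame_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = f @ t.T / temperature                      # (B, B) similarities
    targets = torch.arange(f.size(0), device=f.device)  # matched-pair indices
    # Pull matched frame/text pairs together, push mismatches apart, both ways.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy usage: 4 frame/text pairs with 256-d embeddings.
loss = contrastive_interaction_loss(torch.randn(4, 256), torch.randn(4, 256))
```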

Can I Trust Your Answer? Visually Grounded Video Question Answering

1 code implementation • 4 Sep 2023 • Junbin Xiao, Angela Yao, Yicong Li, Tat-Seng Chua

We study visually grounded VideoQA in response to the emerging trends of utilizing pretraining techniques for video-language understanding.

Question Answering Video Grounding +2

Contrastive Video Question Answering via Video Graph Transformer

1 code implementation • 27 Feb 2023 • Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua

CoVGT's uniqueness and superiority are three-fold: 1) it proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations, and dynamics for complex spatio-temporal reasoning.

Ranked #11 on Video Question Answering on NExT-QA (using extra training data)

Contrastive Learning Question Answering +1
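A toy sketch of the "dynamic graph transformer" idea from point 1) above: object nodes in each frame attend over an adjacency recomputed from the node features themselves. The class and shapes are simplifying assumptions, not the released CoVGT code.

```python
import torch
import torch.nn as nn

class TinyGraphTransformerLayer(nn.Module):
    """One attention round over detected-object nodes; the attention matrix
    doubles as a per-frame adjacency built from the node features, which is
    what makes the graph 'dynamic'. Illustrative simplification only."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, nodes):              # nodes: (T, N, d) objects per frame
        adj = (self.q(nodes) @ self.k(nodes).transpose(-1, -2)).softmax(-1)
        return self.norm(nodes + adj @ self.v(nodes))

# 8 frames, 5 objects per frame, 128-d features -> pooled video embedding.
layer = TinyGraphTransformerLayer(128)
video = layer(torch.randn(8, 5, 128)).mean(dim=(0, 1))
```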

Equivariant and Invariant Grounding for Video Question Answering

1 code implementation • 26 Jul 2022 • Yicong Li, Xiang Wang, Junbin Xiao, Tat-Seng Chua

Specifically, the equivariant grounding encourages the answering to be sensitive to the semantic changes in the causal scene and question; in contrast, the invariant grounding enforces the answering to be insensitive to the changes in the environment scene.

Question Answering Video Question Answering
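A conceptual sketch of the two complementary constraints above, assuming they can be phrased as divergence penalties on the answer distribution; these are not the paper's exact objectives, and the margin is invented for illustration.

```python
import torch
import torch.nn.functional as F

def grounding_losses(logits_orig, logits_causal_edit, logits_env_edit, margin=1.0):
    # Conceptual only, not the paper's losses:
    # invariant   - swapping the environment scene should leave the answer
    #               distribution unchanged, so penalize its KL from the original;
    # equivariant - editing the causal scene/question should change the answer,
    #               so require at least `margin` of KL via a hinge.
    p = logits_orig.log_softmax(-1)
    inv = F.kl_div(logits_env_edit.log_softmax(-1), p,
                   log_target=True, reduction="batchmean")
    causal = F.kl_div(logits_causal_edit.log_softmax(-1), p,
                      log_target=True, reduction="batchmean")
    return inv + F.relu(margin - causal)

loss = grounding_losses(torch.randn(4, 5), torch.randn(4, 5), torch.randn(4, 5))
```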

Video Graph Transformer for Video Question Answering

1 code implementation • 12 Jul 2022 • Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan

VGT's uniqueness is two-fold: 1) it designs a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations, and dynamics for complex spatio-temporal reasoning; and 2) it exploits disentangled video and text Transformers for relevance comparison between the video and text to perform QA, instead of an entangled cross-modal Transformer for answer classification.

Ranked #18 on Video Question Answering on NExT-QA (using extra training data)

Question Answering Relation +2
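A minimal sketch of point 2) above, answering by relevance comparison rather than answer classification: each candidate answer embedding from a separate text encoder is scored against the video embedding. Names and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def answer_by_relevance(video_emb, answer_embs):
    # video_emb:   (B, d)    from a video encoder (hypothetical shapes)
    # answer_embs: (B, K, d) from a separate text encoder, K candidates
    v = F.normalize(video_emb, dim=-1).unsqueeze(1)   # (B, 1, d)
    a = F.normalize(answer_embs, dim=-1)              # (B, K, d)
    scores = (v * a).sum(-1)                          # (B, K) cosine scores
    return scores.argmax(dim=-1)                      # predicted option index

pred = answer_by_relevance(torch.randn(2, 128), torch.randn(2, 5, 128))
```

The appeal of this design is that answers live in the same embedding space as the video, so the model is not tied to a fixed answer vocabulary.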

Invariant Grounding for Video Question Answering

1 code implementation • CVPR 2022 • Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua

At its core is understanding the alignment between the visual scenes in the video and the linguistic semantics in the question to yield the answer.

Question Answering Video Question Answering

Video Question Answering: Datasets, Algorithms and Challenges

1 code implementation • 2 Mar 2022 • Yaoyao Zhong, Junbin Xiao, Wei Ji, Yicong Li, Weihong Deng, Tat-Seng Chua

Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos.

Question Answering Video Question Answering

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering

1 code implementation • 12 Dec 2021 • Junbin Xiao, Angela Yao, Zhiyuan Liu, Yicong Li, Wei Ji, Tat-Seng Chua

To align with the multi-granular essence of linguistic concepts in language queries, we propose to model video as a conditional graph hierarchy which weaves together visual facts of different granularity in a level-wise manner, with the guidance of corresponding textual cues.

Question Answering Video Question Answering +1
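A toy sketch of level-wise, text-guided aggregation in the spirit of the conditional graph hierarchy described above: each granularity level is pooled into the next under guidance from the query embedding. The module and shapes are hypothetical simplifications.

```python
import torch
import torch.nn as nn

class TextConditionedPool(nn.Module):
    """Aggregate one granularity level into the next; attention weights come
    from a dot product between the text cue and the current level's nodes.
    Illustrative only, not the paper's released model."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, nodes, text):        # nodes: (B, N, d), text: (B, d)
        w = (self.proj(nodes) @ text.unsqueeze(-1)).softmax(dim=1)  # (B, N, 1)
        return (w * nodes).sum(dim=1)      # (B, d) summary for the next level

# Objects -> frame summaries -> one clip summary, each step text-guided.
B, d = 2, 128
objects = torch.randn(B, 4 * 6, d)        # 4 frames x 6 objects, flattened
text = torch.randn(B, d)
frame_pool, clip_pool = TextConditionedPool(d), TextConditionedPool(d)
frames = frame_pool(objects.view(B * 4, 6, d),
                    text.repeat_interleave(4, 0)).view(B, 4, d)
clip = clip_pool(frames, text)            # (B, d) video-level representation
```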

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions

1 code implementation • CVPR 2021 • Junbin Xiao, Xindi Shang, Angela Yao, Tat-Seng Chua

We introduce NExT-QA, a rigorously designed video question answering (VideoQA) benchmark to advance video understanding from describing to explaining the temporal actions.

Question Answering Video Question Answering +2
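NExT-QA's multi-choice setting groups questions into causal, temporal, and descriptive types, and results are typically reported per type. A small helper for that breakdown; the record format here is an assumption, not the dataset's actual schema.

```python
from collections import defaultdict

def accuracy_by_type(records):
    # records: (question_type, gold_option, predicted_option) triples;
    # the triple format is illustrative, not NExT-QA's file layout.
    hit, cnt = defaultdict(int), defaultdict(int)
    for qtype, gold, pred in records:
        hit[qtype] += int(gold == pred)
        cnt[qtype] += 1
    return {t: hit[t] / cnt[t] for t in cnt}

print(accuracy_by_type([("causal", 1, 1), ("causal", 3, 0),
                        ("temporal", 2, 2), ("descriptive", 4, 4)]))
# {'causal': 0.5, 'temporal': 1.0, 'descriptive': 1.0}
```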
