Search Results for author: Junbin Xiao

Found 12 papers, 11 papers with code

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

no code implementations • 1 Mar 2024 • Jianwu Fang, Lei-Lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua

This model involves a contrastive interaction loss to learn the pair co-occurrence of normal, near-accident, accident frames with the corresponding text descriptions, such as accident reasons, prevention advice, and accident categories.

Object object-detection +3

Paper
Add Code

Can I Trust Your Answer? Visually Grounded Video Question Answering

1 code implementation • 4 Sep 2023 • Junbin Xiao, Angela Yao, Yicong Li, Tat Seng Chua

We study visually grounded VideoQA in response to the emerging trends of utilizing pretraining techniques for video-language understanding.

Question Answering Video Grounding +2

Paper
Code

Discovering Spatio-Temporal Rationales for Video Question Answering

1 code implementation • ICCV 2023 • Yicong Li, Junbin Xiao, Chun Feng, Xiang Wang, Tat-Seng Chua

We then conduct extensive studies to verify the importance of STR as well as the proposed answer interaction mechanism.

Question Answering Video Question Answering

Paper
Code

Contrastive Video Question Answering via Video Graph Transformer

1 code implementation • 27 Feb 2023 • Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua

CoVGT's uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations and dynamics, for complex spatio-temporal reasoning.

Ranked #11 on Video Question Answering on NExT-QA (using extra training data)

Contrastive Learning Question Answering +1

Paper
Code

Equivariant and Invariant Grounding for Video Question Answering

1 code implementation • 26 Jul 2022 • Yicong Li, Xiang Wang, Junbin Xiao, Tat-Seng Chua

Specifically, the equivariant grounding encourages the answering to be sensitive to the semantic changes in the causal scene and question; in contrast, the invariant grounding enforces the answering to be insensitive to the changes in the environment scene.

Question Answering Video Question Answering

Paper
Code

Video Graph Transformer for Video Question Answering

1 code implementation • 12 Jul 2022 • Junbin Xiao, Pan Zhou, Tat-Seng Chua, Shuicheng Yan

VGT's uniqueness are two-fold: 1) it designs a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations, and dynamics for complex spatio-temporal reasoning; and 2) it exploits disentangled video and text Transformers for relevance comparison between the video and text to perform QA, instead of entangled cross-modal Transformer for answer classification.

Ranked #18 on Video Question Answering on NExT-QA (using extra training data)

Question Answering Relation +2

Paper
Code

Invariant Grounding for Video Question Answering

1 code implementation • CVPR 2022 • Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua

At its core is understanding the alignments between visual scenes in video and linguistic semantics in question to yield the answer.

Question Answering Video Question Answering

Paper
Code

Video Question Answering: Datasets, Algorithms and Challenges

1 code implementation • 2 Mar 2022 • Yaoyao Zhong, Junbin Xiao, Wei Ji, Yicong Li, Weihong Deng, Tat-Seng Chua

Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos.

Question Answering Video Question Answering

Paper
Code

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering

1 code implementation • 12 Dec 2021 • Junbin Xiao, Angela Yao, Zhiyuan Liu, Yicong Li, Wei Ji, Tat-Seng Chua

To align with the multi-granular essence of linguistic concepts in language queries, we propose to model video as a conditional graph hierarchy which weaves together visual facts of different granularity in a level-wise manner, with the guidance of corresponding textual cues.

Ranked #23 on Video Question Answering on NExT-QA

Question Answering Video Question Answering +1

Paper
Code

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions

1 code implementation • CVPR 2021 • Junbin Xiao, Xindi Shang, Angela Yao, Tat-Seng Chua

We introduce NExT-QA, a rigorously designed video question answering (VideoQA) benchmark to advance video understanding from describing to explaining the temporal actions.

Question Answering Video Question Answering +2

Paper
Code

NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions

2 code implementations • 18 May 2021 • Junbin Xiao, Xindi Shang, Angela Yao, Tat-Seng Chua

We introduce NExT-QA, a rigorously designed video question answering (VideoQA) benchmark to advance video understanding from describing to explaining the temporal actions.

Question Answering Video Question Answering +2

Paper
Code

Visual Relation Grounding in Videos

1 code implementation • ECCV 2020 • Junbin Xiao, Xindi Shang, Xun Yang, Sheng Tang, Tat-Seng Chua

In this paper, we explore a novel task named visual Relation Grounding in Videos (vRGV).

Question Answering Relation +2

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.