1 code implementation • 4 Feb 2023 • Min Peng, Chongyang Wang, Yu Shi, Xiang-Dong Zhou
This paper presents a new method for end-to-end Video Question Answering (VideoQA), as an alternative to the currently popular approach of large-scale pre-training with huge feature extractors.
1 code implementation • 9 May 2022 • Min Peng, Chongyang Wang, Yuan Gao, Yu Shi, Xiang-Dong Zhou
With multiscale sampling, RMI iterates the interaction between the appearance-motion information at each scale and the question embeddings to build multilevel question-guided visual representations.
1 code implementation • 10 Sep 2021 • Min Peng, Chongyang Wang, Yuan Gao, Yu Shi, Xiang-Dong Zhou
Targeting these issues, this paper proposes a novel Temporal Pyramid Transformer (TPT) model with multimodal interaction for VideoQA.
1 code implementation • 25 Mar 2021 • Feng Lu, Baifan Chen, Xiang-Dong Zhou, Dezhen Song
Here we split the holistic mid-layer features into local features, and propose an adaptive dynamic time warping (DTW) algorithm to align local features from the spatial domain while measuring the distance between two images.
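The alignment idea in the entry above can be illustrated with classic dynamic time warping. This is a minimal, hypothetical sketch, not the authors' adaptive DTW. It assumes each image's mid-layer feature map is given as H x W x C nested lists and is average-pooled into horizontal stripes, which serve as the local features. The helper names (`split_into_stripes`, `dtw_distance`, `image_distance`) and the stripe partition are illustrative assumptions. The two stripe sequences are then aligned with standard, non-adaptive DTW under a Euclidean cost.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def split_into_stripes(feature_map: List[List[Vector]], parts: int) -> List[List[float]]:
    """Hypothetical helper: average-pool an H x W x C map into `parts` horizontal
    stripe vectors (assumes H >= parts so that no stripe is empty)."""
    h = len(feature_map)
    c = len(feature_map[0][0])
    stripes = []
    for p in range(parts):
        rows = feature_map[p * h // parts:(p + 1) * h // parts]
        cells = [cell for row in rows for cell in row]
        stripes.append([sum(cell[k] for cell in cells) / len(cells) for k in range(c)])
    return stripes


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two local feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: List[Vector], seq_b: List[Vector]) -> float:
    """Classic (non-adaptive) DTW: minimal cumulative cost of a monotonic alignment."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] holds the cost of the best alignment of seq_a[:i] with seq_b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def image_distance(map_a, map_b, parts: int) -> float:
    """Hypothetical image distance: DTW over the stripe features of two maps."""
    return dtw_distance(split_into_stripes(map_a, parts), split_into_stripes(map_b, parts))
```

For example, `dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0]])` returns 0.0, because DTW can match the first stripe of the shorter sequence to both leading stripes of the longer one at no cost. A rigid stripe-by-stripe comparison cannot absorb that kind of vertical shift.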
1 code implementation • 19 Sep 2020 • Min Peng, Chongyang Wang, Yuan Gao, Tao Bi, Tong Chen, Yu Shi, Xiang-Dong Zhou
As a spontaneous facial expression of emotion, a micro-expression reveals an underlying emotion that a person cannot consciously control.