Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

1 code implementation CVPR 2022 Haojun Jiang, Yuanze Lin, Dongchen Han, Shiji Song, Gao Huang

Our method leverages an off-the-shelf object detector to identify visual objects from unlabeled images, and then language queries for these objects are obtained in an unsupervised fashion with a pseudo-query generation module.

Language Modelling Natural Language Queries

Glance and Focus Networks for Dynamic Visual Recognition

1 code implementation 9 Jan 2022 Gao Huang, Yulin Wang, Kangchen Lv, Haojun Jiang, Wenhui Huang, Pengfei Qi, Shiji Song

Spatial redundancy widely exists in visual recognition tasks, i.e., discriminative features in an image or video frame usually correspond to only a subset of pixels, while the remaining regions are irrelevant to the task at hand.

Image Classification Video Recognition

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

1 code implementation CVPR 2022 Yulin Wang, Yang Yue, Yuanze Lin, Haojun Jiang, Zihang Lai, Victor Kulikov, Nikita Orlov, Humphrey Shi, Gao Huang

Recent works have shown that the computational efficiency of video recognition can be significantly improved by reducing spatial redundancy.

Video Recognition

Adaptive Focus for Efficient Video Recognition

1 code implementation ICCV 2021 Yulin Wang, Zhaoxi Chen, Haojun Jiang, Shiji Song, Yizeng Han, Gao Huang

In this paper, we explore the spatial redundancy in video recognition with the aim of improving computational efficiency.

Video Recognition

