no code implementations • 11 Jul 2022 • Jinbin Bai, Chunhui Liu, Feiyue Ni, Haofan Wang, Mengying Hu, Xiaofeng Guo, Lele Cheng
To overcome the above issue, we present a novel mechanism for learning the translation relationship from a source modality space $\mathcal{S}$ to a target modality space $\mathcal{T}$ without the need for a joint latent space, thereby bridging the gap between the visual and textual domains.
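As a minimal sketch of what translating between modality spaces without a joint latent space could look like: the module below maps frozen source embeddings directly into the target embedding space. The architecture, dimensions, and MSE objective here are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class ModalityTranslator(nn.Module):
    """Maps source-modality embeddings (e.g., visual) directly into the
    target-modality embedding space (e.g., textual), with no shared
    joint latent space in between."""
    def __init__(self, src_dim: int, tgt_dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, tgt_dim),
        )

    def forward(self, src_emb: torch.Tensor) -> torch.Tensor:
        return self.net(src_emb)

# Hypothetical training step: pull translated source embeddings
# toward their paired target embeddings.
translator = ModalityTranslator(src_dim=512, tgt_dim=768)
src = torch.randn(32, 512)   # e.g., frozen visual-encoder outputs
tgt = torch.randn(32, 768)   # e.g., frozen text-encoder outputs
loss = nn.functional.mse_loss(translator(src), tgt)
loss.backward()
```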
no code implementations • 29 May 2021 • Zhe Wang, Hao Chen, Xinyu Li, Chunhui Liu, Yuanjun Xiong, Joseph Tighe, Charless Fowlkes
However, it is quite expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset.
no code implementations • ICCV 2021 • Yanyi Zhang, Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, Ivan Marsic, Joseph Tighe
We first introduce the vanilla video transformer and show that the transformer module is able to perform spatio-temporal modeling from raw pixels, but with heavy memory usage.
Ranked #14 on Action Classification on Charades
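The heavy memory usage noted in the abstract above follows from full self-attention over every spatio-temporal token, which scales quadratically in token count. A back-of-the-envelope sketch, with all clip and model sizes illustrative:

```python
# Rough attention-memory estimate for a vanilla video transformer.
# Full self-attention stores one (tokens x tokens) map per head;
# the numbers below are illustrative, not from the paper.
frames, height, width, patch = 16, 224, 224, 16
tokens = frames * (height // patch) * (width // patch)  # 16*14*14 = 3136
heads, bytes_per_el = 12, 2  # fp16

attn_bytes = heads * tokens * tokens * bytes_per_el
print(f"{tokens} tokens -> ~{attn_bytes / 2**30:.2f} GiB per layer "
      "for attention maps alone")
```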
1 code implementation • CVPR 2022 • Jiaojiao Zhao, Yanyi Zhang, Xinyu Li, Hao Chen, Bing Shuai, Mingze Xu, Chunhui Liu, Kaustav Kundu, Yuanjun Xiong, Davide Modolo, Ivan Marsic, Cees G. M. Snoek, Joseph Tighe
We propose TubeR: a simple solution for spatio-temporal video action detection.
no code implementations • ICCV 2021 • Chunhui Liu, Xinyu Li, Hao Chen, Davide Modolo, Joseph Tighe
In this work, we focus on improving the inference efficiency of current action recognition backbones on trimmed videos, and show that an action model can still cover the informative regions while dropping non-informative features.
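One generic way to realize "dropping non-informative features" at inference time is to score per-location features and keep only the top-k. The L2-norm saliency score below is an illustrative stand-in, not the paper's selection criterion:

```python
import torch

def drop_uninformative(features: torch.Tensor, keep_ratio: float = 0.5):
    """Keep the top-k feature vectors by a saliency score.

    features: (N, D) per-location features from an action backbone.
    The L2-norm score is an illustrative proxy for informativeness;
    the paper's actual criterion may differ.
    """
    n = features.size(0)
    k = max(1, int(n * keep_ratio))
    scores = features.norm(dim=1)    # (N,) per-feature saliency
    idx = scores.topk(k).indices     # indices of the most salient features
    return features[idx], idx

feats = torch.randn(196, 768)        # e.g., one frame's tokens
kept, kept_idx = drop_uninformative(feats, keep_ratio=0.25)
print(kept.shape)                    # torch.Size([49, 768])
```

Downstream layers then run on the kept subset only, which is where the inference savings come from.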
no code implementations • 15 Dec 2020 • Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Hao Chen, Joseph Tighe
In the world of action recognition research, one primary focus has been on how to construct and train networks to model the spatial-temporal volume of an input video.
1 code implementation • 11 Dec 2020 • Yi Zhu, Xinyu Li, Chunhui Liu, Mohammadreza Zolfaghari, Yuanjun Xiong, Chongruo Wu, Zhi Zhang, Joseph Tighe, R. Manmatha, Mu Li
Video action recognition is one of the representative tasks for video understanding.
no code implementations • 24 Feb 2020 • Ye Li, Guangqiang Yin, Chunhui Liu, Xiaoyu Yang, Zhiguo Wang
Triplet loss requires complicated and fussy batch construction and converges slowly.
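For context, the batch construction being criticized typically follows the common "batch-hard" scheme: sample P identities with K images each, then mine the hardest positive and hardest negative per anchor within the batch. A self-contained sketch, with all shapes illustrative:

```python
import torch

def batch_hard_triplet_loss(emb: torch.Tensor, labels: torch.Tensor,
                            margin: float = 0.3) -> torch.Tensor:
    """Batch-hard triplet loss over a PK-sampled batch.

    emb:    (B, D) embeddings, B = P identities x K images each.
    labels: (B,) identity labels.
    Each anchor uses its hardest positive (farthest same-label sample)
    and hardest negative (closest different-label sample).
    """
    dist = torch.cdist(emb, emb)                       # (B, B) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) label mask

    hardest_pos = (dist * same.float()).max(dim=1).values
    # Mask positives with +inf so min() always picks a true negative.
    neg_dist = dist.masked_fill(same, float("inf"))
    hardest_neg = neg_dist.min(dim=1).values

    return torch.relu(hardest_pos - hardest_neg + margin).mean()

# PK batch: 4 identities x 4 images, 128-d embeddings.
emb = torch.randn(16, 128)
labels = torch.arange(4).repeat_interleave(4)
print(batch_hard_triplet_loss(emb, labels))
```

The PK sampling constraint and the per-anchor mining are exactly the "complicated and fussy" parts the abstract refers to.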
no code implementations • 29 Nov 2017 • Victor Fragoso, Chunhui Liu, Aayush Bansal, Deva Ramanan
We present compositional nearest neighbors (CompNN), a simple approach to visually interpreting distributed representations learned by a convolutional neural network (CNN) for pixel-level tasks (e.g., image synthesis and segmentation).
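A hedged sketch of the core idea as the abstract describes it: explain each patch of a query's output by retrieving the training patch whose CNN features are closest, then compositing those retrieved patches. The feature shapes and retrieval rule below are illustrative assumptions:

```python
import torch

def compositional_nearest_neighbors(query_feats: torch.Tensor,
                                    train_feats: torch.Tensor,
                                    train_patches: torch.Tensor) -> torch.Tensor:
    """Interpret a CNN's pixel-level output by compositing
    nearest-neighbor training patches (CompNN-style sketch).

    query_feats:   (Nq, D) CNN features of the query image's patches.
    train_feats:   (Nt, D) CNN features of training-set patches.
    train_patches: (Nt, C, h, w) the raw pixel patches themselves.
    Returns (Nq, C, h, w) patches to composite back into an image.
    """
    dists = torch.cdist(query_feats, train_feats)  # (Nq, Nt) distances
    nn_idx = dists.argmin(dim=1)                   # nearest training patch
    return train_patches[nn_idx]

# Illustrative shapes: 196 query patches vs. 10k training patches.
q = torch.randn(196, 256)
t = torch.randn(10_000, 256)
patches = torch.rand(10_000, 3, 16, 16)
recon = compositional_nearest_neighbors(q, t, patches)
print(recon.shape)  # torch.Size([196, 3, 16, 16])
```

If the composited result resembles the CNN's actual output, the retrieved patches serve as a visual explanation of what the network learned.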
no code implementations • 22 Mar 2017 • Chunhui Liu, Yueyu Hu, Yanghao Li, Sijie Song, Jiaying Liu
Although many 3D human activity benchmarks have been proposed, most existing action datasets focus on action recognition for segmented videos.