no code implementations • ICCV 2021 • Zhaofan Qiu, Ting Yao, Yan Shu, Chong-Wah Ngo, Tao Mei
This paper studies a two-step alternative that first condenses the video sequence into an informative "frame" and then applies an off-the-shelf image recognition system to the synthetic frame.
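To make the two-step idea concrete, here is a minimal sketch (not the paper's actual condensation network, which is learned with its own architecture): frames are collapsed into one synthetic frame by a hypothetical learnable weighted average, and a standard torchvision classifier is reused on the result.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class FrameCondenser(nn.Module):
    """Toy condensation: a learnable convex combination of input frames."""
    def __init__(self, num_frames: int, num_classes: int = 400):
        super().__init__()
        self.frame_logits = nn.Parameter(torch.zeros(num_frames))
        self.image_model = resnet50(num_classes=num_classes)  # off-the-shelf 2D CNN

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, frames, 3, H, W)
        w = torch.softmax(self.frame_logits, dim=0)         # per-frame weights
        frame = (clip * w.view(1, -1, 1, 1, 1)).sum(dim=1)  # synthetic "frame"
        return self.image_model(frame)                      # image recognition step

logits = FrameCondenser(num_frames=16)(torch.randn(2, 16, 3, 224, 224))
```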
no code implementations • CVPR 2021 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xiao-Ping Zhang, Dong Wu, Tao Mei
Video content is multifaceted, consisting of objects, scenes, interactions or actions.
no code implementations • CVPR 2021 • Dong Li, Zhaofan Qiu, Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei
For each action category, we execute online clustering to decompose the graph into sub-graphs at each scale by learning a Gaussian Mixture Layer, and select the discriminative sub-graphs as action prototypes for recognition.
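A Gaussian Mixture Layer can be sketched as a layer holding K learnable Gaussian components that soft-assigns features to components; the paper's exact parameterization and its online clustering procedure are not reproduced here, so the isotropic covariance and sizes below are assumptions.

```python
import torch
import torch.nn as nn

class GaussianMixtureLayer(nn.Module):
    def __init__(self, num_components: int, feat_dim: int):
        super().__init__()
        self.means = nn.Parameter(torch.randn(num_components, feat_dim))
        self.log_sigma = nn.Parameter(torch.zeros(num_components))  # isotropic std (assumed)
        self.logit_pi = nn.Parameter(torch.zeros(num_components))   # mixing weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, D) sub-graph features -> (N, K) soft assignments (responsibilities)
        dist2 = torch.cdist(x, self.means).pow(2)
        log_prob = -0.5 * dist2 / self.log_sigma.exp().pow(2)
        log_prob = log_prob - x.size(1) * self.log_sigma  # Gaussian normalizer (up to a constant)
        log_prob = log_prob + torch.log_softmax(self.logit_pi, dim=0)
        return torch.softmax(log_prob, dim=1)
```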
1 code implementation • ICCV 2021 • Rui Li, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei
To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning.
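The contrastive half of this duet can be made concrete with a standard InfoNCE loss; the motion-guided augmentation that produces the two views is only indicated in a comment, not implemented, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def info_nce(q: torch.Tensor, k: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    # q, k: (B, D) embeddings of two views of the same clip; in the paper's
    # setting the views would be cropped to follow the motion (augmentation step).
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    logits = q @ k.t() / temperature   # (B, B) cosine similarities
    labels = torch.arange(q.size(0))   # matching pairs sit on the diagonal
    return F.cross_entropy(logits, labels)
```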
1 code implementation • 11 Jan 2022 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei
In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., the learning rate and the length of input clips, in each state.
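The state decomposition can be pictured as a simple schedule object; the concrete values below are illustrative, not the paper's searched hyper-parameters.

```python
from dataclasses import dataclass

@dataclass
class TrainState:
    epochs: int
    learning_rate: float
    clip_length: int  # number of input frames in this state

# Illustrative schedule: later states see longer clips at lower learning rates.
schedule = [
    TrainState(epochs=40, learning_rate=0.1,   clip_length=8),
    TrainState(epochs=40, learning_rate=0.01,  clip_length=16),
    TrainState(epochs=20, learning_rate=0.001, clip_length=32),
]

for state in schedule:
    # A real pipeline would rebuild the data loader with state.clip_length
    # frames and reset the optimizer's learning rate before each state.
    print(f"{state.epochs} epochs @ lr={state.learning_rate}, {state.clip_length}-frame clips")
```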
1 code implementation • ECCV 2020 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
In this paper, we introduce a new transfer-learning design that learns action localization for a large set of action categories, using only the action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes.
3 code implementations • 3 Aug 2020 • Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei
In this paper, we compose a trilogy of tasks exploring the basic and generic supervision in the sequence from spatial, spatiotemporal, and sequential perspectives.
no code implementations • CVPR 2020 • Yiheng Zhang, Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Dong Liu, Tao Mei
Given that expert labeling is extremely expensive, recent research has shown that models trained on photo-realistic synthetic data (e.g., computer games) with computer-generated annotations can be adapted to real images.
Ranked #10 on Domain Adaptation on SYNTHIA-to-Cityscapes
no code implementations • 31 Mar 2020 • Dong Li, Ting Yao, Zhaofan Qiu, Houqiang Li, Tao Mei
It has been well recognized that modeling human-object or object-object relations would be helpful for the detection task.
no code implementations • 23 Sep 2019 • Zhaofan Qiu, Ting Yao, Yiheng Zhang, Yongdong Zhang, Tao Mei
Moreover, we enlarge the search space of SDAS, particularly for video recognition, by devising several unique operations to encode spatio-temporal dynamics, and demonstrate their impact on the architecture search of SDAS.
1 code implementation • CVPR 2019 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
Temporally localizing actions in a video is a fundamental challenge in video understanding.
no code implementations • CVPR 2019 • Yiheng Zhang, Zhaofan Qiu, Jingen Liu, Ting Yao, Dong Liu, Tao Mei
As a result, our CAS is able to search an optimized architecture with customized constraints.
no code implementations • 20 Jun 2019 • Fuchen Long, Qi Cai, Zhaofan Qiu, Zhijian Hou, Yingwei Pan, Ting Yao, Chong-Wah Ngo
This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in ActivityNet Challenge 2019.
no code implementations • 14 Jun 2019 • Zhaofan Qiu, Dong Li, Yehao Li, Qi Cai, Yingwei Pan, Ting Yao
This notebook paper presents an overview and comparative analysis of our systems designed for the following three tasks in ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.
no code implementations • CVPR 2019 • Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xinmei Tian, Tao Mei
The diffusions effectively integrate two aspects of information, i.e., localized and holistic, enabling a more powerful way of learning representations.
Ranked #5 on Action Recognition on UCF101
no code implementations • ECCV 2018 • Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei
The RTP initializes action proposals in the start frame through a Region Proposal Network and then estimates the movements of the proposals in the next frame in a recurrent manner.
no code implementations • CVPR 2018 • Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei
The recent advances in deep neural networks have convincingly demonstrated high capability in learning vision models on large datasets.
no code implementations • 23 Apr 2018 • Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, Tao Mei
In this paper, we present novel Temporal GANs conditioning on Captions, namely TGANs-C, in which the input to the generator network is a concatenation of a latent noise vector and a caption embedding, which is then transformed into a frame sequence with 3D spatio-temporal convolutions.
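A rough sketch of that generator input path, with illustrative layer sizes and output resolution (the paper's exact architecture is not reproduced): the noise vector and caption embedding are concatenated and upsampled to a clip by 3D transposed convolutions.

```python
import torch
import torch.nn as nn

class CaptionConditionedGenerator(nn.Module):
    def __init__(self, noise_dim: int = 100, caption_dim: int = 256, base: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(noise_dim + caption_dim, base * 4, (2, 4, 4)),
            nn.BatchNorm3d(base * 4), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(base * 4, base * 2, 4, stride=2, padding=1),
            nn.BatchNorm3d(base * 2), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(base * 2, base, 4, stride=2, padding=1),
            nn.BatchNorm3d(base), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(base, 3, 4, stride=2, padding=1),
            nn.Tanh(),  # frame pixels in [-1, 1]
        )

    def forward(self, z: torch.Tensor, caption_emb: torch.Tensor) -> torch.Tensor:
        x = torch.cat([z, caption_emb], dim=1)  # concatenate noise and caption
        x = x.view(*x.shape, 1, 1, 1)           # add singleton T, H, W dims
        return self.net(x)                      # (B, 3, 16, 32, 32) frame sequence
```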
no code implementations • 23 Apr 2018 • Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei
Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolutional neural network (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations into hash codes, and a classification stream.
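The four components map naturally onto a shared backbone with three heads; this skeleton uses AlexNet features and illustrative head sizes as assumptions, not the paper's exact layers.

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet

class DSHGANs(nn.Module):
    def __init__(self, hash_bits: int = 48, num_classes: int = 10):
        super().__init__()
        net = alexnet()
        # Shared CNN: AlexNet up to its penultimate 4096-d layer.
        self.cnn = nn.Sequential(net.features, net.avgpool, nn.Flatten(),
                                 net.classifier[:-1])
        self.adversary = nn.Linear(4096, 1)                              # real vs. synthetic
        self.hash_head = nn.Sequential(nn.Linear(4096, hash_bits), nn.Tanh())
        self.classify = nn.Linear(4096, num_classes)                     # semantic supervision

    def forward(self, images: torch.Tensor):
        f = self.cnn(images)
        return self.adversary(f), self.hash_head(f), self.classify(f)
```

At retrieval time the Tanh outputs would be binarized into codes, e.g. `hash_out.sign()`.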
2 code implementations • ICCV 2017 • Zhaofan Qiu, Ting Yao, Tao Mei
In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters in the spatial domain (equivalent to a 2D CNN) plus $3\times1\times1$ convolutions to construct temporal connections between adjacent feature maps; one such block is sketched below.
Ranked #8 on Action Recognition on Sports-1M
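A minimal sketch of the decomposition, in the variant where the spatial and temporal filters are cascaded inside a residual bottleneck (the paper also explores orderings that parallelize them); channel sizes are placeholders.

```python
import torch
import torch.nn as nn

class P3DBottleneck(nn.Module):
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.reduce = nn.Conv3d(channels, bottleneck, kernel_size=1)
        self.spatial = nn.Conv3d(bottleneck, bottleneck,
                                 kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(bottleneck, bottleneck,
                                  kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.expand = nn.Conv3d(bottleneck, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W)
        h = self.relu(self.reduce(x))
        h = self.relu(self.spatial(h))    # 1x3x3: 2D-like filtering per frame
        h = self.relu(self.temporal(h))   # 3x1x1: connects adjacent feature maps in time
        return self.relu(x + self.expand(h))  # residual connection
```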
no code implementations • CVPR 2017 • Zhaofan Qiu, Ting Yao, Tao Mei
In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of a convolutional layer in a deep generative model, trained in an end-to-end manner.
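A loose sketch of the Fisher Vector idea behind this: encode a set of local activations as the gradient of a generative model's log-likelihood surrogate with respect to its parameters. The tiny VAE, the choice of decoder-only gradients, and the omission of FV normalization are all simplifications, not the paper's formulation.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, dim: int = 256, latent: int = 32):
        super().__init__()
        self.enc = nn.Linear(dim, latent * 2)
        self.dec = nn.Linear(latent, dim)

    def elbo(self, x: torch.Tensor) -> torch.Tensor:
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        rec = -((self.dec(z) - x) ** 2).sum()                 # Gaussian reconstruction term
        kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()
        return rec - kl

def fisher_vector(vae: TinyVAE, local_feats: torch.Tensor) -> torch.Tensor:
    # local_feats: (N, dim) local activations pooled from one image/clip.
    grads = torch.autograd.grad(vae.elbo(local_feats), tuple(vae.dec.parameters()))
    return torch.cat([g.flatten() for g in grads])  # gradient vector as the encoding
```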
no code implementations • ICCV 2017 • Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei
Automatically describing an image in natural language has been an emerging challenge in both computer vision and natural language processing.