no code implementations • 16 Nov 2022 • Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan
This technical report describes the CONE approach for Ego4D Natural Language Queries (NLQ) Challenge in ECCV 2022.
1 code implementation • 22 Sep 2022 • Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan
This paper tackles an emerging and challenging problem of long video temporal grounding (VTG) that localizes video moments related to a natural language (NL) query.
no code implementations • ICCV 2021 • Xinyu Gong, Heng Wang, Zheng Shou, Matt Feiszli, Zhangyang Wang, Zhicheng Yan
We design a multivariate search space, including 6 search variables to capture a wide variety of choices in designing two-stream models.
1 code implementation • ECCV 2020 • Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou
To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action.
Ranked #5 on Weakly Supervised Action Localization on BEOID
no code implementations • 24 Oct 2019 • Xudong Lin, Zheng Shou, Shih-Fu Chang
The inconsistent strategy makes it hard to explicitly supervise the action localization model with temporal boundary annotations at training time.
2 code implementations • 23 May 2019 • Jiawei Ma, Zheng Shou, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang
In order to jointly capture self-attention across multiple dimensions (time, location, and the sensor measurements) while maintaining low computational complexity, we propose a novel approach called Cross-Dimensional Self-Attention (CDSA) that processes each dimension sequentially, yet in an order-independent manner.
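The sequential, order-independent idea can be sketched as follows. This is a minimal NumPy illustration, assuming standard scaled dot-product self-attention applied along each axis of a (time, location, measurement) tensor with the per-axis outputs averaged; the paper's exact fusion and learned parameterization differ, and the function names here are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_along(x, axis):
    """Parameter-free scaled dot-product self-attention mixing along one axis."""
    xt = np.moveaxis(x, axis, -2)              # (..., N, d): attend across N
    d = xt.shape[-1]
    scores = softmax(xt @ np.swapaxes(xt, -1, -2) / np.sqrt(d), axis=-1)
    return np.moveaxis(scores @ xt, -2, axis)  # restore original layout

def cdsa_sketch(x):
    """x: (time, location, measurement) tensor. Apply self-attention along
    each dimension and average the results, so the combination does not
    depend on the order in which dimensions are processed."""
    outs = [self_attention_along(x, ax) for ax in range(x.ndim)]
    return sum(outs) / len(outs)
```

Because the per-dimension outputs are averaged rather than chained, permuting the processing order leaves the result unchanged, which is one simple way to realize the "order-independent" property the abstract describes.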
no code implementations • CVPR 2019 • Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan
Motion has been shown to be useful for video understanding, where it is typically represented by optical flow.
Ranked #1 on Action Recognition on UCF-101
no code implementations • NeurIPS 2018 • Hang Gao, Zheng Shou, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Deep neural networks suffer from over-fitting and catastrophic forgetting when trained with small data.
no code implementations • ECCV 2018 • Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang
In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.
Ranked #16 on Weakly Supervised Action Localization on ActivityNet-1.2 (mAP@0.5 metric)
1 code implementation • 22 Jul 2018 • Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang
In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.
no code implementations • ECCV 2018 • Zheng Shou, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giro-i-Nieto, Shih-Fu Chang
We aim to tackle a novel task in action detection - Online Detection of Action Start (ODAS) in untrimmed, streaming videos.
2 code implementations • 17 Oct 2017 • Tianwei Lin, Xu Zhao, Zheng Shou
The main drawback of this framework is that the boundaries of action instance proposals are fixed during the classification step.
1 code implementation • 16 Aug 2017 • Du Tran, Jamie Ray, Zheng Shou, Shih-Fu Chang, Manohar Paluri
Learning image representations with ConvNets by pre-training on ImageNet has proven useful across many visual understanding tasks including object detection, semantic segmentation, and image captioning.
Ranked #71 on Action Recognition on HMDB-51
no code implementations • 21 Jul 2017 • Tianwei Lin, Xu Zhao, Zheng Shou
Our approach achieves state-of-the-art performance on both the temporal action proposal task and the temporal action localization task.
Ranked #11 on Temporal Action Proposal Generation on ActivityNet-1.3
1 code implementation • CVPR 2017 • Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang
Temporal action localization is an important yet challenging problem.
Ranked #27 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.6 metric)
no code implementations • 24 May 2016 • Dongang Wang, Zheng Shou, Hongyi Liu, Shih-Fu Chang
Finally, EventNet version 1.1 contains 67,641 videos, 500 events, and 5,028 event-specific concepts.
1 code implementation • CVPR 2016 • Zheng Shou, Dongang Wang, Shih-Fu Chang
To address this challenging issue, we exploit the effectiveness of deep networks for temporal action localization via three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments in a long video that may contain actions; (2) a classification network learns a one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes the learned classification network to localize each action instance.
Ranked #1 on Temporal Action Localization on MEXaction2
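The three-stage, segment-based pipeline can be sketched at a high level as follows. This is an illustrative control-flow sketch only: the helper names (`sliding_windows`, `localize`), the multi-scale window lengths, and the callable scorer stand-ins are assumptions, whereas the actual proposal, classification, and localization models are C3D-style 3D ConvNets:

```python
def sliding_windows(num_frames, lengths=(16, 32, 64), stride_ratio=0.5):
    """Multi-scale candidate segments over a long video, as (start, end) frame indices."""
    wins = []
    for length in lengths:
        step = max(1, int(length * stride_ratio))
        for start in range(0, max(1, num_frames - length + 1), step):
            wins.append((start, start + length))
    return wins

def localize(num_frames, proposal_score, classify, threshold=0.5):
    """Stage 1: keep candidate windows the proposal network deems action-like.
    Stages 2-3: classify survivors into an action class (or background) with
    the localization network, which was initialized from the classifier."""
    detections = []
    for (start, end) in sliding_windows(num_frames):
        if proposal_score(start, end) >= threshold:   # proposal network
            label, conf = classify(start, end)        # localization network
            if label != "background":
                detections.append((start, end, label, conf))
    return detections
```

In practice the surviving detections would also go through non-maximum suppression over overlapping windows, which this sketch omits for brevity.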