CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding

no code implementations 22 Sep 2022 Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

Analysis confirms the effectiveness of the components and the higher efficiency in long video grounding: our system improves inference speed by 2x on Ego4D-NLQ and 15x on MAD while keeping the SOTA performance of CONE.

Contrastive Learning Video Grounding

Towards Train-Test Consistency for Semi-supervised Temporal Action Localization

no code implementations 24 Oct 2019 Xudong Lin, Zheng Shou, Shih-Fu Chang

The inconsistent strategy makes it hard to explicitly supervise the action localization model with temporal boundary annotations at training time.

Multiple Instance Learning Video Classification +2

CDSA: Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation

1 code implementation 23 May 2019 Jiawei Ma, Zheng Shou, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang

To jointly capture self-attention across multiple dimensions, including time, location, and the sensor measurements, while maintaining low computational complexity, we propose a novel approach called Cross-Dimensional Self-Attention (CDSA) to process each dimension sequentially, yet in an order-independent manner.

Imputation Machine Translation +1
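
The idea of processing each dimension sequentially yet order-independently can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes one plausible construction in which an attention map is computed per dimension from the same raw input and then applied along its own axis. Because products along distinct axes commute, the result does not depend on the order in which dimensions are processed. The function names (`attention_map`, `apply_along`, `cross_dim_attention`) and the toy tensor shape are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(x, axis):
    """Self-attention map over one axis, computed from the raw input x (assumed design)."""
    # Unfold: bring `axis` to the front and flatten the remaining axes into features.
    unfolded = np.moveaxis(x, axis, 0).reshape(x.shape[axis], -1)
    scores = unfolded @ unfolded.T / np.sqrt(unfolded.shape[1])
    return softmax(scores, axis=-1)

def apply_along(x, attn, axis):
    """Mix the slices of x along `axis` using the attention map."""
    out = np.tensordot(attn, x, axes=(1, axis))  # mixed axis ends up at position 0
    return np.moveaxis(out, 0, axis)             # move it back to its original position

def cross_dim_attention(x, order=(0, 1, 2)):
    # All maps come from the same input, so they do not depend on processing order.
    maps = [attention_map(x, a) for a in range(x.ndim)]
    out = x
    for a in order:  # process one dimension at a time
        out = apply_along(out, maps[a], a)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 2))  # toy tensor: (time, location, measurement)
y1 = cross_dim_attention(x, order=(0, 1, 2))
y2 = cross_dim_attention(x, order=(2, 0, 1))
print(np.allclose(y1, y2))  # compares two processing orders
```

Each map is only as large as its own axis (here 4x4, 3x3, and 2x2), so the cost stays well below that of attending over all 24 flattened positions jointly.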

AutoLoc: Weakly-supervised Temporal Action Localization

1 code implementation 22 Jul 2018 Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang

In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance.

Weakly-supervised Temporal Action Localization Weakly Supervised Temporal Action Localization

Single Shot Temporal Action Detection

2 code implementations 17 Oct 2017 Tianwei Lin, Xu Zhao, Zheng Shou

The main drawback of this framework is that the boundaries of action instance proposals have been fixed during the classification step.

Action Detection General Classification

ConvNet Architecture Search for Spatiotemporal Feature Learning

1 code implementation 16 Aug 2017 Du Tran, Jamie Ray, Zheng Shou, Shih-Fu Chang, Manohar Paluri

Learning image representations with ConvNets by pre-training on ImageNet has proven useful across many visual understanding tasks including object detection, semantic segmentation, and image captioning.

Action Classification Action Recognition +5

Temporal Convolution Based Action Proposal: Submission to ActivityNet 2017

no code implementations 21 Jul 2017 Tianwei Lin, Xu Zhao, Zheng Shou

Our approach achieves state-of-the-art performance on both the temporal action proposal task and the temporal action localization task.

Action Classification General Classification +1

EventNet Version 1.1 Technical Report

no code implementations 24 May 2016 Dongang Wang, Zheng Shou, Hongyi Liu, Shih-Fu Chang

Finally, EventNet version 1.1 contains 67,641 videos, 500 events, and 5,028 event-specific concepts.

Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs

1 code implementation CVPR 2016 Zheng Shou, Dongang Wang, Shih-Fu Chang

To address this challenging issue, we exploit the effectiveness of deep networks in temporal action localization via three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments in a long video that may contain actions; (2) a classification network learns a one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes the learned classification network to localize each action instance.

Action Classification Classification +3

