Temporal Localization

Most implemented papers

TALL: Temporal Activity Localization via Language Query

jiyanggao/TALL ICCV 2017

For evaluation, we adopt the TaCoS dataset and build a new dataset for this task, called Charades-STA, on top of Charades by adding sentence temporal annotations.
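A sentence temporal annotation pairs a language query with a start and end time in a video. The following is a minimal sketch of how a predicted segment might be compared against such an annotation, assuming temporal intersection-over-union as the overlap measure (a common choice for this task, but not stated in the excerpt above). The function name, the dictionary layout, and the example times are hypothetical.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) segments, in seconds."""
    (ps, pe), (gs, ge) = pred, gt
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical annotation: a sentence plus the segment it describes.
annotation = {"query": "person opens the door", "segment": (2.0, 8.0)}
prediction = (4.0, 10.0)

score = temporal_iou(prediction, annotation["segment"])
print(f"temporal IoU: {score:.2f}")
```

A prediction is then typically counted as correct when its overlap exceeds a chosen threshold; the threshold itself is a design choice and is not specified here.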

Weakly Supervised Action Localization by Sparse Temporal Pooling Network

demianzhang/weakly-action-localization CVPR 2018

We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.

MAC: Mining Activity Concepts for Language-based Temporal Localization

runzhouge/MAC 21 Nov 2018

Previous methods address the problem by considering features from video sliding windows and language queries and learning a subspace to encode their correlation, an approach that ignores rich semantic cues about activities in videos and queries.
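The sliding-window baseline described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of any listed paper: window and query features are assumed to be already projected into a shared subspace, cosine similarity stands in for the learned correlation, and all names and feature values are hypothetical.

```python
import math


def sliding_windows(duration, length, stride):
    """Enumerate (start, end) candidate windows that fit inside the video."""
    windows = []
    start = 0.0
    while start + length <= duration:
        windows.append((start, start + length))
        start += stride
    return windows


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def best_window(windows, window_feats, query_feat):
    """Return the window whose feature is most similar to the query feature."""
    scores = [cosine(f, query_feat) for f in window_feats]
    best = max(range(len(windows)), key=scores.__getitem__)
    return windows[best], scores[best]


# Hypothetical 8-second video with 4-second windows and a 2-second stride.
windows = sliding_windows(duration=8.0, length=4.0, stride=2.0)
# Toy features, assumed to live in the shared video-language subspace.
window_feats = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query_feat = [0.0, 2.0]

segment, score = best_window(windows, window_feats, query_feat)
print(segment, round(score, 2))
```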

Asynchronous Temporal Fields for Action Recognition

gsig/temporal-fields CVPR 2017

Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it.

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

hangzhaomit/HACS-dataset ICCV 2019

This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos.

Audio-Visual Event Localization in Unconstrained Videos

YapengTian/AVE-ECCV18 ECCV 2018

In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.

Technical Report of the Video Event Reconstruction and Analysis (VERA) System -- Shooter Localization, Models, Interface, and Beyond

JunweiLiang/VERA_Shooter_Localization 26 May 2019

Among other uses, VERA enables the localization of a shooter from just a few videos that include the sound of gunshots.

Finding Moments in Video Collections Using Natural Language

jayleicn/TVRetrieval 30 Jul 2019

We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.

Egocentric Video-Language Pretraining

showlab/egovlp 3 Jun 2022

Video-Language Pretraining (VLP), which aims to learn transferable representations to advance a wide range of video-text downstream tasks, has recently received increasing attention.

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

zhengshou/AutoLoc 4 Apr 2015

To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output.