46 papers with code • 0 benchmarks • 3 datasets
These leaderboards are used to track progress in Temporal Localization
Most implemented papers
TALL: Temporal Activity Localization via Language Query
For evaluation, we adopt TaCoS dataset, and build a new dataset for this task on top of Charades by adding sentence temporal annotations, called Charades-STA.
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
MAC: Mining Activity Concepts for Language-based Temporal Localization
Previous methods address the problem by considering features from video sliding windows and language queries and learning a subspace to encode their correlation, which ignore rich semantic cues about activities in videos and queries.
Asynchronous Temporal Fields for Action Recognition
Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it.
Audio-Visual Event Localization in Unconstrained Videos
In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.
Technical Report of the Video Event Reconstruction and Analysis (VERA) System -- Shooter Localization, Models, Interface, and Beyond
Among other uses, VERA enables the localization of a shooter from just a few videos that include the sound of gunshots.
Finding Moments in Video Collections Using Natural Language
We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.
Accelerating COVID-19 Differential Diagnosis with Explainable Ultrasound Image Analysis
Controlling the COVID-19 pandemic largely hinges upon the existence of fast, safe, and highly-available diagnostic tools.
Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images
To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output.
Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs
To address this challenging issue, we exploit the effectiveness of deep networks in temporal action localization via three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments in a long video that may contain actions; (2) a classification network learns one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes on the learned classification network to localize each action instance.