Temporal Localization

55 papers with code • 0 benchmarks • 3 datasets

LITA: Language Instructed Temporal-Localization Assistant

nvlabs/lita 27 Mar 2024

In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task.

105

Skeleton-Based Human Action Recognition with Noisy Labels

xuyizdby/noiseerasar 15 Mar 2024

In this study, we bridge this gap by implementing a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas to serve as the initial benchmark.

2

Semi-supervised Active Learning for Video Action Detection

akash2907/semi-sup-active-learning 12 Dec 2023

First, we demonstrate its effectiveness on video action detection, where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning, along with several baseline approaches, on both UCF101-24 and JHMDB-21.

0

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

renshuhuai-andy/timechat 4 Dec 2023

This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.

179

UnLoc: A Unified Framework for Video Localization Tasks

google-research/scenic ICCV 2023

While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task.

2,996
21 Aug 2023

VideoGLUE: Video General Understanding Evaluation of Foundation Models

tensorflow/models 6 Jul 2023

We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

76,598

Dense Video Object Captioning from Disjoint Supervision

google-research/scenic 20 Jun 2023

We propose a new task and model for dense video object captioning: detecting, tracking, and captioning trajectories of objects in a video.

2,996

Self-Chained Image-Language Model for Video Localization and Question Answering

yui010206/sevila NeurIPS 2023

The SeViLA framework consists of two modules, Localizer and Answerer, both of which are parameter-efficiently fine-tuned from BLIP-2.

162
11 May 2023

Unsupervised classification to improve the quality of a bird song recording dataset

ear-team/bambird 15 Feb 2023

We first showed that the segmentation of bird songs alone introduced from 10% to 83% label noise, depending on the species.

18

Multi-Task Learning of Object State Changes from Uncurated Videos

soCzech/MultiTaskObjectStates 24 Nov 2022

We aim to learn to temporally localize object state changes and the corresponding state-modifying actions by observing people interacting with objects in long uncurated web videos.

8