Computer Vision

Temporal Localization

55 papers with code • 0 benchmarks • 3 datasets


Benchmarks


These leaderboards are used to track progress in Temporal Localization.

You can find evaluation results in the subtasks. You can also submit evaluation metrics for this task.

Libraries

Use these libraries to find Temporal Localization models and implementations:

google-research/scenic

2 papers

2,996

Latest papers


LITA: Language Instructed Temporal-Localization Assistant

nvlabs/lita • 27 Mar 2024

In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task.

105


Skeleton-Based Human Action Recognition with Noisy Labels

xuyizdby/noiseerasar • 15 Mar 2024

In this study, we bridge this gap by implementing a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas to serve as the initial benchmark.


Semi-supervised Active Learning for Video Action Detection

akash2907/semi-sup-active-learning • 12 Dec 2023

First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF101-24 and JHMDB-21.


TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

renshuhuai-andy/timechat • 4 Dec 2023

This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.

179


UnLoc: A Unified Framework for Video Localization Tasks

google-research/scenic • ICCV 2023

While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task.

2,996

21 Aug 2023


VideoGLUE: Video General Understanding Evaluation of Foundation Models

tensorflow/models • 6 Jul 2023

We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

76,598


Dense Video Object Captioning from Disjoint Supervision

google-research/scenic • 20 Jun 2023

We propose a new task and model for dense video object captioning -- detecting, tracking and captioning trajectories of objects in a video.

2,996


Self-Chained Image-Language Model for Video Localization and Question Answering

yui010206/sevila • NeurIPS 2023

The SeViLA framework consists of two modules, Localizer and Answerer, both parameter-efficiently fine-tuned from BLIP-2.

162

11 May 2023


Unsupervised classification to improve the quality of a bird song recording dataset

ear-team/bambird • 15 Feb 2023

We first showed that the segmentation of bird songs alone aggregated from 10% to 83% of label noise depending on the species.


Multi-Task Learning of Object State Changes from Uncurated Videos

soCzech/MultiTaskObjectStates • 24 Nov 2022

We aim to learn to temporally localize object state changes and the corresponding state-modifying actions by observing people interacting with objects in long uncurated web videos.

