Dense Video Captioning

25 papers with code • 4 benchmarks • 7 datasets

Most natural videos contain numerous events. For example, in a video of a “man playing a piano”, the video might also contain “another man dancing” or “a crowd clapping”. The task of dense video captioning involves both detecting and describing events in a video.
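The task definition above can be made concrete with a minimal sketch of a dense-captioning output: a list of temporally localized events, each a (start, end, caption) triple, where events may overlap. The `Event` class and the example predictions are illustrative assumptions, not from any specific paper or dataset.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One localized event in a dense video captioning output."""
    start: float   # event start time, in seconds
    end: float     # event end time, in seconds
    caption: str   # natural-language description of the event

def events_at(events, t):
    """Return all events active at time t; dense captioning allows overlaps."""
    return [e for e in events if e.start <= t <= e.end]

# Hypothetical predictions for the "man playing a piano" example above
predictions = [
    Event(0.0, 45.0, "a man is playing a piano"),
    Event(10.0, 30.0, "another man is dancing"),
    Event(28.0, 45.0, "a crowd is clapping"),
]
```

At `t = 29.0` all three hypothetical events overlap, which is exactly what distinguishes dense captioning from single-sentence video captioning.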

Most implemented papers

OmniVid: A Generative Framework for Universal Video Understanding

wangjk666/omnivid 26 Mar 2024

The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution.

Streaming Dense Video Captioning

google-research/scenic 1 Apr 2024

An ideal model for dense video captioning -- predicting captions localized temporally in a video -- should handle long input videos, predict rich, detailed textual descriptions, and be able to produce outputs before processing the entire video.
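The streaming requirement stated above can be sketched as a generator that emits a caption after each chunk of frames rather than waiting for the whole video; this is an illustration of the property, not the paper's actual architecture, and `caption_fn` is a hypothetical stand-in for a captioning model.

```python
def stream_captions(frame_chunks, caption_fn):
    """Yield (chunk_index, caption) after each chunk, before the video ends.

    frame_chunks: an iterable of video chunks (any representation).
    caption_fn:   a hypothetical model that captions the frames seen so far.
    """
    seen = []  # running memory of all chunks processed so far
    for i, chunk in enumerate(frame_chunks):
        seen.append(chunk)
        # Emit an output immediately, without access to future chunks.
        yield i, caption_fn(seen)
```

A non-streaming model would be the degenerate case that yields only once, after the final chunk.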

Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis

ucf-sst-lab/aicity2024cvprw 12 Apr 2024

Our solution mainly focuses on the following points: 1) To solve dense video captioning, we leverage the dense video captioning with parallel decoding (PDVC) framework to model visual-language sequences and generate dense captions, chapter by chapter, for the video.

TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning

quangminhdinh/trafficvlm 14 Apr 2024

Traffic video description and analysis have received much attention recently due to the growing demand for efficient and reliable urban surveillance systems.