Dense Video Captioning

25 papers with code • 4 benchmarks • 7 datasets

Most natural videos contain numerous events. For example, in a video of a “man playing a piano”, the video might also contain “another man dancing” or “a crowd clapping”. The task of dense video captioning involves both detecting and describing events in a video.
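
To make the task's input/output structure concrete, here is a minimal sketch of what a dense video captioning prediction might look like: a set of temporally localized events, each paired with a caption. The class and field names below are illustrative only and do not follow any specific benchmark's annotation format.

```python
from dataclasses import dataclass

@dataclass
class Event:
    start_sec: float   # event start time in seconds
    end_sec: float     # event end time in seconds
    caption: str       # natural-language description of the event

# Hypothetical output for the "man playing a piano" example above.
dense_caption_output = [
    Event(0.0, 45.0, "a man plays a piano"),
    Event(12.5, 30.0, "another man dances next to the piano"),
    Event(40.0, 45.0, "a crowd claps"),
]

for ev in dense_caption_output:
    print(f"[{ev.start_sec:.1f}s - {ev.end_sec:.1f}s] {ev.caption}")
```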

Latest papers with no code

The 8th AI City Challenge

no code yet • 15 Apr 2024

The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities.

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

no code yet • 3 Apr 2024

We present Dive Into the BoundarieS (DIBS), a novel pretraining framework for dense video captioning (DVC) that focuses on improving the quality of the generated event captions and their associated pseudo event boundaries from unlabeled videos.

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

no code yet • 28 Nov 2023

We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

no code yet • 5 Nov 2023

Dense Video Captioning (DVC) aims at detecting and describing different events in a given video.

Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges

no code yet • 25 Sep 2023

We also benchmark SOTA models on four multimodal tasks using this newly created dataset, and these results serve as new baselines for surveillance video-and-language understanding.

VidChapters-7M: Video Chapters at Scale

no code yet • NeurIPS 2023

To address this issue, we present VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total.

Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment

no code yet • 5 Jul 2023

This is accomplished by introducing a soft moment mask that represents a temporal segment in the video and jointly optimizing it with the prefix parameters of a language model.
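
Below is a hedged sketch of what a "soft moment mask" jointly optimized with prefix parameters could look like; the exact parameterization in the paper may differ. The mask is a differentiable weighting over the video frames, controlled by a learnable center and width, so the temporal segment itself can be tuned by gradient descent together with a soft prompt (prefix) for a language model. All names and sizes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SoftMomentMask(nn.Module):
    """Differentiable temporal mask over T frames (illustrative sketch)."""
    def __init__(self, num_frames: int, sharpness: float = 10.0):
        super().__init__()
        self.center = nn.Parameter(torch.tensor(0.5))   # normalized center in [0, 1]
        self.width = nn.Parameter(torch.tensor(0.25))   # normalized half-width
        self.register_buffer("t", torch.linspace(0.0, 1.0, num_frames))
        self.sharpness = sharpness

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (T, D) frame-level features from a video encoder
        dist = (self.t - self.center).abs()
        mask = torch.sigmoid(self.sharpness * (self.width - dist))  # ~1 inside the moment
        weights = mask / (mask.sum() + 1e-6)
        return weights @ frame_feats  # (D,) pooled feature for the soft moment

# Usage sketch: pool features for one candidate moment, then prepend a learnable
# prefix so both the mask and the prefix receive gradients from the captioning loss.
T, D = 64, 512
frame_feats = torch.randn(T, D)
moment = SoftMomentMask(num_frames=T)
prefix = nn.Parameter(torch.randn(8, D))  # soft prompt tokens (hypothetical size)
lm_input = torch.cat([prefix, moment(frame_feats).unsqueeze(0)], dim=0)  # (9, D)
```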

Visual Transformation Telling

no code yet • 3 May 2023

In this paper, we propose a new visual reasoning task, called Visual Transformation Telling (VTT).

A Review of Deep Learning for Video Captioning

no code yet • 22 Apr 2023

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in the fields of computer vision, natural language processing (NLP), linguistics, and human-computer interaction.

SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries

no code yet • 10 Apr 2023

By providing broadcasters with a tool to summarize the content of their video with the same level of engagement as a live game, our method could help satisfy the needs of the numerous fans who follow their team but cannot necessarily watch the live game.