Dense Video Captioning

25 papers with code • 4 benchmarks • 7 datasets

Most natural videos contain numerous events. For example, in a video of a “man playing a piano”, the video might also contain “another man dancing” or “a crowd clapping”. The task of dense video captioning involves both detecting and describing events in a video.
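
To make the task's input/output structure concrete, here is a minimal sketch of what a dense video captioning prediction might look like: a set of temporally localized events, each paired with a caption. The class and field names below are illustrative only and do not follow any specific benchmark's annotation format.

```python
from dataclasses import dataclass

@dataclass
class Event:
    start_sec: float   # event start time in seconds
    end_sec: float     # event end time in seconds
    caption: str       # natural-language description of the event

# Hypothetical output for the "man playing a piano" example above.
dense_caption_output = [
    Event(0.0, 45.0, "a man plays a piano"),
    Event(12.5, 30.0, "another man dances next to the piano"),
    Event(40.0, 45.0, "a crowd claps"),
]

for ev in dense_caption_output:
    print(f"[{ev.start_sec:.1f}s - {ev.end_sec:.1f}s] {ev.caption}")
```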

Latest papers with no code

The 8th AI City Challenge

no code yet • 15 Apr 2024

The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities.

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

no code yet • 3 Apr 2024

We present Dive Into the BoundarieS (DIBS), a novel pretraining framework for dense video captioning (DVC) that focuses on improving the quality of the generated event captions and their associated pseudo event boundaries from unlabeled videos.

Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

no code yet • 28 Nov 2023

We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

no code yet • 5 Nov 2023

Dense Video Captioning (DVC) aims at detecting and describing different events in a given video.

Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges

no code yet • 25 Sep 2023

We also benchmark SOTA models on four multimodal tasks using this newly created dataset, and these results serve as new baselines for surveillance video-and-language understanding.

VidChapters-7M: Video Chapters at Scale

no code yet • NeurIPS 2023

To address this issue, we present VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total.

Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment

no code yet • 5 Jul 2023

This is accomplished by introducing a soft moment mask that represents a temporal segment in the video and jointly optimizing it with the prefix parameters of a language model.
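
Below is a hedged sketch of what a "soft moment mask" jointly optimized with prefix parameters could look like; the exact parameterization in the paper may differ. The mask is a differentiable weighting over the video frames, controlled by a learnable center and width, so the temporal segment itself can be tuned by gradient descent together with a soft prompt (prefix) for a language model. All names and sizes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SoftMomentMask(nn.Module):
    """Differentiable temporal mask over T frames (illustrative sketch)."""
    def __init__(self, num_frames: int, sharpness: float = 10.0):
        super().__init__()
        self.center = nn.Parameter(torch.tensor(0.5))   # normalized center in [0, 1]
        self.width = nn.Parameter(torch.tensor(0.25))   # normalized half-width
        self.register_buffer("t", torch.linspace(0.0, 1.0, num_frames))
        self.sharpness = sharpness

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (T, D) frame-level features from a video encoder
        dist = (self.t - self.center).abs()
        mask = torch.sigmoid(self.sharpness * (self.width - dist))  # ~1 inside the moment
        weights = mask / (mask.sum() + 1e-6)
        return weights @ frame_feats  # (D,) pooled feature for the soft moment

# Usage sketch: pool features for one candidate moment, then prepend a learnable
# prefix so both the mask and the prefix receive gradients from the captioning loss.
T, D = 64, 512
frame_feats = torch.randn(T, D)
moment = SoftMomentMask(num_frames=T)
prefix = nn.Parameter(torch.randn(8, D))  # soft prompt tokens (hypothetical size)
lm_input = torch.cat([prefix, moment(frame_feats).unsqueeze(0)], dim=0)  # (9, D)
```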

Visual Transformation Telling

no code yet • 3 May 2023

In this paper, we propose a new visual reasoning task, called Visual Transformation Telling (VTT).

A Review of Deep Learning for Video Captioning

no code yet • 22 Apr 2023

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in the fields of computer vision, natural language processing (NLP), linguistics, and human-computer interaction.

SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries

no code yet • 10 Apr 2023

By providing broadcasters with a tool to summarize the content of their video with the same level of engagement as a live game, our method could help satisfy the needs of the numerous fans who follow their team but cannot necessarily watch the live game.