Dense Captioning

23 papers with code • 1 benchmark • 1 dataset

Benchmarks

This leaderboard tracks progress in Dense Captioning.

Dataset	Best Model
Visual Genome	ControlCap

Datasets

Visual Genome

Most implemented papers

3D-LLM: Injecting the 3D World into Large Language Models

umass-foundation-model/3d-llm • NeurIPS 2023

Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.

Dense-Captioning Events in Videos

sangminwoo/explore-and-match • ICCV 2017

We also introduce ActivityNet Captions, a large-scale benchmark for dense-captioning events.

A Hierarchical Approach for Generating Descriptive Image Paragraphs

chenxinpeng/im2p • CVPR 2017

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

jcjohnson/densecap • CVPR 2016

We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language.

Dense Captioning with Joint Inference and Visual Context

linjieyangsc/densecap • CVPR 2017

The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase.

Joint Event Detection and Description in Continuous Video Streams

VisionLearningGroup/JEDDi-Net • 28 Feb 2018

In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.

Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020

ttengwang/dense-video-captioning-pytorch • 21 Jun 2020

This technical report presents a brief description of our submission to the dense video captioning task of ActivityNet Challenge 2020.

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

adymaharana/vlcstorygan • 21 Oct 2021

Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance.
