Dense Captioning

31 papers with code • 1 benchmark • 2 datasets

Dense captioning requires a vision system to both localize and describe salient content in natural language: salient regions in images, or events in videos, each paired with a short descriptive phrase.

Most implemented papers

3D-LLM: Injecting the 3D World into Large Language Models

umass-foundation-model/3d-llm NeurIPS 2023

Experiments on held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that 3D-LLM outperforms 2D VLMs.

Dense-Captioning Events in Videos

akoepke/audio-retrieval-benchmark ICCV 2017

We also introduce ActivityNet Captions, a large-scale benchmark for dense-captioning events.
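
Dense-event benchmarks of this kind pair each video with a set of temporal segments and one caption per segment. The sketch below shows how such annotations are typically structured and read; the field names ("duration", "timestamps", "sentences") are assumptions for illustration, not necessarily the exact keys of the released ActivityNet Captions files.

```python
import json

# Illustrative annotation record: one video with timestamped event captions.
# Segments are [start, end] in seconds; each segment has one sentence.
annotations = json.loads("""
{
  "v_example": {
    "duration": 82.7,
    "timestamps": [[0.5, 12.3], [15.0, 40.2]],
    "sentences": ["A man sets up a tripod.", "He films a skateboarder doing tricks."]
  }
}
""")

for video_id, ann in annotations.items():
    for (start, end), caption in zip(ann["timestamps"], ann["sentences"]):
        print(f"{video_id} [{start:.1f}s - {end:.1f}s]: {caption}")
```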

A Hierarchical Approach for Generating Descriptive Image Paragraphs

chenxinpeng/im2p CVPR 2017

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

renshuhuai-andy/timechat CVPR 2024

This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.

ComiCap: A VLMs pipeline for dense captioning of Comic Panels

emanuelevivoli/comicap 24 Sep 2024

The comic domain is rapidly advancing with the development of single- and multi-page analysis and synthesis models.

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

jcjohnson/densecap CVPR 2016

We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language.
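
Concretely, a dense captioning model outputs a set of scored region boxes, each with its own phrase. The following is a minimal sketch of that prediction structure under assumed conventions (an (x, y, width, height) box format and hypothetical names such as RegionCaption and describe); it is not the interface of the paper's code.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RegionCaption:
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels (assumed convention)
    caption: str                    # natural-language phrase describing the region
    score: float                    # confidence of the localized region

def describe(predictions: List[RegionCaption], top_k: int = 5) -> None:
    """Print the top-k scored region descriptions for one image."""
    for p in sorted(predictions, key=lambda p: p.score, reverse=True)[:top_k]:
        x, y, w, h = p.box
        print(f"({x}, {y}, {w}, {h}) -> {p.caption} [{p.score:.2f}]")

# Example of what a dense captioning model might produce for one image:
preds = [
    RegionCaption(box=(34, 50, 120, 80), caption="a brown dog lying on grass", score=0.91),
    RegionCaption(box=(200, 10, 60, 60), caption="a red frisbee in the air", score=0.84),
]
describe(preds)
```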

Dense Captioning with Joint Inference and Visual Context

linjieyangsc/densecap CVPR 2017

The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase.

Joint Event Detection and Description in Continuous Video Streams

VisionLearningGroup/JEDDi-Net 28 Feb 2018

In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.
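
The general idea of a two-level hierarchical captioner is a high-level recurrent controller that carries context across events, feeding a low-level decoder that generates each event's caption. The PyTorch sketch below illustrates that pattern under assumed dimensions and GRU units; it is a simplified stand-in, not JEDDi-Net's exact architecture.

```python
import torch
import torch.nn as nn

class HierarchicalCaptioner(nn.Module):
    """Two-level sketch: a controller GRU tracks context across events in a video,
    and a word-level GRU decodes each event's caption conditioned on that context."""

    def __init__(self, feat_dim=512, hidden_dim=512, vocab_size=10000, embed_dim=300):
        super().__init__()
        self.controller = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.GRU(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        self.vocab_proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, event_feats, captions):
        # event_feats: (batch, num_events, feat_dim) pooled features per event proposal
        # captions:    (batch, num_events, max_len) token ids for teacher forcing
        context, _ = self.controller(event_feats)               # (B, E, H) running context
        B, E, L = captions.shape
        words = self.embed(captions)                             # (B, E, L, embed_dim)
        ctx = context.unsqueeze(2).expand(-1, -1, L, -1)         # repeat context per word step
        dec_in = torch.cat([words, ctx], dim=-1).flatten(0, 1)   # (B*E, L, embed_dim + H)
        out, _ = self.decoder(dec_in)                            # (B*E, L, H)
        return self.vocab_proj(out).view(B, E, L, -1)            # per-word vocabulary logits
```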

Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020

ttengwang/dense-video-captioning-pytorch 21 Jun 2020

This technical report presents a brief description of our submission to the dense video captioning task of ActivityNet Challenge 2020.

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

adymaharana/vlcstorygan 21 Oct 2021

Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance.