Dense Captioning

23 papers with code • 1 benchmark • 1 dataset

Most implemented papers

3D-LLM: Injecting the 3D World into Large Language Models

umass-foundation-model/3d-llm NeurIPS 2023

Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.

Dense-Captioning Events in Videos

sangminwoo/explore-and-match ICCV 2017

We also introduce ActivityNet Captions, a large-scale benchmark for dense-captioning events.
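Dense event captioning must localize events in time as well as describe them. Temporal localization is commonly compared with temporal IoU (tIoU). Below is a minimal sketch under stated assumptions: events are `(start, end)` pairs in seconds, and a predicted caption is only compared with a reference caption when their segments reach a tIoU threshold. The function names, the example segments, and the thresholds are illustrative. This is not the ActivityNet Captions evaluation code.

```python
def temporal_iou(a, b):
    """tIoU of two (start, end) segments, both in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_events(predicted, ground_truth, threshold):
    """Return (pred_idx, gt_idx) pairs whose segments reach the tIoU threshold.
    Only matched pairs would have their captions compared (hypothetical protocol)."""
    return [
        (i, j)
        for i, p in enumerate(predicted)
        for j, g in enumerate(ground_truth)
        if temporal_iou(p, g) >= threshold
    ]


gt = [(0.0, 10.0), (12.0, 20.0)]
pred = [(1.0, 9.0), (15.0, 25.0)]
for t in (0.3, 0.5, 0.7):
    print(t, match_events(pred, gt, t))
# 0.3 [(0, 0), (1, 1)]
# 0.5 [(0, 0)]
# 0.7 [(0, 0)]
```

Reporting results at several thresholds shows how a model's captions hold up as localization requirements get stricter. In this example, the second prediction (tIoU about 0.38) only counts at the loosest threshold.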

A Hierarchical Approach for Generating Descriptive Image Paragraphs

chenxinpeng/im2p CVPR 2017

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

jcjohnson/densecap CVPR 2016

We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language.
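The task definition above pairs every localized region with a natural-language phrase. Below is a minimal sketch of that kind of output. It assumes a hypothetical `RegionCaption` record with pixel-coordinate boxes and a confidence score. It uses plain greedy non-maximum suppression (NMS) to drop near-duplicate regions before their captions are reported. The record layout, the 0.5 threshold, and the helper names are illustrative assumptions. They are not DenseCap's actual code or API.

```python
from dataclasses import dataclass


@dataclass
class RegionCaption:
    """One dense-captioning output (hypothetical schema): a box, a phrase, a confidence."""
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    caption: str
    score: float


def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(regions, iou_threshold=0.5):
    """Greedy NMS: visit regions by descending score and keep one only if it
    does not overlap an already-kept region by iou_threshold or more."""
    kept = []
    for r in sorted(regions, key=lambda r: r.score, reverse=True):
        if all(box_iou(r.box, k.box) < iou_threshold for k in kept):
            kept.append(r)
    return kept


preds = [
    RegionCaption((10, 10, 110, 110), "a brown dog", 0.9),
    RegionCaption((15, 12, 112, 108), "a dog on grass", 0.7),  # near-duplicate of the first box
    RegionCaption((200, 50, 260, 120), "a red frisbee", 0.8),
]
for r in nms(preds):
    print(r.caption)  # prints "a brown dog", then "a red frisbee"
```

Greedy NMS is only one way to remove duplicate regions. It is used here because it is simple and widely understood: the two dog boxes overlap with IoU of roughly 0.9, so only the higher-scoring caption survives.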

Dense Captioning with Joint Inference and Visual Context

linjieyangsc/densecap CVPR 2017

The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase.

Joint Event Detection and Description in Continuous Video Streams

VisionLearningGroup/JEDDi-Net 28 Feb 2018

In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.

Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020

ttengwang/dense-video-captioning-pytorch 21 Jun 2020

This technical report presents a brief description of our submission to the dense video captioning task of ActivityNet Challenge 2020.

Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization

adymaharana/vlcstorygan 21 Oct 2021

Prior work in this domain has shown that there is ample room for improvement in the generated image sequence in terms of visual quality, consistency and relevance.

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

curryyuan/x-trans2cap CVPR 2022

Thus, a more faithful caption can be generated using only point clouds during inference.

MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

SxJyJay/MORE 10 Mar 2022

3D dense captioning is a novel, recently proposed task in which point clouds contain more geometric information than their 2D counterparts.
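3D dense captioning attaches each caption to a 3D bounding box. One way to score it, assumed here in the spirit of IoU-gated caption metrics, is to credit a caption only when its box overlaps the annotated object by at least a threshold `k`. The sketch below assumes axis-aligned boxes given as `(xmin, ymin, zmin, xmax, ymax, zmax)`. `caption_score` stands in for any caption-similarity metric computed elsewhere. All function names are hypothetical.

```python
def volume(c):
    """Volume of an axis-aligned 3D box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (c[3] - c[0]) * (c[4] - c[1]) * (c[5] - c[2])


def box3d_iou(a, b):
    """IoU of two axis-aligned 3D boxes."""
    inter = 1.0
    for k in range(3):
        overlap = min(a[k + 3], b[k + 3]) - max(a[k], b[k])
        if overlap <= 0:
            return 0.0  # no overlap along this axis, so no shared volume
        inter *= overlap
    return inter / (volume(a) + volume(b) - inter)


def gated_caption_score(caption_score, pred_box, gt_box, k=0.5):
    """Credit the caption score only if the predicted box reaches IoU >= k with the target."""
    return caption_score if box3d_iou(pred_box, gt_box) >= k else 0.0


target = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
shifted = (0.5, 0.0, 0.0, 1.5, 1.0, 1.0)  # IoU with target is 0.5 / 1.5 = 1/3
print(gated_caption_score(0.8, shifted, target, k=0.25))  # 0.8
print(gated_caption_score(0.8, shifted, target, k=0.5))   # 0.0
```

Gating on box overlap keeps a fluent caption attached to the wrong object from being rewarded. Raising `k` makes the score depend more heavily on localization accuracy.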