Image Paragraph Captioning
5 papers with code • 1 benchmark • 1 dataset
Image paragraph captioning involves generating a detailed, multi-sentence description of the content of an image.
Latest papers with no code
Enhancing image captioning with depth information using a Transformer-based framework
As a result, we propose a cleaned version of the NYU-v2 dataset that is more consistent and informative.
Bypass Network for Semantics Driven Image Paragraph Captioning
Most existing methods model the coherence through the topic transition that dynamically infers a topic vector from preceding sentences.
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the image (e.g., image tags, object attributes / locations, captions) as a structured textual prompt, called visual clues, using a vision foundation model.
Interactive Key-Value Memory-augmented Attention for Image Paragraph Captioning
In this paper, we propose an Interactive key-value Memory-augmented Attention model for image Paragraph captioning (IMAP) to keep track of the attention history (salient objects coverage information) along with the update-chain of the decoder state and therefore avoid generating repetitive or incomplete image descriptions.
Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning
We propose irredundant attention in SSG-RNN to improve the possibility of abstracting topics from rarely described sub-graphs and inheriting attention in WSG-RNN to generate more grounded sentences with the abstracted topics, both of which give rise to more distinctive paragraphs.
Improving Diversity and Reducing Redundancy in Paragraph Captions
The paragraphs generated from standard image captioning models lack language diversity and contain redundant information.
Dual-CNN: A Convolutional language decoder for paragraph image captioning
The task of paragraph image captioning aims to generate a coherent paragraph describing a given image.
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure.
Look Deeper See Richer: Depth-aware Image Paragraph Captioning
Existing image paragraph captioning methods give a series of sentences to represent the objects and regions of interest, where the descriptions are essentially generated by feeding the image fragments containing objects and regions into conventional image single-sentence captioning models.
Diverse and Coherent Paragraph Generation from Images
Paragraph generation from images, which has gained popularity recently, is an important task for video summarization, editing, and support of the disabled.