Image Paragraph Captioning
5 papers with code • 1 benchmark • 1 dataset
Image paragraph captioning involves generating a detailed, multi-sentence description of the content of an image.
Most implemented papers
A Hierarchical Approach for Generating Descriptive Image Paragraphs
Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.
Training for Diversity in Image Paragraph Captioning
Image paragraph captioning models aim to produce detailed descriptions of a source image.
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning.
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
Inspired by recent successes in integrating semantic topics into this task, this paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework, which couples a visual extractor with a deep topic model to guide the learning of a language model.
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Multimodal language generation, which leverages the synergy of language and vision, is a rapidly expanding field.