Image Paragraph Captioning
5 papers with code • 1 benchmark • 1 dataset
Image paragraph captioning involves generating a detailed, multi-sentence description of the content of an image.
Latest papers
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Multimodal language generation, which leverages the synergy of language and vision, is a rapidly expanding field.
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
Inspired by recent successes in integrating semantic topics into this task, this paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework, which couples a visual extractor with a deep topic model to guide the learning of a language model.
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning.
Training for Diversity in Image Paragraph Captioning
Image paragraph captioning models aim to produce detailed descriptions of a source image.
A Hierarchical Approach for Generating Descriptive Image Paragraphs
Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.