Visual Storytelling

25 papers with code • 1 benchmark • 4 datasets

Latest papers with no code

Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings

no code yet • 3 May 2023

Recent advances in large language models elicit reasoning in a chain-of-thought that allows models to decompose problems in a human-like fashion.

Visual Transformation Telling

no code yet • 3 May 2023

In this paper, we propose a new visual reasoning task, called Visual Transformation Telling (VTT).

A-CAP: Anticipation Captioning with Commonsense Knowledge

no code yet • CVPR 2023

Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time.

Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences

no code yet • 20 Jan 2023

The image sequences are aligned with a total of 12K stories. These were collected via crowdsourcing by presenting each image sequence together with a set of characters grounded in that sequence.

A survey on knowledge-enhanced multimodal learning

no code yet • 19 Nov 2022

Multimodal learning has been a field of increasing interest, aiming to combine various modalities in a single joint representation.

DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention

no code yet • 28 Oct 2022

To strengthen the correlation between vision and language in the disentangled spaces, we introduce visual concepts into DiMBERT; these concepts represent visual information in textual form.

Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks

no code yet • 26 Oct 2022

We present Bloom Library, a linguistically diverse set of multimodal and multilingual datasets for language modeling, image captioning, visual storytelling, and speech synthesis/recognition.

Vision Transformer Based Model for Describing a Set of Images as a Story

no code yet • 6 Oct 2022

Visual storytelling is the process of composing a multi-sentence story from a set of images.

Coherent Visual Storytelling via Parallel Top-Down Visual and Topic Attention

no code yet • IEEE Transactions on Circuits and Systems for Video Technology 2022

In this work, a coherent visual storytelling (CoVS) framework is designed to address the above-mentioned problems.

SentiStory: A Multi-Layered Sentiment-Aware Generative Model for Visual Storytelling

no code yet • IEEE Transactions on Circuits and Systems for Video Technology 2022

The visual storytelling (VIST) task aims to generate reasonable, human-like, and coherent stories from image streams given as input.