Visual Storytelling
25 papers with code • 1 benchmark • 4 datasets
Latest papers without code
Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings
Recent advances in large language models elicit chain-of-thought reasoning, allowing models to decompose problems in a human-like fashion.
Visual Transformation Telling
In this paper, we propose a new visual reasoning task, called Visual Transformation Telling (VTT).
A-CAP: Anticipation Captioning with Commonsense Knowledge
Humans possess the capacity to reason about the future based on a sparse collection of visual cues acquired over time.
Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences
The image sequences are aligned with a total of 12K stories, collected via crowdsourcing by showing annotators the image sequences together with a set of grounded characters from each sequence.
A survey on knowledge-enhanced multimodal learning
Multimodal learning has been a field of increasing interest, aiming to combine various modalities in a single joint representation.
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
To enhance the correlation between vision and language in disentangled spaces, we introduce visual concepts into DiMBERT, which represent visual information in textual form.
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks
We present Bloom Library, a linguistically diverse set of multimodal and multilingual datasets for language modeling, image captioning, visual storytelling, and speech synthesis/recognition.
Vision Transformer Based Model for Describing a Set of Images as a Story
Visual storytelling is the process of forming a multi-sentence story from a set of images.
Coherent Visual Storytelling via Parallel Top-Down Visual and Topic Attention
In this work, a coherent visual storytelling (CoVS) framework is designed to address the above-mentioned problems.
SentiStory: A Multi-Layered Sentiment-Aware Generative Model for Visual Storytelling
The visual storytelling (VIST) task aims at generating reasonable, human-like, and coherent stories from image streams given as input.