TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text-to-Image Generation	MS COCO	StackGAN + VICTR	Inception score	10.38	# 23
Text-to-Image Generation	MS COCO	DM-GAN + VICTR	FID	32.37	# 60
Text-to-Image Generation	MS COCO	DM-GAN + VICTR	Inception score	32.37	# 5
Text-to-Image Generation	MS COCO	AttnGAN + VICTR	FID	29.26	# 58
Text-to-Image Generation	MS COCO	AttnGAN + VICTR	Inception score	28.18	# 11

VICTR: Visual Information Captured Text Representation for Text-to-Image Multimodal Tasks

7 Oct 2020 · Soyeon Caren Han, Siqu Long, Siwen Luo, Kunze Wang, Josiah Poon ·

Text-to-image multimodal tasks, which generate or retrieve an image from a given text description, are extremely challenging because raw text descriptions carry too little information to fully describe visually realistic images. We propose VICTR, a new visual contextual text representation for text-to-image multimodal tasks that captures rich visual semantic information about objects from the text input. First, we take the text description as the initial input and conduct dependency parsing to extract the syntactic structure and analyse the semantic aspects, including object quantities, in order to extract the scene graph. Then, we train the extracted objects, attributes, and relations in the scene graph, together with the corresponding geometric relation information, using Graph Convolutional Networks, which produces a text representation that integrates textual and visual semantic information. This text representation is aggregated with word-level and sentence-level embeddings to generate both visual contextual word and sentence representations. For the evaluation, we attached VICTR to state-of-the-art models in text-to-image generation. VICTR is easily added to existing models and improves them in both quantitative and qualitative aspects.
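
The pipeline in the abstract (scene graph, graph convolution, aggregation with word embeddings) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the toy scene graph is written by hand rather than produced by a dependency parser, the propagation rule is the standard normalized-adjacency graph convolution, and the node features, weight matrix, word embeddings, and the use of concatenation for the aggregation step are all assumptions chosen for illustration.

```python
import math

def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph as nested lists."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loops keep each node's own features
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]

def matmul(x, y):
    """Plain dense matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(a_hat, features, weight):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Hand-written toy scene graph for "a brown dog on grass" (assumption:
# a real system would derive this from dependency parsing).
nodes = ["dog", "brown", "on", "grass"]
edges = [(0, 1), (0, 2), (2, 3)]  # dog-brown (attribute), dog-on-grass (relation)

# Hypothetical 2-d node features and 2x2 weight matrix, fixed for reproducibility.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
weight = [[1.0, -1.0], [0.5, 1.0]]

a_hat = normalized_adjacency(len(nodes), edges)
graph_repr = gcn_layer(a_hat, features, weight)

# Hypothetical word embeddings; concatenation stands in for the aggregation step.
word_emb = {"dog": [0.1, 0.2], "brown": [0.3, 0.4],
            "on": [0.5, 0.6], "grass": [0.7, 0.8]}
visual_word_repr = {w: word_emb[w] + graph_repr[i] for i, w in enumerate(nodes)}
```

Each entry of `visual_word_repr` is a 4-dimensional vector: the first two values are the word embedding and the last two are the graph-derived features for that node. In practice the graph layer's weights would be learned rather than fixed.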

Code

usydnlp/VICTR official

Tasks

Dependency Parsing

Sentence

Text-to-Image Generation

Datasets

MS COCO

Results from the Paper

Ranked #24 on Text-to-Image Generation on MS COCO (Inception score metric)


Methods

Graph Convolutional Networks
