TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Human Judgment Correlation	Flickr8k-CF	MID	Kendall's Tau-b	37.3	# 1
Human Judgment Correlation	Flickr8k-Expert	MID	Kendall's Tau-c	54.9	# 1
Hallucination Pair-wise Detection (4-ref)	FOIL	MID	Mean Accuracy	90.5	# 2
Hallucination Pair-wise Detection (1-ref)	FOIL	MID	Mean Accuracy	90.5	# 2
Human Judgment Classification	Pascal-50S	MID	Mean Accuracy	85.2	# 1


Mutual Information Divergence: A Unified Metric for Multimodal Generative Models

25 May 2022 · Jin-Hwa Kim, Yunji Kim, Jiyoung Lee, Kang Min Yoo, Sang-Woo Lee

Text-to-image generation and image captioning have recently emerged as a new experimental paradigm for assessing machine intelligence. These tasks predict continuous quantities accompanied by sampling techniques during generation, which complicates evaluation and makes the marginal distributions intractable to obtain. Building on a recent trend of evaluating multimodal generative models with a vision-and-language pre-trained model, we propose the negative Gaussian cross-mutual information using CLIP features as a unified metric, coined Mutual Information Divergence (MID). To validate it, we extensively compare MID with competing metrics using carefully generated or human-annotated judgments in text-to-image generation and image captioning tasks. MID significantly outperforms competing methods, showing consistency across benchmarks, sample parsimony, and robustness to the choice of CLIP model. We look forward to future work that explores the underrepresented implications of Gaussian cross-mutual information in multimodal representation learning and builds on this novel proposition.
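The abstract builds MID on a Gaussian mutual information between image and text features. As a rough illustration only (not the authors' implementation in naver-ai/mid.metric, and not the full MID computation), the sketch below estimates the standard closed-form mutual information of jointly Gaussian vectors, I(X;Y) = ½[log det Σ_X + log det Σ_Y − log det Σ_XY], from paired feature samples. The helper names `covariance`, `logdet_spd`, `gaussian_mi`, and the ridge parameter `eps` are hypothetical, and synthetic vectors stand in for CLIP embeddings. How MID combines reference and candidate statistics into a cross-mutual information is not reproduced here.

```python
import math
import random


def covariance(rows):
    """Unbiased sample covariance of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [r[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j]
    return [[v / (n - 1) for v in row] for row in cov]


def logdet_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    d = len(a)
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    # det(A) = prod(diag(L))^2, so log det(A) = 2 * sum(log diag(L))
    return 2.0 * sum(math.log(chol[i][i]) for i in range(d))


def gaussian_mi(xs, ys, eps=1e-6):
    """Closed-form mutual information (in nats) of jointly Gaussian X and Y:
    I(X;Y) = 0.5 * (log det S_x + log det S_y - log det S_xy)."""
    dx = len(xs[0])
    joint = [list(x) + list(y) for x, y in zip(xs, ys)]
    s = covariance(joint)
    for i in range(len(s)):
        s[i][i] += eps  # hypothetical ridge term keeps every block positive-definite
    s_x = [row[:dx] for row in s[:dx]]
    s_y = [row[dx:] for row in s[dx:]]
    return 0.5 * (logdet_spd(s_x) + logdet_spd(s_y) - logdet_spd(s))


if __name__ == "__main__":
    # Synthetic stand-ins for paired image/text embeddings (not real CLIP features).
    rng = random.Random(0)
    rho = 0.8
    xs, ys = [], []
    for _ in range(2000):
        x = rng.gauss(0.0, 1.0)
        xs.append([x])
        ys.append([rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)])
    print("estimated MI:", gaussian_mi(xs, ys))
    print("analytic MI: ", -0.5 * math.log(1 - rho ** 2))
```

For one-dimensional features with correlation ρ, the estimate should approach the analytic value −½ log(1 − ρ²), and it stays non-negative because the joint covariance determinant never exceeds the product of its diagonal-block determinants.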

Code

naver-ai/mid.metric (official)

Tasks

Hallucination Pair-wise Detection (1-ref)

Hallucination Pair-wise Detection (4-ref)

Human Judgment Classification

Human Judgment Correlation

Image Captioning

Image Generation

Representation Learning

Text-to-Image Generation

Datasets

CUB-200-2011

Oxford 102 Flower

COCO Captions

Results from the Paper

Ranked #1 on Human Judgment Classification on Pascal-50S

Methods

CLIP