Human Judgment Correlation

3 papers with code • 2 benchmarks • 1 dataset

A task in which an algorithm generates judgment scores that should correlate with human judgments of the same items.
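As a rough illustration of how such agreement might be measured, the sketch below computes Kendall's tau (the tau-a variant) between a metric's scores and human ratings. The page itself does not specify a correlation coefficient, so the choice of Kendall's tau, the function name `kendall_tau_a`, and the toy scores are all assumptions made for illustration.

```python
from itertools import combinations


def kendall_tau_a(metric_scores, human_scores):
    """Kendall's tau-a between two equal-length score lists.

    Hypothetical helper for illustration: counts item pairs that both
    lists rank in the same order (concordant) versus opposite order
    (discordant). Tied pairs count as neither.
    """
    if len(metric_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two items to compare")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product says whether the pair is ordered
        # the same way by the metric and by the human raters.
        sign = (metric_scores[i] - metric_scores[j]) * (
            human_scores[i] - human_scores[j]
        )
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Toy data (made up): automatic metric scores and human ratings for 4 captions.
metric = [0.81, 0.40, 0.65, 0.20]
human = [4.5, 2.0, 3.0, 2.5]
tau = kendall_tau_a(metric, human)
print(f"Kendall tau-a: {tau:.3f}")
```

A value near 1 means the metric orders items much as humans do, a value near -1 means it orders them in reverse, and a value near 0 means there is little agreement. Tau-a does not correct for ties, so a tie-aware variant may be preferable when human ratings are coarse.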

Most implemented papers

CLIPScore: A Reference-free Evaluation Metric for Image Captioning

jmhessel/clipscore EMNLP 2021

Image captioning has conventionally relied on reference-based automatic evaluations, where machine captions are compared against captions written by humans.

Mutual Information Divergence: A Unified Metric for Multimodal Generative Models

naver-ai/mid.metric 25 May 2022

Based on a recent trend in which multimodal generative evaluations exploit a vision-and-language pre-trained model, we propose the negative Gaussian cross-mutual information using CLIP features as a unified metric, coined Mutual Information Divergence (MID).

FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing

zhuang-li/factual 27 May 2023

Textual scene graph parsing has become increasingly important in various vision-language applications, including image caption evaluation and image retrieval.