TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Dialog	VisDial v1.0 test-std	5xFGA + LS*+	MRR	0.7124	# 1
Visual Dialog	VisDial v1.0 test-std	5xFGA + LS*+	Mean Rank	2.96	# 2
Visual Dialog	VisDial v1.0 test-std	5xFGA + LS*+	R@1	58.28	# 1
Visual Dialog	VisDial v1.0 test-std	5xFGA + LS*+	R@10	94.45	# 1
Visual Dialog	VisDial v1.0 test-std	5xFGA + LS*+	R@5	87.55	# 1
Visual Dialog	VisDial v1.0 test-std	Two-Step	MRR	0.7041	# 2
Visual Dialog	VisDial v1.0 test-std	Two-Step	Mean Rank	3.66	# 1
Visual Dialog	VisDial v1.0 test-std	Two-Step	NDCG	72.16	# 1
Visual Dialog	VisDial v1.0 test-std	Two-Step	R@1	58.18	# 2
Visual Dialog	VisDial v1.0 test-std	Two-Step	R@10	90.83	# 2
Visual Dialog	VisDial v1.0 test-std	Two-Step	R@5	83.85	# 2
Visual Dialog	VisDial v1.0 test-std	5xFGA + LS	NDCG	64.04	# 2
Visual Dialog	Visual Dialog v1.0 test-std	2 Step: Factor Graph Attention + VD-Bert	NDCG (x 100)	72.83	# 16
Visual Dialog	Visual Dialog v1.0 test-std	2 Step: Factor Graph Attention + VD-Bert	MRR (x 100)	69.92	# 4
Visual Dialog	Visual Dialog v1.0 test-std	2 Step: Factor Graph Attention + VD-Bert	R@1	58.3	# 1
Visual Dialog	Visual Dialog v1.0 test-std	2 Step: Factor Graph Attention + VD-Bert	R@5	81.55	# 15
Visual Dialog	Visual Dialog v1.0 test-std	2 Step: Factor Graph Attention + VD-Bert	R@10	89.6	# 29
Visual Dialog	Visual Dialog v1.0 test-std	2 Step: Factor Graph Attention + VD-Bert	Mean	3.84	# 65

Ensemble of MRR and NDCG models for Visual Dialog

NAACL 2021 · Idan Schwartz

Assessing an AI agent that can converse in human language and understand visual content is challenging. Generation metrics, such as BLEU scores, favor correct syntax over semantics. Hence, a discriminative approach is often used, where an agent ranks a set of candidate options. The mean reciprocal rank (MRR) metric evaluates model performance by taking into account the rank of a single human-derived answer. This approach, however, raises a new challenge: the ambiguity and synonymy of answers, for instance, semantic equivalence (e.g., 'yeah' and 'yes'). To address this, the normalized discounted cumulative gain (NDCG) metric has been used to capture the relevance of all the correct answers via dense annotations. However, the NDCG metric favors uncertain answers that are usually applicable, such as 'I don't know'. Crafting a model that excels on both MRR and NDCG metrics is challenging. Ideally, an AI agent should give a human-like reply and validate the correctness of any answer. To address this issue, we describe a two-step non-parametric ranking approach that can merge strong MRR and NDCG models. Using our approach, we manage to keep most of the MRR state-of-the-art performance (70.41% vs. 71.24%) and of the NDCG state-of-the-art performance (72.16% vs. 75.35%). Moreover, our approach won the recent Visual Dialog 2020 challenge. Source code is available at https://github.com/idansc/mrr-ndcg.
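The abstract does not spell out how the two rankings are merged, so the following is only a hypothetical sketch of what a two-step, parameter-free merge could look like: the candidates ranked highest by the MRR model keep their positions, and the remaining candidates are ordered by the NDCG model. The function name `two_step_rank`, the cutoff `k`, and the merge rule itself are assumptions made for illustration and are not taken from the paper or its repository.

```python
def two_step_rank(mrr_scores, ndcg_scores, k=1):
    """Hypothetical merge of two models' candidate scores (assumption, not the paper's code).

    Step 1: keep the top-k candidates according to the MRR-oriented model.
    Step 2: order all remaining candidates by the NDCG-oriented model.
    Returns candidate indices, best first. Nothing is learned or tuned here
    apart from the hand-picked cutoff k.
    """
    n = len(mrr_scores)
    if len(ndcg_scores) != n:
        raise ValueError("both models must score the same candidates")

    # Step 1: candidates sorted by the MRR model, highest score first.
    by_mrr = sorted(range(n), key=lambda i: -mrr_scores[i])
    head = by_mrr[:k]
    kept = set(head)

    # Step 2: everything not kept in step 1, sorted by the NDCG model.
    tail = sorted((i for i in range(n) if i not in kept),
                  key=lambda i: -ndcg_scores[i])
    return head + tail


# Toy scores for four candidate answers (made-up numbers).
mrr_scores = [0.10, 0.70, 0.20, 0.05]
ndcg_scores = [0.30, 0.10, 0.20, 0.40]
merged = two_step_rank(mrr_scores, ndcg_scores, k=1)
```

With `k=1`, the first position is decided entirely by the MRR model, which is the position that matters most for R@1 and MRR, while the order of the rest follows the NDCG model.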

Code

idansc/mrr-ndcg official

Tasks

Visual Dialog

Datasets

VisDial

Results from the Paper

Ranked #1 on Visual Dialog on VisDial v1.0 test-std

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Visual Dialog	VisDial v1.0 test-std	5xFGA + LS*+	MRR	0.7124	# 1
			Mean Rank	2.96	# 2
			R@1	58.28	# 1
			R@10	94.45	# 1
			R@5	87.55	# 1
Visual Dialog	VisDial v1.0 test-std	Two-Step	MRR	0.7041	# 2
			Mean Rank	3.66	# 1
			NDCG	72.16	# 1
			R@1	58.18	# 2
			R@10	90.83	# 2
			R@5	83.85	# 2
Visual Dialog	VisDial v1.0 test-std	5xFGA + LS	NDCG	64.04	# 2
Visual Dialog	Visual Dialog v1.0 test-std	2 Step: Factor Graph Attention + VD-Bert	NDCG (x 100)	72.83	# 16
			MRR (x 100)	69.92	# 4
			R@1	58.3	# 1
			R@5	81.55	# 15
			R@10	89.6	# 29
			Mean	3.84	# 65
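
To make the metric names in the table concrete, here is a minimal sketch of how MRR, mean rank, R@k, and NDCG are commonly computed for a ranked list of candidate answers. The helper names and toy scores are illustrative assumptions, and the NDCG cutoff (the number of candidates with non-zero relevance) follows what is, to my understanding, the usual VisDial convention; the official evaluation code may differ in details.

```python
import math


def rank_of(scores, index):
    """1-based rank of candidate `index` when sorting by descending score."""
    return 1 + sum(1 for s in scores if s > scores[index])


def sparse_metrics(all_scores, gt_indices):
    """MRR, mean rank, and R@k from the rank of the single ground-truth answer."""
    ranks = [rank_of(s, g) for s, g in zip(all_scores, gt_indices)]
    n = len(ranks)
    metrics = {
        "mrr": sum(1.0 / r for r in ranks) / n,
        "mean_rank": sum(ranks) / n,
    }
    for k in (1, 5, 10):
        # Recall@k as a percentage: share of questions whose answer is in the top k.
        metrics[f"r@{k}"] = 100.0 * sum(r <= k for r in ranks) / n
    return metrics


def ndcg(scores, relevance):
    """NDCG for one question, given dense relevance annotations in [0, 1].

    Assumption: the cutoff k is the number of candidates with non-zero relevance.
    """
    k = sum(1 for r in relevance if r > 0)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = sorted(relevance, reverse=True)
    dcg = sum(relevance[i] / math.log2(pos + 2) for pos, i in enumerate(order[:k]))
    idcg = sum(r / math.log2(pos + 2) for pos, r in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0


# Two toy questions with three candidates each (made-up scores).
toy_scores = [[0.1, 0.9, 0.5], [0.8, 0.2, 0.6]]
toy_gt = [1, 2]
toy_metrics = sparse_metrics(toy_scores, toy_gt)
```

MRR, mean rank, and R@k depend only on where the single human answer lands, whereas NDCG rewards every candidate annotated as relevant, which is why a model can score well on one family of metrics and poorly on the other.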
