TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Retrieval	DiDeMo	PAU	text-to-video R@1	48.6	# 26
Video Retrieval	DiDeMo	PAU	text-to-video R@5	76.0	# 25
Video Retrieval	DiDeMo	PAU	text-to-video R@10	84.5	# 22
Video Retrieval	DiDeMo	PAU	text-to-video Median Rank	2.0	# 9
Video Retrieval	DiDeMo	PAU	text-to-video Mean Rank	12.9	# 7
Video Retrieval	DiDeMo	PAU	video-to-text R@1	48.1	# 10
Video Retrieval	DiDeMo	PAU	video-to-text R@5	74.2	# 8
Video Retrieval	DiDeMo	PAU	video-to-text R@10	85.7	# 6
Video Retrieval	DiDeMo	PAU	video-to-text Median Rank	2.0	# 5
Video Retrieval	DiDeMo	PAU	video-to-text Mean Rank	9.8	# 6
Video Retrieval	MSR-VTT-1kA	PAU	text-to-video R@1	48.5	# 24
Video Retrieval	MSR-VTT-1kA	PAU	text-to-video R@5	72.7	# 28
Video Retrieval	MSR-VTT-1kA	PAU	text-to-video R@10	82.5	# 27
Video Retrieval	MSR-VTT-1kA	PAU	text-to-video Median Rank	2.0	# 10
Video Retrieval	MSR-VTT-1kA	PAU	text-to-video Mean Rank	14.0	# 16
Video Retrieval	MSR-VTT-1kA	PAU	video-to-text R@1	48.3	# 12
Video Retrieval	MSR-VTT-1kA	PAU	video-to-text R@5	73.0	# 18
Video Retrieval	MSR-VTT-1kA	PAU	video-to-text R@10	83.2	# 18
Video Retrieval	MSR-VTT-1kA	PAU	video-to-text Median Rank	2.0	# 7
Video Retrieval	MSR-VTT-1kA	PAU	video-to-text Mean Rank	9.7	# 15
Video Retrieval	MSVD	PAU	text-to-video R@1	47.3	# 16
Video Retrieval	MSVD	PAU	text-to-video R@5	77.4	# 13
Video Retrieval	MSVD	PAU	text-to-video R@10	85.5	# 13
Video Retrieval	MSVD	PAU	text-to-video Median Rank	2.0	# 8
Video Retrieval	MSVD	PAU	text-to-video Mean Rank	9.6	# 10
Video Retrieval	MSVD	PAU	video-to-text R@1	68.9	# 8
Video Retrieval	MSVD	PAU	video-to-text R@5	93.1	# 4
Video Retrieval	MSVD	PAU	video-to-text R@10	97.1	# 1
Video Retrieval	MSVD	PAU	video-to-text Median Rank	1.0	# 1
Video Retrieval	MSVD	PAU	video-to-text Mean Rank	2.4	# 1
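
The metrics in the table above (R@K, Median Rank, Mean Rank) are standard retrieval measures. As a rough illustration of how they are usually computed, the sketch below assumes a square similarity matrix in which the correct candidate for query `i` is candidate `i`. The function name `retrieval_metrics` and the toy scores are hypothetical, and this is not the evaluation code of the PAU repository; datasets with several captions per video, such as MSVD, need a different ground-truth mapping.

```python
from statistics import mean, median


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute R@K, median rank and mean rank.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: the ground-truth match for query i is candidate i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the true match: 1 + number of candidates scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > row[i]))

    # R@K: percentage of queries whose true match is ranked in the top K.
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks}
    metrics["Median Rank"] = float(median(ranks))
    metrics["Mean Rank"] = float(mean(ranks))
    return metrics


if __name__ == "__main__":
    # Toy text-to-video scores: 3 text queries (rows) against 3 videos (columns).
    toy_sim = [
        [0.9, 0.1, 0.2],  # true match ranked 1st
        [0.8, 0.3, 0.5],  # true match ranked 3rd
        [0.1, 0.2, 0.7],  # true match ranked 1st
    ]
    print(retrieval_metrics(toy_sim, ks=(1, 2)))
```

Higher is better for R@K, and lower is better for Median Rank and Mean Rank. Transposing the matrix gives the opposite retrieval direction (video-to-text) under the same one-to-one assumption.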

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

NeurIPS 2023 · Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen

Cross-modal retrieval methods build similarity relations between the vision and language modalities by jointly learning a common representation space. However, their predictions are often unreliable because of aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework that provides trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity. Concretely, we first construct a set of diverse learnable prototypes for each modality to represent the entire semantic subspace. Then, Dempster-Shafer Theory and Subjective Logic Theory are used to build an evidential theoretical framework by associating evidence with Dirichlet distribution parameters. The PAU model yields accurate uncertainty estimates and reliable predictions for cross-modal retrieval. Extensive experiments on four major benchmark datasets (MSR-VTT, MSVD, DiDeMo, and MS-COCO) demonstrate the effectiveness of our method. The code is available at https://github.com/leolee99/PAU.
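
To make the step of "associating evidence with Dirichlet distribution parameters" concrete, the sketch below applies the generic Subjective Logic mapping commonly used in evidential learning: non-negative evidence `e_k` gives Dirichlet parameters `alpha_k = e_k + 1`, belief masses `b_k = e_k / S`, and an uncertainty mass `u = K / S`, where `S` is the sum of the `alpha_k`. This is an assumption about the general technique, not the PAU implementation. How PAU derives evidence from its prototypes is not described in the abstract, so evidence is simply an input here, and the function name `subjective_opinion` is hypothetical.

```python
def subjective_opinion(evidence):
    """Turn non-negative evidence for K classes into belief and uncertainty masses.

    Generic Subjective Logic mapping (an assumption, not PAU's exact code):
        alpha_k = e_k + 1,  S = sum(alpha),  b_k = e_k / S,  u = K / S
    so that sum(b) + u == 1.
    """
    if not evidence:
        raise ValueError("evidence must contain at least one value")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")

    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]        # Dirichlet concentration parameters
    strength = sum(alpha)                      # Dirichlet strength S
    belief = [e / strength for e in evidence]  # belief mass per class
    uncertainty = num_classes / strength       # mass left unassigned to any class
    return belief, uncertainty


if __name__ == "__main__":
    # Strong evidence for one class leaves little uncertainty mass.
    print(subjective_opinion([8.0, 1.0, 1.0]))
    # No evidence at all: every bit of mass is uncertainty (u = 1).
    print(subjective_opinion([0.0, 0.0, 0.0]))
```

Under this mapping the uncertainty mass shrinks as total evidence grows, which is the sense in which ambiguous inputs that yield little evidence are flagged as uncertain.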

Code

leolee99/pau (official)

Tasks

Cross-Modal Retrieval

Image-text matching

Image-to-Text Retrieval

Retrieval

Text Retrieval

Text to Video Retrieval

Uncertainty Quantification

Video Retrieval

Video-Text Retrieval

Video to Text Retrieval

Datasets

MS COCO

MSR-VTT

MSVD

DiDeMo

Results from the Paper

Ranked #16 on Video Retrieval on MSVD

Methods

PAU
