TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Retrieval	MSR-VTT	LAFF	text-to-video R@1	29.1	# 26
Video Retrieval	MSR-VTT	LAFF	text-to-video R@5	54.9	# 24
Video Retrieval	MSR-VTT	LAFF	text-to-video R@10	65.8	# 23
Video Retrieval	MSR-VTT-1kA	LAFF	text-to-video R@1	45.8	# 32
Video Retrieval	MSR-VTT-1kA	LAFF	text-to-video R@5	71.5	# 31
Video Retrieval	MSR-VTT-1kA	LAFF	text-to-video R@10	82.0	# 30
Video Retrieval	MSVD	LAFF	text-to-video R@1	45.4	# 20
Video Retrieval	MSVD	LAFF	text-to-video R@5	76.0	# 17
Video Retrieval	MSVD	LAFF	text-to-video R@10	84.6	# 15
Video Retrieval	TGIF	LAFF	text-to-video R@1	24.5	# 2
Video Retrieval	TGIF	LAFF	text-to-video R@5	45.0	# 2
Video Retrieval	TGIF	LAFF	text-to-video R@10	54.5	# 2
Ad-hoc video search	TRECVID-AVS16 (IACC.3)	LAFF	infAP	0.222	# 1
Ad-hoc video search	TRECVID-AVS17 (IACC.3)	LAFF	infAP	0.290	# 1
Ad-hoc video search	TRECVID-AVS18 (IACC.3)	LAFF	infAP	0.147	# 1
Ad-hoc video search	TRECVID-AVS19 (V3C1)	LAFF	infAP	0.192	# 1
Ad-hoc video search	TRECVID-AVS20 (V3C1)	LAFF	infAP	0.265	# 1
Video Retrieval	VATEX	LAFF	text-to-video R@1	59.1	# 8
Video Retrieval	VATEX	LAFF	text-to-video R@50	96.3	# 1
Video Retrieval	VATEX	LAFF	text-to-video R@10	91.7	# 8
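
The R@K columns above report recall at rank K: the percentage of text queries whose ground-truth video appears among the top K retrieved results. The snippet below is a minimal sketch of that computation, assuming one ground-truth video per query and a precomputed query-by-video similarity matrix. The names `rank_of_target`, `recall_at_k`, and the toy matrix `sims` are illustrative and not taken from the paper or its code. infAP, the metric used for the TRECVID AVS rows, is not covered here.

```python
def rank_of_target(scores, target_index):
    """Return the 1-based rank of the target video for one query.

    scores: similarity of the query to every candidate video.
    Ties are resolved in favour of the target (an assumption of this sketch).
    """
    target = scores[target_index]
    return 1 + sum(1 for s in scores if s > target)


def recall_at_k(ranks, k):
    """Percentage of queries whose ground-truth video is ranked within the top k."""
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


# Toy example: 3 text queries, 3 videos; query i matches video i.
sims = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
ranks = [rank_of_target(row, i) for i, row in enumerate(sims)]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(ranks, k):.1f}")
```

In this toy case the three queries rank their targets first, second, and third, so recall rises by one third at each K.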

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/ad-hoc-video-search-on-trecvid-avs16-iacc-3)](https://paperswithcode.com/sota/ad-hoc-video-search-on-trecvid-avs16-iacc-3?p=lightweight-attentional-feature-fusion-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/ad-hoc-video-search-on-trecvid-avs17-iacc-3)](https://paperswithcode.com/sota/ad-hoc-video-search-on-trecvid-avs17-iacc-3?p=lightweight-attentional-feature-fusion-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/ad-hoc-video-search-on-trecvid-avs18-iacc-3)](https://paperswithcode.com/sota/ad-hoc-video-search-on-trecvid-avs18-iacc-3?p=lightweight-attentional-feature-fusion-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/ad-hoc-video-search-on-trecvid-avs19-v3c1)](https://paperswithcode.com/sota/ad-hoc-video-search-on-trecvid-avs19-v3c1?p=lightweight-attentional-feature-fusion-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/ad-hoc-video-search-on-trecvid-avs20-v3c1)](https://paperswithcode.com/sota/ad-hoc-video-search-on-trecvid-avs20-v3c1?p=lightweight-attentional-feature-fusion-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/video-retrieval-on-tgif)](https://paperswithcode.com/sota/video-retrieval-on-tgif?p=lightweight-attentional-feature-fusion-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/video-retrieval-on-vatex)](https://paperswithcode.com/sota/video-retrieval-on-vatex?p=lightweight-attentional-feature-fusion-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/video-retrieval-on-msvd)](https://paperswithcode.com/sota/video-retrieval-on-msvd?p=lightweight-attentional-feature-fusion-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/video-retrieval-on-msr-vtt)](https://paperswithcode.com/sota/video-retrieval-on-msr-vtt?p=lightweight-attentional-feature-fusion-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/lightweight-attentional-feature-fusion-for/video-retrieval-on-msr-vtt-1ka)](https://paperswithcode.com/sota/video-retrieval-on-msr-vtt-1ka?p=lightweight-attentional-feature-fusion-for)`

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

3 Dec 2021 · Fan Hu, Aozhu Chen, Ziyue Wang, Fangming Zhou, Jianfeng Dong, Xirong Li

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval. Unlike previous research, which considers feature fusion at only one end, be it video or text, we aim for feature fusion at both ends within a unified framework. We hypothesize that optimizing a convex combination of the features is preferable to modeling their correlations with computationally heavy multi-head self-attention. We propose Lightweight Attentional Feature Fusion (LAFF). LAFF performs feature fusion at both early and late stages and at both the video and text ends, making it a powerful method for exploiting diverse (off-the-shelf) features. The interpretability of LAFF can be used for feature selection. Extensive experiments on five public benchmark sets (MSR-VTT, MSVD, TGIF, VATEX and TRECVID AVS 2016-2020) justify LAFF as a new baseline for text-to-video retrieval.
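
To make the "convex combination" idea in the abstract concrete, the following is a minimal sketch of attention-weighted feature fusion in plain Python. It is an illustration under stated assumptions, not the authors' implementation: each feature is assumed to be projected to a common dimension by a linear map followed by tanh, scored by a single shared vector, and the scores are passed through a softmax so the fusion weights are non-negative and sum to one. The names `project`, `fuse`, `projections`, and `attn_vector` are hypothetical, and the random parameters stand in for learned ones.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax; the outputs are non-negative and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(feature, weight):
    """Linear map followed by tanh. weight is a list of rows, shape (d_out, d_in)."""
    return [math.tanh(sum(w * x for w, x in zip(row, feature))) for row in weight]


def fuse(features, projections, attn_vector):
    """Fuse features of different sizes into one vector by a convex combination.

    features:    list of feature vectors, possibly of different lengths
    projections: one (d_out, d_in) matrix per feature (assumed, stands in for learned weights)
    attn_vector: shared scoring vector of length d_out (assumed)
    """
    projected = [project(f, w) for f, w in zip(features, projections)]
    scores = [sum(a * p for a, p in zip(attn_vector, vec)) for vec in projected]
    weights = softmax(scores)
    d = len(attn_vector)
    fused = [sum(wt * vec[j] for wt, vec in zip(weights, projected)) for j in range(d)]
    return fused, weights


# Toy usage: three features with different dimensionalities, fused into 4 dimensions.
rng = random.Random(0)
dims, d_out = [6, 3, 5], 4
features = [[rng.uniform(-1, 1) for _ in range(n)] for n in dims]
projections = [[[rng.uniform(-1, 1) for _ in range(n)] for _ in range(d_out)] for n in dims]
attn_vector = [rng.uniform(-1, 1) for _ in range(d_out)]

fused, weights = fuse(features, projections, attn_vector)
print("fusion weights:", [round(w, 3) for w in weights])
```

Because the weights form a probability distribution over the input features, they can be read directly as per-feature importance, which is the property the abstract connects to feature selection. The cost is one score per feature, rather than the pairwise interactions that self-attention computes.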

Code

ruc-aimc-lab/laff official

Tasks

Ad-hoc video search

feature selection

Retrieval

Text to Video Retrieval

Video Retrieval

Datasets

MSR-VTT

MSVD

VATEX

TGIF

TRECVID

TRECVID-AVS16 (IACC.3)

TRECVID-AVS17 (IACC.3)

TRECVID-AVS18 (IACC.3)

TRECVID-AVS19 (V3C1)

TRECVID-AVS20 (V3C1)

Results from the Paper

Ranked #1 on Ad-hoc video search on TRECVID-AVS20 (V3C1) (using extra training data)
