| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
|---|---|---|---|---|---|
| Zero-Shot Video Retrieval | LSMDC | HowToCaption | text-to-video R@1 | 17.3 | #9 |
| Zero-Shot Video Retrieval | LSMDC | HowToCaption | text-to-video R@5 | 31.7 | #10 |
| Zero-Shot Video Retrieval | LSMDC | HowToCaption | text-to-video R@10 | 38.6 | #11 |
| Zero-Shot Video Retrieval | LSMDC | HowToCaption | text-to-video Median Rank | 29 | #4 |
| Zero-Shot Video Retrieval | LSMDC | VAST, HowToCaption-finetuned | text-to-video R@1 | 27.7 | #3 |
| Zero-Shot Video Retrieval | LSMDC | VAST, HowToCaption-finetuned | text-to-video R@5 | 46.5 | #3 |
| Zero-Shot Video Retrieval | LSMDC | VAST, HowToCaption-finetuned | text-to-video R@10 | 54.6 | #3 |
| Zero-Shot Video Retrieval | LSMDC | VAST, HowToCaption-finetuned | text-to-video Median Rank | 7 | #1 |
| Zero-Shot Video-Audio Retrieval | MSR-VTT | HowToCaption | text-to-video+audio R@1 | 13.2 | #1 |
| Zero-Shot Video-Audio Retrieval | MSR-VTT | HowToCaption | text-to-video+audio R@5 | 30.3 | #1 |
| Zero-Shot Video-Audio Retrieval | MSR-VTT | HowToCaption | text-to-video+audio R@10 | 41.5 | #1 |
| Zero-Shot Video-Audio Retrieval | MSR-VTT | HowToCaption | text-to-video+audio Median Rank | 17 | #1 |
| Zero-Shot Video Retrieval | MSR-VTT | HowToCaption | text-to-video R@1 | 37.6 | #15 |
| Zero-Shot Video Retrieval | MSR-VTT | HowToCaption | text-to-video R@5 | 62 | #14 |
| Zero-Shot Video Retrieval | MSR-VTT | HowToCaption | text-to-video R@10 | 73.3 | #12 |
| Zero-Shot Video Retrieval | MSR-VTT | HowToCaption | text-to-video Median Rank | 3 | #4 |
| Zero-Shot Video Retrieval | MSR-VTT | VAST, HowToCaption-finetuned | text-to-video R@1 | 50 | #4 |
| Zero-Shot Video Retrieval | MSR-VTT | VAST, HowToCaption-finetuned | text-to-video R@5 | 73.2 | #3 |
| Zero-Shot Video Retrieval | MSR-VTT | VAST, HowToCaption-finetuned | text-to-video R@10 | 81.4 | #4 |
| Zero-Shot Video Retrieval | MSR-VTT | VAST, HowToCaption-finetuned | text-to-video Median Rank | 1 | #1 |
| Video Captioning | MSR-VTT | HowToCaption | CIDEr | 65.3 | #10 |
| Video Captioning | MSR-VTT | HowToCaption | METEOR | 32.2 | #6 |
| Video Captioning | MSR-VTT | HowToCaption | ROUGE-L | 66.3 | #6 |
| Video Captioning | MSR-VTT | HowToCaption | BLEU-4 | 49.8 | #8 |
| Video Captioning | MSVD | HowToCaption | CIDEr | 154.2 | #6 |
| Video Captioning | MSVD | HowToCaption | BLEU-4 | 70.4 | #6 |
| Video Captioning | MSVD | HowToCaption | METEOR | 46.4 | #4 |
| Video Captioning | MSVD | HowToCaption | ROUGE-L | 83.2 | #4 |
| Zero-Shot Video Retrieval | MSVD | HowToCaption | text-to-video R@1 | 44.5 | #8 |
| Zero-Shot Video Retrieval | MSVD | HowToCaption | text-to-video R@5 | 73.3 | #10 |
| Zero-Shot Video Retrieval | MSVD | HowToCaption | text-to-video R@10 | 82.1 | #10 |
| Zero-Shot Video Retrieval | MSVD | HowToCaption | text-to-video Median Rank | 2 | #4 |
| Zero-Shot Video Retrieval | MSVD | VAST, HowToCaption-finetuned | text-to-video R@1 | 54.8 | #3 |
| Zero-Shot Video Retrieval | MSVD | VAST, HowToCaption-finetuned | text-to-video R@5 | 80.9 | #4 |
| Zero-Shot Video Retrieval | MSVD | VAST, HowToCaption-finetuned | text-to-video R@10 | 87.2 | #5 |
| Zero-Shot Video Retrieval | MSVD | VAST, HowToCaption-finetuned | text-to-video Median Rank | 1 | #1 |
| Video Captioning | YouCook2 | HowToCaption | BLEU-4 | 8.8 | #10 |
| Video Captioning | YouCook2 | HowToCaption | METEOR | 15.9 | #7 |
| Video Captioning | YouCook2 | HowToCaption | ROUGE-L | 37.3 | #8 |
| Video Captioning | YouCook2 | HowToCaption | CIDEr | 116.4 | #1 |
| Zero-Shot Video Retrieval | YouCook2 | HowToCaption | text-to-video R@1 | 13.4 | #8 |
| Zero-Shot Video Retrieval | YouCook2 | HowToCaption | text-to-video R@5 | 33.1 | #8 |
| Zero-Shot Video Retrieval | YouCook2 | HowToCaption | text-to-video R@10 | 44.1 | #9 |
| Zero-Shot Video Retrieval | YouCook2 | HowToCaption | text-to-video Median Rank | 15 | #2 |
| Zero-Shot Video Retrieval | YouCook2 | VAST, HowToCaption-finetuned | text-to-video R@1 | 19.7 | #6 |
| Zero-Shot Video Retrieval | YouCook2 | VAST, HowToCaption-finetuned | text-to-video R@5 | 43.6 | #4 |
| Zero-Shot Video Retrieval | YouCook2 | VAST, HowToCaption-finetuned | text-to-video R@10 | 53.9 | #5 |
| Zero-Shot Video Retrieval | YouCook2 | VAST, HowToCaption-finetuned | text-to-video Median Rank | 8 | #1 |
| Zero-Shot Video-Audio Retrieval | YouCook2 | HowToCaption | text-to-video+audio R@1 | 25.5 | #1 |
| Zero-Shot Video-Audio Retrieval | YouCook2 | HowToCaption | text-to-video+audio R@5 | 51.1 | #1 |
| Zero-Shot Video-Audio Retrieval | YouCook2 | HowToCaption | text-to-video+audio R@10 | 63.6 | #1 |
| Zero-Shot Video-Audio Retrieval | YouCook2 | HowToCaption | text-to-video+audio Median Rank | 5 | #1 |