- [Text-to-audiovisual retrieval on VALOR-32K](https://paperswithcode.com/sota/text-to-audiovisual-retrieval-on-valor-32k?p=valor-vision-audio-language-omni-perception)
- [Video captioning on VATEX](https://paperswithcode.com/sota/video-captioning-on-vatex-1?p=valor-vision-audio-language-omni-perception)
- [Video retrieval on ActivityNet](https://paperswithcode.com/sota/video-retrieval-on-activitynet?p=valor-vision-audio-language-omni-perception)
- [Audio captioning on AudioCaps](https://paperswithcode.com/sota/audio-captioning-on-audiocaps?p=valor-vision-audio-language-omni-perception)
- [Audio captioning on Clotho](https://paperswithcode.com/sota/audio-captioning-on-clotho?p=valor-vision-audio-language-omni-perception)
- [Video retrieval on MSR-VTT](https://paperswithcode.com/sota/video-retrieval-on-msr-vtt?p=valor-vision-audio-language-omni-perception)
- [Video question answering on MSRVTT-QA](https://paperswithcode.com/sota/video-question-answering-on-msrvtt-qa?p=valor-vision-audio-language-omni-perception)
- [Audio-visual question answering on MUSIC-AVQA](https://paperswithcode.com/sota/audio-visual-question-answering-on-music-avqa?p=valor-vision-audio-language-omni-perception)
- [Audio-visual captioning on VALOR-32K](https://paperswithcode.com/sota/audio-visual-captioning-on-valor-32k?p=valor-vision-audio-language-omni-perception)
- [Video retrieval on VATEX](https://paperswithcode.com/sota/video-retrieval-on-vatex?p=valor-vision-audio-language-omni-perception)
- [Video question answering on ActivityNet-QA](https://paperswithcode.com/sota/video-question-answering-on-activitynet-qa?p=valor-vision-audio-language-omni-perception)
- [Text-to-audio retrieval on AudioCaps](https://paperswithcode.com/sota/text-to-audio-retrieval-on-audiocaps?p=valor-vision-audio-language-omni-perception)
- [Text-to-audio retrieval on Clotho](https://paperswithcode.com/sota/text-to-audio-retrieval-on-clotho?p=valor-vision-audio-language-omni-perception)
- [Image captioning on COCO Captions](https://paperswithcode.com/sota/image-captioning-on-coco-captions?p=valor-vision-audio-language-omni-perception)
- [Video retrieval on DiDeMo](https://paperswithcode.com/sota/video-retrieval-on-didemo?p=valor-vision-audio-language-omni-perception)
- [Video captioning on MSVD](https://paperswithcode.com/sota/video-captioning-on-msvd-1?p=valor-vision-audio-language-omni-perception)
- [Visual question answering on MSVD-QA](https://paperswithcode.com/sota/visual-question-answering-on-msvd-qa-1?p=valor-vision-audio-language-omni-perception)
- [TGIF-Frame on TGIF-QA](https://paperswithcode.com/sota/tgif-frame-on-tgif-qa?p=valor-vision-audio-language-omni-perception)
- [Video retrieval on LSMDC](https://paperswithcode.com/sota/video-retrieval-on-lsmdc?p=valor-vision-audio-language-omni-perception)
- [Video captioning on MSR-VTT](https://paperswithcode.com/sota/video-captioning-on-msr-vtt-1?p=valor-vision-audio-language-omni-perception)
- [Visual question answering on VQA v2 test-std](https://paperswithcode.com/sota/visual-question-answering-on-vqa-v2-test-std?p=valor-vision-audio-language-omni-perception)
- [Cross-modal retrieval on COCO 2014](https://paperswithcode.com/sota/cross-modal-retrieval-on-coco-2014?p=valor-vision-audio-language-omni-perception)
- [Visual question answering on VQA v2 test-dev](https://paperswithcode.com/sota/visual-question-answering-on-vqa-v2-test-dev?p=valor-vision-audio-language-omni-perception)