1 code implementation • 15 Jun 2022 • Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
Audio-visual automatic speech recognition (AV-ASR) is an extension of ASR that incorporates visual cues, often from the movements of a speaker's mouth.
Automatic Speech Recognition (ASR)
no code implementations • 1 Nov 2021 • Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, Cordelia Schmid
We propose to pre-train a video encoder using all available video modalities as supervision, namely appearance, sound, and transcribed speech.
1 code implementation • 3 Aug 2020 • Samuel Albanie, Yang Liu, Arsha Nagrani, Antoine Miech, Ernesto Coto, Ivan Laptev, Rahul Sukthankar, Bernard Ghanem, Andrew Zisserman, Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid, Shi-Zhe Chen, Yida Zhao, Qin Jin, Kaixu Cui, Hui Liu, Chen Wang, Yudong Jiang, Xiaoshuai Hao
This report summarizes the results of the first edition of the challenge together with the findings of the participants.
1 code implementation • ECCV 2020 • Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid
In this paper, we present a multi-modal transformer to jointly encode the different modalities in video, which allows each of them to attend to the others.
Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT (text-to-video Mean Rank metric, using extra training data)
no code implementations • ICCV 2019 • Valentin Gabeur, Jean-Sebastien Franco, Xavier Martin, Cordelia Schmid, Gregory Rogez
In this paper, we tackle the problem of 3D human shape estimation from single RGB images.