no code implementations • 28 May 2023 • Lama Alssum, Juan Leon Alcazar, Merey Ramazanova, Chen Zhao, Bernard Ghanem
Studying continual learning in the video domain poses additional challenges: video data contains a large number of frames, which places a higher burden on the replay memory.
1 code implementation • 27 Mar 2022 • Juan Leon Alcazar, Moritz Cordes, Chen Zhao, Bernard Ghanem
Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation.
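The two-stage design mentioned above can be sketched in a minimal, illustrative form: per-frame feature extraction followed by context aggregation across time. This is not the paper's actual model; a random linear projection stands in for a learned backbone, and a single softmax self-attention step stands in for the spatio-temporal aggregation stage. All function and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(frames, proj):
    # Stage 1: per-frame feature extraction. A fixed linear projection
    # stands in for a learned (e.g. CNN) backbone in this sketch.
    return frames @ proj

def aggregate_context(feats):
    # Stage 2: context aggregation over the time axis, illustrated with
    # one scaled-dot-product self-attention step.
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # Each frame's output is a weighted mix of all frames' features.
    return weights @ feats

frames = rng.standard_normal((8, 64))   # 8 frames, 64-dim raw input
proj = rng.standard_normal((64, 16))    # hypothetical backbone weights
feats = extract_features(frames, proj)  # shape (8, 16)
context = aggregate_context(feats)      # shape (8, 16)
```

In a real ASD system the second stage would also fuse audio features and context from multiple candidate speakers; this sketch only shows the temporal-aggregation skeleton.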
Ranked #4 on Audio-Visual Active Speaker Detection on AVA-ActiveSpeaker (using extra training data)
1 code implementation • 3 Jun 2021 • Juan Leon Alcazar, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem, Fabian Caba Heilbron
To showcase the potential of our new dataset, we propose an audiovisual baseline and benchmark for person retrieval.
1 code implementation • CVPR 2020 • Juan Leon Alcazar, Fabian Caba Heilbron, Long Mai, Federico Perazzi, Joon-Young Lee, Pablo Arbelaez, Bernard Ghanem
Current methods for active speaker detection focus on modeling short-term audiovisual information from a single speaker.
no code implementations • 11 Apr 2019 • Juan Leon Alcazar, Maria A. Bravo, Ali K. Thabet, Guillaume Jeanneret, Thomas Brox, Pablo Arbelaez, Bernard Ghanem
Instance-level video segmentation requires a solid integration of spatial and temporal information.