no code implementations • Interspeech 2023 • Baptiste Pouthier, Laurent Pilati, Giacomo Valenti, Charles Bouveyron, Frederic Precioso
Standard Visual Speech Recognition (VSR) systems directly process images as input features without any a priori link between raw pixel data and facial traits.
Ranked #3 on Landmark-based Lipreading on LRW
no code implementations • 7 Jun 2021 • Baptiste Pouthier, Laurent Pilati, Leela K. Gudupudi, Charles Bouveyron, Frederic Precioso
It is now well established from a variety of studies that there is a significant benefit from combining video and audio data in detecting active speakers.
Active Speaker Detection • Audio-Visual Active Speaker Detection