Lip Reading
45 papers with code • 3 benchmarks • 5 datasets
Lip Reading is the task of inferring speech content from a video using only visual information, especially the lip movements. It has many important practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.
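The task setup can be sketched end to end: a clip of mouth-region frames goes through a per-frame visual front-end, a temporal aggregator, and a word classifier. This is a toy illustration only; the shapes, the random linear "front-end", and the mean-pooling "back-end" are assumptions standing in for the CNN and recurrent/temporal modules the papers below actually use.

```python
import numpy as np

# Toy sketch of the lip-reading pipeline: predict a word from a sequence of
# mouth-region frames, using only visual information (no audio).
rng = np.random.default_rng(0)

T, H, W = 29, 88, 88        # frames per clip, mouth-crop height/width (assumed)
num_words = 500             # assumed word-level vocabulary size

clip = rng.random((T, H, W))  # grayscale mouth crops over time

# 1) Per-frame visual features (random projection as a stand-in for a CNN).
feat_dim = 256
frontend = rng.standard_normal((H * W, feat_dim)) / np.sqrt(H * W)
frame_feats = clip.reshape(T, -1) @ frontend      # (T, feat_dim)

# 2) Temporal aggregation (stand-in for an RNN/TCN back-end).
clip_feat = frame_feats.mean(axis=0)              # (feat_dim,)

# 3) Word classification head.
classifier = rng.standard_normal((feat_dim, num_words)) / np.sqrt(feat_dim)
logits = clip_feat @ classifier
predicted_word = int(np.argmax(logits))
```

A real system replaces steps 1 and 2 with learned networks trained on labeled video, but the data flow — frames in, a word (or sentence) out — is the same.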
Source: Mutual Information Maximization for Effective Lip Reading
Most implemented papers
XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification
Our work improves on existing multimodal deep learning algorithms in two essential ways: (1) it presents a novel method for performing cross-modality fusion before features are learned from individual modalities, and (2) it extends the previously proposed cross-connections, which only transfer information between streams that process compatible data.
Lip2AudSpec: Speech reconstruction from silent lip movements video
In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos.
End-to-End Speech-Driven Facial Animation with Temporal GANs
To the best of our knowledge, this is the first method capable of generating subject independent realistic videos directly from raw audio.
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers
In this paper, we propose a new method, termed Lip by Speech (LIBS), whose goal is to strengthen lip reading by learning from speech recognizers.
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).
Deformation Flow Based Two-Stream Network for Lip Reading
Observing the continuity between adjacent frames in the speaking process, and the consistency of motion patterns among different speakers pronouncing the same phoneme, we model the lip movements during speech as a sequence of apparent deformations in the lip region.
Mutual Information Maximization for Effective Lip Reading
By combining these two advantages, the proposed method is expected to be both discriminative and robust for effective lip reading.
Synchronous Bidirectional Learning for Multilingual Lip Reading
Based on this idea, we explore the synergized learning of multilingual lip reading in this paper, and further propose a synchronous bidirectional learning (SBL) framework for effective synergy across languages.
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
In this work, we explore the task of lip-to-speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.