Lipreading
30 papers with code • 7 benchmarks • 6 datasets
Lipreading is the process of extracting speech by watching a speaker's lip movements in the absence of sound. Humans lipread all the time without even noticing. It plays a significant role in communication, albeit not as dominant a one as audio, and it is a particularly helpful skill for those who are hard of hearing.
Deep Lipreading is the process of extracting speech from a video of a silent talking face using deep neural networks. It is also known by a few other names: Visual Speech Recognition (VSR), Machine Lipreading, Automatic Lipreading, etc.
The primary methodology involves two stages: i) extracting visual and temporal features from the sequence of image frames of a silent talking face, and ii) decoding the feature sequence into units of speech, e.g. characters, words, or phrases. Implementations of this methodology exist both as two separately trained stages and as models trained end-to-end.
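The two-stage pipeline above can be sketched as a minimal PyTorch model: a 3D-convolutional visual frontend produces one feature vector per frame, and a recurrent backend maps the feature sequence to per-frame character logits (e.g. for a CTC-style loss). All layer names and sizes here are illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn

class VisualFrontend(nn.Module):
    """Stage 1: 3D conv over (time, height, width) -> per-frame features."""
    def __init__(self, feat_dim=64):
        super().__init__()
        # Kernel spans 3 frames to capture short-range lip motion (assumed size).
        self.conv = nn.Conv3d(1, feat_dim, kernel_size=(3, 5, 5),
                              stride=(1, 2, 2), padding=(1, 2, 2))
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))  # collapse space, keep time

    def forward(self, x):          # x: (batch, 1, T, H, W)
        h = torch.relu(self.conv(x))
        h = self.pool(h)           # (batch, feat_dim, T, 1, 1)
        return h.squeeze(-1).squeeze(-1).transpose(1, 2)  # (batch, T, feat_dim)

class SequenceBackend(nn.Module):
    """Stage 2: recurrent decoder mapping features to character logits."""
    def __init__(self, feat_dim=64, hidden=128, num_chars=28):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_chars)

    def forward(self, feats):      # feats: (batch, T, feat_dim)
        out, _ = self.rnn(feats)
        return self.head(out)      # (batch, T, num_chars)

frontend, backend = VisualFrontend(), SequenceBackend()
frames = torch.randn(2, 1, 25, 48, 48)   # 2 clips of 25 grayscale 48x48 mouth crops
logits = backend(frontend(frames))
print(logits.shape)                       # torch.Size([2, 25, 28])
```

Keeping the two stages as separate modules mirrors the two-stage training option mentioned above; composing them and backpropagating through both corresponds to the end-to-end variant.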
Libraries
Use these libraries to find Lipreading models and implementations
Most implemented papers
Lip Reading Sentences in the Wild
The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.
Deep word embeddings for visual speech recognition
In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture.
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers
In this paper, we propose a new method, termed as Lip by Speech (LIBS), of which the goal is to strengthen lip reading by learning from speech recognizers.
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).
Deformation Flow Based Two-Stream Network for Lip Reading
Observing the continuity between adjacent frames in the speaking process, and the consistency of motion patterns among different speakers pronouncing the same phoneme, we model the lip movements in the speaking process as a sequence of apparent deformations in the lip region.
Mutual Information Maximization for Effective Lip Reading
By combining these two advantages together, the proposed method is expected to be both discriminative and robust for effective lip reading.
SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading
The experiments show that our proposed model outperforms various state-of-the-art models and incorporating the memory augmented lateral transformers makes a 3.7% improvement to the SpotFast networks.
Towards Practical Lipreading with Distilled and Efficient Models
However, our most promising lightweight models are on par with the current state-of-the-art while showing a reduction of 8.2x and 3.9x in terms of computational cost and number of parameters, respectively, which we hope will enable the deployment of lipreading models in practical applications.
Learn an Effective Lip Reading Model without Pains
Considering the non-negligible effects of these strategies and the difficulty of training an effective lip reading model, we perform a comprehensive quantitative study and comparative analysis, for the first time, to show the effects of several different choices for lip reading.