Lipreading

30 papers with code • 7 benchmarks • 6 datasets

Lipreading is the process of understanding speech by watching the lip movements of a speaker in the absence of sound. Humans lipread all the time without noticing it. Lipreading plays a large part in communication, albeit a less dominant one than audio. It is a very helpful skill to learn, especially for people who are hard of hearing.

Deep Lipreading is the process of extracting speech from a video of a silent talking face using deep neural networks. It is also known by a few other names, such as Visual Speech Recognition (VSR), Machine Lipreading, and Automatic Lipreading.

The primary methodology involves two stages: i) extracting visual and temporal features from the sequence of image frames of a silent talking video, and ii) processing that sequence of features into units of speech, e.g. characters, words, or phrases. Implementations of this methodology either train the two stages separately or train the whole pipeline end-to-end in one go.
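
The two stages can be illustrated with a toy sketch in Python with NumPy. It is not any particular published model: stage i is approximated by a linear projection over a sliding window of three frames (a crude stand-in for the 3D-convolution and CNN front-ends common in the literature), and stage ii by a plain recurrent network, temporal average pooling, and a softmax over a small word vocabulary, i.e. a word-level classification setup. All function names, sizes, the random weights, and the vocabulary are hypothetical placeholders; a real system would learn the weights from data.

```python
import numpy as np

def stage1_features(frames, w):
    """Stage i: visual + short-range temporal features.

    frames: (T, H, W) grayscale mouth crops. Each output step sees a window of
    3 consecutive frames (edge-padded), loosely mimicking a 3D convolution with
    temporal kernel size 3, followed by a linear projection and ReLU.
    """
    padded = np.concatenate([frames[:1], frames, frames[-1:]], axis=0)
    windows = np.stack([padded[t:t + 3].ravel() for t in range(frames.shape[0])])
    return np.maximum(windows @ w, 0.0)  # (T, D)

def stage2_decode(feats, w_in, w_rec, w_out, vocab):
    """Stage ii: run a simple RNN over time, average-pool, classify into a word."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x in feats:  # one step per video frame
        h = np.tanh(x @ w_in + h @ w_rec)
        states.append(h)
    logits = np.mean(states, axis=0) @ w_out  # (V,)
    probs = np.exp(logits - logits.max())     # numerically stable softmax
    probs /= probs.sum()
    return vocab[int(np.argmax(probs))], probs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, H, W, D, HID = 29, 8, 8, 16, 12
    vocab = ["hello", "world", "yes"]  # hypothetical word list
    frames = rng.random((T, H, W))     # random stand-in for a silent video
    feats = stage1_features(frames, rng.normal(0, 0.1, (3 * H * W, D)))
    word, probs = stage2_decode(
        feats,
        rng.normal(0, 0.3, (D, HID)),
        rng.normal(0, 0.3, (HID, HID)),
        rng.normal(0, 0.3, (HID, len(vocab))),
        vocab,
    )
    print(feats.shape, word, probs.round(3))
```

Replacing the pooling-plus-softmax step with a per-frame character classifier trained with a sequence loss such as CTC would turn the same skeleton into a sentence-level recognizer.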

Most implemented papers

Lip Reading Sentences in the Wild

parambadiger/Lip-Reading CVPR 2017

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

Deep word embeddings for visual speech recognition

tstafylakis/Lipreading-ResNet 30 Oct 2017

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.
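
Once a network maps each video clip to a fixed-length word embedding, recognition can reduce to comparing vectors. The sketch below assumes the embeddings already exist (the random vectors and the names `nearest_word` and `enrolled` are hypothetical) and matches a query clip to the closest enrolled word by cosine similarity. It illustrates one common way embeddings are used, not the architecture from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def nearest_word(query, enrolled):
    """Return the enrolled word whose embedding has the highest cosine
    similarity with `query`, plus that similarity.

    enrolled: dict mapping word -> embedding of shape (D,).
    """
    words = list(enrolled)
    bank = l2_normalize(np.stack([enrolled[w] for w in words]))  # (V, D)
    sims = bank @ l2_normalize(query)                             # (V,)
    best = int(np.argmax(sims))
    return words[best], float(sims[best])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical embeddings; a trained network would produce these from video.
    enrolled = {w: rng.normal(size=64) for w in ["hello", "world", "yes"]}
    query = enrolled["world"] + 0.1 * rng.normal(size=64)  # noisy clip of "world"
    print(nearest_word(query, enrolled))
```

One appeal of this design is that a new word can be added by enrolling an embedding for it, rather than retraining a fixed-size classifier.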

Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

around-star/Speech-Recognition 8 Nov 2019

This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture.
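
As background, an RNN-T combines an encoder over the input frames (in an audio-visual system, fused audio and video features), a prediction network over the labels emitted so far, and a joint network that scores each (frame, label history) pair over the output symbols plus a special blank. The sketch below shows greedy decoding under that standard formulation. The random weights, the use of blank index 0 as the start symbol, the per-frame symbol cap, and all names are assumptions for illustration, not details of the paper's system.

```python
import numpy as np

BLANK = 0  # assumed index of the blank symbol (also used as start symbol)

def joint(f_t, g_u, w_enc, w_pred, w_out):
    """Joint network: combine one encoder frame with one prediction-network
    state; return log-probabilities over blank + output symbols."""
    z = np.tanh(f_t @ w_enc + g_u @ w_pred) @ w_out
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def pred_step(g, label, emb, w_rec):
    """Prediction network: update its state with the last emitted label."""
    return np.tanh(emb[label] + g @ w_rec)

def greedy_decode(enc, params, max_symbols_per_frame=3):
    """For each encoder frame, keep emitting the best non-blank symbol until
    blank wins (or the per-frame cap is hit), then advance to the next frame."""
    w_enc, w_pred, w_out, emb, w_rec = params
    g = pred_step(np.zeros(w_rec.shape[0]), BLANK, emb, w_rec)
    out = []
    for f_t in enc:
        for _ in range(max_symbols_per_frame):
            k = int(np.argmax(joint(f_t, g, w_enc, w_pred, w_out)))
            if k == BLANK:
                break  # blank: move on to the next frame
            out.append(k)
            g = pred_step(g, k, emb, w_rec)  # condition on the new label
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E, P, J, V = 6, 5, 7, 4  # encoder, prediction, joint dims; V real symbols
    params = (rng.normal(size=(E, J)), rng.normal(size=(P, J)),
              rng.normal(size=(J, V + 1)), rng.normal(size=(V + 1, P)),
              rng.normal(size=(P, P)))
    print(greedy_decode(rng.normal(size=(10, E)), params))
```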

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

zju-vipa/KamalEngine 26 Nov 2019

In this paper, we propose a new method, termed as Lip by Speech (LIBS), of which the goal is to strengthen lip reading by learning from speech recognizers.
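
LIBS transfers knowledge from a speech recognizer (the teacher, which hears audio) to a lip reader (the student, which sees only video). The general mechanism can be sketched with standard soft-label knowledge distillation: the student is trained to match the teacher's temperature-softened output distribution as well as the ground-truth labels. This is the generic Hinton-style loss, not the specific objective of the LIBS paper, and it assumes teacher and student outputs are already aligned to the same time steps and vocabulary. The temperature `T`, the weight `alpha`, and the function names are illustrative assumptions.

```python
import numpy as np

def log_softmax(x, T=1.0):
    """Numerically stable log-softmax of x / T along the last axis."""
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(labels, student).

    Logits have shape (N, C): N aligned time steps, C output classes.
    The T^2 factor keeps the soft-target term on a comparable scale as T changes.
    """
    log_p_t = log_softmax(teacher_logits, T)    # softened teacher (speech model)
    log_p_s_T = log_softmax(student_logits, T)  # softened student (lip reader)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s_T), axis=-1).mean()
    log_p_s = log_softmax(student_logits)       # student at T = 1 for hard labels
    ce = -log_p_s[np.arange(len(labels)), labels].mean()
    return alpha * T ** 2 * kl + (1 - alpha) * ce

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student = rng.normal(size=(8, 5))  # hypothetical per-step character logits
    teacher = rng.normal(size=(8, 5))
    labels = rng.integers(0, 5, size=8)
    print(distillation_loss(student, teacher, labels))
```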

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

sailordiary/deep-face-vsr 6 Mar 2020

Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).

Deformation Flow Based Two-Stream Network for Lip Reading

jingyunx/Deformation-Flow-Based-Two-stream-Network 12 Mar 2020

Observing on the continuity in adjacent frames in the speaking process, and the consistency of the motion patterns among different speakers when they pronounce the same phoneme, we model the lip movements in the speaking process as a sequence of apparent deformations in the lip region.
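
To make the idea of "apparent deformations" concrete: a deformation field assigns each pixel of frame t a displacement, and warping frame t with the right field should reproduce frame t+1. The sketch below illustrates that relationship with nearest-neighbour backward warping and a mean-squared reconstruction error. In the paper the deformation flow is generated by a network and, together with the raw frames, feeds a two-stream model; here the flow is simply given by hand. The function names and the (dy, dx) channel layout are assumptions.

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp a grayscale frame (H, W) with a flow field (H, W, 2) of
    (dy, dx) displacements: out[y, x] = frame[y + dy, x + dx], using
    nearest-neighbour sampling and clamping coordinates at the border."""
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def reconstruction_error(frame_t, frame_next, flow):
    """Mean squared error between frame t warped by `flow` and frame t+1."""
    return float(np.mean((warp(frame_t, flow) - frame_next) ** 2))

if __name__ == "__main__":
    frame_t = np.zeros((6, 6))
    frame_t[2:4, 1:3] = 1.0                 # toy "lip" blob
    frame_next = np.roll(frame_t, 1, axis=1)  # blob moved 1 px to the right
    shift = np.zeros((6, 6, 2))
    shift[..., 1] = -1.0                    # each pixel samples from x - 1
    print(reconstruction_error(frame_t, frame_next, shift),
          reconstruction_error(frame_t, frame_next, np.zeros((6, 6, 2))))
```

A flow that matches the true motion drives the error to zero, while the zero flow (assuming no motion) does not, which is the sense in which the deformation field "explains" the change between adjacent frames.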

Mutual Information Maximization for Effective Lip Reading

xing96/MIM-lipreading 13 Mar 2020

By combining these two advantages together, the proposed method is expected to be both discriminative and robust for effective lip reading.

SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading

perathambkk/lipreading 21 May 2020

The experiments show that our proposed model outperforms various state-of-the-art models and incorporating the memory augmented lateral transformers makes a 3.7% improvement to the SpotFast networks.

Towards Practical Lipreading with Distilled and Efficient Models

mpc001/Lipreading_using_Temporal_Convolutional_Networks 13 Jul 2020

However, our most promising lightweight models are on par with the current state-of-the-art while showing a reduction of 8.2x and 3.9x in terms of computational cost and number of parameters, respectively, which we hope will enable the deployment of lipreading models in practical applications.

Learn an Effective Lip Reading Model without Pains

Fengdalu/learn-an-effective-lip-reading-model-without-pains 15 Nov 2020

Considering the non-negligible effects of these strategies and the existing tough status to train an effective lip reading model, we perform a comprehensive quantitative study and comparative analysis, for the first time, to show the effects of several different choices for lip reading.