Lipreading
30 papers with code • 7 benchmarks • 6 datasets
Lipreading is the process of extracting speech by watching a speaker's lip movements in the absence of sound. Humans lipread all the time without even noticing. It plays a significant role in communication, albeit not as dominant a one as audio, and it is a particularly helpful skill for those who are hard of hearing.
Deep Lipreading is the process of extracting speech from a video of a silent talking face using deep neural networks. It is also known by a few other names: Visual Speech Recognition (VSR), Machine Lipreading, Automatic Lipreading, etc.
The primary methodology involves two stages: i) extracting visual and temporal features from the sequence of image frames of a silent talking face, and ii) decoding the feature sequence into units of speech, e.g. characters, words, or phrases. Implementations of this methodology exist both as two separately trained stages and as models trained end-to-end.
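The two-stage pipeline above can be sketched as a minimal PyTorch model: a 3D-convolutional visual frontend produces one feature vector per frame, and a recurrent backend maps the feature sequence to per-frame character logits (e.g. for a CTC-style loss). All layer names and sizes here are illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn

class VisualFrontend(nn.Module):
    """Stage 1: 3D conv over (time, height, width) -> per-frame features."""
    def __init__(self, feat_dim=64):
        super().__init__()
        # Kernel spans 3 frames to capture short-range lip motion (assumed size).
        self.conv = nn.Conv3d(1, feat_dim, kernel_size=(3, 5, 5),
                              stride=(1, 2, 2), padding=(1, 2, 2))
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))  # collapse space, keep time

    def forward(self, x):          # x: (batch, 1, T, H, W)
        h = torch.relu(self.conv(x))
        h = self.pool(h)           # (batch, feat_dim, T, 1, 1)
        return h.squeeze(-1).squeeze(-1).transpose(1, 2)  # (batch, T, feat_dim)

class SequenceBackend(nn.Module):
    """Stage 2: recurrent decoder mapping features to character logits."""
    def __init__(self, feat_dim=64, hidden=128, num_chars=28):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_chars)

    def forward(self, feats):      # feats: (batch, T, feat_dim)
        out, _ = self.rnn(feats)
        return self.head(out)      # (batch, T, num_chars)

frontend, backend = VisualFrontend(), SequenceBackend()
frames = torch.randn(2, 1, 25, 48, 48)   # 2 clips of 25 grayscale 48x48 mouth crops
logits = backend(frontend(frames))
print(logits.shape)                       # torch.Size([2, 25, 28])
```

Keeping the two stages as separate modules mirrors the two-stage training option mentioned above; composing them and backpropagating through both corresponds to the end-to-end variant.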
Libraries
Use these libraries to find Lipreading models and implementations
Most implemented papers
Lip Reading Sentences in the Wild
The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.
Deep word embeddings for visual speech recognition
In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture.
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers
In this paper, we propose a new method, termed as Lip by Speech (LIBS), of which the goal is to strengthen lip reading by learning from speech recognizers.
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).
Deformation Flow Based Two-Stream Network for Lip Reading
Observing the continuity between adjacent frames in the speaking process, and the consistency of motion patterns among different speakers pronouncing the same phoneme, we model the lip movements in the speaking process as a sequence of apparent deformations in the lip region.
Mutual Information Maximization for Effective Lip Reading
By combining these two advantages together, the proposed method is expected to be both discriminative and robust for effective lip reading.
SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading
The experiments show that our proposed model outperforms various state-of-the-art models and incorporating the memory augmented lateral transformers makes a 3.7% improvement to the SpotFast networks.
Towards Practical Lipreading with Distilled and Efficient Models
However, our most promising lightweight models are on par with the current state-of-the-art while showing a reduction of 8.2x and 3.9x in terms of computational cost and number of parameters, respectively, which we hope will enable the deployment of lipreading models in practical applications.
Learn an Effective Lip Reading Model without Pains
Considering the non-negligible effects of these strategies and the difficulty of training an effective lip reading model, we perform a comprehensive quantitative study and comparative analysis, for the first time, to show the effects of several different choices for lip reading.