Lipreading

30 papers with code • 7 benchmarks • 6 datasets

Lipreading is the process of extracting speech by watching a speaker's lip movements in the absence of sound. Humans lipread all the time without even noticing. It plays a significant role in communication, albeit not as dominant a role as audio, and it is a particularly helpful skill for those who are hard of hearing.

Deep Lipreading is the process of extracting speech from a video of a silent talking face using deep neural networks. It is also known by a few other names: Visual Speech Recognition (VSR), Machine Lipreading, Automatic Lipreading, etc.

The primary methodology involves two stages: i) extracting visual and temporal features from a sequence of image frames of a silent talking face, and ii) decoding the sequence of features into units of speech, e.g. characters, words, or phrases. Implementations of this methodology exist both as two separately trained stages and as models trained end-to-end in one go.
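The two stages above can be sketched in miniature. This is an illustrative toy only: real systems use learned 3D-CNN frontends and RNN/Transformer decoders, whereas here random linear projections stand in for trained weights, and the frame shapes and tiny word vocabulary are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_visual_features(frames):
    """Stage i: map each mouth-region frame (H, W) to a feature vector.
    A random linear projection stands in for a learned CNN frontend."""
    T, H, W = frames.shape
    proj = rng.standard_normal((H * W, 32))  # stand-in for learned weights
    return frames.reshape(T, H * W) @ proj   # (T, 32) feature sequence

def decode_to_word(features, vocab):
    """Stage ii: pool the feature sequence over time and pick the most
    likely word. A stand-in for an RNN/Transformer + softmax classifier."""
    pooled = features.mean(axis=0)                      # (32,) clip embedding
    classifier = rng.standard_normal((32, len(vocab)))  # stand-in weights
    scores = pooled @ classifier
    return vocab[int(np.argmax(scores))]

# Toy input: 25 frames of a 48x48 mouth crop
frames = rng.random((25, 48, 48))
vocab = ["hello", "world", "about"]  # hypothetical word-level vocabulary
features = extract_visual_features(frames)
word = decode_to_word(features, vocab)
```

In a two-stage implementation the frontend and decoder are trained separately; in an end-to-end system the gradient from the speech-unit loss flows back through both stages jointly.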


Latest papers with no code

Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

no code yet • 8 Apr 2024

Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.

Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading

no code yet • 18 Feb 2024

Lipreading involves using visual data to recognize spoken words by analyzing the movements of the lips and surrounding area.

Analysis of Visual Features for Continuous Lipreading in Spanish

no code yet • 21 Nov 2023

In this paper, we propose an analysis of different speech visual features with the intention of identifying which of them is the best approach to capture the nature of lip movements for natural Spanish and, in this way, dealing with the automatic visual speech recognition task.

Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding

no code yet • 14 Jun 2023

Along with the release of this dataset, a benchmark will be reported for word-level recognition, a novelty in the automatic recognition of French CS.

Audio-Visual Speech Enhancement with Score-Based Generative Models

no code yet • 2 Jun 2023

This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information.

Word-level Persian Lipreading Dataset

no code yet • 8 Apr 2023

Lip-reading has made impressive progress in recent years, driven by advances in deep learning.

LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers

no code yet • 4 Feb 2023

However, generalizing these methods to unseen speakers incurs catastrophic performance degradation due to the limited number of speakers in the training bank and the evident visual variations caused by the shape/color of lips for different speakers.

Visual Speech Recognition in a Driver Assistance System

no code yet • 30th European Signal Processing Conference (EUSIPCO) 2022

After a comprehensive evaluation, we adapt the developed method and test it on the collected RUSAVIC corpus we recorded in-the-wild for vehicle drivers.

Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale

no code yet • 21 Aug 2022

Because of the manual pipeline, such platforms are also limited in vocabulary, supported languages, accents, and speakers and have a high usage cost.

Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models

no code yet • 5 Jun 2022

In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize audio data during lipreading model training.
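The cross-modal transfer described above is commonly realized as knowledge distillation: an audio "teacher" supervises the visual "student" by matching temperature-softened output distributions. A minimal sketch, assuming the standard KL-divergence distillation loss; the logits below are made up for illustration, whereas in practice they come from trained audio and lipreading networks.

```python
import numpy as np

def softmax(x, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = np.exp((x - x.max()) / T)
    return z / z.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)   # audio teacher's soft targets
    q = softmax(student_logits, T)   # visual student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

teacher = np.array([4.0, 1.0, 0.5])  # audio model's word logits (hypothetical)
student = np.array([2.5, 1.5, 0.2])  # lipreading model's logits (hypothetical)
loss = distillation_loss(student, teacher)
```

Minimizing this loss during training pushes the lipreading model's word distribution toward the audio recognizer's, letting the student exploit audio supervision without needing audio at inference time.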