Lipreading

30 papers with code • 7 benchmarks • 6 datasets

Lipreading is the process of extracting speech by watching a speaker's lip movements in the absence of sound. Humans lipread all the time without even noticing. It plays a significant role in communication, albeit not as dominant a role as audio, and it is a particularly helpful skill for those who are hard of hearing.

Deep Lipreading is the process of extracting speech from a video of a silent talking face using deep neural networks. It is also known by a few other names: Visual Speech Recognition (VSR), Machine Lipreading, Automatic Lipreading, etc.

The primary methodology involves two stages: i) extracting visual and temporal features from a sequence of image frames of a silent talking face, and ii) decoding the sequence of features into units of speech, e.g. characters, words, or phrases. Implementations of this methodology exist both as two separately trained stages and as models trained end-to-end in one go.
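The two stages above can be sketched in miniature. This is an illustrative toy only: real systems use learned 3D-CNN frontends and RNN/Transformer decoders, whereas here random linear projections stand in for trained weights, and the frame shapes and tiny word vocabulary are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_visual_features(frames):
    """Stage i: map each mouth-region frame (H, W) to a feature vector.
    A random linear projection stands in for a learned CNN frontend."""
    T, H, W = frames.shape
    proj = rng.standard_normal((H * W, 32))  # stand-in for learned weights
    return frames.reshape(T, H * W) @ proj   # (T, 32) feature sequence

def decode_to_word(features, vocab):
    """Stage ii: pool the feature sequence over time and pick the most
    likely word. A stand-in for an RNN/Transformer + softmax classifier."""
    pooled = features.mean(axis=0)                      # (32,) clip embedding
    classifier = rng.standard_normal((32, len(vocab)))  # stand-in weights
    scores = pooled @ classifier
    return vocab[int(np.argmax(scores))]

# Toy input: 25 frames of a 48x48 mouth crop
frames = rng.random((25, 48, 48))
vocab = ["hello", "world", "about"]  # hypothetical word-level vocabulary
features = extract_visual_features(frames)
word = decode_to_word(features, vocab)
```

In a two-stage implementation the frontend and decoder are trained separately; in an end-to-end system the gradient from the speech-unit loss flows back through both stages jointly.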


Latest papers with no code

Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

no code yet • 8 Apr 2024

Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.

Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading

no code yet • 18 Feb 2024

Lipreading involves using visual data to recognize spoken words by analyzing the movements of the lips and surrounding area.

Analysis of Visual Features for Continuous Lipreading in Spanish

no code yet • 21 Nov 2023

In this paper, we propose an analysis of different speech visual features with the intention of identifying which of them is the best approach to capture the nature of lip movements for natural Spanish and, in this way, dealing with the automatic visual speech recognition task.

Investigating the dynamics of hand and lips in French Cued Speech using attention mechanisms and CTC-based decoding

no code yet • 14 Jun 2023

Along with the release of this dataset, a benchmark will be reported for word-level recognition, a novelty in the automatic recognition of French CS.

Audio-Visual Speech Enhancement with Score-Based Generative Models

no code yet • 2 Jun 2023

This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information.

Word-level Persian Lipreading Dataset

no code yet • 8 Apr 2023

Lip-reading has made impressive progress in recent years, driven by advances in deep learning.

LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers

no code yet • 4 Feb 2023

However, generalizing these methods to unseen speakers incurs catastrophic performance degradation due to the limited number of speakers in the training bank and the evident visual variations caused by the shape/color of lips for different speakers.

Visual Speech Recognition in a Driver Assistance System

no code yet • 30th European Signal Processing Conference (EUSIPCO) 2022

After a comprehensive evaluation, we adapt the developed method and test it on the collected RUSAVIC corpus we recorded in-the-wild for vehicle drivers.

Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale

no code yet • 21 Aug 2022

Because of the manual pipeline, such platforms are also limited in vocabulary, supported languages, accents, and speakers and have a high usage cost.

Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models

no code yet • 5 Jun 2022

In this work, we propose a technique to transfer speech recognition capabilities from audio speech recognition systems to visual speech recognizers, where our goal is to utilize audio data during lipreading model training.
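The cross-modal transfer described above is commonly realized as knowledge distillation: an audio "teacher" supervises the visual "student" by matching temperature-softened output distributions. A minimal sketch, assuming the standard KL-divergence distillation loss; the logits below are made up for illustration, whereas in practice they come from trained audio and lipreading networks.

```python
import numpy as np

def softmax(x, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = np.exp((x - x.max()) / T)
    return z / z.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)   # audio teacher's soft targets
    q = softmax(student_logits, T)   # visual student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

teacher = np.array([4.0, 1.0, 0.5])  # audio model's word logits (hypothetical)
student = np.array([2.5, 1.5, 0.2])  # lipreading model's logits (hypothetical)
loss = distillation_loss(student, teacher)
```

Minimizing this loss during training pushes the lipreading model's word distribution toward the audio recognizer's, letting the student exploit audio supervision without needing audio at inference time.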