Lip Reading

34 papers with code • 3 benchmarks • 6 datasets

Lip Reading is the task of inferring the speech content of a video using only visual information, particularly the lip movements. It has many practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.

Source: Mutual Information Maximization for Effective Lip Reading

Most implemented papers

Combining Residual Networks with LSTMs for Lipreading

tstafylakis/Lipreading-ResNet 12 Mar 2017

We propose an end-to-end deep learning architecture for word-level visual speech recognition.

Deep Audio-Visual Speech Recognition

lordmartian/deep_avsr 6 Sep 2018

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Fengdalu/Lipreading-DenseNet3D 16 Oct 2018

This benchmark shows large variation in several aspects, including the number of samples per class, video resolution, lighting conditions, and speaker attributes such as pose, age, gender, and make-up.

Lipreading using Temporal Convolutional Networks

mpc001/Lipreading_using_Temporal_Convolutional_Networks 23 Jan 2020

We present results on the largest publicly-available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively.
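A temporal convolutional network replaces recurrent layers with causal 1-D convolutions over the frame sequence, so each output depends only on current and past frames. A minimal sketch of one causal, dilated convolution over a scalar feature sequence (the `causal_conv1d` helper is illustrative, not the repo's API):

```python
def causal_conv1d(seq, kernel, dilation=1):
    """Causal dilated 1-D convolution: output at time t uses only
    inputs at times t, t - dilation, t - 2*dilation, ..."""
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:  # no access to future frames
                acc += w * seq[idx]
        out.append(acc)
    return out

# Each output sums the current and previous input:
print(causal_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0]))  # [1.0, 3.0, 5.0, 7.0]
```

Stacking such layers with growing dilation widens the temporal receptive field without recurrence, which is the core idea behind TCN-based lipreading heads.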

AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements

Mohit-Mithra/Face-Recognition-Systems-with-lip-movement-pattern 4 Dec 2020

Biometric systems based on Machine learning and Deep learning are being extensively used as authentication mechanisms in resource-constrained environments like smartphones and other small computing devices.

End-to-end Audio-visual Speech Recognition with Conformers

zziz/pwc 12 Feb 2021

In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and a Convolution-augmented Transformer (Conformer) that can be trained in an end-to-end manner.
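The CTC branch of such a hybrid model is typically decoded greedily: take the per-frame argmax labels, collapse consecutive repeats, and drop blanks. A minimal sketch (the blank index and function name are assumptions, not this repo's API):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated frame labels, then remove blanks (CTC best-path decoding)."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# Frames: blank, a, a, blank, a, b, b, blank  ->  a, a, b
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0]))  # [1, 1, 2]
```

Note the blank between the two `1`s is what lets CTC emit the same label twice in a row; without it, the repeats would collapse into one token.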

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

facebookresearch/av_hubert ICLR 2022

The lip-reading WER is further reduced to 26.9% when using all 433 hours of labeled data from LRS3 and combined with self-training.
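WER, the metric quoted above, is the word-level edit distance (substitutions, insertions, deletions) between hypothesis and reference, divided by the reference length. A minimal pure-Python sketch, not taken from the AV-HuBERT codebase:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))  # d[j] = edits to turn ref[:0] into hyp[:j]
    for i, r in enumerate(ref, 1):
        prev_diag = d[0]
        d[0] = i
        for j, h in enumerate(hyp, 1):
            tmp = d[j]
            d[j] = min(d[j] + 1,               # delete reference word
                       d[j - 1] + 1,           # insert hypothesis word
                       prev_diag + (r != h))   # substitute (or match for free)
            prev_diag = tmp
    return d[-1] / max(len(ref), 1)

print(wer("the cat sat", "the bat sat"))  # 0.333... (one substitution out of three words)
```

A WER of 26.9% thus means roughly one word error per four reference words.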

Estimating speech from lip dynamics

Dirivian/dynamic_lips 3 Aug 2017

The goal of this project is to develop a limited lip reading algorithm for a subset of the English language.

XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification

catalina17/XFlow 2 Sep 2017

Our work improves on existing multimodal deep learning algorithms in two essential ways: (1) it presents a novel method for cross-modal information exchange before features are learned from the individual modalities, and (2) it extends the previously proposed cross-connections, which only transfer information between streams that process compatible data.

Lip2AudSpec: Speech reconstruction from silent lip movements video

hassanhub/LipReading 26 Oct 2017

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos.