Lip Reading

26 papers with code • 3 benchmarks • 6 datasets

Lip Reading is the task of inferring speech content from a video using only visual information, especially the lip movements. It has many important practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.

Source: Mutual Information Maximization for Effective Lip Reading

Most implemented papers

Combining Residual Networks with LSTMs for Lipreading

tstafylakis/Lipreading-ResNet 12 Mar 2017

We propose an end-to-end deep learning architecture for word-level visual speech recognition.
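For orientation, here is a minimal sketch of the ResNet-plus-LSTM pattern the title refers to, assuming a PyTorch implementation with a 3D convolutional front-end, a torchvision ResNet-18 applied per frame, and a bidirectional LSTM over the frame features; the layer sizes and the 500-word output are illustrative, not the authors' exact configuration.

    # Hypothetical sketch of a ResNet + LSTM word-level lipreading model
    # (layer sizes and the torchvision backbone are assumptions, not the paper's exact config).
    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    class ResNetLSTMLipreader(nn.Module):
        def __init__(self, num_words=500, hidden=256):
            super().__init__()
            # 3D convolutional front-end over the grayscale mouth-crop clip.
            self.frontend = nn.Sequential(
                nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
                nn.BatchNorm3d(64),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            )
            # 2D ResNet applied independently to every frame's feature map.
            trunk = resnet18(weights=None)
            trunk.conv1 = nn.Conv2d(64, 64, kernel_size=7, stride=2, padding=3, bias=False)
            trunk.fc = nn.Identity()          # keep the 512-d pooled feature
            self.resnet = trunk
            # LSTM models the temporal dynamics of the per-frame features.
            self.lstm = nn.LSTM(512, hidden, num_layers=2, batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden, num_words)

        def forward(self, clip):                      # clip: (B, 1, T, H, W)
            x = self.frontend(clip)                   # (B, 64, T, H', W')
            b, c, t, h, w = x.shape
            x = x.transpose(1, 2).reshape(b * t, c, h, w)
            feats = self.resnet(x).view(b, t, -1)     # (B, T, 512)
            out, _ = self.lstm(feats)                 # (B, T, 2*hidden)
            return self.classifier(out.mean(dim=1))   # average over time -> word logits

    logits = ResNetLSTMLipreader()(torch.randn(2, 1, 29, 96, 96))   # e.g. 29-frame word clips
    print(logits.shape)                                             # torch.Size([2, 500])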

Deep Audio-Visual Speech Recognition

lordmartian/deep_avsr 6 Sep 2018

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Fengdalu/Lipreading-DenseNet3D 16 Oct 2018

The benchmark shows large variation in several aspects, including the number of samples per class, video resolution, lighting conditions, and speaker attributes such as pose, age, gender, and make-up.

Lipreading using Temporal Convolutional Networks

mpc001/Lipreading_using_Temporal_Convolutional_Networks 23 Jan 2020

We present results on the largest publicly available datasets for isolated word recognition in English and Mandarin, LRW and LRW-1000, respectively.
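A rough sketch of the temporal-convolutional back-end this line of work uses in place of recurrent layers, assuming per-frame visual features from a separate front-end; the channel widths, dilation schedule, and dropout are illustrative assumptions, not the paper's settings.

    # Illustrative temporal-convolutional back-end for word-level lipreading
    # (channel widths, dilations, and dropout are assumptions, not the paper's settings).
    import torch
    import torch.nn as nn

    class TemporalBlock(nn.Module):
        """One residual block of dilated 1D convolutions over the time axis."""
        def __init__(self, channels, dilation, kernel_size=3, dropout=0.2):
            super().__init__()
            pad = (kernel_size - 1) // 2 * dilation
            self.net = nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation),
                nn.BatchNorm1d(channels),
                nn.ReLU(inplace=True),
                nn.Dropout(dropout),
                nn.Conv1d(channels, channels, kernel_size, padding=pad, dilation=dilation),
                nn.BatchNorm1d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):                 # x: (B, C, T)
            return self.relu(x + self.net(x)) # residual connection keeps gradients flowing

    class TCNClassifier(nn.Module):
        """Stack of temporal blocks with growing dilation, then average pooling and a linear head."""
        def __init__(self, feat_dim=512, channels=256, num_words=500):
            super().__init__()
            self.proj = nn.Conv1d(feat_dim, channels, kernel_size=1)
            self.tcn = nn.Sequential(*[TemporalBlock(channels, dilation=2 ** i) for i in range(4)])
            self.head = nn.Linear(channels, num_words)

        def forward(self, feats):             # feats: (B, T, feat_dim) from a visual front-end
            x = self.proj(feats.transpose(1, 2))
            x = self.tcn(x)                   # (B, channels, T)
            return self.head(x.mean(dim=2))   # temporal average -> word logits

    print(TCNClassifier()(torch.randn(2, 29, 512)).shape)   # torch.Size([2, 500])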

AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements

Mohit-Mithra/Face-Recognition-Systems-with-lip-movement-pattern 4 Dec 2020

Biometric systems based on machine learning and deep learning are being extensively used as authentication mechanisms in resource-constrained environments like smartphones and other small computing devices.

Estimating speech from lip dynamics

Dirivian/dynamic_lips 3 Aug 2017

The goal of this project is to develop a limited lip reading algorithm for a subset of the English language.

XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification

catalina17/XFlow 2 Sep 2017

Our work improves on existing multimodal deep learning algorithms in two essential ways: (1) it presents a novel method for performing cross-modal information exchange before features are learned from the individual modalities, and (2) it extends the previously proposed cross-connections, which only transfer information between streams that process compatible data.
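A hypothetical sketch of such cross-connections, assuming simple per-timestep audio and visual features and linear projections that exchange information between the two streams before fusion; the feature shapes and projection layers are illustrative, not XFlow's exact design.

    # Hypothetical sketch of cross-connections between an audio and a visual stream
    # (feature dimensions and the linear projections are illustrative, not XFlow's exact design).
    import torch
    import torch.nn as nn

    class CrossConnectedEncoder(nn.Module):
        def __init__(self, audio_dim=40, visual_dim=512, hidden=128, num_classes=10):
            super().__init__()
            # Independent per-modality encoders over per-timestep features.
            self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
            self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
            # Cross-connections: each stream receives a projection of the other
            # stream's intermediate representation before the fusion layer.
            self.audio_to_visual = nn.Linear(hidden, hidden)
            self.visual_to_audio = nn.Linear(hidden, hidden)
            self.classifier = nn.Linear(2 * hidden, num_classes)

        def forward(self, audio, visual):          # audio: (B, T, 40), visual: (B, T, 512)
            a = self.audio_enc(audio)              # (B, T, hidden)
            v = self.visual_enc(visual)            # (B, T, hidden)
            a2 = a + self.visual_to_audio(v)       # audio stream gets visual context
            v2 = v + self.audio_to_visual(a)       # visual stream gets audio context
            fused = torch.cat([a2.mean(dim=1), v2.mean(dim=1)], dim=-1)
            return self.classifier(fused)

    model = CrossConnectedEncoder()
    print(model(torch.randn(2, 29, 40), torch.randn(2, 29, 512)).shape)  # torch.Size([2, 10])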

Lip2AudSpec: Speech reconstruction from silent lip movements video

hassanhub/LipReading 26 Oct 2017

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos.
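A rough sketch of a lip-video-to-spectrogram network in this spirit, assuming a small 3D-CNN encoder, an LSTM over time, and one predicted spectrogram frame per video frame; these choices are assumptions for illustration, not Lip2AudSpec's exact architecture.

    # Rough sketch of a silent-lip-video-to-spectrogram network
    # (the 3D-CNN + LSTM encoder and spectrogram size are assumptions, not Lip2AudSpec's exact model).
    import torch
    import torch.nn as nn

    class LipToSpectrogram(nn.Module):
        def __init__(self, spec_bins=128, hidden=256):
            super().__init__()
            # Spatio-temporal encoder over the silent mouth-region clip.
            self.encoder = nn.Sequential(
                nn.Conv3d(1, 32, kernel_size=3, stride=(1, 2, 2), padding=1), nn.ReLU(inplace=True),
                nn.Conv3d(32, 64, kernel_size=3, stride=(1, 2, 2), padding=1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d((None, 4, 4)),      # keep the time axis, pool space to 4x4
            )
            # Recurrent layer models temporal dynamics; a linear head predicts one
            # spectrogram frame per video frame.
            self.lstm = nn.LSTM(64 * 4 * 4, hidden, batch_first=True)
            self.to_spec = nn.Linear(hidden, spec_bins)

        def forward(self, clip):                          # clip: (B, 1, T, H, W)
            x = self.encoder(clip)                        # (B, 64, T, 4, 4)
            b, c, t, h, w = x.shape
            x = x.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
            out, _ = self.lstm(x)                         # (B, T, hidden)
            return self.to_spec(out)                      # (B, T, spec_bins) predicted spectrogram

    print(LipToSpectrogram()(torch.randn(2, 1, 30, 64, 64)).shape)   # torch.Size([2, 30, 128])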

End-to-End Speech-Driven Facial Animation with Temporal GANs

PrashanthaTP/wav2mov 23 May 2018

To the best of our knowledge, this is the first method capable of generating subject independent realistic videos directly from raw audio.

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

Hangz-nju-cuhk/Talking-Face-Generation-DAVS 20 Jul 2018

Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.