Lipreading
20 papers with code • 6 benchmarks • 6 datasets
Most implemented papers
LipNet: End-to-End Sentence-level Lipreading
Lipreading is the task of decoding text from the movement of a speaker's mouth.
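LipNet couples spatiotemporal (3D) convolutions with recurrent layers and CTC decoding to map a video of mouth crops directly to a character sequence. Below is a minimal PyTorch sketch of that recipe; the layer sizes, 64×64 grayscale crops, and 28-character vocabulary are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LipNetSketch(nn.Module):
    """Illustrative 3D-conv + BiGRU + CTC model; sizes are placeholders."""
    def __init__(self, vocab_size=28):  # e.g. 26 letters + space + CTC blank
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space, keep time
        )
        self.gru = nn.GRU(32 * 32 * 32, 256, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * 256, vocab_size)

    def forward(self, x):  # x: (batch, 1, time, 64, 64) grayscale mouth crops
        f = self.frontend(x)                     # (B, 32, T, 32, 32)
        f = f.permute(0, 2, 1, 3, 4).flatten(2)  # (B, T, 32*32*32)
        h, _ = self.gru(f)                       # (B, T, 512)
        return self.classifier(h)                # per-frame logits for nn.CTCLoss
```

Training pairs these per-frame logits with a CTC loss, which is what makes sentence-level output possible without frame-level alignments.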
Combining Residual Networks with LSTMs for Lipreading
We propose an end-to-end deep learning architecture for word-level visual speech recognition.
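The word-level recipe pairs a ResNet image trunk with a bidirectional LSTM over time. A simplified PyTorch sketch, assuming per-frame 2D ResNet-18 features (the paper additionally uses a 3D-conv front-end) and the 500-word LRW label set:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNetLSTMWordModel(nn.Module):
    """Per-frame ResNet features pooled by a BiLSTM into one word label."""
    def __init__(self, num_words=500):  # LRW has 500 word classes
        super().__init__()
        trunk = resnet18(weights=None)
        trunk.fc = nn.Identity()  # keep the 512-d frame features
        self.trunk = trunk
        self.lstm = nn.LSTM(512, 256, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * 256, num_words)

    def forward(self, frames):  # frames: (B, T, 3, 112, 112)
        b, t = frames.shape[:2]
        feats = self.trunk(frames.flatten(0, 1)).view(b, t, -1)  # (B, T, 512)
        out, _ = self.lstm(feats)
        return self.head(out.mean(dim=1))  # average over time, then classify
```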
End-to-end Audiovisual Speech Recognition
In the presence of high levels of noise, the end-to-end audiovisual model significantly outperforms both audio-only models.
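A common way to build such a model is late fusion: encode each modality separately, then combine the pooled features for classification. A hedged sketch, assuming 40-d audio filterbanks and precomputed 512-d visual features (both placeholders):

```python
import torch
import torch.nn as nn

class AVFusionSketch(nn.Module):
    """Late fusion: separate audio/visual encoders, concatenated features."""
    def __init__(self, num_classes=500):
        super().__init__()
        self.audio_enc = nn.GRU(40, 128, batch_first=True, bidirectional=True)
        self.video_enc = nn.GRU(512, 128, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 256, num_classes)

    def forward(self, audio, video):  # (B, Ta, 40), (B, Tv, 512)
        a, _ = self.audio_enc(audio)
        v, _ = self.video_enc(video)
        fused = torch.cat([a.mean(1), v.mean(1)], dim=-1)  # time-pool, concat
        return self.classifier(fused)
```

Because each stream is encoded independently, the model can still produce a prediction when one modality is degraded, which is where the noise robustness comes from.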
Deep Audio-Visual Speech Recognition
The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.
LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild
This benchmark exhibits large variation in several aspects, including the number of samples per class, video resolution, lighting conditions, and speaker attributes such as pose, age, gender, and make-up.
Lipreading using Temporal Convolutional Networks
We present results on the largest publicly available datasets for isolated word recognition in English and Mandarin, LRW and LRW-1000, respectively.
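A temporal convolutional network (TCN) replaces the recurrent back-end with stacked dilated 1D convolutions over per-frame features. A minimal sketch of one such residual block; the channel width, kernel size, and 29-frame clip length are assumptions:

```python
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    """One dilated 1D-conv block of a TCN head over per-frame features."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation),  # same length out
            nn.BatchNorm1d(channels),
            nn.ReLU(),
        )

    def forward(self, x):       # x: (B, C, T)
        return x + self.net(x)  # residual connection

# Stacking blocks with growing dilation widens the temporal receptive field.
tcn = nn.Sequential(TemporalBlock(512, 1), TemporalBlock(512, 2), TemporalBlock(512, 4))
out = tcn(torch.randn(8, 512, 29))  # e.g. 29-frame clips -> (8, 512, 29)
```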
Discriminative Multi-modality Speech Recognition
Vision is often used as a complementary modality for audio speech recognition (ASR), especially in noisy environments where the performance of the audio-only modality deteriorates significantly.
Deep word embeddings for visual speech recognition
In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.
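The idea is that a clip's penultimate-layer activations live in the same space as learned word embeddings, so words can be recognized by nearest neighbour, even words unseen during training. A toy sketch with random stand-in tensors:

```python
import torch
import torch.nn.functional as F

# Hypothetical tensors: one clip embedding from a visual backbone, and a
# table of word embeddings learned jointly with it.
clip_emb = torch.randn(256)        # embedding of a test clip
word_embs = torch.randn(500, 256)  # embeddings of candidate words

# Recognize by nearest neighbour in the shared embedding space; this also
# extends to words unseen during training, given their embeddings.
scores = F.cosine_similarity(clip_emb.unsqueeze(0), word_embs, dim=1)
predicted_word = scores.argmax().item()
```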
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture.
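An RNN-T factorizes the model into an encoder over the (here, audio-visual) input features, a prediction network over the label history, and a joiner that scores every (time, label) pair. A structural sketch; the dimensions and fused 512-d input features are assumptions, and in practice the label sequence fed to the predictor is prepended with a blank:

```python
import torch
import torch.nn as nn

class RNNTSketch(nn.Module):
    """Transducer skeleton: encoder + prediction network + joiner."""
    def __init__(self, feat_dim=512, vocab=1000, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)  # A/V features
        self.embed = nn.Embedding(vocab, hidden)
        self.predictor = nn.LSTM(hidden, hidden, batch_first=True)  # label history
        self.joiner = nn.Linear(2 * hidden, vocab)

    def forward(self, feats, labels):  # feats: (B, T, F); labels: (B, U)
        enc, _ = self.encoder(feats)                  # (B, T, H)
        pred, _ = self.predictor(self.embed(labels))  # (B, U, H)
        t, u = enc.size(1), pred.size(1)
        joint = torch.cat([enc.unsqueeze(2).expand(-1, -1, u, -1),
                           pred.unsqueeze(1).expand(-1, t, -1, -1)], dim=-1)
        return self.joiner(joint)  # (B, T, U, vocab) lattice for the RNN-T loss
```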
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).
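Most VSR pipelines start from a landmark-centred crop; this paper asks which region of interest that crop should cover. A small NumPy sketch of the standard crop such studies vary, with an illustrative margin parameter and crop size:

```python
import numpy as np

def crop_roi(frame, landmarks, size=96, margin=0.3):
    """Square crop centred on the mean of the given landmarks.

    frame:     (H, W) or (H, W, C) image array
    landmarks: (N, 2) array of (x, y) points, e.g. mouth-contour landmarks
    """
    cx, cy = landmarks.mean(axis=0)
    half = int((1 + margin) * size / 2)
    # Clamp the top-left corner so the crop stays inside the image.
    x0 = max(0, min(int(cx) - half, frame.shape[1] - 2 * half))
    y0 = max(0, min(int(cy) - half, frame.shape[0] - 2 * half))
    return frame[y0:y0 + 2 * half, x0:x0 + 2 * half]
```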