20 papers with code • 6 benchmarks • 6 datasets

Most implemented papers

LipNet: End-to-End Sentence-level Lipreading

rizkiarm/LipNet 5 Nov 2016

Lipreading is the task of decoding text from the movement of a speaker's mouth.

Combining Residual Networks with LSTMs for Lipreading

tstafylakis/Lipreading-ResNet 12 Mar 2017

We propose an end-to-end deep learning architecture for word-level visual speech recognition.

End-to-end Audiovisual Speech Recognition

mpc001/end-to-end-Lipreading 18 Feb 2018

In the presence of high levels of noise, the end-to-end audiovisual model significantly outperforms both audio-only models.

Deep Audio-Visual Speech Recognition

lordmartian/deep_avsr 6 Sep 2018

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Fengdalu/Lipreading-DenseNet3D 16 Oct 2018

The benchmark shows large variation in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up.

Lipreading using Temporal Convolutional Networks

mpc001/Lipreading_using_Temporal_Convolutional_Networks 23 Jan 2020

We present results on the largest publicly-available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively.

Discriminative Multi-modality Speech Recognition

JackSyu/Discriminative-Multi-modality-Speech-Recognition CVPR 2020

Vision is often used as a complementary modality for audio speech recognition (ASR), especially in noisy environments where the performance of the audio modality alone deteriorates significantly.

Deep word embeddings for visual speech recognition

tstafylakis/Lipreading-ResNet 30 Oct 2017

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.

Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

around-star/Speech-Recognition 8 Nov 2019

This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture.

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

sailordiary/deep-face-vsr 6 Mar 2020

Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).