Visual Speech Recognition

18 papers with code • 2 benchmarks • 5 datasets

Visual speech recognition (also known as lipreading) is the task of recognizing spoken words, phrases, or sentences from video of a speaker's face, without relying on the audio signal.

Most implemented papers

Combining Residual Networks with LSTMs for Lipreading

tstafylakis/Lipreading-ResNet 12 Mar 2017

We propose an end-to-end deep learning architecture for word-level visual speech recognition.
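As a rough illustration of this kind of architecture (a spatiotemporal convolutional front-end feeding a bidirectional LSTM back-end), here is a minimal sketch assuming PyTorch. It is not the authors' implementation; the class name, layer sizes, and the pooling stand-in for the ResNet trunk are all illustrative.

```python
import torch
import torch.nn as nn

class VideoLipreader(nn.Module):
    """Toy word-level lipreader: 3D-conv front-end + BiLSTM back-end."""
    def __init__(self, num_words=500, feat_dim=64, hidden=128):
        super().__init__()
        # Spatiotemporal front-end over grayscale mouth crops (B, 1, T, H, W)
        self.frontend = nn.Sequential(
            nn.Conv3d(1, feat_dim, kernel_size=(5, 7, 7),
                      stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.ReLU(),
            # Stand-in for a per-frame ResNet trunk: pool each frame to a vector
            nn.AdaptiveAvgPool3d((None, 1, 1)),
        )
        # Temporal back-end: bidirectional LSTM over per-frame features
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_words)

    def forward(self, clips):
        feats = self.frontend(clips)                           # (B, C, T, 1, 1)
        feats = feats.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, C)
        out, _ = self.lstm(feats)                              # (B, T, 2*hidden)
        return self.classifier(out.mean(dim=1))                # word logits

logits = VideoLipreader()(torch.randn(2, 1, 29, 88, 88))
print(logits.shape)  # torch.Size([2, 500])
```

The 29-frame, 88x88 input shape mirrors the common word-level lipreading setup of roughly one second of cropped mouth video per word.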

Deep Audio-Visual Speech Recognition

lordmartian/deep_avsr 6 Sep 2018

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild

Fengdalu/Lipreading-DenseNet3D 16 Oct 2018

This benchmark exhibits large variation in several aspects, including the number of samples per class, video resolution, lighting conditions, and speaker attributes such as pose, age, gender, and make-up.

Deep word embeddings for visual speech recognition

tstafylakis/Lipreading-ResNet 30 Oct 2017

In this paper we present a deep learning architecture for extracting word embeddings for visual speech recognition.

Zero-shot keyword spotting for visual speech recognition in-the-wild

lilianemomeni/KWS-Net ECCV 2018

Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information.
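One simple way to picture this problem: embed the text query and each window of video frames into a shared space, then score every window against the query. The sketch below is a toy illustration of that scoring step, not KWS-Net itself; the random arrays stand in for learned embeddings.

```python
import numpy as np

def keyword_scores(frame_embs, query_emb, window=10):
    """Cosine similarity between the query and each pooled video window."""
    scores = []
    for t in range(len(frame_embs) - window + 1):
        pooled = frame_embs[t:t + window].mean(axis=0)
        cos = pooled @ query_emb / (
            np.linalg.norm(pooled) * np.linalg.norm(query_emb) + 1e-8)
        scores.append(cos)
    return np.array(scores)

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 16))  # 50 frames of 16-d visual embeddings
query = rng.normal(size=16)         # embedded text query (stand-in)
s = keyword_scores(frames, query)
print(len(s))  # 41 windows; the query is "spotted" where scores peak
```

In a real system a detection threshold (or the location of the peak) decides whether and where the keyword occurs.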

Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition

midas-research/DECA 29 Jan 2019

To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases.

Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

around-star/Speech-Recognition 8 Nov 2019

This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture.

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

sailordiary/deep-face-vsr 6 Mar 2020

Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

georgesterpu/Sigmedia-AVSR 17 Apr 2020

A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence to sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech.
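The core of such explicit alignment is cross-modal attention: each acoustic frame attends over the visual frames to retrieve an aligned visual context. A minimal NumPy sketch of that idea (illustrative only, not the AV Align implementation) looks like this:

```python
import numpy as np

def cross_modal_attention(audio, video):
    """audio: (Ta, d), video: (Tv, d) -> aligned visual context (Ta, d)."""
    d = audio.shape[1]
    scores = audio @ video.T / np.sqrt(d)          # (Ta, Tv) alignment scores
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over video frames
    return weights @ video                         # weighted visual context

rng = np.random.default_rng(1)
audio = rng.normal(size=(30, 8))   # 30 acoustic frames, 8-d features
video = rng.normal(size=(25, 8))   # 25 visual frames, 8-d features
ctx = cross_modal_attention(audio, video)
print(ctx.shape)  # (30, 8): one aligned visual vector per acoustic frame
```

The attention weights form a soft monotonic-ish alignment between the two streams, which is what AV Align inspects and supervises.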

Should we hard-code the recurrence concept or learn it instead? Exploring the Transformer architecture for Audio-Visual Speech Recognition

georgesterpu/Taris 19 May 2020

The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.