Visual Speech Recognition

40 papers with code • 2 benchmarks • 5 datasets

Most implemented papers

Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition

midas-research/DECA 29 Jan 2019

To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases.

Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

around-star/Speech-Recognition 8 Nov 2019

This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture.

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

sailordiary/deep-face-vsr 6 Mar 2020

Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

georgesterpu/Sigmedia-AVSR 17 Apr 2020

A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence to sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech.

Should we hard-code the recurrence concept or learn it instead ? Exploring the Transformer architecture for Audio-Visual Speech Recognition

georgesterpu/Taris 19 May 2020

The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.

Learn an Effective Lip Reading Model without Pains

Fengdalu/learn-an-effective-lip-reading-model-without-pains 15 Nov 2020

Considering the non-negligible effects of these strategies and the current difficulty of training an effective lip reading model, we perform, for the first time, a comprehensive quantitative study and comparative analysis to show the effects of several different choices for lip reading.

Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection

ahaliassos/lipforensics CVPR 2021

Extensive experiments show that this simple approach significantly surpasses the state-of-the-art in terms of generalisation to unseen manipulations and robustness to perturbations, and shed light on the factors responsible for its performance.

AV Taris: Online Audio-Visual Speech Recognition

georgesterpu/Taris 14 Dec 2020

In recent years, Automatic Speech Recognition (ASR) technology has approached human-level performance on conversational speech under relatively clean listening conditions.

Robust Self-Supervised Audio-Visual Speech Recognition

facebookresearch/av_hubert 5 Jan 2022

Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, as the model cannot determine which speaker to transcribe.

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

hltchkust/ci-avsr 11 Jan 2022

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.