Visual Speech Recognition

40 papers with code • 2 benchmarks • 5 datasets

Benchmarks

These leaderboards track progress in Visual Speech Recognition.

Dataset	Best Model
LRS3-TED	CTC/Attention
LRS2	VTP with more data

Subtasks

Lip to Speech Synthesis

Most implemented papers

Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition

midas-research/DECA • 29 Jan 2019

To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases.

Recurrent Neural Network Transducer for Audio-Visual Speech Recognition

around-star/Speech-Recognition • 8 Nov 2019

This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture.

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition

sailordiary/deep-face-vsr • 6 Mar 2020

Recent advances in deep learning have heightened interest among researchers in the field of visual speech recognition (VSR).

How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

georgesterpu/Sigmedia-AVSR • 17 Apr 2020

A recently proposed multimodal fusion strategy, AV Align, based on state-of-the-art sequence to sequence neural networks, attempts to model this relationship by explicitly aligning the acoustic and visual representations of speech.

Should we hard-code the recurrence concept or learn it instead ? Exploring the Transformer architecture for Audio-Visual Speech Recognition

georgesterpu/Taris • 19 May 2020

The audio-visual speech fusion strategy AV Align has shown significant performance improvements in audio-visual speech recognition (AVSR) on the challenging LRS2 dataset.

Learn an Effective Lip Reading Model without Pains

Fengdalu/learn-an-effective-lip-reading-model-without-pains • 15 Nov 2020

Considering the non-negligible effects of these strategies and the existing tough status to train an effective lip reading model, we perform a comprehensive quantitative study and comparative analysis, for the first time, to show the effects of several different choices for lip reading.

Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection

ahaliassos/lipforensics • CVPR 2021

Extensive experiments show that this simple approach significantly surpasses the state-of-the-art in terms of generalisation to unseen manipulations and robustness to perturbations, as well as shed light on the factors responsible for its performance.

AV Taris: Online Audio-Visual Speech Recognition

georgesterpu/Taris • 14 Dec 2020

In recent years, Automatic Speech Recognition (ASR) technology has approached human-level performance on conversational speech under relatively clean listening conditions.

Robust Self-Supervised Audio-Visual Speech Recognition

facebookresearch/av_hubert • 5 Jan 2022

Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, as the model cannot determine which speaker to transcribe.

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

hltchkust/ci-avsr • 11 Jan 2022

With the rise of deep learning and intelligent vehicle, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.

