Visual Speech Recognition
40 papers with code • 2 benchmarks • 5 datasets
Latest papers with no code
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish
Several studies have shown the importance of visual cues throughout the speech perception process.
Analysis of Visual Features for Continuous Lipreading in Spanish
In this paper, we analyze different visual speech features to identify which best captures the nature of lip movements in natural Spanish and, in this way, addresses the automatic visual speech recognition task.
End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation and Lateral Inhibition
Lip reading or visual speech recognition has gained significant attention in recent years, particularly because of hardware development and innovations in computer vision.
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR).
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Unlike previous methods, which improved VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for different languages without human intervention.
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into improving the accuracy of back-end speech recognition systems through AVTSE in challenging, real acoustic environments.
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements.
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
Visual Speech Recognition (VSR) differs from common perception tasks in that it requires deeper reasoning over the video sequence, even for human experts.
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition
We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute WER improvement of more than 2% compared to the dense equivalent.
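As a rough illustration of what a "50% sparse" model means, here is a minimal sketch of unstructured magnitude pruning in NumPy; the function name and thresholding scheme are assumptions for illustration, not the actual SparseVSR method.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that `sparsity` fraction are zero.

    Hypothetical helper: a generic magnitude-pruning sketch, not the paper's code.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only larger-magnitude weights
    return weights * mask

# Example: prune a random weight matrix to ~50% sparsity
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_sparse = magnitude_prune(w, 0.5)
print((w_sparse == 0).mean())  # fraction of zeroed weights, ~0.5
```

In practice, sparse models like this are typically fine-tuned after pruning to recover accuracy before evaluation.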
Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey
We also provide a comprehensive overview of the various datasets used in VSR research and the preprocessing techniques employed to achieve speaker independence.