Visual Speech Recognition

40 papers with code • 2 benchmarks • 5 datasets

Latest papers with no code

Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish

no code yet • 21 Nov 2023

Different studies have shown the importance of visual cues throughout the speech perception process.

Analysis of Visual Features for Continuous Lipreading in Spanish

no code yet • 21 Nov 2023

In this paper, we analyze different visual speech features to identify which of them best captures the nature of lip movements in natural Spanish and is therefore best suited to the automatic visual speech recognition task.

End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation and Lateral Inhibition

no code yet • 7 Oct 2023

Lip reading or visual speech recognition has gained significant attention in recent years, particularly because of hardware development and innovations in computer vision.

AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition

no code yet • 29 Sep 2023

Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR).

Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

no code yet • 15 Sep 2023

Different from previous methods that tried to improve the VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for the different languages without human intervention.

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

no code yet • 15 Sep 2023

This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

no code yet • 15 Aug 2023

Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements.

Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping

no code yet • ICCV 2023

Visual Speech Recognition (VSR) differs from the common perception tasks as it requires deeper reasoning over the video sequence, even by human experts.

SparseVSR: Lightweight and Noise Robust Visual Speech Recognition

no code yet • 10 Jul 2023

We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent.
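For readers unfamiliar with the metric, the sketch below shows how word error rate (WER) is commonly computed and what an "absolute" improvement means. It is not the SparseVSR authors' evaluation code: the function name `word_error_rate` and the example transcripts are assumptions made purely for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, not taken from any dataset or paper.
reference = "a b c d"
dense_wer = word_error_rate(reference, "a x c d")   # one substitution
sparse_wer = word_error_rate(reference, "a b c d")  # exact match

# An absolute improvement is the plain difference between the two rates,
# as opposed to a relative improvement, which divides by the baseline.
absolute_improvement = dense_wer - sparse_wer
```

Under this definition, an absolute improvement of 2% WER means the error rate drops by two percentage points (for example from 40% to 38%), regardless of the baseline.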

Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey

no code yet • 14 Jun 2023

We also provide a comprehensive overview of the various datasets used in VSR research and the preprocessing techniques employed to achieve speaker independence.