Visual Speech Recognition
40 papers with code • 2 benchmarks • 5 datasets
Latest papers with no code
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish
Several studies have shown the importance of visual cues throughout the speech perception process.
Analysis of Visual Features for Continuous Lipreading in Spanish
In this paper, we analyze different visual speech features to identify which best captures the nature of lip movements in natural Spanish and, in this way, addresses the automatic visual speech recognition task.
End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation and Lateral Inhibition
Lip reading or visual speech recognition has gained significant attention in recent years, particularly because of hardware development and innovations in computer vision.
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR).
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Unlike previous methods, which improved VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for different languages without human intervention.
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into improving the accuracy of back-end speech recognition systems through AVTSE in challenging, real acoustic environments.
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements.
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
Visual Speech Recognition (VSR) differs from common perception tasks in that it requires deeper reasoning over the video sequence, even for human experts.
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition
We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute WER improvement of more than 2% compared to the dense equivalent.
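As a rough illustration of what a "50% sparse" model means, here is a minimal sketch of unstructured magnitude pruning in NumPy; the function name and thresholding scheme are assumptions for illustration, not the actual SparseVSR method.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that `sparsity` fraction are zero.

    Hypothetical helper: a generic magnitude-pruning sketch, not the paper's code.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only larger-magnitude weights
    return weights * mask

# Example: prune a random weight matrix to ~50% sparsity
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_sparse = magnitude_prune(w, 0.5)
print((w_sparse == 0).mean())  # fraction of zeroed weights, ~0.5
```

In practice, sparse models like this are typically fine-tuned after pruning to recover accuracy before evaluation.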
Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey
We also provide a comprehensive overview of the various datasets used in VSR research and the preprocessing techniques employed to achieve speaker independence.