Visual Speech Recognition

40 papers with code • 2 benchmarks • 5 datasets

Visual Speech Recognition (VSR, also known as lipreading) is the task of transcribing speech from visual information alone, typically video of a speaker's lip movements, without access to the audio signal.

Latest papers with no code

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

no code yet • 21 Mar 2024

XLAVS-R is designed to maximize the benefits of limited multilingual AV pre-training data by building on top of audio-only multilingual pre-training and simplifying existing pre-training schemes.

JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition

no code yet • 4 Mar 2024

Visual Speech Recognition (VSR) tasks are generally recognized to have a lower theoretical performance ceiling than Automatic Speech Recognition (ASR), owing to the inherent limitations of conveying semantic information visually.
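The paper's JEP-KD architecture is not reproduced here; as a minimal, hedged illustration of the general idea behind joint-embedding predictive knowledge distillation, the sketch below passes a student (visual) embedding through a small predictor and regresses it toward a frozen teacher embedding. All function names, shapes, and the linear predictor are illustrative assumptions, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)

def predictor(z_student, W, b):
    """Hypothetical linear predictor mapping student embeddings
    into the teacher's embedding space."""
    return z_student @ W + b

def jep_distill_loss(z_student, z_teacher, W, b):
    """Mean squared error between predicted and teacher embeddings.
    In a real training loop the teacher is frozen and gradients
    flow only to the student encoder and the predictor."""
    z_pred = predictor(z_student, W, b)
    return float(np.mean((z_pred - z_teacher) ** 2))

# Toy batch: 4 frames of 8-dim student and teacher features.
z_s = rng.standard_normal((4, 8))
z_t = rng.standard_normal((4, 8))
W = np.eye(8)      # identity predictor, just for the toy example
b = np.zeros(8)

loss = jep_distill_loss(z_s, z_t, W, b)
```

Regressing in embedding space rather than on output logits is what distinguishes this family of methods from classic logit-matching distillation.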

Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition

no code yet • 20 Feb 2024

The rise of deep learning and the availability of large-scale audio-visual databases have driven recent advances in Visual Speech Recognition (VSR).
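A full hybrid vs. CTC/attention comparison is beyond a snippet, but the CTC side of such decoders rests on a simple rule: merge consecutive repeated symbols, then drop blanks. A minimal greedy CTC collapse (token ids are illustrative):

```python
def ctc_greedy_collapse(frame_labels, blank=0):
    """Collapse a per-frame label sequence CTC-style:
    merge consecutive repeats, then remove blank tokens."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev:          # merge consecutive repeats
            if lab != blank:     # drop blank symbols
                out.append(lab)
        prev = lab
    return out

# Frames: blank, h, h, blank, e, e, l, blank, l, o
seq = [0, 8, 8, 0, 5, 5, 12, 0, 12, 15]
decoded = ctc_greedy_collapse(seq)  # -> [8, 5, 12, 12, 15]
```

Note that the blank between the two `12`s is what lets CTC emit a doubled letter (as in "hello"); without it the repeats would merge.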

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

no code yet • 8 Feb 2024

Recent studies have shown that large language models (LLMs) can be used successfully for generative error correction (GER) on top of automatic speech recognition (ASR) output.

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

no code yet • 18 Jan 2024

By using the visual speech units as the inputs of our system, we pre-train the model to predict corresponding text outputs on massive multilingual data constructed by merging several VSR databases.
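In work of this kind, discrete speech units are typically obtained by quantizing continuous self-supervised features against a learned codebook. As a hedged sketch of that discretization step, nearest-centroid assignment (k-means style) can be written as follows; the toy features and centroids stand in for real model outputs and a real codebook.

```python
import numpy as np

def to_discrete_units(features, centroids):
    """Map each continuous feature vector (T, D) to the index of its
    nearest centroid (K, D), yielding a sequence of T discrete units."""
    # Broadcast (T, 1, D) - (1, K, D) -> (T, K) squared distances.
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Toy example: 2-D features, a 2-entry codebook.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
feats = np.array([[0.1, -0.2], [9.8, 10.1], [0.3, 0.0]])
units = to_discrete_units(feats, centroids)  # -> [0, 1, 0]
```

The resulting unit sequence can then serve as a compact, language-agnostic input for multilingual pre-training, which is the role the paper assigns to its visual speech units.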

SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition

no code yet • 18 Jan 2024

Audio-visual speech recognition (AVSR) is a multimodal extension of automatic speech recognition (ASR), using video as a complement to audio.

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

no code yet • 7 Jan 2024

While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.
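The fusion step underlying such systems can be illustrated with a single cross-attention head in which visual features form queries attending over audio features; the single-head form and all shapes below are simplifying assumptions, not the paper's multi-layer design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(visual, audio):
    """Single-head cross-attention: visual queries attend over audio
    keys/values, yielding audio-informed visual features."""
    d = visual.shape[-1]
    scores = visual @ audio.T / np.sqrt(d)   # (Tv, Ta) similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ audio                   # (Tv, d) fused output

rng = np.random.default_rng(1)
vis = rng.standard_normal((5, 8))   # 5 visual frames, dim 8
aud = rng.standard_normal((7, 8))   # 7 audio frames, dim 8
fused = cross_attend(vis, aud)      # shape (5, 8)
```

Because the attention weights are normalized per visual frame, noisy audio frames can be down-weighted, which is the intuition behind using such fusion for noise robustness.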

LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data

no code yet • 15 Dec 2023

This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) leveraging speech representations produced by any trained Automatic Speech Recognition (ASR) model.

The GUA-Speech System Description for CNVSRC Challenge 2023

no code yet • 12 Dec 2023

This study describes our system for the fixed track of Task 1, Single-speaker Visual Speech Recognition (VSR), in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023.

Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish

no code yet • 21 Nov 2023

Different studies have shown the importance of visual cues throughout the speech perception process.