Lip Reading

46 papers with code • 3 benchmarks • 5 datasets

Lip reading is the task of inferring the speech content of a video using only visual information, especially the lip movements. It has many practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.

Source: Mutual Information Maximization for Effective Lip Reading

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

sally-sh/vsp-llm 23 Feb 2024

In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements.


Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism

g-milis/NEUTART 11 Dec 2023

Our method, which we call NEUral Text to ARticulate Talk (NEUTART), is a talking face generator that uses a joint audiovisual feature space, as well as speech-informed 3D facial reconstructions and a lip-reading loss for visual supervision.


Do VSR Models Generalize Beyond LRS3?

yasserdahouml/vsr_test_set 23 Nov 2023

The Lip Reading Sentences-3 (LRS3) benchmark has primarily been the focus of intense research in visual speech recognition (VSR) during the last few years.


Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading

jinchiniao/LSHUC 8 Oct 2023

For deep layers, where both the speaker's characteristics and the speech content are well represented, we introduce speaker-adaptive features that learn to suppress speech-content-irrelevant noise for robust lip reading.

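The speaker-adaptive suppression described above can be pictured as a learned gate on hidden channels. Below is a minimal sketch of that idea, assuming a hypothetical interface (a sigmoid gate computed from a speaker embedding); the actual LSHUC layers differ in detail.

```python
import numpy as np

def speaker_adaptive_gate(hidden, speaker_embedding, W):
    """Scale each hidden channel by a per-speaker sigmoid gate.

    hidden: (T, C) frame-level visual features
    speaker_embedding: (D,) embedding of the current speaker
    W: (D, C) learned projection from speaker space to channel gates

    Channels that carry speaker-specific (content-irrelevant) variation
    can be learned to be gated down, leaving speech content intact.
    """
    gate = 1.0 / (1.0 + np.exp(-(speaker_embedding @ W)))  # (C,), in (0, 1)
    return hidden * gate
```

With a zero speaker embedding every gate is 0.5, so all channels are scaled uniformly; training would push the gates apart per speaker.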

SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces

psyai-net/SelfTalk_release 19 Jun 2023

To enhance the visual accuracy of generated lip movements while reducing the dependence on labeled data, we propose a novel framework, SelfTalk, which incorporates self-supervision into a cross-modal network system to learn 3D talking faces.


OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment

exgc/opensr 10 Jun 2023

We demonstrate that OpenSR enables modality transfer from one to any in three different settings (zero-, few- and full-shot), and achieves highly competitive zero-shot performance compared to the existing few-shot and full-shot lip-reading methods.


LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading

yochaiye/LipVoicer 5 Jun 2023

We then condition a diffusion model on the video and use the extracted text through a classifier-guidance mechanism where a pre-trained ASR serves as the classifier.

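The classifier-guidance mechanism mentioned above combines the diffusion model's denoising direction with the gradient of a classifier's log-probability for the target text. A toy sketch of that combination, with stand-in functions rather than LipVoicer's actual networks:

```python
import numpy as np

def guided_score(score_fn, classifier_grad_fn, x, guidance_scale=2.0):
    """Classifier guidance: add the classifier's log-prob gradient
    (here the classifier stands in for a pre-trained ASR scoring the
    target transcript) to the unconditional score, scaled by a weight.
    Both model functions are hypothetical stand-ins."""
    return score_fn(x) + guidance_scale * classifier_grad_fn(x)

# Toy 1-D example: the score pulls x toward 0, the classifier toward 1.
score_fn = lambda x: -x                 # gradient of log N(0, 1)
classifier_grad_fn = lambda x: 1.0 - x  # gradient of log N(1, 1)

x = np.array([0.0])
for _ in range(50):
    x = x + 0.1 * guided_score(score_fn, classifier_grad_fn, x)
```

With guidance scale 2, the iterate settles between the two attractors, weighted toward the classifier's target, which is exactly the trade-off the guidance weight controls.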

A Novel Interpretable and Generalizable Re-synchronization Model for Cued Speech based on a Multi-Cuer Corpus

lufei321/resync-cs 5 Jun 2023

Cued Speech (CS) is a multi-modal visual coding system combining lip reading with several hand cues at the phonetic level to make the spoken language visible to the hearing impaired.


Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

sxjdwang/talklip CVPR 2023

To address the problem, we propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing the incorrect generation results.

29 Mar 2023
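A lip-reading expert used for supervision, as in the entry above, typically contributes a cross-entropy penalty: the frozen expert transcribes the generated lip region and is scored against the ground-truth text. A minimal sketch under that assumption (the interface below is illustrative, not the paper's actual loss):

```python
import numpy as np

def lip_reading_loss(expert_logits, target_tokens):
    """Cross-entropy between a frozen lip-reading expert's predictions on
    generated frames and the ground-truth transcript.

    expert_logits: (T, V) per-frame scores over a vocabulary of V tokens
    target_tokens: (T,) ground-truth token ids

    A high loss means the expert cannot read the intended words off the
    generated lips, penalizing unintelligible generations.
    """
    z = expert_logits - expert_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(target_tokens)), target_tokens].mean()
```

Confident, correct expert predictions drive the loss toward zero, while uniform logits give a loss of log V, so the generator is rewarded only when the expert can recover the transcript.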

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition

rongjiehuang/transpeech ICCV 2023

However, despite researchers exploring cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, there is still a shortage of cross-lingual studies on visual speech.

09 Mar 2023