Lip Reading

46 papers with code • 3 benchmarks • 5 datasets

Lip Reading is the task of inferring speech content from a video using only visual information, especially lip movements. It has many crucial practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.

Source: Mutual Information Maximization for Effective Lip Reading
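
To make the task concrete, below is a minimal sketch of a common visual-only pipeline: a 3D-CNN front-end over mouth crops followed by a recurrent back-end and a word classifier. The layer sizes, frame count, and vocabulary are illustrative assumptions, not any specific paper's architecture.

```python
import torch
import torch.nn as nn

class LipReader(nn.Module):
    """Minimal word-level lip-reading sketch: a 3D-CNN front-end over
    grayscale mouth crops, a GRU back-end, and a word classifier.
    All sizes are illustrative."""

    def __init__(self, vocab_size=500):
        super().__init__()
        # Front-end: captures short-range spatio-temporal lip motion.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep time axis, pool space
        )
        # Back-end: models longer-range temporal context.
        self.gru = nn.GRU(32, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, vocab_size)

    def forward(self, x):            # x: (batch, 1, time, H, W)
        f = self.frontend(x)         # (batch, 32, time, 1, 1)
        f = f.squeeze(-1).squeeze(-1).transpose(1, 2)  # (batch, time, 32)
        h, _ = self.gru(f)
        return self.head(h.mean(dim=1))  # average over time -> word logits

# Usage: a batch of 2 clips, 29 frames of 88x88 mouth crops.
logits = LipReader()(torch.randn(2, 1, 29, 88, 88))
print(logits.shape)  # torch.Size([2, 500])
```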

Latest papers with no code

Leveraging Visemes for Better Visual Speech Representation and Lip Reading

no code yet • 19 Jul 2023

We evaluate our approach on various tasks, including word-level and sentence-level lip reading, and audio-visual speech recognition using the Arman-AV dataset, a large-scale Persian corpus.

Emotional Speech-Driven Animation with Content-Emotion Disentanglement

no code yet • 15 Jun 2023

While the best recent methods generate 3D animations that are synchronized with the input audio, they largely ignore the impact of emotions on facial expressions.

Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition

no code yet • 30 Apr 2023

In low-resource computing contexts, such as smartphones and other small devices, both deep learning and machine learning are used in many identification systems.

PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors

no code yet • 11 Apr 2023

Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing.

Word-level Persian Lipreading Dataset

no code yet • 8 Apr 2023

Lip-reading has made impressive progress in recent years, driven by advances in deep learning.

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

no code yet • CVPR 2023

Furthermore, when combined with large-scale pseudo-labeled audio-visual data, SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours).
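
Word error rate (WER), the metric quoted above, is the standard VSR measure: the edit distance between hypothesis and reference transcripts, normalized by reference length, so 16.9% is roughly one word error per six reference words. A minimal reference implementation of this standard definition:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) divided by
    reference length, via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / len(ref)

print(wer("place blue at f two now", "place blue at f too now"))  # 0.1666...
```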

A large-scale multimodal dataset of human speech recognition

no code yet • 15 Mar 2023

The dataset has been validated and offers potential for research on lip reading and multimodal speech recognition.

Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices

no code yet • Sensors 2023

Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise.
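
As a toy illustration of why the visual stream helps under audio noise, here is a late-fusion sketch that concatenates per-frame audio and visual features before classification. The feature dimensions and the fusion head are placeholder assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LateFusionAVSR(nn.Module):
    """Toy late-fusion AVSR head: concatenate audio and visual features
    per frame, then classify. When the audio stream is corrupted by noise,
    the visual stream still carries lip-shape evidence."""

    def __init__(self, audio_dim=80, visual_dim=128, vocab_size=40):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, 256),
            nn.ReLU(),
            nn.Linear(256, vocab_size),
        )

    def forward(self, audio_feat, visual_feat):
        # audio_feat: (batch, time, audio_dim); visual_feat: (batch, time, visual_dim)
        return self.fuse(torch.cat([audio_feat, visual_feat], dim=-1))

# Usage: 75 aligned frames of audio and visual features.
logits = LateFusionAVSR()(torch.randn(2, 75, 80), torch.randn(2, 75, 128))
print(logits.shape)  # torch.Size([2, 75, 40])
```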

A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset

no code yet • 21 Jan 2023

In addition, we have proposed a technique to detect visemes (a visual equivalent of a phoneme) in Persian.
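
A viseme groups phonemes that look alike on the lips (e.g. /p/, /b/, /m/ all involve a closed-lip pose), so viseme detection amounts to a many-to-one mapping from phonemes. The paper's Persian-specific technique is not public; the sketch below only illustrates the general mapping idea with a small, illustrative English-style grouping.

```python
# Illustrative phoneme-to-viseme grouping (a common English-style example,
# NOT the paper's Persian viseme inventory).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",   # closed lips
    "f": "labiodental", "v": "labiodental",              # lip-to-teeth contact
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "o": "rounded", "u": "rounded", "w": "rounded",      # rounded lips
}

def to_visemes(phonemes):
    """Collapse a phoneme sequence to its viseme sequence; phonemes outside
    the table fall back to a generic class."""
    return [PHONEME_TO_VISEME.get(p, "other") for p in phonemes]

print(to_visemes(["b", "u", "t"]))  # ['bilabial', 'rounded', 'alveolar']
```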

Speech Driven Video Editing via an Audio-Conditioned Diffusion Model

no code yet • 10 Jan 2023

Taking inspiration from recent developments in visual generative tasks using diffusion models, we propose a method for end-to-end speech-driven video editing using a denoising diffusion model.
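
As background on the technique named here: a denoising diffusion model learns to invert a gradual Gaussian noising process, and generation repeatedly applies the learned denoiser, with conditioning (here, audio features) steering each reverse step. Below is a toy DDPM-style sketch of the forward noising step and one reverse update; eps_model and the audio conditioning are placeholder assumptions, not the paper's released model.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)  # cumulative product over steps

def forward_noise(x0, t):
    """q(x_t | x_0): blend the clean frame x0 with Gaussian noise."""
    eps = torch.randn_like(x0)
    xt = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return xt, eps  # the network is trained to predict eps from (xt, t, audio)

@torch.no_grad()
def reverse_step(eps_model, xt, t, audio_feat):
    """One DDPM reverse update p(x_{t-1} | x_t), conditioned on audio."""
    eps_hat = eps_model(xt, t, audio_feat)  # predicted noise (placeholder net)
    mean = (xt - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
    if t == 0:
        return mean
    return mean + betas[t].sqrt() * torch.randn_like(xt)

# Usage of the forward process on a dummy 64x64 RGB frame:
xt, eps = forward_noise(torch.randn(1, 3, 64, 64), t=500)
```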