Lip Reading
46 papers with code • 3 benchmarks • 5 datasets
Lip reading is the task of inferring the speech content of a video using only visual information, especially the lip movements. It has many crucial practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.
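To make the task concrete, here is a minimal, deliberately toy sketch of a word-level lip-reading pipeline: a hand-crafted per-frame "mouth openness" feature stands in for a learned visual front-end (real systems use e.g. 3D CNNs), and dynamic time warping against word templates stands in for a trained sequence classifier. All feature values, templates, and words are hypothetical illustration data, not from any dataset above.

```python
import numpy as np

def mouth_openness(frame):
    """Toy per-frame feature: mean intensity of a (hypothetical) mouth crop.
    A real system would use a learned front-end such as a 3D CNN."""
    return float(frame.mean())

def dtw_distance(a, b):
    """Dynamic time warping between two 1-D feature sequences, so the same
    word spoken at different speeds can still be compared."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def classify(frames, templates):
    """Label a clip with the word whose template trajectory is nearest in DTW."""
    seq = [mouth_openness(f) for f in frames]
    return min(templates, key=lambda word: dtw_distance(seq, templates[word]))

# Hypothetical openness trajectories for two words.
templates = {"yes": [0.1, 0.6, 0.9, 0.6, 0.1], "no": [0.1, 0.3, 0.3, 0.1]}
# A synthetic 5-frame clip of 4x4 "mouth crops" that roughly follows "yes".
clip = [np.full((4, 4), v) for v in (0.1, 0.5, 0.85, 0.6, 0.2)]
print(classify(clip, templates))  # → yes
```

In practice the template matcher is replaced by an end-to-end network trained with CTC or attention-based losses, but the structure is the same: per-frame visual features followed by temporal alignment and decoding.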
Source: Mutual Information Maximization for Effective Lip Reading
Latest papers with no code
Leveraging Visemes for Better Visual Speech Representation and Lip Reading
We evaluate our approach on various tasks, including word-level and sentence-level lip reading, and audio-visual speech recognition using the Arman-AV dataset, a large-scale Persian corpus.
Emotional Speech-Driven Animation with Content-Emotion Disentanglement
While the best recent methods generate 3D animations that are synchronized with the input audio, they largely ignore the impact of emotions on facial expressions.
Deep Learning-based Spatio-Temporal Facial Feature Visual Speech Recognition
In low-resource computing contexts, such as smartphones and other small devices, both deep learning and machine learning are used in many identification systems.
PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors
Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing.
Word-level Persian Lipreading Dataset
Lip-reading has made impressive progress in recent years, driven by advances in deep learning.
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Furthermore, when combined with large-scale pseudo-labeled audio-visual data, SynthVSR yields a new state-of-the-art VSR WER of 16.9% using publicly available data only, surpassing the recent state-of-the-art approaches trained with 29 times more non-public machine-transcribed video data (90,000 hours).
A large-scale multimodal dataset of human speech recognition
The dataset has been validated and has potential for research on lip reading and multimodal speech recognition.
Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices
Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise.
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
In addition, we have proposed a technique to detect visemes (a visual equivalent of a phoneme) in Persian.
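The viseme concept mentioned above is a many-to-one grouping: several phonemes that look alike on the lips collapse into one visual class, which is why lip reading is inherently ambiguous. The grouping below is a hypothetical English-like illustration, not the Persian inventory proposed in the Arman-AV work.

```python
# Hypothetical many-to-one phoneme-to-viseme grouping. Real inventories are
# language-specific (e.g., the Persian viseme set in the Arman-AV paper).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar",
    "a": "open", "e": "mid", "i": "spread", "o": "round", "u": "round",
}

def to_visemes(phonemes):
    """Collapse a phoneme sequence to its viseme sequence; phonemes that are
    visually indistinguishable map to the same viseme class."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]

print(to_visemes(["b", "a", "t"]))  # → ['bilabial', 'open', 'alveolar']
```

Note that "bat" and "mat" produce identical viseme sequences (they are homophenes), so a viseme-level model must rely on context to disambiguate them.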
Speech Driven Video Editing via an Audio-Conditioned Diffusion Model
Taking inspiration from recent developments in visual generative tasks using diffusion models, we propose a method for end-to-end speech-driven video editing using a denoising diffusion model.