Lip to Speech Synthesis
5 papers with code • 1 benchmark • 2 datasets
Given a silent video of a speaker, generate the corresponding speech that matches the lip movements.
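At a high level, most approaches follow an encoder-decoder recipe: a spatio-temporal visual encoder turns mouth crops into a feature sequence, and a decoder predicts an acoustic representation (typically a mel-spectrogram) that a separate vocoder converts to a waveform. The sketch below is a minimal PyTorch illustration of that pipeline; the module names, layer sizes, and input shapes are assumptions for exposition, not any particular paper's architecture.

```python
import torch
import torch.nn as nn

class LipToSpeech(nn.Module):
    """Illustrative lip-to-speech model: 3D-CNN encoder + GRU + mel head."""
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        # Spatio-temporal encoder over grayscale mouth crops: (B, 1, T, H, W)
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 64, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.Conv3d(64, hidden, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool space, keep time
        )
        self.rnn = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.mel_head = nn.Linear(2 * hidden, n_mels)

    def forward(self, frames):                       # frames: (B, 1, T, H, W)
        feats = self.encoder(frames)                 # (B, hidden, T, 1, 1)
        feats = feats.flatten(2).transpose(1, 2)     # (B, T, hidden)
        out, _ = self.rnn(feats)                     # (B, T, 2*hidden)
        return self.mel_head(out)                    # (B, T, n_mels)

# One mel frame per video frame here; real systems upsample the time axis
# to the mel hop rate and run a vocoder on the predicted spectrogram.
model = LipToSpeech()
mels = model(torch.randn(2, 1, 75, 48, 48))  # 75 frames of 48x48 mouth crops
```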
To this end, we design a multi-task learning scheme that guides the model with multimodal supervision, i.e., text and audio, to complement the insufficient word representations of the acoustic feature reconstruction loss.
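As a rough sketch of such multimodal supervision, the loss below pairs a mel-spectrogram reconstruction term with a CTC text term, so the model is also graded on word content rather than on acoustics alone. The function signature, the L1/CTC pairing, and the lambda_text weight are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred_mel, true_mel, text_logits, text_targets,
                   input_lens, target_lens, lambda_text=0.5):
    # Acoustic term: reconstruct the ground-truth mel-spectrogram.
    recon = F.l1_loss(pred_mel, true_mel)
    # Text term: CTC over per-frame character logits keeps word content honest.
    log_probs = text_logits.log_softmax(-1).transpose(0, 1)  # (T, B, vocab)
    ctc = F.ctc_loss(log_probs, text_targets, input_lens, target_lens)
    return recon + lambda_text * ctc
```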
In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis.
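The sketch below illustrates the local/global idea with a generic cross-attention block, in which per-frame (local) lip features query a sequence-level (global) visual context; it is a stand-in for exposition, not the VCA-GAN module itself.

```python
import torch
import torch.nn as nn

class VisualContextAttention(nn.Module):
    """Fuses local frame features with global visual context via attention."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_feats, global_context):
        # local_feats:    (B, T, dim) per-frame lip features (queries)
        # global_context: (B, S, dim) sequence-level context (keys/values)
        fused, _ = self.attn(local_feats, global_context, global_context)
        return self.norm(local_feats + fused)  # residual fusion of both views
```

The residual connection keeps fine-grained local motion intact while the attention output injects long-range context, which is the property the excerpt above attributes to jointly modeling local and global lip movements.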