Speaker-Specific Lip to Speech Synthesis

3 papers with code • 7 benchmarks • 2 datasets

How accurately can we infer an individual’s speech style and content from his/her lip movements? [1]

In this task, the model is trained on a single specific speaker or a very small set of speakers, allowing it to learn that speaker's individual voice and speaking style.
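
As a concrete illustration (directory layout and names are hypothetical, not taken from any of the repositories below), the training split for this task is filtered by speaker identity rather than mixed across a whole corpus:

```python
from pathlib import Path

# Hypothetical layout: one directory per speaker, each silent face-crop
# video paired with the audio track it was separated from.
DATASET_ROOT = Path("data/lip2speech")

def speaker_specific_split(speaker_id: str) -> list[tuple[Path, Path]]:
    """Collect (video, audio) pairs for a single speaker.

    Speaker-specific lip-to-speech trains and evaluates on one speaker
    (or a handful), so clips are selected by identity up front.
    """
    clips = sorted((DATASET_ROOT / speaker_id).glob("*.mp4"))
    return [(clip, clip.with_suffix(".wav")) for clip in clips]

train_pairs = speaker_specific_split("speaker_01")
```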

[1] Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis, CVPR 2020.

Most implemented papers

Densely Connected Convolutional Networks

liuzhuang13/DenseNet CVPR 2017

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output.
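The "shorter connections" are dense connectivity: within a dense block, every layer receives the concatenation of the block input and all earlier layers' outputs. A minimal PyTorch sketch of that pattern (omitting the paper's 1x1 bottleneck and transition layers):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One layer of a dense block: BN -> ReLU -> 3x3 conv, producing
    growth_rate new feature maps from everything seen so far."""

    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(torch.relu(self.norm(x)))

class DenseBlock(nn.Module):
    """Each layer's input is the concatenation of the block input and the
    outputs of all previous layers -- the short paths between early and
    late layers that the abstract refers to."""

    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```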

Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis

Rudrabha/Lip2Wav CVPR 2020

In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
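
The model follows the familiar sequence-to-sequence pattern: a spatio-temporal encoder over face crops feeds an attention decoder that predicts mel-spectrogram frames, which a vocoder then turns into a waveform. The sketch below shows only that general pattern; the layer choices and sizes are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class Lip2SpeechSketch(nn.Module):
    """Minimal sketch of the lip-to-speech encoder-decoder pattern:
    a spatio-temporal encoder over face crops, and a teacher-forced
    decoder with attention that predicts mel-spectrogram frames.
    All sizes here are illustrative, not the paper's exact ones."""

    def __init__(self, n_mels: int = 80, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Conv3d(3, hidden, kernel_size=(5, 7, 7),
                                 stride=(1, 2, 2), padding=(2, 3, 3))
        self.squeeze = nn.AdaptiveAvgPool3d((None, 1, 1))
        self.prenet = nn.GRU(n_mels, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_mels)

    def forward(self, video: torch.Tensor, mels: torch.Tensor) -> torch.Tensor:
        # video: (B, 3, T_video, H, W) face crops
        # mels:  (B, T_mel, n_mels) ground-truth mel frames (teacher forcing)
        enc = self.squeeze(self.encoder(video))           # (B, hidden, T_video, 1, 1)
        enc = enc.flatten(2).transpose(1, 2)              # (B, T_video, hidden)
        # Condition each decoding step on the previous mel frame.
        prev = torch.cat([torch.zeros_like(mels[:, :1]), mels[:, :-1]], dim=1)
        queries, _ = self.prenet(prev)                    # (B, T_mel, hidden)
        context, _ = self.attn(queries, enc, enc)         # attend over video features
        return self.proj(torch.cat([queries, context], dim=-1))  # (B, T_mel, n_mels)
```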

Speech Reconstruction with Reminiscent Sound via Visual Voice Memory

joannahong/Speech-Reconstruction-with-Reminiscent-Sound-via-Visual-Voice-Memory IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021

Our key contributions are: (1) proposing the Visual Voice memory, which brings rich audio information that complements the visual features, thus producing high-quality speech from silent video, and (2) enabling multi-speaker and unseen-speaker training by memorizing auditory features and the corresponding visual features.
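
The Visual Voice memory can be read as a key-value store addressed by visual features: lip features are matched against learned keys, and the retrieved values carry the associated audio information. A hedged sketch of that addressing step (slot count, dimensions, and the surrounding encoders are assumptions, not the paper's values):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualVoiceMemorySketch(nn.Module):
    """Key-value memory in the spirit of the paper: visual features
    address the slots, and the retrieved values hold audio features,
    so voice information can be recalled from silent video."""

    def __init__(self, num_slots: int = 128, dim: int = 256):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim))    # visual side
        self.values = nn.Parameter(torch.randn(num_slots, dim))  # audio side

    def forward(self, visual_feat: torch.Tensor) -> torch.Tensor:
        # visual_feat: (B, T, dim) features from a lip/face encoder
        scores = visual_feat @ self.keys.t()       # (B, T, num_slots)
        weights = F.softmax(scores, dim=-1)        # soft slot addressing
        return weights @ self.values               # recalled audio features
```

During training, features from an encoder over the ground-truth audio would supervise the recalled values (e.g., via a reconstruction loss), so that at inference the memory can supply voice cues from the silent video alone.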