Lip to Speech Synthesis
3 papers with code • 1 benchmark • 2 datasets
Given a silent video of a speaker, generate the corresponding speech that matches the lip movements.
In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis.
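To make the idea of jointly modeling local (per-frame) and global (utterance-level) lip movements concrete, here is a minimal toy sketch of the data flow, not the VCA-GAN implementation: all shapes, projections, and the fusion step are illustrative assumptions, using random matrices in place of learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 3 s of silent video at 25 fps, cropped to the lip region,
# mapped to an 80-bin mel spectrogram at 100 frames/s (4 mel frames per video frame).
T, H, W = 75, 48, 96
n_mels, mel_per_frame = 80, 4

video = rng.standard_normal((T, H, W))          # silent lip-region video input

# "Local" features: one vector per video frame (a random projection stands in
# for a learned visual encoder).
proj = rng.standard_normal((H * W, 128)) / np.sqrt(H * W)
local = video.reshape(T, -1) @ proj             # (T, 128)

# "Global" context: a single utterance-level summary vector.
global_ctx = local.mean(axis=0, keepdims=True)  # (1, 128)

# Fuse local and global information, then decode each video frame
# into mel_per_frame spectrogram frames.
fused = local + global_ctx                      # broadcasts over time
decoder = rng.standard_normal((128, mel_per_frame * n_mels)) / np.sqrt(128)
mel = (fused @ decoder).reshape(T * mel_per_frame, n_mels)

print(mel.shape)  # (300, 80): mel-spectrogram frames x mel bins
```

In the actual model, the random projections would be learned convolutional and attention modules, and the mel output would be trained adversarially and then vocoded to a waveform; this sketch only shows how local and global visual features can be combined before decoding to speech features.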