Lip to Speech Synthesis
6 papers with code • 1 benchmark • 2 datasets
Given a silent video of a speaker, generate the corresponding speech that matches the lip movements.
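A minimal sketch of the typical pipeline, assuming a PyTorch-style encoder-decoder: a visual front-end encodes lip motion, a temporal model maps it to a mel-spectrogram, and a separately trained neural vocoder would render the waveform. All module names, dimensions, and layer choices here are illustrative, not taken from any of the papers below.

```python
import torch
import torch.nn as nn

class LipToSpeech(nn.Module):
    """Illustrative lip-to-speech skeleton: video frames in, mel-spectrogram out.
    A separately trained vocoder (e.g. HiFi-GAN) would convert mels to audio."""
    def __init__(self, n_mels: int = 80, d_model: int = 256):
        super().__init__()
        # 3D conv front-end captures short-range lip motion across frames
        self.visual_frontend = nn.Conv3d(3, d_model, kernel_size=(5, 7, 7),
                                         stride=(1, 2, 2), padding=(2, 3, 3))
        self.pool = nn.AdaptiveAvgPool3d((None, 1, 1))  # keep time, drop space
        self.temporal = nn.GRU(d_model, d_model, batch_first=True)
        self.mel_head = nn.Linear(d_model, n_mels)

    def forward(self, video):            # video: (B, 3, T, H, W)
        x = self.visual_frontend(video)  # (B, C, T, H', W')
        x = self.pool(x).squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, C)
        x, _ = self.temporal(x)
        # Real systems also upsample in time, since mel frames are denser
        # than video frames; omitted here for brevity
        return self.mel_head(x)          # (B, T, n_mels)
```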
Most implemented papers
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
To this end, we design multi-task learning that guides the model using multimodal supervision, i.e., text and audio, to complement the insufficient word representations learned from the acoustic feature reconstruction loss alone.
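One hedged way to realize such multimodal supervision (illustrative only, not the paper's exact objective): pair a mel reconstruction loss with a CTC loss on predicted text, so the text branch constrains word content that the acoustic loss under-specifies.

```python
import torch.nn.functional as F

def multitask_loss(pred_mel, target_mel, text_logits, text_targets,
                   logit_lens, target_lens, lambda_ctc: float = 0.5):
    """Hypothetical multi-task objective combining audio and text supervision."""
    # Audio supervision: frame-wise L1 between predicted and ground-truth mels
    recon = F.l1_loss(pred_mel, target_mel)
    # Text supervision: CTC over character/subword logits; expects log-probs
    # shaped (T, B, V) and integer target/length tensors from the caller
    log_probs = text_logits.log_softmax(-1).transpose(0, 1)
    ctc = F.ctc_loss(log_probs, text_targets, logit_lens, target_lens)
    return recon + lambda_ctc * ctc
```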
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
Lip to Speech Synthesis with Visual Context Attentional GAN
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis.
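For flavor, a generic adversarial training step for mel generation; the actual VCA-GAN additionally applies visual context attention to fuse local and global lip features, which this sketch omits. `gen`, `disc`, and the optimizers are assumed to be supplied by the caller.

```python
import torch
import torch.nn.functional as F

def gan_step(gen, disc, video, real_mel, opt_g, opt_d, lambda_adv: float = 0.1):
    """One generic adversarial training step for lip-to-speech mel generation."""
    fake_mel = gen(video)

    # Discriminator: push real mels toward 1, generated mels toward 0
    d_real = disc(real_mel)
    d_fake = disc(fake_mel.detach())
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: fool the discriminator while staying close to the target mel
    d_fake = disc(fake_mel)
    g_loss = (F.l1_loss(fake_mel, real_mel)
              + lambda_adv * F.binary_cross_entropy_with_logits(
                  d_fake, torch.ones_like(d_fake)))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return g_loss.item(), d_loss.item()
```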
Show Me Your Face, And I'll Tell You How You Speak
When we speak, the prosody and content of the speech can be inferred from the movement of our lips.
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis
To tackle these problems, we propose FastLTS, a non-autoregressive end-to-end model which can directly synthesize high-quality speech audio from unconstrained talking videos with low latency, and has a relatively small model size.
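A sketch of what non-autoregressive synthesis buys: all output frames come from a single parallel pass plus a learned temporal upsampler, so latency does not grow with frame-by-frame autoregressive decoding. The frame rates and layer sizes below are assumptions, not FastLTS's actual configuration.

```python
import torch
import torch.nn as nn

class NARDecoder(nn.Module):
    """Non-autoregressive decoder sketch: every mel frame is predicted in one
    parallel pass. Upsampling factor 4 assumes 25 fps video and 100 fps mels."""
    def __init__(self, d_model: int = 256, n_mels: int = 80, upsample: int = 4):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        # Transposed conv stretches T video-rate steps to T*upsample mel-rate steps
        self.upsampler = nn.ConvTranspose1d(d_model, d_model,
                                            kernel_size=upsample, stride=upsample)
        self.mel_head = nn.Linear(d_model, n_mels)

    def forward(self, visual_feats):                 # (B, T, d_model)
        x = self.encoder(visual_feats)               # parallel over all frames
        x = self.upsampler(x.transpose(1, 2)).transpose(1, 2)  # (B, T*4, d_model)
        return self.mel_head(x)                      # (B, T*4, n_mels)
```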
Intelligible Lip-to-Speech Synthesis with Speech Units
The proposed L2S model is trained to generate multiple targets: mel-spectrograms and speech units.
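A minimal sketch of the dual-target idea, assuming shared features feed two heads: L1 loss on the continuous mel-spectrogram and cross-entropy on discrete speech units (e.g. cluster indices of self-supervised speech features). The unit vocabulary size is a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTargetHead(nn.Module):
    """Predict two targets from shared features: continuous mels and
    discrete speech units. Sizes here are illustrative placeholders."""
    def __init__(self, d_model: int = 256, n_mels: int = 80, n_units: int = 200):
        super().__init__()
        self.mel_head = nn.Linear(d_model, n_mels)
        self.unit_head = nn.Linear(d_model, n_units)

    def forward(self, feats):                        # feats: (B, T, d_model)
        return self.mel_head(feats), self.unit_head(feats)

def dual_loss(mel_pred, mel_tgt, unit_logits, unit_tgt, lambda_unit: float = 1.0):
    # L1 on the continuous target, cross-entropy on the discrete units
    return F.l1_loss(mel_pred, mel_tgt) + lambda_unit * F.cross_entropy(
        unit_logits.flatten(0, 1), unit_tgt.flatten())
```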