Speaker-Specific Lip to Speech Synthesis
3 papers with code • 7 benchmarks • 2 datasets
How accurately can we infer an individual’s speech style and content from his/her lip movements? [1]
In this task, the model is trained on a specific speaker or a very limited set of speakers.
[1] Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis, CVPR 2020.
Most implemented papers
Densely Connected Convolutional Networks
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output.
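The "shorter connections" the abstract refers to are dense skip connections: each layer receives the channel-wise concatenation of the block input and every preceding layer's output. A minimal numpy sketch of this connectivity pattern (toy linear layers standing in for convolutions; all names and dimensions here are illustrative, not from the paper):

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    # Dense connectivity: each layer's input is the concatenation of the
    # block input and all previous layers' outputs. A toy linear map + ReLU
    # stands in for the paper's conv-BN-ReLU layers.
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=-1)            # all prior feature maps
        w = rng.standard_normal((inp.shape[-1], growth_rate))
        out = np.maximum(inp @ w, 0.0)                     # each layer adds `growth_rate` channels
        features.append(out)
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))                           # batch of 4, 16 input channels
y = dense_block(x, num_layers=3, growth_rate=8, rng=rng)
print(y.shape)  # channels grow linearly: 16 + 3 * 8 = 40
```

Note how the output width grows by `growth_rate` per layer rather than being fixed, which is what lets DenseNets reuse features instead of relearning them.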
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
Speech Reconstruction with Reminiscent Sound via Visual Voice Memory
Our key contributions are: (1) proposing the Visual Voice memory, which provides rich audio information that complements the visual features, thus producing high-quality speech from silent video, and (2) enabling multi-speaker and unseen-speaker training by memorizing auditory features and their corresponding visual features.
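The memory described above can be thought of as a key-value store: visual features address the memory, and the recalled values are stored auditory features. A hedged numpy sketch of such dot-product memory addressing (this is a generic attention-over-slots illustration, not the paper's actual architecture; all names and dimensions are assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax over the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def recall_audio(visual_feat, mem_keys, mem_values):
    # Address the memory with visual features via dot-product similarity,
    # then return a convex combination of the stored audio slots.
    scores = visual_feat @ mem_keys.T          # (frames, slots)
    attn = softmax(scores, axis=-1)            # each row sums to 1
    return attn @ mem_values                   # (frames, audio_dim)

rng = np.random.default_rng(1)
mem_keys = rng.standard_normal((32, 64))       # 32 memory slots, visual key dim 64
mem_values = rng.standard_normal((32, 80))     # paired audio slots (e.g. 80-bin mel features)
visual = rng.standard_normal((10, 64))         # features for 10 video frames
audio_hat = recall_audio(visual, mem_keys, mem_values)
print(audio_hat.shape)  # (10, 80)
```

Pairing the key and value slots during training is what lets the model recall plausible audio for speakers it has not seen, since addressing depends only on visual similarity.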