Unconstrained Lip-synchronization

4 papers with code • 3 benchmarks • 3 datasets

Given a video of an arbitrary person and an arbitrary driving speech segment, the task is to generate a lip-synced video that matches the given speech.

This requires an approach that is not constrained by identity, voice, or language, i.e. one that generalizes to speakers and languages unseen during training.
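
To make the setup concrete, the sketch below shows the audio-video alignment step that any lip-sync method needs: mapping each video frame to the window of mel-spectrogram columns that should drive it. The frame rate, sample rate, hop length, and window size are illustrative assumptions, not values mandated by any of the papers listed here.

```python
# Map a video frame index to the [start, end) range of mel-spectrogram
# columns that should drive it. All parameter defaults are illustrative.
def mel_window_for_frame(frame_idx: int,
                         fps: float = 25.0,
                         sample_rate: int = 16000,
                         hop_length: int = 200,
                         window: int = 16) -> tuple[int, int]:
    mel_steps_per_sec = sample_rate / hop_length  # 80 mel columns/s here
    start = int(frame_idx / fps * mel_steps_per_sec)
    return start, start + window

# Frame 100 at 25 fps sits at t = 4.0 s, i.e. mel columns [320, 336).
print(mel_window_for_frame(100))
```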

Most implemented papers

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

Rudrabha/Wav2Lip 23 Aug 2020

Current works excel at producing accurate lip movements on static images or on videos of specific people seen during training; however, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.
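
The paper's key idea is to train the generator against a pre-trained, frozen lip-sync "expert" (a SyncNet-style discriminator) that scores whether a mouth crop matches an audio window. Below is a minimal sketch of such an expert sync loss, assuming the expert has already produced the two embeddings; rescaling the cosine similarity into [0, 1] is an illustrative choice, not the repository's exact code.

```python
import torch
import torch.nn.functional as F

def expert_sync_loss(audio_emb: torch.Tensor,
                     video_emb: torch.Tensor) -> torch.Tensor:
    """audio_emb, video_emb: (batch, dim) embeddings from a frozen,
    pre-trained sync expert for paired audio/video windows."""
    # Cosine similarity in [-1, 1], rescaled to a probability in [0, 1].
    prob = (F.cosine_similarity(audio_emb, video_emb) + 1.0) / 2.0
    # The generator is pushed to make its output look "in sync" (label 1).
    return F.binary_cross_entropy(prob, torch.ones_like(prob))

# Smoke test with random embeddings.
loss = expert_sync_loss(torch.randn(8, 512), torch.randn(8, 512))
```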

You said that?

joonson/yousaidthat 8 May 2017

To achieve this, we propose an encoder-decoder CNN model that uses a joint embedding of the face and audio to generate synthesised talking face video frames.
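
A minimal PyTorch sketch of that joint face/audio embedding is shown below; the layer sizes, the 96×96 face crop, and the 80×16 mel window are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TalkingFaceGenerator(nn.Module):
    """Encode a still face and an audio window separately, concatenate
    the two embeddings, and decode one talking-face frame."""
    def __init__(self, emb_dim: int = 256):
        super().__init__()
        self.face_enc = nn.Sequential(                       # 3x96x96 face crop
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(128 * 12 * 12, emb_dim),
        )
        self.audio_enc = nn.Sequential(                      # 80x16 mel window
            nn.Flatten(), nn.Linear(80 * 16, emb_dim), nn.ReLU(),
            nn.Linear(emb_dim, emb_dim),
        )
        self.dec = nn.Sequential(                            # joint embedding -> frame
            nn.Linear(2 * emb_dim, 128 * 12 * 12), nn.ReLU(),
            nn.Unflatten(1, (128, 12, 12)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, face: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.face_enc(face), self.audio_enc(audio)], dim=1)
        return self.dec(z)

# Smoke test: one batch of face crops and mel windows in, frames out.
out = TalkingFaceGenerator()(torch.randn(2, 3, 96, 96), torch.randn(2, 80, 16))
```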

Towards Automatic Face-to-Face Translation

Rudrabha/LipGAN ACM Multimedia 2019

As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.
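
Such a system chains automatic speech recognition, machine translation, text-to-speech, and lip synchronization. The sketch below expresses only that data flow; the four stages are injected as callables because their implementations are out of scope here, and the signatures are assumptions for illustration.

```python
from typing import Callable

def face_to_face_translate(
    video_path: str,
    asr: Callable[[str], str],           # speech in language A -> text in A
    translate: Callable[[str], str],     # text in A -> text in B
    tts: Callable[[str], str],           # text in B -> path to speech audio
    lip_sync: Callable[[str, str], str], # (video, audio) -> lip-synced video
) -> str:
    """Translate a talking-head video from language A to language B."""
    text_a = asr(video_path)
    text_b = translate(text_a)
    audio_b = tts(text_b)
    return lip_sync(video_path, audio_b)
```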

MARLIN: Masked Autoencoder for facial video Representation LearnINg

ControlNet/MARLIN CVPR 2023

This paper proposes a self-supervised approach to learn universal facial representations from videos that can transfer across a variety of facial analysis tasks, such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).
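
A generic masked-autoencoding objective over flattened video patches looks roughly as follows. Note that MARLIN itself uses facial-region-guided tube masking; the uniform random masking and the shapes here are simplifying assumptions.

```python
import torch

def masked_recon_loss(patches: torch.Tensor,
                      encoder_decoder: torch.nn.Module,
                      mask_ratio: float = 0.9) -> torch.Tensor:
    """patches: (batch, num_patches, patch_dim) flattened video patches."""
    b, n, d = patches.shape
    n_mask = int(n * mask_ratio)
    # Pick a random subset of patches to hide, independently per clip.
    masked = torch.rand(b, n).argsort(dim=1)[:, :n_mask]
    idx = masked.unsqueeze(-1).expand(-1, -1, d)
    corrupted = patches.clone()
    corrupted.scatter_(1, idx, 0.0)       # zero out the hidden patches
    recon = encoder_decoder(corrupted)
    # MAE-style training: reconstruction loss on the hidden patches only.
    return torch.nn.functional.mse_loss(recon.gather(1, idx),
                                        patches.gather(1, idx))

# Smoke test with a trivial per-patch "model".
loss = masked_recon_loss(torch.randn(2, 196, 768), torch.nn.Linear(768, 768))
```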