Unconstrained Lip-synchronization

4 papers with code • 3 benchmarks • 3 datasets

Given a video of an arbitrary person and an arbitrary driving speech track, the task is to generate a video in which the person's lip movements are synchronized with the given speech.

Approaches to this task must not be constrained to particular identities, voices, or languages.
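
Because the output must match the driving speech frame by frame, a pipeline for this task has to decide which slice of audio conditions each generated video frame. The sketch below shows one simple way to do that. It assumes the speech has already been converted to mel-spectrogram frames. The 25 fps video rate, the 80 mel frames per second, the 16-frame audio window, and the function name `audio_windows_per_frame` are all illustrative assumptions and are not part of the task definition.

```python
def audio_windows_per_frame(num_mel_frames, fps=25.0, mel_fps=80.0, window=16):
    """Map each output video frame to a fixed-length slice of mel frames.

    Hypothetical helper: fps, mel_fps and window are illustrative
    defaults, not values prescribed by the task.
    Returns a list of (start, end) mel-frame indices, one per video frame.
    """
    if num_mel_frames < window:
        raise ValueError("speech is shorter than a single audio window")
    duration_s = num_mel_frames / mel_fps
    num_video_frames = int(duration_s * fps)  # frames needed to cover the speech
    step = mel_fps / fps  # mel frames elapsed per video frame
    windows = []
    for i in range(num_video_frames):
        # Start the window at the frame's timestamp, clamped so it never
        # runs past the end of the audio.
        start = min(int(i * step), num_mel_frames - window)
        windows.append((start, start + window))
    return windows


if __name__ == "__main__":
    # 2 seconds of speech at 80 mel frames/s -> 50 video frames at 25 fps.
    windows = audio_windows_per_frame(160)
    print(len(windows), windows[0], windows[-1])  # 50 (0, 16) (144, 160)
```

Clamping means the last few frames reuse audio context from the end of the clip. Padding the spectrogram instead would be an equally reasonable design choice.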

Benchmarks

These leaderboards track progress in Unconstrained Lip-synchronization.

Dataset	Best Model
LRS2	Wav2Lip + ViT + MARLIN
LRW	Wav2Lip + GAN
LRS3	Wav2Lip + GAN

Most implemented papers

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

Rudrabha/Wav2Lip • 23 Aug 2020

Existing methods, however, fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking-face videos. As a result, significant parts of the video end up out of sync with the new audio.

You said that?

joonson/yousaidthat • 8 May 2017

To generate a video of a talking face lip-synced to an input audio segment, we propose an encoder-decoder CNN model that uses a joint embedding of the face and audio to synthesise talking-face video frames.

Towards Automatic Face-to-Face Translation

Rudrabha/LipGAN • ACM Multimedia 2019

As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.

MARLIN: Masked Autoencoder for facial video Representation LearnINg

ControlNet/MARLIN • CVPR 2023

This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).
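
Masked-autoencoder pretraining works in three steps. Most of a video's patch tokens are hidden, the encoder sees only the visible ones, and the reconstruction is scored on the hidden ones. The sketch below shows the masking and loss steps in plain Python. The function names, the 0.9 mask ratio, and the token counts are illustrative assumptions. The sketch also uses uniform random masking, whereas the MARLIN paper describes masking facial regions such as the eyes, nose, and mouth. It is therefore a simplification of the general technique, not the paper's method.

```python
import random


def random_patch_mask(num_tokens, mask_ratio=0.9, seed=None):
    """Split token indices into (visible, masked) lists.

    Uniform random masking (a simplification; not MARLIN's facial-region
    masking). The encoder would see only `visible`; the decoder predicts
    the tokens in `masked`.
    """
    if num_tokens < 1 or not 0.0 <= mask_ratio < 1.0:
        raise ValueError("need num_tokens >= 1 and 0 <= mask_ratio < 1")
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)
    num_visible = max(1, round(num_tokens * (1.0 - mask_ratio)))
    return sorted(order[:num_visible]), sorted(order[num_visible:])


def masked_mse(pred, target, masked):
    """Mean squared reconstruction error over masked tokens only.

    `pred` and `target` are lists of per-token feature vectors.
    """
    total, count = 0.0, 0
    for idx in masked:
        for p, t in zip(pred[idx], target[idx]):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Hypothetical clip: 16 frames of 224x224 pixels, 16x16 patches,
    # 2-frame tubelets -> (16 // 2) * 14 * 14 = 1568 tokens.
    num_tokens = (16 // 2) * (224 // 16) ** 2
    visible, masked = random_patch_mask(num_tokens, mask_ratio=0.9, seed=0)
    print(num_tokens, len(visible), len(masked))  # 1568 157 1411
```

The loss is computed only on masked tokens. This forces the encoder to infer hidden facial content from the visible context instead of copying pixels it can already see.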
