A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

23 Aug 2020 · K R Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar

In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary identity to match a target speech segment. Current works excel at producing accurate lip movements on a static image or videos of specific people seen during the training phase...

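TASK                               DATASET  MODEL          METRIC NAME  METRIC VALUE  GLOBAL RANK
Unconstrained Lip-synchronization  LRS2     Wav2Lip        LSE-D        6.386         # 2
Unconstrained Lip-synchronization  LRS2     Wav2Lip        LSE-C        7.789         # 1
Unconstrained Lip-synchronization  LRS2     Wav2Lip        FID          4.887         # 2
Unconstrained Lip-synchronization  LRS2     Wav2Lip + GAN  LSE-D        6.469         # 1
Unconstrained Lip-synchronization  LRS2     Wav2Lip + GAN  FID          4.446         # 1
Unconstrained Lip-synchronization  LRS3     Wav2Lip + GAN  LSE-D        6.986         # 1
Unconstrained Lip-synchronization  LRS3     Wav2Lip + GAN  LSE-C        7.574         # 2
Unconstrained Lip-synchronization  LRS3     Wav2Lip + GAN  FID          4.35          # 1
Unconstrained Lip-synchronization  LRS3     Wav2Lip        LSE-D        6.652         # 2
Unconstrained Lip-synchronization  LRS3     Wav2Lip        LSE-C        7.887         # 1
Unconstrained Lip-synchronization  LRS3     Wav2Lip        FID          4.844         # 2
Unconstrained Lip-synchronization  LRW      Wav2Lip        LSE-D        6.512         # 2
Unconstrained Lip-synchronization  LRW      Wav2Lip        LSE-C        7.49          # 1
Unconstrained Lip-synchronization  LRW      Wav2Lip        FID          3.189         # 2
Unconstrained Lip-synchronization  LRW      Wav2Lip + GAN  LSE-D        6.774         # 1
Unconstrained Lip-synchronization  LRW      Wav2Lip + GAN  LSE-C        7.263         # 2
Unconstrained Lip-synchronization  LRW      Wav2Lip + GAN  FID          2.475         # 1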
