This task deals with lip-syncing a video or an image to a desired target speech. Approaches in this task work only for a specific, limited set of identities, languages, and speech/voices. See also: Unconstrained lip-synchronization - https://paperswithcode.com/task/lip-sync
The best-performing method, which is based on visual quality metrics and is often used in the presentation attack detection domain, resulted in an 8.97% equal error rate on high-quality Deepfakes.
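For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate (here, Deepfakes accepted as genuine) equals the false rejection rate (genuine videos rejected). The sketch below is only an illustration of how such a figure could be computed from detector scores. The function name `equal_error_rate` and the toy scores are hypothetical and do not come from the source, and the sketch assumes that higher scores mean "more likely genuine".

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption (not from the source): a sample is accepted as genuine
    when its score is greater than or equal to the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance rate: impostor (Deepfake) samples accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine samples rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical toy scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.6]
deepfake = [0.65, 0.3, 0.2, 0.1]
print(f"EER: {equal_error_rate(genuine, deepfake):.2%}")  # EER: 25.00%
```

With a finite set of scores the two error rates rarely cross exactly, so the sketch reports the average of the two at the closest threshold; evaluation toolkits may interpolate instead.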