Audio-Visual Synchronization

8 papers with code • 0 benchmarks • 3 datasets

Audio-visual synchronization is the task of determining or enforcing the temporal alignment between the audio and visual streams of a video, e.g., verifying that lip motion matches the accompanying speech. It underpins applications such as talking face generation, active speaker detection, lip-to-speech synthesis, and deepfake detection.

Latest papers with no code

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

no code yet • 22 Jan 2024

To bridge the gap between modalities, CoAVT employs a query encoder containing a set of learnable query embeddings, which extracts the audio-visual features most informative for the corresponding text.
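A minimal sketch of the query-encoder idea described above: a set of learnable query embeddings cross-attends over concatenated audio-visual tokens to distill a fixed number of output tokens. All names, sizes, and the single-layer design here are illustrative assumptions, not CoAVT's actual code.

```python
import torch
import torch.nn as nn

class QueryEncoder(nn.Module):
    def __init__(self, dim=256, num_queries=32, num_heads=8):
        super().__init__()
        # Learnable query embeddings, one row per distilled output token.
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, av_feats):
        # av_feats: (batch, seq_len, dim) concatenated audio + visual tokens.
        b = av_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        out, _ = self.cross_attn(q, av_feats, av_feats)  # queries attend to AV tokens
        return self.norm(out)  # (batch, num_queries, dim)

# Usage: distill 32 tokens from 100 audio-visual tokens.
tokens = QueryEncoder()(torch.randn(2, 100, 256))  # -> torch.Size([2, 32, 256])
```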

Comparative Analysis of Deep-Fake Algorithms

no code yet • 6 Sep 2023

We examine the various deep learning-based approaches used for creating deepfakes, as well as the techniques used for detecting them.

Audio-driven Talking Face Generation by Overcoming Unintended Information Flow

no code yet • 18 Jul 2023

Specifically, this involves the unintended flow of lip, pose, and other information from the reference to the generated image, as well as instabilities during model training.

On the Audio-visual Synchronization for Lip-to-Speech Synthesis

no code yet • ICCV 2023

Most lip-to-speech (LTS) synthesis models are trained and evaluated under the assumption that the audio-video pairs in the dataset are perfectly synchronized.

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory

no code yet • 2 Nov 2022

It stores lip motion features from sequential ground truth images in the value memory and aligns them with corresponding audio features so that they can be retrieved using audio input at inference time.
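A minimal sketch of the key-value memory mechanism described above: audio features address a key memory, and soft attention over the addresses retrieves lip motion features from the value memory. The slot count, feature size, and cosine-similarity addressing are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioLipMemory(nn.Module):
    def __init__(self, slots=96, dim=128):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(slots, dim))    # audio addresses
        self.values = nn.Parameter(torch.randn(slots, dim))  # lip motion slots

    def forward(self, audio_feat):
        # audio_feat: (batch, dim). Cosine similarity against every key slot.
        sim = F.normalize(audio_feat, dim=-1) @ F.normalize(self.keys, dim=-1).T
        attn = sim.softmax(dim=-1)       # (batch, slots) soft addressing weights
        return attn @ self.values       # retrieved lip features, (batch, dim)

lip = AudioLipMemory()(torch.randn(4, 128))  # -> torch.Size([4, 128])
```

At training time, the value memory is filled from ground-truth lip features; at inference, only the audio is needed to read lip motion back out, which is what lets the model lip-sync without reference frames of the spoken words.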

Rethinking Audio-visual Synchronization for Active Speaker Detection

no code yet • 21 Jun 2022

This clarified definition is motivated by our extensive experiments, through which we find that existing ASD methods fail to model audio-visual synchronization and often classify unsynchronized videos as active speaking.
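One common way to model the synchronization cue the authors argue for is a SyncNet-style score: embed short audio and video windows separately and compare them across candidate temporal offsets; a synchronized, actively speaking face should peak near offset zero. This sketch assumes precomputed, frame-rate-aligned per-frame embeddings and is not the paper's method.

```python
import torch
import torch.nn.functional as F

def sync_scores(audio_emb, video_emb, max_offset=5):
    # audio_emb, video_emb: (frames, dim), one embedding per (aligned) frame.
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    scores = {}
    for off in range(-max_offset, max_offset + 1):
        # Shift one stream against the other and average the cosine similarity.
        if off >= 0:
            pairs = a[off:] * v[: a.size(0) - off]
        else:
            pairs = a[:off] * v[-off:]
        scores[off] = pairs.sum(dim=-1).mean().item()
    return scores  # a peak near offset 0 suggests the audio matches this face

scores = sync_scores(torch.randn(50, 128), torch.randn(50, 128))
```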

Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation

no code yet • CVPR 2021

In this paper, we address the problem of separating individual speech signals from videos using audio-visual neural processing.
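A minimal sketch of the cross-modal affinity idea: score every audio time step against every visual frame, use the affinity matrix to pool visual context into each audio step, and predict a spectrogram mask from the fused features. The dimensions and the mask head are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossModalAffinity(nn.Module):
    def __init__(self, dim=256, freq_bins=257):
        super().__init__()
        self.mask_head = nn.Linear(2 * dim, freq_bins)

    def forward(self, audio_feat, visual_feat):
        # audio_feat: (batch, Ta, dim); visual_feat: (batch, Tv, dim)
        affinity = torch.bmm(audio_feat, visual_feat.transpose(1, 2))  # (batch, Ta, Tv)
        attn = affinity.softmax(dim=-1)
        visual_ctx = torch.bmm(attn, visual_feat)   # visual context per audio step
        fused = torch.cat([audio_feat, visual_ctx], dim=-1)
        return self.mask_head(fused).sigmoid()      # (batch, Ta, freq_bins) mask

mask = CrossModalAffinity()(torch.randn(2, 100, 256), torch.randn(2, 25, 256))
```

The predicted mask is then applied to the mixture spectrogram to isolate the speaker whose face produced the visual features.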

Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning

no code yet • 13 Aug 2020

When watching videos, the occurrence of a visual event is often accompanied by an audio event, e.g., the voice accompanying lip motion or the music of playing instruments.
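A minimal sketch of co-attention between the two streams: each modality queries the other, so audio tokens gather the visual evidence they co-occur with and vice versa. The layer sizes and single-block design are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, audio, video):
        # audio: (batch, Ta, dim); video: (batch, Tv, dim)
        a_ctx, _ = self.a2v(audio, video, video)  # audio queries visual keys/values
        v_ctx, _ = self.v2a(video, audio, audio)  # video queries audio keys/values
        return self.norm_a(audio + a_ctx), self.norm_v(video + v_ctx)

a, v = CoAttention()(torch.randn(2, 100, 256), torch.randn(2, 25, 256))
```

In a self-supervised setup, such a block would typically be trained with a correspondence objective, e.g., predicting whether an audio-video pair comes from the same clip.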

Identity-Preserving Realistic Talking Face Generation

no code yet • 25 May 2020

The necessary attributes of a realistic face animation are (1) audio-visual synchronization, (2) identity preservation of the target individual, (3) plausible mouth movements, and (4) the presence of natural eye blinks.

Realistic Speech-Driven Facial Animation with GANs

no code yet • 14 Jun 2019

We present an end-to-end system that generates videos of a talking head, using only a still image of a person and an audio clip containing speech, without relying on handcrafted intermediate features.
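A minimal sketch of the end-to-end setup described: a generator conditioned on one identity image and a window of audio features synthesizes each video frame, with discriminators (not shown) judging realism and sync during GAN training. Every layer size here is an illustrative assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FrameGenerator(nn.Module):
    def __init__(self, audio_dim=128, img_ch=3):
        super().__init__()
        self.img_enc = nn.Sequential(            # encode the still identity image
            nn.Conv2d(img_ch, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU())
        self.audio_fc = nn.Linear(audio_dim, 128)
        self.dec = nn.Sequential(                # decode a frame from fused features
            nn.ConvTranspose2d(256, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, img_ch, 4, 2, 1), nn.Tanh())

    def forward(self, still, audio):
        # still: (batch, 3, H, W); audio: (batch, audio_dim), one window per frame
        f = self.img_enc(still)                  # (batch, 128, H/4, W/4)
        a = self.audio_fc(audio)[:, :, None, None].expand(-1, -1, f.size(2), f.size(3))
        return self.dec(torch.cat([f, a], dim=1))  # one synthesized frame

frame = FrameGenerator()(torch.randn(1, 3, 64, 64), torch.randn(1, 128))
```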