Audio-Visual Synchronization

8 papers with code • 0 benchmarks • 3 datasets

Audio-visual synchronization is the task of temporally aligning the audio and visual streams of a video, for example by predicting the time offset between them or detecting whether they are in sync.

Most implemented papers

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

v-iashin/sparsesync 13 Oct 2022

This contrasts with the case of synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space.
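Papers in this area (including this one and Synchformer below) commonly frame synchronization as predicting the temporal offset between the audio and visual streams. As a minimal illustrative sketch, not the method of any particular paper, an offset can be scored by comparing per-frame audio and video embeddings at each candidate shift; the function name `estimate_offset` is hypothetical:

```python
import numpy as np

def estimate_offset(audio_feats, video_feats, max_offset):
    """Estimate the audio-video offset (in frames) by scoring each
    candidate shift with the mean dot product of aligned embeddings.

    audio_feats, video_feats: (T, D) arrays of per-frame embeddings.
    Assumes max_offset is smaller than the number of frames.
    A positive result means the audio stream lags the video stream.
    """
    T = min(len(audio_feats), len(video_feats))
    best_offset, best_score = 0, -np.inf
    for k in range(-max_offset, max_offset + 1):
        # Align video frame t with audio frame t + k.
        a_lo, a_hi = max(k, 0), min(T, T + k)
        v_lo, v_hi = max(-k, 0), min(T, T - k)
        score = np.mean(
            np.sum(audio_feats[a_lo:a_hi] * video_feats[v_lo:v_hi], axis=1)
        )
        if score > best_score:
            best_offset, best_score = k, score
    return best_offset
```

Real systems replace the raw dot product with learned audio and visual encoders and predict the offset as a classification over candidate shifts, but the alignment-and-score structure is the same.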

Multimodal Transformer Distillation for Audio-Visual Synchronization

vskadandale/vocalist 27 Oct 2022

This paper proposes MTDVocaLiST, a model trained with the authors' multimodal Transformer distillation (MTD) loss.
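The MTD loss distills knowledge from a large teacher (VocaLiST) into a smaller student. As a generic illustration of distillation rather than the paper's specific loss (which operates on Transformer internals), response-based distillation matches the student's output distribution to a temperature-softened teacher distribution; `distillation_loss` is a hypothetical helper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL divergence between temperature-softened teacher and
    student distributions, one distribution per row of logits."""
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    return float(np.sum(t * (np.log(t) - np.log(s))) / len(student_logits))
```

In practice this term is usually combined with the ordinary task loss (and often scaled by the squared temperature), so the student learns from both the labels and the teacher's soft predictions.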

Synchformer: Efficient Synchronization from Sparse Cues

v-iashin/synchformer 29 Jan 2024

Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse.

Solos: A Dataset for Audio-Visual Music Analysis

JuanFMontesinos/Solos 14 Jun 2020

In this paper, we present a new dataset of music performance videos which can be used for training machine learning methods for multiple tasks, such as audio-visual blind source separation and localization, cross-modal correspondences, cross-modal generation and, in general, any audio-visual self-supervised task.

Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet

maxrmorrison/clpcnet 5 Oct 2021

Modifying the pitch and timing of an audio signal is a fundamental audio editing operation with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis.
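The simplest way to change a signal's speed is to resample it, but this couples pitch and timing: playing faster also raises the pitch. That coupling is exactly what dedicated pitch-shifting and time-stretching methods (such as the controllable neural vocoder above) aim to remove. A naive resampler, sketched here for illustration with the hypothetical name `resample_stretch`:

```python
import numpy as np

def resample_stretch(signal, rate):
    """Change playback speed by linear-interpolation resampling.

    rate > 1 shortens the signal and raises pitch; rate < 1 lengthens
    it and lowers pitch. Pitch and duration change together, which is
    the limitation that controllable pitch/time methods are built to
    overcome.
    """
    n_out = int(len(signal) / rate)
    # Sample positions in the original signal for each output sample.
    positions = np.arange(n_out) * rate
    return np.interp(positions, np.arange(len(signal)), signal)
```

Controllable methods instead modify pitch and duration independently, e.g. by operating on a vocoder's acoustic features rather than the raw waveform.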

VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

vskadandale/vocalist 5 Apr 2022

Finally, we use the frozen visual features learned by our lip synchronisation model in the singing voice separation task to outperform a baseline audio-visual model which was trained end-to-end.

Target Active Speaker Detection with Audio-visual Cues

jiang-yidi/ts-talknet 22 May 2023

To benefit from both facial cue and reference speech, we propose the Target Speaker TalkNet (TS-TalkNet), which leverages a pre-enrolled speaker embedding to complement the audio-visual synchronization cue in detecting whether the target speaker is speaking.

PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores

amazon-science/avgen-eval-toolkit 10 Apr 2024

Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks.