Introduced by Chen et al. in Audio-Visual Synchronisation in the Wild

VGG-Sound Sync is an audio-visual synchronisation benchmark based on videos collected from YouTube. It contains over 100k video clips spanning 160 classes and can be downloaded here.

Note that only the test clips are included here. To train your models, please use the training clips from the original VGG-Sound dataset (the classes are the same as those in the test clips). Each line in the JSON file is defined as:

# YouTube ID, start seconds, label
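The record format above can be read with a short Python sketch. It assumes that each record is a JSON array of three fields, `[YouTube ID, start seconds, label]`, and accepts either one array per line or a single top-level list of arrays, since the exact file layout is not specified here. It also assumes that the original VGG-Sound CSV has rows of `YouTube ID, start seconds, label, train/test split`, which is used to pick training clips from the same classes as the test set. The names `SyncClip`, `load_clips`, `clip_url` and `training_clips` are hypothetical helpers, not part of any released tooling.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SyncClip:
    youtube_id: str
    start_seconds: float
    label: str


def load_clips(text: str) -> List[SyncClip]:
    """Parse test records of the form [YouTube ID, start seconds, label].

    Assumption: records are JSON arrays, either one per line or all
    wrapped in a single top-level JSON list.
    """
    text = text.strip()
    try:
        # Case 1: the whole file parses as one JSON value.
        data = json.loads(text)
        if isinstance(data, list) and all(isinstance(r, list) for r in data):
            records = data  # a list of records
        else:
            records = [data]  # a single record on its own
    except json.JSONDecodeError:
        # Case 2: one JSON array per line (JSON Lines style).
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [SyncClip(str(yid), float(start), str(label)) for yid, start, label in records]


def clip_url(clip: SyncClip) -> str:
    """Build a YouTube watch URL that starts playback at the clip's offset."""
    return f"https://www.youtube.com/watch?v={clip.youtube_id}&t={int(clip.start_seconds)}s"


def training_clips(csv_text: str, labels: Set[str]) -> List[SyncClip]:
    """Select VGG-Sound training rows whose label appears in the Sync test set.

    Assumption: CSV rows are "YouTube ID, start seconds, label, train/test split"
    with no header; quoted labels containing commas are handled by csv.reader.
    """
    selected = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        yid, start, label, split = (field.strip() for field in row[:4])
        if split == "train" and label in labels:
            selected.append(SyncClip(yid, float(start), label))
    return selected
```

In use, one would collect the test labels with `{c.label for c in load_clips(test_text)}` and pass that set to `training_clips` together with the contents of the original VGG-Sound CSV. Both file names and contents are left to the reader, since neither is given here.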


