ACAV100M (Automatically Curated Audio-Visual)

Introduced by Lee et al. in ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning

ACAV100M is built by processing 140 million full-length videos (total duration 1,030 years) into a dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence. According to the authors, this is two orders of magnitude larger than AudioSet (8 months), the largest video dataset used in the audio-visual learning literature at the time, and twice as large as HowTo100M (15 years), the largest video dataset in the literature at the time.
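The durations quoted above can be sanity-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the ACAV100M tooling: the helper name `clips_to_years` and the choice of a 365.25-day year are assumptions made for illustration, and the comparison figures are the rounded values given in the description.

```python
# Assumption: one year is 365.25 days; the source does not state its convention.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def clips_to_years(num_clips: int, clip_seconds: float) -> float:
    """Total duration, in years, of num_clips clips of clip_seconds each."""
    return num_clips * clip_seconds / SECONDS_PER_YEAR


# Figures taken from the description above.
acav_years = clips_to_years(100_000_000, 10)  # 100 million 10-second clips
source_years = 1030                           # 140 million full-length videos
audioset_years = 8 / 12                       # 8 months
howto100m_years = 15

print(f"ACAV100M clips: {acav_years:.1f} years")
print(f"Ratio to HowTo100M: {acav_years / howto100m_years:.1f}x")
print(f"Ratio to AudioSet: {acav_years / audioset_years:.0f}x")
print(f"Share of source footage kept: {acav_years / source_years:.1%}")
```

Under these assumptions the clip total comes to a little under 32 years, consistent with the quoted 31 years, about twice the HowTo100M duration and roughly 3% of the source footage. The ratio to AudioSet is computed from the rounded durations only, so it is a rough duration comparison rather than a restatement of the authors' claim.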
