Videos

ACAV100M (Automatically Curated Audio-Visual)

Introduced by Lee et al. in ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning

ACAV100M is built by processing 140 million full-length videos (1,030 years in total) to produce a dataset of 100 million 10-second clips (31 years in total) with high audio-visual correspondence. According to the authors, this is two orders of magnitude larger than AudioSet (8 months), the largest video dataset currently used in the audio-visual learning literature, and twice as large as HowTo100M (15 years), the largest video dataset in the literature overall.
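The duration figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming a 365-day year and taking the quoted AudioSet and HowTo100M durations at face value. The function and variable names are illustrative only and are not part of any ACAV100M tooling.

```python
# Back-of-the-envelope check of the durations quoted in the description.
# Assumption: one year = 365 days. All names here are hypothetical.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def clips_to_years(num_clips: int, clip_seconds: float) -> float:
    """Convert a number of fixed-length clips into total duration in years."""
    return num_clips * clip_seconds / SECONDS_PER_YEAR


# Figures taken from the dataset description.
acav_years = clips_to_years(num_clips=100_000_000, clip_seconds=10)
source_years = 1030.0          # 140 million full-length source videos
audioset_years = 8 / 12        # AudioSet, quoted as 8 months
howto100m_years = 15.0         # HowTo100M, quoted as 15 years

# Duration ratios relative to the two reference datasets.
ratio_audioset = acav_years / audioset_years
ratio_howto100m = acav_years / howto100m_years

# Fraction of the source footage that survives curation, by duration.
retained_fraction = acav_years / source_years

print(f"ACAV100M total duration: {acav_years:.1f} years")
print(f"Duration ratio vs. AudioSet: {ratio_audioset:.1f}x")
print(f"Duration ratio vs. HowTo100M: {ratio_howto100m:.1f}x")
print(f"Share of source duration retained: {retained_fraction:.1%}")
```

These ratios compare total duration only. Comparisons by clip count or by other measures may give different numbers.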

Benchmarks

No benchmarks yet.

Dataset Loaders

sangho-vision/acav100m

Tasks

Self-Supervised Learning

Similar Datasets

UnAV-100

VGG-Sound
