AVSBench is a pixel-level audio-visual segmentation benchmark that provides ground-truth labels for sounding objects. Three settings are studied: 1) semi-supervised audio-visual segmentation with a single sound source; 2) fully-supervised audio-visual segmentation with multiple sound sources; and 3) fully-supervised audio-visual semantic segmentation.