A large-scale reference dataset for bioacoustics. MeerKAT is a 1068 h dataset of audio recorded by collars worn by free-ranging meerkats (Suricata suricatta) at the Kalahari Research Centre, South Africa. Of this, 184 h are labeled with twelve vocalization-type ground-truth classes, annotated as time-resolved events at millisecond resolution. The labeled subset exhibits the sparsity typical of real bioacoustic recordings (96% background noise or other signals, 4% vocalizations). It comprises 66398 10-second samples containing 251562 labeled events with substantial spectral and temporal variability. This makes it the first large-scale reference point under real-world conditions for benchmarking pretraining and fine-tuning approaches in bioacoustic deep learning.
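As a quick sanity check on how the reported figures relate to each other, the following minimal Python sketch derives a few summary quantities from the numbers stated above. The variable names are illustrative, and the vocalization-hours estimate assumes the 4% figure is a fraction of recording time, which the description does not state explicitly.

```python
# Figures taken from the dataset description.
TOTAL_HOURS = 1068          # full dataset duration in hours
SAMPLES = 66_398            # number of labeled samples
SAMPLE_SECONDS = 10         # duration of each sample in seconds
LABELED_EVENTS = 251_562    # number of labeled events

# Assumption: the 4% vocalization share is measured by duration.
VOCALIZATION_FRACTION = 0.04

# Labeled duration implied by the sample count (about 184 h).
labeled_hours = SAMPLES * SAMPLE_SECONDS / 3600

# Share of the full dataset that carries labels.
labeled_share = labeled_hours / TOTAL_HOURS

# Average number of labeled events per 10-second sample.
events_per_sample = LABELED_EVENTS / SAMPLES

# Rough vocalization time, valid only under the assumption above.
vocalization_hours = labeled_hours * VOCALIZATION_FRACTION

print(f"labeled hours:      {labeled_hours:.1f}")
print(f"labeled share:      {labeled_share:.1%}")
print(f"events per sample:  {events_per_sample:.2f}")
print(f"vocalization hours: {vocalization_hours:.1f}")
```

The sample count and sample length reproduce the stated 184 h of labeled audio, which suggests the labeled subset is fully covered by the 10-second samples.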