Datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae)

This dataset contains recordings of 32 sound-producing insect species, with a total of 335 files and 57 minutes of audio. The dataset was compiled for training neural networks to automatically identify insect species while comparing adaptive, waveform-based frontends to conventional mel-spectrogram frontends for audio feature extraction. This work will be submitted for publication, and the dataset can be used to replicate its results as well as for other purposes. The scripts for audio processing and the machine learning implementations will be published on GitHub.

The recordings are split into two datasets. Just under half of the recordings (147) are of nine species belonging to the order Orthoptera. These recordings stem from a dataset that was originally compiled by Baudewijn Odé (unpublished).

The remaining recordings (188) are of 23 species in the family Cicadidae. They were selected from the Global Cicada Sound Collection hosted on Bioacoustica (doi.org/10.1093/database/bav054), including recordings published in doi.org/10.3897/BDJ.3.e5792 and doi.org/10.11646/zootaxa.4340.1. Many recordings in this collection begin with spoken annotations, so the last ten seconds of audio were extracted and used in this dataset.
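The tail extraction described above could be sketched roughly as follows. This is a minimal illustration using only Python's standard `wave` module, not the authors' processing script; the function name `extract_tail` and the file paths are hypothetical, and the input is assumed to be an uncompressed PCM WAV file.

```python
import wave


def extract_tail(src_path, dst_path, seconds=10.0):
    """Copy the last `seconds` of a PCM WAV file to `dst_path`.

    Returns the number of frames written. Files shorter than
    `seconds` are copied whole. (Hypothetical helper, for illustration.)
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        n_frames = src.getnframes()
        # Number of frames that make up the requested tail.
        tail_frames = min(n_frames, int(round(seconds * src.getframerate())))
        # Skip everything before the tail, e.g. a spoken introduction.
        src.setpos(n_frames - tail_frames)
        frames = src.readframes(tail_frames)

    with wave.open(dst_path, "wb") as dst:
        # Keep channel count, sample width and sample rate unchanged;
        # the frame count in the header is corrected on close.
        dst.setparams(params)
        dst.writeframes(frames)
    return tail_frames
```

A call such as `extract_tail("input.wav", "tail.wav")` would keep the final ten seconds; the `seconds` argument is exposed only so the sketch can be tried with other durations.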

All files were manually inspected, and files with strong noise interference or with sounds of multiple species were removed. The number of files per species ranges from four to 22, and the total audio per species from 40 seconds to almost nine minutes. Individual files range in length from less than one second to several minutes. All original files had sample rates of 44.1 kHz or higher and were resampled to 44.1 kHz mono WAV files for consistency. The annotation files contain information for each recording, including the file name, species name and identifier, and the data subset (training, test, validation) it was assigned to for training the neural network.
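As a rough illustration of how the annotation files might be used, the sketch below counts recordings per species and per data subset. The column names (`file_name`, `species`, `label`, `subset`), the CSV layout, and the example rows are assumptions made for this example; the actual annotation files may be organised differently.

```python
import csv
import io
from collections import Counter


def summarise_annotations(csv_file):
    """Count recordings per species and per data subset.

    `csv_file` is any open text file with a header row. The column
    names used here are hypothetical, not taken from the dataset.
    """
    per_species = Counter()
    per_subset = Counter()
    for row in csv.DictReader(csv_file):
        per_species[row["species"]] += 1
        per_subset[row["subset"]] += 1
    return per_species, per_subset


# Made-up rows for illustration only; they are not real dataset entries.
example = io.StringIO(
    "file_name,species,label,subset\n"
    "rec_001.wav,species_a,0,train\n"
    "rec_002.wav,species_a,0,test\n"
    "rec_003.wav,species_b,1,train\n"
    "rec_004.wav,species_b,1,validation\n"
)
per_species, per_subset = summarise_annotations(example)
```

Counts like these make it easy to check the class imbalance noted above (four to 22 files per species) before training.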

License


  • Creative Commons Attribution 4.0 International
