PapioVoc (Guinea baboon vocalizations dataset automatically extracted with a deep neural network from natural audio recordings)

Introduced by Bonafos et al. in Detection and classification of vocal productions in large scale audio recordings

Abstract

The data collection process consisted of continuously recording, over one month, a group of Guinea baboons living in semi-liberty at the CNRS primatology center in Rousset-sur-Arc (France). Two microphones were placed near their enclosure to continuously record the sounds produced by the group. A convolutional neural network (CNN) was then applied to these large and noisy audio recordings to automatically extract sound segments containing a baboon vocal production, following the method of Bonafos et al. (2023). The resulting dataset consists of wav files, from one second to several minutes long, containing the automatically detected vocalization segments. The dataset thus provides a wide range of baboon vocalizations produced at all times of the day. It can be used to study the vocal productions of non-human primates, their repertoire, their distribution over the day, their frequency, and their heterogeneity. Beyond the analysis of animal communication, the dataset can also serve as a training set for sound classification models.

Data acquisition

The data are audio recordings of baboons. The recordings were made with a Zoom H6 recorder using the included XYH-6 stereo microphone, at a sampling rate of 44,100 Hz with 16-bit resolution. The microphones were placed in the vicinity of the enclosure for one month and recorded continuously onto a PC. A CNN was run over the recordings with a one-second sliding window and an 80% overlap to detect the vocal productions of the baboons. The dataset consists of the segments predicted by the CNN to contain a baboon vocalization. Windows containing signal that were less than one second apart were merged into a single vocalization.
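For reference, a minimal sketch (in Python, with hypothetical names) of how the one-second window and 80% overlap translate into sample indices:

```python
SR = 44100      # sampling rate (Hz)
WIN = SR        # one-second analysis window, in samples
HOP = WIN // 5  # 80% overlap leaves a 20% hop: 8820 samples (0.2 s)

def window_starts(n_samples):
    """Start index (in samples) of every one-second analysis window."""
    return range(0, n_samples - WIN + 1, HOP)
```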

Data source location

Institution: CNRS, Primate Facility

City/Town/Region: Rousset-sur-Arc

Country: France

Latitude and longitude for collected samples/data: 43.47033535251509, 5.6514732876668905

Value of the data

This dataset is noteworthy for the sheer quantity of vocalizations it makes available.

This massive dataset can be very useful to two types of scientific communities: experts in primatology who study the vocal productions of non-human primates, and experts in data science and audio signal processing.

The machine learning research community gains a database of several dozen hours of animal vocalizations, from which a large training set can be built, useful for Environmental Sound Recognition tasks, for example.

Objective

This dataset is a follow-up to two studies on the vocal productions of Guinea baboons (Papio papio) in which we analysed their vocal productions on the basis of a relatively large sample of around 1300 vocalizations (Boë, Berthommier, Legou, Captier, Kemp, Sawallis, Becker, Rey, & Fagot, 2017; Kemp, Rey, Legou, Boë, Berthommier, Becker, & Fagot, 2017). The aim was to collect a larger database using deep convolutional neural networks in order to (1) automatically detect vocal productions in a large continuous audio recording and (2) categorize these vocalizations on a much larger sample. A description of the pipeline that enabled these automatic detections and categorizations is given in Bonafos, Pudlo, Freyermuth, Legou, Fagot, Tronçon, & Rey (2023).

Data description

The data are a set of audio files in wav format. Each file is at least one second long (the size of the window) and up to several minutes long, when several consecutive windows are predicted to contain signal. We also include the labeled data used to train the CNN that made the predictions, as well as two hours of the continuous recordings, to give an idea of the raw material and to allow testing of the code of the paper provided on GitLab.

In addition, there is a database in csv format listing all the vocalizations, the day and time of their production, and the prediction probabilities of the model.
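As an illustration, the csv index might be explored as follows; this is a minimal sketch, and the file and column names used here (papiovoc_index.csv, datetime, p_vocalization) are assumptions, not the dataset's documented schema:

```python
import pandas as pd

# Hypothetical file and column names; check the actual csv header before use.
voc = pd.read_csv("papiovoc_index.csv", parse_dates=["datetime"])

# Keep confidently detected vocalizations and count them per hour of day.
confident = voc[voc["p_vocalization"] > 0.9]
per_hour = confident.groupby(confident["datetime"].dt.hour).size()
print(per_hour)
```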

Experimental design, materials and methods

The original recordings represent one month of continuous audio recording. Seven hours of this month were manually labelled: they were segmented and labelled according to whether or not they contained a monkey vocalization (i.e., noise or vocalization) and, if there was a vocalization, according to its type (six possible classes: bark, copulation grunt, grunt, scream, yak, wahoo). These manually labelled data were used as the training set for a CNN, trained automatically following the pipeline of Bonafos et al. (2023). The model was then used to automatically detect and classify vocalizations over the whole month of audio. At prediction time it processes the data in the same way as during training: it slides a one-second window with an 80% overlap and computes the probability of a vocalization in each window independently, without using information from previous predictions, iterating through the month. For each window, the model predicts two outputs: the probability that the window contains a vocalization and the probability of each vocalization class.
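A sketch of this prediction loop, assuming a trained `model` whose forward pass returns, for a one-second window, the detection probability and the class probabilities (all names here are hypothetical):

```python
import numpy as np

CLASSES = ["bark", "copulation grunt", "grunt", "scream", "yak", "wahoo"]

def predict_month(audio, model, win=44100, hop=8820):
    """Independently score every one-second window of `audio`.

    `model(window)` is assumed to return (p_voc, class_probs): the
    probability that the window contains a vocalization and a
    distribution over the six vocalization classes.
    """
    records = []
    for start in range(0, len(audio) - win + 1, hop):
        p_voc, class_probs = model(audio[start:start + win])
        records.append({
            "start_s": start / 44100,
            "p_vocalization": float(p_voc),
            "predicted_class": CLASSES[int(np.argmax(class_probs))],
        })
    return records
```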

To generate the wav files, any window with a vocalization probability greater than 0.5 is considered to contain a vocalization. The first such window opens a vocalization segment at that moment. If the windows that follow also contain a vocalization, their signal is appended to the open segment. As soon as a one-second window no longer contains signal corresponding to a vocalization, the wav file is closed. If windows are predicted to contain no vocalization but lie between two detected windows less than one second apart, all of the windows are merged into a single segment.
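Expressed as code, the segment-building rule could look like the following sketch (function and variable names are hypothetical; thresholds are those described above):

```python
def build_segments(starts_s, probs, win_s=1.0, max_gap_s=1.0):
    """Merge detected windows into vocalization segments.

    `starts_s` are window start times in seconds and `probs` the model's
    detection probabilities. Windows with p > 0.5 count as detections;
    detections separated by at most `max_gap_s` are merged.
    """
    segments = []
    current = None  # (segment_start, segment_end), the open segment
    for t, p in zip(starts_s, probs):
        if p <= 0.5:
            continue
        if current is not None and t - current[1] <= max_gap_s:
            current = (current[0], t + win_s)   # extend the open segment
        else:
            if current is not None:
                segments.append(current)        # close the previous segment
            current = (t, t + win_s)            # open a new segment
    if current is not None:
        segments.append(current)
    return segments
```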
