The DiCOVA Challenge dataset is derived from the Coswara dataset, a crowd-sourced collection of sound recordings from COVID-19 positive and non-COVID-19 individuals. The Coswara data is collected through a web application, launched in April 2020 and accessible over the internet to anyone around the globe. Volunteering subjects are advised to record their respiratory sounds in a quiet environment.
Each subject provides 9 audio recordings: (a) shallow and deep breathing (2 recordings), (b) shallow and heavy cough (2 recordings), (c) sustained phonation of the vowels [æ] (as in bat), [i] (as in beet), and [u] (as in boot) (3 recordings), and (d) counting from 1 to 20 at fast and normal pace (2 recordings).
The subjects also provide metadata describing their current health status (COVID-19 status, any other respiratory ailments, and symptoms) and demographic information, including age and gender. The DiCOVA Challenge has two tracks, and a separate dataset has been created from the Coswara data for each:
(a) Track-1 dataset: cough sound recordings from 1040 subjects. (b) Track-2 dataset: deep breathing, vowel [i], and number counting (normal pace) speech recordings from 1199 subjects.
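The relationship between the nine Coswara recordings and the two track datasets can be summarised as a small lookup structure. The following is a minimal sketch, not the challenge's actual data layout: all identifiers (`RECORDINGS`, `TRACKS`, the category and variant keys) are hypothetical names chosen for illustration. The text does not state which cough variant Track-1 uses, so the variant is left unspecified (`None`) rather than guessed.

```python
# Hypothetical identifiers: the text names the sound categories but not
# any file names, keys, or directory layout.

# The nine recordings each subject provides, grouped by category.
RECORDINGS = {
    "breathing": ["shallow", "deep"],
    "cough": ["shallow", "heavy"],
    "vowel": ["ae", "i", "u"],       # [ae] as in bat, [i] as in beet, [u] as in boot
    "counting": ["fast", "normal"],  # counting from 1 to 20
}

# Track composition and subject counts as stated in the text.
# A variant of None means "not specified in the text".
TRACKS = {
    "track1": {
        "subjects": 1040,
        "sounds": [("cough", None)],
    },
    "track2": {
        "subjects": 1199,
        "sounds": [("breathing", "deep"), ("vowel", "i"), ("counting", "normal")],
    },
}


def recordings_per_subject(recordings):
    """Count the total number of recordings one subject provides."""
    return sum(len(variants) for variants in recordings.values())


def track_is_consistent(track, recordings):
    """Check that every sound a track uses exists among the recordings."""
    for category, variant in track["sounds"]:
        if category not in recordings:
            return False
        if variant is not None and variant not in recordings[category]:
            return False
    return True


if __name__ == "__main__":
    print(recordings_per_subject(RECORDINGS))
    for name, track in TRACKS.items():
        print(name, track["subjects"], track_is_consistent(track, RECORDINGS))
```

Keeping the category list separate from the track definitions makes it straightforward to check that each track draws only on sounds that were actually collected.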