Auto-KWS is a dataset for customized keyword spotting, the task of detecting spoken keywords. The dataset closely resembles real world scenarios, as each recorder is assigned with an unique wake-up word and can choose their recording environment and familiar dialect freely.
All data is recorded by near-field mobile phones, (located in front of the speakers at around 0.2m distance). Each sample is recorded in single channel, 16-bit streams at a 16kHz sampling rate. There are 4 datasets: training dataset, practice dataset, feedback dataset, and private dataset. Training dataset, recorded from around 100 recorders, is used for participants to develop Auto-KWS solutions. Practice dataset contains 5 speakers data, each with 5 enrollment audio data and seveal test audio. Practice dataset together with the downloadable docker provides an example of how platform would call the participants' code. Both Training and practice dataset can be downloaded for local debugging. The feedback dataset and private dataset have the same format of practice dataset and are used for final evaluation and thus will be hiden from participants.