SiDi KWS: A Large-Scale Multilingual Dataset for Keyword Spotting

Interspeech 2022 · Michel Cardoso Meneses, Rafael Bérgamo Holanda, Luis Vasconcelos Peres, Gabriela Dantas Rocha

Keyword spotting (KWS) has become an active topic in speech processing, driven by the rise of commercial applications based on voice command detection, such as voice assistants. As in computer vision, natural language processing, and other areas of speech processing, most successful current approaches to KWS rely on deep learning. Unlike those tasks, however, KWS lacks large-scale datasets designed for training and evaluating deep learning models. This work presents SiDi KWS, a public large-scale multilingual dataset currently composed of 24.3 million labeled audio recordings, each containing a single spoken keyword. It is intended to boost the development of new KWS systems, especially those based on deep learning. The dataset was created by applying automatic forced alignment to public datasets of transcribed speech. Alongside SiDi KWS, this work introduces KeywordMiner, the open-source framework used to generate the dataset, to benefit the speech processing research community.
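
The abstract does not describe how forced alignment output becomes single-keyword recordings, so the following is only a minimal sketch of the general idea, not KeywordMiner's actual code or API. It assumes a forced aligner has already produced word-level timestamps for an utterance; the function name `extract_keyword_clips`, the `(word, start_s, end_s)` tuple layout, and the `min_duration_s` filter are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical alignment entry: (word, start time in seconds, end time in seconds).
Alignment = Tuple[str, float, float]


def extract_keyword_clips(
    samples: Sequence[float],
    sample_rate: int,
    alignment: List[Alignment],
    min_duration_s: float = 0.0,
) -> List[Tuple[str, Sequence[float]]]:
    """Cut one labeled clip per aligned word out of an utterance.

    Assumes `alignment` comes from a forced aligner run on the
    utterance and its transcript; this is an illustrative sketch only.
    """
    clips = []
    for word, start_s, end_s in alignment:
        # Skip words shorter than the (assumed) minimum duration.
        if end_s - start_s < min_duration_s:
            continue
        # Convert timestamps in seconds to sample indices.
        start = int(round(start_s * sample_rate))
        end = int(round(end_s * sample_rate))
        # The lowercased word serves as the clip label.
        clips.append((word.lower(), samples[start:end]))
    return clips
```

Each returned pair holds a lowercase label and the slice of audio samples for that word, which is the shape of data a keyword spotting model would train on.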
