SiDi KWS: A Large-Scale Multilingual Dataset for Keyword Spotting

Keyword spotting (KWS) has become a hot topic in speech processing due to the rise of commercial applications based on voice command detection, such as voice assistants. Like tasks in computer vision, natural language processing, and even speech processing, most current successful approaches for KWS rely on deep learning. However, differently from all those tasks, there is a lack of large-scale datasets designed for training and evaluating deep learning models for KWS. The current work presents SiDi KWS, a public large-scale multilingual dataset currently composed of 24.3 million audio recordings of labeled single-spoken keywords. It intends to boost the development of new KWS systems, especially those based on deep learning. That dataset has been created by applying automatic forced alignment on public datasets of transcribed speech. This work introduces SiDi KWS and KeywordMiner, an open-source framework used to generate that dataset, to benefit the speech processing research community.

PDF Abstract

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here