We show that, using these features, the CNN-DTW keyword spotter performs almost as well as the DTW keyword spotter while outperforming a baseline CNN trained only on the keyword templates.
The automatic transcriptions from the best-performing pass were used for language model augmentation.
We compare features for dynamic time warping (DTW) when used to bootstrap keyword spotting (KWS) in an almost zero-resource setting.
We consider multilingual bottleneck features (BNFs) for nearly zero-resource keyword spotting.
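To make the DTW-based template matching concrete, the following is a minimal sketch of how per-frame features (such as BNFs) extracted from a keyword template can be aligned against a search utterance. The function names, the cosine frame distance, and the detection threshold are illustrative assumptions rather than the exact setup used here; in practice a subsequence or sliding-window DTW is typically used rather than whole-sequence alignment.

```python
import numpy as np

def dtw_cost(query, search):
    """Length-normalised DTW alignment cost between two feature sequences.

    query, search: (T, D) arrays of per-frame features (e.g. BNFs).
    Hypothetical helper for illustration only.
    """
    # Frame-wise cosine distances between the two sequences
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    s = search / np.linalg.norm(search, axis=1, keepdims=True)
    dist = 1.0 - q @ s.T                      # shape (Tq, Ts)

    Tq, Ts = dist.shape
    acc = np.full((Tq + 1, Ts + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, Tq + 1):
        for j in range(1, Ts + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    # Normalise by path length so costs are comparable across utterances
    return acc[Tq, Ts] / (Tq + Ts)

def detect(templates, utterance, threshold=0.4):
    """Flag an utterance if its best DTW cost to any keyword template is low."""
    return min(dtw_cost(t, utterance) for t in templates) < threshold
```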
We present our first efforts in building an automatic speech recognition system for Somali, an under-resourced language, using 1.57 hours of annotated speech for acoustic model training.
While the resulting CNN keyword spotter cannot match the performance of the DTW-based system, it substantially outperforms a CNN classifier trained only on the keywords, improving the area under the ROC curve from 0.54 to 0.64.
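The area under the ROC curve (AUC) summarises detection performance across all operating thresholds, with 0.5 corresponding to chance. A minimal sketch of how it can be computed from per-utterance detection scores follows; the scores and labels below are made-up illustrative values, not results from these systems.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative only: scores would come from the keyword spotter
# (e.g. per-utterance keyword posteriors), labels from the evaluation set.
labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])            # 1 = keyword present
scores = np.array([0.9, 0.2, 0.6, 0.4, 0.3, 0.5, 0.8, 0.1])

print("AUC:", roc_auc_score(labels, scores))
```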