A Lightweight Dynamic Filter for Keyword Spotting

Keyword spotting (KWS) from speech signals is widely used to enable fully hands-free speech recognition. Because a KWS network must remain continuously active, it is designed as a small-footprint model. Recent work has explored dynamic filter-based models in deep learning frameworks to improve robustness or accuracy. However, dynamic filters are computationally expensive, so their deployment is constrained by the computational capacity of the device. In this paper, we propose a lightweight dynamic filter to improve KWS performance. To reduce computational complexity, the proposed model splits the dynamic filter into two branches: a pixel-level branch and an instance-level branch. The lightweight dynamic filter is applied at the front end of the KWS system to enhance the separability of the input data. Experimental results show that our model remains robust to unseen noise and to small training sets while using few computational resources.
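
To make the two-branch idea concrete, the following is a minimal sketch of one possible factorization, not the architecture proposed in the paper. It assumes that the instance-level branch predicts a single short temporal kernel per utterance from globally pooled features, and that the pixel-level branch predicts one scalar gate per time-frequency bin. All function names, parameters, and toy values are hypothetical. Under these assumptions, the filter generates k kernel taps per utterance plus one gate per bin, instead of k taps for every bin as a full per-position dynamic filter would.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def instance_kernel(x, weights, bias):
    """Instance-level branch (assumed form): one k-tap kernel per utterance.

    The whole input is average-pooled to a scalar, mapped linearly to k
    logits, and normalized with a softmax.
    """
    count = sum(len(row) for row in x)
    pooled = sum(sum(row) for row in x) / count
    return softmax([w * pooled + b for w, b in zip(weights, bias)])


def pixel_gate(x, scale, shift):
    """Pixel-level branch (assumed form): one gate per time-frequency bin."""
    return [[sigmoid(scale * v + shift) for v in row] for row in x]


def lightweight_dynamic_filter(x, weights, bias, scale, shift):
    """Filter x (frames x bins) along time, then gate each bin.

    The kernel length (len(weights)) is assumed to be odd.
    """
    kernel = instance_kernel(x, weights, bias)
    gate = pixel_gate(x, scale, shift)
    radius = len(kernel) // 2
    num_frames = len(x)
    out = []
    for t, row in enumerate(x):
        out_row = []
        for f in range(len(row)):
            acc = 0.0
            for k, tap in enumerate(kernel):
                src = t + k - radius
                if 0 <= src < num_frames:  # zero padding at the edges
                    acc += tap * x[src][f]
            out_row.append(gate[t][f] * acc)
        out.append(out_row)
    return out


# Toy 4-frame, 3-bin input standing in for a spectrogram (hypothetical values).
features = [
    [0.1, 0.5, 0.2],
    [0.4, 0.3, 0.9],
    [0.7, 0.2, 0.1],
    [0.0, 0.6, 0.8],
]
filtered = lightweight_dynamic_filter(
    features, weights=[0.5, 1.0, -0.5], bias=[0.0, 0.1, 0.0], scale=2.0, shift=-1.0
)
```

The output has the same shape as the input, so in this sketch the filter could sit in front of any KWS classifier without changing its input layer. In a trained system the hypothetical `weights`, `bias`, `scale`, and `shift` values would be learned rather than fixed by hand.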

