Towards on-Device Keyword Spotting using Low-Footprint Quaternion Neural Models

On-device keyword spotting (KWS) is an essential component for wake-up and user interaction on smart edge devices. Existing low-footprint models are mainly based on 2D and 1D convolutions, where the former is better at capturing invariances while the latter enables faster inference. In this work, we explore quaternion neural models as an alternative for effective acoustic modeling for the KWS task. Quaternion models can embed various facets of the input features within the multiple dimensions of the quaternion space, leading to smaller and more efficient models compared to their conventional counterparts. We demonstrate this using quaternion versions of popular KWS models on the Google Speech Commands V2 dataset, where our models achieve performance comparable to existing ones. In addition, we provide an extensive analysis of the learning behavior of quaternion networks to motivate their use in other speech/audio tasks.
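The parameter savings come from the Hamilton product: a quaternion layer shares one set of four real weight matrices across the four input components instead of learning an unconstrained real matrix. A minimal NumPy sketch of a quaternion linear layer (the function name and shapes are illustrative, not from the paper) makes the roughly 4x reduction concrete:

```python
import numpy as np

def quaternion_linear(x, W):
    """Quaternion linear layer via the Hamilton product W (*) x.

    x : (4, n) input split into (r, i, j, k) components
    W : (4, m, n) quaternion weight, i.e. 4*m*n real parameters;
        a real layer mapping 4n -> 4m inputs would need 16*m*n.
    Returns the (4, m) quaternion-valued output.
    """
    r, i, j, k = x
    Wr, Wi, Wj, Wk = W
    # Hamilton product rules: i^2 = j^2 = k^2 = ijk = -1
    out_r = Wr @ r - Wi @ i - Wj @ j - Wk @ k
    out_i = Wr @ i + Wi @ r + Wj @ k - Wk @ j
    out_j = Wr @ j - Wi @ k + Wj @ r + Wk @ i
    out_k = Wr @ k + Wi @ j - Wj @ i + Wk @ r
    return np.stack([out_r, out_i, out_j, out_k])

rng = np.random.default_rng(0)
n, m = 8, 4                       # 8 input / 4 output quaternion units
x = rng.standard_normal((4, n))
W = rng.standard_normal((4, m, n))
y = quaternion_linear(x, W)
print(y.shape)                    # (4, 4)
print(W.size, 16 * m * n)         # 128 real parameters vs 512 for a real 32->16 layer
```

The weight sharing enforced by the Hamilton product is also what lets the four components of a feature (e.g. neighboring filterbank channels or derivatives) be treated as one entity rather than independent inputs.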



Task             | Dataset                     | Model                     | Metric Name                  | Metric Value | Global Rank
Keyword Spotting | Google Speech Commands      | QNN                       | Google Speech Commands V2 35 | 98.60        | # 1
Keyword Spotting | Google Speech Commands      | QNN                       | Google Speech Commands       | 98.53        | # 1
Keyword Spotting | Google Speech Commands (v2) | Quaternion Neural Networks| Accuracy (10-fold)           | 98.53        | # 1
Keyword Spotting | Google Speech Commands V2 35| Quaternion Neural Network | Accuracy (10-fold)           | 98.53        | # 1

