Towards On-Device Keyword Spotting Using Low-Footprint Quaternion Neural Models
On-device keyword spotting (KWS) is an essential component for wake-up and user interaction on smart edge devices. Existing low-footprint models are mainly based on 2D and 1D convolutions, where the former are better at capturing invariances while the latter enable faster inference. In this work, we explore quaternion neural models as an alternative for effective acoustic modeling in the KWS task. Quaternion models can embed various facets of the input features within the multiple dimensions of the quaternion space, leading to smaller and more efficient models than their conventional counterparts. We demonstrate this using quaternion versions of popular KWS models on the Google Speech Commands V2 dataset, where our models achieve performance comparable to existing ones. In addition, we provide an extensive analysis of the learning behavior of quaternion networks to motivate their use in other speech/audio tasks.
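The parameter saving behind quaternion layers can be illustrated with a minimal quaternion linear layer built on the Hamilton product. This is a generic sketch of the standard quaternion-valued linear transform, not the paper's implementation; the weight names `Wr`, `Wi`, `Wj`, `Wk` are illustrative.

```python
import numpy as np

def quaternion_linear(x, Wr, Wi, Wj, Wk):
    """Quaternion linear layer: Hamilton product of a weight
    quaternion (Wr, Wi, Wj, Wk) with the input quaternion.

    x is a real vector of size 4n, interpreted as n quaternions
    stacked component-wise; each W* has shape (m, n), giving an
    output of size 4m.
    """
    r, i, j, k = np.split(x, 4)
    # Hamilton product (q_w * q_x), component by component:
    out_r = Wr @ r - Wi @ i - Wj @ j - Wk @ k
    out_i = Wr @ i + Wi @ r + Wj @ k - Wk @ j
    out_j = Wr @ j - Wi @ k + Wj @ r + Wk @ i
    out_k = Wr @ k + Wi @ j - Wj @ i + Wk @ r
    return np.concatenate([out_r, out_i, out_j, out_k])

# A real 8->8 dense layer needs 8*8 = 64 weights; the quaternion
# version shares four (2, 2) matrices, i.e. 4*2*2 = 16 weights,
# a 4x reduction for the same input/output dimensionality.
rng = np.random.default_rng(0)
Wr, Wi, Wj, Wk = (rng.standard_normal((2, 2)) for _ in range(4))
y = quaternion_linear(rng.standard_normal(8), Wr, Wi, Wj, Wk)
```

Because the four weight matrices are reused across all four output components, the layer has one quarter of the parameters of a real-valued dense layer with the same input/output sizes, which is the source of the footprint reduction claimed above.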
Results from the Paper
Ranked #1 on Keyword Spotting on Google Speech Commands (Google Speech Commands V2 35 metric)
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Keyword Spotting | Google Speech Commands | QNN | Google Speech Commands V2 35 | 98.60 | #1 |
| Keyword Spotting | Google Speech Commands | QNN | Google Speech Commands | 98.53 | #1 |
| Keyword Spotting | Google Speech Commands (v2) | Quaternion Neural Networks | Accuracy (10-fold) | 98.53 | #1 |
| Keyword Spotting | Google Speech Commands V2 35 | QuaternionNeuralNetwork | Accuracy (10-fold) | 98.53 | #1 |