58 papers with code • 7 benchmarks • 7 datasets
In speech processing, keyword spotting is the task of identifying predefined keywords in spoken utterances.
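A common decision recipe (one simple, hypothetical variant; the exact rule differs across the papers listed below) scores each audio frame with a keyword posterior, smooths the scores over a short window, and fires when the smoothed score crosses a threshold. The function names and parameters here are illustrative, not taken from any specific paper:

```python
def smooth(scores, w=3):
    """Moving-average smoothing of per-frame keyword posteriors."""
    out = []
    for i in range(len(scores)):
        window = scores[max(0, i - w + 1):i + 1]
        out.append(sum(window) / len(window))
    return out

def detect(scores, threshold=0.5, w=3):
    """Return the first frame index where the smoothed posterior
    crosses the threshold, or -1 if the keyword is never detected."""
    for i, s in enumerate(smooth(scores, w)):
        if s >= threshold:
            return i
    return -1
```

Smoothing suppresses single-frame spikes, so a detection requires the posterior to stay high for several consecutive frames.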
(Image credit: Simon Grest)
The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition.
We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently released Google Speech Commands Dataset as our benchmark.
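The two ingredients named above can be sketched in a few lines: dilation inserts gaps between filter taps so the receptive field grows without adding parameters, and a residual connection adds the block's input back to its output. This is a minimal, hypothetical illustration of the idea, not the paper's actual architecture:

```python
def dilated_conv1d(x, w, dilation=1):
    """Valid 1-D convolution with dilation.
    Receptive field = (len(w) - 1) * dilation + 1."""
    rf = (len(w) - 1) * dilation + 1
    return [sum(w[k] * x[i + k * dilation] for k in range(len(w)))
            for i in range(len(x) - rf + 1)]

def residual_block(x, w, dilation=1):
    """Dilated convolution plus an identity (residual) connection.
    The input is trimmed to the valid output length by zip()."""
    y = dilated_conv1d(x, w, dilation)
    return [a + b for a, b in zip(x, y)]
```

For example, a 2-tap filter with dilation 2 covers 3 input samples per output: `dilated_conv1d([1, 2, 3, 4, 5, 6], [1, 1], dilation=2)` returns `[4, 6, 8, 10]`.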
In this paper, we explore a TM-based keyword spotting (KWS) pipeline to demonstrate low complexity and a faster rate of convergence compared to neural networks (NNs).
Well-established text line segmentation evaluation schemes, such as Detection Rate or Recognition Accuracy, require binarized data annotated at the pixel level.
We describe Honk, an open-source PyTorch reimplementation of convolutional neural networks for keyword spotting that are included as examples in TensorFlow.
We explore the application of end-to-end stateless temporal modeling to small-footprint keyword spotting as opposed to recurrent networks that model long-term temporal dependencies using internal states.
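The stateless idea contrasted above can be made concrete with frame stacking: instead of a recurrent network carrying hidden state across time, a stateless model classifies each fixed window of neighbouring frames independently. This helper is a hypothetical sketch of that windowing step, not the paper's implementation:

```python
def stack_context(frames, left=2, right=2):
    """Concatenate a fixed window of neighbouring frames around each
    centre frame. A stateless model scores each stacked window on its
    own; no hidden state is carried across time, unlike an RNN."""
    out = []
    for i in range(left, len(frames) - right):
        window = []
        for f in frames[i - left:i + right + 1]:
            window.extend(f)
        out.append(window)
    return out
```

The temporal context is thus bounded by `left + right + 1` frames, which is what makes the model's memory footprint and latency fixed and small.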
We propose a single neural network architecture for two tasks: online keyword spotting and voice activity detection.
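The one-network-two-tasks idea can be sketched as a shared trunk feeding two output heads, one per task. Everything here (the normalising trunk, the logistic heads, the weight vectors) is a hypothetical toy stand-in for the shared layers, not the proposed architecture:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def two_task_forward(features, kws_w, vad_w):
    """One forward pass, two outputs: a keyword posterior and a
    voice-activity posterior computed from the same shared features."""
    total = sum(features) or 1.0
    shared = [f / total for f in features]  # toy shared trunk
    kws = sigmoid(sum(a * b for a, b in zip(shared, kws_w)))
    vad = sigmoid(sum(a * b for a, b in zip(shared, vad_w)))
    return kws, vad
```

Sharing the trunk means the expensive feature computation is done once per frame, which is the usual motivation for combining keyword spotting and voice activity detection in one model.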