Keyword Spotting

12 papers with code · Speech

In speech processing, keyword spotting deals with the identification of keywords in utterances.

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Hello Edge: Keyword Spotting on Microcontrollers

20 Nov 2017 · ARM-software/ML-KWS-for-MCU

We train various neural network architectures for keyword spotting published in literature to compare their accuracy and memory/compute requirements. We show that it is possible to optimize these neural network architectures to fit within the memory and compute constraints of microcontrollers without sacrificing accuracy.


Deep Residual Learning for Small-Footprint Keyword Spotting

28 Oct 2017 · castorini/honk

We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently-released Google Speech Commands Dataset as our benchmark. Our best residual network (ResNet) implementation significantly outperforms Google's previous convolutional neural networks in terms of accuracy.


Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting

18 Oct 2017 · castorini/honk

We describe Honk, an open-source PyTorch reimplementation of convolutional neural networks for keyword spotting that are included as examples in TensorFlow. These models are useful for recognizing "command triggers" in speech-based interfaces (e.g., "Hey Siri"), which serve as explicit cues for audio recordings of utterances that are sent to the cloud for full speech recognition.


READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

9 May 2017 · dhlab-epfl/dhSegment

The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established text line segmentation evaluation schemes such as the Detection Rate or Recognition Accuracy require binarized data that is annotated at the pixel level.


Efficient keyword spotting using dilated convolutions and gating

19 Nov 2018 · snipsco/tract

We explore the application of end-to-end stateless temporal modeling to small-footprint keyword spotting, as opposed to recurrent networks that model long-term temporal dependencies using internal states. We propose a model inspired by the recent success of dilated convolutions in sequence modeling applications, which makes it possible to train deeper architectures in resource-constrained configurations.
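
To illustrate why dilated convolutions suit stateless temporal modeling, the following is a minimal sketch in plain Python, not the paper's model. The function names, the causal (left-padded) formulation, and the example dilation schedule are all assumptions made for illustration. It shows how the receptive field of a stack of layers grows with the dilation rates while the number of weights per layer stays fixed.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input steps seen by one output step of a stacked dilated conv.

    Each layer with kernel size k and dilation d widens the view by (k - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_causal_conv1d(
    signal: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Compute out[t] = sum_j kernel[j] * signal[t - j * dilation].

    Samples before the start of the signal are treated as zero (assumed
    left zero-padding), so the output has the same length as the input.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += weight * signal[idx]
        out.append(acc)
    return out


# Hypothetical schedule: four layers, kernel size 3, dilations doubling.
print(receptive_field(3, [1, 2, 4, 8]))  # 31
print(dilated_causal_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
# [1.0, 2.0, 4.0, 6.0, 8.0]
```

With the same four layers and no dilation, the receptive field would be only 9 steps, which is why dilation is attractive when depth and parameter count are constrained.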


Stochastic Adaptive Neural Architecture Search for Keyword Spotting

16 Nov 2018 · TomVeniat/SANAS

The problem of keyword spotting, i.e., identifying keywords in a real-time audio stream, is mainly solved by applying a neural network over successive sliding windows. Due to the difficulty of the task, baseline models are usually large, resulting in high computational cost and energy consumption.
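
The sliding-window scheme mentioned above can be sketched as follows. This is a minimal illustration, not the paper's method: the helper names are hypothetical, and `mean_energy` is a toy stand-in for the neural network that would score each window in a real system.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def sliding_windows(
    samples: Sequence[float], window: int, hop: int
) -> Iterator[Tuple[int, Sequence[float]]]:
    """Yield (start_index, chunk) pairs of fixed length, advancing by `hop`."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    for start in range(0, len(samples) - window + 1, hop):
        yield start, samples[start:start + window]


def spot_keywords(
    samples: Sequence[float],
    window: int,
    hop: int,
    score: Callable[[Sequence[float]], float],
    threshold: float,
) -> List[int]:
    """Return the start index of every window whose score reaches the threshold."""
    return [
        start
        for start, chunk in sliding_windows(samples, window, hop)
        if score(chunk) >= threshold
    ]


def mean_energy(chunk: Sequence[float]) -> float:
    """Toy scorer (an assumption): average squared amplitude of the window."""
    return sum(x * x for x in chunk) / len(chunk)


# Synthetic stream: silence, a short burst, then silence again.
stream = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
print(spot_keywords(stream, window=4, hop=2, score=mean_energy, threshold=0.9))
# [8]
```

Because the scorer runs once per hop, the cost of the whole pipeline scales with the cost of the model divided by the hop size, which is one reason large baseline models are expensive in an always-on setting.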


Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

9 Apr 2018 · TomVeniat/SANAS

Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Discusses why this task is an interesting challenge, and why it requires a specialized dataset that is different from conventional datasets used for automatic speech recognition of full sentences.


What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision

5 Mar 2015 · malmaud/whats_cookin

We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. In particular, we focus on the cooking domain, where the instructions correspond to the recipe.


JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis

30 Oct 2018 · castorini/honkling

Ubiquitous as well are web applications, which have grown in popularity and complexity over the last decade with significant improvements in usability under cross-platform conditions. Overall, our robust, cross-device implementation for keyword spotting realizes a new paradigm for serving neural network applications, and one of our slim models reduces latency by 66% with a minimal decrease in accuracy of 4 percentage points, from 94% to 90%.