Automatic Speech Recognition
641 papers with code • 69 benchmarks • 14 datasets
Most implemented papers
Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.
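SpecAugment augments the input spectrogram directly. Besides time warping, its core policies mask a random band of consecutive frequency channels and a random block of consecutive time steps. The pure-Python sketch below illustrates only the two masking steps, on a spectrogram stored as a list of time frames. The function names, the one-mask-per-call design, and the zero fill value (which assumes features normalized to zero mean) are assumptions of this sketch, not the paper's code.

```python
import random
from typing import List

# A spectrogram stored as rows of time frames, each row holding frequency channels.
Spectrogram = List[List[float]]


def freq_mask(spec: Spectrogram, max_width: int, rng: random.Random,
              fill: float = 0.0) -> Spectrogram:
    """Mask one band of consecutive frequency channels in every frame (hypothetical helper)."""
    num_freq = len(spec[0])
    width = rng.randint(0, min(max_width, num_freq))  # band width f drawn from [0, F]
    start = rng.randint(0, num_freq - width)          # first masked channel f0
    out = [row[:] for row in spec]                    # copy; leave the input untouched
    for row in out:
        for c in range(start, start + width):
            row[c] = fill
    return out


def time_mask(spec: Spectrogram, max_width: int, rng: random.Random,
              fill: float = 0.0) -> Spectrogram:
    """Mask one block of consecutive time frames across all channels (hypothetical helper)."""
    num_frames = len(spec)
    width = rng.randint(0, min(max_width, num_frames))  # block length t drawn from [0, T]
    start = rng.randint(0, num_frames - width)          # first masked frame t0
    out = [row[:] for row in spec]
    for t in range(start, start + width):
        out[t] = [fill] * len(out[t])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    spec = [[1.0] * 8 for _ in range(10)]  # 10 frames x 8 channels of dummy features
    augmented = time_mask(freq_mask(spec, max_width=3, rng=rng), max_width=4, rng=rng)
    for row in augmented:
        print(" ".join(f"{v:.0f}" for v in row))
```

Because the masks are applied to copies, the original features stay intact and a fresh random mask can be drawn each time an utterance is revisited during training.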
Conformer: Convolution-augmented Transformer for Speech Recognition
Recently, Transformer- and convolutional neural network (CNN)-based models have shown promising results in automatic speech recognition (ASR), outperforming recurrent neural networks (RNNs).
Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces
This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices.
A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks
We consider the two related problems of detecting if an example is misclassified or out-of-distribution.
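The baseline this paper proposes scores each input by its maximum softmax probability: correctly classified, in-distribution examples tend to receive a higher maximum softmax probability than misclassified or out-of-distribution ones. A minimal pure-Python sketch follows. The helper names and the 0.5 threshold are hypothetical; the paper evaluates the score with threshold-independent metrics (AUROC, AUPR) rather than prescribing a cutoff.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_probability(logits: Sequence[float]) -> float:
    """Confidence score: the largest predicted class probability."""
    return max(softmax(logits))


def flag_example(logits: Sequence[float], threshold: float = 0.5) -> bool:
    """Flag inputs whose confidence falls below a hypothetical threshold
    as possibly misclassified or out-of-distribution."""
    return max_softmax_probability(logits) < threshold


print(flag_example([6.0, 0.5, 0.2]))   # confident prediction -> False
print(flag_example([0.3, 0.2, 0.25]))  # near-uniform prediction -> True
```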
Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM
The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder.
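Joint CTC-attention models are commonly trained with a multi-task objective that interpolates the two losses with a tunable weight. Using symbols chosen here (X the acoustic input, C the output character sequence, λ the interpolation weight), a sketch of that objective is:

```latex
\mathcal{L}_{\mathrm{MTL}} = \lambda\,\mathcal{L}_{\mathrm{CTC}} + (1 - \lambda)\,\mathcal{L}_{\mathrm{att}},
\qquad
\mathcal{L}_{\mathrm{CTC}} = -\log p_{\mathrm{CTC}}(C \mid X),
\quad
\mathcal{L}_{\mathrm{att}} = -\log p_{\mathrm{att}}(C \mid X),
\quad
0 \le \lambda \le 1
```

Setting λ = 1 recovers pure CTC training and λ = 0 pure attention-based training; the weight used in the paper is not stated in this summary.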
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without an external language model (LM), 1.9%/4.1% with an LM, and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.
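The SpecAugment and ContextNet results are both reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to align the recognizer's hypothesis with the reference transcript, divided by the number of reference words. A minimal pure-Python sketch of that definition, using a word-level Levenshtein distance, follows. The function name is hypothetical, and real evaluations also depend on text normalization (casing, punctuation) that this sketch ignores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row dynamic programming: after processing i reference words,
    # prev[j] is the edit distance between ref[:i] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words:
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```

Because insertions count as errors, WER can exceed 100%. Read this way, ContextNet's 2.1% on the clean test set corresponds to roughly 2 word errors per 100 reference words.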
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.
Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models
To the best of our knowledge, we have achieved state-of-the-art performance for end-to-end Transformer-based models on Switchboard and AMI.
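The title describes removing attention heads at random during training. The hypothetical sketch below illustrates that idea as dropout applied to whole heads: while training, each head's output is zeroed independently with probability p, and all heads are kept at inference. Rescaling surviving heads by 1/(1 - p) is an inverted-dropout convention assumed here, not necessarily the paper's formulation, and the function name is illustrative.

```python
import random
from typing import List


def drop_heads(head_outputs: List[List[float]], p: float, rng: random.Random,
               training: bool = True) -> List[List[float]]:
    """Hypothetical head-level dropout for multi-head attention.

    head_outputs holds one (flattened) output vector per attention head.
    During training each head is removed independently with probability p and
    survivors are scaled by 1 / (1 - p) (an assumed inverted-dropout convention),
    so the expected output matches inference, where every head is kept.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return [h[:] for h in head_outputs]
    scale = 1.0 / (1.0 - p)
    out = []
    for h in head_outputs:
        if rng.random() < p:
            out.append([0.0] * len(h))          # head removed for this step
        else:
            out.append([v * scale for v in h])  # head kept, rescaled
    return out


heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(drop_heads(heads, p=0.5, rng=random.Random(0)))
# In a Transformer layer, the (possibly zeroed) head outputs would then be
# concatenated and passed through the output projection as usual.
```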
Neural NILM: Deep Neural Networks Applied to Energy Disaggregation
Energy disaggregation estimates appliance-by-appliance electricity consumption from a single meter that measures the whole home's electricity demand.
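Stated as an equation, with symbols chosen here: if y(t) is the whole-home reading from the single meter at time t and x_i(t) is the consumption of appliance i out of N, the meter observes their sum plus a residual e(t) from unmodeled loads and noise, and disaggregation estimates each x_i(t) from the aggregate signal alone.

```latex
y(t) = \sum_{i=1}^{N} x_i(t) + e(t),
\qquad
\text{estimate } \hat{x}_i(t),\ i = 1, \dots, N, \text{ given only } y(1), \dots, y(T)
```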