Automatic Speech Recognition

500 papers with code • 158 benchmarks • 11 datasets

Automatic Speech Recognition (ASR) is the task of transcribing spoken audio into text.



Most implemented papers

Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

retrocirce/hts-audio-transformer 9 Apr 2018

Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

mozilla/DeepSpeech 18 Apr 2019

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.
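SpecAugment operates directly on the log-mel spectrogram by zeroing out random frequency bands and time spans. A minimal sketch of those two masking policies (parameter names and defaults here are illustrative, not the paper's exact training policy):

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_width=27,
                 num_time_masks=2, time_width=100, rng=None):
    """Apply SpecAugment-style frequency and time masking to a
    log-mel spectrogram of shape (freq_bins, time_steps).
    Mask counts/widths are illustrative defaults, not the paper's."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()
    n_freq, n_time = spec.shape
    for _ in range(num_freq_masks):
        f = rng.integers(0, freq_width + 1)        # band height
        f0 = rng.integers(0, max(1, n_freq - f))   # band start
        spec[f0:f0 + f, :] = 0.0
    for _ in range(num_time_masks):
        t = rng.integers(0, time_width + 1)        # span length
        t0 = rng.integers(0, max(1, n_time - t))   # span start
        spec[:, t0:t0 + t] = 0.0
    return spec
```

The paper also describes time warping, omitted here; in practice masking contributes most of the gain and is what most toolkits implement.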

Conformer: Convolution-augmented Transformer for Speech Recognition

PaddlePaddle/PaddleSpeech 16 May 2020

Recently, Transformer- and convolutional neural network (CNN)-based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs).

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

snipsco/snips-nlu 25 May 2018

This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices.

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

hendrycks/error-detection 7 Oct 2016

We consider the two related problems of detecting if an example is misclassified or out-of-distribution.
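The baseline proposed in this paper is simply the maximum softmax probability: correctly classified, in-distribution inputs tend to receive higher maximum probabilities than misclassified or out-of-distribution ones. A minimal sketch:

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def max_softmax_score(logits):
    """Maximum softmax probability per example; lower scores suggest
    a misclassified or out-of-distribution input."""
    return softmax(logits).max(axis=-1)
```

Thresholding this score gives a detector; the paper evaluates it with threshold-free metrics such as AUROC.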

Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

Alexander-H-Liu/End-to-end-ASR-Pytorch 8 Jun 2017

The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder.

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

TensorSpeech/TensorFlowASR 7 May 2020

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2.1%/4.6% without external language model (LM), 1.9%/4.1% with LM and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.
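The WER figures quoted throughout these abstracts are word-level edit distance divided by the number of reference words, computed with the standard Levenshtein dynamic program:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference words,
    via the Levenshtein dynamic-programming recurrence over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions.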

Neural NILM: Deep Neural Networks Applied to Energy Disaggregation

JackKelly/neuralnilm_prototype 23 Jul 2015

Energy disaggregation estimates appliance-by-appliance electricity consumption from a single meter that measures the whole home's electricity demand.

EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

yajiemiao/eesen 29 Jul 2015

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs).

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

sooftware/End-to-end-Speech-Recognition 5 Dec 2017

Attention-based encoder-decoder architectures such as Listen, Attend and Spell (LAS) subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network.