Automatic Speech Recognition (ASR)

480 papers with code • 7 benchmarks • 23 datasets

Automatic Speech Recognition (ASR) involves converting spoken language into written text. It is designed to transcribe spoken words into text in real-time, allowing people to communicate with computers, mobile devices, and other technology using their voice. The goal of Automatic Speech Recognition is to accurately transcribe speech, taking into account variations in accent, pronunciation, and speaking style, as well as background noise and other factors that can affect speech quality.


Use these libraries to find Automatic Speech Recognition (ASR) models and implementations
13 papers
6 papers
See all 23 libraries.

Most implemented papers

Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

retrocirce/hts-audio-transformer 9 Apr 2018

Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

mozilla/DeepSpeech 18 Apr 2019

On LibriSpeech, we achieve 6. 8% WER on test-other without the use of a language model, and 5. 8% WER with shallow fusion with a language model.

Conformer: Convolution-augmented Transformer for Speech Recognition

PaddlePaddle/PaddleSpeech 16 May 2020

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs).

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

snipsco/snips-nlu 25 May 2018

This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices.

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

hendrycks/error-detection 7 Oct 2016

We consider the two related problems of detecting if an example is misclassified or out-of-distribution.

Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

Alexander-H-Liu/End-to-end-ASR-Pytorch 8 Jun 2017

The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder.

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

TensorSpeech/TensorFlowASR 7 May 2020

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2. 1%/4. 6% without external language model (LM), 1. 9%/4. 1% with LM and 2. 9%/7. 0% with only 10M parameters on the clean/noisy LibriSpeech test sets.

Neural NILM: Deep Neural Networks Applied to Energy Disaggregation

JackKelly/neuralnilm_prototype 23 Jul 2015

Energy disaggregation estimates appliance-by-appliance electricity consumption from a single meter that measures the whole home's electricity demand.

EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

yajiemiao/eesen 29 Jul 2015

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs).

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

sooftware/End-to-end-Speech-Recognition 5 Dec 2017

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network.