Automatic Speech Recognition (ASR)

481 papers with code • 7 benchmarks • 23 datasets

Automatic Speech Recognition (ASR) involves converting spoken language into written text. It is designed to transcribe spoken words into text in real-time, allowing people to communicate with computers, mobile devices, and other technology using their voice. The goal of Automatic Speech Recognition is to accurately transcribe speech, taking into account variations in accent, pronunciation, and speaking style, as well as background noise and other factors that can affect speech quality.

Benchmarks

Add a Result

These leaderboards are used to track progress in Automatic Speech Recognition (ASR)

Dataset	Best Model	Compare
LRS2	CTC/Attention	See all
VoxPopuli	Conformer Transducer (German)	See all
HUI speech corpus	Conformer Transducer	See all
The Spoken Wikipedia Corpora	Conformer Transducer	See all
Voxforge German	Conformer Transducer	See all
M-AILabs speech dataset	Conformer Transducer	See all
LRS3-TED	CTC/Attention	See all

Libraries

Use these libraries to find Automatic Speech Recognition (ASR) models and implementations

espnet/espnet

13 papers

7,875

NVIDIA/NeMo

6 papers

10,062

PaddlePaddle/PaddleSpeech

4 papers

10,142

alibaba-damo-academy/FunASR

4 papers

3,299

See all 23 libraries.

Datasets

Most implemented papers

Most implemented Social Latest No code

Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition

retrocirce/hts-audio-transformer • • 9 Apr 2018

Describes an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.

Paper
Code

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

mozilla/DeepSpeech • • 18 Apr 2019

On LibriSpeech, we achieve 6. 8% WER on test-other without the use of a language model, and 5. 8% WER with shallow fusion with a language model.

Paper
Code

Conformer: Convolution-augmented Transformer for Speech Recognition

PaddlePaddle/PaddleSpeech • • 16 May 2020

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs).

Paper
Code

Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces

snipsco/snips-nlu • 25 May 2018

This paper presents the machine learning architecture of the Snips Voice Platform, a software solution to perform Spoken Language Understanding on microprocessors typical of IoT devices.

Paper
Code

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks

hendrycks/error-detection • • 7 Oct 2016

We consider the two related problems of detecting if an example is misclassified or out-of-distribution.

Paper
Code

Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

Alexander-H-Liu/End-to-end-ASR-Pytorch • • 8 Jun 2017

The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder.

Paper
Code

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

TensorSpeech/TensorFlowASR • • 7 May 2020

We demonstrate that on the widely used LibriSpeech benchmark, ContextNet achieves a word error rate (WER) of 2. 1%/4. 6% without external language model (LM), 1. 9%/4. 1% with LM and 2. 9%/7. 0% with only 10M parameters on the clean/noisy LibriSpeech test sets.

Paper
Code

Neural NILM: Deep Neural Networks Applied to Energy Disaggregation

JackKelly/neuralnilm_prototype • • 23 Jul 2015

Energy disaggregation estimates appliance-by-appliance electricity consumption from a single meter that measures the whole home's electricity demand.

Paper
Code

EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

yajiemiao/eesen • 29 Jul 2015

The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs).

Paper
Code

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

sooftware/End-to-end-Speech-Recognition • • 5 Dec 2017

Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network.

Paper
Code

Automatic Speech Recognition (ASR)

Benchmarks Add a Result

Libraries

Datasets

Most implemented papers

Content

Benchmarks

Add a Result