Automatic Speech Recognition (ASR)

481 papers with code • 7 benchmarks • 23 datasets

Automatic Speech Recognition (ASR) is the task of converting spoken language into written text, often in real time, letting people interact with computers, mobile devices, and other technology using their voice. The goal is to transcribe speech accurately despite variations in accent, pronunciation, and speaking style, as well as background noise and other factors that degrade audio quality.
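
For a concrete feel of the task, here is a minimal sketch of transcribing an audio file with an off-the-shelf model, assuming the Hugging Face transformers package and the public facebook/wav2vec2-base-960h checkpoint; the file name speech_sample.wav is a placeholder.

```python
from transformers import pipeline

# Load a pretrained ASR pipeline (model choice is an illustrative assumption).
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# Transcribe a local audio file (hypothetical path, ideally 16 kHz mono).
result = asr("speech_sample.wav")
print(result["text"])
```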

Libraries

Use these libraries to find Automatic Speech Recognition (ASR) models and implementations

Most implemented papers

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

carlini/audio_adversarial_examples 5 Jan 2018

We construct targeted audio adversarial examples on automatic speech recognition.
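The core optimization is simple to sketch. Below is a hedged illustration of the general idea, not the paper's exact attack: gradient descent on the CTC loss with respect to an additive perturbation, where model (a waveform-to-log-probabilities network) and target_ids are assumed stand-ins.

```python
import torch

def targeted_attack(model, waveform, target_ids, steps=100, lr=1e-3, eps=0.01):
    # Optimize an additive perturbation `delta` so the model transcribes
    # the waveform as the attacker-chosen token sequence `target_ids`.
    delta = torch.zeros_like(waveform, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    ctc = torch.nn.CTCLoss(blank=0)
    for _ in range(steps):
        log_probs = model(waveform + delta)  # assumed shape: (time, 1, vocab)
        input_len = torch.tensor([log_probs.size(0)])
        target_len = torch.tensor([target_ids.size(0)])
        loss = ctc(log_probs, target_ids.unsqueeze(0), input_len, target_len)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation imperceptibly small
    return (waveform + delta).detach()
```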

Deep Audio-Visual Speech Recognition

lordmartian/deep_avsr 6 Sep 2018

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.
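A hedged sketch of the general audio-visual idea, not the paper's architecture: encode each modality with its own network and fuse the streams before classification. All module sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AVFusion(nn.Module):
    def __init__(self, audio_dim=80, video_dim=512, hidden=256, vocab=40):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.video_enc = nn.GRU(video_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, vocab)

    def forward(self, audio, video):
        a, _ = self.audio_enc(audio)  # (batch, time, hidden)
        v, _ = self.video_enc(video)  # frame rates assumed pre-aligned
        return self.classifier(torch.cat([a, v], dim=-1))
```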

Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation

qiujiali/lattice_rnn 30 Oct 2018

The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word.
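As a baseline illustration of what such confidence scores look like (not the paper's lattice RNN), one can threshold per-word posterior probabilities from the 1-best hypothesis; word_log_probs here is a hypothetical tensor of log posteriors.

```python
import torch

def flag_uncertain_words(words, word_log_probs, threshold=0.8):
    # Convert log posteriors to probabilities and flag words whose
    # confidence falls below the threshold for manual review.
    confidences = word_log_probs.exp()
    return [(w, c.item()) for w, c in zip(words, confidences) if c < threshold]
```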

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

MS-Mind/MS-Code-01 9 Nov 2019

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.
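As a rough illustration of shrinking a network for edge deployment, here is PyTorch's built-in post-training dynamic quantization; note this is a generic API, not the fully quantized Transformer scheme the paper proposes.

```python
import torch
import torch.nn as nn

# A tiny stand-in model; any trained ASR network would go here.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 40))

# Store Linear-layer weights in int8, reducing model size and CPU latency.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)
```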

Multi-modal Dense Video Captioning

v-iashin/MDVC 17 Mar 2020

We apply an automatic speech recognition (ASR) system to obtain a temporally aligned textual description of the speech (similar to subtitles) and treat it as a separate input alongside video frames and the corresponding audio track.
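A hedged sketch of the general recipe of treating ASR output as an extra input stream; the module and its dimensions are illustrative assumptions, not the MDVC architecture.

```python
import torch
import torch.nn as nn

class SpeechVideoFusion(nn.Module):
    def __init__(self, vocab=10000, text_dim=300, video_dim=1024, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, text_dim)
        self.text_enc = nn.GRU(text_dim, hidden, batch_first=True)
        self.proj = nn.Linear(video_dim + hidden, hidden)

    def forward(self, asr_tokens, video_feats):
        # Encode the ASR transcript, pool it, and broadcast it across
        # the video timeline before fusing with the visual features.
        t, _ = self.text_enc(self.embed(asr_tokens))  # (batch, words, hidden)
        t = t.mean(dim=1, keepdim=True).expand(-1, video_feats.size(1), -1)
        return self.proj(torch.cat([video_feats, t], dim=-1))
```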

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

kssteven418/squeezeformer 2 Jun 2022

After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.
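For context, a minimal sketch of the Conformer-style convolution module that this line of work re-examines: pointwise expansion with a gated linear unit, a depthwise convolution, then a pointwise projection. Kernel size and width are illustrative assumptions, and this is not Squeezeformer's exact block.

```python
import torch.nn as nn

class ConvModule(nn.Module):
    def __init__(self, dim=256, kernel=31):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(dim, 2 * dim, 1),                                    # pointwise expansion
            nn.GLU(dim=1),                                                 # gated linear unit halves channels
            nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim),  # depthwise conv
            nn.BatchNorm1d(dim),
            nn.SiLU(),
            nn.Conv1d(dim, dim, 1),                                        # pointwise projection
        )

    def forward(self, x):        # x: (batch, dim, time)
        return x + self.net(x)   # residual connection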

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

wnhsu/FactorizedHierarchicalVAE NeurIPS 2017

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.
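A hedged sketch of the factorization idea, not the paper's model: one latent summarizes sequence-level attributes (e.g. speaker identity) while another captures segment-level content, each with its own Gaussian posterior. Sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FactorizedEncoder(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, z_dim=32):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.seq_head = nn.Linear(hidden, 2 * z_dim)  # sequence-level latent
        self.seg_head = nn.Linear(hidden, 2 * z_dim)  # segment-level latent

    @staticmethod
    def sample(stats):
        # Reparameterization trick: stats holds (mu, logvar).
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, segments):  # (batch, time, feat_dim)
        h, _ = self.rnn(segments)
        z_seq = self.sample(self.seq_head(h.mean(dim=1)))  # pooled over time
        z_seg = self.sample(self.seg_head(h[:, -1]))       # last frame per segment
        return z_seq, z_seg
```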

TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation

kaldi-asr/kaldi 12 May 2018

We present recent developments in Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM corpus, from 2012 and 2014.
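TED-LIUM release 3 is also available through torchaudio's built-in dataset wrapper; a minimal loading sketch, assuming the corpus has been downloaded under data/.

```python
import torchaudio

# Load the TED-LIUM release 3 training subset from a local copy of the corpus.
dataset = torchaudio.datasets.TEDLIUM("data/", release="release3", subset="train")

# Each item is (waveform, sample_rate, transcript, talk_id, speaker_id, identifier).
waveform, sample_rate, transcript, talk_id, speaker_id, identifier = dataset[0]
print(sample_rate, transcript)
```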

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

NVIDIA/OpenSeq2Seq 25 May 2018

We present OpenSeq2Seq, a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.
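The same mixed-precision recipe is available in PyTorch via torch.cuda.amp; a minimal training-loop sketch (the paper's implementation is TensorFlow-based), where model, optimizer, and loader are assumed to exist and the model is assumed to return its loss.

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # rescales gradients to avoid fp16 underflow
for inputs, targets in loader:        # `loader` is an assumed DataLoader
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in float16 where safe
        loss = model(inputs, targets) # assumed: model computes and returns the loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```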

Quaternion Recurrent Neural Networks

mravanelli/pytorch-kaldi ICLR 2019

Recurrent neural networks (RNNs) are powerful architectures for modelling sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence.
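
At the heart of quaternion networks is the Hamilton product, which replaces the standard matrix-vector product; a minimal sketch, where q and p are tensors whose last dimension holds the (r, x, y, z) quaternion components.

```python
import torch

def hamilton_product(q, p):
    # Multiply two quaternion tensors component-wise along the last axis.
    r1, x1, y1, z1 = q.unbind(-1)
    r2, x2, y2, z2 = p.unbind(-1)
    return torch.stack([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,  # real part
        r1*x2 + x1*r2 + y1*z2 - z1*y2,  # i component
        r1*y2 - x1*z2 + y1*r2 + z1*x2,  # j component
        r1*z2 + x1*y2 - y1*x2 + z1*r2,  # k component
    ], dim=-1)
```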