Automatic Speech Recognition
500 papers with code • 158 benchmarks • 11 datasets
Libraries
Use these libraries to find Automatic Speech Recognition models and implementations.
Most implemented papers
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
We construct targeted audio adversarial examples on automatic speech recognition.
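The idea can be sketched with a toy, fully differentiable stand-in model: optimize a small perturbation toward a target label while projecting it back into an L∞ ball so it stays quiet. This is a minimal sketch of targeted projected gradient descent, not the paper's CTC-based attack on a real ASR system; the model, sizes, and step parameters are all illustrative.

```python
import numpy as np

def toy_logits(x, W):
    # Stand-in "model": a linear map from waveform samples to class scores.
    return W @ x

def cross_entropy(logits, target):
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[target])

def targeted_attack(x, W, target, eps=0.05, lr=0.005, steps=200):
    """Projected gradient descent toward `target`, perturbation bounded by eps."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        logits = toy_logits(x + delta, W)
        p = np.exp(logits - logits.max()); p /= p.sum()
        grad_logits = p.copy(); grad_logits[target] -= 1.0  # d(CE)/d(logits)
        delta -= lr * (W.T @ grad_logits)
        delta = np.clip(delta, -eps, eps)   # keep the perturbation small
    return delta

rng = np.random.default_rng(0)
x = rng.standard_normal(64)           # "waveform"
W = rng.standard_normal((5, 64))      # toy model weights
target = 3
delta = targeted_attack(x, W, target)
loss_clean = cross_entropy(toy_logits(x, W), target)
loss_adv = cross_entropy(toy_logits(x + delta, W), target)
print(loss_clean, loss_adv, float(np.abs(delta).max()))
```

The clipping step is what makes the perturbation adversarial rather than merely different audio: the target loss drops while the waveform change stays within a fixed amplitude budget.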
Bi-Directional Lattice Recurrent Neural Networks for Confidence Estimation
The standard approach to mitigate errors made by an automatic speech recognition system is to use confidence scores associated with each predicted word.
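As a minimal sketch of that standard approach, one can threshold per-word confidences to flag likely misrecognitions for re-scoring or rejection; the hypothesis format and threshold here are illustrative, not from the paper.

```python
# Flag low-confidence words in an ASR hypothesis (illustrative threshold).
def flag_low_confidence(hyp, threshold=0.5):
    """hyp: list of (word, confidence) pairs from an ASR decoder."""
    return [(w, c, c < threshold) for w, c in hyp]

hyp = [("the", 0.97), ("quick", 0.91), ("brie", 0.32), ("fox", 0.88)]
flags = flag_low_confidence(hyp)
suspect = [w for w, c, bad in flags if bad]
print(suspect)  # words worth re-scoring or rejecting
```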
A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.
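The basic building block behind fully quantized models is uniform fixed-point quantization of weights. A minimal sketch of symmetric per-tensor 8-bit "fake" quantization is below; the paper's exact scheme (per-layer scales, activation quantization, training-time handling) may differ.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Symmetric per-tensor quantization: float weights -> int8 + scale."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = np.abs(w).max() / qmax            # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.default_rng(0).standard_normal(256).astype(np.float32)
q, scale = fake_quantize(w)
w_hat = q.astype(np.float32) * scale          # dequantized weights
err = float(np.abs(w - w_hat).max())
print(q.dtype, err)
```

The int8 tensor plus a single float scale is what makes the model fit on edge devices: storage drops 4x versus float32, and the worst-case rounding error is half a quantization step.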
Multi-modal Dense Video Captioning
We apply an automatic speech recognition (ASR) system to obtain a temporally aligned textual description of the speech (similar to subtitles) and treat it as a separate input alongside video frames and the corresponding audio track.
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.
TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation
We present the recent development on Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM Corpus from 2012 and 2014.
Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq
We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.
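The core trick in mixed-precision training is dynamic loss scaling: multiply the loss so float16 gradients don't underflow, then unscale before the update, shrinking the scale whenever a gradient overflows. A minimal NumPy sketch of that control logic is below (illustrative class and parameter names; OpenSeq2Seq wires the equivalent into TensorFlow optimizers).

```python
import numpy as np

class LossScaler:
    """Dynamic loss scaling sketch: grow on clean steps, back off on overflow."""
    def __init__(self, scale=2.0**15, growth=2.0, backoff=0.5):
        self.scale, self.growth, self.backoff = scale, growth, backoff

    def step(self, scaled_grads_fp16):
        # Unscale gradients back to their true magnitude in float32.
        grads = [g.astype(np.float32) / self.scale for g in scaled_grads_fp16]
        if any(not np.all(np.isfinite(g)) for g in grads):
            self.scale *= self.backoff   # overflow: shrink scale, skip update
            return None
        self.scale *= self.growth        # clean step: grow scale back
        return grads

scaler = LossScaler(scale=2.0**15)
g_true = np.array([1.0, 2.0], dtype=np.float32)

# A gradient of 2.0 scaled by 2**15 exceeds float16's max (65504) -> inf.
overflowed = scaler.step([(g_true * scaler.scale).astype(np.float16)])
# After backing off to 2**14, the same gradients fit and are recovered exactly.
recovered = scaler.step([(g_true * scaler.scale).astype(np.float16)])
print(overflowed, scaler.scale, recovered)
```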
Quaternion Recurrent Neural Networks
Recurrent neural networks (RNNs) are powerful architectures for modeling sequential data, due to their capability to learn short- and long-term dependencies between the basic elements of a sequence.
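The distinguishing operation in a quaternion network is the Hamilton product, which mixes the four components of each hidden unit instead of treating them as independent reals. A minimal sketch (the surrounding recurrent machinery is omitted):

```python
# Hamilton product of two quaternions (w, x, y, z) — the non-commutative
# multiplication at the heart of quaternion-valued layers.
def hamilton(p, q):
    a, b, c, d = p
    e, f, g, h = q
    return (a*e - b*f - c*g - d*h,
            a*f + b*e + c*h - d*g,
            a*g - b*h + c*e + d*f,
            a*h + b*g - c*f + d*e)

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(hamilton(i, j))  # i * j = k, i.e. (0, 0, 0, 1)
print(hamilton(j, i))  # j * i = -k: order matters
```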
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
Automatic speech recognition can potentially benefit from the lip motion patterns, complementing acoustic speech to improve the overall recognition performance, particularly in noise.
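The fusion idea can be sketched as a per-frame softmax over modality scores, so the model leans on lip features when the audio is noisy. The shapes and scoring function below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def attention_fuse(audio, visual, w_a, w_v):
    """Per-frame attention over two modality streams of shape (T, D)."""
    # One scalar score per modality per frame, softmax-normalized.
    scores = np.stack([audio @ w_a, visual @ w_v], axis=-1)   # (T, 2)
    scores -= scores.max(axis=-1, keepdims=True)
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=-1, keepdims=True)
    # Convex combination of the two streams, per frame.
    return alpha[:, :1] * audio + alpha[:, 1:] * visual, alpha

rng = np.random.default_rng(0)
T, D = 4, 8
audio, visual = rng.standard_normal((T, D)), rng.standard_normal((T, D))
w_a, w_v = rng.standard_normal(D), rng.standard_normal(D)
fused, alpha = attention_fuse(audio, visual, w_a, w_v)
print(fused.shape, alpha.sum(axis=-1))  # weights sum to 1 per frame
```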