About

Speech recognition is the task of recognising speech within audio and converting it into text.

(Image credit: SpecAugment)

Greatest papers with code

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

8 Dec 2015 tensorflow/models

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.

ACCENTED SPEECH RECOGNITION, END-TO-END SPEECH RECOGNITION, NOISY SPEECH RECOGNITION

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

11 Oct 2020 huggingface/transformers

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.

END-TO-END SPEECH RECOGNITION, MACHINE TRANSLATION, MULTI-TASK LEARNING, SPEECH RECOGNITION, SPEECH-TO-TEXT TRANSLATION

Unsupervised Cross-lingual Representation Learning for Speech Recognition

24 Jun 2020 huggingface/transformers

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

QUANTIZATION, REPRESENTATION LEARNING, SPEECH RECOGNITION

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

NeurIPS 2020 huggingface/transformers

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

Ranked #1 on Speech Recognition on TIMIT (using extra training data)

QUANTIZATION, SELF-SUPERVISED LEARNING, SPEECH RECOGNITION

Deep Speech: Scaling up end-to-end speech recognition

17 Dec 2014 mozilla/STT

We present a state-of-the-art speech recognition system developed using end-to-end deep learning.

ACCENTED SPEECH RECOGNITION, END-TO-END SPEECH RECOGNITION

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 Apr 2019 mozilla/DeepSpeech

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.

DATA AUGMENTATION, END-TO-END SPEECH RECOGNITION, LANGUAGE MODELLING, SPEECH RECOGNITION
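
Several entries on this page report results as word error rate (WER). As a rough illustration of that metric, the sketch below computes WER as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for this example; published benchmarks usually apply dataset-specific text normalisation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.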

Self-training and Pre-training are Complementary for Speech Recognition

22 Oct 2020 pytorch/fairseq

Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.

Ranked #1 on Speech Recognition on LibriSpeech train-clean-100 test-clean (using extra training data)

SPEECH RECOGNITION, UNSUPERVISED PRE-TRAINING

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

ICLR 2020 pytorch/fairseq

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task.

Ranked #2 on Speech Recognition on TIMIT (using extra training data)

SELF-SUPERVISED LEARNING, SPEECH RECOGNITION
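
To make "discrete representations of audio segments" concrete, here is a minimal sketch of the lookup step in vector quantization: each dense frame vector is replaced by the index of its nearest codebook entry. This is not the vq-wav2vec training procedure, which learns the codebook with a Gumbel-softmax or online k-means quantizer. The function name `quantize`, the toy two-dimensional vectors, and the fixed codebook are all assumptions for illustration.

```python
from typing import List, Sequence


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each frame vector to the index of its nearest codebook entry."""

    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    indices = []
    for frame in frames:
        # Nearest-neighbour assignment, as in the assignment step of k-means.
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(frame, codebook[k]))
        indices.append(nearest)
    return indices


if __name__ == "__main__":
    # Hypothetical codebook with three entries and three toy "audio frames".
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
    frames = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.9]]
    print(quantize(frames, codebook))
```

The resulting index sequence can be treated like a sequence of text tokens, so models built for discrete inputs can then be applied to speech.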

wav2vec: Unsupervised Pre-training for Speech Recognition

11 Apr 2019 pytorch/fairseq

Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.

Ranked #5 on Speech Recognition on TIMIT (using extra training data)

SPEECH RECOGNITION, UNSUPERVISED PRE-TRAINING

A Parallelizable Lattice Rescoring Strategy with Neural Language Models

8 Mar 2021 kaldi-asr/kaldi

This paper proposes a parallel computation strategy and a posterior-based lattice expansion algorithm for efficient lattice rescoring with neural language models (LMs) for automatic speech recognition.

SPEECH RECOGNITION