Speech Recognition

478 papers with code • 104 benchmarks • 56 datasets

Speech recognition is the task of recognising spoken language in an audio signal and transcribing it into text.

(Image credit: SpecAugment)

Greatest papers with code

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

tensorflow/models 8 Dec 2015

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech: two vastly different languages.

Accented Speech Recognition · End-To-End Speech Recognition +1

Unsupervised Cross-lingual Representation Learning for Speech Recognition

huggingface/transformers 24 Jun 2020

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Quantization · Representation Learning +1

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

huggingface/transformers NeurIPS 2020

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

Quantization · Self-Supervised Learning +1
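The core of wav2vec 2.0's pre-training is a contrastive objective: at each masked time step, the context network's output must pick out the true quantized latent among distractors drawn from other masked steps. A minimal NumPy sketch of that InfoNCE-style loss (not the fairseq implementation; function and argument names are illustrative):

```python
import numpy as np

def contrastive_loss(context, quantized, mask_idx, temperature=0.1):
    """InfoNCE-style loss over masked time steps.

    For each masked position, the context vector must identify the true
    quantized latent among the quantized latents at the other masked
    positions, which act as distractors. Shapes: context and quantized
    are (num_frames, dim); mask_idx lists the masked frame indices.
    """
    c = context[mask_idx]    # (M, dim) predictions at masked steps
    q = quantized[mask_idx]  # (M, dim) targets plus distractors
    # Cosine-similarity matrix between predictions and candidates.
    c_n = c / np.linalg.norm(c, axis=1, keepdims=True)
    q_n = q / np.linalg.norm(q, axis=1, keepdims=True)
    logits = (c_n @ q_n.T) / temperature  # (M, M)
    # Cross-entropy with the true target on the diagonal.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

When the context vectors match their quantized targets, the diagonal similarities dominate and the loss approaches zero; mismatched pairs drive it up. The full model adds a diversity term on the codebook and fine-tunes with a CTC head on top.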

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

mozilla/DeepSpeech 18 Apr 2019

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.

Data Augmentation · End-To-End Speech Recognition +2
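SpecAugment operates directly on the log-mel spectrogram, masking out random bands of frequency channels and random blocks of time steps. A minimal NumPy sketch of the two masking policies (time warping from the paper is omitted; filling masked regions with the spectrogram mean is one common choice, and all names here are illustrative):

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_mask_width=27,
                 num_time_masks=2, time_mask_width=100, rng=None):
    """Apply frequency and time masking to a log-mel spectrogram.

    spec: array of shape (num_mel_bins, num_frames).
    Mask widths are sampled uniformly from [0, max_width], as in the
    paper; masked regions are overwritten with the spectrogram mean.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    fill = out.mean()
    n_mels, n_frames = out.shape
    for _ in range(num_freq_masks):
        f = int(rng.integers(0, min(freq_mask_width, n_mels) + 1))
        f0 = int(rng.integers(0, n_mels - f + 1))
        out[f0:f0 + f, :] = fill  # mask a band of mel channels
    for _ in range(num_time_masks):
        t = int(rng.integers(0, min(time_mask_width, n_frames) + 1))
        t0 = int(rng.integers(0, n_frames - t + 1))
        out[:, t0:t0 + t] = fill  # mask a block of time steps
    return out
```

Because the augmentation is applied to model inputs rather than raw audio, it is cheap and can be applied on the fly during training.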

Unsupervised Speech Recognition

pytorch/fairseq 24 May 2021

Despite rapid progress in the recent past, current speech recognition systems still require labeled training data which limits this technology to a small fraction of the languages spoken around the globe.

Speech Recognition · Unsupervised Speech Recognition

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

pytorch/fairseq ICLR 2020

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task.

Ranked #2 on Speech Recognition on TIMIT (using extra training data)

General Classification · Self-Supervised Learning +1
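The discretization step in vq-wav2vec maps each dense encoder output to an entry of a learned codebook; the paper's k-means variant picks the nearest codeword by Euclidean distance. A minimal NumPy sketch of that lookup (inference view only; names are illustrative, and in training gradients flow through via a straight-through estimator):

```python
import numpy as np

def quantize(z, codebook):
    """Replace each dense vector with its nearest codebook entry.

    z: (num_frames, dim) encoder outputs.
    codebook: (num_codes, dim) learned codewords.
    Returns the codeword indices and the quantized vectors.
    """
    # Squared Euclidean distance between every frame and every codeword.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]
```

The resulting index sequence is what lets vq-wav2vec feed speech into models built for discrete tokens, such as a BERT-style encoder trained on the quantized audio.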

wav2vec: Unsupervised Pre-training for Speech Recognition

pytorch/fairseq 11 Apr 2019

Our experiments on WSJ reduce WER of a strong character-based log-mel filterbank baseline by up to 36% when only a few hours of transcribed data is available.

Ranked #5 on Speech Recognition on TIMIT (using extra training data)

General Classification · Speech Recognition +1

Self-training and Pre-training are Complementary for Speech Recognition

pytorch/fairseq 22 Oct 2020

Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data.

Ranked #1 on Speech Recognition on LibriSpeech train-clean-100 test-clean (using extra training data)

Speech Recognition · Unsupervised Pre-training