About

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Datasets

Greatest papers with code

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

8 Dec 2015 • tensorflow/models

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages.

ACCENTED SPEECH RECOGNITION • END-TO-END SPEECH RECOGNITION • NOISY SPEECH RECOGNITION

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

11 Oct 2020 • huggingface/transformers

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.

END-TO-END SPEECH RECOGNITION • MACHINE TRANSLATION • MULTI-TASK LEARNING • SPEECH RECOGNITION • SPEECH-TO-TEXT TRANSLATION

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 Apr 2019 • mozilla/DeepSpeech

On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.

DATA AUGMENTATION • END-TO-END SPEECH RECOGNITION • LANGUAGE MODELLING • SPEECH RECOGNITION

Deep Speech: Scaling up end-to-end speech recognition

17 Dec 2014 • mozilla/DeepSpeech

We present a state-of-the-art speech recognition system developed using end-to-end deep learning.

ACCENTED SPEECH RECOGNITION • END-TO-END SPEECH RECOGNITION

TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation

12 May 2018 • kaldi-asr/kaldi

We present the recent development on Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM Corpus from 2012 and 2014.

END-TO-END SPEECH RECOGNITION • SPEECH RECOGNITION

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

19 Nov 2019 • facebookresearch/wav2letter

We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.

Ranked #8 on Speech Recognition on LibriSpeech test-clean (using extra training data)

END-TO-END SPEECH RECOGNITION • LANGUAGE MODELLING • SPEECH RECOGNITION

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

5 Apr 2021 • espnet/espnet

In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.

END-TO-END SPEECH RECOGNITION • SPEECH RECOGNITION

Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

22 Apr 2020 • espnet/espnet

To demonstrate this, we use the CHiME-6 Challenge data as an example of challenging environments and noisy conditions of everyday speech.

DATA AUGMENTATION • END-TO-END SPEECH RECOGNITION • SPEECH ENHANCEMENT • SPEECH RECOGNITION

Jasper: An End-to-End Convolutional Neural Acoustic Model

5 Apr 2019 • osmr/imgclsmob

In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.

END-TO-END SPEECH RECOGNITION • LANGUAGE MODELLING • SPEECH RECOGNITION