We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages.
Ranked #1 on Noisy Speech Recognition on CHiME clean
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.
Ranked #2 on Speech-to-Text Translation on MuST-C EN->DE
On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER when a language model is incorporated via shallow fusion.
Ranked #1 on Speech Recognition on Hub5'00 SwitchBoard
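The shallow fusion mentioned above interpolates the acoustic model's scores with an external language model's scores at decoding time only; neither model is retrained. A minimal sketch of one decoding step, with an illustrative LM weight (the 0.5 used here is not taken from the paper):

```python
import math

def shallow_fusion(am_log_probs, lm_log_probs, lm_weight=0.5):
    """Fuse per-token log-probabilities for one decoding step:
        score(y) = log p_AM(y | audio) + lm_weight * log p_LM(y | history)
    Both inputs are lists over the same output vocabulary."""
    return [a + lm_weight * l for a, l in zip(am_log_probs, lm_log_probs)]

# Toy 3-token vocabulary: the LM can flip the decoder's choice.
am = [math.log(p) for p in (0.5, 0.3, 0.2)]   # acoustic model prefers token 0
lm = [math.log(p) for p in (0.1, 0.8, 0.1)]   # language model prefers token 1
fused = shallow_fusion(am, lm)
best = max(range(len(fused)), key=fused.__getitem__)
```

In a real beam-search decoder this fusion is applied to every hypothesis at every step, and `lm_weight` is tuned on a development set.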
We present a state-of-the-art speech recognition system developed using end-to-end deep learning.
We present recent developments in Automatic Speech Recognition (ASR) systems, compared against the two previous releases of the TED-LIUM corpus from 2012 and 2014.
We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.
Ranked #8 on Speech Recognition on LibriSpeech test-clean (using extra training data)
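The pseudo-labeling setup above uses a model trained on labeled data to transcribe unlabeled audio, then adds sufficiently confident transcripts back into the training set. A minimal sketch of the labeling step, with a hypothetical `model_decode` callable and a confidence threshold that are illustrative assumptions, not details from the paper:

```python
def pseudo_label(model_decode, unlabeled_utts, confidence_threshold=0.9):
    """Transcribe unlabeled utterances and keep only confident hypotheses.

    model_decode(utt) is assumed to return (transcript, confidence);
    the surviving (utt, transcript) pairs are treated as extra training data."""
    pseudo = []
    for utt in unlabeled_utts:
        transcript, confidence = model_decode(utt)
        if confidence >= confidence_threshold:
            pseudo.append((utt, transcript))
    return pseudo
```

In practice this loop is often iterated: the model is retrained on labeled plus pseudo-labeled data, then relabels the unlabeled pool with better hypotheses.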
In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.
Ranked #1 on Speech Recognition on SPGISpeech
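The post-processing stage described above sits after the acoustic model: it takes uncased, unpunctuated output and imputes orthography. A deliberately toy illustration of where such a step fits in the pipeline (real systems use trained truecasing and punctuation models, not rules like these):

```python
import re

def naive_truecase(text):
    """Toy orthography restoration: capitalize the sentence start and the
    pronoun 'i'. Illustrates the post-processing stage only; it is not a
    real denormalization model."""
    text = re.sub(r"\bi\b", "I", text)
    return text[:1].upper() + text[1:]
```

A trained post-processor would additionally restore punctuation and expand or contract non-standard words (numbers, dates, abbreviations).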
To demonstrate this, we use the CHiME-6 Challenge data as an example of challenging environments and noisy conditions of everyday speech.
Ranked #1 on Speech Recognition on CHiME-6 dev_gss12
In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.
Ranked #3 on Speech Recognition on Hub5'00 SwitchBoard
The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder.
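Joint training of a CTC head and an attention-based decoder is typically done with a weighted multi-task objective over the two losses. A minimal sketch; the weight of 0.3 is a commonly used illustrative value, not a figure from the excerpt:

```python
def joint_ctc_attention_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Multi-task objective for hybrid CTC/attention training:
        L = w * L_CTC + (1 - w) * L_attention
    Both loss terms are computed from the same shared encoder output."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss
```

The CTC branch encourages monotonic audio-to-text alignment, while the attention branch models the output sequence flexibly; the shared encoder benefits from both signals.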