LibriSpeech

Introduced by Vassil Panayotov et al. in Librispeech: An ASR corpus based on public domain audio books

The LibriSpeech corpus is a collection of approximately 1,000 hours of audiobooks from the LibriVox project. Most of the audiobooks are based on texts from Project Gutenberg. The training data is split into three partitions of 100, 360, and 500 hours, while the dev and test data are each split into 'clean' and 'other' categories according to how easy or challenging the recordings are for Automatic Speech Recognition systems. Each dev and test set contains around 5 hours of audio. The corpus also provides n-gram language models and the corresponding texts excerpted from Project Gutenberg books, which contain 803M tokens and 977K unique words.

Source: State-of-the-art Speech Recognition using Multi-stream Self-attention with Dilated 1D Convolutions
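
The partition sizes described above can be tallied with a short Python sketch. The subset names follow the naming used in the standard LibriSpeech release, and the hour figures are the nominal values from the description (the dev and test sizes are approximate), so the totals are rough estimates rather than exact durations. The `SPLITS` table and the `total_hours` helper are illustrative names, not part of any official tooling.

```python
# Nominal sizes in hours, taken from the description above.
# Dev and test figures are approximate ("around 5hr" each).
SPLITS = {
    "train-clean-100": 100,
    "train-clean-360": 360,
    "train-other-500": 500,
    "dev-clean": 5,
    "dev-other": 5,
    "test-clean": 5,
    "test-other": 5,
}


def total_hours(prefix: str = "") -> int:
    """Sum the nominal hours of every subset whose name starts with `prefix`."""
    return sum(hours for name, hours in SPLITS.items() if name.startswith(prefix))


train_hours = total_hours("train")  # 100 + 360 + 500 = 960
all_hours = total_hours()           # 960 + 4 * 5 = 980

print(f"training: {train_hours} h, whole corpus: about {all_hours} h")
```

The 960-hour training total is where model names such as "ls960" or "960h" in the benchmark table come from, and the overall figure of about 980 hours matches the "approximately 1,000 hours" quoted for the corpus.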

Benchmarks

Task	Dataset Variant	Best Model
Speech Recognition	LibriSpeech test-clean	Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light
Speech Recognition	LibriSpeech test-other	parakeet-rnnt-1.1b
Automatic Speech Recognition	Librispeech (clean)	hubert-large-ls960-ft
Automatic Speech Recognition	Librispeech (other)	wav2vec2-large-960h-lv60
Resynthesis	LibriSpeech	CPC
Automatic Speech Recognition	LibriSpeech test-clean	MonoBERT
Automatic Speech Recognition	LibriSpeech test-other	MonoBERT
Voice Conversion	LibriSpeech test-clean	kNN-VC