7 dataset results for Automatic Speech Recognition (ASR) AND English

ESB is a benchmark for evaluating the performance of a single automatic speech recognition (ASR) system across a broad set of speech datasets. It comprises eight English speech recognition datasets, covering a wide range of domains, acoustic conditions, speaker styles, and transcription requirements.

2 PAPERS • NO BENCHMARKS YET

ArzEn

ArzEn (Corpus of Egyptian Arabic-English Code-switching)

The Corpus of Egyptian Arabic-English Code-switching (ArzEn) is a spontaneous conversational speech corpus, obtained through informal interviews held at the German University in Cairo. The participants discussed broad topics, including education, hobbies, work, and life experiences. The corpus currently contains 12 hours of speech across 6,216 utterances. The recordings were transcribed and translated into monolingual Egyptian Arabic and monolingual English.

1 PAPER • NO BENCHMARKS YET

EdAcc

EdAcc (Edinburgh International Accents of English Corpus)

The Edinburgh International Accents of English Corpus (EdAcc) is a new automatic speech recognition (ASR) dataset composed of 40 hours of English dyadic conversations between speakers with a diverse set of accents. EdAcc includes a wide range of first and second-language varieties of English and a linguistic background profile of each speaker.

1 PAPER • NO BENCHMARKS YET

Jam-ALT

Jam-ALT (JamALT: A Formatting-Aware Lyrics Transcription Benchmark)

JamALT is a revision of the JamendoLyrics dataset (80 songs in 4 languages), adapted for use as an automatic lyrics transcription (ALT) benchmark.

1 PAPER • 5 BENCHMARKS

M-AILabs speech dataset

The M-AILABS Speech Dataset is the first large dataset that we are providing free of charge, freely usable as training data for speech recognition and speech synthesis. Most of the data is based on LibriVox and Project Gutenberg. The training data consists of nearly a thousand hours of audio and the text files in a prepared format. A transcription is provided for each clip. Clips vary in length from 1 to 20 seconds and have the approximate total length shown in the list (and in the respective info.txt files) below. The texts were published between 1884 and 1964 and are in the public domain. The audio was recorded by the LibriVox project and is also in the public domain.

1 PAPER • 1 BENCHMARK

The Spoken Wikipedia Corpora

The SWC is a corpus of aligned Spoken Wikipedia articles from the English, German, and Dutch Wikipedia. This corpus has several outstanding characteristics:

1 PAPER • 1 BENCHMARK

ASR-ETeleCSC: An English Telephone Conversational Speech Corpus

This open-source dataset consists of 5.04 hours of transcribed English telephone conversational speech across 13 conversations.

0 PAPERS • NO BENCHMARKS YET
