LibriTTS is a multi-speaker English corpus of approximately 585 hours of read speech at a 24 kHz sampling rate, prepared by Heiga Zen with the assistance of Google Speech and Google Brain team members. The corpus is designed for TTS research. It is derived from the original materials of the LibriSpeech corpus (MP3 audio files from LibriVox and text files from Project Gutenberg). The main differences from LibriSpeech are the 24 kHz sampling rate, segmentation of the speech at sentence breaks, the inclusion of both original and normalized texts, and the exclusion of utterances with significant background noise. A minimal loading sketch is shown below.
187 PAPERS • 1 BENCHMARK
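The following snippet is a minimal sketch that iterates over the corpus through torchaudio's built-in LIBRITTS wrapper; the root directory and the choice of the train-clean-100 subset are assumptions, and downloading requires network access.

```python
# Minimal sketch: reading LibriTTS through torchaudio's dataset wrapper.
# The root directory and subset choice ("train-clean-100") are assumptions.
import torchaudio

dataset = torchaudio.datasets.LIBRITTS(
    root="./data",           # assumed local directory for the archives
    url="train-clean-100",   # one of the LibriTTS subsets
    download=True,           # fetch the subset if it is not already present
)

# Each item is (waveform, sample_rate, original_text, normalized_text,
#               speaker_id, chapter_id, utterance_id).
waveform, sample_rate, original_text, normalized_text, speaker_id, chapter_id, utterance_id = dataset[0]
print(sample_rate, normalized_text)   # sample_rate is 24000 for LibriTTS
```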
THCHS-30 is a free Chinese speech database that can be used to build a full-fledged Chinese speech recognition system.
30 PAPERS • NO BENCHMARKS YET
PromptSpeech is a dataset consisting of speech and the corresponding style prompts. The speech is synthesized from a commercial TTS API with five different style factors (gender, pitch, speaking speed, volume, and emotion); the emotion factor has five categories and the gender factor has two. An illustrative record layout is sketched below.
7 PAPERS • NO BENCHMARKS YET
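The record layout below is purely hypothetical: the field names and values are illustrative and do not reflect the dataset's actual schema; it only makes the pairing of a prompt with its style factors concrete.

```python
# Hypothetical layout of a single PromptSpeech item; field names and values
# are illustrative only, not the dataset's actual schema.
from dataclasses import dataclass

@dataclass
class PromptSpeechItem:
    audio_path: str   # waveform synthesized by the commercial TTS API
    prompt: str       # natural-language style description of the speech
    gender: str       # 2 categories
    pitch: str
    speed: str        # speaking speed
    volume: str
    emotion: str      # 5 categories

item = PromptSpeechItem(
    audio_path="clips/0001.wav",
    prompt="A woman speaks quickly and cheerfully in a high voice.",
    gender="female",
    pitch="high",
    speed="fast",
    volume="normal",
    emotion="cheerful",
)
```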
The SOMOS dataset is a large-scale mean opinion score (MOS) dataset consisting solely of neural text-to-speech (TTS) samples. It can be used to train automatic MOS prediction systems focused on assessing modern synthesizers and can stimulate advances in acoustic model evaluation. It consists of 20K synthetic utterances in the LJ Speech voice, a public-domain speech dataset that is a common benchmark for building neural acoustic models and vocoders. Utterances are generated from 200 TTS systems, including vanilla neural acoustic models as well as models that allow prosodic variation. A sketch of how such a MOS predictor is typically scored appears below.
6 PAPERS • NO BENCHMARKS YET
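MOS predictors trained on such data are commonly scored by correlating predicted scores with listener MOS at the utterance and system level; the snippet below illustrates this with scipy, and all score values and system names are hypothetical.

```python
# Sketch: scoring a MOS predictor by correlating its outputs with listener MOS.
# All scores and system identifiers below are hypothetical.
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Ground-truth and predicted MOS, keyed by (tts_system, utterance_id).
ground_truth = {
    ("sysA", "u1"): 3.8, ("sysA", "u2"): 4.1,
    ("sysB", "u1"): 2.9, ("sysB", "u2"): 3.2,
    ("sysC", "u1"): 4.4, ("sysC", "u2"): 4.0,
}
predicted = {
    ("sysA", "u1"): 3.6, ("sysA", "u2"): 4.0,
    ("sysB", "u1"): 3.1, ("sysB", "u2"): 3.0,
    ("sysC", "u1"): 4.2, ("sysC", "u2"): 4.3,
}

keys = sorted(ground_truth)
y_true = np.array([ground_truth[k] for k in keys])
y_pred = np.array([predicted[k] for k in keys])

# Utterance-level agreement.
print("utterance LCC :", pearsonr(y_true, y_pred)[0])
print("utterance SRCC:", spearmanr(y_true, y_pred)[0])

# System-level agreement: average per TTS system, then correlate.
systems = sorted({s for s, _ in keys})
sys_true = np.array([np.mean([v for (s, _), v in ground_truth.items() if s == name]) for name in systems])
sys_pred = np.array([np.mean([v for (s, _), v in predicted.items() if s == name]) for name in systems])
print("system LCC    :", pearsonr(sys_true, sys_pred)[0])
```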
The Tongue and Lips (TaL) corpus is a multi-speaker corpus of ultrasound images of the tongue and video images of the lips. It contains synchronised imaging data of extraoral (lips) and intraoral (tongue) articulators from 82 native speakers of English.
3 PAPERS • NO BENCHMARKS YET
VocBench is a framework for benchmarking the performance of state-of-the-art neural vocoders. It evaluates different neural vocoders in a shared, systematic setup that enables a fair comparison between them. An illustrative comparison is sketched below.
2 PAPERS • NO BENCHMARKS YET
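The snippet below is not VocBench's actual evaluation harness; it merely illustrates a shared-environment comparison by computing a log-mel spectrogram distance between a reference recording and each vocoder's resynthesis with librosa. The file names and the metric choice are assumptions.

```python
# Illustrative shared-environment comparison of vocoder resyntheses against a
# common reference via a log-mel L1 distance. Paths and parameters are
# assumptions, not VocBench's actual evaluation code.
import numpy as np
import librosa

def log_mel(path, sr=22050, n_mels=80):
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return np.log(mel + 1e-6)

reference = log_mel("ground_truth.wav")  # hypothetical reference recording

for resynthesis in ["wavenet.wav", "waveglow.wav", "hifigan.wav"]:  # hypothetical vocoder outputs
    candidate = log_mel(resynthesis)
    # Truncate to the shorter utterance before a frame-wise comparison.
    frames = min(reference.shape[1], candidate.shape[1])
    distance = np.mean(np.abs(reference[:, :frames] - candidate[:, :frames]))
    print(f"{resynthesis}: mean log-mel L1 distance = {distance:.3f}")
```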
The Jejueo Single Speaker Speech (JSS) dataset consists of 10k high-quality audio files recorded by a native Jejueo speaker, together with a transcript file.
1 PAPER • NO BENCHMARKS YET
JVS-MuSiC is a Japanese multi-speaker singing-voice corpus intended for analyzing and synthesizing a variety of voices. It consists of 100 singers' recordings of the same song, "Katatsumuri", a Japanese children's song, plus one additional song that differs for each singer.
RUSLAN is a Russian spoken-language corpus for the text-to-speech task. It contains 22,200 audio samples with text annotations (more than 31 hours of high-quality speech from a single speaker), making it one of the largest annotated Russian single-speaker corpora in terms of speech duration.
A dataset of facial electromyography (EMG) recordings captured during both silent and vocalized speech.