Manual Speech Synthesis Data Acquisition - From Script Design to Recording Speech

LREC 2020 · Atli Sigurgeirsson, Gunnar Örnólfsson, Jón Guðnason

Atli Þór Sigurgeirsson (atlithors@ru.is), Reykjavik University; Gunnar Thor Örnólfsson (gunnarthor@hi.is), Árni Magnússon Institute of Icelandic Studies; Dr. Jón Guðnason (jg@ru.is)

In this paper we present the work of collecting a large amount of high-quality speech synthesis data for Icelandic. Eight speakers will be recorded for 20 hours each. A script design strategy is proposed, and three scripts of varying length have been generated to maximize diphone coverage. The largest reading script contains 14,400 prompts and includes 87.3% of all Icelandic diphones at least once and 81% of all Icelandic diphones at least twenty times. A recording client was developed to facilitate recording sessions. The client supports easy import of scripts and maintaining multiple collections in parallel. The recorded data can be downloaded directly from the client. Recording sessions are carried out in a professional studio under supervision and began in October 2019. At the time of writing, 58.7 hours of high-quality speech data have been collected. The scripts, the recording software, and the speech data will later be released under a CC-BY 4.0 license.
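The abstract does not spell out the script design strategy, so the following is only a minimal sketch of one common way to pick prompts for diphone coverage: a greedy selection that repeatedly takes the prompt contributing the most still-needed diphone occurrences. The function names (`diphones`, `greedy_select`, `coverage`), the input format (a mapping from prompt id to a list of phoneme symbols), and the toy phoneme inventory are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def diphones(phonemes):
    """Return the adjacent phoneme pairs (diphones) of one prompt."""
    return list(zip(phonemes, phonemes[1:]))


def greedy_select(prompts, n_prompts, target=1):
    """Greedily choose up to n_prompts prompts.

    prompts: dict of prompt id -> list of phoneme symbols (hypothetical format).
    target:  desired number of occurrences per diphone.
    Returns the selected prompt ids and the diphone counts they yield.
    """
    counts = Counter()
    remaining = dict(prompts)
    selected = []

    def gain(pid):
        # Occurrences in this prompt that still count toward the target.
        local = Counter(diphones(remaining[pid]))
        return sum(min(c, max(0, target - counts[d])) for d, c in local.items())

    while remaining and len(selected) < n_prompts:
        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # no remaining prompt adds anything that is still needed
        counts.update(diphones(remaining.pop(best)))
        selected.append(best)
    return selected, counts


def coverage(counts, inventory, min_count=1):
    """Fraction of the diphone inventory seen at least min_count times."""
    covered = sum(1 for d in inventory if counts[d] >= min_count)
    return covered / len(inventory)


# Toy data, not Icelandic: three prompts over a four-phoneme alphabet.
toy_prompts = {
    "a": ["h", "a", "l", "o"],
    "b": ["h", "a"],
    "c": ["l", "o", "a"],
}
toy_inventory = [("h", "a"), ("a", "l"), ("l", "o"), ("o", "a"), ("a", "h")]
chosen, toy_counts = greedy_select(toy_prompts, n_prompts=2)
```

Calling `coverage(toy_counts, toy_inventory, min_count=1)` and again with `min_count=20` gives the same two kinds of statistic the abstract reports (share of diphones seen at least once, and at least twenty times), here for the toy data only.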

Tasks: Speech Synthesis
