Manual Speech Synthesis Data Acquisition - From Script Design to Recording Speech

Atli Þór Sigurgeirsson, atlithors@ru.is, Reykjavik University
Gunnar Thor Örnólfsson, gunnarthor@hi.is, Árni Magnússon Institute for Icelandic Studies
Dr. Jón Guðnason, jg@ru.is

In this paper we present the work of collecting a large amount of high-quality speech synthesis data for Icelandic. Eight speakers will be recorded for 20 hours each. A script design strategy is proposed, and three scripts of varying length have been generated to maximize diphone coverage. The largest reading script contains 14,400 prompts and includes 87.3% of all Icelandic diphones at least once and 81% of all Icelandic diphones at least twenty times. A recording client was developed to facilitate recording sessions. The client supports easy import of scripts and the maintenance of multiple collections in parallel. The recorded data can be downloaded directly from the client. Recording sessions are carried out in a professional studio under supervision and started in October 2019. As of writing, 58.7 hours of high-quality speech data have been collected. The scripts, the recording software and the speech data will later be released under a CC-BY 4.0 license.
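The abstract does not spell out the script design strategy, so the following is only a minimal sketch of one common way to approach diphone coverage: a greedy selection that repeatedly picks the prompt contributing the most diphones not yet covered. It is not the authors' method. The function names, the toy phone sequences and the toy inventory are all hypothetical, and the sketch optimizes only for "at least once" coverage, whereas the paper also reports coverage at a threshold of twenty occurrences.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: Sequence[str]) -> List[Diphone]:
    """Return the adjacent phone pairs of one phonetised prompt."""
    return list(zip(phones, phones[1:]))


def greedy_select(prompts: Dict[str, Sequence[str]], n_prompts: int) -> List[str]:
    """Pick up to n_prompts prompt ids, each adding the most unseen diphones.

    Hypothetical helper: a plain greedy set-cover heuristic, assumed here
    for illustration only.
    """
    covered: Set[Diphone] = set()
    remaining = {pid: set(diphones(p)) for pid, p in prompts.items()}
    chosen: List[str] = []
    while remaining and len(chosen) < n_prompts:
        # Iterating over sorted ids makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda pid: len(remaining[pid] - covered))
        if not remaining[best] - covered:
            break  # no remaining prompt adds a new diphone
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def coverage(
    prompts: Dict[str, Sequence[str]],
    chosen: Sequence[str],
    inventory: Set[Diphone],
    min_count: int = 1,
) -> float:
    """Fraction of the inventory that occurs at least min_count times."""
    counts = Counter(d for pid in chosen for d in diphones(prompts[pid]))
    hit = sum(1 for d in inventory if counts[d] >= min_count)
    return hit / len(inventory)


# Toy data (made up): prompt id -> phone sequence.
toy_prompts = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["c", "d", "a", "b"],
}
# Toy inventory of diphones considered valid; ("d", "d") never occurs above.
toy_inventory = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("d", "d")}

chosen = greedy_select(toy_prompts, n_prompts=2)
once = coverage(toy_prompts, chosen, toy_inventory, min_count=1)
twice = coverage(toy_prompts, chosen, toy_inventory, min_count=2)
print(chosen, once, twice)
```

The two calls to `coverage` mirror the two figures quoted in the abstract (share of diphones seen at least once, and share seen at least some minimum number of times). A strategy targeting the higher threshold would need a scoring function that keeps rewarding diphones until they reach the required count, which this sketch does not attempt.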
