no code implementations • 4 Jul 2022 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
We collect a corpus of utterances containing contrastive focus and evaluate, on these samples, the accuracy of a BERT model fine-tuned to predict quantized acoustic prominence features.
no code implementations • 19 Feb 2021 • Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier
The prosody of a spoken word is determined by its surrounding context.
no code implementations • 4 Sep 2020 • Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber
In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e., when generating speech output for token n, the system has access to n + k tokens from the text sequence.
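The incremental setting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `incremental_windows`, the 1-based reading of n, and the clipping of the window at the end of the sequence are all assumptions made for illustration. The sketch only shows which text tokens would be visible at each step for a lookahead of k tokens; it does not synthesize speech.

```python
from typing import Iterator, List, Tuple


def incremental_windows(tokens: List[str], k: int) -> Iterator[Tuple[str, List[str]]]:
    """Yield (current_token, visible_tokens) pairs for a lookahead of k tokens.

    Hypothetical helper: for token n (1-based), the visible context is
    assumed to be the first n + k tokens of the text sequence.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    for n in range(1, len(tokens) + 1):
        # Slicing clips the window automatically near the end of the sequence.
        yield tokens[n - 1], tokens[: n + k]


if __name__ == "__main__":
    for token, visible in incremental_windows(["the", "cat", "sat", "down"], k=1):
        print(token, visible)
```

With k = 0 the system sees no future tokens at all, and with k at least the sequence length the setting reduces to ordinary full-sentence synthesis, so k controls the trade-off between latency and available context.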