1 code implementation • 3 Apr 2018 • Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as ASR but are generally considered unusable for speech synthesis.
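For context on the features this entry refers to: MFCCs follow a standard textbook pipeline of framing, windowing, a magnitude spectrum, triangular mel filterbank energies, a log, and a DCT. The sketch below is a minimal illustration of that generic pipeline, not the paper's code; the function name `mfcc` and the parameter defaults are illustrative choices.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    # Frame the signal, window each frame, take the magnitude spectrum.
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    # Triangular mel filterbank: filter centers equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then DCT-II to decorrelate -> cepstral coefficients.
    logmel = np.log(spec @ fbank.T + 1e-10)
    return dct(logmel, type=2, axis=1, norm='ortho')[:, :n_mfcc]
```

Speech synthesis from MFCCs is hard precisely because this pipeline is lossy: phase, the discarded higher cepstral coefficients, and the within-band spectral detail all have to be reconstructed.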
1 code implementation • 8 Apr 2019 • Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
Recent advances in neural network-based text-to-speech have reached human-level naturalness in synthetic speech.
no code implementations • 25 Apr 2018 • Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i.e., generating speech waveforms from acoustic features.
no code implementations • 18 Aug 2016 • Srikanth Ronanki, Siva Reddy, Bajibabu Bollepalli, Simon King
These methods first convert the ASCII text to a phonetic script and then train a deep neural network to synthesize speech from it.
no code implementations • 29 Oct 2018 • Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
Moreover, we experiment with a WaveNet vocoder in synthesis of Lombard speech.
no code implementations • 30 Oct 2018 • Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
The state-of-the-art in text-to-speech synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet.
no code implementations • LREC 2014 • Maria Koutsombogera, Samer Al Moubayed, Bajibabu Bollepalli, Ahmed Hussen Abdelaziz, Martin Johansson, José David Aguas Lopes, Jekaterina Novikova, Catharine Oertel, Kalin Stefanov, Gül Varol
The corpus is targeted and designed towards the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions.
no code implementations • 14 Mar 2019 • Bajibabu Bollepalli, Lauri Juvela, Paavo Alku
The results show that the newly proposed GANs achieve synthesis quality comparable to that of widely used DNNs, without using an additive noise component.
no code implementations • 29 Jun 2021 • Ammar Abbas, Bajibabu Bollepalli, Alexis Moinet, Arnaud Joly, Penny Karanasou, Peter Makarov, Simon Slangens, Sri Karlapati, Thomas Drugman
We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to synthesise speech with improved coarse- and fine-grained prosody.
no code implementations • 5 Jan 2022 • Dhananjaya Gowda, Bajibabu Bollepalli, Sudarsana Reddy Kadiri, Paavo Alku
Formant tracking is investigated in this study by using trackers based on dynamic programming (DP) and deep neural networks (DNNs).
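As a rough illustration of the DP idea mentioned in this entry (a generic Viterbi-style tracker, not the authors' implementation): given per-frame formant candidates with local costs, DP picks one candidate per frame while penalizing large frame-to-frame frequency jumps. The function name `dp_track` and the `trans_weight` penalty are hypothetical.

```python
import numpy as np

def dp_track(candidates, costs, trans_weight=1.0):
    """Select one candidate per frame minimizing local cost plus a
    penalty on frame-to-frame frequency jumps (Viterbi-style DP).

    candidates: (T, K) candidate frequencies in Hz
    costs:      (T, K) local (acoustic) costs per candidate
    """
    T, K = candidates.shape
    total = costs[0].copy()            # best cumulative cost ending at each candidate
    back = np.zeros((T, K), dtype=int)  # backpointers
    for t in range(1, T):
        # Transition penalty between every previous and current candidate.
        jump = trans_weight * np.abs(candidates[t][None, :] - candidates[t - 1][:, None])
        step = total[:, None] + jump + costs[t][None, :]  # (K_prev, K_cur)
        back[t] = np.argmin(step, axis=0)
        total = step[back[t], np.arange(K)]
    # Backtrack the lowest-cost path.
    path = np.empty(T, dtype=int)
    path[-1] = np.argmin(total)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return candidates[np.arange(T), path]
```

The smoothness penalty is what distinguishes a tracker from frame-independent peak picking: an outlier candidate that is locally cheapest can still lose to a candidate on a continuous formant trajectory.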
no code implementations • 13 Feb 2022 • Mateusz Lajszczak, Animesh Prasad, Arent van Korlaar, Bajibabu Bollepalli, Antonio Bonafonte, Arnaud Joly, Marco Nicolis, Alexis Moinet, Thomas Drugman, Trevor Wood, Elena Sokolova
This paper presents a novel data augmentation technique for text-to-speech (TTS) that allows new (text, audio) training examples to be generated without requiring any additional data.