Search Results for author: Alexis Moinet

Found 19 papers, 1 papers with code

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

no code implementations12 Feb 2024 Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

Echoing the widely-reported "emergent abilities" of large language models when trained on increasing volume of data, we show that BASE TTS variants built with 10K+ hours and 500M+ parameters begin to demonstrate natural prosody on textually complex sentences.

Decoder Disentanglement +1

A Comparative Analysis of Pretrained Language Models for Text-to-Speech

no code implementations4 Sep 2023 Marcel Granero-Moya, Penny Karanasou, Sri Karlapati, Bastian Schnell, Nicole Peinelt, Alexis Moinet, Thomas Drugman

In this study, we aim to address this gap by conducting a comparative analysis of different PLMs for two TTS tasks: prosody prediction and pause prediction.

Natural Language Understanding Prosody Prediction

eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer

no code implementations20 Jun 2023 Ammar Abbas, Sri Karlapati, Bastian Schnell, Penny Karanasou, Marcel Granero Moya, Amith Nagaraj, Ayman Boustati, Nicole Peinelt, Alexis Moinet, Thomas Drugman

We show that eCat statistically significantly reduces the gap in naturalness between CopyCat2 and human recordings by an average of 46. 7% across 2 languages, 3 locales, and 7 speakers, along with better target-speaker similarity in FPT.

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody

no code implementations29 Jun 2022 Peter Makarov, Ammar Abbas, Mateusz Łajszczak, Arnaud Joly, Sri Karlapati, Alexis Moinet, Thomas Drugman, Penny Karanasou

In this paper, we examine simple extensions to a Transformer-based FastSpeech-like system, with the goal of improving prosody for multi-sentence TTS.

Language Modelling Sentence

Expressive, Variable, and Controllable Duration Modelling in TTS

no code implementations28 Jun 2022 Ammar Abbas, Thomas Merritt, Alexis Moinet, Sri Karlapati, Ewa Muszynska, Simon Slangen, Elia Gatti, Thomas Drugman

First, we propose a duration model conditioned on phrasing that improves the predicted durations and provides better modelling of pauses.

Normalising Flows Speech Synthesis

CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer

no code implementations27 Jun 2022 Sri Karlapati, Penny Karanasou, Mateusz Lajszczak, Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman

In this paper, we present CopyCat2 (CC2), a novel model capable of: a) synthesizing speech with different speaker identities, b) generating speech with expressive and contextually appropriate prosody, and c) transferring prosody at fine-grained level between any pair of seen speakers.

Distribution augmentation for low-resource expressive text-to-speech

no code implementations13 Feb 2022 Mateusz Lajszczak, Animesh Prasad, Arent van Korlaar, Bajibabu Bollepalli, Antonio Bonafonte, Arnaud Joly, Marco Nicolis, Alexis Moinet, Thomas Drugman, Trevor Wood, Elena Sokolova

This paper presents a novel data augmentation technique for text-to-speech (TTS), that allows to generate new (text, audio) training examples without requiring any additional data.

Data Augmentation Diversity

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

no code implementations29 Jun 2021 Ammar Abbas, Bajibabu Bollepalli, Alexis Moinet, Arnaud Joly, Penny Karanasou, Peter Makarov, Simon Slangens, Sri Karlapati, Thomas Drugman

We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to synthesise speech with an improved coarse and fine-grained prosody.

Sentence

Parallel WaveNet conditioned on VAE latent vectors

no code implementations17 Dec 2020 Jonas Rohnke, Tom Merritt, Jaime Lorenzo-Trueba, Adam Gabrys, Vatsal Aggarwal, Alexis Moinet, Roberto Barra-Chicote

In this paper we investigate the use of a sentence-level conditioning vector to improve the signal quality of a Parallel WaveNet neural vocoder.

Sentence Speech Synthesis +1

Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech

no code implementations4 Nov 2020 Sri Karlapati, Ammar Abbas, Zack Hodari, Alexis Moinet, Arnaud Joly, Penny Karanasou, Thomas Drugman

In Stage II, we propose a novel method to sample from this learnt prosodic distribution using the contextual information available in text.

Graph Attention Representation Learning +2

Using a Pitch-Synchronous Residual Codebook for Hybrid HMM/Frame Selection Speech Synthesis

no code implementations30 Dec 2019 Thomas Drugman, Alexis Moinet, Thierry Dutoit, Geoffrey Wilfart

The source signal is obtained by concatenating excitation frames picked up from the codebook, based on a selection criterion and taking target residual coefficients as input.

Speech Synthesis

Singing Synthesis: with a little help from my attention

no code implementations12 Dec 2019 Orazio Angelini, Alexis Moinet, Kayoko Yanagisawa, Thomas Drugman

We present UTACO, a singing synthesis model based on an attention-based sequence-to-sequence mechanism and a vocoder based on dilated causal convolutions.

Voice Conversion for Whispered Speech Synthesis

no code implementations11 Dec 2019 Marius Cotescu, Thomas Drugman, Goeric Huybrechts, Jaime Lorenzo-Trueba, Alexis Moinet

We present an approach to synthesize whisper by applying a handcrafted signal processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech to whispered speech.

Speech Synthesis Voice Conversion

Towards achieving robust universal neural vocoding

1 code implementation4 Jul 2019 Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal

This vocoder is shown to be capable of generating speech of consistently good quality (98% relative mean MUSHRA when compared to natural speech) regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain scenario when the recording conditions are studio-quality.

Traditional Machine Learning for Pitch Detection

no code implementations4 Mar 2019 Thomas Drugman, Goeric Huybrechts, Viacheslav Klimkov, Alexis Moinet

In this paper, we consider voicing detection as a classification problem and F0 contour estimation as a regression problem.

BIG-bench Machine Learning Clustering +1

Proceedings of eNTERFACE 2015 Workshop on Intelligent Interfaces

no code implementations19 Jan 2018 Matei Mancas, Christian Frisson, Joëlle Tilmanne, Nicolas D'Alessandro, Petr Barborka, Furkan Bayansar, Francisco Bernard, Rebecca Fiebrink, Alexis Heloir, Edgar Hemery, Sohaib Laraba, Alexis Moinet, Fabrizio Nunnari, Thierry Ravet, Loïc Reboursière, Alvaro Sarasua, Mickaël Tits, Noé Tits, François Zajéga, Paolo Alborno, Ksenia Kolykhalova, Emma Frid, Damiano Malafronte, Lisanne Huis in't Veld, Hüseyin Cakmak, Kevin El Haddad, Nicolas Riche, Julien Leroy, Pierre Marighetto, Bekir Berker Türker, Hossein Khaki, Roberto Pulisci, Emer Gilmartin, Fasih Haider, Kübra Cengiz, Martin Sulir, Ilaria Torre, Shabbir Marzban, Ramazan Yazıcı, Furkan Burak Bâgcı, Vedat Gazi Kılı, Hilal Sezer, Sena Büsra Yenge, Charles-Alexandre Delestage, Sylvie Leleu-Merviel, Muriel Meyer-Chemenska, Daniel Schmitt, Willy Yvart, Stéphane Dupont, Ozan Can Altiok, Aysegül Bumin, Ceren Dikmen, Ivan Giangreco, Silvan Heller, Emre Külah, Gueorgui Pironkov, Luca Rossetto, Yusuf Sahillioglu, Heiko Schuldt, Omar Seddati, Yusuf Setinkaya, Metin Sezgin, Claudiu Tanase, Emre Toyan, Sean Wood, Doguhan Yeke, Françcois Rocca, Pierre-Henri De Deken, Alessandra Bandrabur, Fabien Grisard, Axel Jean-Caurant, Vincent Courboulay, Radhwan Ben Madhkour, Ambroise Moreau

The 11th Summer Workshop on Multimodal Interfaces eNTERFACE 2015 was hosted by the Numediart Institute of Creative Technologies of the University of Mons from August 10th to September 2015.

Cannot find the paper you are looking for? You can Submit a new open access paper.