Search Results for author: Jaime Lorenzo-Trueba

Found 32 papers, 6 papers with code

Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations

no code implementations • 5 Feb 2024 • Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Iván Vallés-Pérez, Biel Tura-Vecino, Piotr Biliński, Mateusz Lajszczak, Grzegorz Beringer, Roberto Barra-Chicote, Jaime Lorenzo-Trueba

Using speaker-disentangled codes to train LLMs for text-to-speech (TTS) allows the LLM to generate the content and the style of the speech only from the text, similarly to humans, while the speaker identity is provided by the decoder of the VC model.

In-Context Learning Voice Conversion

Paper
Add Code

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

no code implementations • 31 Jul 2023 • Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba

Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space.

Acoustic Modelling Speech Synthesis +1

Paper
Add Code

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

no code implementations • 31 Jul 2023 • Manuel Sam Ribeiro, Giulia Comini, Jaime Lorenzo-Trueba

The G2P model is used to train a multilingual phone recognition system, which then decodes speech recordings with a phonetic representation.

speech-recognition Speech Recognition

Paper
Add Code

Multilingual context-based pronunciation learning for Text-to-Speech

no code implementations • 31 Jul 2023 • Giulia Comini, Manuel Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo-Trueba

Phonetic information and linguistic knowledge are an essential component of a Text-to-speech (TTS) front-end.

Paper
Add Code

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation

no code implementations • 29 Jul 2022 • Giulia Comini, Goeric Huybrechts, Manuel Sam Ribeiro, Adam Gabrys, Jaime Lorenzo-Trueba

The availability of data in expressive styles across languages is limited, and recording sessions are costly and time consuming.

Data Augmentation Voice Conversion

Paper
Add Code

Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need

no code implementations • 2 Jul 2022 • Daniel Korzekwa, Jaime Lorenzo-Trueba, Thomas Drugman, Bozena Kostek

We show that these techniques not only improve the accuracy of three machine learning models for detecting pronunciation errors but also help establish a new state-of-the-art in the field.

Speech Synthesis

Paper
Add Code

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

no code implementations • 16 Feb 2022 • Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba

It uses voice conversion (VC) as a post-processing module appended to a pre-existing high-quality TTS system and marks a conceptual shift in the existing TTS paradigm, framing the few-shot TTS problem as a VC task.

Speech Synthesis Voice Conversion

Paper
Add Code

Cross-speaker style transfer for text-to-speech using data augmentation

no code implementations • 10 Feb 2022 • Manuel Sam Ribeiro, Julian Roth, Giulia Comini, Goeric Huybrechts, Adam Gabrys, Jaime Lorenzo-Trueba

The proposed approach relies on voice conversion to first generate high-quality data from the set of supporting expressive speakers.

Data Augmentation Style Transfer +1

Paper
Add Code

Enhancing audio quality for expressive Neural Text-to-Speech

no code implementations • 13 Aug 2021 • Abdelhamid Ezzerg, Adam Gabrys, Bartosz Putrycz, Daniel Korzekwa, Daniel Saez-Trigueros, David McHardy, Kamil Pokora, Jakub Lachowicz, Jaime Lorenzo-Trueba, Viacheslav Klimkov

Artificial speech synthesis has made a great leap in terms of naturalness as recent Text-to-Speech (TTS) systems are capable of producing speech with similar quality to human recordings.

Acoustic Modelling Speech Synthesis

Paper
Add Code

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

1 code implementation • 16 Jun 2021 • Alejandro Mottini, Jaime Lorenzo-Trueba, Sri Vishnu Kumar Karlapati, Thomas Drugman

Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker.

Voice Conversion

Paper
Code

Weakly-supervised word-level pronunciation error detection in non-native English speech

no code implementations • 7 Jun 2021 • Daniel Korzekwa, Jaime Lorenzo-Trueba, Thomas Drugman, Shira Calamaro, Bozena Kostek

To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words.

Paper
Add Code

Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

1 code implementation • NAACL 2021 • Shubhi Tyagi, Antonio Bonafonte, Jaime Lorenzo-Trueba, Javier Latorre

Developing Text Normalization (TN) systems for Text-to-Speech (TTS) on new languages is hard.

Paper
Code

Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling

no code implementations • 16 Jan 2021 • Daniel Korzekwa, Jaime Lorenzo-Trueba, Szymon Zaporowski, Shira Calamaro, Thomas Drugman, Bozena Kostek

A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare it to the expected pronunciation of a native speaker.

Automatic Phoneme Recognition Sentence +1

Paper
Add Code

EmoCat: Language-agnostic Emotional Voice Conversion

no code implementations • 14 Jan 2021 • Bastian Schnell, Goeric Huybrechts, Bartek Perz, Thomas Drugman, Jaime Lorenzo-Trueba

In this work we propose EmoCat, a language-agnostic emotional voice conversion model.

Voice Conversion

Paper
Add Code

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

no code implementations • 29 Dec 2020 • Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS).

Data Augmentation

Paper
Add Code

Parallel WaveNet conditioned on VAE latent vectors

no code implementations • 17 Dec 2020 • Jonas Rohnke, Tom Merritt, Jaime Lorenzo-Trueba, Adam Gabrys, Vatsal Aggarwal, Alexis Moinet, Roberto Barra-Chicote

In this paper we investigate the use of a sentence-level conditioning vector to improve the signal quality of a Parallel WaveNet neural vocoder.

Sentence Speech Synthesis +1

Paper
Add Code

Low-resource expressive text-to-speech using data augmentation

no code implementations • 11 Nov 2020 • Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba

While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings from the target speaker reading in the desired speaking style.

Data Augmentation Voice Conversion

Paper
Add Code

Voice Conversion for Whispered Speech Synthesis

no code implementations • 11 Dec 2019 • Marius Cotescu, Thomas Drugman, Goeric Huybrechts, Jaime Lorenzo-Trueba, Alexis Moinet

We present an approach to synthesize whisper by applying a handcrafted signal processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech to whispered speech.

Speech Synthesis Voice Conversion

Paper
Add Code

Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

no code implementations • 2 Dec 2019 • Shubhi Tyagi, Marco Nicolis, Jonas Rohnke, Thomas Drugman, Jaime Lorenzo-Trueba

Recent advances in Text-to-Speech (TTS) have improved quality and naturalness to near-human capabilities when considering isolated sentences.

Speech Synthesis

Paper
Add Code

Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech

no code implementations • 28 Nov 2019 • Vatsal Aggarwal, Marius Cotescu, Nishant Prateek, Jaime Lorenzo-Trueba, Roberto Barra-Chicote

We propose a Text-to-Speech method to create an unseen expressive style using one utterance of expressive speech of around one second.

Disentanglement Expressive Speech Synthesis +1

Paper
Add Code

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model

1 code implementation • 10 Nov 2019 • Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, Junichi Yamagishi

Nowadays vast amounts of speech data are recorded from low-quality recorder devices such as smartphones, tablets, laptops, and medium-quality microphones.

Sound Audio and Speech Processing

Paper
Code

Towards achieving robust universal neural vocoding

1 code implementation • 4 Jul 2019 • Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal

This vocoder is shown to be capable of generating speech of consistently good quality (98% relative mean MUSHRA when compared to natural speech) regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain scenario when the recording conditions are studio-quality.

234

Paper
Code

In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data

1 code implementation • NAACL 2019 • Nishant Prateek, Mateusz Łajszczak, Roberto Barra-Chicote, Thomas Drugman, Jaime Lorenzo-Trueba, Thomas Merritt, Srikanth Ronanki, Trevor Wood

Neural text-to-speech synthesis (NTTS) models have shown significant progress in generating high-quality speech, however they require a large quantity of training data.

Speech Synthesis Text-To-Speech Synthesis +1

Paper
Code

Robust universal neural vocoding

8 code implementations • 15 Nov 2018 • Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote

This paper introduces a robust universal neural vocoder trained with 74 speakers (comprised of both genders) coming from 17 languages.

308

Paper
Code

Effect of data reduction on sequence-to-sequence neural TTS

no code implementations • 15 Nov 2018 • Javier Latorre, Jakub Lachowicz, Jaime Lorenzo-Trueba, Thomas Merritt, Thomas Drugman, Srikanth Ronanki, Klimkov Viacheslav

Recent speech synthesis systems based on sampling from autoregressive neural networks models can generate speech almost undistinguishable from human recordings.

Speech Synthesis

Paper
Add Code

Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis

no code implementations • 30 Jul 2018 • Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi

Generating versatile and appropriate synthetic speech requires control over the output expression separate from the spoken text.

Acoustic Modelling Emotional Speech Synthesis +1

Paper
Add Code

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

no code implementations • 23 Apr 2018 • Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling

As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts.

Benchmarking Speaker Verification +1

Paper
Add Code

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

no code implementations • 12 Apr 2018 • Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling

We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems.

Voice Conversion

Paper
Add Code

A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis

no code implementations • 7 Apr 2018 • Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi

Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches.

Speech Synthesis

Paper
Add Code

High-quality nonparallel voice conversion based on cycle-consistent adversarial network

no code implementations • 2 Apr 2018 • Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba

Although voice conversion (VC) algorithms have achieved remarkable success along with the development of machine learning, superior performance is still difficult to achieve when using nonparallel data.

Generative Adversarial Network Image-to-Image Translation +4

Paper
Add Code

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data

no code implementations • 2 Mar 2018 • Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen

Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database.

Generative Adversarial Network Speech Enhancement +2

Paper
Add Code

Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM

no code implementations • COLING 2016 • Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Ascension Gallardo-Antolin, Junichi Yamagishi, Juan M. Montero

This paper introduces a continuous system capable of automatically producing the most adequate speaking style to synthesize a desired target text.

Expressive Speech Synthesis Speech Recognition

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.