Search Results for author: Iván Vallés-Pérez

Found 6 papers, 3 papers with code

Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations

no code implementations · 5 Feb 2024 · Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Iván Vallés-Pérez, Biel Tura-Vecino, Piotr Biliński, Mateusz Lajszczak, Grzegorz Beringer, Roberto Barra-Chicote, Jaime Lorenzo-Trueba

Using speaker-disentangled codes to train LLMs for text-to-speech (TTS) allows the LLM to generate the content and the style of the speech from the text alone, similarly to humans, while the speaker identity is provided by the decoder of the voice conversion (VC) model.

In-Context Learning · Voice Conversion
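The abstract above describes a two-stage split: the LLM sees only text and produces speaker-disentangled codes, and the VC decoder adds the speaker identity. The toy Python sketch below illustrates only that data flow. Both `text_to_codes` and `vc_decode` are hypothetical stand-ins, not the paper's models; the real system uses learned networks and produces audio rather than lists of numbers.

```python
# Toy illustration of the data flow only. `text_to_codes` and `vc_decode`
# are hypothetical stand-ins for the LLM and the VC decoder.

def text_to_codes(text: str) -> list[int]:
    """Stand-in for the LLM: text in, discrete codes out.

    The speaker is not an input, so the codes cannot encode speaker identity.
    """
    return [ord(ch) % 64 for ch in text.lower()]


def vc_decode(codes: list[int], speaker: tuple[float, float]) -> list[float]:
    """Stand-in for the VC decoder: speaker identity is injected only here."""
    gain, offset = speaker  # hypothetical 2-number "speaker embedding"
    return [gain * c + offset for c in codes]


def synthesize(text: str, speaker: tuple[float, float]):
    """Run both stages and return the intermediate codes and the output."""
    codes = text_to_codes(text)
    return codes, vc_decode(codes, speaker)


# Same text, two speakers: identical codes, different rendered outputs.
codes_a, out_a = synthesize("hello", (1.0, 0.0))
codes_b, out_b = synthesize("hello", (0.5, 2.0))
print(codes_a == codes_b, out_a == out_b)  # → True False
```

Keeping the speaker out of the first stage is what lets the same codes be rendered in any voice the decoder supports.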

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

no code implementations · 4 Nov 2022 · Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

By fine-tuning an ASR model on synthetic stuttered speech, we are able to reduce the word error rate (WER) by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation for fluent utterances.

Automatic Speech Recognition (ASR)
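The Stutter-TTS figures above are relative changes, not absolute WER points. A minimal Python sketch of that arithmetic follows; the WER values in it are hypothetical, chosen only so the numbers line up with 5.7% and a sub-0.2% degradation, and are not results from the paper.

```python
def relative_change(baseline_wer: float, new_wer: float) -> float:
    """Relative change in WER as a fraction of the baseline.

    Negative values are reductions (improvements); positive values are
    degradations.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (new_wer - baseline_wer) / baseline_wer


# Hypothetical WER values (in %), for illustration only.
stuttered = relative_change(20.0, 18.86)  # 1.14 absolute points lower
fluent = relative_change(5.00, 5.005)     # 0.005 absolute points higher
print(f"stuttered: {stuttered:+.1%}, fluent: {fluent:+.1%}")
# → stuttered: -5.7%, fluent: +0.1%
```

With a 20% baseline, a 5.7% relative reduction corresponds to only about 1.14 absolute WER points, which is why reports usually state which of the two is meant.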

Approaching sales forecasting using recurrent neural networks and transformers

1 code implementation · 16 Apr 2022 · Iván Vallés-Pérez, Emilio Soria-Olivas, Marcelino Martínez-Sober, Antonio J. Serrano-López, Juan Gómez-Sanchís, Fernando Mateo

Accurate and fast demand forecasting is one of the hot topics in supply chain management, as it enables the precise execution of the corresponding downstream processes (inbound and outbound planning, inventory placement, network planning, etc.).

End-to-end Keyword Spotting using Xception-1d

1 code implementation · 9 Oct 2021 · Iván Vallés-Pérez, Juan Gómez-Sanchis, Marcelino Martínez-Sober, Joan Vila-Francés, Antonio J. Serrano-López, Emilio Soria-Olivas

The field of conversational agents is growing fast, and there is an increasing need for algorithms that enhance natural interaction.

Keyword Spotting

Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows

no code implementations · 10 Jun 2021 · Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo

This paper proposes a new neural text-to-speech model that addresses the disentanglement problem in two ways: by conditioning a Tacotron2-like architecture on flow-normalized speaker embeddings, and by replacing the reference encoder with a new learned latent distribution that models the intra-sentence variability due to prosody.

Disentanglement · Sentence
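To make "flow-normalized speaker embeddings" more concrete, here is a minimal NumPy sketch of a single invertible affine coupling layer (RealNVP-style) applied to embedding vectors. This is an assumption for illustration: the abstract does not say which flow architecture the paper uses, the weights here are random rather than trained, and the names `AffineCoupling` and `gaussian_nll` are hypothetical.

```python
import numpy as np


class AffineCoupling:
    """One RealNVP-style affine coupling layer with random (untrained) weights.

    x is split into halves (x1, x2); x1 passes through unchanged and
    parameterizes a scale and shift applied to x2, so the map is
    invertible in closed form.
    """

    def __init__(self, dim: int, seed: int = 0):
        if dim % 2:
            raise ValueError("dim must be even")
        rng = np.random.default_rng(seed)
        half = dim // 2
        # Hypothetical tiny "networks": one linear map each for log-scale and shift.
        self.w_s = rng.normal(scale=0.1, size=(half, half))
        self.w_t = rng.normal(scale=0.1, size=(half, half))

    def _scale_shift(self, x1):
        log_s = np.tanh(x1 @ self.w_s)  # bounded log-scale keeps exp() stable
        t = x1 @ self.w_t
        return log_s, t

    def forward(self, x):
        """Map embeddings x to latents z; also return log|det J| per row."""
        x1, x2 = np.split(x, 2, axis=-1)
        log_s, t = self._scale_shift(x1)
        z2 = x2 * np.exp(log_s) + t
        return np.concatenate([x1, z2], axis=-1), log_s.sum(axis=-1)

    def inverse(self, z):
        """Exact inverse of forward()."""
        z1, z2 = np.split(z, 2, axis=-1)
        log_s, t = self._scale_shift(z1)
        x2 = (z2 - t) * np.exp(-log_s)
        return np.concatenate([z1, x2], axis=-1)


def gaussian_nll(z, log_det):
    """Per-row negative log-likelihood of x under a standard-normal base density.

    Minimizing this over the flow parameters would push the embeddings
    toward N(0, I); no training is done here.
    """
    d = z.shape[-1]
    log_pz = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * d * np.log(2 * np.pi)
    return -(log_pz + log_det)


# Four hypothetical 8-dimensional speaker embeddings.
emb = np.random.default_rng(1).normal(size=(4, 8))
flow = AffineCoupling(dim=8)
z, log_det = flow.forward(emb)
print(np.allclose(flow.inverse(z), emb))  # → True
```

A practical flow would stack several such layers, permuting dimensions between them so every coordinate gets transformed; a single layer leaves the first half of each embedding unchanged.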
