no code implementations • 13 Jun 2023 • Goeric Huybrechts, Srikanth Ronanki, Xilai Li, Hadis Nosrati, Sravan Bodapati, Katrin Kirchhoff
To address this issue, we propose the integration of a novel dynamic contextual carry-over mechanism in a state-of-the-art (SOTA) unified ASR system.
no code implementations • 18 Apr 2023 • Xilai Li, Goeric Huybrechts, Srikanth Ronanki, Jeff Farris, Sravan Bodapati
Overall, our proposed model reduces the degradation of the streaming mode over the non-streaming full-contextual model from 41.7% and 45.7% to 16.7% and 26.2% on the LibriSpeech test-clean and test-other datasets, respectively, while improving by a relative 15.5% WER over the previous state-of-the-art unified model.
no code implementations • 29 Jul 2022 • Giulia Comini, Goeric Huybrechts, Manuel Sam Ribeiro, Adam Gabrys, Jaime Lorenzo-Trueba
The availability of data in expressive styles across languages is limited, and recording sessions are costly and time-consuming.
no code implementations • 16 Feb 2022 • Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba
It uses voice conversion (VC) as a post-processing module appended to a pre-existing high-quality TTS system and marks a conceptual shift in the existing TTS paradigm, framing the few-shot TTS problem as a VC task.
no code implementations • 10 Feb 2022 • Manuel Sam Ribeiro, Julian Roth, Giulia Comini, Goeric Huybrechts, Adam Gabrys, Jaime Lorenzo-Trueba
The proposed approach relies on voice conversion to first generate high-quality data from the set of supporting expressive speakers.
no code implementations • 24 Jun 2021 • Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa, Thomas Merritt
In this paper, we present a method for building highly expressive TTS voices with as little as 15 minutes of speech data from the target speaker.
no code implementations • 14 Jan 2021 • Bastian Schnell, Goeric Huybrechts, Bartek Perz, Thomas Drugman, Jaime Lorenzo-Trueba
In this work we propose EmoCat, a language-agnostic emotional voice conversion model.
no code implementations • 11 Nov 2020 • Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba
While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings from the target speaker reading in the desired speaking style.
no code implementations • 11 Dec 2019 • Marius Cotescu, Thomas Drugman, Goeric Huybrechts, Jaime Lorenzo-Trueba, Alexis Moinet
We present an approach to synthesize whisper by applying a handcrafted signal processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech to whispered speech.
no code implementations • 4 Mar 2019 • Thomas Drugman, Goeric Huybrechts, Viacheslav Klimkov, Alexis Moinet
In this paper, we consider voicing detection as a classification problem and F0 contour estimation as a regression problem.