no code implementations • 13 Feb 2025 • Xin Wang, Héctor Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi, Myeonghun Jeong, Ge Zhu, Yongyi Zang, You Zhang, Soumi Maiti, Florian Lux, Nicolas Müller, Wangyou Zhang, Chengzhe Sun, Shuwei Hou, Siwei Lyu, Sébastien Le Maguer, Cheng Gong, Hanjie Guo, Liping Chen, Vishwanath Singh
The database contains attacks generated with 32 different algorithms, which were also crowdsourced and optimised to varying degrees using new surrogate detection models.
1 code implementation • 17 Sep 2024 • Tushar Dhyani, Florian Lux, Michele Mancusi, Giorgio Fabbro, Fritz Hohl, Ngoc Thang Vu
Traditional speech enhancement methods often oversimplify the task of restoration by focusing on a single type of distortion.
1 code implementation • 3 Jul 2024 • Sarina Meyer, Florian Lux, Ngoc Thang Vu
In speaker anonymization, speech recordings are modified in such a way that the identity of the speaker remains hidden.
1 code implementation • 10 Jun 2024 • Florian Lux, Sarina Meyer, Lyonel Behringer, Frank Zalkow, Phat Do, Matt Coler, Emanuël A. P. Habets, Ngoc Thang Vu
In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development.
1 code implementation • 10 Jun 2024 • Thomas Bott, Florian Lux, Ngoc Thang Vu
In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, due to its intuitive use of natural language.
no code implementations • 26 Oct 2023 • Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu
Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available.
1 code implementation • 26 Oct 2023 • Florian Lux, Julia Koch, Sarina Meyer, Thomas Bott, Nadja Schauffler, Pavel Denisov, Antje Schweitzer, Ngoc Thang Vu
For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021.
1 code implementation • 21 Oct 2022 • Florian Lux, Julia Koch, Ngoc Thang Vu
While neural methods for text-to-speech (TTS) have shown great advances in modeling multiple speakers, even in zero-shot settings, the amount of data needed for those approaches is generally not feasible for the vast majority of the world's over 6,000 spoken languages.
no code implementations • 21 Oct 2022 • Florian Lux, Ching-Yi Chen, Ngoc Thang Vu
This pretrained model is then finetuned to a specific task.
1 code implementation • 13 Oct 2022 • Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu
To protect the privacy of speech data, speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings.
no code implementations • 11 Jul 2022 • Julia Koch, Florian Lux, Nadja Schauffler, Toni Bernhart, Felix Dieterle, Jonas Kuhn, Sandra Richter, Gabriel Viehhauser, Ngoc Thang Vu
Speech synthesis for poetry is challenging due to specific intonation patterns inherent to poetic speech.
1 code implementation • 11 Jul 2022 • Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu
In this work, we propose a speaker anonymization pipeline that leverages high quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings.
Automatic Speech Recognition (ASR)
2 code implementations • 24 Jun 2022 • Florian Lux, Julia Koch, Ngoc Thang Vu
The cloning of a speaker's voice using an untranscribed reference sample is one of the great advances of modern neural text-to-speech (TTS) methods.
1 code implementation • ACL 2022 • Florian Lux, Ngoc Thang Vu
While neural text-to-speech systems perform remarkably well in high-resource scenarios, they cannot be applied to the majority of the over 6,000 spoken languages in the world due to a lack of appropriate training data.
no code implementations • 25 Feb 2021 • Florian Lux, Ngoc Thang Vu
We propose a new method of generating meaningful embeddings for speech, changes to four commonly used meta-learning approaches that enable them to perform keyword spotting in continuous signals, and an approach for combining their outcomes into an end-to-end automatic speech recognition system to improve rare word recognition.
Automatic Speech Recognition (ASR)
1 code implementation • ACL 2020 • Chia-Yu Li, Daniel Ortega, Dirk Väth, Florian Lux, Lindsey Vanderlyn, Maximilian Schmidt, Michael Neumann, Moritz Völkel, Pavel Denisov, Sabrina Jenne, Zorica Kacarevic, Ngoc Thang Vu
We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents.
no code implementations • WS 2019 • Matthias Damaschk, Tillmann Dönicke, Florian Lux
This paper discusses methods to improve the performance of text classification on data that is difficult to classify due to a large number of unbalanced classes with noisy examples.