no code implementations • 15 Feb 2024 • Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, Vesa Välimäki, Timo Gerkmann
Here, we aim to show that diffusion models can combine the best of both worlds and offer the opportunity to design audio restoration algorithms with a good degree of interpretability and remarkable performance in terms of sound quality.
1 code implementation • 18 Sep 2023 • Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann
While the performance of standard generative diffusion algorithms drops dramatically when the number of function evaluations (NFEs) is lowered to obtain single-step diffusion, we show that our proposed method maintains steady performance. It therefore largely outperforms the diffusion baseline in this setting and also generalizes better than its predictive counterpart.
no code implementations • 5 Jun 2023 • Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Tal Peer, Timo Gerkmann
Since its inception, the field of deep speech enhancement has been dominated by predictive (discriminative) approaches, such as spectral mapping or masking.
no code implementations • 2 Jun 2023 • Julius Richter, Simone Frintrop, Timo Gerkmann
This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information.
1 code implementation • 31 May 2023 • Héctor Martel, Julius Richter, Kai Li, Xiaolin Hu, Timo Gerkmann
We propose the Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments.
no code implementations • 15 Mar 2023 • Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann
In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions.
2 code implementations • 28 Feb 2023 • Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann
Recently, score-based generative models have been successfully employed for the task of speech enhancement.
2 code implementations • 22 Dec 2022 • Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann
As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions.
1 code implementation • 4 Nov 2022 • Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann
In this paper, we systematically compare the performance of generative diffusion models and discriminative approaches on different speech restoration tasks.
1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023 • Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann
This matches our forward process, which moves from clean speech to noisy speech by including a drift term.
Ranked #19 on Speech Enhancement on VoiceBank + DEMAND
1 code implementation • 31 Mar 2022 • Simon Welker, Julius Richter, Timo Gerkmann
Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals.
1 code implementation • 19 May 2021 • Guillaume Carbajal, Julius Richter, Timo Gerkmann
In this work, we propose to use an adversarial training scheme for variational autoencoders to disentangle the label from the other latent variables.
no code implementations • 12 Feb 2021 • Guillaume Carbajal, Julius Richter, Timo Gerkmann
In this paper, we propose to guide the variational autoencoder with a supervised classifier separately trained on noisy speech.