no code implementations • 14 Aug 2024 • Jean-Marie Lemercier, Eloi Moliner, Simon Welker, Vesa Välimäki, Timo Gerkmann
This paper presents an unsupervised method for single-channel blind dereverberation and room impulse response (RIR) estimation, called BUDDy.
no code implementations • 22 Jul 2024 • Bunlong Lay, Sebastian Zaczek, Kristina Tesch, Timo Gerkmann
Single-channel speech separation is a crucial task for enhancing speech recognition systems in multi-speaker environments.
2 code implementations • 10 Jun 2024 • Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann
We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds and totaling 100 hours of clean, anechoic speech data.
no code implementations • 5 Jun 2024 • Danilo de Oliveira, Simon Welker, Julius Richter, Timo Gerkmann
The goal of this paper is to illustrate the risk of overfitting a speech enhancement model to the metric used for evaluation.
Ranked #1 on Speech Enhancement on VoiceBank + DEMAND
no code implementations • 7 May 2024 • Eloi Moliner, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann, Vesa Välimäki
In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models.
no code implementations • 15 Feb 2024 • Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, Vesa Välimäki, Timo Gerkmann
Here, we aim to show that diffusion models can combine the best of both worlds and offer the opportunity to design audio restoration algorithms with a good degree of interpretability and a remarkable performance in terms of sound quality.
no code implementations • 1 Feb 2024 • Bunlong Lay, Timo Gerkmann
Speech enhancement performance varies with the choice of the stochastic differential equation (SDE) that controls the evolution of the mean and the variance along the diffusion process when environmental and Gaussian noise are added.
Ranked #20 on Speech Enhancement on VoiceBank + DEMAND
1 code implementation • 18 Sep 2023 • Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann
While the performance of typical generative diffusion algorithms drops dramatically when the number of function evaluations (NFEs) is lowered to obtain single-step diffusion, we show that our proposed method maintains steady performance, largely outperforming the diffusion baseline in this setting and also generalizing better than its predictive counterpart.
no code implementations • 18 Sep 2023 • Danilo de Oliveira, Timo Gerkmann
Considerable research effort is being devoted to compressing the knowledge of self-supervised models, which are powerful yet large and memory-intensive.
no code implementations • 14 Sep 2023 • Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann
In this work, we demonstrate that the ptychographic phase problem can be solved in a live fashion during scanning, while data is still being collected.
no code implementations • 14 Sep 2023 • Navin Raj Prabhu, Bunlong Lay, Simon Welker, Nale Lehmann-Willenbrock, Timo Gerkmann
Subsequently, at inference, a target emotion embedding is employed to convert the emotion of the input utterance to the given target emotion.
no code implementations • 13 Sep 2023 • Tal Peer, Simon Welker, Johannes Kolhoff, Timo Gerkmann
Several recent contributions in the field of iterative STFT phase retrieval have demonstrated that the performance of the classical Griffin-Lim method can be considerably improved upon.
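As a rough illustration of the classical Griffin-Lim baseline that this line of work improves upon (this is the textbook iteration, not the paper's accelerated variants), the method alternates between re-imposing the measured STFT magnitude and projecting onto the set of consistent spectrograms via inverse/forward STFT:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256):
    """Classical Griffin-Lim STFT phase retrieval: starting from random
    phase, alternate between the signal domain (consistency) and the
    measured-magnitude constraint."""
    rng = np.random.default_rng(0)
    spec = mag * np.exp(1j * rng.uniform(0, 2 * np.pi, mag.shape))
    for _ in range(n_iter):
        _, x = istft(spec, nperseg=nperseg)        # back to the time domain
        _, _, spec = stft(x, nperseg=nperseg)      # nearest consistent spectrogram
        spec = mag * np.exp(1j * np.angle(spec))   # keep its phase, re-impose magnitude
    _, x = istft(spec, nperseg=nperseg)
    return x
```

The window length, hop, and iteration count here are illustrative defaults; the cited contributions differ precisely in how this fixed-point iteration is accelerated or replaced.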
1 code implementation • 22 Jun 2023 • Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
We show that our stochastic regeneration model outperforms other neural-network-based wind noise reduction methods as well as purely predictive and generative models, on a dataset using simulated and real-recorded wind noise.
1 code implementation • 21 Jun 2023 • Jean-Marie Lemercier, Simon Welker, Timo Gerkmann
We present in this paper an informed single-channel dereverberation method based on conditional generation with diffusion models.
no code implementations • 5 Jun 2023 • Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Tal Peer, Timo Gerkmann
Since its inception, the field of deep speech enhancement has been dominated by predictive (discriminative) approaches, such as spectral mapping or masking.
no code implementations • 2 Jun 2023 • Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann
In this work, we specifically focus on in-the-wild emotion conversion where parallel data does not exist, and the problem of disentangling lexical, speaker, and emotion information arises.
no code implementations • 2 Jun 2023 • Julius Richter, Simone Frintrop, Timo Gerkmann
This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information.
1 code implementation • 31 May 2023 • Héctor Martel, Julius Richter, Kai Li, Xiaolin Hu, Timo Gerkmann
We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments.
no code implementations • 30 May 2023 • Danilo de Oliveira, Navin Raj Prabhu, Timo Gerkmann
In large part due to their implicit semantic modeling, self-supervised learning (SSL) methods have significantly increased the performance of valence recognition in speech emotion recognition (SER) systems.
1 code implementation • 15 May 2023 • Huajian Fang, Dennis Becker, Stefan Wermter, Timo Gerkmann
In this paper, we study the benefits of modeling uncertainty in clean speech estimation.
no code implementations • 24 Apr 2023 • Kristina Tesch, Timo Gerkmann
In a multi-channel separation task with multiple speakers, we aim to recover all individual speech signals from the mixture.
no code implementations • 27 Mar 2023 • Huajian Fang, Niklas Wittmer, Johannes Twiefel, Stefan Wermter, Timo Gerkmann
In this paper, we propose a multichannel partially adaptive scheme to jointly model ego-noise and environmental noise using the VAE-NMF framework. We exploit the spatially and spectrally structured characteristics of ego-noise by pre-training the ego-noise model, while retaining the ability to adapt to unknown environmental noise.
no code implementations • 15 Mar 2023 • Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann
In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions.
no code implementations • 1 Mar 2023 • Jean-Marie Lemercier, Julian Tobergte, Timo Gerkmann
We demonstrate that the resulting deep subband filtering scheme outperforms multiplicative masking for dereverberation, while leaving the denoising performance virtually the same.
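The contrast between the two schemes can be sketched in a few lines (shapes and filter length are illustrative, not the paper's configuration): a multiplicative mask applies one gain per time-frequency bin, whereas subband filtering convolves each frequency band along time, so the output at a frame also draws on past frames, which is what helps with reverberation tails.

```python
import numpy as np

def multiplicative_mask(X, M):
    """Per-bin masking: one real gain per TF bin. X and M are (F, T)."""
    return M * X

def subband_filter(X, H):
    """Subband filtering: each band f of the (F, T) spectrogram X is
    filtered along time with a short filter H[f, :] of K taps, i.e.
    Y[f, t] = sum_k H[f, k] * X[f, t - k]."""
    F, T = X.shape
    K = H.shape[1]
    Y = np.zeros_like(X)
    for k in range(K):
        Y[:, k:] += H[:, k:k + 1] * X[:, :T - k]
    return Y
```

With a single tap (K = 1) the subband filter reduces exactly to masking, which is why masking can be seen as a special case of this scheme.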
2 code implementations • 28 Feb 2023 • Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann
Recently, score-based generative models have been successfully employed for the task of speech enhancement.
2 code implementations • 22 Dec 2022 • Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann
As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions.
no code implementations • 9 Dec 2022 • Huajian Fang, Timo Gerkmann
Single-channel deep speech enhancement approaches often estimate a single multiplicative mask to extract clean speech without a measure of its accuracy.
1 code implementation • 12 Nov 2022 • Simon Welker, Henry N. Chapman, Timo Gerkmann
In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind JPEG restoration at high compression levels.
no code implementations • 8 Nov 2022 • Tal Peer, Simon Welker, Timo Gerkmann
Diffusion probabilistic models have been recently used in a variety of tasks, including speech enhancement and synthesis.
no code implementations • 4 Nov 2022 • Kristina Tesch, Timo Gerkmann
In a scenario with multiple persons talking simultaneously, the spatial characteristics of the signals are the most distinct feature for extracting the target signal.
1 code implementation • 4 Nov 2022 • Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann
In this paper, we systematically compare the performance of generative diffusion models and discriminative approaches on different speech restoration tasks.
1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2023 • Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann
This matches our forward process which moves from clean speech to noisy speech by including a drift term.
Ranked #1 on Speech Dereverberation on EARS-Reverb
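A forward process with such a drift term can be simulated with a simple Euler-Maruyama loop; the sketch below uses the generic mean-reverting form dx = gamma * (y - x) dt + sigma dw, where gamma and the constant sigma are illustrative values rather than the paper's exact parameterization:

```python
import numpy as np

def forward_process(x0, y, n_steps=1000, gamma=1.5, sigma=0.05, seed=0):
    """Euler-Maruyama simulation of a forward SDE whose drift pulls the
    state from clean speech x0 toward noisy speech y while Gaussian
    noise is injected at each step."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = x0.astype(float).copy()
    for _ in range(n_steps):
        x += gamma * (y - x) * dt \
             + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x
```

At time T = 1 the expected state is y + (x0 - y) * exp(-gamma), i.e. the process has interpolated most of the way from clean toward noisy speech, which is the behavior the drift term is there to produce.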
1 code implementation • 25 Jul 2022 • Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann
To address this, these emotion annotations are typically collected by multiple annotators and averaged across annotators in order to obtain labels for arousal and valence.
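The averaging step in question is a one-liner; the toy ratings below are hypothetical, and the per-utterance standard deviation is shown alongside because it is exactly the inter-annotator disagreement that plain averaging discards:

```python
import numpy as np

# Hypothetical toy annotations: 3 annotators rating arousal for 4 utterances.
ratings = np.array([
    [0.2, 0.8, 0.5, 0.1],   # annotator 1
    [0.4, 0.7, 0.6, 0.2],   # annotator 2
    [0.3, 0.9, 0.4, 0.3],   # annotator 3
])

labels = ratings.mean(axis=0)        # averaged per-utterance labels
disagreement = ratings.std(axis=0)   # subjectivity lost by averaging
```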
1 code implementation • 27 Jun 2022 • Kristina Tesch, Timo Gerkmann
The key advantage of using multiple microphones for speech enhancement is that spatial filtering can be used to complement the tempo-spectral processing.
no code implementations • 23 Jun 2022 • Danilo de Oliveira, Tal Peer, Timo Gerkmann
The SepFormer architecture shows very good results in speech separation.
1 code implementation • 22 Jun 2022 • Kristina Tesch, Nils-Hendrik Mohrmann, Timo Gerkmann
Employing deep neural networks (DNNs) to directly learn filters for multi-channel speech enhancement has potentially two key advantages over a traditional approach combining a linear spatial filter with an independent tempo-spectral post-filter: 1) non-linear spatial filtering makes it possible to overcome restrictions originating from a linear processing model, and 2) joint processing of spatial and tempo-spectral information allows exploiting interdependencies between different sources of information.
no code implementations • 11 May 2022 • Tal Peer, Simon Welker, Timo Gerkmann
Phase retrieval is a problem encountered not only in speech and audio processing, but in many other fields such as optics.
no code implementations • 6 Apr 2022 • Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
By deriving new metrics analyzing the dereverberation performance in various time ranges, we confirm that directly optimizing a criterion at the output of the multi-channel linear filtering stage results in more effective dereverberation than placing the criterion at the output of the DNN to optimize the PSD estimation.
no code implementations • 6 Apr 2022 • Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
This work focuses on online dereverberation for hearing devices using the weighted prediction error (WPE) algorithm.
no code implementations • 6 Apr 2022 • Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
In this paper, a neural network-augmented algorithm for noise-robust online dereverberation with a Kalman filtering variant of the weighted prediction error (WPE) method is proposed.
1 code implementation • 31 Mar 2022 • Simon Welker, Julius Richter, Timo Gerkmann
Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals.
no code implementations • 30 Mar 2022 • Tal Peer, Timo Gerkmann
Algorithmic latency in speech processing is dominated by the frame length used for Fourier analysis, which in turn limits the achievable performance of magnitude-centric approaches.
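The frame-length bottleneck referred to here is easy to quantify: a frame-based system must buffer one full analysis frame before it can process it, so that buffering time lower-bounds the algorithmic latency (the function and example values below are illustrative; exact accounting conventions vary, e.g. whether a synthesis hop is added).

```python
def frame_latency_ms(frame_len, fs):
    """Dominant algorithmic latency of an STFT-based system: the time
    needed to buffer one full analysis frame of frame_len samples at
    sampling rate fs, in milliseconds."""
    return 1000.0 * frame_len / fs

# e.g. a 512-sample frame at 16 kHz already costs 32 ms,
# before any processing time is added:
latency = frame_latency_ms(512, 16000)
```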
no code implementations • 4 Mar 2022 • Huajian Fang, Tal Peer, Stefan Wermter, Timo Gerkmann
Speech enhancement in the time-frequency domain is often performed by estimating a multiplicative mask to extract clean speech.
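The masking pipeline described here reduces to three steps: STFT, pointwise multiplication by an estimated mask, inverse STFT. In the sketch below, `mask_fn` is a placeholder for whatever estimator produces the mask (in practice a neural network); window settings are illustrative defaults.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance_with_mask(noisy, mask_fn, nperseg=256):
    """Time-frequency masking: multiply the noisy STFT by a real-valued
    mask in [0, 1] predicted from the magnitude, then transform back."""
    _, _, Z = stft(noisy, nperseg=nperseg)
    M = np.clip(mask_fn(np.abs(Z)), 0.0, 1.0)  # mask_fn stands in for a DNN
    _, enhanced = istft(M * Z, nperseg=nperseg)
    return enhanced
```

An all-ones mask passes the signal through unchanged, while an all-zeros mask suppresses everything; a trained estimator sits between these extremes, and, as the paper notes, such a point estimate comes with no measure of its own accuracy.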
no code implementations • 17 Feb 2022 • Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann
One of the most prominent challenges in the field of diffractive imaging is the phase retrieval (PR) problem: In order to reconstruct an object from its diffraction pattern, the inverse Fourier transform must be computed.
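For context, the classical baseline for this problem is Fienup-style error reduction, which alternates between enforcing the measured Fourier magnitude and a known object-domain support constraint; the sketch below is that textbook iteration, not the method proposed in the paper.

```python
import numpy as np

def error_reduction(magnitude, support, n_iter=100, seed=0):
    """Error-reduction phase retrieval: starting from a random object,
    alternately (1) replace the Fourier magnitude of the current
    estimate with the measured one and (2) enforce realness and the
    object-domain support."""
    rng = np.random.default_rng(seed)
    obj = rng.standard_normal(magnitude.shape) * support
    for _ in range(n_iter):
        F = np.fft.fft2(obj)
        F = magnitude * np.exp(1j * np.angle(F))  # Fourier-magnitude constraint
        obj = np.fft.ifft2(F).real * support      # support + realness constraint
    return obj
```

Such alternating projections can stagnate in local minima, which is part of what motivates learned approaches to the phase retrieval problem.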
1 code implementation • 7 Oct 2021 • Navin Raj Prabhu, Guillaume Carbajal, Nale Lehmann-Willenbrock, Timo Gerkmann
At training, the network learns a distribution of weights to capture the inherent uncertainty related to subjective arousal annotations.
1 code implementation • 19 May 2021 • Guillaume Carbajal, Julius Richter, Timo Gerkmann
In this work, we propose to use an adversarial training scheme for variational autoencoders to disentangle the label from the other latent variables.
no code implementations • 22 Apr 2021 • Kristina Tesch, Timo Gerkmann
Rather, the MMSE optimal filter is a joint spatial and spectral nonlinear function.
no code implementations • 17 Feb 2021 • Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann
Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics.
no code implementations • 12 Feb 2021 • Guillaume Carbajal, Julius Richter, Timo Gerkmann
In this paper, we propose to guide the variational autoencoder with a supervised classifier separately trained on noisy speech.
no code implementations • 11 Nov 2020 • Thilo Fryen, Manfred Eppe, Phuong D. H. Nguyen, Timo Gerkmann, Stefan Wermter
Reinforcement learning is a promising method to accomplish robotic control tasks.
1 code implementation • 10 Jun 2020 • Tobias Knopp, Mirco Grosser, Matthias Graeser, Timo Gerkmann, Martin Möddel
Background signals are a primary source of artifacts in magnetic particle imaging and limit the sensitivity of the method since background signals are often not precisely known and vary over time.
no code implementations • 7 Apr 2020 • Robert Rehr, Timo Gerkmann
In this paper, we address the generalization of deep neural network (DNN) based speech enhancement to unseen noise conditions for the case that training data is limited in size and diversity.
1 code implementation • 29 Feb 2020 • Hongzhuo Liang, Chuangchuang Zhou, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Fuchun Sun, Marcus Stoffel, Jianwei Zhang
Both network training results and robot experiments demonstrate that MP-Net is robust against noise and changes to the task and environment.
1 code implementation • 25 Oct 2019 • David Ditter, Timo Gerkmann
In this work, we investigate if the learned encoder of the end-to-end convolutional time domain audio separation network (Conv-TasNet) is the key to its recent success, or if the encoder can just as well be replaced by a deterministic hand-crafted filterbank.
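The comparison hinges on the fact that the Conv-TasNet encoder is just a strided 1-D convolution, i.e. overlapping frames multiplied by a filterbank matrix, so the learned matrix can be swapped for a deterministic one. The sketch below makes that equivalence explicit; the cosine filterbank is only a stand-in for a hand-crafted alternative (the paper itself uses a multi-phase gammatone filterbank), and all sizes are illustrative.

```python
import numpy as np

def frame(x, win=16, hop=8):
    """Slice a waveform into overlapping frames, as the encoder's
    strided 1-D convolution does implicitly."""
    n = (len(x) - win) // hop + 1
    return np.stack([x[i * hop:i * hop + win] for i in range(n)])

def encode(x, filters, win=16, hop=8):
    """Encoder output = frames @ filters^T. 'filters' (n_filters, win)
    may be a learned matrix or a deterministic filterbank."""
    return frame(x, win, hop) @ filters.T

def cosine_filterbank(n_filters=64, win=16):
    """A deterministic DCT-like filterbank as one hand-crafted choice."""
    t = np.arange(win)
    k = np.arange(n_filters) + 0.5
    return np.cos(np.pi * np.outer(k, t + 0.5) / win)
```

Because both variants share the same framing and matrix multiply, any performance gap between them isolates the contribution of the learned filters themselves.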
1 code implementation • 2 Mar 2019 • Hongzhuo Liang, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Jianwei Zhang
PouringNet is trained on our collected real-world pouring dataset with multimodal sensing data, which contains more than 3000 recordings of audio, force feedback, video and trajectory data of the human hand that performs the pouring task.
no code implementations • 10 Aug 2017 • Huy Phan, Martin Krawczyk-Becker, Timo Gerkmann, Alfred Mertins
Our proposed systems significantly outperform the challenge baseline, improving F-score from 72.7% to 90.0% and reducing detection error rate from 0.53 to 0.18 on average on the development data.