Search Results for author: Jesper Jensen

Found 27 papers, 6 papers with code

Neural Networks Hear You Loud And Clear: Hearing Loss Compensation Using Deep Neural Networks

no code implementations • 15 Mar 2024 • Peter Leer, Jesper Jensen, Laurel Carney, Zheng-Hua Tan, Jan Østergaard, Lars Bramsløw

In this study, we propose a DNN-based approach for hearing-loss compensation, which is trained on the outputs of hearing-impaired and normal-hearing DNN-based auditory models in response to speech signals.

Music Classification Speaker Identification +2

Paper
Add Code

How to train your ears: Auditory-model emulation for large-dynamic-range inputs and mild-to-severe hearing losses

1 code implementation • 15 Mar 2024 • Peter Leer, Jesper Jensen, Zheng-Hua Tan, Jan Østergaard, Lars Bramsløw

Our results show that this new optimization objective significantly improves the emulation performance of deep neural networks across relevant input sound levels and auditory-model frequency channels, without increasing the computational load during inference.

Speech Enhancement

Paper
Code

Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks

1 code implementation • 8 Mar 2024 • Vikas Tokala, Eric Grinstein, Mike Brookes, Simon Doclo, Jesper Jensen, Patrick A. Naylor

Studies have shown that in noisy acoustic environments, providing binaural signals to the user of an assistive listening device may improve speech intelligibility and spatial awareness.

Speech Enhancement

Paper
Code

On Speech Pre-emphasis as a Simple and Inexpensive Method to Boost Speech Enhancement

no code implementations • 17 Jan 2024 • Iván López-Espejo, Aditya Joglekar, Antonio M. Peinado, Jesper Jensen

Pre-emphasis filtering, compensating for the natural energy decay of speech at higher frequencies, has been considered as a common pre-processing step in a number of speech processing tasks over the years.

Automatic Speech Recognition Speech Enhancement +2

Paper
Add Code

Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions

no code implementations • 27 Dec 2023 • Holger Severin Bovbjerg, Jesper Jensen, Jan Østergaard, Zheng-Hua Tan

Our experiments show that self-supervised pretraining not only improves performance in clean conditions, but also yields models which are more robust to adverse conditions compared to purely supervised learning.

Action Detection Activity Detection +1

Paper
Add Code

Investigating the Design Space of Diffusion Models for Speech Enhancement

no code implementations • 7 Dec 2023 • Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May

To address this, we extend this framework to account for the progressive transformation between the clean and noisy speech signals.

Image Generation Speech Enhancement

Paper
Add Code

Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler

no code implementations • 5 Dec 2023 • Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May

We show that the proposed system substantially benefits from using multiple databases for training, and achieves superior performance compared to state-of-the-art discriminative models in both matched and mismatched conditions.

Image Generation Speech Enhancement

Paper
Add Code

Joint Minimum Processing Beamforming and Near-end Listening Enhancement

no code implementations • 20 Sep 2023 • Andreas J. Fuglsig, Jesper Jensen, Zheng-Hua Tan, Lars S. Bertelsen, Jens Christian Lindof, Jan Østergaard

Results show that the joint optimization can further improve performance compared to the concatenated approach.

Speech Enhancement

Paper
Add Code

Speech inpainting: Context-based speech synthesis guided by video

no code implementations • 1 Jun 2023 • Juan F. Montesinos, Daniel Michelsanti, Gloria Haro, Zheng-Hua Tan, Jesper Jensen

Audio and visual modalities are inherently connected in speech signals: lip movements and facial expressions are correlated with speech sounds.

speech-recognition Speech Recognition +1

Paper
Add Code

Distributed Adaptive Norm Estimation for Blind System Identification in Wireless Sensor Networks

1 code implementation • 1 Mar 2023 • Matthias Blochberger, Filip Elvander, Randall Ali, Jan Østergaard, Jesper Jensen, Marc Moonen, Toon van Waterschoot

Distributed signal-processing algorithms in (wireless) sensor networks often aim to decentralize processing tasks to reduce communication cost and computational complexity or avoid reliance on a single device (i. e., fusion center) for processing.

Paper
Code

Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting

no code implementations • 19 Nov 2022 • Iván López-Espejo, Ram C. M. C. Shekar, Zheng-Hua Tan, Jesper Jensen, John H. L. Hansen

In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance.

Small-Footprint Keyword Spotting

Paper
Add Code

Minimum Processing Near-end Listening Enhancement

no code implementations • 31 Oct 2022 • Andreas Jonas Fuglsig, Jesper Jensen, Zheng-Hua Tan, Lars Søndergaard Bertelsen, Jens Christian Lindof, Jan Østergaard

The intelligibility and quality of speech from a mobile phone or public announcement system are often affected by background noise in the listening environment.

Paper
Add Code

Deep Spoken Keyword Spotting: An Overview

no code implementations • 20 Nov 2021 • Iván López-Espejo, Zheng-Hua Tan, John Hansen, Jesper Jensen

Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Joint Far- and Near-End Speech Intelligibility Enhancement based on the Approximated Speech Intelligibility Index

no code implementations • 15 Nov 2021 • Andreas Jonas Fuglsig, Jan Østergaard, Jesper Jensen, Lars Søndergaard Bertelsen, Peter Mariager, Zheng-Hua Tan

However, the existing optimal mutual information based method requires a complicated system model that includes natural speech variations, and relies on approximations and assumptions of the underlying signal distributions.

Speech Enhancement

Paper
Add Code

Audio-Visual Speech Inpainting with Deep Learning

no code implementations • 9 Oct 2020 • Giovanni Morrone, Daniel Michelsanti, Zheng-Hua Tan, Jesper Jensen

In this paper, we present a deep-learning-based framework for audio-visual speech inpainting, i. e., the task of restoring the missing parts of an acoustic speech signal from reliable audio context and uncorrupted visual information.

Multi-Task Learning

Paper
Add Code

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

1 code implementation • 21 Aug 2020 • Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen

Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources.

Speech Enhancement Speech Separation

195

Paper
Code

Exploring Filterbank Learning for Keyword Spotting

no code implementations • 30 May 2020 • Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen

Despite their great performance over the years, handcrafted speech features are not necessarily optimal for any particular speech application.

Keyword Spotting

Paper
Add Code

Vocoder-Based Speech Synthesis from Silent Videos

no code implementations • 6 Apr 2020 • Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen

Both acoustic and visual information influence human perception of speech.

Multi-Task Learning Speech Synthesis

Paper
Add Code

On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement

no code implementations • 3 Sep 2019 • Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen

Finally, we show that a loss function based on scale-invariant signal-to-distortion ratio (SI-SDR) achieves good general performance across a range of popular speech enhancement evaluation metrics, which suggests that SI-SDR is a good candidate as a general-purpose loss function for speech enhancement systems.

Speech Enhancement

Paper
Add Code

Keyword Spotting for Hearing Assistive Devices Robust to External Speakers

no code implementations • 22 Jun 2019 • Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen

Our results show that this multi-task deep residual network is able to achieve a KWS accuracy relative improvement of around 32% with respect to a system that does not deal with external speakers.

Keyword Spotting Multi-Task Learning

Paper
Add Code

Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect

no code implementations • 29 May 2019 • Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen

Regarding speech intelligibility, we find a general tendency of the benefit in training the systems with Lombard speech.

Speech Enhancement

Paper
Add Code

On Training Targets and Objective Functions for Deep-Learning-Based Audio-Visual Speech Enhancement

no code implementations • 15 Nov 2018 • Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen

Audio-visual speech enhancement (AV-SE) is the task of improving speech quality and intelligibility in a noisy environment using audio and visual information from a talker.

Speech Enhancement

Paper
Add Code

Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems

no code implementations • 15 Nov 2018 • Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen

Humans tend to change their way of speaking when they are immersed in a noisy environment, a reflex known as Lombard effect.

Speech Enhancement

Paper
Add Code

Monaural Speech Enhancement using Deep Neural Networks by Maximizing a Short-Time Objective Intelligibility Measure

no code implementations • 2 Feb 2018 • Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen

Finally, we show that the proposed SE system performs on par with a traditional DNN based Short-Time Spectral Amplitude (STSA) SE system in terms of estimated speech intelligibility.

Sound Audio and Speech Processing

Paper
Add Code

Joint Separation and Denoising of Noisy Multi-talker Speech using Recurrent Neural Networks and Permutation Invariant Training

no code implementations • 31 Aug 2017 • Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen

We show that deep bi-directional LSTM RNNs trained using uPIT in noisy environments can improve the Signal-to-Distortion Ratio (SDR) as well as the Extended Short-Time Objective Intelligibility (ESTOI) measure, on the speaker independent multi-talker speech separation and denoising task, for various noise types and Signal-to-Noise Ratios (SNRs).

Sound

Paper
Add Code

Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks

3 code implementations • 18 Mar 2017 • Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen

We evaluated uPIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks and found that uPIT outperforms techniques based on Non-negative Matrix Factorization (NMF) and Computational Auditory Scene Analysis (CASA), and compares favorably with Deep Clustering (DPCL) and the Deep Attractor Network (DANet).

Clustering Deep Clustering +1

131

Paper
Code

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

1 code implementation • 1 Jul 2016 • Dong Yu, Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen

We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker independent multi-talker speech separation, commonly known as the cocktail-party problem.

Clustering Deep Clustering +2

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.