To estimate the direction of arrival (DOA) of a desired speaker in a multi-talker environment using a microphone array, we propose in this paper a signal-informed method that exploits an external microphone attached to the desired speaker.
Target speaker extraction aims at extracting the speech of a target speaker from a mixture of multiple speakers by exploiting auxiliary information about that speaker.
In this paper, we perform a theoretical bias analysis for the SC-based RTF vector estimation method with multiple external microphones.
A popular approach for 3D source localization using multiple microphones is the steered-response power method, where the source position is directly estimated by maximizing a function of three continuous position variables.
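As a minimal sketch of the steered-response power idea, the following assumes free-field propagation without attenuation and replaces the continuous maximization with a search over a finite set of candidate positions. The function names (`srp_map`, `localize`), the speed of sound, and the array layouts are illustrative assumptions, not taken from the source.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def srp_map(X, freqs, mics, candidates):
    """Steered-response power for each candidate position.

    X:          (M, F) complex microphone spectra
    freqs:      (F,)   frequencies in Hz
    mics:       (M, 3) microphone positions in m
    candidates: (P, 3) candidate source positions in m
    """
    # Propagation delay from every candidate to every microphone, shape (P, M)
    delays = np.linalg.norm(candidates[:, None, :] - mics[None, :, :], axis=-1) / C
    # Phase terms that compensate those delays, shape (P, M, F)
    steer = np.exp(2j * np.pi * freqs[None, None, :] * delays[:, :, None])
    # Align the channels, sum over microphones, then sum the power over frequency
    beam = np.sum(steer * X[None, :, :], axis=1)
    return np.sum(np.abs(beam) ** 2, axis=-1)


def localize(X, freqs, mics, candidates):
    """Return the candidate with maximum steered-response power and the power map."""
    power = srp_map(X, freqs, mics, candidates)
    return candidates[np.argmax(power)], power
```

A grid search like this trades accuracy for simplicity: the estimate can only be as fine as the candidate grid, whereas the continuous formulation would refine the position with a numerical optimizer.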
Recently, a method has been proposed to estimate the direction of arrival (DOA) of a single speaker by minimizing the frequency-averaged Hermitian angle between an estimated relative transfer function (RTF) vector and a database of prototype anechoic RTF vectors.
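The matching step described above can be sketched as follows. The Hermitian angle between two complex vectors is the arccosine of the magnitude of their normalized inner product, and the DOA estimate is the prototype direction with the smallest frequency-averaged angle. The far-field uniform linear array used to build prototypes (`farfield_rtf`), its spacing, and all function names are assumptions made for illustration only; the source does not specify how the prototype database is obtained.

```python
import numpy as np


def farfield_rtf(angles_deg, freqs, n_mics=4, spacing=0.05, c=343.0):
    """Hypothetical anechoic prototype RTF vectors for a uniform linear array.

    Returns an array of shape (D, F, M), with the first microphone as reference.
    """
    m = np.arange(n_mics)
    cos_theta = np.cos(np.deg2rad(angles_deg))
    delay = spacing * cos_theta[:, None, None] * m[None, None, :] / c
    return np.exp(-2j * np.pi * freqs[None, :, None] * delay)


def hermitian_angle(a, b):
    """Hermitian angle between complex vectors along the last axis."""
    num = np.abs(np.sum(np.conj(a) * b, axis=-1))
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return np.arccos(np.clip(num / den, 0.0, 1.0))


def estimate_doa(rtf_est, prototypes, angles_deg):
    """Pick the prototype direction with minimum frequency-averaged Hermitian angle.

    rtf_est:    (F, M) estimated RTF vectors
    prototypes: (D, F, M) prototype RTF vectors
    angles_deg: (D,) directions of the prototypes
    """
    cost = hermitian_angle(prototypes, rtf_est[None]).mean(axis=-1)
    return angles_deg[np.argmin(cost)], cost
```

Because the Hermitian angle ignores a common complex scaling of either vector, the comparison does not depend on how the estimated RTF vector is normalized.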
To improve speech intelligibility and speech quality in noisy environments, binaural noise reduction algorithms for head-mounted assistive listening devices are of crucial importance.
In mobile speech communication applications, wind noise can lead to a severe reduction of speech quality and intelligibility.
In this paper, we apply a deep learning-based bandwidth-extension system to the own voice reconstruction task and investigate different training strategies in order to overcome the limited availability of training data.
In this paper, we consider an in-ear headphone equipped with an inner microphone and multiple loudspeakers, and we propose an optimization procedure with a convex objective function to derive a fixed multi-loudspeaker ANC controller that minimizes the sound pressure at the eardrum.
Achieving optimal individualized equalization typically requires knowledge of all transfer functions between the source, the hearing device, and the individual eardrum.
To improve the sound quality of hearing devices, equalization filters can be used that aim at achieving acoustic transparency, i.e., listening with the device in the ear is perceptually similar to listening with the open ear.
To optimize the convolutional filter, the desired speech component is modeled with a time-varying Gaussian model, which promotes the sparsity of the desired speech component in the short-time Fourier transform domain compared to the noisy microphone signals.
Using measured acoustic paths to predict the sound pressure generated at the eardrum by external sources and by the headphone, the FIR filter coefficients of the ANC controller are optimized for different sound fields.
In this paper, we consider a binaural hearing aid setup where, in addition to the head-mounted microphones, an external microphone is available.
In this paper, we focus on a single-channel target speaker extraction system based on a CNN-LSTM separator network and a speaker embedder network requiring reference speech of the target speaker.
Multi-frame algorithms for single-microphone speech enhancement, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter, are able to exploit speech correlation across adjacent time frames in the short-time Fourier transform (STFT) domain.
In this paper, we investigate a state-space model using correlation coefficients obtained with a small correlation window to improve the decoding performance of the linear and the non-linear AAD methods.
While the binaural minimum variance distortionless response (BMVDR) beamformer provides good noise reduction performance and preserves the binaural cues of the desired source, it does not allow controlling the reduction of the interfering sources, and it distorts the binaural cues of the interfering sources and the background noise.
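The cue behaviour described above can be illustrated with a minimal sketch for a single frequency bin, assuming a known noise covariance matrix `R_n` and a known acoustic transfer function vector `a` of the desired source. The function name `bmvdr` and the choice of the first and last microphones as left and right reference microphones are assumptions for illustration.

```python
import numpy as np


def bmvdr(R_n, a, left_ref=0, right_ref=-1):
    """Left and right BMVDR filters for one frequency bin.

    R_n: (M, M) Hermitian positive-definite noise covariance matrix
    a:   (M,)   transfer function vector of the desired source
    """
    # R_n^{-1} a without forming the explicit inverse
    Rinv_a = np.linalg.solve(R_n, a)
    # a^H R_n^{-1} a is real and positive for a Hermitian positive-definite R_n
    base = Rinv_a / np.real(np.vdot(a, Rinv_a))
    # Distortionless constraints: w_left^H a = a[left_ref], w_right^H a = a[right_ref]
    w_left = base * np.conj(a[left_ref])
    w_right = base * np.conj(a[right_ref])
    return w_left, w_right
```

Both filters are scaled versions of the same vector, so for any input component `b` the ratio `np.vdot(w_left, b) / np.vdot(w_right, b)` equals `a[left_ref] / a[right_ref]`. Under these assumptions, every residual interferer or noise component is therefore rendered with the interaural transfer function of the desired source, which is the cue distortion mentioned above.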
This paper presents two single-channel speech dereverberation methods to enhance the quality of speech signals that have been recorded in an enclosed space.
Nonnegative matrix factorization (NMF) has been actively investigated and used in a wide range of problems in the past decade.
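For readers unfamiliar with NMF, a minimal sketch of one common variant is given below: the multiplicative update rules of Lee and Seung for the squared Euclidean (Frobenius) cost, which approximate a nonnegative matrix `V` by the product `W @ H` of two nonnegative factors. The source does not say which cost function or algorithm is used, so the choice of variant, the function name `nmf`, and the parameters are assumptions.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate nonnegative V (F x T) as W (F x rank) @ H (rank x T)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Random nonnegative initialization
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep all entries nonnegative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In audio applications `V` is often a magnitude or power spectrogram, in which case the columns of `W` act as spectral templates and the rows of `H` as their time-varying activations.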