Search Results for author: Reinhold Haeb-Umbach

Found 40 papers, 16 papers with code

Neural network based spectral mask estimation for acoustic beamforming

1 code implementation • ICASSP 2016 • Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach

The network training is independent of the number and the geometric configuration of the microphones.

Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR

1 code implementation • 29 May 2019 • Naoyuki Kanda, Christoph Boeddeker, Jens Heitkaemper, Yusuke Fujita, Shota Horiguchi, Kenji Nagamatsu, Reinhold Haeb-Umbach

In this paper, we present Hitachi and Paderborn University's joint effort for automatic speech recognition (ASR) in a dinner party scenario.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3

An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription

1 code implementation • 26 Sep 2019 • Catalin Zorila, Christoph Boeddeker, Rama Doddipatla, Reinhold Haeb-Umbach

Despite the strong modeling power of neural network acoustic models, speech enhancement has been shown to deliver additional word error rate improvements if multi-channel data is available.

Speech Enhancement

SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition

3 code implementations • 30 Oct 2019 • Lukas Drude, Jens Heitkaemper, Christoph Boeddeker, Reinhold Haeb-Umbach

We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ -- Spatialized Multi-Speaker Wall Street Journal.

Position

Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-supervised Sound Event Detection

1 code implementation • 11 Mar 2021 • Janek Ebbers, Reinhold Haeb-Umbach

It is trained to predict strong labels while using (predicted) tags, i.e., weak labels, as additional input.

Event Detection • Sound Event Detection • +1

On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems

1 code implementation • 29 Nov 2022 • Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach

We propose a general framework to compute the word error rate (WER) of ASR systems that process recordings containing multiple speakers at their input and that produce multiple output word sequences (MIMO).

speech-recognition • Speech Recognition

MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems

1 code implementation • 21 Jul 2023 • Thilo von Neumann, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

MeetEval is an open-source toolkit to evaluate all kinds of meeting transcription systems.

Sentence

MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator

1 code implementation • 23 Sep 2022 • Tobias Cord-Landwehr, Thilo von Neumann, Christoph Boeddeker, Reinhold Haeb-Umbach

Training and evaluation of these single tasks requires synthetic data with access to intermediate signals that is as close as possible to the evaluation scenario.

Speech Enhancement

Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers

1 code implementation • 30 Jul 2021 • Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

When processing meeting-like data in a segment-wise manner, i.e., by separating overlapping segments independently and stitching adjacent segments to continuous output streams, this constraint has to be fulfilled for any segment.

Speech Separation

Speeding Up Permutation Invariant Training for Source Separation

1 code implementation • 30 Jul 2021 • Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach

The Hungarian algorithm can be used for uPIT and we introduce various algorithms for the Graph-PIT assignment problem to reduce the complexity to be polynomial in the number of utterances.

Utterance-by-utterance overlap-aware neural diarization with Graph-PIT

1 code implementation • 28 Jul 2022 • Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker, Reinhold Haeb-Umbach

In this paper, we argue that such an approach involving the segmentation has several issues; for example, it inevitably faces a dilemma that larger segment sizes increase both the context available for enhancing the performance and the number of speakers for the local EEND module to handle.

Clustering • Segmentation • +2

Threshold Independent Evaluation of Sound Event Detection Scores

1 code implementation • 31 Jan 2022 • Janek Ebbers, Romain Serizel, Reinhold Haeb-Umbach

Performing an adequate evaluation of sound event detection (SED) systems is far from trivial and is still subject to ongoing research.

Event Detection • Sound Event Detection

Post-Processing Independent Evaluation of Sound Event Detection Systems

1 code implementation • 27 Jun 2023 • Janek Ebbers, Reinhold Haeb-Umbach, Romain Serizel

It summarizes the system performance over a range of operating modes resulting from varying the decision threshold that is used to translate the system output scores into a binary detection output.

Event Detection • Sound Event Detection

Iterative Geometry Calibration from Distance Estimates for Wireless Acoustic Sensor Networks

1 code implementation • 11 Dec 2020 • Tobias Gburrek, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

In this paper, we present an approach to geometry calibration in wireless acoustic sensor networks, whose nodes are assumed to be equipped with a compact microphone array.

Position

On Synchronization of Wireless Acoustic Sensor Networks in the Presence of Time-varying Sampling Rate Offsets and Speaker Changes

1 code implementation • 25 Oct 2021 • Tobias Gburrek, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

A wireless acoustic sensor network records audio signals with sampling time and sampling rate offsets between the audio streams, if the analog-digital converters (ADCs) of the network devices are not synchronized.

LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices

1 code implementation • 21 Aug 2023 • Joerg Schmalenstroeer, Tobias Gburrek, Reinhold Haeb-Umbach

We present LibriWASN, a data set whose design follows closely the LibriCSS meeting recognition data set, with the marked difference that the data is recorded with devices that are randomly positioned on a meeting table and whose sampling clocks are not synchronized.

Directional Statistics and Filtering Using libDirectional

no code implementations • 28 Dec 2017 • Gerhard Kurz, Igor Gilitschenski, Florian Pfaff, Lukas Drude, Uwe D. Hanebeck, Reinhold Haeb-Umbach, Roland Y. Siegwart

In this paper, we present libDirectional, a MATLAB library for directional statistics and directional estimation.

All-neural online source separation, counting, and diarization for meeting analysis

no code implementations • 21 Feb 2019 • Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach

While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3

Unsupervised training of a deep clustering model for multichannel blind source separation

no code implementations • 2 Apr 2019 • Lukas Drude, Daniel Hasenklever, Reinhold Haeb-Umbach

We propose a training scheme to train neural network-based source separation algorithms from scratch when parallel clean data is unavailable.

blind source separation • Clustering • +2

Unsupervised training of neural mask-based beamforming

no code implementations • 2 Apr 2019 • Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach

In contrast to previous work on unsupervised training of neural mask estimators, our approach avoids the need for a possibly pre-trained teacher model entirely.

speech-recognition • Speech Recognition

Jointly optimal dereverberation and beamforming

no code implementations • 30 Oct 2019 • Christoph Boeddeker, Tomohiro Nakatani, Keisuke Kinoshita, Reinhold Haeb-Umbach

We previously proposed an optimal (in the maximum likelihood sense) convolutional beamformer that can perform simultaneous denoising and dereverberation, and showed its superiority over the widely used cascade of a WPE dereverberation filter and a conventional MPDR beamformer.

Denoising

Demystifying TasNet: A Dissecting Approach

no code implementations • 20 Nov 2019 • Jens Heitkaemper, Darius Jakobeit, Christoph Boeddeker, Lukas Drude, Reinhold Haeb-Umbach

In recent years, time-domain speech separation has excelled over frequency-domain separation in single-channel scenarios and noise-free environments.

Speech Separation

End-to-end training of time domain audio separation and recognition

no code implementations • 18 Dec 2019 • Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

The rising interest in single-channel multi-speaker speech separation sparked development of End-to-End (E2E) approaches to multi-speaker speech recognition.

Speaker Recognition • speech-recognition • +2

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

no code implementations • 4 Jun 2020 • Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Warping of Radar Data into Camera Image for Cross-Modal Supervision in Automotive Applications

no code implementations • 23 Dec 2020 • Christopher Grimm, Tai Fei, Ernst Warsitz, Ridha Farhoud, Tobias Breddermann, Reinhold Haeb-Umbach

As the warping operation relies on accurate scene flow estimation, we further propose a novel scene flow estimation algorithm which exploits information from camera, lidar and radar sensors.

Direction of Arrival Estimation • Object Recognition • +2

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend

no code implementations • 23 Feb 2021 • Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian

Recently, the end-to-end approach has been successfully applied to multi-speaker speech separation and recognition in both single-channel and multichannel conditions.

Action Detection • Activity Detection • +4

SA-SDR: A novel loss function for separation of meeting style data

no code implementations • 29 Oct 2021 • Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

Many state-of-the-art neural network-based source separation systems use the averaged Signal-to-Distortion Ratio (SDR) as a training objective function.

Monaural source separation: From anechoic to reverberant environments

no code implementations • 15 Nov 2021 • Tobias Cord-Landwehr, Christoph Boeddeker, Thilo von Neumann, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach

Impressive progress in neural network-based single-channel speech source separation has been made in recent years.

A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network

no code implementations • 2 May 2022 • Tobias Gburrek, Christoph Boeddeker, Thilo von Neumann, Tobias Cord-Landwehr, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

We propose a system that transcribes the conversation of a typical meeting scenario that is captured by a set of initially unsynchronized microphone arrays at unknown positions.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3

Investigation into Target Speaking Rate Adaptation for Voice Conversion

no code implementations • 5 Sep 2022 • Michael Kuhlmann, Fritz Seebauer, Janek Ebbers, Petra Wagner, Reinhold Haeb-Umbach

Disentangling speaker and content attributes of a speech signal into separate latent representations followed by decoding the content with an exchanged speaker representation is a popular approach for voice conversion, which can be trained with non-parallel and unlabeled speech data.

Disentanglement • Voice Conversion

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

no code implementations • 7 Mar 2023 • Christoph Boeddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux

Since diarization and source separation of meeting data are closely related tasks, we here propose an approach to perform the two objectives jointly.

Ranked #1 on Speech Recognition on LibriCSS (using extra training data)

Action Detection • Activity Detection • +1

A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures

no code implementations • 1 Jun 2023 • Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

We introduce a monaural neural speaker embeddings extractor that computes an embedding for each speaker present in a speech mixture.

Frame-wise and overlap-robust speaker embeddings for meeting diarization

no code implementations • 1 Jun 2023 • Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

Using a Teacher-Student training approach, we developed a speaker embedding extraction system that outputs embeddings at frame rate.

Mixture Encoder for Joint Speech Separation and Recognition

no code implementations • 21 Jun 2023 • Simon Berger, Peter Vieting, Christoph Boeddeker, Ralf Schlüter, Reinhold Haeb-Umbach

Modular approaches separate speakers and recognize each of them with a single-speaker ASR system.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Investigating Speaker Embedding Disentanglement on Natural Read Speech

no code implementations • 8 Aug 2023 • Michael Kuhlmann, Adrian Meise, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

To quantify disentanglement, we identify acoustic features that are highly speaker-variant and can serve as proxies for the factors of variation underlying speech.

Disentanglement • Fairness

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

no code implementations • 15 Sep 2023 • Peter Vieting, Simon Berger, Thilo von Neumann, Christoph Boeddeker, Ralf Schlüter, Reinhold Haeb-Umbach

This mixture encoder leverages the original overlapped speech to mitigate the effect of artifacts introduced by the speech separation.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization

no code implementations • 28 Sep 2023 • Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, Reinhold Haeb-Umbach

We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the LibriCSS dataset.

Sentence • Speech Separation

On Feature Importance and Interpretability of Speaker Representations

no code implementations • 19 Oct 2023 • Frederik Rautenberg, Michael Kuhlmann, Jana Wiechmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach

Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal.

Disentanglement • Feature Importance

Spatial Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks

no code implementations • 27 Nov 2023 • Tobias Gburrek, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

We propose a diarization system that estimates "who spoke when" based on spatial information, to be used as a front-end of a meeting transcription system running on the signals gathered from an acoustic sensor network (ASN).

Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

no code implementations • 8 Jan 2024 • Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

We propose a modified teacher-student training for the extraction of frame-wise speaker embeddings that allows for an effective diarization of meeting scenarios containing partially overlapping speech.

Clustering
