Search Results for author: Christoph Boeddeker

Found 29 papers, 11 papers with code

Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment

1 code implementation • 5 Jun 2024 • Christoph Boeddeker, Tobias Cord-Landwehr, Reinhold Haeb-Umbach

Diarization is a crucial component in meeting transcription systems, easing the challenges of speech enhancement and attributing the transcriptions to the correct speakers.

Attribute • Speech Enhancement

Geodesic interpolation of frame-wise speaker embeddings for the diarization of meeting scenarios

no code implementations • 8 Jan 2024 • Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

We propose a modified teacher-student training for the extraction of frame-wise speaker embeddings that allows for an effective diarization of meeting scenarios containing partially overlapping speech.

Clustering
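As a rough illustration of the geodesic interpolation the entry above refers to, here is a minimal sketch of spherical linear interpolation (slerp) between two unit-norm speaker embeddings, i.e. interpolation along the great circle of the embedding hypersphere. This is an assumption-laden toy example, not the paper's actual procedure, which may combine frame-wise embeddings differently inside the diarization pipeline.

    import numpy as np

    def slerp(emb_a, emb_b, t):
        # Spherical linear interpolation: move a fraction t along the geodesic
        # connecting two unit-norm embeddings on the hypersphere.
        emb_a = emb_a / np.linalg.norm(emb_a)
        emb_b = emb_b / np.linalg.norm(emb_b)
        omega = np.arccos(np.clip(np.dot(emb_a, emb_b), -1.0, 1.0))
        if omega < 1e-6:  # nearly identical embeddings: a linear blend is fine
            return (1 - t) * emb_a + t * emb_b
        return (np.sin((1 - t) * omega) * emb_a + np.sin(t * omega) * emb_b) / np.sin(omega)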

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization

no code implementations • 28 Sep 2023 • Thilo von Neumann, Christoph Boeddeker, Tobias Cord-Landwehr, Marc Delcroix, Reinhold Haeb-Umbach

We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset.

Sentence • Speech Separation

Frame-wise and overlap-robust speaker embeddings for meeting diarization

no code implementations • 1 Jun 2023 • Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

Using a Teacher-Student training approach, we developed a speaker embedding extraction system that outputs embeddings at frame rate.

A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures

no code implementations • 1 Jun 2023 • Tobias Cord-Landwehr, Christoph Boeddeker, Cătălin Zorilă, Rama Doddipatla, Reinhold Haeb-Umbach

We introduce a monaural neural speaker embeddings extractor that computes an embedding for each speaker present in a speech mixture.

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

1 code implementation • 7 Mar 2023 • Christoph Boeddeker, Aswin Shanmugam Subramanian, Gordon Wichern, Reinhold Haeb-Umbach, Jonathan Le Roux

Since diarization and source separation of meeting data are closely related tasks, we here propose an approach to perform the two objectives jointly.

 Ranked #1 on Speech Recognition on LibriCSS (using extra training data)

Action Detection • Activity Detection +1

On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems

1 code implementation • 29 Nov 2022 • Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach

We propose a general framework to compute the word error rate (WER) of ASR systems that process recordings containing multiple speakers at their input and that produce multiple output word sequences (MIMO).

Speech Recognition
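To give an intuition for permutation-based multi-speaker WER definitions such as those formalized in this paper, the toy sketch below computes a concatenated-minimum-permutation WER by trying every assignment of hypothesis streams to reference streams. It assumes equal stream counts and uses exhaustive search; the paper's framework and its efficient algorithms cover more general cases, so treat this purely as an illustration.

    import itertools

    def word_edit_distance(ref, hyp):
        # Word-level Levenshtein distance with a single rolling row.
        row = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            prev, row[0] = row[0], i
            for j, h in enumerate(hyp, 1):
                prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, prev + (r != h))
        return row[-1]

    def cp_wer(reference_streams, hypothesis_streams):
        # Try every assignment of hypothesis streams to reference streams and
        # keep the cheapest total word edit distance.
        refs = [r.split() for r in reference_streams]
        hyps = [h.split() for h in hypothesis_streams]
        assert len(refs) == len(hyps), "toy version assumes equal stream counts"
        errors = min(
            sum(word_edit_distance(r, h) for r, h in zip(refs, perm))
            for perm in itertools.permutations(hyps)
        )
        return errors / sum(len(r) for r in refs)

    # Example: hypothesis streams arrive in swapped speaker order, WER is still 0.
    print(cp_wer(["hello world", "good morning"], ["good morning", "hello world"]))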

Reverberation as Supervision for Speech Separation

no code implementations • 15 Nov 2022 • Rohith Aralikatti, Christoph Boeddeker, Gordon Wichern, Aswin Shanmugam Subramanian, Jonathan Le Roux

This paper proposes reverberation as supervision (RAS), a novel unsupervised loss function for single-channel reverberant speech separation.

Speech Separation

MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator

1 code implementation • 23 Sep 2022 • Tobias Cord-Landwehr, Thilo von Neumann, Christoph Boeddeker, Reinhold Haeb-Umbach

Training and evaluating these individual tasks requires synthetic data that provides access to intermediate signals and is as close as possible to the evaluation scenario.

Speech Enhancement
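To illustrate what such a mixture generator provides, namely the mixture together with the intermediate source signals needed to train the individual tasks, here is a heavily simplified, hypothetical sketch; MMS-MSG itself offers far more control over overlap patterns, scaling, and meeting structure.

    import numpy as np

    def generate_mixture(utterances, offsets, gains_db, num_samples):
        # Place each single-speaker utterance at its offset, apply a gain, and
        # return both the mixture and the intermediate (padded) source signals.
        sources = np.zeros((len(utterances), num_samples))
        for k, (utt, offset, gain_db) in enumerate(zip(utterances, offsets, gains_db)):
            scaled = utt * 10 ** (gain_db / 20)
            end = min(offset + len(scaled), num_samples)
            sources[k, offset:end] = scaled[: end - offset]
        return sources.sum(axis=0), sources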

Utterance-by-utterance overlap-aware neural diarization with Graph-PIT

1 code implementation • 28 Jul 2022 • Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker, Reinhold Haeb-Umbach

In this paper, we argue that such an approach involving the segmentation has several issues; for example, it inevitably faces a dilemma that larger segment sizes increase both the context available for enhancing the performance and the number of speakers for the local EEND module to handle.

Clustering • Segmentation +2

A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network

no code implementations • 2 May 2022 • Tobias Gburrek, Christoph Boeddeker, Thilo von Neumann, Tobias Cord-Landwehr, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

We propose a system that transcribes the conversation of a typical meeting scenario that is captured by a set of initially unsynchronized microphone arrays at unknown positions.

Automatic Speech Recognition (ASR) +3

Monaural source separation: From anechoic to reverberant environments

no code implementations • 15 Nov 2021 • Tobias Cord-Landwehr, Christoph Boeddeker, Thilo von Neumann, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach

Impressive progress in neural network-based single-channel speech source separation has been made in recent years.

SA-SDR: A novel loss function for separation of meeting style data

no code implementations • 29 Oct 2021 • Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

Many state-of-the-art neural network-based source separation systems use the averaged Signal-to-Distortion Ratio (SDR) as a training objective function.
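For reference, the conventional averaged SDR and the source-aggregated variant (SA-SDR) proposed in the paper can be contrasted in a few lines of numpy; this sketch assumes the estimates are already permuted to match the references and omits any scale-invariant or thresholded variants.

    import numpy as np

    def averaged_sdr(references, estimates):
        # Conventional objective: SDR per source, then averaged over the sources.
        signal = np.sum(references ** 2, axis=-1)
        error = np.sum((references - estimates) ** 2, axis=-1)
        return np.mean(10 * np.log10(signal / error))

    def sa_sdr(references, estimates):
        # Source-aggregated SDR: pool signal and error power over all sources
        # before taking a single log ratio per mixture.
        signal = np.sum(references ** 2)
        error = np.sum((references - estimates) ** 2)
        return 10 * np.log10(signal / error)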

Speeding Up Permutation Invariant Training for Source Separation

1 code implementation • 30 Jul 2021 • Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach

The Hungarian algorithm can be used for uPIT, and we introduce various algorithms for the Graph-PIT assignment problem that reduce the complexity to polynomial in the number of utterances.
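A minimal sketch of the uPIT assignment step solved with the Hungarian algorithm (scipy's linear_sum_assignment), using mean squared error as a stand-in pairwise loss; the Graph-PIT assignment algorithms introduced in the paper are considerably more involved and are not reproduced here.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def pit_assignment(references, estimates):
        # references, estimates: arrays of shape (num_speakers, num_samples).
        # Build the pairwise loss matrix (MSE here) and find the permutation
        # of estimates that minimizes the total loss.
        loss_matrix = np.mean(
            (references[:, None, :] - estimates[None, :, :]) ** 2, axis=-1
        )
        ref_idx, est_idx = linear_sum_assignment(loss_matrix)
        return est_idx, loss_matrix[ref_idx, est_idx].sum()

    # Example: the estimates are the references in swapped order.
    refs = np.random.randn(2, 16000)
    print(pit_assignment(refs, refs[::-1]))  # permutation (1, 0), total loss 0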

Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers

1 code implementation • 30 Jul 2021 • Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

When processing meeting-like data in a segment-wise manner, i.e., by separating overlapping segments independently and stitching adjacent segments to continuous output streams, this constraint has to be fulfilled for any segment.

Speech Separation

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

no code implementations • 4 Jun 2020 • Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown.

Automatic Speech Recognition (ASR) +2

End-to-end training of time domain audio separation and recognition

no code implementations • 18 Dec 2019 • Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

The rising interest in single-channel multi-speaker speech separation sparked development of End-to-End (E2E) approaches to multi-speaker speech recognition.

Speaker Recognition • Speech Recognition +2

Demystifying TasNet: A Dissecting Approach

no code implementations • 20 Nov 2019 • Jens Heitkaemper, Darius Jakobeit, Christoph Boeddeker, Lukas Drude, Reinhold Haeb-Umbach

In recent years, time-domain speech separation has excelled over frequency-domain separation in single-channel scenarios and noise-free environments.

Speech Separation

SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition

3 code implementations • 30 Oct 2019 • Lukas Drude, Jens Heitkaemper, Christoph Boeddeker, Reinhold Haeb-Umbach

We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ -- Spatialized Multi-Speaker Wall Street Journal.

Position

Jointly optimal dereverberation and beamforming

no code implementations • 30 Oct 2019 • Christoph Boeddeker, Tomohiro Nakatani, Keisuke Kinoshita, Reinhold Haeb-Umbach

We previously proposed an optimal (in the maximum likelihood sense) convolutional beamformer that can perform simultaneous denoising and dereverberation, and showed its superiority over the widely used cascade of a WPE dereverberation filter and a conventional MPDR beamformer.

Denoising
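For context, the conventional MPDR beamformer mentioned in the abstract (the second stage of the WPE-plus-MPDR cascade used as a baseline) can be sketched per frequency bin as below; the jointly optimal convolutional beamformer the authors propose is not reproduced here.

    import numpy as np

    def mpdr_beamformer(obs, steering_vector):
        # obs: (num_mics, num_frames) complex STFT of a single frequency bin.
        # steering_vector: (num_mics,) complex steering vector d.
        # MPDR weights: w = R^{-1} d / (d^H R^{-1} d), with R the spatial
        # covariance of the observed (noisy, reverberant) signal.
        R = obs @ obs.conj().T / obs.shape[1]
        R_inv_d = np.linalg.solve(R, steering_vector)
        w = R_inv_d / (steering_vector.conj() @ R_inv_d)
        return w.conj() @ obs  # enhanced single-channel output, shape (num_frames,)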

An Investigation into the Effectiveness of Enhancement in ASR Training and Test for CHiME-5 Dinner Party Transcription

1 code implementation • 26 Sep 2019 • Catalin Zorila, Christoph Boeddeker, Rama Doddipatla, Reinhold Haeb-Umbach

Despite the strong modeling power of neural network acoustic models, speech enhancement has been shown to deliver additional word error rate improvements if multi-channel data is available.

Speech Enhancement
