Search Results for author: Thilo von Neumann

Found 10 papers, 4 papers with code

MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator

1 code implementation • 23 Sep 2022 • Tobias Cord-Landwehr, Thilo von Neumann, Christoph Boeddeker, Reinhold Haeb-Umbach

Training and evaluation of these single tasks requires synthetic data with access to intermediate signals that is as close as possible to the evaluation scenario.

Speech Enhancement

Utterance-by-utterance overlap-aware neural diarization with Graph-PIT

1 code implementation • 28 Jul 2022 • Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Christoph Boeddeker, Reinhold Haeb-Umbach

In this paper, we argue that such an approach involving the segmentation has several issues; for example, it inevitably faces a dilemma that larger segment sizes increase both the context available for enhancing the performance and the number of speakers for the local EEND module to handle.

Speaker Diarization

A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network

no code implementations • 2 May 2022 • Tobias Gburrek, Christoph Boeddeker, Thilo von Neumann, Tobias Cord-Landwehr, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

We propose a system that transcribes the conversation of a typical meeting scenario that is captured by a set of initially unsynchronized microphone arrays at unknown positions.

Automatic Speech Recognition, Speech Enhancement, +1

Monaural source separation: From anechoic to reverberant environments

no code implementations • 15 Nov 2021 • Tobias Cord-Landwehr, Christoph Boeddeker, Thilo von Neumann, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach

Impressive progress in neural network-based single-channel speech source separation has been made in recent years.

SA-SDR: A novel loss function for separation of meeting style data

no code implementations • 29 Oct 2021 • Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

Many state-of-the-art neural network-based source separation systems use the averaged Signal-to-Distortion Ratio (SDR) as a training objective function.

Speeding Up Permutation Invariant Training for Source Separation

1 code implementation • 30 Jul 2021 • Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach

The Hungarian algorithm can be used for uPIT and we introduce various algorithms for the Graph-PIT assignment problem to reduce the complexity to be polynomial in the number of utterances.

Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers

1 code implementation • 30 Jul 2021 • Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

When processing meeting-like data in a segment-wise manner, i.e., by separating overlapping segments independently and stitching adjacent segments to continuous output streams, this constraint has to be fulfilled for any segment.

Speech Separation

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

no code implementations • 4 Jun 2020 • Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown.

Automatic Speech Recognition, Speech Extraction, +1

End-to-end training of time domain audio separation and recognition

no code implementations • 18 Dec 2019 • Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

The rising interest in single-channel multi-speaker speech separation sparked development of End-to-End (E2E) approaches to multi-speaker speech recognition.

Speaker Recognition, Speech Recognition, +2

All-neural online source separation, counting, and diarization for meeting analysis

no code implementations • 21 Feb 2019 • Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach

While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation.

Automatic Speech Recognition, Speaker Diarization, +2
