Search Results for author: Marc Delcroix

Found 29 papers, 9 papers with code

Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition

no code implementations11 Jan 2022 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya

To mitigate the degradation, we introduced a rule-based method to switch the ASR input between the enhanced and observed signals, which showed promising results.

Speech Enhancement Speech Recognition

Attention-based Multi-hypothesis Fusion for Speech Summarization

2 code implementations16 Nov 2021 Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe

We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.

Speech Recognition Text Summarization

Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

no code implementations31 Oct 2021 Martin Kocour, Kateřina Žmolíková, Lucas Ondel, Ján Švec, Marc Delcroix, Tsubasa Ochiai, Lukáš Burget, Jan Černocký

We modify the acoustic model to predict joint state posteriors for all speakers, enabling the network to express uncertainty about the attribution of parts of the speech signal to the speakers.

Speech Recognition

SA-SDR: A novel loss function for separation of meeting style data

no code implementations29 Oct 2021 Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

Many state-of-the-art neural network-based source separation systems use the averaged Signal-to-Distortion Ratio (SDR) as a training objective function.

Speeding Up Permutation Invariant Training for Source Separation

1 code implementation30 Jul 2021 Thilo von Neumann, Christoph Boeddeker, Keisuke Kinoshita, Marc Delcroix, Reinhold Haeb-Umbach

The Hungarian algorithm can be used for uPIT and we introduce various algorithms for the Graph-PIT assignment problem to reduce the complexity to be polynomial in the number of utterances.

Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers

1 code implementation30 Jul 2021 Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

When processing meeting-like data in a segment-wise manner, i. e., by separating overlapping segments independently and stitching adjacent segments to continuous output streams, this constraint has to be fulfilled for any segment.

Speech Separation

Few-shot learning of new sound classes for target sound extraction

no code implementations14 Jun 2021 Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai, Keisuke Kinoshita, Shoko Araki

Target sound extraction consists of extracting the sound of a target acoustic event (AE) class from a mixture of AE sounds.

Few-Shot Learning

PILOT: Introducing Transformers for Probabilistic Sound Event Localization

1 code implementation7 Jun 2021 Christopher Schymura, Benedikt Bönninghoff, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

Sound event localization aims at estimating the positions of sound sources in the environment with respect to an acoustic receiver (e. g. a microphone array).

Event Detection

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition

no code implementations2 Jun 2021 Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoyuki Kamo

', we analyze ASR performance on observed and enhanced speech at various noise and interference conditions, and show that speech enhancement degrades ASR under some conditions even for overlapping speech.

Speech Enhancement Speech Recognition +1

Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech

1 code implementation19 May 2021 Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

This paper is to (1) report recent advances we made to this framework, including newly introduced robust constrained clustering algorithms, and (2) experimentally show that the method can now significantly outperform competitive diarization methods such as Encoder-Decoder Attractor (EDA)-EEND, on CALLHOME data which comprises real conversational speech data including overlapped speech and an arbitrary number of speakers.

Speaker Diarization

Exploiting Attention-based Sequence-to-Sequence Architectures for Sound Event Localization

1 code implementation28 Feb 2021 Christopher Schymura, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Dorothea Kolossa

Herein, attentions allow for capturing temporal dependencies in the audio signal by focusing on specific frames that are relevant for estimating the activity and direction-of-arrival of sound events at the current time-step.

Speech Recognition

Multimodal Attention Fusion for Target Speaker Extraction

no code implementations2 Feb 2021 Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

Recently an audio-visual target speaker extraction has been proposed that extracts target speech by using complementary audio and visual clues.

Speaker activity driven neural speech extraction

no code implementations14 Jan 2021 Marc Delcroix, Katerina Zmolikova, Tsubasa Ochiai, Keisuke Kinoshita, Tomohiro Nakatani

Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest.

Speech Extraction

Neural Network-based Virtual Microphone Estimator

no code implementations12 Jan 2021 Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Shoko Araki

Developing microphone array technologies for a small number of microphones is important due to the constraints of many devices.

Speech Enhancement

Integration of variational autoencoder and spatial clustering for adaptive multi-channel neural speech separation

1 code implementation24 Nov 2020 Katerina Zmolikova, Marc Delcroix, Lukáš Burget, Tomohiro Nakatani, Jan "Honza" Černocký

In this paper, we propose a method combining variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation.

Audio and Speech Processing

Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds

no code implementations26 Oct 2020 Keisuke Kinoshita, Marc Delcroix, Naohiro Tawara

In this paper, we propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.

Listen to What You Want: Neural Network-based Universal Sound Selector

no code implementations10 Jun 2020 Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki

In this paper, we propose instead a universal sound selection neural network that enables to directly select AE sounds from a mixture given user-specified target AE classes.

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

no code implementations4 Jun 2020 Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown.

Speech Extraction Speech Recognition

Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system

no code implementations9 Mar 2020 Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani

Automatic meeting analysis is an essential fundamental technology required to let, e. g. smart devices follow and respond to our conversations.

Speaker Diarization Speech Enhancement

Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

no code implementations14 Feb 2020 Yuma Koizumi, Kohei Yatabe, Marc Delcroix, Yoshiki Masuyama, Daiki Takeuchi

This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance.

Multi-Task Learning Speaker Identification +2

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

1 code implementation23 Jan 2020 Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki

First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.

Speaker Identification Speech Extraction

End-to-end training of time domain audio separation and recognition

no code implementations18 Dec 2019 Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

The rising interest in single-channel multi-speaker speech separation sparked development of End-to-End (E2E) approaches to multi-speaker speech recognition.

Speaker Recognition Speech Recognition +1

All-neural online source separation, counting, and diarization for meeting analysis

no code implementations21 Feb 2019 Thilo von Neumann, Keisuke Kinoshita, Marc Delcroix, Shoko Araki, Tomohiro Nakatani, Reinhold Haeb-Umbach

While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation.

Speaker Diarization Speech Recognition

Cannot find the paper you are looking for? You can Submit a new open access paper.