Search Results for author: Lukas Drude

Found 12 papers, 2 papers with code

Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition

no code implementations • 12 Jun 2023 • Belen Alastruey, Lukas Drude, Jahn Heymann, Simon Wiesler

Convolutional frontends are a typical choice for Transformer-based automatic speech recognition to preprocess the spectrogram, reduce its sequence length, and combine local information along both the time and frequency axes.

Automatic Speech Recognition • speech-recognition +1
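
The following is a minimal PyTorch sketch of the kind of convolutional frontend the abstract above refers to: two strided 2-D convolutions that subsample a log-mel spectrogram before a Transformer encoder. It is an illustration only, not the authors' model; the class name, channel counts, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ConvFrontend(nn.Module):
    """Typical 2-D convolutional subsampling frontend (illustrative only)."""
    def __init__(self, d_model: int = 256, n_mels: int = 80):
        super().__init__()
        # Two stride-2 convolutions reduce the time axis by a factor of 4
        # and mix local time-frequency context.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        freq_out = n_mels // 4  # the frequency axis is subsampled as well
        self.proj = nn.Linear(32 * freq_out, d_model)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, time, n_mels) log-mel spectrogram
        x = self.conv(spec.unsqueeze(1))           # (batch, 32, time/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        return self.proj(x)                        # (batch, time/4, d_model)

frontend = ConvFrontend()
print(frontend(torch.randn(2, 100, 80)).shape)    # torch.Size([2, 25, 256])
```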

Multi-channel Opus compression for far-field automatic speech recognition with a fixed bitrate budget

no code implementations • 15 Jun 2021 • Lukas Drude, Jahn Heymann, Andreas Schwarz, Jean-Marc Valin

Automatic speech recognition (ASR) in the cloud allows the use of larger models and more powerful multi-channel signal processing front-ends compared to on-device processing.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

no code implementations • 4 Jun 2020 • Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

End-to-end training of time domain audio separation and recognition

no code implementations • 18 Dec 2019 • Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

The rising interest in single-channel multi-speaker speech separation has sparked the development of End-to-End (E2E) approaches to multi-speaker speech recognition.

Speaker Recognition • speech-recognition +2

Demystifying TasNet: A Dissecting Approach

no code implementations • 20 Nov 2019 • Jens Heitkaemper, Darius Jakobeit, Christoph Boeddeker, Lukas Drude, Reinhold Haeb-Umbach

In recent years, time-domain speech separation has excelled over frequency-domain separation in single-channel scenarios and noise-free environments.

Speech Separation
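
As background for the dissection above, here is a compact sketch of the generic TasNet structure: a learned 1-D convolutional encoder replacing the STFT, a separator that predicts one mask per speaker, and a transposed-convolution decoder back to the waveform. The layer sizes and the trivial 1x1-conv separator are placeholders, not the configurations examined in the paper.

```python
import torch
import torch.nn as nn

class TinyTasNet(nn.Module):
    """Encoder -> per-speaker masks -> decoder (illustrative skeleton only)."""
    def __init__(self, n_src: int = 2, n_filters: int = 256, win: int = 16):
        super().__init__()
        self.n_src = n_src
        # Learned analysis filterbank in place of the STFT.
        self.encoder = nn.Conv1d(1, n_filters, kernel_size=win, stride=win // 2)
        # Placeholder separator; real TasNet variants use LSTM or TCN blocks.
        self.separator = nn.Sequential(
            nn.Conv1d(n_filters, n_filters, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(n_filters, n_src * n_filters, kernel_size=1),
        )
        # Learned synthesis filterbank.
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size=win, stride=win // 2)

    def forward(self, mix: torch.Tensor) -> torch.Tensor:
        # mix: (batch, samples) time-domain mixture
        feats = torch.relu(self.encoder(mix.unsqueeze(1)))   # (B, F, frames)
        masks = torch.sigmoid(self.separator(feats))         # (B, n_src*F, frames)
        masks = masks.view(mix.size(0), self.n_src, -1, feats.size(-1))
        est = [self.decoder(masks[:, s] * feats).squeeze(1) for s in range(self.n_src)]
        return torch.stack(est, dim=1)                       # (B, n_src, samples)

model = TinyTasNet()
print(model(torch.randn(2, 16000)).shape)                    # torch.Size([2, 2, 16000])
```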

SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition

3 code implementations • 30 Oct 2019 • Lukas Drude, Jens Heitkaemper, Christoph Boeddeker, Reinhold Haeb-Umbach

We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ -- Spatialized Multi-Speaker Wall Street Journal.

Position
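
To illustrate what "spatialized" multi-speaker data involves, below is a generic numpy sketch of simulating a multi-channel overlapped-speech mixture: each clean utterance is convolved with a per-microphone room impulse response and the reverberated images are summed with noise. This is not the SMS-WSJ recipe or its API; the shapes and noise level are placeholder assumptions.

```python
import numpy as np

def spatialize_mixture(utterances, rirs, noise_std=0.01):
    """Simulate a multi-channel overlapped-speech mixture (illustrative only).

    utterances: list of (samples,) clean single-channel signals
    rirs:       (n_speakers, n_mics, rir_len) room impulse responses
    returns:    (n_mics, samples + rir_len - 1) noisy reverberant mixture
    """
    n_speakers, n_mics, rir_len = rirs.shape
    n_out = max(len(u) for u in utterances) + rir_len - 1
    mix = np.zeros((n_mics, n_out))
    for spk, utt in enumerate(utterances):
        for mic in range(n_mics):
            img = np.convolve(utt, rirs[spk, mic])   # reverberated "image" signal
            mix[mic, :len(img)] += img
    return mix + noise_std * np.random.randn(*mix.shape)

# Two 1-second utterances at 8 kHz, 6 microphones, 0.5 s RIRs
utts = [np.random.randn(8000), np.random.randn(8000)]
rirs = np.random.randn(2, 6, 4000) * 0.01
print(spatialize_mixture(utts, rirs).shape)          # (6, 11999)
```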

Unsupervised training of a deep clustering model for multichannel blind source separation

no code implementations • 2 Apr 2019 • Lukas Drude, Daniel Hasenklever, Reinhold Haeb-Umbach

We propose a training scheme to train neural network-based source separation algorithms from scratch when parallel clean data is unavailable.

blind source separation • Clustering +2
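
For context, deep clustering models are conventionally trained with the affinity loss ||V V^T - Y Y^T||_F^2 between unit-norm time-frequency embeddings V and one-hot source labels Y; the paper above removes the need for such parallel clean supervision. The sketch below shows only the standard supervised loss, using the usual low-rank expansion, and is not the paper's unsupervised training scheme.

```python
import numpy as np

def deep_clustering_loss(embeddings, labels):
    """Standard deep clustering affinity loss ||V V^T - Y Y^T||_F^2.

    embeddings: (n_tf_bins, emb_dim) unit-norm embeddings V (one per T-F bin)
    labels:     (n_tf_bins, n_sources) one-hot source assignments Y
    """
    V, Y = embeddings, labels
    # Expanding the Frobenius norm avoids forming the n_tf x n_tf affinity matrices.
    return (np.linalg.norm(V.T @ V, "fro") ** 2
            - 2 * np.linalg.norm(V.T @ Y, "fro") ** 2
            + np.linalg.norm(Y.T @ Y, "fro") ** 2)

# Toy example: 1000 T-F bins, 20-dim embeddings, 2 sources
rng = np.random.default_rng(0)
V = rng.standard_normal((1000, 20))
V /= np.linalg.norm(V, axis=1, keepdims=True)
Y = np.eye(2)[rng.integers(0, 2, size=1000)]
print(deep_clustering_loss(V, Y))
```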

Unsupervised training of neural mask-based beamforming

no code implementations • 2 Apr 2019 • Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach

In contrast to previous work on unsupervised training of neural mask estimators, our approach avoids the need for a possibly pre-trained teacher model entirely.

speech-recognition • Speech Recognition
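
As background on mask-based beamforming, here is a minimal numpy sketch of the common pipeline: time-frequency masks weight the observations to form speech and noise spatial covariance matrices, from which an MVDR beamformer (in the Souden formulation) is computed per frequency bin. This is a generic illustration, not the paper's unsupervised training procedure; the fixed reference channel and diagonal loading are simplifications.

```python
import numpy as np

def mask_based_mvdr(stft, speech_mask, noise_mask, ref_mic=0, eps=1e-8):
    """Apply an MVDR beamformer driven by T-F masks (illustrative only).

    stft:        (n_mics, n_freq, n_frames) complex multi-channel STFT
    speech_mask: (n_freq, n_frames) values in [0, 1]
    noise_mask:  (n_freq, n_frames) values in [0, 1]
    returns:     (n_freq, n_frames) beamformed single-channel STFT
    """
    n_mics, n_freq, n_frames = stft.shape
    out = np.zeros((n_freq, n_frames), dtype=complex)
    for f in range(n_freq):
        X = stft[:, f, :]                                    # (n_mics, n_frames)
        # Mask-weighted spatial covariance matrices.
        phi_s = (speech_mask[f] * X) @ X.conj().T / (speech_mask[f].sum() + eps)
        phi_n = (noise_mask[f] * X) @ X.conj().T / (noise_mask[f].sum() + eps)
        phi_n += eps * np.eye(n_mics)                        # diagonal loading
        # MVDR: w = Phi_n^{-1} Phi_s e_ref / tr(Phi_n^{-1} Phi_s)
        numerator = np.linalg.solve(phi_n, phi_s)
        w = numerator[:, ref_mic] / (np.trace(numerator) + eps)
        out[f] = w.conj() @ X
    return out

# Toy usage with random data: 6 mics, 257 frequency bins, 100 frames
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 257, 100)) + 1j * rng.standard_normal((6, 257, 100))
m = rng.uniform(size=(257, 100))
print(mask_based_mvdr(X, m, 1 - m).shape)                    # (257, 100)
```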

Directional Statistics and Filtering Using libDirectional

no code implementations • 28 Dec 2017 • Gerhard Kurz, Igor Gilitschenski, Florian Pfaff, Lukas Drude, Uwe D. Hanebeck, Reinhold Haeb-Umbach, Roland Y. Siegwart

In this paper, we present libDirectional, a MATLAB library for directional statistics and directional estimation.
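
libDirectional itself is a MATLAB toolbox; the Python snippet below does not use or mirror its API. It only illustrates two basic quantities from directional statistics: the circular mean direction and the mean resultant length of a sample of angles.

```python
import numpy as np

def circular_mean_and_resultant(angles):
    """Circular mean direction and mean resultant length of angles (radians).

    Angles are mapped to unit vectors on the circle; the mean direction is the
    angle of their average, and the resultant length R in [0, 1] measures
    concentration (R close to 1 means tightly clustered directions).
    """
    z = np.mean(np.exp(1j * np.asarray(angles)))
    return np.angle(z), np.abs(z)

# Angles near +/- pi wrap around correctly, unlike an ordinary arithmetic mean.
samples = np.array([-3.1, 3.1, 3.0, -3.0])       # all close to pi
mean_dir, R = circular_mean_and_resultant(samples)
print(mean_dir, R)                               # mean direction ~ pi, R close to 1
```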
