no code implementations • 14 Jan 2024 • Sergio Duarte-Torres, Arunasish Sen, Aman Rana, Lukas Drude, Alejandro Gomez-Alanis, Andreas Schwarz, Leif Rädel, Volker Leutnant
Context cues carry information which can improve multi-turn interactions in automatic speech recognition (ASR) systems.
no code implementations • 12 Jun 2023 • Belen Alastruey, Lukas Drude, Jahn Heymann, Simon Wiesler
Convolutional frontends are a typical choice for Transformer-based automatic speech recognition: they preprocess the spectrogram, reduce its sequence length, and combine local information jointly in time and frequency.
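Such a frontend can be illustrated with a minimal sketch: a strided 2-D convolution applied to a log-mel spectrogram, where stacking two stride-2 layers reduces the sequence length by roughly a factor of four. All names and sizes below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def conv2d_strided(x, kernel, stride):
    """Valid 2-D convolution with the same stride along both axes."""
    kh, kw = kernel.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Illustrative log-mel spectrogram: 100 frames x 80 mel bins.
spec = np.random.randn(100, 80)
k = np.random.randn(3, 3)

# Two stride-2 convolutions give roughly a 4x sequence-length reduction.
h = conv2d_strided(spec, k, stride=2)  # (100, 80) -> (49, 39)
h = conv2d_strided(h, k, stride=2)     # (49, 39)  -> (24, 19)
```

Real frontends use multiple learned channels and a nonlinearity between layers; the stride is what shortens the sequence before the Transformer encoder.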
no code implementations • 27 Oct 2022 • Alejandro Gomez-Alanis, Lukas Drude, Andreas Schwarz, Rupak Vignesh Swaminathan, Simon Wiesler
We propose a dual-mode contextual-utterance training technique for streaming automatic speech recognition (ASR) systems.
no code implementations • 15 Jun 2021 • Lukas Drude, Jahn Heymann, Andreas Schwarz, Jean-Marc Valin
Automatic speech recognition (ASR) in the cloud allows the use of larger models and more powerful multi-channel signal processing front-ends compared to on-device processing.
no code implementations • 4 Jun 2020 • Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown.
no code implementations • 18 Dec 2019 • Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
The rising interest in single-channel multi-speaker speech separation has sparked the development of End-to-End (E2E) approaches to multi-speaker speech recognition.
no code implementations • 20 Nov 2019 • Jens Heitkaemper, Darius Jakobeit, Christoph Boeddeker, Lukas Drude, Reinhold Haeb-Umbach
In recent years, time-domain speech separation has excelled over frequency-domain separation in single-channel scenarios and noise-free environments.
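The frequency-domain approach being compared can be sketched briefly: the mixture is transformed with an STFT and a per-bin mask is applied to isolate a source, whereas time-domain methods operate on the raw waveform directly. The `stft` helper and the random placeholder mask below are illustrative assumptions, not any library's API.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Naive STFT with a Hann window (illustrative, not a library API)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * win for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)  # (frames, freq bins)

# Frequency-domain separation: estimate a mask, apply it per TF bin.
x = np.random.randn(4000)          # mixture waveform
X = stft(x)                        # (30, 129) complex spectrogram
mask = np.random.rand(*X.shape)    # placeholder for a learned mask
S = mask * X                       # separated-source spectrogram
```

Time-domain systems skip the fixed STFT analysis/synthesis and learn their own encoder/decoder on waveform chunks, which is one reason they can outperform masking in clean single-channel conditions.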
3 code implementations • 30 Oct 2019 • Lukas Drude, Jens Heitkaemper, Christoph Boeddeker, Reinhold Haeb-Umbach
We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ -- Spatialized Multi-Speaker Wall Street Journal.
no code implementations • 2 Apr 2019 • Lukas Drude, Daniel Hasenklever, Reinhold Haeb-Umbach
We propose a training scheme to train neural network-based source separation algorithms from scratch when parallel clean data is unavailable.
no code implementations • 2 Apr 2019 • Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach
In contrast to previous work on unsupervised training of neural mask estimators, our approach entirely avoids the need for a possibly pre-trained teacher model.
no code implementations • 28 Dec 2017 • Gerhard Kurz, Igor Gilitschenski, Florian Pfaff, Lukas Drude, Uwe D. Hanebeck, Reinhold Haeb-Umbach, Roland Y. Siegwart
In this paper, we present libDirectional, a MATLAB library for directional statistics and directional estimation.
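A core operation in directional statistics, of the kind such a library provides, is the circular mean: the arithmetic mean fails on the circle (the average of 359° and 1° is not 180°), so angular samples are averaged as unit vectors. The sketch below is a hedged Python illustration of the idea, not libDirectional's MATLAB API.

```python
import numpy as np

def circular_mean_and_concentration(angles):
    """Mean direction and mean resultant length of angular samples.

    Averaging the unit vectors exp(i*theta) handles the wrap-around
    at 0/2*pi that breaks the ordinary arithmetic mean.
    """
    z = np.exp(1j * np.asarray(angles))
    r = np.mean(z)
    return np.angle(r), np.abs(r)

angles = np.deg2rad([359.0, 1.0])
mu, R = circular_mean_and_concentration(angles)
# mu is ~0 rad (i.e. 0 degrees), not the naive arithmetic mean of 180.
```

The mean resultant length `R` doubles as a concentration measure: values near 1 indicate tightly clustered directions, values near 0 a nearly uniform spread.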
1 code implementation • ICASSP 2016 • Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach
The network training is independent of the number and the geometric configuration of the microphones.
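This geometry independence is possible because the network only outputs a time-frequency mask; the spatial information enters afterwards through mask-weighted covariance matrices, one per frequency bin, which work for any channel count. The sketch below illustrates that covariance step under assumed shapes and names; it is not the paper's code.

```python
import numpy as np

def masked_spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrices.

    stft: (F, T, D) complex STFT of D microphone channels.
    mask: (F, T) speech (or noise) presence mask in [0, 1].
    Returns one (D, D) Hermitian covariance per frequency bin: (F, D, D).
    Nothing here depends on the number or geometry of the microphones.
    """
    weighted = mask[..., None] * stft            # (F, T, D)
    cov = np.einsum('ftd,fte->fde', weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)   # avoid division by zero
    return cov / norm[:, None, None]

F, T, D = 257, 100, 4
stft = np.random.randn(F, T, D) + 1j * np.random.randn(F, T, D)
mask = np.random.rand(F, T)                      # stand-in for a network output
phi = masked_spatial_covariance(stft, mask)      # (F, D, D)
```

A beamformer (e.g. GEV or MVDR) is then computed from the speech and noise covariance estimates; only this last step touches the array, which is why the mask network transfers across microphone configurations.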