no code implementations • LREC 2014 • Luca Cristoforetti, Mirco Ravanelli, Maurizio Omologo, Alessandro Sosi, Alberto Abad, Martin Hagmueller, Petros Maragos
This paper describes a multi-microphone multi-language acoustic corpus being developed under the EC project Distant-speech Interaction for Robust Home Applications (DIRHA).
no code implementations • 23 Mar 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still lacks robustness, especially in adverse acoustic conditions characterized by non-stationary noise and reverberation.
no code implementations • 24 Mar 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
Improving distant speech recognition is a crucial step towards flexible human-machine interfaces.
1 code implementation • 29 Sep 2017 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
First, we propose removing the reset gate from the GRU design, which results in a more efficient single-gate architecture.
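A minimal pure-Python sketch of what "removing the reset gate" means, assuming the standard GRU equations with the reset gate simply deleted: only the update gate `z` survives, and the candidate state is computed directly from the previous hidden state. Dropping the reset gate removes one of the three input/recurrent weight pairs of a standard GRU. The paper may make further modifications to the cell that are not reproduced here, and the names `single_gate_gru_step`, `init_params`, and `run` are hypothetical, chosen for illustration only.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def single_gate_gru_step(x, h_prev, p):
    """One time step of a GRU cell with the reset gate removed (illustrative sketch).

    z  = sigmoid(Wz x + Uz h_prev + bz)   # update gate (the only gate left)
    h~ = tanh(Wh x + Uh h_prev + bh)      # candidate; no reset gate on h_prev
    h  = z * h_prev + (1 - z) * h~
    """
    wz_x, uz_h = matvec(p["Wz"], x), matvec(p["Uz"], h_prev)
    wh_x, uh_h = matvec(p["Wh"], x), matvec(p["Uh"], h_prev)
    h_new = []
    for i in range(len(h_prev)):
        z = sigmoid(wz_x[i] + uz_h[i] + p["bz"][i])
        cand = math.tanh(wh_x[i] + uh_h[i] + p["bh"][i])
        h_new.append(z * h_prev[i] + (1.0 - z) * cand)
    return h_new


def init_params(n_in, n_hid, seed=0):
    # Hypothetical small random initialization; note there is no Wr/Ur pair.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wz": mat(n_hid, n_in), "Uz": mat(n_hid, n_hid), "bz": [0.0] * n_hid,
        "Wh": mat(n_hid, n_in), "Uh": mat(n_hid, n_hid), "bh": [0.0] * n_hid,
    }


def run(sequence, params, n_hid):
    # Unroll the cell over a sequence of input frames, starting from h = 0.
    h = [0.0] * n_hid
    states = []
    for x in sequence:
        h = single_gate_gru_step(x, h, params)
        states.append(h)
    return states


if __name__ == "__main__":
    params = init_params(n_in=3, n_hid=4)
    states = run([[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]], params, n_hid=4)
    print(len(states), len(states[0]))  # 2 4
```

Because each new state is a convex combination of the previous state and a `tanh` candidate, the hidden values stay in (-1, 1), just as in a standard GRU.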
2 code implementations • 6 Oct 2017 • Mirco Ravanelli, Maurizio Omologo
This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project.
1 code implementation • 10 Oct 2017 • Mirco Ravanelli, Maurizio Omologo
Despite the significant progress made in recent years, state-of-the-art speech recognition technologies provide satisfactory performance only in the close-talking condition.
1 code implementation • 26 Nov 2017 • Mirco Ravanelli, Piergiorgio Svaizer, Maurizio Omologo
The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology.
1 code implementation • 26 Mar 2018 • Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio
A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR).
Ranked #6 on Speech Recognition on TIMIT
no code implementations • 26 May 2018 • Mirco Ravanelli, Maurizio Omologo
Distant speech recognition is being revolutionized by deep learning, which has made it possible to significantly outperform previous HMM-GMM systems.
no code implementations • 30 Sep 2019 • Maarten Van Segbroeck, Ahmed Zaid, Ksenia Kutsenko, Cirenia Huerta, Tinh Nguyen, Xuewen Luo, Björn Hoffmeister, Jan Trmal, Maurizio Omologo, Roland Maas
We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment.
1 code implementation • 15 Nov 2019 • Tina Raissi, Santiago Pascual, Maurizio Omologo
The candidate time windows are selected, through a preprocessing step, from a set of large time intervals that may contain a sample drop.
no code implementations • 30 Aug 2021 • Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo
In this paper, we present a novel speech recognition model, the Multi-Channel Transformer Transducer (MCTT), which features end-to-end multi-channel training, low computation cost, and low latency, making it suitable for streaming decoding in on-device speech recognition.
no code implementations • 5 Nov 2021 • Feng-Ju Chang, Jing Liu, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo, Ariya Rastrow, Siegfried Kunzmann
We also leverage both BLSTM-based and pretrained BERT-based models to encode contextual data and guide the network training.
no code implementations • 11 May 2022 • Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Muller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo
Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems.
no code implementations • 1 Mar 2023 • Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas
We add the MC fusion networks to a conformer transducer model and train it end-to-end.