Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems.
We also leverage both BLSTM and pretrained BERT-based models to encode contextual information and guide the network training.
In this paper, we present a novel speech recognition model, Multi-Channel Transformer Transducer (MCTT), which features end-to-end multi-channel training, low computation cost, and low latency so that it is suitable for streaming decoding in on-device speech recognition.
The candidate time windows are selected, via a preprocessing step, from a set of large time intervals that may include sample drops.
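As an illustration of this kind of preprocessing, the sketch below slides a fixed-length window over sample timestamps and flags windows whose internal gaps suggest a sample drop. The function name, parameters, and drop criterion (a gap exceeding the expected sampling period by a tolerance factor) are assumptions for illustration, not the paper's method.

```python
import numpy as np

def candidate_windows(timestamps, win_len, expected_dt, tol=0.5):
    """Select non-overlapping candidate windows over sample timestamps,
    flagging windows whose internal gaps suggest a sample drop.

    Illustrative sketch: a gap larger than (1 + tol) * expected_dt
    between consecutive samples is treated as a possible drop.
    """
    candidates = []
    n = len(timestamps)
    for start in range(0, n - win_len + 1, win_len):
        seg = timestamps[start:start + win_len]
        gaps = np.diff(seg)
        has_drop = bool(np.any(gaps > (1.0 + tol) * expected_dt))
        candidates.append((start, start + win_len, has_drop))
    return candidates
```

For example, with a nominal sampling period of 10 ms, a window containing a 30 ms gap would be flagged as possibly containing a drop, while uniformly spaced windows would not.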
We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment.
Distant speech recognition is being revolutionized by deep learning, which has enabled systems that significantly outperform previous HMM-GMM approaches.
A field that has directly benefited from the recent advances in deep learning is Automatic Speech Recognition (ASR).
The availability of realistic simulated corpora is of key importance for the future progress of distant speech recognition technology.
Despite the significant progress made in recent years, state-of-the-art speech recognition technologies provide satisfactory performance only in the close-talking condition.
This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently created under the EC DIRHA project.
First, we propose removing the reset gate from the GRU design, resulting in a more efficient single-gate architecture.
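A single step of such a reset-gate-free GRU variant can be sketched as follows. Keeping only the update gate and using a ReLU candidate state follows the Li-GRU idea; the exact function signature and parameterization here are illustrative assumptions (the published design also adds batch normalization, omitted for brevity).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def light_gru_step(x_t, h_prev, Wz, Uz, Wh, Uh):
    """One recurrence step of a single-gate GRU variant (illustrative).

    With the reset gate removed, only the update gate z remains;
    the candidate state uses a ReLU activation.
    """
    z = sigmoid(Wz @ x_t + Uz @ h_prev)                 # update gate
    h_tilde = np.maximum(0.0, Wh @ x_t + Uh @ h_prev)   # ReLU candidate
    return z * h_prev + (1.0 - z) * h_tilde             # interpolate states
```

Compared with a standard GRU, this removes one gate's worth of weight matrices and the associated matrix multiplications per step, which is the source of the efficiency gain.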
Improving distant speech recognition is a crucial step towards flexible human-machine interfaces.
Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially in adverse acoustic conditions characterized by non-stationary noise and reverberation.
This paper describes a multi-microphone multi-language acoustic corpus being developed under the EC project Distant-speech Interaction for Robust Home Applications (DIRHA).