no code implementations • 22 Jun 2022 • Felix Weninger, Marco Gaudesi, Md Akmal Haidar, Nicola Ferri, Jesús Andrés-Ferrer, Puming Zhan
In the dual-mode Conformer Transducer model, layers can function in online or offline mode while sharing parameters, and in-place knowledge distillation from offline to online mode is applied in training to improve online accuracy.
no code implementations • 23 Sep 2021 • Marco Gaudesi, Felix Weninger, Dushyant Sharma, Puming Zhan
End-to-end (E2E) multi-channel ASR systems show state-of-the-art performance in far-field ASR tasks by joint training of a multi-channel front-end along with the ASR model.
no code implementations • 17 Sep 2021 • Felix Weninger, Marco Gaudesi, Ralf Leibold, Roberto Gemello, Puming Zhan
We use a single-channel encoder for close-talk (CT) speech and a multi-channel encoder with Spatial Filtering neural beamforming for far-talk (FT) speech, which are jointly trained with the encoder selection.
no code implementations • 20 Aug 2020 • Huili Chen, Yue Zhang, Felix Weninger, Rosalind Picard, Cynthia Breazeal, Hae Won Park
Automatic speech-based affect recognition of individuals in dyadic conversation is a challenging task, in part because of its heavy reliance on manual pre-processing.
no code implementations • 27 Jul 2020 • Felix Weninger, Franco Mana, Roberto Gemello, Jesús Andrés-Ferrer, Puming Zhan
In the result, the Noisy Student algorithm with soft labels and consistency regularization achieves 10.4% word error rate (WER) reduction when adding 475h of unlabeled data, corresponding to a recovery rate of 92%.
1 code implementation • 27 Jul 2020 • Felix Weninger, Yue Zhang, Rosalind W. Picard
A common problem in machine learning is to deal with datasets with disjoint label spaces and missing labels.
no code implementations • 8 Jul 2019 • Felix Weninger, Jesús Andrés-Ferrer, Xinwei Li, Puming Zhan
Sequence-to-sequence (seq2seq) based ASR systems have shown state-of-the-art performances while having clear advantages in terms of simplicity.
no code implementations • 15 Dec 2014 • Felix Weninger, Björn Schuller, Florian Eyben, Martin Wöllmer, Gerhard Rigoll
Transcription of broadcast news is an interesting and challenging application for large-vocabulary continuous speech recognition (LVCSR).
no code implementations • 9 Sep 2014 • John R. Hershey, Jonathan Le Roux, Felix Weninger
Deep unfolding of this model yields a new kind of non-negative deep neural network that can be trained using a multiplicative backpropagation-style update algorithm.