no code implementations • 1 Nov 2024 • Nikolaos Flemotomos, Roger Hsiao, Pawel Swietojanski, Takaaki Hori, Dogan Can, Xiaodan Zhuang
However, the biasing mechanism is typically based on a cross-attention module between the audio and a catalogue of biasing entries, which means computational complexity can pose severe practical limitations on the size of the biasing catalogue and consequently on accuracy improvements.
no code implementations • 18 Apr 2023 • Maurits Bleeker, Pawel Swietojanski, Stefan Braun, Xiaodan Zhuang
By including approximate nearest neighbour phrases (ANN-P) in the context list, we encourage the learned representation to disambiguate between similar, but not identical, biasing phrases.
no code implementations • 2 Nov 2022 • Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang
This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios.
no code implementations • 21 Oct 2022 • Thien Nguyen, Nathalie Tran, Liuhui Deng, Thiago Fraga da Silva, Matthew Radzihovsky, Roger Hsiao, Henry Mason, Stefan Braun, Erik McDermott, Dogan Can, Pawel Swietojanski, Lyan Verwimp, Sibel Oyman, Tresi Arvizo, Honza Silovsky, Arnab Ghoshal, Mathieu Martel, Bharat Ram Ambati, Mohamed Ali
Code-switching describes the practice of using more than one language in the same sentence.
Automatic Speech Recognition (ASR)
1 code implementation • EMNLP 2020 • Emanuele Bastianelli, Andrea Vanzo, Pawel Swietojanski, Verena Rieser
Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications.
Ranked #3 on Slot Filling on SLURP (using extra training data)
1 code implementation • 14 Aug 2020 • Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski
We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation.
1 code implementation • 25 Jan 2020 • Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, Pawel Swietojanski, Joao Monteiro, Jan Trmal, Yoshua Bengio
We then propose a revised encoder that better learns short- and long-term speech dynamics with an efficient combination of recurrent and convolutional networks.
no code implementations • 25 Sep 2019 • Ying Da Wang, Pawel Swietojanski, Ryan T Armstrong, Peyman Mostaghimi
We find that the GLCM-based loss yields images with higher pixelwise accuracy and better perceptual scores.
9 code implementations • 13 Mar 2019 • Xingkun Liu, Arash Eshghi, Pawel Swietojanski, Verena Rieser
We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer.
no code implementations • CONLL 2018 • Rory Beard, Ritwik Das, Raymond W. M. Ng, P. G. Keerthana Gopalakrishnan, Luka Eerens, Pawel Swietojanski, Ondrej Miksik
Natural human communication is nuanced and inherently multi-modal.
no code implementations • 31 Mar 2016 • Pawel Swietojanski, Steve Renals
We present a deep neural network (DNN) acoustic model that includes parametrised and differentiable pooling operators.
no code implementations • 12 Jan 2016 • Pawel Swietojanski, Jinyu Li, Steve Renals
This work presents a broad study on the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC) -- a method that linearly re-combines hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data.
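The re-combination of hidden units described above can be illustrated with a minimal sketch. It assumes that each hidden unit's output is scaled by a per-speaker amplitude, and that the amplitude is obtained as 2·sigmoid(r) so that it stays in (0, 2); this parametrisation, the sigmoid hidden activation, and all names below (`hidden_layer`, `lhuc_layer`, `r_speaker`) are illustrative assumptions, not taken from the listing.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hidden_layer(x, W, b):
    """Speaker-independent hidden layer (sigmoid activation is an assumption)."""
    return sigmoid(x @ W + b)


def lhuc_amplitudes(r):
    """Map unconstrained per-unit parameters r to amplitudes in (0, 2).

    Assumed parametrisation: a = 2 * sigmoid(r), so r = 0 gives a = 1.
    """
    return 2.0 * sigmoid(r)


def lhuc_layer(x, W, b, r):
    """Hidden layer whose unit outputs are re-scaled per speaker.

    W and b stay fixed during adaptation; only r (one value per hidden
    unit) would be learned from the adaptation data.
    """
    return lhuc_amplitudes(r) * hidden_layer(x, W, b)


# Toy data: 3 frames, 5 input features, 4 hidden units (all hypothetical).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))
W = rng.normal(size=(5, 4))
b = np.zeros(4)

# r = 0 leaves the speaker-independent model unchanged.
adapted_neutral = lhuc_layer(x, W, b, np.zeros(4))

# A non-zero r amplifies or attenuates individual hidden units.
r_speaker = rng.normal(size=4)
adapted = lhuc_layer(x, W, b, r_speaker)
```

In this sketch only `r_speaker` is speaker-dependent, so the number of adapted parameters equals the number of hidden units, which is what makes adaptation from small amounts of data plausible.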