no code implementations • 15 Dec 2021 • Christoph Minixhofer, Ondřej Klejch, Peter Bell
In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows.
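The core of such a strategy can be sketched in a few lines: run the punctuation model over overlapping windows and ensemble the per-word predictions wherever windows overlap. A minimal sketch; the averaging rule and all names below are illustrative, not the paper's exact combination strategy.

```python
import numpy as np

def merge_window_predictions(n_words, windows, num_classes=4):
    """Combine per-word punctuation predictions from overlapping windows.

    `windows` is a list of (start, probs) pairs, where `probs` has shape
    (window_len, num_classes) and holds the model's class probabilities
    for the words starting at index `start`. Each word's final
    distribution is the average over all windows covering it (an
    illustrative ensembling rule, not the paper's)."""
    totals = np.zeros((n_words, num_classes))
    counts = np.zeros((n_words, 1))
    for start, probs in windows:
        end = start + len(probs)
        totals[start:end] += probs
        counts[start:end] += 1
    return totals / np.maximum(counts, 1)   # per-word averaged distribution

# e.g. two overlapping 4-word windows over a 6-word utterance
rng = np.random.default_rng(0)
w1 = (0, rng.dirichlet(np.ones(4), size=4))   # covers words 0-3
w2 = (2, rng.dirichlet(np.ones(4), size=4))   # covers words 2-5
merged = merge_window_predictions(6, [w1, w2])
print(merged.argmax(axis=1))                  # predicted class per word
```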
no code implementations • 12 Nov 2021 • Ondrej Klejch, Electra Wallington, Peter Bell
We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question.
no code implementations • 29 Oct 2021 • Yuanchao Li, Peter Bell, Catherine Lai
However, due to the scarcity of emotion labelled data and the difficulty of recognizing emotional speech, it is hard to obtain reliable linguistic features and models in this research area.
no code implementations • 1 May 2021 • Sarenne Wallbridge, Peter Bell, Catherine Lai
People convey information extremely effectively through spoken interaction using multiple channels of information transmission: the lexical channel of what is said, and the non-lexical channel of how it is said.
no code implementations • EACL 2021 • David Wan, Chris Kedzie, Faisal Ladhak, Elsbeth Turcan, Petra Galuščáková, Elena Zotkina, Zhengping Jiang, Peter Bell, Kathleen McKeown
Typical ASR systems segment the input audio into utterances using purely acoustic information, which may not resemble the sentence-like units that are expected by conventional machine translation (MT) systems for Spoken Language Translation.
no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.
1 code implementation • 10 Dec 2020 • Prathmesh Madhu, Angel Villar-Corrales, Ronak Kosti, Torsten Bendschus, Corinna Reinhardt, Peter Bell, Andreas Maier, Vincent Christlein
(2) To improve the already strong results further, we created a small dataset (ClassArch) consisting of ancient Greek vase paintings from the 6th-5th century BCE with person and pose annotations.
no code implementations • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results.
1 code implementation • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
To the best of our knowledge, we have achieved state-of-the-art end-to-end Transformer based model performance on Switchboard and AMI.
1 code implementation • 27 Oct 2020 • Chau Luu, Peter Bell, Steve Renals
On a test set of US Supreme Court recordings, we show that leveraging two additional forms of speaker attribute information, derived respectively from the matched training data and from the VoxCeleb corpus, improves the performance of our deep speaker embeddings on both verification and diarization tasks, achieving relative improvements of 26.2% in DER and 6.7% in EER over baselines using speaker labels only.
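The general recipe behind this kind of attribute supervision is multi-task training: a shared embedding extractor feeds a speaker-ID head alongside one or more attribute heads. A minimal sketch; dimensions, the attribute inventory, and the task weight are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared extractor feeding a speaker-ID head plus an attribute head.
extractor = nn.Sequential(nn.Linear(40, 128), nn.ReLU())
spk_head = nn.Linear(128, 500)    # one class per training speaker
attr_head = nn.Linear(128, 20)    # hypothetical attribute inventory

x = torch.randn(32, 40)
spk_y = torch.randint(0, 500, (32,))
attr_y = torch.randint(0, 20, (32,))

z = extractor(x)
loss = (F.cross_entropy(spk_head(z), spk_y)
        + 0.3 * F.cross_entropy(attr_head(z), attr_y))  # weight is a placeholder
```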
no code implementations • 19 Oct 2020 • David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell, Kathleen McKeown
In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation.
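As a toy illustration of re-segmentation, suppose a token-level model (assumed here, any boundary tagger would do) emits a boundary probability after each ASR token; segments are then cut greedily, with a length cap as a fallback. This is a sketch of the general idea, not the paper's trained segmenter.

```python
def resegment(tokens, boundary_probs, threshold=0.5, max_len=40):
    """Split a stream of ASR tokens into sentence-like units for MT.

    `boundary_probs[i]` is a model's probability that a segment boundary
    follows tokens[i]. A boundary is placed when the probability crosses
    `threshold`; `max_len` stops any segment growing unboundedly."""
    segments, current = [], []
    for tok, p in zip(tokens, boundary_probs):
        current.append(tok)
        if p >= threshold or len(current) >= max_len:
            segments.append(" ".join(current))
            current = []
    if current:
        segments.append(" ".join(current))
    return segments

print(resegment(["hello", "world", "how", "are", "you"],
                [0.1, 0.9, 0.2, 0.1, 0.8]))
# ['hello world', 'how are you']
```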
1 code implementation • 8 Sep 2020 • Prathmesh Madhu, Tilman Marquart, Ronak Kosti, Peter Bell, Andreas Maier, Vincent Christlein
These compositions are useful in analyzing the interactions in an image to study artists and their artworks.
1 code implementation • 14 Aug 2020 • Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski
We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation.
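One representative method from the speaker-adaptation family covered by such overviews is Learned Hidden Unit Contributions (LHUC), which learns a per-speaker rescaling of hidden activations while the base network stays frozen. A minimal sketch with illustrative dimensions, not a reference implementation.

```python
import torch
import torch.nn as nn

class LHUCLayer(nn.Module):
    # Per-speaker rescaling of hidden activations; only `a` is updated
    # at adaptation time, while the base network stays frozen.
    def __init__(self, hidden_dim):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, h):
        return 2.0 * torch.sigmoid(self.a) * h   # per-unit amplitude in (0, 2)

base = nn.Sequential(nn.Linear(40, 256), nn.ReLU())     # frozen at test time
lhuc = LHUCLayer(256)                                   # one per test speaker
adapted = lhuc(base(torch.randn(8, 40)))
optimiser = torch.optim.SGD(lhuc.parameters(), lr=0.1)  # adapt only `a`
```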
no code implementations • 28 May 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition.
no code implementations • LREC 2020 • David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell, Kathy McKeown
In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation.
1 code implementation • 31 Mar 2020 • Prathmesh Madhu, Ronak Kosti, Lara Mührenberg, Peter Bell, Andreas Maier, Vincent Christlein
We present experiments and analysis on three different models and show that the model trained on domain-related data gives the best performance for recognizing characters.
1 code implementation • 2 Feb 2020 • Chau Luu, Peter Bell, Steve Renals
The first proposed method, DropClass, works via periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks.
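A rough sketch of one such training step follows, assuming a simple linear classification head; the masking and remapping details are illustrative rather than taken from the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dropclass_step(embed_net, weight, feats, labels, n_drop):
    # Sample a random subset of classes to drop for this period, discard
    # their examples from the batch, and mask the matching rows of the
    # output layer, so the extractor repeatedly sees a different task.
    n_classes = weight.shape[0]
    dropped = set(torch.randperm(n_classes)[:n_drop].tolist())
    kept = torch.tensor([c for c in range(n_classes) if c not in dropped])
    mask = torch.isin(labels, kept)
    feats, labels = feats[mask], labels[mask]
    remap = {c.item(): i for i, c in enumerate(kept)}   # relabel to reduced head
    labels = torch.tensor([remap[l.item()] for l in labels])
    logits = embed_net(feats) @ weight[kept].t()
    return F.cross_entropy(logits, labels)              # assumes a non-empty batch

embed_net = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 64))
weight = nn.Parameter(torch.randn(500, 64))             # one row per training class
loss = dropclass_step(embed_net, weight, torch.randn(32, 40),
                      torch.randint(0, 500, (32,)), n_drop=100)
```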
no code implementations • 31 Oct 2019 • Joanna Rownicka, Peter Bell, Steve Renals
We propose a multi-scale octave convolution layer to learn robust speech representations efficiently.
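For intuition, here is a minimal two-scale octave convolution, the building block that the multi-scale layer generalises to several octaves; layer sizes and the alpha split are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv2d(nn.Module):
    """Two-scale octave convolution: channels split into a full-resolution
    (high-frequency) group and a half-resolution (low-frequency) group,
    with convolutions exchanging information between the two."""
    def __init__(self, in_ch, out_ch, kernel_size=3, alpha=0.5):
        super().__init__()
        self.in_lo, self.out_lo = int(alpha * in_ch), int(alpha * out_ch)
        self.in_hi, self.out_hi = in_ch - self.in_lo, out_ch - self.out_lo
        pad = kernel_size // 2
        conv = lambda i, o: nn.Conv2d(i, o, kernel_size, padding=pad)
        self.hh, self.hl = conv(self.in_hi, self.out_hi), conv(self.in_hi, self.out_lo)
        self.lh, self.ll = conv(self.in_lo, self.out_hi), conv(self.in_lo, self.out_lo)

    def forward(self, x_hi, x_lo):
        # high->high and low->low stay at their own resolution;
        # high->low is pooled down, low->high is upsampled.
        y_hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo), scale_factor=2)
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo

layer = OctaveConv2d(16, 32)
x_hi, x_lo = torch.randn(1, 8, 64, 64), torch.randn(1, 8, 32, 32)
y_hi, y_lo = layer(x_hi, x_lo)   # shapes: (1, 16, 64, 64), (1, 16, 32, 32)
```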
no code implementations • 25 Oct 2019 • Chau Luu, Peter Bell, Steve Renals
Previous work has encouraged domain-invariance in deep speaker embedding by adversarially classifying the dataset or labelled environment to which the generated features belong.
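This style of adversarial objective is commonly realised with a gradient-reversal layer (Ganin and Lempitsky, 2015): the domain classifier is trained normally, while reversed gradients push the embedding network to fool it. A minimal sketch, which may differ from the exact setup in the paper.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; negates (and scales) gradients on the
    # backward pass, so the embedding network learns to fool the domain
    # classifier attached after it.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

embed = nn.Sequential(nn.Linear(40, 64), nn.ReLU())
domain_clf = nn.Linear(64, 3)            # e.g. 3 recording environments
z = embed(torch.randn(16, 40))
domain_logits = domain_clf(GradReverse.apply(z, 1.0))
# minimising the domain loss now pushes `embed` toward domain-invariant z
```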
1 code implementation • 23 Oct 2019 • Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals
Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions.
no code implementations • 30 Sep 2019 • Joanna Rownicka, Peter Bell, Steve Renals
In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT) focusing on a small amount of adaptation data per speaker.
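The common embedding-conditioning recipe behind DNN-SAT is easy to sketch: tile an utterance-level speaker embedding (an i-vector or neural embedding) across frames and concatenate it with the acoustic features. Dimensions below are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SATAcousticModel(nn.Module):
    # Each frame's features are concatenated with an utterance-level
    # speaker embedding before the first layer.
    def __init__(self, feat_dim=40, emb_dim=100, n_states=2000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + emb_dim, 512), nn.ReLU(),
            nn.Linear(512, n_states))

    def forward(self, frames, spk_emb):
        # frames: (T, feat_dim); spk_emb: (emb_dim,), tiled over time
        emb = spk_emb.unsqueeze(0).expand(frames.size(0), -1)
        return self.net(torch.cat([frames, emb], dim=-1))

model = SATAcousticModel()
logits = model(torch.randn(300, 40), torch.randn(100))
```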
1 code implementation • 30 Sep 2019 • Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals
Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features.
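As a sketch of what learned feature extraction means in practice, a strided 1-D convolution over raw samples can stand in for a hand-crafted filterbank; the window and hop sizes below mirror common 25 ms / 10 ms framing at 16 kHz but are otherwise illustrative, not the paper's front-end.

```python
import torch
import torch.nn as nn

class WaveformFrontEnd(nn.Module):
    # A strided 1-D convolution acts as a learnable filterbank, followed
    # by rectification and log compression in place of MFCC/FBANK pipelines.
    def __init__(self, n_filters=40, win=400, hop=160):  # 25 ms / 10 ms at 16 kHz
        super().__init__()
        self.conv = nn.Conv1d(1, n_filters, kernel_size=win, stride=hop)

    def forward(self, wav):                 # wav: (batch, samples)
        x = self.conv(wav.unsqueeze(1))     # (batch, n_filters, frames)
        return torch.log1p(x.abs())         # compress dynamic range

feats = WaveformFrontEnd()(torch.randn(2, 16000))   # 1 s of 16 kHz audio
```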
no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals
Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.
no code implementations • 27 Jun 2019 • Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals
Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions.
no code implementations • 30 May 2019 • Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell
This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model.
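A minimal sketch of the selection step: decode the audio with an existing model and keep only candidate text whose word error rate against the hypothesis falls below a threshold. The 20% threshold is illustrative, not the paper's operating point.

```python
def wer(ref, hyp):
    """Word error rate via edit distance (substitutions, insertions,
    deletions, normalised by reference length)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1] / max(len(ref), 1)

def select_lightly_supervised(pairs, max_wer=0.2):
    """Keep (candidate_text, asr_hypothesis) pairs whose hypothesis is close
    enough to the candidate transcript; the surviving text is then trusted
    as the label for the corresponding audio."""
    return [text for text, hyp in pairs
            if wer(text.split(), hyp.split()) <= max_wer]

pairs = [("the cat sat on the mat", "the cat sat on the mat"),
         ("completely different words here", "the weather is nice")]
print(select_lightly_supervised(pairs))   # keeps only the first
```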
no code implementations • 12 Nov 2018 • Joanna Rownicka, Peter Bell, Steve Renals
We analyze the representations learned by deep CNNs and compare them with deep neural network (DNN) representations and i-vectors, in the context of acoustic model adaptation.
no code implementations • 8 Nov 2018 • Bertrand Higy, Peter Bell
End-to-end approaches have recently become popular as a means of simplifying the training and deployment of speech recognition systems.
1 code implementation • 30 Aug 2018 • Ondřej Klejch, Joachim Fainberg, Peter Bell
The performance of automatic speech recognition systems can be improved by adapting an acoustic model to compensate for the mismatch between training and testing conditions, for example by adapting to unseen speakers.
no code implementations • 21 Sep 2017 • Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals
We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input.
no code implementations • EACL 2017 • Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring.
no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang
For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website, Aljazeera.net, covering the 10-year period 2000-2011.
1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
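For illustration, such a binary classifier over precomputed utterance-level features can be a simple linear SVM; the synthetic features below are a stand-in, since the paper's features come from a speech pipeline assumed here.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for utterance-level features (e.g. i-vector-like vectors).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 20)), rng.normal(2, 1, (50, 20))])
y = np.array([0] * 50 + [1] * 50)          # 0 = MSA, 1 = Dialectal Arabic

clf = SVC(kernel="linear").fit(X[::2], y[::2])     # train on half
print("accuracy:", clf.score(X[1::2], y[1::2]))    # evaluate on the rest
```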