no code implementations • JEP-TALN-RECITAL 2012 • Gwénolé Lecorvé, John Dines, Thomas Hain, Petr Motlicek
no code implementations • 8 Sep 2015 • Mortaza Doulaty, Oscar Saz, Thomas Hain
Hence it is often not evident whether data should be considered out-of-domain.
no code implementations • 8 Sep 2015 • Mortaza Doulaty, Oscar Saz, Thomas Hain
Negative transfer in training of acoustic models for automatic speech recognition has been reported in several contexts such as domain change or speaker characteristics.
Automatic Speech Recognition (ASR) +2
no code implementations • 13 Sep 2015 • Raymond W. M. Ng, Mortaza Doulaty, Rama Doddipatla, Wilker Aziz, Kashif Shah, Oscar Saz, Madina Hasan, Ghada Alharbi, Lucia Specia, Thomas Hain
The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives BLEU scores of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation tasks with the IWSLT 2014 data.
Automatic Speech Recognition (ASR) +4
no code implementations • 16 Nov 2015 • Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain
This paper presents a new method for the discovery of latent domains in diverse speech data, for use in the adaptation of Deep Neural Networks (DNNs) for Automatic Speech Recognition.
no code implementations • 21 Dec 2015 • Oscar Saz, Mortaza Doulaty, Salil Deena, Rosanna Milner, Raymond W. M. Ng, Madina Hasan, Yu-Lan Liu, Thomas Hain
We describe the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge task of transcribing multi-genre broadcast shows.
no code implementations • LREC 2016 • Ghada Alharbi, Thomas Hain
This study describes a new corpus of over 60,000 hand-annotated metadiscourse acts from 106 OpenCourseWare lectures, from two different disciplines: Physics and Economics.
no code implementations • LREC 2016 • Mauro Nicolao, Heidi Christensen, Stuart Cunningham, Phil Green, Thomas Hain
This paper introduces a new British English speech database, named the homeService corpus, which has been gathered as part of the homeService project.
no code implementations • 10 Jun 2016 • Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain
Huge amounts of digital videos are being produced and broadcast every day, leading to giant media archives.
no code implementations • 2 Jul 2019 • Mortaza Doulaty, Thomas Hain
The proposed technique for training data selection significantly outperforms random selection, posterior-based selection, as well as using all of the available data.
Automatic Speech Recognition (ASR) +4
no code implementations • 24 Sep 2019 • Yanpei Shi, Qiang Huang, Thomas Hain
While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments.
no code implementations • 16 Oct 2019 • Yanpei Shi, Thomas Hain
To evaluate the effectiveness of our approaches compared to prior work, two tasks are conducted, phone classification and speaker recognition, and tested on different TIMIT data sets.
no code implementations • 17 Oct 2019 • Yanpei Shi, Qiang Huang, Thomas Hain
In the proposed approach, frame-level encoder and attention are applied on segments of an input utterance and generate individual segment vectors.
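The segment-vector construction described above can be sketched as attentive pooling over frame-level vectors. The example below is a minimal illustration only, using fixed, hypothetical attention scores in place of a learned scorer:

```python
import math

# Illustrative sketch of attentive pooling: collapsing frame-level vectors
# into a single segment vector. The scores here are fixed placeholders;
# in practice they would come from a learned attention module.

def attention_pool(frames, scores):
    """frames: list of equal-length vectors; scores: one scalar per frame.
    Returns the softmax-weighted average of the frame vectors."""
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]   # numerically stable softmax
    z = sum(exp)
    weights = [e / z for e in exp]
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

frames = [[1.0, 0.0], [0.0, 1.0]]
print(attention_pool(frames, [0.0, 0.0]))  # equal weights -> [0.5, 0.5]
```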
no code implementations • 14 Jan 2020 • Yanpei Shi, Qiang Huang, Thomas Hain
Instead of individually processing speech enhancement and speaker recognition, the two modules are integrated into one framework by a joint optimisation using deep neural networks.
no code implementations • 14 Jan 2020 • Yanpei Shi, Thomas Hain
The proposed approach separates different speaker properties from a two-speaker signal in embedding space.
no code implementations • 15 May 2020 • Yanpei Shi, Qiang Huang, Thomas Hain
The obtained results show that the proposed approach using speaker-dependent speech enhancement can yield better speaker recognition and speech enhancement performance than two baselines in various noise conditions.
no code implementations • 15 May 2020 • Yanpei Shi, Qiang Huang, Thomas Hain
To evaluate the effectiveness of the proposed approach, artificial datasets based on Switchboard Cellular Part 1 (SWBC) and Voxceleb1 are constructed in two conditions, where speakers' voices are overlapped and not overlapped.
no code implementations • 16 May 2020 • Qiang Huang, Thomas Hain
The first task is to predict an utterance quality score, and the second is to identify where an anomalous distortion takes place in a recording.
1 code implementation • 22 Oct 2020 • Mingjie Chen, Yanpei Shi, Thomas Hain
In this work, we aim at improving the data efficiency of the model and achieving a many-to-many non-parallel StarGAN-based voice conversion for a relatively large number of speakers with limited training samples.
Sound • Audio and Speech Processing
no code implementations • 29 Oct 2020 • Yanpei Shi, Mingjie Chen, Qiang Huang, Thomas Hain
Using the memory mechanism achieves 10.6% and 7.7% relative improvements compared with not using a memory mechanism.
no code implementations • 29 Mar 2021 • Cong-Thanh Do, Rama Doddipatla, Thomas Hain
In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated in the computation of the connectionist temporal classification (CTC) loss function.
Automatic Speech Recognition (ASR) +1
no code implementations • 23 Mar 2022 • George Close, Thomas Hain, Stefan Goetze
Training of speech enhancement systems often does not incorporate knowledge of human perception and thus can lead to unnatural sounding results.
1 code implementation • 31 Mar 2022 • Mingjie Chen, Yanghao Zhou, Heyan Huang, Thomas Hain
It was shown recently that a combination of ASR and TTS models yields highly competitive performance on standard voice conversion tasks such as the Voice Conversion Challenge 2020 (VCC2020).
1 code implementation • 13 Apr 2022 • William Ravenscroft, Stefan Goetze, Thomas Hain
A feature of TCNs is that they have a receptive field (RF) dependent on the specific model configuration, which determines the number of input frames that can be observed to produce an individual output frame.
Ranked #1 on Speech Dereverberation on WHAMR_ext
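How the RF follows from the model configuration can be illustrated with a small sketch. This is not the paper's code; the configuration names and the Conv-TasNet-style setup (blocks of 1-D convolutions with exponentially increasing dilations) are assumptions for illustration:

```python
# Sketch: receptive field of a stacked dilated 1-D convolutional network,
# assuming Conv-TasNet-style blocks where dilations double within a block.
# Parameter names (kernel_size, n_blocks, n_layers) are illustrative.

def receptive_field(kernel_size: int, n_blocks: int, n_layers: int) -> int:
    """Number of input frames seen by one output frame: each conv layer
    with dilation d adds (kernel_size - 1) * d frames to the RF."""
    rf = 1
    for _ in range(n_blocks):
        for layer in range(n_layers):
            rf += (kernel_size - 1) * (2 ** layer)
    return rf

# A Conv-TasNet-like setting: kernel 3, 3 repeats of 8 dilated layers
print(receptive_field(3, 3, 8))  # 1 + 3 * 2 * 255 = 1531 frames
```

Larger kernels, more blocks, or more layers per block all widen the RF, which is why the RF is tied to the specific model configuration.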
1 code implementation • 17 May 2022 • William Ravenscroft, Stefan Goetze, Thomas Hain
It is shown that this weighted multi-dilation temporal convolutional network (WD-TCN) consistently outperforms the TCN across various model configurations, and that using the WD-TCN model is a more parameter-efficient way to improve performance than increasing the number of convolutional blocks.
Ranked #1 on Speech Dereverberation on WHAMR!
no code implementations • 19 May 2022 • Anna Ollerenshaw, Md Asif Jalal, Thomas Hain
This paper analyses and explores the internal dynamics between layers during training for CNN-, LSTM- and Transformer-based approaches, using canonical correlation analysis (CCA) and centered kernel alignment (CKA) for the experiments.
Automatic Speech Recognition (ASR) +1
no code implementations • 5 Jul 2022 • Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain
This shows positive information transfer from acted datasets to those with more natural emotions and the benefits from training on different corpora.
no code implementations • 7 Jul 2022 • Muhammad Umar Farooq, Darshan Adiga Haniya Narayana, Thomas Hain
A separate regression neural network is trained for each source-target language pair to transform posteriors from the source acoustic model to the target language.
no code implementations • 7 Jul 2022 • Muhammad Umar Farooq, Thomas Hain
This technique measures the similarities between posterior distributions from various monolingual acoustic models against a target speech signal.
Automatic Speech Recognition (ASR) +2
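As an illustration of comparing posterior distributions, one plausible similarity measure is a symmetric KL divergence between per-frame posteriors. The sketch below is an assumption for illustration, not the paper's exact measure:

```python
import math

# Hedged sketch: scoring how close one model's frame posteriors are to
# another's via symmetric KL divergence. Names here are illustrative.

def kl(p, q, eps=1e-12):
    """KL(p || q) over one discrete posterior; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def symmetric_kl(p, q):
    """Symmetrised divergence: 0 iff the posteriors are identical."""
    return 0.5 * (kl(p, q) + kl(q, p))

uniform = [0.25] * 4
peaked = [0.7, 0.1, 0.1, 0.1]
print(symmetric_kl(uniform, uniform))      # identical posteriors -> 0.0
print(symmetric_kl(uniform, peaked) > 0)   # dissimilar posteriors -> True
```

A lower aggregate divergence against a target signal's posteriors would indicate a more similar monolingual model.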
no code implementations • 25 Jul 2022 • Chanho Park, Rehan Ahmad, Thomas Hain
Using a submodular function, a training set for automatic speech recognition that matches the target data set is selected.
Automatic Speech Recognition (ASR) +1
2 code implementations • 27 Oct 2022 • William Ravenscroft, Stefan Goetze, Thomas Hain
In this work deformable convolution is proposed as a solution to allow TCN models to have dynamic RFs that can adapt to various reverberation times for reverberant speech separation.
Ranked #12 on Speech Separation on WHAMR!
no code implementations • 3 Nov 2022 • Anna Ollerenshaw, Md Asif Jalal, Thomas Hain
Instead, this paper proposes an approach to increase the model resolution capability using attention-based dynamic kernels in a convolutional neural network to adapt the model parameters to be feature-conditioned.
no code implementations • 3 Nov 2022 • Anna Ollerenshaw, Md Asif Jalal, Thomas Hain
End-to-End automatic speech recognition (ASR) models aim to learn a generalised speech representation to perform recognition.
Automatic Speech Recognition (ASR) +2
no code implementations • 11 Jan 2023 • George Close, William Ravenscroft, Thomas Hain, Stefan Goetze
Recent work in the domain of speech enhancement has explored the use of self-supervised speech representations to aid in the training of neural speech enhancement models.
no code implementations • 1 Mar 2023 • Rehan Ahmad, Md Asif Jalal, Muhammad Umar Farooq, Anna Ollerenshaw, Thomas Hain
Knowledge distillation has widely been used for model compression and domain adaptation for speech applications.
no code implementations • 14 Apr 2023 • William Ravenscroft, Stefan Goetze, Thomas Hain
In this work, the impact of applying these training signal length (TSL) limits is analysed for two speech separation models: SepFormer, a transformer model, and Conv-TasNet, a convolutional model.
no code implementations • 14 Jun 2023 • Muhammad Umar Farooq, Thomas Hain
The results show that any source language ASR model can be used for low-resource target language recognition, followed by the proposed mapping model.
no code implementations • 30 Jun 2023 • Anna Ollerenshaw, Md Asif Jalal, Rosanna Milner, Thomas Hain
The benefits of using a distributed approach to speech emotion understanding are supported by the results of cross-corpora analysis experiments.
no code implementations • 25 Jul 2023 • George Close, Thomas Hain, Stefan Goetze
Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as a feature extractor for speech quality (SQ) prediction, which is, in turn, relevant for assessing and training speech enhancement systems for users with normal or impaired hearing.
1 code implementation • 27 Jul 2023 • George Close, Thomas Hain, Stefan Goetze
In this work, SE models are trained and tested on a number of different languages, with self-supervised representations which themselves are trained using different language combinations and with differing network structures as loss function representations.
1 code implementation • 9 Oct 2023 • William Ravenscroft, Stefan Goetze, Thomas Hain
Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation.
Ranked #3 on Speech Separation on WHAMR!
no code implementations • 12 Oct 2023 • Chanho Park, Chengsong Lu, Mingjie Chen, Thomas Hain
WER estimation is a task aiming to predict the word error rate (WER) of an ASR system, given a speech utterance and a transcription.
Automatic Speech Recognition (ASR) +2
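For reference, the target quantity such an estimator predicts, the WER itself, can be computed from a reference/hypothesis pair via word-level edit distance:

```python
# Standard WER computation (illustrative, not the paper's estimator):
# (substitutions + deletions + insertions) / reference length,
# found via Levenshtein distance over words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat on"))  # 1 insertion / 3 words ~ 0.333
```

A WER estimator learns to predict this value without access to the reference transcription.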
no code implementations • 29 Oct 2023 • Muhammad Umar Farooq, Rehan Ahmad, Thomas Hain
However, a limitation of KD training is that the student model classes must be a proper or improper subset of the teacher model classes.
no code implementations • 24 Jan 2024 • Rhiannon Mogridge, George Close, Robert Sutherland, Thomas Hain, Jon Barker, Stefan Goetze, Anton Ragni
Neural networks have been successfully used for non-intrusive speech intelligibility prediction.
no code implementations • 7 Feb 2024 • Rehan Ahmad, Muhammad Umar Farooq, Thomas Hain
A previous study has shown the effectiveness of using ensemble teacher models in T/S training for unsupervised domain adaptation (UDA), but its performance still lags behind that of a model trained on in-domain data.
Automatic Speech Recognition (ASR) +2
no code implementations • 10 Mar 2024 • Amit Meghanani, Thomas Hain
These task-specific representations are used for robust performance on various downstream tasks by fine-tuning on the labelled data.
no code implementations • 13 Mar 2024 • Amit Meghanani, Thomas Hain
The HuBERT-based CAE model achieves the best results for word discrimination in all languages, despite HuBERT being pre-trained on English only.