Search Results for author: Thomas Hain

Found 51 papers, 7 papers with code

Data-selective Transfer Learning for Multi-Domain Speech Recognition

no code implementations · 8 Sep 2015 · Mortaza Doulaty, Oscar Saz, Thomas Hain

Negative transfer in training of acoustic models for automatic speech recognition has been reported in several contexts such as domain change or speaker characteristics.

Automatic Speech Recognition (ASR) +2

The USFD Spoken Language Translation System for IWSLT 2014

no code implementations · 13 Sep 2015 · Raymond W. M. Ng, Mortaza Doulaty, Rama Doddipatla, Wilker Aziz, Kashif Shah, Oscar Saz, Madina Hasan, Ghada Alharbi, Lucia Specia, Thomas Hain

The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives BLEU scores of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation tasks with the IWSLT 2014 data.

Automatic Speech Recognition (ASR) +4

Latent Dirichlet Allocation Based Organisation of Broadcast Media Archives for Deep Neural Network Adaptation

no code implementations · 16 Nov 2015 · Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain

This paper presents a new method for the discovery of latent domains in diverse speech data, for use in the adaptation of Deep Neural Networks (DNNs) for Automatic Speech Recognition.

Acoustic Modelling Automatic Speech Recognition +2

The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media

no code implementations · 21 Dec 2015 · Oscar Saz, Mortaza Doulaty, Salil Deena, Rosanna Milner, Raymond W. M. Ng, Madina Hasan, Yu-Lan Liu, Thomas Hain

We describe the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge task of transcribing multi-genre broadcast shows.

Acoustic Modelling Automatic Speech Recognition +3

The OpenCourseWare Metadiscourse (OCWMD) Corpus

no code implementations · LREC 2016 · Ghada Alharbi, Thomas Hain

This study describes a new corpus of over 60,000 hand-annotated metadiscourse acts from 106 OpenCourseWare lectures, from two different disciplines: Physics and Economics.

A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus

no code implementations · LREC 2016 · Mauro Nicolao, Heidi Christensen, Stuart Cunningham, Phil Green, Thomas Hain

This paper introduces a new British English speech database, named the homeService corpus, which has been gathered as part of the homeService project.

Automatic Genre and Show Identification of Broadcast Media

no code implementations · 10 Jun 2016 · Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain

Huge amounts of digital videos are being produced and broadcast every day, leading to giant media archives.

Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition

no code implementations · 2 Jul 2019 · Mortaza Doulaty, Thomas Hain

The proposed technique for training data selection significantly outperforms random selection and posterior-based selection, as well as using all of the available data.

Automatic Speech Recognition (ASR) +4

Improving Noise Robustness In Speaker Identification Using A Two-Stage Attention Model

no code implementations · 24 Sep 2019 · Yanpei Shi, Qiang Huang, Thomas Hain

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments.

Speaker Identification Speaker Recognition

Contextual Joint Factor Acoustic Embeddings

no code implementations · 16 Oct 2019 · Yanpei Shi, Thomas Hain

To evaluate the effectiveness of our approaches compared to prior work, two tasks are conducted, phone classification and speaker recognition, and tested on different TIMIT data sets.

Classification General Classification +1

H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model

no code implementations · 17 Oct 2019 · Yanpei Shi, Qiang Huang, Thomas Hain

In the proposed approach, frame-level encoder and attention are applied on segments of an input utterance and generate individual segment vectors.

Speaker Identification

Robust Speaker Recognition Using Speech Enhancement And Attention Model

no code implementations · 14 Jan 2020 · Yanpei Shi, Qiang Huang, Thomas Hain

Instead of individually processing speech enhancement and speaker recognition, the two modules are integrated into one framework by a joint optimisation using deep neural networks.

Speaker Identification Speaker Recognition +1

Supervised Speaker Embedding De-Mixing in Two-Speaker Environment

no code implementations · 14 Jan 2020 · Yanpei Shi, Thomas Hain

The proposed approach separates different speaker properties from a two-speaker signal in embedding space.

Speaker Identification Vocal Bursts Valence Prediction

Speaker Re-identification with Speaker Dependent Speech Enhancement

no code implementations · 15 May 2020 · Yanpei Shi, Qiang Huang, Thomas Hain

The obtained results show that the proposed approach using speaker dependent speech enhancement can yield better speaker recognition and speech enhancement performances than two baselines in various noise conditions.

Speaker Recognition Speech Enhancement

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

no code implementations · 15 May 2020 · Yanpei Shi, Qiang Huang, Thomas Hain

To evaluate the effectiveness of the proposed approach, artificial datasets based on Switchboard Cellular part1 (SWBC) and Voxceleb1 are constructed in two conditions, where speakers' voices are overlapped and not overlapped.

Speaker Identification

Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models

no code implementations · 16 May 2020 · Qiang Huang, Thomas Hain

The first task is to predict an utterance quality score, and the second is to identify where an anomalous distortion takes place in a recording.

Towards Low-Resource StarGAN Voice Conversion using Weight Adaptive Instance Normalization

1 code implementation · 22 Oct 2020 · Mingjie Chen, Yanpei Shi, Thomas Hain

In this work, we aim at improving the data efficiency of the model and achieving a many-to-many non-parallel StarGAN-based voice conversion for a relatively large number of speakers with limited training samples.

Sound Audio and Speech Processing

T-vectors: Weakly Supervised Speaker Identification Using Hierarchical Transformer Model

no code implementations · 29 Oct 2020 · Yanpei Shi, Mingjie Chen, Qiang Huang, Thomas Hain

Using the memory mechanism yields 10.6% and 7.7% relative improvements compared with not using it.

Speaker Identification

Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

no code implementations · 29 Mar 2021 · Cong-Thanh Do, Rama Doddipatla, Thomas Hain

In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated in the computation of the connectionist temporal classification (CTC) loss function.

Automatic Speech Recognition (ASR) +1

MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data

no code implementations · 23 Mar 2022 · George Close, Thomas Hain, Stefan Goetze

Training of speech enhancement systems often does not incorporate knowledge of human perception and thus can lead to unnatural sounding results.

Speech Enhancement

Efficient Non-Autoregressive GAN Voice Conversion using VQWav2vec Features and Dynamic Convolution

1 code implementation · 31 Mar 2022 · Mingjie Chen, Yanghao Zhou, Heyan Huang, Thomas Hain

It was shown recently that a combination of ASR and TTS models yields highly competitive performance on standard voice conversion tasks such as the Voice Conversion Challenge 2020 (VCC2020).

Voice Conversion

Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation

1 code implementation · 13 Apr 2022 · William Ravenscroft, Stefan Goetze, Thomas Hain

A feature of TCNs is that they have a receptive field (RF) dependent on the specific model configuration, which determines the number of input frames that can be observed to produce an individual output frame.

Speech Dereverberation
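The receptive field of a TCN with exponentially increasing dilation follows from the standard formula for stacked dilated convolutions; a minimal sketch (the function name and the example configuration are illustrative, not taken from the paper):

```python
def tcn_receptive_field(kernel_size: int, n_blocks: int, n_repeats: int) -> int:
    """Receptive field (in frames) of a TCN whose dilation doubles each
    block (1, 2, 4, ...) and whose block stack is repeated n_repeats times."""
    rf = 1
    for _ in range(n_repeats):
        for b in range(n_blocks):
            dilation = 2 ** b
            rf += (kernel_size - 1) * dilation  # each dilated conv widens the RF
    return rf

# Conv-TasNet-style configuration: kernel 3, 8 blocks per repeat, 3 repeats
print(tcn_receptive_field(3, 8, 3))  # 1531 frames
```

Doubling the number of blocks or repeats grows the RF far faster than enlarging the kernel, which is why RF is treated as a function of the whole model configuration.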

Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation

1 code implementation · 17 May 2022 · William Ravenscroft, Stefan Goetze, Thomas Hain

It is shown that this weighted multi-dilation temporal convolutional network (WD-TCN) consistently outperforms the TCN across various model configurations, and that using the WD-TCN is a more parameter-efficient way to improve performance than increasing the number of convolutional blocks.

Speech Dereverberation

Insights on Neural Representations for End-to-End Speech Recognition

no code implementations · 19 May 2022 · Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

This paper analyses and explores the internal dynamics between layers during training with CNN, LSTM and Transformer based approaches using Canonical correlation analysis (CCA) and centered kernel alignment (CKA) for the experiments.

Automatic Speech Recognition (ASR) +1

A cross-corpus study on speech emotion recognition

no code implementations · 5 Jul 2022 · Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain

This shows positive information transfer from acted datasets to those with more natural emotions and the benefits from training on different corpora.

Cross-corpus Speech Emotion Recognition

Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

no code implementations · 7 Jul 2022 · Muhammad Umar Farooq, Darshan Adiga Haniya Narayana, Thomas Hain

A separate regression neural network is trained for each source-target language pair to transform posteriors from source acoustic model to the target language.

Speech Recognition +1

Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

no code implementations · 7 Jul 2022 · Muhammad Umar Farooq, Thomas Hain

This technique measures the similarities between posterior distributions from various monolingual acoustic models against a target speech signal.

Automatic Speech Recognition (ASR) +2
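Comparing posterior distributions against a target signal can be done with any divergence measure; a hypothetical sketch using the average frame-wise symmetric KL divergence (the paper's exact similarity measure is not specified here, and the function names are illustrative):

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions."""
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return kl_pq + kl_qp

def posterior_distance(post_a, post_b):
    """Average frame-wise symmetric KL between two posterior sequences,
    e.g. frames scored by two monolingual acoustic models."""
    return sum(symmetric_kl(p, q) for p, q in zip(post_a, post_b)) / len(post_a)

# toy frame posteriors over 3 classes; identical sequences give distance 0
a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
b = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
print(posterior_distance(a, a))  # 0.0
print(posterior_distance(a, b))  # small positive value
```

A lower distance between a monolingual model's posteriors and the target signal's posteriors would indicate higher acoustic-phonetic similarity between the languages.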

Unsupervised data selection for Speech Recognition with contrastive loss ratios

no code implementations · 25 Jul 2022 · Chanho Park, Rehan Ahmad, Thomas Hain

By using the submodular function, a training set for automatic speech recognition matching the target data set is selected.

Automatic Speech Recognition (ASR) +1
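Submodular data selection is typically solved greedily; a generic sketch using the facility-location objective on toy similarity scores (the paper combines submodular selection with contrastive loss ratios, which is not reproduced here, and all names are illustrative):

```python
def greedy_facility_location(similarity, budget):
    """Greedily pick `budget` items maximising the facility-location
    objective sum_j max_{i in S} similarity[i][j]. The objective is
    submodular, so greedy selection is within (1 - 1/e) of optimal."""
    n = len(similarity)
    selected, best_cover = [], [0.0] * n
    for _ in range(budget):
        gains = []
        for i in range(n):
            if i in selected:
                gains.append(float("-inf"))
            else:
                gains.append(sum(max(similarity[i][j] - best_cover[j], 0.0)
                                 for j in range(n)))
        pick = max(range(n), key=gains.__getitem__)
        selected.append(pick)
        best_cover = [max(best_cover[j], similarity[pick][j]) for j in range(n)]
    return selected

# toy similarity between 4 candidate utterances (two clusters)
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.8],
       [0.1, 0.1, 0.8, 1.0]]
print(greedy_facility_location(sim, 2))  # [0, 2], one utterance per cluster
```

The diminishing-returns property is what makes the greedy choice safe: once a cluster is covered, adding a second utterance from it yields little marginal gain.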

Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

2 code implementations · 27 Oct 2022 · William Ravenscroft, Stefan Goetze, Thomas Hain

In this work deformable convolution is proposed as a solution to allow TCN models to have dynamic RFs that can adapt to various reverberation times for reverberant speech separation.

Speech Dereverberation Speech Separation

Dynamic Kernels and Channel Attention for Low Resource Speaker Verification

no code implementations · 3 Nov 2022 · Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

Instead, this paper proposes an approach to increase the model resolution capability using attention-based dynamic kernels in a convolutional neural network to adapt the model parameters to be feature-conditioned.

Speaker Verification Speech Enhancement

Probing Statistical Representations For End-To-End ASR

no code implementations · 3 Nov 2022 · Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

End-to-End automatic speech recognition (ASR) models aim to learn a generalised speech representation to perform recognition.

Automatic Speech Recognition (ASR) +2

Perceive and predict: self-supervised speech representation based loss functions for speech enhancement

no code implementations · 11 Jan 2023 · George Close, William Ravenscroft, Thomas Hain, Stefan Goetze

Recent work in the domain of speech enhancement has explored the use of self-supervised speech representations to aid in the training of neural speech enhancement models.

Speech Enhancement

On Data Sampling Strategies for Training Neural Network Speech Separation Models

no code implementations · 14 Apr 2023 · William Ravenscroft, Stefan Goetze, Thomas Hain

In this work, the impact of applying these training signal length (TSL) limits is analysed for two speech separation models: SepFormer, a transformer model, and Conv-TasNet, a convolutional model.

Speech Separation

Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition

no code implementations · 14 Jun 2023 · Muhammad Umar Farooq, Thomas Hain

The results show that any source-language ASR model can be used for low-resource target-language recognition, followed by the proposed mapping model.

Data Augmentation Speech Recognition +2

Empirical Interpretation of the Relationship Between Speech Acoustic Context and Emotion Recognition

no code implementations · 30 Jun 2023 · Anna Ollerenshaw, Md Asif Jalal, Rosanna Milner, Thomas Hain

The benefits of using a distributed approach to speech emotion understanding are supported by the results of cross-corpora analysis experiments.

Emotional Intelligence Speech Emotion Recognition

Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations

no code implementations · 25 Jul 2023 · George Close, Thomas Hain, Stefan Goetze

Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as feature extractors for speech quality (SQ) prediction, which is, in turn, relevant for assessment and training of speech enhancement systems for users with normal or impaired hearing.

Speech Enhancement

The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions

1 code implementation · 27 Jul 2023 · George Close, Thomas Hain, Stefan Goetze

In this work, SE models are trained and tested on a number of different languages, with self-supervised representations which themselves are trained using different language combinations and with differing network structures as loss function representations.

Speech Enhancement

On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

1 code implementation · 9 Oct 2023 · William Ravenscroft, Stefan Goetze, Thomas Hain

Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation.

Computational Efficiency Speech Separation

MUST: A Multilingual Student-Teacher Learning approach for low-resource speech recognition

no code implementations · 29 Oct 2023 · Muhammad Umar Farooq, Rehan Ahmad, Thomas Hain

However, a limitation of KD training is that the student model classes must be a proper or improper subset of the teacher model classes.

Knowledge Distillation Speech Recognition +1

Progressive unsupervised domain adaptation for ASR using ensemble models and multi-stage training

no code implementations · 7 Feb 2024 · Rehan Ahmad, Muhammad Umar Farooq, Thomas Hain

A previous study has shown the effectiveness of using ensemble teacher models in T/S training for unsupervised domain adaptation (UDA), but its performance still lags behind that of a model trained on in-domain data.

Automatic Speech Recognition (ASR) +2

SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations

no code implementations · 10 Mar 2024 · Amit Meghanani, Thomas Hain

These task-specific representations are used for robust performance on various downstream tasks by fine-tuning on the labelled data.

Automatic Speech Recognition Data Augmentation +3

Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations

no code implementations · 13 Mar 2024 · Amit Meghanani, Thomas Hain

The HuBERT-based CAE model achieves the best results for word discrimination in all languages, despite HuBERT being pre-trained on English only.

Self-Supervised Learning Word Embeddings
