Search Results for author: Thomas Hain

Found 34 papers, 4 papers with code

Unsupervised data selection for Speech Recognition with contrastive loss ratios

no code implementations25 Jul 2022 Chanho Park, Rehan Ahmad, Thomas Hain

By using the submodular function, a training set for automatic speech recognition matching the target data set is selected.

Automatic Speech Recognition speech-recognition
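The abstract mentions selecting a training set via a submodular function. The paper's actual criterion is based on contrastive loss ratios, which is not reproduced here; the following is only a minimal sketch of the generic greedy submodular selection step (a facility-location objective over illustrative feature vectors, all names assumed).

```python
# Hypothetical sketch of greedy submodular data selection: pick pool
# utterances whose feature vectors best "cover" a target set. The cosine
# similarity and facility-location objective are illustrative assumptions,
# not the paper's contrastive-loss-ratio criterion.

def similarity(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def greedy_select(pool, target, k):
    """Greedy maximisation of the facility-location function
    F(S) = sum over target items t of max_{s in S} similarity(t, s)."""
    chosen, order = set(), []
    cover = [0.0] * len(target)  # best similarity found so far per target item
    for _ in range(min(k, len(pool))):
        best_gain, best_i, best_cover = -1.0, None, None
        for i, cand in enumerate(pool):
            if i in chosen:
                continue
            new_cover = [max(c, similarity(t, cand))
                         for c, t in zip(cover, target)]
            gain = sum(new_cover) - sum(cover)
            if gain > best_gain:
                best_gain, best_i, best_cover = gain, i, new_cover
        chosen.add(best_i)
        order.append(best_i)
        cover = best_cover
    return order
```

Because the facility-location function is monotone submodular, this greedy loop carries the usual (1 − 1/e) approximation guarantee.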

Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

no code implementations7 Jul 2022 Muhammad Umar Farooq, Darshan Adiga Haniya Narayana, Thomas Hain

A separate regression neural network is trained for each source-target language pair to transform posteriors from source acoustic model to the target language.

speech-recognition Speech Recognition +1
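The snippet above describes training a regression network per source–target pair to map posteriors between languages. As a toy stand-in for that idea (a single linear map fitted by gradient descent on mean squared error, not the paper's actual architecture), one pair could be sketched as:

```python
# Illustrative sketch only: learn a linear map W that transforms
# source-language posterior vectors into target-language posteriors.
# The real system uses a regression neural network per language pair.

def train_posterior_map(src, tgt, dim_out, lr=0.5, epochs=200):
    dim_in = len(src[0])
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        for x, y in zip(src, tgt):
            pred = [sum(w * xi for w, xi in zip(row, x)) for row in W]
            for o in range(dim_out):
                err = pred[o] - y[o]
                for i in range(dim_in):
                    # Per-sample MSE gradient step, averaged over the set.
                    W[o][i] -= lr * err * x[i] / len(src)
    return W

def apply_map(W, posterior):
    # Transform a source posterior into the target language's space.
    return [sum(w * p for w, p in zip(row, posterior)) for row in W]
```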

Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

no code implementations7 Jul 2022 Muhammad Umar Farooq, Thomas Hain

This technique measures the similarities between posterior distributions from various monolingual acoustic models against a target speech signal.

Automatic Speech Recognition Cross-Lingual Transfer +1
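Comparing posterior distributions from several monolingual acoustic models against a target signal needs a distance between distributions. One plausible choice (an assumption here, not necessarily the paper's measure) is symmetric Kullback–Leibler divergence:

```python
import math

def kl(p, q, eps=1e-10):
    # Kullback-Leibler divergence D(p || q) between discrete distributions;
    # eps avoids log(0) for zero-probability entries.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def closest_language(target_posterior, model_posteriors):
    # Rank monolingual acoustic models by symmetric KL distance to the
    # target posterior; the smallest distance marks the most similar
    # language (illustrative criterion).
    sym = {lang: kl(target_posterior, q) + kl(q, target_posterior)
           for lang, q in model_posteriors.items()}
    return min(sym, key=sym.get)
```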

A cross-corpus study on speech emotion recognition

no code implementations5 Jul 2022 Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain

This shows positive information transfer from acted datasets to those with more natural emotions, and the benefits of training on different corpora.

Speech Emotion Recognition

Insights on Neural Representations for End-to-End Speech Recognition

no code implementations19 May 2022 Anna Ollerenshaw, Md Asif Jalal, Thomas Hain

This paper analyses and explores the internal dynamics between layers during training with CNN, LSTM and Transformer based approaches using Canonical correlation analysis (CCA) and centered kernel alignment (CKA) for the experiments.

Automatic Speech Recognition speech-recognition
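Of the two comparison tools named above, linear CKA is compact enough to sketch directly. Given two representation matrices (examples × features) from different layers, it yields a similarity in [0, 1] that is invariant to orthogonal transforms and isotropic scaling:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two representation
    matrices of shape (n_examples, n_features). Returns 1.0 when the
    representations match up to rotation and uniform scaling."""
    X = X - X.mean(axis=0)            # center each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)
```

The same routine applied to activations of successive layers across training checkpoints gives the kind of layer-dynamics curves the paper analyses.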

Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation

1 code implementation17 May 2022 William Ravenscroft, Stefan Goetze, Thomas Hain

It is shown that this weighted multi-dilation temporal convolutional network (WD-TCN) consistently outperforms the TCN across various model configurations, and that using the WD-TCN is a more parameter-efficient way to improve performance than increasing the number of convolutional blocks.

Speech Dereverberation
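The core idea of a weighted multi-dilation block is to run several dilated convolution branches in parallel and combine them with normalised weights. A minimal numpy sketch (in the paper the weights come from a learned attention over the utterance; here they are given scalars, which is an assumption):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    # Causal dilated 1-D convolution with left zero-padding, so the
    # output has the same length as the input.
    pad = dilation * (len(kernel) - 1)
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(k * xp[t + i * dilation]
                         for i, k in enumerate(kernel))
                     for t in range(len(x))])

def wd_block(x, branches, weights):
    """Weighted sum of parallel dilated-convolution branches.
    branches: list of (kernel, dilation) pairs; weights: raw scores
    that are softmax-normalised before mixing."""
    w = np.exp(weights) / np.exp(weights).sum()
    return sum(wi * dilated_conv1d(x, k, d)
               for wi, (k, d) in zip(w, branches))
```

With length-1 identity kernels the block reduces to a convex combination of the input, which makes the mixing behaviour easy to verify.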

Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation

1 code implementation13 Apr 2022 William Ravenscroft, Stefan Goetze, Thomas Hain

A feature of TCNs is that they have a receptive field (RF) dependent on the specific model configuration which determines the number of input frames that can be observed to produce an individual output frame.

Speech Dereverberation
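The receptive field described above follows a closed form: each block with kernel size K and dilation d widens the field by (K − 1)·d frames. A small helper (assuming one dilated convolution per block; residual blocks with two convolutions double each block's contribution):

```python
def tcn_receptive_field(kernel_size, dilations):
    """Number of input frames visible to a single output frame of a TCN,
    assuming one dilated convolution per block."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, kernel size 3 with dilations 1, 2, 4, 8 gives 1 + 2·(1 + 2 + 4 + 8) = 31 frames.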

Efficient Non-Autoregressive GAN Voice Conversion using VQWav2vec Features and Dynamic Convolution

1 code implementation31 Mar 2022 Mingjie Chen, Yanghao Zhou, Heyan Huang, Thomas Hain

It was shown recently that a combination of ASR and TTS models yields highly competitive performance on standard voice conversion tasks such as the Voice Conversion Challenge 2020 (VCC2020).

Voice Conversion

MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data

no code implementations23 Mar 2022 George Close, Thomas Hain, Stefan Goetze

Training of speech enhancement systems often does not incorporate knowledge of human perception and thus can lead to unnatural sounding results.

Speech Enhancement

Multiple-hypothesis CTC-based semi-supervised adaptation of end-to-end speech recognition

no code implementations29 Mar 2021 Cong-Thanh Do, Rama Doddipatla, Thomas Hain

In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated in the computation of the connectionist temporal classification (CTC) loss function.

Automatic Speech Recognition speech-recognition

T-vectors: Weakly Supervised Speaker Identification Using Hierarchical Transformer Model

no code implementations29 Oct 2020 Yanpei Shi, Mingjie Chen, Qiang Huang, Thomas Hain

Using the memory mechanism achieves 10.6% and 7.7% relative improvements compared with not using it.

Speaker Identification

Towards Low-Resource StarGAN Voice Conversion using Weight Adaptive Instance Normalization

1 code implementation22 Oct 2020 Mingjie Chen, Yanpei Shi, Thomas Hain

In this work, we aim at improving the data efficiency of the model and achieving a many-to-many non-parallel StarGAN-based voice conversion for a relatively large number of speakers with limited training samples.

Sound Audio and Speech Processing

Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models

no code implementations16 May 2020 Qiang Huang, Thomas Hain

The first task is to predict an utterance quality score, and the second is to identify where an anomalous distortion takes place in a recording.

Speaker Re-identification with Speaker Dependent Speech Enhancement

no code implementations15 May 2020 Yanpei Shi, Qiang Huang, Thomas Hain

The obtained results show that the proposed approach using speaker-dependent speech enhancement can yield better speaker recognition and speech enhancement performance than two baselines under various noise conditions.

Speaker Recognition Speech Enhancement

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

no code implementations15 May 2020 Yanpei Shi, Qiang Huang, Thomas Hain

To evaluate the effectiveness of the proposed approach, artificial datasets based on Switchboard Cellular part1 (SWBC) and Voxceleb1 are constructed in two conditions, with and without overlapping speakers' voices.

Speaker Identification
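Constructing the overlapped and non-overlapped conditions mentioned above amounts to either summing or concatenating two speakers' signals. A hedged numpy sketch (the paper's exact mixing scheme and gains are not specified here, so both are assumptions):

```python
import numpy as np

def make_pair(sig_a, sig_b, overlapped, gain_b=1.0):
    """Build a two-speaker example: summed waveforms for the overlapped
    condition, concatenated waveforms otherwise. gain_b scales the
    second speaker (illustrative parameter)."""
    if overlapped:
        n = max(len(sig_a), len(sig_b))
        a = np.pad(sig_a, (0, n - len(sig_a)))  # zero-pad to equal length
        b = np.pad(sig_b, (0, n - len(sig_b)))
        return a + gain_b * b
    return np.concatenate([sig_a, gain_b * sig_b])
```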

Supervised Speaker Embedding De-Mixing in Two-Speaker Environment

no code implementations14 Jan 2020 Yanpei Shi, Thomas Hain

The proposed approach separates different speaker properties from a two-speaker signal in embedding space.

Speaker Identification

Robust Speaker Recognition Using Speech Enhancement And Attention Model

no code implementations14 Jan 2020 Yanpei Shi, Qiang Huang, Thomas Hain

Instead of individually processing speech enhancement and speaker recognition, the two modules are integrated into one framework by a joint optimisation using deep neural networks.

Speaker Identification Speaker Recognition +1

H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model

no code implementations17 Oct 2019 Yanpei Shi, Qiang Huang, Thomas Hain

In the proposed approach, frame-level encoder and attention are applied on segments of an input utterance and generate individual segment vectors.

Speaker Identification

Contextual Joint Factor Acoustic Embeddings

no code implementations16 Oct 2019 Yanpei Shi, Thomas Hain

To evaluate the effectiveness of our approaches compared to prior work, two tasks are conducted -- phone classification and speaker recognition -- and tested on different TIMIT data sets.

Classification General Classification +1

Improving Noise Robustness In Speaker Identification Using A Two-Stage Attention Model

no code implementations24 Sep 2019 Yanpei Shi, Qiang Huang, Thomas Hain

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments.

Speaker Identification Speaker Recognition

Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition

no code implementations2 Jul 2019 Mortaza Doulaty, Thomas Hain

The proposed technique for training data selection significantly outperforms random selection, posterior-based selection, as well as using all of the available data.

Automatic Speech Recognition Domain Adaptation +2

Automatic Genre and Show Identification of Broadcast Media

no code implementations10 Jun 2016 Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain

Huge amounts of digital videos are being produced and broadcast every day, leading to giant media archives.

The OpenCourseWare Metadiscourse (OCWMD) Corpus

no code implementations LREC 2016 Ghada Alharbi, Thomas Hain

This study describes a new corpus of over 60,000 hand-annotated metadiscourse acts from 106 OpenCourseWare lectures in two different disciplines: Physics and Economics.

A Framework for Collecting Realistic Recordings of Dysarthric Speech - the homeService Corpus

no code implementations LREC 2016 Mauro Nicolao, Heidi Christensen, Stuart Cunningham, Phil Green, Thomas Hain

This paper introduces a new British English speech database, named the homeService corpus, which has been gathered as part of the homeService project.

The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media

no code implementations21 Dec 2015 Oscar Saz, Mortaza Doulaty, Salil Deena, Rosanna Milner, Raymond W. M. Ng, Madina Hasan, Yu-Lan Liu, Thomas Hain

We describe the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge task of transcribing multi-genre broadcast shows.

Acoustic Modelling Automatic Speech Recognition +1

Latent Dirichlet Allocation Based Organisation of Broadcast Media Archives for Deep Neural Network Adaptation

no code implementations16 Nov 2015 Mortaza Doulaty, Oscar Saz, Raymond W. M. Ng, Thomas Hain

This paper presents a new method for the discovery of latent domains in diverse speech data, for the use of adaptation of Deep Neural Networks (DNNs) for Automatic Speech Recognition.

Acoustic Modelling Automatic Speech Recognition +1

The USFD Spoken Language Translation System for IWSLT 2014

no code implementations13 Sep 2015 Raymond W. M. Ng, Mortaza Doulaty, Rama Doddipatla, Wilker Aziz, Kashif Shah, Oscar Saz, Madina Hasan, Ghada Alharbi, Lucia Specia, Thomas Hain

The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives BLEU scores of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation tasks with the IWSLT 2014 data.

Automatic Speech Recognition Machine Translation +3

Data-selective Transfer Learning for Multi-Domain Speech Recognition

no code implementations8 Sep 2015 Mortaza Doulaty, Oscar Saz, Thomas Hain

Negative transfer in training of acoustic models for automatic speech recognition has been reported in several contexts such as domain change or speaker characteristics.

Automatic Speech Recognition speech-recognition +1
