Search Results for author: Steve Renals

Found 48 papers, 16 papers with code

Phonetic Error Analysis of Raw Waveform Acoustic Models with Parametric and Non-Parametric CNNs

no code implementations • 2 Jun 2024 • Erfan Loweimi, Andrea Carmantini, Peter Bell, Steve Renals, Zoran Cvetkovic

Our raw waveform acoustic models consist of parametric (Sinc2Net) or non-parametric CNNs and bidirectional LSTMs. They achieve phone error rates (PERs) as low as 13.7%/15.2% on the TIMIT Dev/Test sets, outperforming the PERs reported in the literature for raw waveform models.

Transfer Learning

Towards Robust Waveform-Based Acoustic Models

no code implementations • 16 Oct 2021 • Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu

We study the problem of learning robust acoustic models in adverse environments, characterized by a significant mismatch between training and test conditions.

Data Augmentation, Inductive Bias

Automatic audiovisual synchronisation for ultrasound tongue imaging

no code implementations • 31 May 2021 • Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals

Our results demonstrate the strength of our approach and its ability to generalise to data from new domains.

Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors

no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Joanne Cleland, Aciel Eshky, Korin Richmond, Steve Renals

For automatic velar fronting error detection, the best results are obtained when jointly using ultrasound and audio.

Silent versus modal multi-speaker speech recognition from ultrasound and video

no code implementations • 27 Feb 2021 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

We observe that silent speech recognition from imaging data underperforms modal speech recognition, likely due to a speaking-mode mismatch between training and testing.

Speech Recognition

Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers

no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.

Automatic Speech Recognition (ASR)

On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers

no code implementations • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results.

Automatic Speech Recognition (ASR)
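As a minimal illustration of why distance does not limit self-attention, the sketch below implements plain scaled dot-product attention over a short sequence of frame vectors. Every output is a softmax-weighted average of all frames, so the first frame can attend to a distant frame as easily as to its neighbour. This is a generic sketch, not the models studied in the paper. The function names are hypothetical. The learned query/key/value projections, multiple heads and masking of a real Transformer are omitted, so queries, keys and values are passed in directly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over whole sequences.

    Each output vector is a weighted average of *all* value vectors; the
    weights depend only on query-key similarity, not on frame distance.
    """
    d = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, dimension by dimension.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs
```

In the example below, a query whose only matching key belongs to the last (most distant) frame returns almost exactly that frame's value: `self_attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0], [20.0, 0.0]], [[3.0], [6.0], [9.0]])` is close to `[[9.0]]`.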

Leveraging speaker attribute information using multi task learning for speaker verification and diarization

1 code implementation • 27 Oct 2020 • Chau Luu, Peter Bell, Steve Renals

On a test set of US Supreme Court recordings, we show that leveraging two additional forms of speaker attribute information, derived from the matched training data and the VoxCeleb corpus respectively, improves the performance of our deep speaker embeddings for both verification and diarization. This yields relative improvements of 26.2% in DER and 6.7% in EER over baselines that use speaker labels only.

Attribute, Multi-Task Learning
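The EER quoted above is the operating point of a verification system at which the false acceptance rate equals the false rejection rate. The sketch below estimates it from trial scores. It assumes that higher scores mean "same speaker" and sweeps the threshold over the observed scores. This is a coarse approximation, since toolkits often interpolate the ROC curve instead. The function name is hypothetical, and this is not the paper's scoring code. DER, which also depends on segmentation, is not shown.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score >= threshold. Returns the mean of the
    false acceptance and false rejection rates at the threshold where the
    two are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = float("inf"), None
    for t in thresholds:
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

Perfectly separated scores give an EER of 0.0. Overlapping scores such as targets `[0.6, 0.4]` and non-targets `[0.5, 0.3]` give 0.5.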

Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

1 code implementation • 14 Aug 2020 • Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski

We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation.

Data Augmentation, Domain Adaptation

Word Error Rate Estimation Without ASR Output: e-WER2

1 code implementation • 8 Aug 2020 • Ahmed Ali, Steve Renals

Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data to compute the word error rate (WER), and producing such transcriptions is often time-consuming and expensive.

Automatic Speech Recognition (ASR)
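For reference, the quantity that e-WER estimates is normally computed from a manual transcript as WER = (S + D + I) / N. Here S, D and I are the substitutions, deletions and insertions in the minimum-edit alignment, and N is the number of reference words. The sketch below is that standard reference-based computation, a word-level Levenshtein distance. It is not the e-WER estimator itself, which predicts WER without a reference. The function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    `reference` must contain at least one word (N > 0).
    """
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` is 2/6 (one substitution and one deletion over six reference words).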

When Can Self-Attention Be Replaced by Feed Forward Layers?

no code implementations • 28 May 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition.

Speech Recognition

DropClass and DropAdapt: Dropping classes for deep speaker representation learning

1 code implementation • 2 Feb 2020 • Chau Luu, Peter Bell, Steve Renals

The first proposed method, DropClass, works via periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks.

General Classification, Representation Learning
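The sketch below illustrates the DropClass mechanism described above. It assumes that a fresh random subset of speaker classes is drawn every `period` steps. The schedule, `period` and `num_dropped` are hypothetical hyperparameters, not values from the paper. Examples of dropped classes are filtered out of the batch. The corresponding output-layer logits are masked to -inf, so the softmax covers only the remaining classes. DropAdapt is not shown, and all function names are hypothetical.

```python
import math
import random


def dropclass_schedule(num_steps, period, num_classes, num_dropped, seed=0):
    """Yield (step, dropped_classes); a new random subset is drawn every `period` steps."""
    rng = random.Random(seed)
    dropped = frozenset()
    for step in range(num_steps):
        if step % period == 0:
            dropped = frozenset(rng.sample(range(num_classes), num_dropped))
        yield step, dropped


def filter_batch(batch, dropped):
    """Remove (features, label) pairs whose label is currently dropped."""
    return [(x, y) for x, y in batch if y not in dropped]


def mask_logits(logits, dropped):
    """Drop output units: dropped classes get -inf so softmax gives them zero probability."""
    return [-math.inf if c in dropped else z for c, z in enumerate(logits)]


def softmax(logits):
    m = max(logits)  # finite as long as at least one class is kept
    exps = [math.exp(z - m) for z in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]
```

Because each period trains against a different subset of classes, the shared feature extractor is trained on many different classification tasks, as the snippet describes.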

Channel adversarial training for speaker verification and diarization

no code implementations • 25 Oct 2019 • Chau Luu, Peter Bell, Steve Renals

Previous work has encouraged domain-invariance in deep speaker embedding by adversarially classifying the dataset or labelled environment to which the generated features belong.

Speaker Verification

Speaker Adaptive Training using Model Agnostic Meta-Learning

1 code implementation • 23 Oct 2019 • Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals

Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions.

Meta-Learning

Acoustic Model Adaptation from Raw Waveforms with SincNet

1 code implementation • 30 Sep 2019 • Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals

Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features.

Acoustic Modelling
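In SincNet, each first-layer CNN filter is not a free vector of taps but a band-pass filter defined by two learnable cutoff frequencies. The filter is g[n] = 2 f2 sinc(2π f2 n) - 2 f1 sinc(2π f1 n), with sinc(x) = sin(x)/x, multiplied by a window. The sketch below builds one such kernel with fixed cutoffs and a Hamming window, then checks its frequency response. For simplicity, the cutoffs are assumed to be in cycles per sample. The sketch only illustrates the filter parameterisation: it does not learn the cutoffs and is not the paper's adaptation method. Function names are hypothetical.

```python
import math


def sinc(x):
    """Unnormalised sinc, sin(x) / x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def sinc_bandpass(f_low, f_high, kernel_size):
    """Windowed band-pass FIR kernel defined only by two cutoffs.

    Cutoffs are in cycles per sample (0 < f_low < f_high < 0.5);
    kernel_size should be odd so the kernel is centred on n = 0.
    """
    half = kernel_size // 2
    taps = []
    for i in range(kernel_size):
        n = i - half
        # Difference of two ideal low-pass responses = ideal band-pass.
        g = 2 * f_high * sinc(2 * math.pi * f_high * n) - 2 * f_low * sinc(2 * math.pi * f_low * n)
        # Hamming window tapers the truncated ideal response.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        taps.append(g * w)
    return taps


def gain(taps, f):
    """Magnitude of the kernel's frequency response at f cycles/sample."""
    half = len(taps) // 2
    re = sum(t * math.cos(2 * math.pi * f * (i - half)) for i, t in enumerate(taps))
    im = sum(-t * math.sin(2 * math.pi * f * (i - half)) for i, t in enumerate(taps))
    return math.hypot(re, im)
```

With `sinc_bandpass(0.1, 0.2, 101)`, the gain is close to 1 inside the band (for example at 0.15) and close to 0 at DC and at 0.4. Only two numbers per filter would need to be learned, instead of 101 taps.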

Embeddings for DNN speaker adaptive training

no code implementations • 30 Sep 2019 • Joanna Rownicka, Peter Bell, Steve Renals

In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT) focusing on a small amount of adaptation data per speaker.

Speaker Recognition

Top-down training for neural networks

no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.

Speech Recognition

Synchronising audio and ultrasound by learning cross-modal embeddings

1 code implementation • 1 Jul 2019 • Aciel Eshky, Manuel Sam Ribeiro, Korin Richmond, Steve Renals

Audiovisual synchronisation is the task of determining the time offset between speech audio and a video recording of the articulators.
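The sketch below illustrates offset selection. It assumes a model has already mapped each audio frame and each ultrasound frame into a shared embedding space at the same frame rate. The embeddings used here are toy vectors, not outputs of the paper's network. For every candidate offset, the sketch pairs the frames, averages their Euclidean distances, and picks the offset with the smallest mean distance. Function names and the `max_offset` search range are hypothetical.

```python
import math


def mean_distance(audio, video, offset):
    """Mean Euclidean distance between video frame t and audio frame t + offset."""
    dists = [math.dist(audio[t + offset], v)
             for t, v in enumerate(video)
             if 0 <= t + offset < len(audio)]
    return sum(dists) / len(dists) if dists else math.inf


def estimate_offset(audio, video, max_offset):
    """Return the offset in [-max_offset, max_offset] with the smallest mean distance."""
    return min(range(-max_offset, max_offset + 1),
               key=lambda k: mean_distance(audio, video, k))
```

If the video embeddings are the audio embeddings shifted by three frames, the search over offsets -5 to 5 recovers an offset of 3.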

Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions

1 code implementation • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words.

Speaker Diarization

Speaker-independent classification of phonetic segments from raw ultrasound in child speech

no code implementations • 1 Jul 2019 • Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals

Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production.

General Classification

Lattice-Based Unsupervised Test-Time Adaptation of Neural Network Acoustic Models

no code implementations • 27 Jun 2019 • Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals

Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions.

Test-time Adaptation

Lattice-based lightly-supervised acoustic model training

no code implementations • 30 May 2019 • Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell

This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model.

Language Modelling, Speech Recognition

Dynamic Evaluation of Transformer Language Models

1 code implementation • 17 Apr 2019 • Ben Krause, Emmanuel Kahembwe, Iain Murray, Steve Renals

This research note combines two methods that have recently improved the state of the art in language modeling: Transformers and dynamic evaluation.

Language Modelling
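Dynamic evaluation adapts a model's parameters to the test sequence while it is being scored. Each token is first scored with the current parameters and then used for a gradient step, so later tokens benefit from recent history. The toy sketch below applies this to a unigram softmax model with plain SGD. Both the model and the update rule are simplifying assumptions: the paper adapts a Transformer, and its update rule may differ. The function names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def evaluate(tokens, logits, lr=0.0):
    """Total negative log-likelihood of `tokens` under a unigram softmax model.

    With lr > 0 this is dynamic evaluation: each token is scored with the
    current parameters, then one SGD step is taken on that token's loss.
    With lr == 0 it is ordinary static evaluation.
    """
    z = list(logits)  # copy, so the caller's trained parameters are untouched
    nll = 0.0
    for y in tokens:
        p = softmax(z)
        nll -= math.log(p[y])  # score before adapting on this token
        if lr > 0:
            # Gradient of -log softmax(z)[y] with respect to z is p - onehot(y).
            z = [zi - lr * (pi - (1.0 if i == y else 0.0))
                 for i, (zi, pi) in enumerate(zip(z, p))]
    return nll
```

On a test sequence that repeats one token, the static model pays log 3 per token over a three-word vocabulary. The dynamically evaluated model quickly shifts probability toward the repeated token and obtains a lower total loss.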

Analyzing deep CNN-based utterance embeddings for acoustic model adaptation

no code implementations • 12 Nov 2018 • Joanna Rownicka, Peter Bell, Steve Renals

We analyze the representations learned by deep CNNs and compare them with deep neural network (DNN) representations and i-vectors, in the context of acoustic model adaptation.

Speech Recognition

Word Error Rate Estimation for Speech Recognition: e-WER

1 code implementation • ACL 2018 • Ahmed Ali, Steve Renals

Measuring the performance of automatic speech recognition (ASR) systems requires manually transcribed data to compute the word error rate (WER), and producing such transcriptions is often time-consuming and expensive.

Automatic Speech Recognition (ASR)

Speech Recognition Challenge in the Wild: Arabic MGB-3

1 code implementation • 21 Sep 2017 • Ahmed Ali, Stephan Vogel, Steve Renals

Two hours of audio per dialect were released for development and a further two hours were used for evaluation.

Arabic Speech Recognition, Dialect Identification

End-to-End Neural Segmental Models for Speech Recognition

no code implementations • 1 Aug 2017 • Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals

Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.

Decoder, Speech Recognition
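Because a segmental model scores whole segments, decoding searches over segmentations rather than over frame labels. That search can be sketched with the semi-Markov Viterbi recursion best[t] = max over (length, label) of best[t - length] + score(t - length, t, label). The `segment_score` argument, the `toy_score` scorer and `max_len` are hypothetical stand-ins for the neural segment scorer in the paper.

```python
import math


def segmental_viterbi(num_frames, labels, max_len, segment_score):
    """Best labelled segmentation of frames [0, num_frames).

    segment_score(start, end, label) scores a whole segment at once;
    the path score is the sum of its segment scores.
    """
    best = [-math.inf] * (num_frames + 1)
    back = [None] * (num_frames + 1)
    best[0] = 0.0
    for end in range(1, num_frames + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            for label in labels:
                score = best[start] + segment_score(start, end, label)
                if score > best[end]:
                    best[end], back[end] = score, (start, label)
    # Follow back-pointers from the last frame to recover the segments.
    segments, end = [], num_frames
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    return best[num_frames], segments[::-1]


def toy_score(start, end, label, reference="aaabb", penalty=0.5):
    """Hypothetical scorer: +1 per matching frame, -1 per mismatch, minus a per-segment penalty."""
    matches = sum(1 if reference[i] == label else -1 for i in range(start, end))
    return matches - penalty
```

With the toy scorer, `segmental_viterbi(5, ["a", "b"], 5, toy_score)` returns a total of 4.0 with segments `[(0, 3, "a"), (3, 5, "b")]`. The per-segment penalty is what stops the search from splitting a run of identical labels.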

Small-footprint Highway Deep Neural Networks for Speech Recognition

no code implementations • 18 Oct 2016 • Liang Lu, Steve Renals

Furthermore, HDNNs are more controllable than DNNs: the gate functions of an HDNN can control the behavior of the whole network using a very small number of model parameters.

Speech Recognition
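A highway layer mixes a nonlinear transform H(x) with its own input through learned gates. Here the assumed formulation is y = H(x) ⊙ T(x) + x ⊙ C(x), with a sigmoid transform gate T and a sigmoid carry gate C. The sketch below is a generic pure-Python layer under that formulation. The exact HDNN gate definitions, and which parameters the paper updates, are not given in this snippet. The sketch shows how gate biases alone can switch a layer between passing its input through and replacing it, which is the kind of control the snippet attributes to the gates. Names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, x):
    """W x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t, W_c, b_c):
    """y = tanh(W_h x + b_h) * T(x) + x * C(x), elementwise.

    T (transform gate) and C (carry gate) are sigmoid gates; input and
    output must have the same width so the carry term can add x back.
    """
    h = [math.tanh(v) for v in affine(W_h, b_h, x)]
    t = [sigmoid(v) for v in affine(W_t, b_t, x)]
    c = [sigmoid(v) for v in affine(W_c, b_c, x)]
    return [hi * ti + xi * ci for hi, ti, xi, ci in zip(h, t, x, c)]
```

With a strongly negative transform-gate bias and a strongly positive carry-gate bias, the layer returns its input almost unchanged. Reversing the two biases makes it output the transform tanh(W_h x + b_h) instead.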

Multiplicative LSTM for sequence modelling

1 code implementation • 26 Sep 2016 • Ben Krause, Liang Lu, Iain Murray, Steve Renals

We introduce multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures.

Density Estimation, Language Modelling
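The key change in an mLSTM is an input-dependent intermediate state m_t = (W_mx x_t) ⊙ (W_mh h_{t-1}). This state replaces h_{t-1} in every LSTM gate, so each input symbol effectively selects its own recurrent transition. The single-step sketch below omits biases, and the weight names in `P` are hypothetical. The placement of the tanh nonlinearities follows a standard LSTM and may differ in detail from the paper.

```python
import math


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mlstm_step(x, h_prev, c_prev, P):
    """One mLSTM time step; P maps (hypothetical) weight names to matrices."""
    # Input-dependent intermediate state that replaces h_prev below.
    m = [a * b for a, b in zip(matvec(P["W_mx"], x), matvec(P["W_mh"], h_prev))]
    # Standard LSTM candidate and gates, each driven by x and m.
    g = [math.tanh(v) for v in vadd(matvec(P["W_hx"], x), matvec(P["W_hm"], m))]
    i = [sigmoid(v) for v in vadd(matvec(P["W_ix"], x), matvec(P["W_im"], m))]
    f = [sigmoid(v) for v in vadd(matvec(P["W_fx"], x), matvec(P["W_fm"], m))]
    o = [sigmoid(v) for v in vadd(matvec(P["W_ox"], x), matvec(P["W_om"], m))]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

The multiplicative path is easy to see by zeroing `W_mx`. Then m is zero, and the previous hidden state no longer influences the step at all.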

The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition

no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang

For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website, Aljazeera.net, covering a 10-year period (2000-2011).

Acoustic Modelling, Diversity

Multi-view Dimensionality Reduction for Dialect Identification of Arabic Broadcast Speech

no code implementations • 19 Sep 2016 • Sameer Khurana, Ahmed Ali, Steve Renals

In this work, we present a new Vector Space Model (VSM) of speech utterances for the task of spoken dialect identification.

Dialect Identification, Dimensionality Reduction

Knowledge Distillation for Small-footprint Highway Networks

no code implementations • 2 Aug 2016 • Liang Lu, Michelle Guo, Steve Renals

We have shown that HDNN-based acoustic models can achieve recognition accuracy comparable to that of plain deep neural network (DNN) acoustic models with a much smaller number of model parameters.

Acoustic Modelling, Knowledge Distillation
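A common way to train a small student, such as an HDNN, from a larger teacher is to minimise a mix of two losses. One is cross-entropy against the teacher's temperature-softened posteriors; the other is ordinary cross-entropy against the hard label. The sketch below shows that generic loss for a single frame. `alpha`, `temperature` and the function names are hypothetical. The exact objective used in the paper may differ; for example, some formulations also scale the soft term by the squared temperature.

```python
import math


def softmax(z, temperature=1.0):
    scaled = [v / temperature for v in z]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """alpha * soft-target cross-entropy + (1 - alpha) * hard-label cross-entropy."""
    # Soft targets: teacher posteriors smoothed by the temperature.
    teacher_p = softmax(teacher_logits, temperature)
    student_p_soft = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(teacher_p, student_p_soft))
    # Ordinary cross-entropy against the reference label (temperature 1).
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```

A student whose logits match the teacher's receives a lower loss than one that ranks the classes in reverse. With `alpha=0.0`, the loss reduces to plain hard-label cross-entropy.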

Differentiable Pooling for Unsupervised Acoustic Model Adaptation

no code implementations • 31 Mar 2016 • Pawel Swietojanski, Steve Renals

We present a deep neural network (DNN) acoustic model that includes parametrised and differentiable pooling operators.

Speech Recognition
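One example of a parametrised, differentiable pooling operator is Lp-norm pooling over a group of K hidden units, y = ((1/K) Σ |x_k|^p)^(1/p), where the order p can be learned. Setting p = 1 gives the mean absolute value, and large p approaches the maximum. The sketch below assumes this operator with one order per pool and non-overlapping groups. Treat it as an illustration of the idea, not as the paper's exact pooling definitions or adaptation procedure. Function names are hypothetical.

```python
def lp_pool(units, p):
    """Lp-norm pooling of one group of hidden units, with order p >= 1.

    p = 1 gives the mean absolute value; as p grows the result approaches max |x|.
    """
    k = len(units)
    return (sum(abs(u) ** p for u in units) / k) ** (1.0 / p)


def lp_pool_layer(hidden, group_size, orders):
    """Pool non-overlapping groups of `group_size` units, each with its own order."""
    groups = [hidden[i:i + group_size] for i in range(0, len(hidden), group_size)]
    return [lp_pool(g, p) for g, p in zip(groups, orders)]
```

Because each pool has its own order p, changing only these few scalars changes how every pool summarises its units. This is what makes such operators a natural target for adaptation.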

Segmental Recurrent Neural Networks for End-to-end Speech Recognition

no code implementations • 1 Mar 2016 • Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals

This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction.

Acoustic Modelling, Language Modelling

Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation

no code implementations • 12 Jan 2016 • Pawel Swietojanski, Jinyu Li, Steve Renals

This work presents a broad study of the adaptation of neural network acoustic models by means of learning hidden unit contributions (LHUC). LHUC is a method that linearly re-combines hidden units in a speaker- or environment-dependent manner using small amounts of unsupervised adaptation data.

Speech Recognition
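LHUC rescales each hidden unit by a speaker-dependent amplitude. A common parameterisation is h'_j = 2 σ(r_j) h_j, which keeps the amplitudes in (0, 2). Only the vector r is estimated per speaker, while the rest of the network stays fixed. The sketch below shows the forward rescaling and the gradient with respect to r. The parameterisation is the commonly cited one and is stated here as an assumption. Function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lhuc_forward(hidden, r):
    """Scale hidden unit j by the speaker-dependent amplitude 2 * sigmoid(r[j])."""
    return [2.0 * sigmoid(rj) * hj for hj, rj in zip(hidden, r)]


def lhuc_grad_r(hidden, r, grad_out):
    """Gradient of the loss w.r.t. r, given dLoss/d(output) as grad_out."""
    # d(2 * s * h) / dr = 2 * s * (1 - s) * h, with s = sigmoid(r).
    return [g * 2.0 * sigmoid(rj) * (1.0 - sigmoid(rj)) * hj
            for hj, rj, g in zip(hidden, r, grad_out)]
```

Initialising r to zeros gives amplitudes of exactly 1. Adaptation therefore starts from the unadapted speaker-independent model and moves only as far as the adaptation data pushes it.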

Small-footprint Deep Neural Networks with Highway Connections for Speech Recognition

no code implementations • 14 Dec 2015 • Liang Lu, Steve Renals

For speech recognition, deep neural networks (DNNs) have significantly improved recognition accuracy on most benchmark datasets and application domains.

Speech Recognition

Tied Probabilistic Linear Discriminant Analysis for Speech Recognition

no code implementations • 4 Nov 2014 • Liang Lu, Steve Renals

Acoustic models using probabilistic linear discriminant analysis (PLDA) capture the correlations within feature vectors using subspaces which do not vastly expand the model.

Speech Recognition
