Search Results for author: Peter Bell

Found 33 papers, 11 papers with code

Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints

no code implementations • 15 Dec 2021 • Christoph Minixhofer, Ondřej Klejch, Peter Bell

In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows.

Classification

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

no code implementations • 12 Nov 2021 • Ondrej Klejch, Electra Wallington, Peter Bell

We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question.

Cross-Lingual ASR, Cross-Lingual Transfer, +1

Fusing ASR Outputs in Joint Training for Speech Emotion Recognition

no code implementations • 29 Oct 2021 • Yuanchao Li, Peter Bell, Catherine Lai

However, due to the scarcity of emotion labelled data and the difficulty of recognizing emotional speech, it is hard to obtain reliable linguistic features and models in this research area.

Automatic Speech Recognition, Speech Emotion Recognition

It's not what you said, it's how you said it: discriminative perception of speech as a multichannel communication system

no code implementations • 1 May 2021 • Sarenne Wallbridge, Peter Bell, Catherine Lai

People convey information extremely effectively through spoken interaction using multiple channels of information transmission: the lexical channel of what is said, and the non-lexical channel of how it is said.

Segmenting Subtitles for Correcting ASR Segmentation Errors

no code implementations • EACL 2021 • David Wan, Chris Kedzie, Faisal Ladhak, Elsbeth Turcan, Petra Galuščáková, Elena Zotkina, Zhengping Jiang, Peter Bell, Kathleen McKeown

Typical ASR systems segment the input audio into utterances using purely acoustic information, which may not resemble the sentence-like units that are expected by conventional machine translation (MT) systems for Spoken Language Translation.

Information Retrieval, Machine Translation, +1

Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers

no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.

Automatic Speech Recognition

Enhancing Human Pose Estimation in Ancient Vase Paintings via Perceptually-grounded Style Transfer Learning

1 code implementation • 10 Dec 2020 • Prathmesh Madhu, Angel Villar-Corrales, Ronak Kosti, Torsten Bendschus, Corinna Reinhardt, Peter Bell, Andreas Maier, Vincent Christlein

(2) To improve the already strong results further, we created a small dataset (ClassArch) consisting of ancient Greek vase paintings from the 6th to 5th century BCE with person and pose annotations.

Image Retrieval, Pose Estimation, +2

On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers

no code implementations • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results.

Automatic Speech Recognition

Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models

1 code implementation • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

To the best of our knowledge, we have achieved state-of-the-art end-to-end Transformer based model performance on Switchboard and AMI.

Automatic Speech Recognition

Leveraging speaker attribute information using multi task learning for speaker verification and diarization

1 code implementation • 27 Oct 2020 • Chau Luu, Peter Bell, Steve Renals

On a test set of US Supreme Court recordings, we show that by leveraging two additional forms of speaker attribute information, derived respectively from the matched training data and the VoxCeleb corpus, we improve the performance of our deep speaker embeddings for both verification and diarization tasks, achieving a relative improvement of 26.2% in DER and 6.7% in EER compared to baselines using speaker labels only.

Multi-Task Learning, Speaker Recognition, +1

Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines

no code implementations • 19 Oct 2020 • David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell, Kathleen McKeown

In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation.

Information Retrieval, POS, +3

Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors

1 code implementation • 8 Sep 2020 • Prathmesh Madhu, Tilman Marquart, Ronak Kosti, Peter Bell, Andreas Maier, Vincent Christlein

These compositions are useful in analyzing the interactions in an image to study artists and their artworks.

Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

1 code implementation • 14 Aug 2020 • Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski

We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation.

Data Augmentation, Domain Adaptation, +1

When Can Self-Attention Be Replaced by Feed Forward Layers?

no code implementations • 28 May 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition.

Speech Recognition

Recognizing Characters in Art History Using Deep Learning

1 code implementation • 31 Mar 2020 • Prathmesh Madhu, Ronak Kosti, Lara Mührenberg, Peter Bell, Andreas Maier, Vincent Christlein

We present experiments and analysis on three different models and show that the model trained on domain-related data gives the best performance for recognizing characters.

DropClass and DropAdapt: Dropping classes for deep speaker representation learning

1 code implementation • 2 Feb 2020 • Chau Luu, Peter Bell, Steve Renals

The first proposed method, DropClass, works via periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks.

General Classification, Representation Learning, +1

Multi-scale Octave Convolutions for Robust Speech Recognition

no code implementations • 31 Oct 2019 • Joanna Rownicka, Peter Bell, Steve Renals

We propose a multi-scale octave convolution layer to learn robust speech representations efficiently.

Robust Speech Recognition

Channel adversarial training for speaker verification and diarization

no code implementations • 25 Oct 2019 • Chau Luu, Peter Bell, Steve Renals

Previous work has encouraged domain-invariance in deep speaker embedding by adversarially classifying the dataset or labelled environment to which the generated features belong.

Speaker Verification

Speaker Adaptive Training using Model Agnostic Meta-Learning

1 code implementation • 23 Oct 2019 • Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals

Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions.

Meta-Learning

Embeddings for DNN speaker adaptive training

no code implementations • 30 Sep 2019 • Joanna Rownicka, Peter Bell, Steve Renals

In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT) focusing on a small amount of adaptation data per speaker.

Speaker Recognition

Acoustic Model Adaptation from Raw Waveforms with SincNet

1 code implementation • 30 Sep 2019 • Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals

Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features.

Acoustic Modelling

Top-down training for neural networks

no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.

Speech Recognition

Lattice-Based Unsupervised Test-Time Adaptation of Neural Network Acoustic Models

no code implementations • 27 Jun 2019 • Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals

Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions.

Lattice-based lightly-supervised acoustic model training

no code implementations • 30 May 2019 • Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell

This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model.

Speech Recognition, Text Matching

Analyzing deep CNN-based utterance embeddings for acoustic model adaptation

no code implementations • 12 Nov 2018 • Joanna Rownicka, Peter Bell, Steve Renals

We analyze the representations learned by deep CNNs and compare them with deep neural network (DNN) representations and i-vectors, in the context of acoustic model adaptation.

Speech Recognition

Few-shot learning with attention-based sequence-to-sequence models

no code implementations • 8 Nov 2018 • Bertrand Higy, Peter Bell

End-to-end approaches have recently become popular as a means of simplifying the training and deployment of speech recognition systems.

Few-Shot Learning, Speech Recognition

Learning to adapt: a meta-learning approach for speaker adaptation

1 code implementation • 30 Aug 2018 • Ondřej Klejch, Joachim Fainberg, Peter Bell

The performance of automatic speech recognition systems can be improved by adapting an acoustic model to compensate for the mismatch between training and testing conditions, for example by adapting to unseen speakers.

Automatic Speech Recognition, Meta-Learning

The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition

no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang

For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website Aljazeera.net, covering the 10-year period 2000-2011.

Acoustic Modelling, Language Modelling, +1
