Search Results for author: Peter Bell

Found 46 papers, 11 papers with code

The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition

no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang

For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website, Aljazeera.net, covering the period 2000-2011.

Acoustic Modelling, Language Modelling

Learning to adapt: a meta-learning approach for speaker adaptation

1 code implementation • 30 Aug 2018 • Ondřej Klejch, Joachim Fainberg, Peter Bell

The performance of automatic speech recognition systems can be improved by adapting an acoustic model to compensate for the mismatch between training and testing conditions, for example by adapting to unseen speakers.

Automatic Speech Recognition (ASR)

Few-shot learning with attention-based sequence-to-sequence models

no code implementations • 8 Nov 2018 • Bertrand Higy, Peter Bell

End-to-end approaches have recently become popular as a means of simplifying the training and deployment of speech recognition systems.

Few-Shot Learning, Speech Recognition

Analyzing deep CNN-based utterance embeddings for acoustic model adaptation

no code implementations • 12 Nov 2018 • Joanna Rownicka, Peter Bell, Steve Renals

We analyze the representations learned by deep CNNs and compare them with deep neural network (DNN) representations and i-vectors, in the context of acoustic model adaptation.

Speech Recognition

Lattice-based lightly-supervised acoustic model training

no code implementations • 30 May 2019 • Joachim Fainberg, Ondřej Klejch, Steve Renals, Peter Bell

This text data can be used for lightly supervised training, in which text matching the audio is selected using an existing speech recognition model.

Language Modelling, Speech Recognition

Lattice-Based Unsupervised Test-Time Adaptation of Neural Network Acoustic Models

no code implementations • 27 Jun 2019 • Ondrej Klejch, Joachim Fainberg, Peter Bell, Steve Renals

Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions.

Test-time Adaptation

Top-down training for neural networks

no code implementations • 25 Sep 2019 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Interpreting the top layers as a classifier and the lower layers as a feature extractor, one can hypothesize that unwanted network convergence may occur when the classifier has overfit with respect to the feature extractor.

Speech Recognition

Acoustic Model Adaptation from Raw Waveforms with SincNet

1 code implementation • 30 Sep 2019 • Joachim Fainberg, Ondřej Klejch, Erfan Loweimi, Peter Bell, Steve Renals

Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features.

Acoustic Modelling

Embeddings for DNN speaker adaptive training

no code implementations • 30 Sep 2019 • Joanna Rownicka, Peter Bell, Steve Renals

In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT) focusing on a small amount of adaptation data per speaker.

Speaker Recognition

Speaker Adaptive Training using Model Agnostic Meta-Learning

1 code implementation • 23 Oct 2019 • Ondřej Klejch, Joachim Fainberg, Peter Bell, Steve Renals

Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions.

Meta-Learning

Channel adversarial training for speaker verification and diarization

no code implementations • 25 Oct 2019 • Chau Luu, Peter Bell, Steve Renals

Previous work has encouraged domain-invariance in deep speaker embedding by adversarially classifying the dataset or labelled environment to which the generated features belong.

Speaker Verification

DropClass and DropAdapt: Dropping classes for deep speaker representation learning

1 code implementation • 2 Feb 2020 • Chau Luu, Peter Bell, Steve Renals

The first proposed method, DropClass, works via periodically dropping a random subset of classes from the training data and the output layer throughout training, resulting in a feature extractor trained on many different classification tasks.

General Classification, Representation Learning

Recognizing Characters in Art History Using Deep Learning

1 code implementation • 31 Mar 2020 • Prathmesh Madhu, Ronak Kosti, Lara Mührenberg, Peter Bell, Andreas Maier, Vincent Christlein

We present experiments and analysis on three different models and show that the model trained on domain-related data gives the best performance for recognizing characters.

When Can Self-Attention Be Replaced by Feed Forward Layers?

no code implementations • 28 May 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition.

Speech Recognition

Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

1 code implementation • 14 Aug 2020 • Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals, Pawel Swietojanski

We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation.

Data Augmentation, Domain Adaptation

Understanding Compositional Structures in Art Historical Images using Pose and Gaze Priors

1 code implementation • 8 Sep 2020 • Prathmesh Madhu, Tilman Marquart, Ronak Kosti, Peter Bell, Andreas Maier, Vincent Christlein

These compositions are useful in analyzing the interactions in an image to study artists and their artworks.

Leveraging speaker attribute information using multi task learning for speaker verification and diarization

1 code implementation • 27 Oct 2020 • Chau Luu, Peter Bell, Steve Renals

On a test set of US Supreme Court recordings, we show that by leveraging two additional forms of speaker attribute information, derived from the matched training data and the VoxCeleb corpus respectively, we improve the performance of our deep speaker embeddings for both verification and diarization tasks, achieving a relative improvement of 26.2% in DER and 6.7% in EER compared to baselines using speaker labels only.

Attribute, Multi-Task Learning

On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers

no code implementations • 8 Nov 2020 • Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results.

Automatic Speech Recognition (ASR)

Enhancing Human Pose Estimation in Ancient Vase Paintings via Perceptually-grounded Style Transfer Learning

1 code implementation • 10 Dec 2020 • Prathmesh Madhu, Angel Villar-Corrales, Ronak Kosti, Torsten Bendschus, Corinna Reinhardt, Peter Bell, Andreas Maier, Vincent Christlein

(2) To improve the already strong results further, we created a small dataset (ClassArch) consisting of ancient Greek vase paintings from the 6-5th century BCE with person and pose annotations.

Image Retrieval, Pose Estimation

Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers

no code implementations • 9 Feb 2021 • Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset.

Automatic Speech Recognition (ASR)

Segmenting Subtitles for Correcting ASR Segmentation Errors

no code implementations • EACL 2021 • David Wan, Chris Kedzie, Faisal Ladhak, Elsbeth Turcan, Petra Galuščáková, Elena Zotkina, Zhengping Jiang, Peter Bell, Kathleen McKeown

Typical ASR systems segment the input audio into utterances using purely acoustic information, which may not resemble the sentence-like units that are expected by conventional machine translation (MT) systems for Spoken Language Translation.

Information Retrieval, Machine Translation

It's not what you said, it's how you said it: discriminative perception of speech as a multichannel communication system

no code implementations • 1 May 2021 • Sarenne Wallbridge, Peter Bell, Catherine Lai

People convey information extremely effectively through spoken interaction using multiple channels of information transmission: the lexical channel of what is said, and the non-lexical channel of how it is said.

Fusing ASR Outputs in Joint Training for Speech Emotion Recognition

no code implementations • 29 Oct 2021 • Yuanchao Li, Peter Bell, Catherine Lai

However, due to the scarcity of emotion labelled data and the difficulty of recognizing emotional speech, it is hard to obtain reliable linguistic features and models in this research area.

Automatic Speech Recognition (ASR)

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

no code implementations • 12 Nov 2021 • Ondrej Klejch, Electra Wallington, Peter Bell

We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question.

Cross-Lingual ASR, Cross-Lingual Transfer

Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints

no code implementations • 15 Dec 2021 • Christoph Minixhofer, Ondřej Klejch, Peter Bell

In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows.

Classification

ICC++: Explainable Image Retrieval for Art Historical Corpora using Image Composition Canvas

no code implementations • 22 Jun 2022 • Prathmesh Madhu, Tilman Marquart, Ronak Kosti, Dirk Suckow, Peter Bell, Andreas Maier, Vincent Christlein

In this work, we present a novel approach called Image Composition Canvas (ICC++) to compare and retrieve images having similar compositional elements.

Image Retrieval, Retrieval

Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora

no code implementations • 5 Oct 2022 • Yuanchao Li, Yumnah Mohamied, Peter Bell, Catherine Lai

Self-supervised speech models have grown fast during the past few years and have proven feasible for use in various downstream tasks.

Emotion Recognition

Evaluating and reducing the distance between synthetic and real speech distributions

no code implementations • 29 Nov 2022 • Christoph Minixhofer, Ondřej Klejch, Peter Bell

While modern Text-to-Speech (TTS) systems can produce natural-sounding speech, they remain unable to reproduce the full diversity found in natural speech data.

Automatic Speech Recognition (ASR)

Transfer Learning for Olfactory Object Detection

no code implementations • 24 Jan 2023 • Mathias Zinnen, Prathmesh Madhu, Peter Bell, Andreas Maier, Vincent Christlein

We investigate the effect of style and category similarity in multiple datasets used for object detection pretraining.

Object, Object Detection

ODOR: The ICPR2022 ODeuropa Challenge on Olfactory Object Recognition

no code implementations • 24 Jan 2023 • Mathias Zinnen, Prathmesh Madhu, Ronak Kosti, Peter Bell, Andreas Maier, Vincent Christlein

The Odeuropa Challenge on Olfactory Object Recognition aims to foster the development of object detection in the visual arts and to promote an olfactory perspective on digital heritage.

Domain Adaptation, Few-Shot Learning

Explanations for Automatic Speech Recognition

no code implementations • 27 Feb 2023 • Xiaoliang Wu, Peter Bell, Ajitha Rajan

We address quality assessment for neural network based ASR by providing explanations that help increase our understanding of the system and ultimately help build trust in the system.

Automatic Speech Recognition, Explainable Artificial Intelligence (XAI)

The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR

no code implementations • 31 Mar 2023 • Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, Ondrej Klejch, Peter Bell

Despite the many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported on test datasets which fail to represent the diversity of English as spoken today around the globe.

Automatic Speech Recognition (ASR)

ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

no code implementations • 25 May 2023 • Yuanchao Li, Zeyu Zhao, Ondrej Klejch, Peter Bell, Catherine Lai

To overcome this challenge, we investigate how Automatic Speech Recognition (ASR) performs on emotional speech by analyzing the ASR performance on emotion corpora and examining the distribution of word errors and confidence scores in ASR transcripts to gain insight into how emotion affects ASR.

Automatic Speech Recognition (ASR)

Transfer Learning for Personality Perception via Speech Emotion Recognition

no code implementations • 25 May 2023 • Yuanchao Li, Peter Bell, Catherine Lai

In this work, we investigate the relationship between two affective attributes: personality and emotion, from a transfer learning perspective.

Speech Emotion Recognition, Transfer Learning

Can We Trust Explainable AI Methods on ASR? An Evaluation on Phoneme Recognition

no code implementations • 29 May 2023 • Xiaoliang Wu, Peter Bell, Ajitha Rajan

Explainable AI (XAI) techniques have been widely used to help explain and understand the output of deep learning models in fields such as image classification and Natural Language Processing.

Automatic Speech Recognition (ASR)

Quantifying the perceptual value of lexical and non-lexical channels in speech

no code implementations • 7 Jul 2023 • Sarenne Wallbridge, Peter Bell, Catherine Lai

Speech is a fundamental means of communication that can be seen to provide two channels for transmitting information: the lexical channel of which words are said, and the non-lexical channel of how they are spoken.

Improving Code-switched ASR with Linguistic Information

no code implementations • COLING 2022 • Jie Chi, Peter Bell

This paper seeks to improve the performance of automatic speech recognition (ASR) systems operating on code-switched speech.

Automatic Speech Recognition (ASR)
