Search Results for author: Philip C. Woodland

Found 36 papers, 10 papers with code

FastInject: Injecting Unpaired Text Data into CTC-based ASR training

no code implementations · 14 Dec 2023 · Keqi Deng, Philip C. Woodland

Recently, connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models have achieved impressive results, especially with the development of self-supervised learning.

Tasks: Automatic Speech Recognition (ASR) +2
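The CTC objective referred to above can be illustrated with PyTorch's built-in CTC loss; the sketch below uses made-up tensor shapes and random data purely for illustration and is not taken from the FastInject training setup.

```python
import torch
import torch.nn as nn

# Minimal CTC training-step sketch (illustrative shapes and data, not the paper's setup).
vocab_size = 32                      # output units, with the CTC blank at index 0
ctc_loss = nn.CTCLoss(blank=0)

T, B, U = 100, 4, 20                 # frames, batch size, target length
# Dummy encoder outputs: (time, batch, vocab) log-probabilities.
log_probs = torch.randn(T, B, vocab_size, requires_grad=True).log_softmax(dim=-1)

# Dummy reference token sequences (avoiding the blank index) and their lengths.
targets = torch.randint(1, vocab_size, (B, U))
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), U, dtype=torch.long)

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()                      # gradients flow back towards the encoder outputs
```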

Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition

no code implementations · 19 Nov 2023 · Keqi Deng, Philip C. Woodland

An Auto-regressive Integrate-and-Fire (AIF) mechanism is proposed to generate label-level encoder representations while retaining the low-latency operation required for streaming.

Tasks: Automatic Speech Recognition (ASR) +3
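The integrate-and-fire idea can be pictured as accumulating per-frame weights until a threshold fires one label-level vector; the code below is a generic continuous-integrate-and-fire-style sketch under that assumption, not the paper's AIF mechanism, and all names are illustrative.

```python
import torch

def integrate_and_fire(frames: torch.Tensor, alphas: torch.Tensor, threshold: float = 1.0):
    """Generic integrate-and-fire aggregation (a sketch of the general idea, not the paper's AIF).

    frames: (T, D) frame-level encoder outputs
    alphas: (T,)   non-negative per-frame weights
    Returns a list of label-level vectors, one per threshold crossing.
    """
    label_vectors = []
    acc_weight = 0.0
    acc_vec = torch.zeros(frames.size(1))
    for h_t, a_t in zip(frames, alphas):
        a = float(a_t)
        if acc_weight + a < threshold:
            # keep integrating this frame's contribution
            acc_weight += a
            acc_vec = acc_vec + a * h_t
        else:
            # fire: spend just enough weight to reach the threshold, carry the rest over
            needed = threshold - acc_weight
            label_vectors.append(acc_vec + needed * h_t)
            acc_weight = a - needed
            acc_vec = acc_weight * h_t
    return label_vectors

# Example: 6 frames of 3-dim encodings whose weights sum to ~2 give two label-level vectors.
outs = integrate_and_fire(torch.randn(6, 3), torch.tensor([0.4, 0.3, 0.5, 0.2, 0.4, 0.6]))
print(len(outs))  # 2
```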

Speech-based Slot Filling using Large Language Models

no code implementations · 13 Nov 2023 · Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gašić, Philip C. Woodland

Recently, large language models (LLMs) have shown unprecedented ability across a wide range of language tasks.

Tasks: In-Context Learning, Slot Filling +1

It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation

1 code implementation · 30 Sep 2023 · Wen Wu, Wenlin Chen, Chao Zhang, Philip C. Woodland

Human annotator simulation (HAS) serves as a cost-effective substitute for human evaluation such as data annotation and system assessment.

Tasks: Density Estimation, Meta-Learning

Decoupled Structure for Improved Adaptability of End-to-End Models

no code implementations · 25 Aug 2023 · Keqi Deng, Philip C. Woodland

Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great success by jointly learning acoustic and linguistic information, it still suffers from the effect of domain shifts, thus limiting potential applications.

Tasks: Automatic Speech Recognition (ASR) +3

Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations

1 code implementation · 14 Aug 2023 · Wen Wu, Chao Zhang, Philip C. Woodland

Two metrics are proposed to evaluate AER performance with automatic segmentation based on time-weighted emotion and speaker classification errors.

Tasks: Action Detection, Activity Detection +4
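One way to picture a time-weighted classification error is as the fraction of total audio time carrying an incorrect label; the sketch below illustrates that general idea only (segment format and numbers are invented), not the exact metrics proposed in the paper.

```python
def time_weighted_error(segments):
    """Duration-weighted classification error (illustrative sketch, not the paper's exact metric).

    segments: iterable of (duration_seconds, reference_label, hypothesis_label)
    Returns the fraction of total time carrying an incorrect label.
    """
    total = sum(d for d, _, _ in segments)
    wrong = sum(d for d, ref, hyp in segments if ref != hyp)
    return wrong / total if total > 0 else 0.0

# Example: 10 s in total, 2 s mislabelled -> 0.2
print(time_weighted_error([(4.0, "happy", "happy"),
                           (2.0, "sad", "neutral"),
                           (4.0, "neutral", "neutral")]))
```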

Label-Synchronous Neural Transducer for End-to-End ASR

no code implementations · 6 Jul 2023 · Keqi Deng, Philip C. Woodland

Hence blank tokens are no longer needed and the prediction network can be easily adapted using text data.

Tasks: Domain Adaptation

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data

no code implementations · 4 Jul 2023 · Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland

In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input.

Tasks: Automatic Speech Recognition (ASR) +6

Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression

1 code implementation · 11 Jun 2023 · Wen Wu, Chao Zhang, Philip C. Woodland

In automatic emotion recognition (AER), labels assigned by different human annotators to the same utterance are often inconsistent due to the inherent complexity of emotion and the subjectivity of perception.

Tasks: Attribute, Emotion Recognition +1
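For context, the standard deep evidential regression formulation predicts Normal-Inverse-Gamma parameters from which a point estimate and two kinds of uncertainty are read off; the sketch below shows that standard formulation only, which may differ from the paper's exact parameterisation.

```python
import torch

def nig_uncertainties(gamma, nu, alpha, beta):
    """Prediction plus aleatoric and epistemic uncertainty under a Normal-Inverse-Gamma output.

    Standard deep evidential regression quantities (a sketch of the general technique,
    not necessarily the exact parameterisation used in the paper):
      prediction         = gamma
      aleatoric variance = E[sigma^2] = beta / (alpha - 1)
      epistemic variance = Var[mu]    = beta / (nu * (alpha - 1))
    Requires alpha > 1 and nu > 0.
    """
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return gamma, aleatoric, epistemic

pred, alea, epis = nig_uncertainties(torch.tensor(0.3), torch.tensor(2.0),
                                     torch.tensor(3.0), torch.tensor(1.0))
print(pred.item(), alea.item(), epis.item())  # 0.3, 0.5, 0.25
```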

Can Contextual Biasing Remain Effective with Whisper and GPT-2?

1 code implementation · 2 Jun 2023 · Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland

End-to-end automatic speech recognition (ASR) and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data.

Tasks: Automatic Speech Recognition (ASR) +1

Self-supervised representations in speech-based depression detection

no code implementations · 20 May 2023 · Wen Wu, Chao Zhang, Philip C. Woodland

This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL).

Tasks: Automatic Speech Recognition (ASR) +5

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition

no code implementations · 20 Mar 2023 · Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland

The performance of the student model can be further enhanced when multiple teachers are used jointly, achieving word error rate reductions (WERRs) of 17.5% and 10.6%.

Tasks: Automatic Speech Recognition (ASR) +3
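For reference, relative word error rate reduction (WERR) is the relative drop from a baseline WER; the snippet below shows the arithmetic with placeholder numbers that are not results from the paper.

```python
def werr(baseline_wer: float, new_wer: float) -> float:
    """Relative word error rate reduction, in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Placeholder numbers for illustration only:
print(werr(10.0, 8.25))   # 17.5 (% relative reduction)
```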

Adaptable End-to-End ASR Models using Replaceable Internal LMs and Residual Softmax

no code implementations · 16 Feb 2023 · Keqi Deng, Philip C. Woodland

End-to-end (E2E) automatic speech recognition (ASR) implicitly learns the token sequence distribution of paired audio-transcript training data.

Tasks: Automatic Speech Recognition (ASR) +3

Distribution-based Emotion Recognition in Conversation

1 code implementation · 9 Nov 2022 · Wen Wu, Chao Zhang, Philip C. Woodland

Automatic emotion recognition in conversation (ERC) is crucial for emotion-aware conversational artificial intelligence.

Tasks: Emotion Recognition in Conversation

Biased Self-supervised learning for ASR

no code implementations · 4 Nov 2022 · Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland

Furthermore, this paper proposes a variant of MPPT that allows low-footprint streaming models to be trained effectively by computing the MPPT loss on masked and unmasked frames.

Tasks: Automatic Speech Recognition (ASR) +2
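The idea of computing a loss over both masked and unmasked frames can be sketched with a boolean mask as below; the loss form, names and weighting are illustrative assumptions only and do not reproduce the MPPT objective.

```python
import torch
import torch.nn.functional as F

def frame_prediction_loss(predictions, targets, mask, unmasked_weight=1.0):
    """Sketch of a per-frame prediction loss split over masked and unmasked frames.

    predictions, targets: (B, T, D) frame-level representations
    mask: (B, T) boolean, True where the input frame was masked
    The exact MPPT loss is not reproduced here; this only illustrates computing
    the loss separately on masked vs. unmasked positions and combining the two.
    """
    per_frame = F.mse_loss(predictions, targets, reduction="none").mean(dim=-1)  # (B, T)
    masked_loss = per_frame[mask].mean() if mask.any() else per_frame.new_zeros(())
    unmasked_loss = per_frame[~mask].mean() if (~mask).any() else per_frame.new_zeros(())
    return masked_loss + unmasked_weight * unmasked_loss

# Toy usage with random data:
B, T, D = 2, 50, 64
mask = torch.rand(B, T) < 0.3
print(frame_prediction_loss(torch.randn(B, T, D), torch.randn(B, T, D), mask).item())
```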

End-to-end Spoken Language Understanding with Tree-constrained Pointer Generator

1 code implementation · 29 Oct 2022 · Guangzhi Sun, Chao Zhang, Philip C. Woodland

Specifically, a tree-constrained pointer generator (TCPGen), a powerful and efficient biasing model component, is studied, which leverages a slot shortlist with corresponding entities to extract biasing lists.

Tasks: Intent Classification +6
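TCPGen operates over a prefix tree built from the biasing list; the snippet below shows a plain trie over tokenised entries as a sketch of that data structure, with no claim to match the paper's implementation details.

```python
def build_prefix_tree(biasing_list):
    """Build a simple prefix tree (trie) over tokenised biasing entries.

    biasing_list: iterable of token sequences, e.g. [["play", "list"], ["play", "back"]]
    Each node is a dict mapping the next token to a child node; "<end>" marks a full entry.
    (Illustrative sketch only; TCPGen's actual tree handling is not reproduced here.)
    """
    root = {}
    for tokens in biasing_list:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node["<end>"] = {}
    return root

tree = build_prefix_tree([["san", "francisco"], ["san", "diego"], ["seattle"]])
print(sorted(tree["san"].keys()))  # ['diego', 'francisco']
```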

Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription

no code implementations · 8 Jul 2022 · Xianrui Zheng, Chao Zhang, Philip C. Woodland

Self-supervised-learning-based pre-trained models for speech data, such as Wav2Vec 2.0 (W2V2), have become the backbone of many speech tasks.

Tasks: Action Detection, Activity Detection +3

Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors

no code implementations · 8 Mar 2022 · Wen Wu, Chao Zhang, Xixin Wu, Philip C. Woodland

In this paper, a novel Bayesian training loss based on per-utterance Dirichlet prior distributions is proposed for verbal emotion recognition, which models the uncertainty in one-hot labels created when human annotators assign the same utterance to different emotion classes.

Tasks: Attribute, Emotion Classification +1
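As background, a Dirichlet output layer yields expected class probabilities and a simple uncertainty measure as sketched below; this follows the common evidential/Dirichlet formulation and is not the per-utterance prior training loss proposed in the paper.

```python
import torch

def dirichlet_expected_probs_and_uncertainty(alpha: torch.Tensor):
    """Expected class probabilities and a simple uncertainty measure from Dirichlet parameters.

    alpha: (B, C) positive concentration parameters (e.g. network outputs through softplus + 1).
    Standard Dirichlet/evidential quantities only; the paper's Bayesian training loss is not
    reproduced here.
    """
    strength = alpha.sum(dim=-1, keepdim=True)          # Dirichlet strength per utterance
    expected_probs = alpha / strength                    # E[p] under the Dirichlet
    num_classes = alpha.size(-1)
    uncertainty = num_classes / strength.squeeze(-1)     # common evidential uncertainty measure
    return expected_probs, uncertainty

alpha = torch.tensor([[5.0, 1.0, 1.0, 1.0],    # fairly confident utterance
                      [1.2, 1.1, 1.0, 1.1]])   # near-uniform -> high uncertainty
probs, u = dirichlet_expected_probs_and_uncertainty(alpha)
print(probs, u)
```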

Knowledge Distillation for Neural Transducers from Large Self-Supervised Pre-trained Models

no code implementations · 7 Oct 2021 · Xiaoyu Yang, Qiujia Li, Philip C. Woodland

Self-supervised pre-training is an effective approach to leveraging a large amount of unlabelled data to reduce word error rates (WERs) of automatic speech recognition (ASR) systems.

Tasks: Automatic Speech Recognition (ASR) +3

Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition

no code implementations · 7 Oct 2021 · Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C. Woodland

As end-to-end automatic speech recognition (ASR) models achieve promising performance, various downstream tasks rely on good confidence estimators for these systems.

Tasks: Automatic Speech Recognition (ASR) +2

Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition

no code implementations · 29 Jul 2021 · Xianrui Zheng, Chao Zhang, Philip C. Woodland

Furthermore, on the AMI corpus, the proposed conversion for language prior probabilities enables BERT to obtain an extra 3% relative WERR, and the combination of BERT, GPT and GPT-2 results in further improvements.

Tasks: Automatic Speech Recognition (ASR) +2

Combining Frame-Synchronous and Label-Synchronous Systems for Speech Recognition

1 code implementation · 1 Jul 2021 · Qiujia Li, Chao Zhang, Philip C. Woodland

Commonly used automatic speech recognition (ASR) systems can be classified into frame-synchronous and label-synchronous categories, based on whether the speech is decoded on a per-frame or per-label basis.

Tasks: Automatic Speech Recognition (ASR) +1
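Combining frame-synchronous and label-synchronous systems is often realised by rescoring shared hypotheses with a weighted sum of their log scores; the sketch below shows that generic idea only (the weight and hypothesis format are invented, not the paper's exact combination scheme).

```python
def combine_scores(hypotheses, lam=0.5):
    """Pick the hypothesis with the best interpolated log score.

    hypotheses: list of (text, frame_sync_logprob, label_sync_logprob)
    lam: interpolation weight between the two systems (illustrative value only).
    Shows the generic log-linear combination idea, not the paper's exact scheme.
    """
    return max(hypotheses,
               key=lambda h: (1.0 - lam) * h[1] + lam * h[2])[0]

best = combine_scores([("the cat sat", -12.3, -10.1),
                       ("the cats at", -11.8, -13.5)], lam=0.5)
print(best)  # "the cat sat"
```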

A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training

no code implementations · 12 Mar 2021 · Adnan Haider, Chao Zhang, Florian L. Kreyssig, Philip C. Woodland

This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training that can operate efficiently in a distributed manner.

Tasks: Automatic Speech Recognition (ASR) +2

Emotion recognition by fusing time synchronous and time asynchronous representations

no code implementations · 27 Oct 2020 · Wen Wu, Chao Zhang, Philip C. Woodland

In this paper, a novel two-branch neural network model structure is proposed for multimodal emotion recognition, which consists of a time synchronous branch (TSB) and a time asynchronous branch (TAB).

Tasks: Automatic Speech Recognition (ASR) +6

Improved Large-margin Softmax Loss for Speaker Diarisation

no code implementations · 10 Nov 2019 · Yassir Fathullah, Chao Zhang, Philip C. Woodland

Speaker diarisation systems nowadays use embeddings generated from speech segments in a bottleneck layer, which need to be discriminative for unseen speakers.
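A common large-margin softmax variant for such embeddings is the additive-margin softmax sketched below; it is shown only to illustrate the general technique and is not necessarily the improved loss proposed in the paper.

```python
import torch
import torch.nn.functional as F

def additive_margin_softmax_loss(embeddings, weights, labels, scale=30.0, margin=0.2):
    """Additive-margin softmax on L2-normalised speaker embeddings.

    embeddings: (B, D) bottleneck embeddings, weights: (C, D) speaker weight vectors, labels: (B,)
    One common large-margin softmax variant, illustrating how embeddings are made more
    discriminative; not the paper's exact loss.
    """
    emb = F.normalize(embeddings, dim=-1)
    w = F.normalize(weights, dim=-1)
    cosine = emb @ w.t()                                   # (B, C) cosine similarities
    one_hot = F.one_hot(labels, num_classes=w.size(0)).float()
    logits = scale * (cosine - margin * one_hot)           # subtract the margin from the target class
    return F.cross_entropy(logits, labels)

loss = additive_margin_softmax_loss(torch.randn(8, 128), torch.randn(200, 128),
                                    torch.randint(0, 200, (8,)))
print(loss.item())
```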

Discriminative Neural Clustering for Speaker Diarisation

1 code implementation · 22 Oct 2019 · Qiujia Li, Florian L. Kreyssig, Chao Zhang, Philip C. Woodland

In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem.

Tasks: Clustering, Data Augmentation

Integrating Source-channel and Attention-based Sequence-to-sequence Models for Speech Recognition

no code implementations · 14 Sep 2019 · Qiujia Li, Chao Zhang, Philip C. Woodland

This paper proposes a novel automatic speech recognition (ASR) framework called Integrated Source-Channel and Attention (ISCA) that combines the advantages of traditional systems based on the noisy source-channel model (SC) and end-to-end style systems using attention-based sequence-to-sequence models.

Tasks: Automatic Speech Recognition (ASR) +1

Sequence Training of DNN Acoustic Models With Natural Gradient

no code implementations · 6 Apr 2018 · Adnan Haider, Philip C. Woodland

Deep Neural Network (DNN) acoustic models often use discriminative sequence training that optimises an objective function that better approximates the word error rate (WER) than frame-based training.

Tasks: Computational Efficiency

Very Deep Convolutional Neural Networks for Robust Speech Recognition

2 code implementations · 2 Oct 2016 · Yanmin Qian, Philip C. Woodland

On the Aurora 4 task, the very deep CNN achieves a WER of 8.81%, which is further reduced to 7.99% with auxiliary feature joint training and to 7.09% with LSTM-RNN joint decoding.

Tasks: Robust Speech Recognition, Speech Recognition
