Search Results for author: Xiaodan Zhuang

Found 10 papers, 0 papers with code

Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition

no code implementations • 16 Jan 2025 • Takaaki Hori, Martin Kocour, Adnan Haider, Erik McDermott, Xiaodan Zhuang

We propose "delayed fusion," which applies LLM scores to ASR hypotheses with a delay during decoding and enables easier use of pre-trained LLMs in ASR tasks.

Automatic Speech Recognition speech-recognition +1

Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval

no code implementations • 1 Nov 2024 • Nikolaos Flemotomos, Roger Hsiao, Pawel Swietojanski, Takaaki Hori, Dogan Can, Xiaodan Zhuang

However, the biasing mechanism is typically based on a cross-attention module between the audio and a catalogue of biasing entries, which means computational complexity can pose severe practical limitations on the size of the biasing catalogue and consequently on accuracy improvements.

Quantization Retrieval +2

Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models

no code implementations • 23 Aug 2024 • Adnan Haider, Xingyu Na, Erik McDermott, Tim Ng, Zhen Huang, Xiaodan Zhuang

This paper introduces a novel training framework called Focused Discriminative Training (FDT) to further improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models trained using either CTC or an interpolation of CTC and attention-based encoder-decoder (AED) loss.

Automatic Speech Recognition (ASR) +4

Optimizing Byte-level Representation for End-to-end ASR

no code implementations • 14 Jun 2024 • Roger Hsiao, Liuhui Deng, Erik McDermott, Ruchir Travadi, Xiaodan Zhuang

Byte-level representation is often used by large-scale multilingual ASR systems when the character set of the supported languages is large.

Automatic Speech Recognition (ASR) +2

Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

no code implementations • 18 Apr 2023 • Maurits Bleeker, Pawel Swietojanski, Stefan Braun, Xiaodan Zhuang

By including approximate nearest neighbour phrases (ANN-P) in the context list, we encourage the learned representation to disambiguate between similar, but not identical, biasing phrases.

Speech Recognition

Exploring Retraining-Free Speech Recognition for Intra-sentential Code-Switching

no code implementations • 27 Aug 2021 • Zhen Huang, Xiaodan Zhuang, Daben Liu, Xiaoqiang Xiao, Yuchen Zhang, Sabato Marco Siniscalchi

To achieve such an ambitious goal, new mechanisms for foreign pronunciation generation and language model (LM) enrichment have been devised.

Language Modeling +2

Frame-level SpecAugment for Deep Convolutional Neural Networks in Hybrid ASR Systems

no code implementations • 7 Dec 2020 • Xinwei Li, Yuanyuan Zhang, Xiaodan Zhuang, Daben Liu

We demonstrate that f-SpecAugment is more effective than utterance-level SpecAugment for deep CNN-based hybrid models.

Data Augmentation

Zero-shot Event Detection using Multi-modal Fusion of Weakly Supervised Concepts

no code implementations • CVPR 2014 • Shuang Wu, Sravanthi Bondugula, Florian Luisier, Xiaodan Zhuang, Pradeep Natarajan

Current state-of-the-art systems for visual content analysis require large training sets for each class of interest, and performance degrades rapidly with fewer examples.

Attribute Event Detection
