Search Results for author: Xunying Liu

Found 38 papers, 8 papers with code

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

no code implementations13 May 2022 Zengrui Jin, Mengzhe Geng, Jiajun Deng, Tianzi Wang, Shujie Hu, Guinan Li, Xunying Liu

Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains highly challenging tasks to date.

Automatic Speech Recognition Data Augmentation

Audio-visual multi-channel speech separation, dereverberation and recognition

no code implementations5 Apr 2022 Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng

Despite the rapid advance of automatic speech recognition (ASR) technologies, accurate recognition of cocktail party speech characterised by the interference from overlapping speakers, background noise and room reverberation remains a highly challenging task to date.

Automatic Speech Recognition Speech Enhancement +1

Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition

no code implementations19 Mar 2022 Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng

Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech.

Automatic Speech Recognition Data Augmentation

Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

no code implementations21 Feb 2022 Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng

Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectrotemporal subspace basis deep embedding features derived using SVD speech spectrum decomposition are proposed in this paper to facilitate auxiliary feature based speaker adaptation of state-of-the-art hybrid DNN/TDNN and end-to-end Conformer speech recognition systems.

Automatic Speech Recognition

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

no code implementations18 Feb 2022 Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng

The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information, and the secondary task applies adversarial training to avoid the incorporation of abnormal speaking patterns into the reconstructed speech, by regularizing the distribution of reconstructed speech to be close to that of reference speech with high quality.

Multi-Task Learning Speaker Verification

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

no code implementations18 Feb 2022 Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng

Though significant progress has been made for speaker-dependent Video-to-Speech (VTS) synthesis, little attention is devoted to multi-speaker VTS that can map silent video to speech, while allowing flexible control of speaker identity, all in a single system.

Quantization Speech Synthesis +2

Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition

no code implementations24 Jan 2022 Xurong Xie, Xiang Sui, Xunying Liu, Lan Wang

Meanwhile, approaches of multi-accent modelling including multi-style training, multi-accent decision tree state tying, DNN tandem and multi-level adaptive network (MLAN) tandem hidden Markov model (HMM) modelling are combined and compared in this paper.

Acoustic Modelling Speech Recognition

Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition

no code implementations24 Jan 2022 Xurong Xie, Rukiye Ruzi, Xunying Liu, Lan Wang

Dysarthric speech recognition is a challenging task due to acoustic variability and limited amount of available data.

Speech Recognition

Recent Progress in the CUHK Dysarthric Speech Recognition System

no code implementations15 Jan 2022 Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng

Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date.

Audio-Visual Speech Recognition Automatic Speech Recognition +2

Investigation of Data Augmentation Techniques for Disordered Speech Recognition

no code implementations14 Jan 2022 Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng

This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation.

Data Augmentation Speech Recognition

Mixed Precision of Quantization of Transformer Language Models for Speech Recognition

no code implementations29 Nov 2021 Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng

Experiments conducted on Penn Treebank (PTB) and a Switchboard corpus trained LF-MMI TDNN system suggest the proposed mixed precision Transformer quantization techniques achieved model size compression ratios of up to 16 times over the full precision baseline with no recognition performance degradation.

Quantization Speech Recognition

Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition

no code implementations29 Nov 2021 Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng

In order to overcome the difficulty in using gradient descent methods to directly estimate discrete quantized weights, alternating direction methods of multipliers (ADMM) are used to efficiently train quantized LMs.

Neural Architecture Search Quantization +1

Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks

2 code implementations19 Jul 2021 Xu Li, Xixin Wu, Hui Lu, Xunying Liu, Helen Meng

This argument motivates the current work that presents a novel, channel-wise gated Res2Net (CG-Res2Net), which modifies Res2Net to enable a channel-wise gating mechanism in the connection between feature groups.

Speaker Verification

Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization

no code implementations18 Jun 2021 Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

Such systems are particularly susceptible to domain mismatch where the training and testing data come from the source and target domains respectively, but the two domains may differ in terms of speech stimuli, disease etiology, etc.

Multi-Task Learning Unsupervised Domain Adaptation

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

1 code implementation18 Jun 2021 Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement.

Disentanglement Quantization +1

Bayesian Transformer Language Models for Speech Recognition

no code implementations9 Feb 2021 Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

Performance improvements were also obtained on a cross domain LM adaptation task requiring porting a Transformer LM trained on the Switchboard and Fisher data to a low-resource DementiaBank elderly speech corpus.

Speech Recognition Variational Inference

Bayesian Learning for Deep Neural Network Adaptation

1 code implementation14 Dec 2020 Xurong Xie, Xunying Liu, Tan Lee, Lan Wang

A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.

Speech Recognition Variational Inference

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition

no code implementations8 Dec 2020 Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

On a third cross domain adaptation task requiring rapidly porting a 1000 hour LibriSpeech data trained system to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2. 5\% absolute WER reduction.

Automatic Speech Recognition Domain Adaptation +1

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

no code implementations3 Nov 2020 Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng

Third, a conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech, conditioned on the target DSE that is learned via speaker encoder or speaker adaptation.

Speech Recognition Voice Conversion

Replay and Synthetic Speech Detection with Res2net Architecture

2 code implementations28 Oct 2020 Xu Li, Na Li, Chao Weng, Xunying Liu, Dan Su, Dong Yu, Helen Meng

This multiple scaling mechanism significantly improves the countermeasure's generalizability to unseen spoofing attacks.

Feature Engineering Synthetic Speech Detection

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling

no code implementations6 Sep 2020 Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng

During the training stage, an encoder-decoder-based hybrid connectionist-temporal-classification-attention (CTC-attention) phoneme recognizer is trained, whose encoder has a bottle-neck layer.

Speech Recognition Voice Conversion

Understanding the wiring evolution in differentiable neural architecture search

1 code implementation2 Sep 2020 Sirui Xie, Shoukang Hu, Xinjiang Wang, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin

To this end, we pose questions that future differentiable methods for neural wiring discovery need to confront, hoping to evoke a discussion and rethinking on how much bias has been enforced implicitly in existing NAS methods.

Neural Architecture Search

Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

no code implementations17 Jul 2020 Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng

Deep neural networks (DNNs) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation.

Automatic Speech Recognition Neural Architecture Search

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

no code implementations11 Jun 2020 Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng

Orthogonal to prior approaches, this work proposes to defend ASV systems against adversarial attacks with a separate detection network, rather than augmenting adversarial data into ASV training.

Data Augmentation Speaker Verification

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

no code implementations8 Apr 2020 Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

Our experiment results indicate that the DNN x-vector system could benefit from BNNs especially when the mismatch problem is severe for evaluations using out-of-domain data.

Speaker Verification

DSNAS: Direct Neural Architecture Search without Parameter Retraining

1 code implementation CVPR 2020 Shoukang Hu, Sirui Xie, Hehui Zheng, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin

We argue that given a computer vision task for which a NAS method is expected, this definition can reduce the vaguely-defined NAS evaluation to i) accuracy of this task and ii) the total computation consumed to finally obtain a model with satisfying accuracy.

Neural Architecture Search

Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech

no code implementations1 Feb 2020 Xu Li, Xixin Wu, Xunying Liu, Helen Meng

And then we explore the non-categories by looking for the SPPGs with more than one peak.

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

no code implementations6 Jan 2020 Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio only baseline LF-MMI DNN system by up to 29. 98\% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.

Ranked #7 on Lipreading on LRS2 (using extra training data)

Audio-Visual Speech Recognition Lipreading +2

Adversarial Attacks on GMM i-vector based Speaker Verification Systems

1 code implementation8 Nov 2019 Xu Li, Jinghua Zhong, Xixin Wu, Jianwei Yu, Xunying Liu, Helen Meng

Experiment results show that GMM i-vector systems are seriously vulnerable to adversarial attacks, and the crafted adversarial samples prove to be transferable and pose threats to neuralnetwork speaker embedding based systems (e. g. x-vector systems).

Speaker Verification

Future Word Contexts in Neural Network Language Models

no code implementations18 Aug 2017 Xie Chen, Xunying Liu, Anton Ragni, Yu Wang, Mark Gales

Instead of using a recurrent unit to capture the complete future word contexts, a feedforward unit is used to model a finite number of succeeding, future, words.

Speech Recognition

Cannot find the paper you are looking for? You can Submit a new open access paper.