Search Results for author: Helen Meng

Found 129 papers, 32 papers with code

On Controlling Fallback Responses for Grounded Dialogue Generation

no code implementations • Findings (ACL) 2022 • Hongyuan Lu, Wai Lam, Hong Cheng, Helen Meng

We propose a novel framework that automatically generates a control token with the generator to bias the succeeding response towards informativeness for answerable contexts and fallback for unanswerable contexts in an end-to-end manner.

Dialogue Generation Informativeness

Partner Personas Generation for Dialogue Response Generation

no code implementations • NAACL 2022 • Hongyuan Lu, Wai Lam, Hong Cheng, Helen Meng

Incorporating personas information allows diverse and engaging responses in dialogue response generation.

reinforcement-learning Reinforcement Learning (RL) +1

Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout

no code implementations • dialdoc (ACL) 2022 • Kun Li, Tianhua Zhang, Liping Tang, Junan Li, Hongyuan Lu, Xixin Wu, Helen Meng

For the response generator, we use grounding span prediction as an auxiliary task to be jointly trained with the main task of response generation.

Dialogue Generation Response Generation

Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis

no code implementations • COLING 2022 • Xueyuan Chen, Shun Lei, Zhiyong Wu, Dong Xu, Weifeng Zhao, Helen Meng

On top of these, a bi-reference attention mechanism is used to align both local-scale reference style embedding sequence and local-scale context style embedding sequence with corresponding phoneme embedding sequence.

Speech Synthesis

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

no code implementations • 14 Feb 2024 • Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng

Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., "hallucinations", even when they hold relevant knowledge.

UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization

no code implementations • 26 Jan 2024 • Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng

Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric speech into normal-sounding speech.

Domain Adaptation Generative Adversarial Network +1

SCNet: Sparse Compression Network for Music Source Separation

no code implementations • 24 Jan 2024 • Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng

We use a higher compression ratio on subbands with less information to improve the information density and focus on modeling subbands with more information.

Music Source Separation

Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation

no code implementations • 15 Jan 2024 • Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng

Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, among which some works have yielded outstanding results and attracted considerable attention.

Music Generation

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

no code implementations • 8 Jan 2024 • Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng

To the best of our knowledge, this work represents an early effort to integrate SIMO and SISO for multi-talker speech recognition.

speech-recognition Speech Recognition

Consistent and Relevant: Rethink the Query Embedding in General Sound Separation

no code implementations • 24 Dec 2023 • Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng

In this paper, we present CaRE-SEP, a consistent and relevant embedding network for general sound separation to encourage a comprehensive reconsideration of query usage in audio separation.

StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis

no code implementations • 19 Dec 2023 • Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng

Both objective and subjective evaluations demonstrate that our proposed method can effectively improve the naturalness and expressiveness of the synthesized speech in audiobook synthesis especially for the role and out-of-domain scenarios.

Speech Synthesis

SimCalib: Graph Neural Network Calibration based on Similarity between Nodes

no code implementations • 19 Dec 2023 • Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, Shun Lei, Helen Meng

A novel calibration framework, named SimCalib, is accordingly proposed to consider similarity between nodes at global and local levels.

Injecting linguistic knowledge into BERT for Dialogue State Tracking

no code implementations • 27 Nov 2023 • Xiaohan Feng, Xixin Wu, Helen Meng

This correlation facilitates a comprehensive understanding of the linguistic features influencing the DST model's decision-making process.

Decision Making Dialogue State Tracking

DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification

no code implementations • 18 Oct 2023 • Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou, Helen Meng

Existing augmentation methods for speaker verification manipulate the raw signal, which are time-consuming and the augmented samples lack diversity.

Data Augmentation Speaker Verification

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

1 code implementation • 19 Sep 2023 • Tianhua Zhang, Jiaxin Ge, Hongyin Luo, Yung-Sung Chuang, Mingye Gao, Yuan Gong, Xixin Wu, Yoon Kim, Helen Meng, James Glass

How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning?

Instruction Following Language Modelling +5

SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias

no code implementations • 14 Sep 2023 • Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng

However, it is still challenging to train a universal vocoder which can generalize well to out-of-domain (OOD) scenarios, such as unseen speaking styles, non-speech vocalization, singing, and musical pieces.

Audio Synthesis Generative Adversarial Network +1

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

no code implementations • 4 Sep 2023 • Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng

Mapping two modalities, speech and text, into a shared representation space, is a research topic of using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

no code implementations • 4 Sep 2023 • Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng

Recently, excellent progress has been made in speech recognition.

Domain Generalization speech-recognition +1

Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training

no code implementations • 1 Sep 2023 • Shaohuan Zhou, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng

Specifically, in the pre-training step, we design a phoneme predictor to produce the frame-level phoneme probability vectors as the phonemic timing information and a speaker encoder to model the timbre variations of different singers, and directly estimate the frame-level f0 values from the audio to provide the pitch information.

Singing Voice Synthesis Unsupervised Pre-training

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

no code implementations • 31 Aug 2023 • Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng

This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS quality with lower supervised data requirements via Vector-Quantized Self-Supervised Speech Representation Learning (VQ-S3RL) utilizing more unlabeled speech audio.

Representation Learning Speech Synthesis +2

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

no code implementations • 31 Aug 2023 • Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng

This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice.

Singing Voice Synthesis

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis

no code implementations • 31 Aug 2023 • Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng

The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style.

Expressive Speech Synthesis Sentence +1

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

no code implementations • 31 Aug 2023 • Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech.

Multi-Task Learning

Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?

no code implementations • 29 Aug 2023 • Jingyan Zhou, Minda Hu, Junan Li, Xiaoying Zhang, Xixin Wu, Irwin King, Helen Meng

Our analysis exhibits the potentials and flaws in existing resources (models and datasets) in developing explainable moral judgment-making systems.

Ethics

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

no code implementations • 6 Jul 2023 • Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu

Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date.

Speech Dereverberation Speech Separation

Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition

no code implementations • 27 Jun 2023 • Tianzi Wang, Shoukang Hu, Jiajun Deng, Zengrui Jin, Mengzhe Geng, Yi Wang, Helen Meng, Xunying Liu

Automatic recognition of disordered and elderly speech remains highly challenging tasks to date due to data scarcity.

Domain Adaptation speech-recognition +1

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

no code implementations • 25 May 2023 • Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng

Extending on this, we incorporate a diarization branch into the Sidecar, allowing for unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

SAIL: Search-Augmented Instruction Learning

no code implementations • 24 May 2023 • Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass

Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information.

Denoising Fact Checking +3

The defender's perspective on automatic speaker verification: An overview

no code implementations • 22 May 2023 • Haibin Wu, Jiawen Kang, Lingwei Meng, Helen Meng, Hung-Yi Lee

Automatic speaker verification (ASV) plays a critical role in security-sensitive environments.

Speaker Verification

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

no code implementations • 16 May 2023 • Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng

Experimental results show that our proposed method achieves comparable similarity and higher naturalness than the supervised method, which needs a huge amount of annotated corpora for training and is applicable to improve similarity for VC methods with other SSL representations as input.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting

no code implementations • 15 May 2023 • Xiaoying Zhang, Baolin Peng, Kun Li, Jingyan Zhou, Helen Meng

Building end-to-end task bots and maintaining their integration with new functionalities using minimal human efforts is a long-standing challenge in dialog research.

dialog state tracking

CB-Conformer: Contextual biasing Conformer for biased word recognition

1 code implementation • 19 Apr 2023 • Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng

In this work, we propose CB-Conformer to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer.

Automatic Speech Recognition Language Modelling +2

Interpretable Unified Language Checking

1 code implementation • 7 Apr 2023 • Tianhua Zhang, Hongyin Luo, Yung-Sung Chuang, Wei Fang, Luc Gaitskell, Thomas Hartvigsen, Xixin Wu, Danny Fox, Helen Meng, James Glass

Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge.

Fact Checking Fairness +2

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition

1 code implementation • 14 Mar 2023 • Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li, Xunying Liu, Helen Meng

Experimental results based on the ACII Challenge 2022 dataset demonstrate the superior performance of the proposed system and the effectiveness of considering multiple relationships using hierarchical regression chain models.

Ranked #1 on Vocal Bursts Intensity Prediction on HUME-VB

A-VB Culture A-VB High +6

Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

no code implementations • 14 Mar 2023 • Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng

This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.

Alzheimer's Disease Detection Binary Classification

Decision Support System for Chronic Diseases Based on Drug-Drug Interactions

1 code implementation • 4 Mar 2023 • Tian Bian, Yuli Jiang, Jia Li, Tingyang Xu, Yu Rong, Yi Su, Timothy Kwok, Helen Meng, Hong Cheng

Many patients with chronic diseases resort to multiple medications to relieve various symptoms, which raises concerns about the safety of multiple medication use, as severe drug-drug antagonism can lead to serious adverse effects or even death.

counterfactual Representation Learning

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

no code implementations • 28 Feb 2023 • Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest TDNN and Conformer ASR systems integrated domain adapted wav2vec2.0 models consistently outperform the standalone wav2vec2.0 models by statistically significant WER reductions of 8.22% and 3.43% absolute (26.71% and 15.88% relative) on the two tasks respectively.

speech-recognition Speech Recognition

A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One

no code implementations • 20 Feb 2023 • Lingwei Meng, Jiawen Kang, Mingyu Cui, Yuejiao Wang, Xixin Wu, Helen Meng

Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains challenging.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition

no code implementations • 2 Feb 2023 • Holam Chung, Junan Li, Pengfei Liu, Wai-Kim Leung, Xixin Wu, Helen Meng

Homophone characters are common in tonal syllable-based languages, such as Mandarin and Cantonese.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection

1 code implementation • 29 Oct 2022 • Yi Wang, Jiajun Deng, Tianzi Wang, Bo Zheng, Shoukang Hu, Xunying Liu, Helen Meng

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and to delay further progression.

Alzheimer's Disease Detection Language Modelling +1

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

1 code implementation • 27 Oct 2022 • Haohan Guo, Fenglong Xie, Xixin Wu, Hui Lu, Helen Meng

Moreover, we optimize the training strategy by leveraging more audio to learn MSMCRs better for low-resource languages.

Transfer Learning

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using $β$-VAE

no code implementations • 25 Oct 2022 • Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng

We propose an unsupervised learning method to disentangle speech into content representation and speaker identity representation.

Disentanglement Voice Conversion

Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation

no code implementations • 7 Oct 2022 • Liping Tang, Zhen Li, ZhiQuan Luo, Helen Meng

Further experiments on the downstream task of Cross-Lingual Natural Language Inference show that the proposed model achieves significant performance improvement for distant language pairs in downstream tasks compared to state-of-the-art adversarial and non-adversarial models.

Cross-Lingual Natural Language Inference

Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

no code implementations • 3 Oct 2022 • Xuanjun Chen, Haibin Wu, Helen Meng, Hung-Yi Lee, Jyh-Shing Roger Jang

Audio-visual active speaker detection (AVASD) is well-developed, and now is an indispensable front-end for several multi-modal applications.

Adversarial Robustness Audio-Visual Active Speaker Detection

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

1 code implementation • 22 Sep 2022 • Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng

A vector-quantized, variational autoencoder (VQ-VAE) based feature analyzer is used to encode Mel spectrograms of speech training data by down-sampling progressively in multiple stages into MSMC Representations (MSMCRs) with different time resolutions, and quantizing them with multiple VQ codebooks, respectively.

Bayesian Neural Network Language Modeling for Speech Recognition

1 code implementation • 28 Aug 2022 • Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen Meng

State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.

Data Augmentation Language Modelling +4

Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

1 code implementation • 18 Aug 2022 • Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng

One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic.

Disentanglement Voice Conversion

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

no code implementations • 10 Aug 2022 • Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng

This paper aims to introduce a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speeches.

Style Transfer

Exploring linguistic feature and model combination for speech recognition based automatic AD detection

no code implementations • 28 Jun 2022 • Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and delay progression.

Model Selection speech-recognition +1

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

no code implementations • 24 Jun 2022 • Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng

A key challenge for automatic speech recognition (ASR) systems is to model the speaker level variability.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

no code implementations • 23 Jun 2022 • Mingyu Cui, Jiajun Deng, Shoukang Hu, Xurong Xie, Tianzi Wang, Shujie Hu, Mengzhe Geng, Boyang Xue, Xunying Liu, Helen Meng

Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection

no code implementations • 23 Jun 2022 • Tianzi Wang, Jiajun Deng, Mengzhe Geng, Zi Ye, Shoukang Hu, Yi Wang, Mingyu Cui, Zengrui Jin, Xunying Liu, Helen Meng

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression.

Alzheimer's Disease Detection Data Augmentation +3

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng

However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models only focus solely on the standalone anti-spoofing tasks, and ignore the subsequent speaker verification process.

Open-Ended Question Answering Speaker Verification

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

no code implementations • 15 Jun 2022 • Shujie Hu, Xurong Xie, Mengzhe Geng, Mingyu Cui, Jiajun Deng, Guinan Li, Tianzi Wang, Xunying Liu, Helen Meng

Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Cross-lingual Word Embeddings in Hyperbolic Space

no code implementations • 4 May 2022 • Chandni Saxena, Mudit Chaudhary, Helen Meng

Cross-lingual word embeddings can be applied to several natural language processing applications across multiple languages.

Cross-Lingual Word Embeddings Word Embeddings

Audio-visual multi-channel speech separation, dereverberation and recognition

no code implementations • 5 Apr 2022 • Guinan Li, Jianwei Yu, Jiajun Deng, Xunying Liu, Helen Meng

Despite the rapid advance of automatic speech recognition (ASR) technologies, accurate recognition of cocktail party speech characterised by the interference from overlapping speakers, background noise and room reverberation remains a highly challenging task to date.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

A Character-level Span-based Model for Mandarin Prosodic Structure Prediction

1 code implementation • 31 Mar 2022 • Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng

In this paper, we propose a span-based Mandarin prosodic structure prediction model to obtain an optimal prosodic structure tree, which can be converted to corresponding prosodic label sequence.

Sentence

An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer

1 code implementation • 31 Mar 2022 • Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng

Inspired by Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model, which accepts Chinese characters as direct input and integrates expert knowledge contained in rules into the neural network, both contribute to the superior performance of proposed model for the text normalization task.

Neural Architecture Search for Speech Emotion Recognition

no code implementations • 31 Mar 2022 • Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng

Deep neural networks have brought significant advancements to speech emotion recognition (SER).

Neural Architecture Search Speech Emotion Recognition

Spoofing-Aware Speaker Verification by Multi-Level Fusion

no code implementations • 29 Mar 2022 • Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

In the second-level fusion, the CM score and ASV scores directly from ASV systems will be concatenated into a prediction block for the final decision.

Speaker Verification

On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition

no code implementations • 28 Mar 2022 • Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu

Accurate recognition of dysarthric and elderly speech remain challenging tasks to date.

speech-recognition Speech Recognition

Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion

no code implementations • 24 Mar 2022 • Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng

In this paper, we proposed an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs to complement each other advantages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis

no code implementations • 23 Mar 2022 • Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng

In this paper, we propose a hierarchical framework to model speaking style from context.

Expressive Speech Synthesis Knowledge Distillation +1

FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement

2 code implementations • 23 Mar 2022 • Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng

Previously proposed FullSubNet has achieved outstanding performance in Deep Noise Suppression (DNS) Challenge and attracted much attention.

Speech Enhancement

Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition

no code implementations • 19 Mar 2022 • Shujie Hu, Shansong Liu, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shoukang Hu, Mingyu Cui, Xunying Liu, Helen Meng

Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems for normal speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

no code implementations • 21 Feb 2022 • Mengzhe Geng, Xurong Xie, Zi Ye, Tianzi Wang, Guinan Li, Shujie Hu, Xunying Liu, Helen Meng

Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectrotemporal subspace basis deep embedding features derived using SVD speech spectrum decomposition are proposed in this paper to facilitate auxiliary feature based speaker adaptation of state-of-the-art hybrid DNN/TDNN and end-to-end Conformer speech recognition systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

no code implementations • 18 Feb 2022 • Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng

Though significant progress has been made for speaker-dependent Video-to-Speech (VTS) synthesis, little attention is devoted to multi-speaker VTS that can map silent video to speech, while allowing flexible control of speaker identity, all in a single system.

Quantization Speech Synthesis +2

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

no code implementations • 18 Feb 2022 • Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng

The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information, and the secondary task applies adversarial training to avoid the incorporation of abnormal speaking patterns into the reconstructed speech, by regularizing the distribution of reconstructed speech to be close to that of reference speech with high quality.

Multi-Task Learning Speaker Verification

TalkTive: A Conversational Agent Using Backchannels to Engage Older Adults in Neurocognitive Disorders Screening

no code implementations • 16 Feb 2022 • Zijian Ding, Jiawen Kang, Tinky Oi Ting HO, Ka Ho Wong, Helene H. Fung, Helen Meng, Xiaojuan Ma

This is used in the development of TalkTive, a CA which can predict both timing and form of backchanneling during cognitive assessments.

Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks

1 code implementation • 16 Feb 2022 • Jingyan Zhou, Jiawen Deng, Fei Mi, Yitong Li, Yasheng Wang, Minlie Huang, Xin Jiang, Qun Liu, Helen Meng

The research of open-domain dialog systems has been greatly prospered by neural models trained on large-scale corpora, however, such corpora often introduce various safety problems (e.g., offensive languages, biases, and toxic behaviors) that significantly hinder the deployment of dialog systems in practice.

Bias Detection Open-Domain Dialog

Paper
Code

Partially Fake Audio Detection by Self-attention-based Fake Span Discovery

no code implementations • 14 Feb 2022 • Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao, Hsin-Min Wang, Helen Meng

Also ADD 2022 is the first challenge to propose the partially fake audio detection task.

Open-Ended Question Answering Speech Synthesis +1

Paper
Add Code

User Satisfaction Estimation with Sequential Dialogue Act Modeling in Goal-oriented Conversational Systems

1 code implementation • 7 Feb 2022 • Yang Deng, Wenxuan Zhang, Wai Lam, Hong Cheng, Helen Meng

In this paper, we propose a novel framework, namely USDA, to incorporate the sequential dynamics of dialogue acts for predicting user satisfaction, by jointly learning User Satisfaction Estimation and Dialogue Act Recognition tasks.

Paper
Code

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

no code implementations • 4 Feb 2022 • Naijun Zheng, Na Li, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su, Helen Meng

This paper describes our speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription (M2MeT) challenge, where Mandarin meeting data were recorded in multi-channel format for diarization and automatic speech recognition (ASR) tasks.

Action Detection Activity Detection +6

Paper
Add Code

Convex Polytope Modelling for Unsupervised Derivation of Semantic Structure for Data-efficient Natural Language Understanding

no code implementations • 25 Jan 2022 • Jingyan Zhou, Xiaohan Feng, King Keung Wu, Helen Meng

Popular approaches for Natural Language Understanding (NLU) usually rely on a huge amount of annotated data or handcrafted rules, which is laborious and not adaptive to domain extension.

Natural Language Understanding

Paper
Add Code

Toward Self-learning End-to-End Task-Oriented Dialog Systems

no code implementations • SIGDIAL (ACL) 2022 • Xiaoying Zhang, Baolin Peng, Jianfeng Gao, Helen Meng

In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimum or zero human annotations.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition

1 code implementation • 17 Jan 2022 • PengFei Liu, Kun Li, Helen Meng

Emotion recognition is a challenging and actively-studied research area that plays a critical role in emotion-aware human-computer interaction systems.

Multimodal Emotion Recognition

Paper
Code

COLD: A Benchmark for Chinese Offensive Language Detection

1 code implementation • 16 Jan 2022 • Jiawen Deng, Jingyan Zhou, Hao Sun, Chujie Zheng, Fei Mi, Helen Meng, Minlie Huang

To this end, we propose a benchmark, COLD, for Chinese offensive language analysis, including a Chinese Offensive Language Dataset (COLDATASET) and a baseline detector (COLDETECTOR) which is trained on the dataset.

175

Paper
Code

Recent Progress in the CUHK Dysarthric Speech Recognition System

no code implementations • 15 Jan 2022 • Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng

Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date.

Audio-Visual Speech Recognition Automatic Speech Recognition +4

Paper
Add Code

Investigation of Data Augmentation Techniques for Disordered Speech Recognition

no code implementations • 14 Jan 2022 • Mengzhe Geng, Xurong Xie, Shansong Liu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng

This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation.

Data Augmentation speech-recognition +1

Paper
Add Code

Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition

no code implementations • 14 Jan 2022 • Mengzhe Geng, Shansong Liu, Jianwei Yu, Xurong Xie, Shoukang Hu, Zi Ye, Zengrui Jin, Xunying Liu, Helen Meng

Automatic recognition of disordered speech remains a highly challenging task to date.

Data Augmentation speech-recognition +1

Paper
Add Code

Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

1 code implementation • 8 Jan 2022 • Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng

State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers

no code implementations • 29 Nov 2021 • Junhao Xu, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng

Index Terms: Language models, Recurrent neural networks, Quantization, Alternating direction methods of multipliers.

Quantization

Paper
Add Code

Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition

no code implementations • 29 Nov 2021 • Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng

In order to overcome the difficulty in using gradient descent methods to directly estimate discrete quantized weights, alternating direction methods of multipliers (ADMM) are used to efficiently train quantized LMs.

Neural Architecture Search Quantization +2

Paper
Add Code

Mixed Precision of Quantization of Transformer Language Models for Speech Recognition

no code implementations • 29 Nov 2021 • Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng

Experiments conducted on Penn Treebank (PTB) and a Switchboard corpus trained LF-MMI TDNN system suggest the proposed mixed precision Transformer quantization techniques achieved model size compression ratios of up to 16 times over the full precision baseline with no recognition performance degradation.

Quantization speech-recognition +1

Paper
Add Code

Characterizing the adversarial vulnerability of speech self-supervised learning

no code implementations • 8 Nov 2021 • Haibin Wu, Bo Zheng, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

As the paradigm of the self-supervised learning upstream model followed by downstream tasks arouses more attention in the speech community, characterizing the adversarial robustness of such paradigm is of high priority.

Adversarial Robustness Benchmarking +2

Paper
Add Code

Countering Online Hate Speech: An NLP Perspective

1 code implementation • 7 Sep 2021 • Mudit Chaudhary, Chandni Saxena, Helen Meng

This paper presents a holistic conceptual framework on hate-speech NLP countering methods along with a thorough survey on the current progress of NLP for countering online hate speech.

Paper
Code

Adversarial Data Augmentation for Disordered Speech Recognition

no code implementations • 2 Aug 2021 • Zengrui Jin, Mengzhe Geng, Xurong Xie, Jianwei Yu, Shansong Liu, Xunying Liu, Helen Meng

Automatic recognition of disordered speech remains a highly challenging task to date.

Data Augmentation speech-recognition +1

Paper
Add Code

Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks

2 code implementations • 19 Jul 2021 • Xu Li, Xixin Wu, Hui Lu, Xunying Liu, Helen Meng

This argument motivates the current work that presents a novel, channel-wise gated Res2Net (CG-Res2Net), which modifies Res2Net to enable a channel-wise gating mechanism in the connection between feature groups.

Speaker Verification

Paper
Code

Adversarial Sample Detection for Speaker Verification by Neural Vocoders

1 code implementation • 1 Jul 2021 • Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee

We also show that the neural vocoder adopted in the detection framework is dataset-independent.

Speaker Verification

Paper
Code

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

1 code implementation • 18 Jun 2021 • Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement.

Disentanglement Quantization +1

318

Paper
Code

Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization

no code implementations • 18 Jun 2021 • Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

Such systems are particularly susceptible to domain mismatch where the training and testing data come from the source and target domains respectively, but the two domains may differ in terms of speech stimuli, disease etiology, etc.

Multi-Task Learning Unsupervised Domain Adaptation

Paper
Add Code

Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling

2 code implementations • 11 Jun 2021 • Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Helen Meng, Chao Weng, Dan Su

However, state-of-the-art context modeling methods in conversational TTS only model the textual information in context with a recurrent neural network (RNN).

Speech Synthesis Text-To-Speech Synthesis

258

Paper
Code

Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning

no code implementations • 1 Jun 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee

This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.

Adversarial Defense Adversarial Robustness +2

Paper
Add Code

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

no code implementations • 28 May 2021 • Songxiang Liu, Yuewen Cao, Dan Su, Helen Meng

Singing voice conversion (SVC) is one promising technique which can enrich the way of human-computer interaction by endowing a computer with the ability to produce high-fidelity and expressive singing voice.

Denoising Voice Conversion +1

Paper
Add Code

Out-of-Scope Domain and Intent Classification through Hierarchical Joint Modeling

1 code implementation • 30 Apr 2021 • PengFei Liu, Kun Li, Helen Meng

User queries for a real-world dialog system may sometimes fall outside the scope of the system's capabilities, but appropriate system responses will enable smooth processing throughout the human-computer interaction.

Classification General Classification +3

Paper
Code

Open Intent Discovery through Unsupervised Semantic Clustering and Dependency Parsing

1 code implementation • 25 Apr 2021 • PengFei Liu, Youzhang Ning, King Keung Wu, Kun Li, Helen Meng

This paper presents an unsupervised two-stage approach to discover intents and generate meaningful intent labels automatically from a collection of unlabeled utterances in a domain.

Clustering Dependency Parsing +4

Paper
Code

Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis

no code implementations • 14 Apr 2021 • Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng

Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS).

Dependency Parsing Representation Learning +3

Paper
Add Code

Towards Multi-Scale Style Control for Expressive Speech Synthesis

no code implementations • 8 Apr 2021 • Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng

This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis.

Expressive Speech Synthesis Style Transfer

Paper
Add Code

Adversarial defense for automatic speaker verification by cascaded self-supervised learning models

no code implementations • 14 Feb 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee

Automatic speaker verification (ASV) is one of the core technologies in biometric identification.

Adversarial Defense Open-Ended Question Answering +2

Paper
Add Code

Bayesian Transformer Language Models for Speech Recognition

no code implementations • 9 Feb 2021 • Boyang Xue, Jianwei Yu, Junhao Xu, Shansong Liu, Shoukang Hu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

Performance improvements were also obtained on a cross domain LM adaptation task requiring porting a Transformer LM trained on the Switchboard and Fisher data to a low-resource DementiaBank elderly speech corpus.

speech-recognition Speech Recognition +1

Paper
Add Code

Adversarially learning disentangled speech representations for robust multi-factor voice conversion

no code implementations • 30 Jan 2021 • Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng

To increase the robustness of highly controllable style transfer on multiple factors in VC, we propose a disentangled speech representation learning framework based on adversarial learning.

Representation Learning Style Transfer +1

Paper
Add Code

Creation and Evaluation of a Pre-tertiary Artificial Intelligence (AI) Curriculum

no code implementations • 19 Jan 2021 • Thomas K. F. Chiu, Helen Meng, Ching-Sing Chai, Irwin King, Savio Wong, Yeung Yam

Background: AI4Future is a cross-sector project that engages five major partners - CUHK Faculty of Engineering and Faculty of Education, Hong Kong secondary schools, the government and the AI industry.

Paper
Add Code

Unstructured Knowledge Access in Task-oriented Dialog Modeling using Language Inference, Knowledge Retrieval and Knowledge-Integrative Response Generation

1 code implementation • 15 Jan 2021 • Mudit Chaudhary, Borislav Dzodzo, Sida Huang, Chun Hei Lo, Mingzhi Lyu, Lun Yiu Nie, Jinbo Xing, Tianhua Zhang, Xiaoying Zhang, Jingyan Zhou, Hong Cheng, Wai Lam, Helen Meng

Dialog systems enriched with external knowledge can handle user queries that are outside the scope of the supporting databases/APIs.

Natural Language Inference Response Generation +1

Paper
Code

Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network

no code implementations • 21 Dec 2020 • Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng

By using deep learning approaches, Speech Emotion Recognition (SER) on a single domain has achieved many excellent results.

Speech Emotion Recognition

Paper
Add Code

Syntactic representation learning for neural network based TTS with syntactic parse tree traversal

no code implementations • 13 Dec 2020 • Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng

Meanwhile, nuclear-norm maximization loss is introduced to enhance the discriminability and diversity of the embeddings of constituent labels.

Representation Learning Sentence

Paper
Add Code

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition

no code implementations • 8 Dec 2020 • Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng

On a third cross domain adaptation task requiring rapidly porting a 1000 hour LibriSpeech data trained system to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2.5% absolute WER reduction.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Audio-visual Multi-channel Integration and Recognition of Overlapped Speech

no code implementations • 16 Nov 2020 • Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu

Automatic speech recognition (ASR) technologies have been significantly advanced in the past few decades.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

2 code implementations • 11 Nov 2020 • Songxiang Liu, Yuewen Cao, Na Hu, Dan Su, Helen Meng

This paper presents FastSVC, a light-weight cross-domain singing voice conversion (SVC) system, which can achieve high conversion performance, with inference speed 4x faster than real-time on CPUs.

Voice Conversion

312

Paper
Code

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

no code implementations • 3 Nov 2020 • Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng

Third, a conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech, conditioned on the target DSE that is learned via speaker encoder or speaker adaptation.

speech-recognition Speech Recognition +1

Paper
Add Code

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input

no code implementations • 28 Oct 2020 • Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng

Non-autoregressive (NAR) transformer models have achieved significant inference speedup but at the cost of inferior accuracy compared to autoregressive (AR) models in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Replay and Synthetic Speech Detection with Res2net Architecture

2 code implementations • 28 Oct 2020 • Xu Li, Na Li, Chao Weng, Xunying Liu, Dan Su, Dong Yu, Helen Meng

This multiple scaling mechanism significantly improves the countermeasure's generalizability to unseen spoofing attacks.

Feature Engineering Synthetic Speech Detection

Paper
Code

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling

1 code implementation • 6 Sep 2020 • Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng

During the training stage, an encoder-decoder-based hybrid connectionist-temporal-classification-attention (CTC-attention) phoneme recognizer is trained, whose encoder has a bottle-neck layer.

feature selection speech-recognition +2

312

Paper
Code

Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks

no code implementations • 17 Jul 2020 • Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng

Deep neural networks (DNNs) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

no code implementations • 20 Jun 2020 • Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.

Talking Head Generation

Paper
Add Code

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

no code implementations • 11 Jun 2020 • Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng

Orthogonal to prior approaches, this work proposes to defend ASV systems against adversarial attacks with a separate detection network, rather than augmenting adversarial data into ASV training.

Binary Classification Data Augmentation +1

Paper
Add Code

Audio-visual Multi-channel Recognition of Overlapped Speech

no code implementations • 18 May 2020 • Jianwei Yu, Bo Wu, Rongzhi Gu, Shi-Xiong Zhang, LianWu Chen, Yong Xu, Meng Yu, Dan Su, Dong Yu, Xunying Liu, Helen Meng

Automatic speech recognition (ASR) of overlapped speech remains a highly challenging task to date.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

no code implementations • 8 Apr 2020 • Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

Our experiment results indicate that the DNN x-vector system could benefit from BNNs especially when the mismatch problem is severe for evaluations using out-of-domain data.

Speaker Verification

Paper
Add Code

Defense against adversarial attacks on spoofing countermeasures of ASV

no code implementations • 6 Mar 2020 • Haibin Wu, Songxiang Liu, Helen Meng, Hung-Yi Lee

Various forefront countermeasure methods for automatic speaker verification (ASV) with considerable performance in anti-spoofing are proposed in the ASVspoof 2019 challenge.

Speaker Verification

Paper
Add Code

Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech

no code implementations • 1 Feb 2020 • Xu Li, Xixin Wu, Xunying Liu, Helen Meng

And then we explore the non-categories by looking for the SPPGs with more than one peak.

Paper
Add Code

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

no code implementations • 6 Jan 2020 • Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.

Ranked #4 on Audio-Visual Speech Recognition on LRS2

Audio-Visual Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Adversarial Attacks on GMM i-vector based Speaker Verification Systems

2 code implementations • 8 Nov 2019 • Xu Li, Jinghua Zhong, Xixin Wu, Jianwei Yu, Xunying Liu, Helen Meng

Experiment results show that GMM i-vector systems are seriously vulnerable to adversarial attacks, and the crafted adversarial samples prove to be transferable and pose threats to neural network speaker embedding based systems (e.g., x-vector systems).

Speaker Verification

Paper
Code

Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks

no code implementations • 23 Oct 2019 • Xingchen Song, Guangsen Wang, Zhiyong Wu, Yiheng Huang, Dan Su, Dong Yu, Helen Meng

Our best systems achieve a relative improvement of 11.9% and 8.3% on the TIMIT and WSJ tasks respectively.

Representation Learning

Paper
Add Code

Adversarial Attacks on Spoofing Countermeasures of automatic speaker verification

1 code implementation • 19 Oct 2019 • Songxiang Liu, Haibin Wu, Hung-Yi Lee, Helen Meng

High-performance spoofing countermeasure systems for automatic speaker verification (ASV) have been proposed in the ASVspoof 2019 challenge.

Speaker Verification

Paper
Code

Semi-Supervised Graph Classification: A Hierarchical Graph Perspective

1 code implementation • 10 Apr 2019 • Jia Li, Yu Rong, Hong Cheng, Helen Meng, Wenbing Huang, Junzhou Huang

We study the node classification problem in the hierarchical graph where a 'node' is a graph instance, e.g., a user group in the above example.

Ranked #10 on Graph Classification on D&D

General Classification Graph Classification +3

206

Paper
Code

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition

no code implementations • 17 Nov 2016 • Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai

Hence, traditional methods may fail to distinguish some of the emotions with just one global feature subspace.

General Classification Speech Emotion Recognition

Paper
Add Code

Analysis of Dysarthric Speech using Distinctive Feature Recognition

no code implementations • WS 2015 • Ka Ho Wong, Yu Ting Yeung, Patrick C. M. Wong, Gina-Anne Levow, Helen Meng

Speech Recognition

Paper
Add Code

Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings

no code implementations • EMNLP 2015 • Pengfei Liu, Shafiq Joty, Helen Meng

Feature Engineering Opinion Mining +3

Paper
Add Code

SeemGo: Conditional Random Fields Labeling and Maximum Entropy Classification for Aspect Based Sentiment Analysis

no code implementations • SEMEVAL 2014 • Pengfei Liu, Helen Meng

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +3

Paper
Add Code

Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition

no code implementations • 23 Sep 2013 • Xin Zheng, Zhiyong Wu, Helen Meng, Weifeng Li, Lianhong Cai

In this paper, we first present a new variant of Gaussian restricted Boltzmann machine (GRBM) called multivariate Gaussian restricted Boltzmann machine (MGRBM), with its definition and learning algorithm.

Robust Speech Recognition speech-recognition

Paper
Add Code
