no code implementations • COLING 2022 • Xueyuan Chen, Shun Lei, Zhiyong Wu, Dong Xu, Weifeng Zhao, Helen Meng
On top of these, a bi-reference attention mechanism is used to align both the local-scale reference style embedding sequence and the local-scale context style embedding sequence with the corresponding phoneme embedding sequence.
2 code implementations • 22 May 2023 • Ce Zheng, Lei Li, Qingxiu Dong, Yuxuan Fan, Zhiyong Wu, Jingjing Xu, Baobao Chang
Inspired by in-context learning (ICL), a new paradigm based on demonstration contexts without parameter updating, we explore whether ICL can edit factual knowledge.
1 code implementation • CVPR 2023 • Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang
Levenshtein distance over quantized audio, used as a similarity metric between the speech corresponding to gestures, helps match more appropriate gestures with speech and solves the speech-gesture alignment problem well.
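The entry above uses Levenshtein distance over quantized audio tokens as a similarity measure. A minimal sketch of that idea, assuming the audio has already been quantized into discrete token sequences (the normalization into a [0, 1] similarity is an illustrative choice, not the paper's exact formulation):

```python
def levenshtein(a, b):
    """Edit distance between two token sequences via dynamic programming."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))           # distances for the empty prefix of a
    for i in range(1, m + 1):
        prev = dp[0]                  # dp[i-1][j-1] from the previous row
        dp[0] = i
        for j in range(1, n + 1):
            cur = min(dp[j] + 1,                         # delete a[i-1]
                      dp[j - 1] + 1,                     # insert b[j-1]
                      prev + (a[i - 1] != b[j - 1]))     # substitute / match
            prev, dp[j] = dp[j], cur
    return dp[n]

def token_similarity(tokens_a, tokens_b):
    """Normalized similarity in [0, 1]; 1.0 means identical token sequences."""
    longest = max(len(tokens_a), len(tokens_b), 1)
    return 1.0 - levenshtein(tokens_a, tokens_b) / longest
```

With such a similarity in hand, candidate gestures can be ranked by how close their associated speech tokens are to the query speech tokens.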
no code implementations • 18 May 2023 • Xingchen Song, Di Wu, BinBin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu
In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without any accuracy loss.
no code implementations • 16 May 2023 • Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng
Experimental results show that our proposed method achieves comparable similarity and higher naturalness than the supervised method, which requires a huge amount of annotated corpora for training; the method is also applicable to improving similarity for VC methods that use other SSL representations as input.
Automatic Speech Recognition (ASR)
no code implementations • 19 Apr 2023 • Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng
In this work, we propose CB-Conformer to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer.
2 code implementations • 6 Mar 2023 • Zhenyu Wu, Yaoxiang Wang, Jiacheng Ye, Jiangtao Feng, Jingjing Xu, Yu Qiao, Zhiyong Wu
However, the implementation of ICL is sophisticated due to the diverse retrieval and inference methods involved, as well as the varying pre-processing requirements for different models, datasets, and tasks.
1 code implementation • 11 Feb 2023 • Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, Lingpeng Kong
The performance of ICL is largely determined by the quality of the selected in-context examples.
1 code implementation • 9 Feb 2023 • Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong
Based on EVALM, we scale up the size of examples efficiently in both instruction tuning and in-context learning to explore the boundary of the benefits from more annotated data.
1 code implementation • 31 Dec 2022 • Qingxiu Dong, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, Zhifang Sui
With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions based only on contexts augmented with a few examples.
1 code implementation • 20 Dec 2022 • Zhiyong Wu, Yaoxiang Wang, Jiacheng Ye, Lingpeng Kong
Despite the surprising few-shot performance of in-context learning (ICL), it is still a common practice to randomly sample examples to serve as context.
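The random-sampling baseline mentioned above is easy to picture concretely. A minimal sketch of assembling an ICL prompt from randomly sampled demonstrations, where the `Input:`/`Output:` template and all names are illustrative assumptions rather than the paper's actual format:

```python
import random

def build_icl_prompt(train_pairs, query, k=4, seed=0):
    """Randomly sample k (input, output) demonstrations and concatenate
    them ahead of the query, leaving the final output slot empty for the
    language model to complete."""
    rng = random.Random(seed)
    demos = rng.sample(train_pairs, k)
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)
```

Example-selection methods replace the `rng.sample` call with a learned or heuristic retriever while keeping the rest of the pipeline unchanged.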
no code implementations • 19 Dec 2022 • Qintong Li, Zhiyong Wu, Lingpeng Kong, Wei Bi
Explaining the black-box predictions of NLP models naturally and accurately is an important open problem in natural language generation.
no code implementations • 26 Nov 2022 • Xiaojun Meng, Wenlin Dai, Yasheng Wang, Baojun Wang, Zhiyong Wu, Xin Jiang, Qun Liu
Then we present a novel lexicon-injected semantic parser, which collects the slot labels of tree representations as a lexicon and injects lexical features into the parser's span representation.
no code implementations • 21 Nov 2022 • Sijie Cheng, Zhiyong Wu, Jiangjie Chen, Zhixing Li, Yang Liu, Lingpeng Kong
The major difficulty is finding the conflict point, where the statement contradicts our real world.
2 code implementations • 1 Nov 2022 • Xingchen Song, Di Wu, Zhiyong Wu, BinBin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu
In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models.
no code implementations • 31 Oct 2022 • Xingchen Song, Di Wu, BinBin Zhang, Zhiyong Wu, Wenpeng Li, Dongfang Li, Pengshen Zhang, Zhendong Peng, Fuping Pan, Changbao Zhu, Zhongqin Wu
Therefore, we name it FusionFormer.
Automatic Speech Recognition (ASR)
no code implementations • 25 Oct 2022 • Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng
We propose an unsupervised learning method to disentangle speech into content representation and speaker identity representation.
1 code implementation • 22 Oct 2022 • Jiacheng Ye, Jiahui Gao, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong
To improve the quality of dataset synthesis, we propose a progressive zero-shot dataset generation framework, ProGen, which leverages the feedback from the task-specific model to guide the generation of new training data via in-context examples.
1 code implementation • 17 Oct 2022 • Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
Bringing together theoretical analysis and empirical evidence, we demonstrate the great potential of diffusion models in complex conditional language generation tasks.
1 code implementation • COLING 2022 • Chenxin An, Ming Zhong, Zhiyong Wu, Qin Zhu, Xuanjing Huang, Xipeng Qiu
Traditional training paradigms for extractive and abstractive summarization systems always use only token-level or sentence-level training objectives.
1 code implementation • 25 Aug 2022 • Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao
This paper describes the ReprGesture entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) challenge 2022.
1 code implementation • 18 Aug 2022 • Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng
One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic.
no code implementations • 10 Aug 2022 • Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng
This paper aims to introduce a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speeches.
no code implementations • 6 Jul 2022 • Bin Su, Shaoguang Mao, Frank Soong, Zhiyong Wu
The ORARS addresses the MOS prediction problem by pairing a test sample with each of the pre-scored anchored reference samples.
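The pairing idea above can be sketched in a few lines. This is an illustrative simplification, assuming a trained pairwise comparator (`prefer`, a hypothetical stand-in for the model) and a linear mapping from win rate over the anchors to a MOS-range score; the actual ORARS scoring rule may differ:

```python
def rank_based_score(test_feat, anchors, prefer, score_range=(1.0, 5.0)):
    """Score a test sample by comparing it against every pre-scored anchor.

    anchors: list of (features, mos) pairs for the anchored reference set.
    prefer(a, b): hypothetical pairwise model, True if a sounds better than b.
    The win rate over all anchors is mapped linearly onto the MOS range.
    """
    wins = sum(bool(prefer(test_feat, feat)) for feat, _mos in anchors)
    lo, hi = score_range
    return lo + (hi - lo) * wins / len(anchors)
```

Because every test sample is judged against the same fixed anchor set, the resulting scores inherit the rank-ordering information that a plain GOP average discards.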
no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng
However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models focus solely on the standalone anti-spoofing task and ignore the subsequent speaker verification process.
1 code implementation • 25 May 2022 • Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong
In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than the PLM, achieving both higher performance and efficiency than prompt-based zero-shot learning methods on PLMs.
1 code implementation • ACL 2022 • Zhiyong Wu, Wei Bi, Xiang Li, Lingpeng Kong, Ben Kao
We propose knowledge internalization (KI), which aims to incorporate lexical knowledge into neural dialog models.
1 code implementation • 31 Mar 2022 • Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng
In this paper, we propose a span-based Mandarin prosodic structure prediction model to obtain an optimal prosodic structure tree, which can be converted to the corresponding prosodic label sequence.
1 code implementation • 31 Mar 2022 • Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng
Inspired by the Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model that accepts Chinese characters as direct input and integrates expert knowledge contained in rules into the neural network; both contribute to the superior performance of the proposed model on the text normalization task.
no code implementations • 31 Mar 2022 • Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng
Deep neural networks have brought significant advancements to speech emotion recognition (SER).
no code implementations • 24 Mar 2022 • Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng
In this paper, we propose an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs to complement each other's advantages.
Automatic Speech Recognition (ASR)
no code implementations • 23 Mar 2022 • Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng
In this paper, we propose a hierarchical framework to model speaking style from context.
2 code implementations • 23 Mar 2022 • Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng
Previously proposed FullSubNet has achieved outstanding performance in Deep Noise Suppression (DNS) Challenge and attracted much attention.
2 code implementations • 16 Feb 2022 • Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong
There is a growing interest in dataset generation recently due to the superior generative capacity of large pre-trained language models (PLMs).
no code implementations • 14 Oct 2021 • Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu
These embeddings, when used as implicit phonetic supplementary information, can alleviate the data shortage of explicit phoneme annotations.
no code implementations • EMNLP 2021 • YingMei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang
Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy, and thus impede the performance of SLU models.
1 code implementation • 1 Jul 2021 • Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee
We also show that the neural vocoder adopted in the detection framework is dataset-independent.
1 code implementation • 15 Jun 2021 • Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee
Automatic speaker verification (ASV) is a well-developed technology for biometric identification and has been ubiquitously deployed in security-critical applications, such as banking and access control.
2 code implementations • 11 Jun 2021 • Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Helen Meng, Chao Weng, Dan Su
However, state-of-the-art context modeling methods in conversational TTS only model the textual information in context with a recurrent neural network (RNN).
no code implementations • 1 Jun 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.
1 code implementation • ACL 2021 • Lin Zheng, Zhiyong Wu, Lingpeng Kong
Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks.
no code implementations • ACL 2021 • Zhiyong Wu, Lingpeng Kong, Wei Bi, Xiang Li, Ben Kao
A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.
no code implementations • 14 Apr 2021 • Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng
Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS).
no code implementations • 8 Apr 2021 • Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng
This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis.
no code implementations • 5 Apr 2021 • Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu
The challenge consists of two tracks, namely a few-shot track and a one-shot track, where participants are required to clone multiple target voices with 100 and 5 samples, respectively.
no code implementations • 14 Feb 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee
Automatic speaker verification (ASV) is one of the core technologies in biometric identification.
no code implementations • 30 Jan 2021 • Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng
To increase the robustness of highly controllable style transfer on multiple factors in VC, we propose a disentangled speech representation learning framework based on adversarial learning.
no code implementations • 1 Jan 2021 • Zhiyong Wu, Lingpeng Kong, Ben Kao
A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.
no code implementations • 21 Dec 2020 • Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng
By using deep learning approaches, Speech Emotion Recognition (SER) on a single domain has achieved many excellent results.
no code implementations • 13 Dec 2020 • Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng
Meanwhile, nuclear-norm maximization loss is introduced to enhance the discriminability and diversity of the embeddings of constituent labels.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • YingMei Guo, Zhiyong Wu, Mingxing Xu
Unlike non-conversation scenes, emotion recognition in dialogues (ERD) poses more complicated challenges due to its interactive nature and intricate contextual information.
no code implementations • 28 Oct 2020 • Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng
Non-autoregressive (NAR) transformer models have achieved significant inference speedup, but at the cost of inferior accuracy compared to autoregressive (AR) models in automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 26 Oct 2020 • Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu
Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weaknesses in assessing a speech utterance: 1) phoneme-level GOP scores cannot be easily translated into a sentence score with a simple average for effective assessment; 2) rank-ordering information has not been well exploited in GOP scoring to deliver a robust assessment that correlates well with a human rater's evaluations.
no code implementations • 20 Jun 2020 • Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng
Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.
no code implementations • 26 May 2020 • Dongyang Dai, Li Chen, Yu-Ping Wang, Mu Wang, Rui Xia, Xuchen Song, Zhiyong Wu, Yuxuan Wang
Firstly, the speech synthesis model is pre-trained with both multi-speaker clean data and noisy augmented data; then the pre-trained model is adapted on noisy low-resource new speaker data; finally, by setting the clean speech condition, the model can synthesize the new speaker's clean voice.
1 code implementation • 2020 • Yongkun Li, Zhiyong Wu, Shuai Lin, Hong Xie, Min Lv, Yinlong Xu, John C. S. Lui
Random walk is widely applied to sample large-scale graphs due to its simplicity of implementation and solid theoretical foundations of bias analysis.
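The basic primitive behind this line of work is simple to state. A minimal sketch of a uniform random walk on an adjacency-list graph, the building block that biased-sampling analyses start from (the dict-of-lists representation and early stop at dead ends are assumptions for illustration):

```python
import random

def random_walk(adj, start, length, seed=0):
    """Uniform random walk of up to `length` steps on an adjacency-list
    graph; at each step the next node is chosen uniformly among the
    current node's neighbors, stopping early at a dead end."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):
        nbrs = adj[walk[-1]]
        if not nbrs:              # dead end: no outgoing edges
            break
        walk.append(rng.choice(nbrs))
    return walk
```

Biased variants (e.g., degree-weighted or second-order walks) replace the uniform `rng.choice` with a non-uniform transition distribution, which is exactly where the bias analysis mentioned above comes in.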
1 code implementation • ACL 2020 • Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu
However, this approach of evaluating a language model is undermined by the uncertainty of the amount of knowledge that is learned by the probe itself.
no code implementations • 23 Oct 2019 • Xingchen Song, Guangsen Wang, Zhiyong Wu, Yiheng Huang, Dan Su, Dong Yu, Helen Meng
Our best systems achieve a relative improvement of 11.9% and 8.3% on the TIMIT and WSJ tasks respectively.
no code implementations • 15 Apr 2017 • Zhiqian Zhang, Chenliang Li, Zhiyong Wu, Aixin Sun, Dengpan Ye, Xiangyang Luo
Inspired by the recent success of neural networks in many areas, in this paper, we present a simple but effective neural network framework for next POI recommendation, named NEXT.
no code implementations • 17 Nov 2016 • Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
Hence, traditional methods may fail to distinguish some of the emotions with just one global feature subspace.
no code implementations • 23 Sep 2013 • Xin Zheng, Zhiyong Wu, Helen Meng, Weifeng Li, Lianhong Cai
In this paper, we first present a new variant of Gaussian restricted Boltzmann machine (GRBM) called multivariate Gaussian restricted Boltzmann machine (MGRBM), with its definition and learning algorithm.