no code implementations • COLING 2022 • Xueyuan Chen, Shun Lei, Zhiyong Wu, Dong Xu, Weifeng Zhao, Helen Meng
On top of these, a bi-reference attention mechanism is used to align both the local-scale reference style embedding sequence and the local-scale context style embedding sequence with the corresponding phoneme embedding sequence.
no code implementations • 12 Jun 2024 • Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech.
no code implementations • 25 Apr 2024 • Yu Jiang, Jie Liang, Fuchen Ma, Yuanliang Chen, Chijin Zhou, Yuheng Shen, Zhiyong Wu, Jingzhou Fu, Mingzhe Wang, Shanshan Li, Quan Zhang
Fuzzing, a widely-used technique for bug detection, has seen advancements through Large Language Models (LLMs).
1 code implementation • CVPR 2024 • Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu
While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work.
1 code implementation • 21 Mar 2024 • Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, XiaoLi Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu
Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains.
1 code implementation • 8 Mar 2024 • Peng Liu, Dongyang Dai, Zhiyong Wu
In this study, we introduce RFWave, a cutting-edge multi-band Rectified Flow approach designed to reconstruct high-fidelity audio waveforms from Mel-spectrograms or discrete tokens.
1 code implementation • 19 Feb 2024 • Kai Wang, Yuwei Xu, Zhiyong Wu, Siqiang Luo
Knowledge Graph (KG) inductive reasoning, which aims to infer missing facts from new KGs that are not seen during training, has been widely adopted in various applications.
no code implementations • 15 Feb 2024 • Yaoxiang Wang, Zhiyong Wu, Junfeng Yao, Jinsong Su
The emergence of Large Language Models (LLMs) like ChatGPT has inspired the development of LLM-based agents capable of addressing complex, real-world tasks.
1 code implementation • 12 Feb 2024 • Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, Lingpeng Kong
Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents.
2 code implementations • 24 Jan 2024 • Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng
We use a higher compression ratio on subbands with less information to improve the information density and focus on modeling subbands with more information.
Ranked #3 on Music Source Separation on MUSDB18-HQ
1 code implementation • 17 Jan 2024 • Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu
In our preliminary study, we have discovered a key challenge in developing visual GUI agents: GUI grounding -- the capacity to accurately locate screen elements based on instructions.
no code implementations • 15 Jan 2024 • Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng
Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, among which some works have yielded outstanding results and attracted considerable attention.
no code implementations • 7 Jan 2024 • Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu
To tackle these issues, we introduce FreeTalker, which, to the best of our knowledge, is the first framework for the generation of both spontaneous (e.g., co-speech gesture) and non-spontaneous (e.g., moving around the podium) speaker motions.
no code implementations • 24 Dec 2023 • Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng
In this paper, we present CaRE-SEP, a consistent and relevant embedding network for general sound separation to encourage a comprehensive reconsideration of query usage in audio separation.
no code implementations • 19 Dec 2023 • Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng
Both objective and subjective evaluations demonstrate that our proposed method can effectively improve the naturalness and expressiveness of the synthesized speech in audiobook synthesis, especially for role and out-of-domain scenarios.
no code implementations • 19 Dec 2023 • Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, Shun Lei, Helen Meng
A novel calibration framework, named SimCalib, is accordingly proposed to consider similarity between nodes at global and local levels.
no code implementations • 18 Dec 2023 • Zilin Wang, Haolin Zhuang, Lu Li, Yinmin Zhang, Junjie Zhong, Jun Chen, Yu Yang, Boshi Tang, Zhiyong Wu
This paper presents an Exploratory 3D Dance generation framework, E3D2, designed to address the exploration capability deficiency in existing music-conditioned 3D dance generation models.
no code implementations • 14 Dec 2023 • Boshi Tang, Jianan Wang, Zhiyong Wu, Lei Zhang
Although Score Distillation Sampling (SDS) has exhibited remarkable performance in conditional 3D content generation, a comprehensive understanding of its formulation is still lacking, hindering the development of 3D generation.
no code implementations • 15 Nov 2023 • Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu
Although Large Language Models (LLMs) demonstrate remarkable ability in processing and generating human-like text, they do have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of natural language (e.g., chemical molecular formulas).
no code implementations • 15 Nov 2023 • Fei Yuan, Shuai Yuan, Zhiyong Wu, Lei LI
Large Language Models (LLMs) often show strong performance on English tasks while exhibiting limitations in other languages.
no code implementations • 18 Oct 2023 • Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou, Helen Meng
Existing augmentation methods for speaker verification manipulate the raw signal, which is time-consuming, and the augmented samples lack diversity.
no code implementations • 11 Oct 2023 • Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, HaoZhi Huang
Existing works mostly neglect the person-specific talking style in generation, including facial expression and head pose styles.
1 code implementation • 9 Oct 2023 • Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
Diffusion models have gained prominence in generating high-quality sequences of text.
1 code implementation • 7 Oct 2023 • Siyu Ren, Zhiyong Wu, Kenny Q. Zhu
In this paper, we propose Earth Mover Distance Optimization (EMO) for auto-regressive language modeling.
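EMO's actual training objective is defined over vocabulary distributions, but the underlying distance is the classic earth mover's (Wasserstein-1) distance; for two histograms on the same ordered 1-D support it reduces to a running sum of cumulative differences. An illustrative sketch of that reduction, not the paper's objective:

```python
def emd_1d(p, q):
    """Earth mover's distance between two histograms defined on the
    same ordered 1-D support, via accumulated CDF differences."""
    assert len(p) == len(q), "histograms must share a support"
    cum_diff, running = 0.0, 0.0
    for pi, qi in zip(p, q):
        running += pi - qi          # difference of the two CDFs so far
        cum_diff += abs(running)    # mass that must be moved one step
    return cum_diff
```

Moving all mass from the first bin to the third costs two unit steps, so `emd_1d([1, 0, 0], [0, 0, 1])` yields 2.0.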
1 code implementation • 30 Sep 2023 • Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, Lingpeng Kong
Large Language Models (LLMs) are evolving at an unprecedented pace and have exhibited considerable capability in the realm of natural language processing (NLP) with world knowledge.
no code implementations • 21 Sep 2023 • Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang
More interestingly, although we target the synthesis quality of the style transfer model, speech synthesized by the proposed text prosodic analysis model even outperforms style transfer from the original speech on some user evaluation metrics.
no code implementations • 14 Sep 2023 • Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng
However, it is still challenging to train a universal vocoder which can generalize well to out-of-domain (OOD) scenarios, such as unseen speaking styles, non-speech vocalization, singing, and musical pieces.
1 code implementation • 13 Sep 2023 • Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai
The automatic co-speech gesture generation draws much attention in computer animation.
no code implementations • 4 Sep 2023 • Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng
Mapping two modalities, speech and text, into a shared representation space, is a research topic of using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains.
Automatic Speech Recognition (ASR) +2
no code implementations • 4 Sep 2023 • Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng
Recently, excellent progress has been made in speech recognition.
no code implementations • 1 Sep 2023 • Shaohuan Zhou, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng
Specifically, in the pre-training step, we design a phoneme predictor to produce the frame-level phoneme probability vectors as the phonemic timing information and a speaker encoder to model the timbre variations of different singers, and directly estimate the frame-level f0 values from the audio to provide the pitch information.
no code implementations • 31 Aug 2023 • Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng
This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice.
no code implementations • 31 Aug 2023 • Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng
For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech.
no code implementations • 31 Aug 2023 • Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng
The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style speech.
1 code implementation • 26 Aug 2023 • Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai
In this paper, we introduce the DiffuseStyleGesture+, our solution for the Generation and Evaluation of Non-verbal Behavior for Embodied Agents (GENEA) Challenge 2023, which aims to foster the development of realistic, automated systems for generating conversational gestures.
no code implementations • 9 Aug 2023 • Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao
With our essential designs on facial style learning, our model is able to flexibly capture the expressive facial style from arbitrary video prompts and transfer it onto a personalized image renderer in a zero-shot manner.
2 code implementations • 22 May 2023 • Ce Zheng, Lei LI, Qingxiu Dong, Yuxuan Fan, Zhiyong Wu, Jingjing Xu, Baobao Chang
Inspired by in-context learning (ICL), a new paradigm based on demonstration contexts without parameter updating, we explore whether ICL can edit factual knowledge.
no code implementations • 18 May 2023 • Xingchen Song, Di Wu, BinBin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu
In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without any accuracy loss.
1 code implementation • CVPR 2023 • Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang
Levenshtein distance based on audio quantization as a similarity metric of corresponding speech of gestures helps match more appropriate gestures with speech, and solves the alignment problem of speech and gestures well.
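The similarity metric above is the standard Levenshtein (edit) distance applied to quantized audio token sequences; a minimal dynamic-programming sketch, where the token sequences are hypothetical stand-ins for the paper's audio quantization codes:

```python
def levenshtein(a, b):
    """Edit distance between two token sequences
    (e.g., quantized audio codes from two utterances)."""
    prev = list(range(len(b) + 1))          # distances for the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y
        prev = curr
    return prev[-1]
```

A smaller distance between two utterances' token sequences then indicates more similar speech when matching gestures.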
no code implementations • 16 May 2023 • Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng
Experimental results show that our proposed method achieves similarity comparable to, and naturalness higher than, the supervised method, which requires a huge amount of annotated corpora for training; it can also improve similarity for VC methods that take other SSL representations as input.
Automatic Speech Recognition (ASR) +4
1 code implementation • 19 Apr 2023 • Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng
In this work, we propose CB-Conformer to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer.
3 code implementations • 6 Mar 2023 • Zhenyu Wu, Yaoxiang Wang, Jiacheng Ye, Jiangtao Feng, Jingjing Xu, Yu Qiao, Zhiyong Wu
However, the implementation of ICL is sophisticated due to the diverse retrieval and inference methods involved, as well as the varying pre-processing requirements for different models, datasets, and tasks.
1 code implementation • 11 Feb 2023 • Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, Lingpeng Kong
The performance of ICL is highly dominated by the quality of the selected in-context examples.
1 code implementation • 9 Feb 2023 • Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong
Based on EVALM, we scale up the size of examples efficiently in both instruction tuning and in-context learning to explore the boundary of the benefits from more annotated data.
1 code implementation • 31 Dec 2022 • Qingxiu Dong, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei LI, Zhifang Sui
With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions only based on contexts augmented with a few examples.
1 code implementation • 20 Dec 2022 • Zhiyong Wu, Yaoxiang Wang, Jiacheng Ye, Lingpeng Kong
Despite the surprising few-shot performance of in-context learning (ICL), it is still a common practice to randomly sample examples to serve as context.
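The random-sampling baseline the paper questions can be sketched as a prompt builder that draws k demonstrations uniformly at random; the formatting and names below are illustrative, not the paper's setup:

```python
import random

def build_icl_prompt(train_pairs, query, k, seed=None):
    """Assemble an in-context learning prompt from k randomly sampled
    (input, label) demonstrations followed by the unanswered query."""
    rng = random.Random(seed)
    demos = rng.sample(train_pairs, k)      # the "random context" baseline
    lines = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)
```

Replacing `rng.sample` with a retriever that scores candidates against the query is the kind of informed selection such work explores.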
1 code implementation • 19 Dec 2022 • Qintong Li, Zhiyong Wu, Lingpeng Kong, Wei Bi
Explaining the black-box predictions of NLP models naturally and accurately is an important open problem in natural language generation.
no code implementations • 26 Nov 2022 • Xiaojun Meng, Wenlin Dai, Yasheng Wang, Baojun Wang, Zhiyong Wu, Xin Jiang, Qun Liu
Then we present a novel lexicon-injected semantic parser, which collects slot labels of the tree representation as a lexicon and injects lexical features into the span representation of the parser.
no code implementations • 21 Nov 2022 • Sijie Cheng, Zhiyong Wu, Jiangjie Chen, Zhixing Li, Yang Liu, Lingpeng Kong
The major difficulty is finding the conflict point, where the statement contradicts our real world.
1 code implementation • 1 Nov 2022 • Xingchen Song, Di Wu, Zhiyong Wu, BinBin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu
In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models.
no code implementations • 31 Oct 2022 • Xingchen Song, Di Wu, BinBin Zhang, Zhiyong Wu, Wenpeng Li, Dongfang Li, Pengshen Zhang, Zhendong Peng, Fuping Pan, Changbao Zhu, Zhongqin Wu
Therefore, we name it FusionFormer.
Automatic Speech Recognition (ASR) +1
no code implementations • 25 Oct 2022 • Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng
We propose an unsupervised learning method to disentangle speech into content representation and speaker identity representation.
2 code implementations • 22 Oct 2022 • Jiacheng Ye, Jiahui Gao, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong
To improve the quality of dataset synthesis, we propose a progressive zero-shot dataset generation framework, ProGen, which leverages the feedback from the task-specific model to guide the generation of new training data via in-context examples.
1 code implementation • 17 Oct 2022 • Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
Bringing together theoretical analysis and empirical evidence, we demonstrate the great potential of diffusion models in complex conditional language generation tasks.
1 code implementation • COLING 2022 • Chenxin An, Ming Zhong, Zhiyong Wu, Qin Zhu, Xuanjing Huang, Xipeng Qiu
Traditional training paradigms for extractive and abstractive summarization systems always only use token-level or sentence-level training objectives.
1 code implementation • 25 Aug 2022 • Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao
This paper describes the ReprGesture entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) challenge 2022.
1 code implementation • 18 Aug 2022 • Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng
One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic.
no code implementations • 10 Aug 2022 • Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng
This paper aims to introduce a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speeches.
no code implementations • 6 Jul 2022 • Bin Su, Shaoguang Mao, Frank Soong, Zhiyong Wu
The ORARS addresses the MOS prediction problem by pairing a test sample with each of the pre-scored anchored reference samples.
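One way to realize the pairing idea, assuming a hypothetical pairwise comparator, is to rank the test sample among the pre-scored anchors and interpolate between the bracketing anchor scores; this is an illustrative sketch, not the paper's exact ORARS procedure:

```python
def rank_based_score(test_feat, anchors, better_than):
    """Predict a MOS-like score by pairwise-comparing the test sample
    against pre-scored anchors. `anchors` is a list of (features, score);
    `better_than(a, b)` is a hypothetical comparator returning True if
    sample a sounds better than sample b."""
    anchors = sorted(anchors, key=lambda a: a[1])   # ascending by score
    wins = sum(better_than(test_feat, feat) for feat, _ in anchors)
    if wins == 0:
        return anchors[0][1]                        # below all anchors
    if wins == len(anchors):
        return anchors[-1][1]                       # above all anchors
    lo, hi = anchors[wins - 1][1], anchors[wins][1]
    return (lo + hi) / 2                            # between bracketing anchors
```

With a toy comparator on scalar "features", a test sample that beats the 2.0-scored anchor but loses to the 3.0-scored one lands at 2.5.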
no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng
However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models focus solely on the standalone anti-spoofing task and ignore the subsequent speaker verification process.
2 code implementations • 25 May 2022 • Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong
In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than the PLM, achieving both higher performance and efficiency than prompt-based zero-shot learning methods on PLMs.
1 code implementation • ACL 2022 • Zhiyong Wu, Wei Bi, Xiang Li, Lingpeng Kong, Ben Kao
We propose knowledge internalization (KI), which aims to complement the lexical knowledge into neural dialog models.
1 code implementation • 31 Mar 2022 • Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng
In this paper, we propose a span-based Mandarin prosodic structure prediction model to obtain an optimal prosodic structure tree, which can be converted to corresponding prosodic label sequence.
1 code implementation • 31 Mar 2022 • Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng
Inspired by the Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model that accepts Chinese characters as direct input and integrates expert knowledge contained in rules into the neural network; both contribute to the superior performance of the proposed model on the text normalization task.
no code implementations • 31 Mar 2022 • Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng
Deep neural networks have brought significant advancements to speech emotion recognition (SER).
no code implementations • 24 Mar 2022 • Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng
In this paper, we propose an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs to complement each other's advantages.
Automatic Speech Recognition (ASR) +3
no code implementations • 23 Mar 2022 • Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng
In this paper, we propose a hierarchical framework to model speaking style from context.
2 code implementations • 23 Mar 2022 • Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng
Previously proposed FullSubNet has achieved outstanding performance in Deep Noise Suppression (DNS) Challenge and attracted much attention.
3 code implementations • 16 Feb 2022 • Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong
There is a growing interest in dataset generation recently due to the superior generative capacity of large pre-trained language models (PLMs).
no code implementations • 14 Oct 2021 • Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu
These embeddings, when used as implicit phonetic supplementary information, can alleviate the data shortage of explicit phoneme annotations.
no code implementations • EMNLP 2021 • YingMei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang
Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy, and thus impede the performance of SLU models.
1 code implementation • 1 Jul 2021 • Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee
We also show that the neural vocoder adopted in the detection framework is dataset-independent.
1 code implementation • 15 Jun 2021 • Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee
Automatic speaker verification (ASV) is a well-developed technology for biometric identification, and has been ubiquitously implemented in security-critical applications, such as banking and access control.
2 code implementations • 11 Jun 2021 • Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Helen Meng, Chao Weng, Dan Su
However, state-of-the-art context modeling methods in conversational TTS only model the textual information in context with a recurrent neural network (RNN).
no code implementations • 1 Jun 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.
1 code implementation • ACL 2021 • Lin Zheng, Zhiyong Wu, Lingpeng Kong
Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks.
no code implementations • ACL 2021 • Zhiyong Wu, Lingpeng Kong, Wei Bi, Xiang Li, Ben Kao
A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.
no code implementations • 14 Apr 2021 • Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng
Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS).
no code implementations • 8 Apr 2021 • Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng
This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis.
no code implementations • 5 Apr 2021 • Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu
The challenge consists of two tracks, namely few-shot track and one-shot track, where the participants are required to clone multiple target voices with 100 and 5 samples respectively.
no code implementations • 14 Feb 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee
Automatic speaker verification (ASV) is one of the core technologies in biometric identification.
no code implementations • 30 Jan 2021 • Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng
To increase the robustness of highly controllable style transfer on multiple factors in VC, we propose a disentangled speech representation learning framework based on adversarial learning.
no code implementations • 1 Jan 2021 • Zhiyong Wu, Lingpeng Kong, Ben Kao
A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.
no code implementations • 21 Dec 2020 • Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng
By using deep learning approaches, Speech Emotion Recognition (SER) on a single domain has achieved many excellent results.
no code implementations • 13 Dec 2020 • Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng
Meanwhile, nuclear-norm maximization loss is introduced to enhance the discriminability and diversity of the embeddings of constituent labels.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • YingMei Guo, Zhiyong Wu, Mingxing Xu
Unlike non-conversation scenes, emotion recognition in dialogues (ERD) poses more complicated challenges due to its interactive nature and intricate contextual information.
no code implementations • 28 Oct 2020 • Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng
Non-autoregressive (NAR) transformer models have achieved significant inference speedup but at the cost of inferior accuracy compared to autoregressive (AR) models in automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 26 Oct 2020 • Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu
Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weaknesses in assessing a speech utterance: 1) phoneme GOP scores cannot be easily translated into a sentence score with a simple average for effective assessment; 2) the rank-ordering information has not been well exploited in GOP scoring to deliver a robust assessment that correlates well with a human rater's evaluations.
no code implementations • 20 Jun 2020 • Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng
Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.
no code implementations • 26 May 2020 • Dongyang Dai, Li Chen, Yu-Ping Wang, Mu Wang, Rui Xia, Xuchen Song, Zhiyong Wu, Yuxuan Wang
Firstly, the speech synthesis model is pre-trained with both multi-speaker clean data and noisy augmented data; then the pre-trained model is adapted on noisy low-resource new speaker data; finally, by setting the clean speech condition, the model can synthesize the new speaker's clean voice.
1 code implementation • 2020 • Yongkun Li, Zhiyong Wu, Shuai Lin, Hong Xie, Min Lv, Yinlong Xu, John C. S. Lui
Random walk is widely applied to sample large-scale graphs due to its simplicity of implementation and solid theoretical foundations of bias analysis.
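The primitive such samplers build on is the simple unbiased random walk; a minimal sketch over an adjacency-list graph (representation and names are illustrative):

```python
import random

def random_walk(adj, start, length, seed=None):
    """Sample a node sequence of up to `length` steps from a graph given
    as an adjacency list (dict: node -> list of neighbours)."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(length):
        nbrs = adj.get(path[-1], [])
        if not nbrs:                  # dead end: terminate the walk early
            break
        path.append(rng.choice(nbrs))  # uniform transition = unbiased walk
    return path
```

Biased variants (e.g., degree- or weight-proportional transitions) replace the uniform `rng.choice` with a weighted draw, which is where the bias analysis mentioned above comes in.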
1 code implementation • ACL 2020 • Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu
However, this approach of evaluating a language model is undermined by the uncertainty of the amount of knowledge that is learned by the probe itself.
no code implementations • 23 Oct 2019 • Xingchen Song, Guangsen Wang, Zhiyong Wu, Yiheng Huang, Dan Su, Dong Yu, Helen Meng
Our best systems achieve a relative improvement of 11.9% and 8.3% on the TIMIT and WSJ tasks respectively.
no code implementations • 15 Apr 2017 • Zhiqian Zhang, Chenliang Li, Zhiyong Wu, Aixin Sun, Dengpan Ye, Xiangyang Luo
Inspired by the recent success of neural networks in many areas, in this paper, we present a simple but effective neural network framework for next POI recommendation, named NEXT.
no code implementations • 17 Nov 2016 • Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
Hence, traditional methods may fail to distinguish some of the emotions with just one global feature subspace.
no code implementations • 23 Sep 2013 • Xin Zheng, Zhiyong Wu, Helen Meng, Weifeng Li, Lianhong Cai
In this paper, we first present a new variant of Gaussian restricted Boltzmann machine (GRBM) called multivariate Gaussian restricted Boltzmann machine (MGRBM), with its definition and learning algorithm.