no code implementations • COLING 2022 • Xueyuan Chen, Shun Lei, Zhiyong Wu, Dong Xu, Weifeng Zhao, Helen Meng
On top of these, a bi-reference attention mechanism is used to align both the local-scale reference style embedding sequence and the local-scale context style embedding sequence with the corresponding phoneme embedding sequence.
no code implementations • 12 Jun 2024 • Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech.
no code implementations • 25 Apr 2024 • Yu Jiang, Jie Liang, Fuchen Ma, Yuanliang Chen, Chijin Zhou, Yuheng Shen, Zhiyong Wu, Jingzhou Fu, Mingzhe Wang, Shanshan Li, Quan Zhang
Fuzzing, a widely-used technique for bug detection, has seen advancements through Large Language Models (LLMs).
1 code implementation • CVPR 2024 • Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu
While previous works mostly generate structural human skeletons, resulting in the omission of appearance information, we focus on the direct generation of audio-driven co-speech gesture videos in this work.
1 code implementation • 21 Mar 2024 • Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, XiaoLi Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu
Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains.
1 code implementation • 8 Mar 2024 • Peng Liu, Dongyang Dai, Zhiyong Wu
In this study, we introduce RFWave, a cutting-edge multi-band Rectified Flow approach designed to reconstruct high-fidelity audio waveforms from Mel-spectrograms or discrete tokens.
1 code implementation • 19 Feb 2024 • Kai Wang, Yuwei Xu, Zhiyong Wu, Siqiang Luo
Knowledge Graph (KG) inductive reasoning, which aims to infer missing facts from new KGs that are not seen during training, has been widely adopted in various applications.
no code implementations • 15 Feb 2024 • Yaoxiang Wang, Zhiyong Wu, Junfeng Yao, Jinsong Su
The emergence of Large Language Models (LLMs) like ChatGPT has inspired the development of LLM-based agents capable of addressing complex, real-world tasks.
1 code implementation • 12 Feb 2024 • Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, Lingpeng Kong
Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents.
2 code implementations • 24 Jan 2024 • Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng
We use a higher compression ratio on subbands with less information to improve the information density and focus on modeling subbands with more information.
Ranked #3 on Music Source Separation on MUSDB18-HQ
1 code implementation • 17 Jan 2024 • Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu
In our preliminary study, we have discovered a key challenge in developing visual GUI agents: GUI grounding -- the capacity to accurately locate screen elements based on instructions.
no code implementations • 15 Jan 2024 • Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng
Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, among which some works have yielded outstanding results and attracted considerable attention.
no code implementations • 7 Jan 2024 • Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu
To tackle these issues, we introduce FreeTalker, which, to the best of our knowledge, is the first framework for the generation of both spontaneous (e.g., co-speech gesture) and non-spontaneous (e.g., moving around the podium) speaker motions.
no code implementations • 24 Dec 2023 • Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng
In this paper, we present CaRE-SEP, a consistent and relevant embedding network for general sound separation to encourage a comprehensive reconsideration of query usage in audio separation.
no code implementations • 19 Dec 2023 • Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng
Both objective and subjective evaluations demonstrate that our proposed method can effectively improve the naturalness and expressiveness of the synthesized speech in audiobook synthesis, especially for role and out-of-domain scenarios.
no code implementations • 19 Dec 2023 • Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, Shun Lei, Helen Meng
A novel calibration framework, named SimCalib, is accordingly proposed to consider similarity between nodes at global and local levels.
no code implementations • 18 Dec 2023 • Zilin Wang, Haolin Zhuang, Lu Li, Yinmin Zhang, Junjie Zhong, Jun Chen, Yu Yang, Boshi Tang, Zhiyong Wu
This paper presents an Exploratory 3D Dance generation framework, E3D2, designed to address the exploration capability deficiency in existing music-conditioned 3D dance generation models.
no code implementations • 14 Dec 2023 • Boshi Tang, Jianan Wang, Zhiyong Wu, Lei Zhang
Although Score Distillation Sampling (SDS) has exhibited remarkable performance in conditional 3D content generation, a comprehensive understanding of its formulation is still lacking, hindering the development of 3D generation.
no code implementations • 15 Nov 2023 • Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu
Although Large Language Models (LLMs) demonstrate remarkable ability in processing and generating human-like text, they do have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of natural language (e.g., chemical molecular formulas).
no code implementations • 15 Nov 2023 • Fei Yuan, Shuai Yuan, Zhiyong Wu, Lei LI
Large Language Models (LLMs) often show strong performance on English tasks while exhibiting limitations in other languages.
no code implementations • 18 Oct 2023 • Yuanyuan Wang, Yang Zhang, Zhiyong Wu, Zhihan Yang, Tao Wei, Kun Zou, Helen Meng
Existing augmentation methods for speaker verification manipulate the raw signal, which is time-consuming, and the augmented samples lack diversity.
no code implementations • 11 Oct 2023 • Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, HaoZhi Huang
Existing works mostly neglect the person-specific talking style in generation, including facial expression and head pose styles.
1 code implementation • 9 Oct 2023 • Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
Diffusion models have gained prominence in generating high-quality sequences of text.
1 code implementation • 7 Oct 2023 • Siyu Ren, Zhiyong Wu, Kenny Q. Zhu
In this paper, we propose Earth Mover Distance Optimization (EMO) for auto-regressive language modeling.
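EMO's actual training objective is defined over vocabulary distributions, but the underlying distance is the classic earth mover's (Wasserstein-1) distance; for two histograms on the same ordered 1-D support it reduces to a running sum of cumulative differences. An illustrative sketch of that reduction, not the paper's objective:

```python
def emd_1d(p, q):
    """Earth mover's distance between two histograms defined on the
    same ordered 1-D support, via accumulated CDF differences."""
    assert len(p) == len(q), "histograms must share a support"
    cum_diff, running = 0.0, 0.0
    for pi, qi in zip(p, q):
        running += pi - qi          # difference of the two CDFs so far
        cum_diff += abs(running)    # mass that must be moved one step
    return cum_diff
```

Moving all mass from the first bin to the third costs two unit steps, so `emd_1d([1, 0, 0], [0, 0, 1])` yields 2.0.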
1 code implementation • 30 Sep 2023 • Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, Lingpeng Kong
Large Language Models (LLMs) are evolving at an unprecedented pace and have exhibited considerable capability in the realm of natural language processing (NLP) with world knowledge.
no code implementations • 21 Sep 2023 • Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang
More interestingly, although we target the synthesis quality of the style transfer model, speech synthesized by the proposed text prosodic analysis model even outperforms style transfer from the original speech on some user evaluation metrics.
no code implementations • 14 Sep 2023 • Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng
However, it is still challenging to train a universal vocoder which can generalize well to out-of-domain (OOD) scenarios, such as unseen speaking styles, non-speech vocalization, singing, and musical pieces.
1 code implementation • 13 Sep 2023 • Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai
The automatic co-speech gesture generation draws much attention in computer animation.
no code implementations • 4 Sep 2023 • Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng
Mapping two modalities, speech and text, into a shared representation space, is a research topic of using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains.
Automatic Speech Recognition (ASR) +2
no code implementations • 4 Sep 2023 • Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng
Recently, excellent progress has been made in speech recognition.
no code implementations • 1 Sep 2023 • Shaohuan Zhou, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng
Specifically, in the pre-training step, we design a phoneme predictor to produce the frame-level phoneme probability vectors as the phonemic timing information and a speaker encoder to model the timbre variations of different singers, and directly estimate the frame-level f0 values from the audio to provide the pitch information.
no code implementations • 31 Aug 2023 • Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng
This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice.
no code implementations • 31 Aug 2023 • Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng
For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech.
no code implementations • 31 Aug 2023 • Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng
The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style speech.
1 code implementation • 26 Aug 2023 • Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai
In this paper, we introduce the DiffuseStyleGesture+, our solution for the Generation and Evaluation of Non-verbal Behavior for Embodied Agents (GENEA) Challenge 2023, which aims to foster the development of realistic, automated systems for generating conversational gestures.
no code implementations • 9 Aug 2023 • Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao
With our essential designs on facial style learning, our model is able to flexibly capture the expressive facial style from arbitrary video prompts and transfer it onto a personalized image renderer in a zero-shot manner.
2 code implementations • 22 May 2023 • Ce Zheng, Lei LI, Qingxiu Dong, Yuxuan Fan, Zhiyong Wu, Jingjing Xu, Baobao Chang
Inspired by in-context learning (ICL), a new paradigm based on demonstration contexts without parameter updating, we explore whether ICL can edit factual knowledge.
no code implementations • 18 May 2023 • Xingchen Song, Di Wu, BinBin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu
In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without any accuracy loss.
1 code implementation • CVPR 2023 • Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang
Levenshtein distance based on audio quantization as a similarity metric of corresponding speech of gestures helps match more appropriate gestures with speech, and solves the alignment problem of speech and gestures well.
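The similarity metric above is the standard Levenshtein (edit) distance applied to quantized audio token sequences; a minimal dynamic-programming sketch, where the token sequences are hypothetical stand-ins for the paper's audio quantization codes:

```python
def levenshtein(a, b):
    """Edit distance between two token sequences
    (e.g., quantized audio codes from two utterances)."""
    prev = list(range(len(b) + 1))          # distances for the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y
        prev = curr
    return prev[-1]
```

A smaller distance between two utterances' token sequences then indicates more similar speech when matching gestures.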
no code implementations • 16 May 2023 • Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng
Experimental results show that our proposed method achieves similarity comparable to, and naturalness higher than, the supervised method, which requires a huge amount of annotated corpora for training; it can also improve similarity for VC methods that take other SSL representations as input.
Automatic Speech Recognition (ASR) +4
1 code implementation • 19 Apr 2023 • Yaoxun Xu, Baiji Liu, Qiaochu Huang, Xingchen Song, Zhiyong Wu, Shiyin Kang, Helen Meng
In this work, we propose CB-Conformer to improve biased word recognition by introducing the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer.
3 code implementations • 6 Mar 2023 • Zhenyu Wu, Yaoxiang Wang, Jiacheng Ye, Jiangtao Feng, Jingjing Xu, Yu Qiao, Zhiyong Wu
However, the implementation of ICL is sophisticated due to the diverse retrieval and inference methods involved, as well as the varying pre-processing requirements for different models, datasets, and tasks.
1 code implementation • 11 Feb 2023 • Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, Lingpeng Kong
The performance of ICL is highly dominated by the quality of the selected in-context examples.
1 code implementation • 9 Feb 2023 • Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong
Based on EVALM, we scale up the size of examples efficiently in both instruction tuning and in-context learning to explore the boundary of the benefits from more annotated data.
1 code implementation • 31 Dec 2022 • Qingxiu Dong, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei LI, Zhifang Sui
With the increasing ability of large language models (LLMs), in-context learning (ICL) has become a new paradigm for natural language processing (NLP), where LLMs make predictions only based on contexts augmented with a few examples.
1 code implementation • 20 Dec 2022 • Zhiyong Wu, Yaoxiang Wang, Jiacheng Ye, Lingpeng Kong
Despite the surprising few-shot performance of in-context learning (ICL), it is still a common practice to randomly sample examples to serve as context.
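The random-sampling baseline the paper questions can be sketched as a prompt builder that draws k demonstrations uniformly at random; the formatting and names below are illustrative, not the paper's setup:

```python
import random

def build_icl_prompt(train_pairs, query, k, seed=None):
    """Assemble an in-context learning prompt from k randomly sampled
    (input, label) demonstrations followed by the unanswered query."""
    rng = random.Random(seed)
    demos = rng.sample(train_pairs, k)      # the "random context" baseline
    lines = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)
```

Replacing `rng.sample` with a retriever that scores candidates against the query is the kind of informed selection such work explores.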
1 code implementation • 19 Dec 2022 • Qintong Li, Zhiyong Wu, Lingpeng Kong, Wei Bi
Explaining the black-box predictions of NLP models naturally and accurately is an important open problem in natural language generation.
no code implementations • 26 Nov 2022 • Xiaojun Meng, Wenlin Dai, Yasheng Wang, Baojun Wang, Zhiyong Wu, Xin Jiang, Qun Liu
Then we present a novel lexicon-injected semantic parser, which collects slot labels of the tree representation as a lexicon and injects lexical features into the span representation of the parser.
no code implementations • 21 Nov 2022 • Sijie Cheng, Zhiyong Wu, Jiangjie Chen, Zhixing Li, Yang Liu, Lingpeng Kong
The major difficulty is finding the conflict point, where the statement contradicts our real world.
1 code implementation • 1 Nov 2022 • Xingchen Song, Di Wu, Zhiyong Wu, BinBin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu
In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models.
no code implementations • 31 Oct 2022 • Xingchen Song, Di Wu, BinBin Zhang, Zhiyong Wu, Wenpeng Li, Dongfang Li, Pengshen Zhang, Zhendong Peng, Fuping Pan, Changbao Zhu, Zhongqin Wu
Therefore, we name it FusionFormer.
Automatic Speech Recognition (ASR) +1
no code implementations • 25 Oct 2022 • Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng
We propose an unsupervised learning method to disentangle speech into content representation and speaker identity representation.
2 code implementations • 22 Oct 2022 • Jiacheng Ye, Jiahui Gao, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong
To improve the quality of dataset synthesis, we propose a progressive zero-shot dataset generation framework, ProGen, which leverages the feedback from the task-specific model to guide the generation of new training data via in-context examples.
1 code implementation • 17 Oct 2022 • Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong
Bringing together theoretical analysis and empirical evidence, we demonstrate the great potential of diffusion models in complex conditional language generation tasks.
1 code implementation • COLING 2022 • Chenxin An, Ming Zhong, Zhiyong Wu, Qin Zhu, Xuanjing Huang, Xipeng Qiu
Traditional training paradigms for extractive and abstractive summarization systems always only use token-level or sentence-level training objectives.
1 code implementation • 25 Aug 2022 • Sicheng Yang, Zhiyong Wu, Minglei Li, Mengchen Zhao, Jiuxin Lin, Liyang Chen, Weihong Bao
This paper describes the ReprGesture entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) challenge 2022.
1 code implementation • 18 Aug 2022 • Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng
One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic.
no code implementations • 10 Aug 2022 • Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng
This paper aims to introduce a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speeches.
no code implementations • 6 Jul 2022 • Bin Su, Shaoguang Mao, Frank Soong, Zhiyong Wu
The ORARS addresses the MOS prediction problem by pairing a test sample with each of the pre-scored anchored reference samples.
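One way to realize the pairing idea, assuming a hypothetical pairwise comparator, is to rank the test sample among the pre-scored anchors and interpolate between the bracketing anchor scores; this is an illustrative sketch, not the paper's exact ORARS procedure:

```python
def rank_based_score(test_feat, anchors, better_than):
    """Predict a MOS-like score by pairwise-comparing the test sample
    against pre-scored anchors. `anchors` is a list of (features, score);
    `better_than(a, b)` is a hypothetical comparator returning True if
    sample a sounds better than sample b."""
    anchors = sorted(anchors, key=lambda a: a[1])   # ascending by score
    wins = sum(better_than(test_feat, feat) for feat, _ in anchors)
    if wins == 0:
        return anchors[0][1]                        # below all anchors
    if wins == len(anchors):
        return anchors[-1][1]                       # above all anchors
    lo, hi = anchors[wins - 1][1], anchors[wins][1]
    return (lo + hi) / 2                            # between bracketing anchors
```

With a toy comparator on scalar "features", a test sample that beats the 2.0-scored anchor but loses to the 3.0-scored one lands at 2.5.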
no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng
However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models focus solely on the standalone anti-spoofing task and ignore the subsequent speaker verification process.
2 code implementations • 25 May 2022 • Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong
In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than the PLM, achieving both higher performance and efficiency than prompt-based zero-shot learning methods on PLMs.
1 code implementation • ACL 2022 • Zhiyong Wu, Wei Bi, Xiang Li, Lingpeng Kong, Ben Kao
We propose knowledge internalization (KI), which aims to complement the lexical knowledge into neural dialog models.
1 code implementation • 31 Mar 2022 • Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng
In this paper, we propose a span-based Mandarin prosodic structure prediction model to obtain an optimal prosodic structure tree, which can be converted to corresponding prosodic label sequence.
1 code implementation • 31 Mar 2022 • Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng
Inspired by the Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model that accepts Chinese characters as direct input and integrates expert knowledge contained in rules into the neural network; both contribute to the superior performance of the proposed model on the text normalization task.
no code implementations • 31 Mar 2022 • Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng
Deep neural networks have brought significant advancements to speech emotion recognition (SER).
no code implementations • 24 Mar 2022 • Xintao Zhao, Feng Liu, Changhe Song, Zhiyong Wu, Shiyin Kang, Deyi Tuo, Helen Meng
In this paper, we propose an any-to-one VC method using hybrid bottleneck features extracted from CTC-BNFs and CE-BNFs to complement each other's advantages.
Automatic Speech Recognition (ASR) +3
no code implementations • 23 Mar 2022 • Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng
In this paper, we propose a hierarchical framework to model speaking style from context.
2 code implementations • 23 Mar 2022 • Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng
Previously proposed FullSubNet has achieved outstanding performance in Deep Noise Suppression (DNS) Challenge and attracted much attention.
3 code implementations • 16 Feb 2022 • Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong
There is a growing interest in dataset generation recently due to the superior generative capacity of large pre-trained language models (PLMs).
no code implementations • 14 Oct 2021 • Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu
These embeddings, when used as implicit phonetic supplementary information, can alleviate the data shortage of explicit phoneme annotations.
no code implementations • EMNLP 2021 • YingMei Guo, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang
Although various data augmentation approaches have been proposed to synthesize training data in low-resource target languages, the augmented data sets are often noisy, and thus impede the performance of SLU models.
1 code implementation • 1 Jul 2021 • Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee
We also show that the neural vocoder adopted in the detection framework is dataset-independent.
1 code implementation • 15 Jun 2021 • Haibin Wu, Yang Zhang, Zhiyong Wu, Dong Wang, Hung-Yi Lee
Automatic speaker verification (ASV) is a well-developed technology for biometric identification, and has been ubiquitously implemented in security-critical applications, such as banking and access control.
2 code implementations • 11 Jun 2021 • Jingbei Li, Yi Meng, Chenyi Li, Zhiyong Wu, Helen Meng, Chao Weng, Dan Su
However, state-of-the-art context modeling methods in conversational TTS only model the textual information in context with a recurrent neural network (RNN).
no code implementations • 1 Jun 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.
1 code implementation • ACL 2021 • Lin Zheng, Zhiyong Wu, Lingpeng Kong
Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks.
no code implementations • ACL 2021 • Zhiyong Wu, Lingpeng Kong, Wei Bi, Xiang Li, Ben Kao
A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.
no code implementations • 14 Apr 2021 • Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng
Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS).
no code implementations • 8 Apr 2021 • Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng
This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis.
no code implementations • 5 Apr 2021 • Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu
The challenge consists of two tracks, namely few-shot track and one-shot track, where the participants are required to clone multiple target voices with 100 and 5 samples respectively.
no code implementations • 14 Feb 2021 • Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee
Automatic speaker verification (ASV) is one of the core technologies in biometric identification.
no code implementations • 30 Jan 2021 • Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng
To increase the robustness of highly controllable style transfer on multiple factors in VC, we propose a disentangled speech representation learning framework based on adversarial learning.
no code implementations • 1 Jan 2021 • Zhiyong Wu, Lingpeng Kong, Ben Kao
A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.
no code implementations • 21 Dec 2020 • Xiong Cai, Zhiyong Wu, Kuo Zhong, Bin Su, Dongyang Dai, Helen Meng
By using deep learning approaches, Speech Emotion Recognition (SER) on a single domain has achieved many excellent results.
no code implementations • 13 Dec 2020 • Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng
Meanwhile, nuclear-norm maximization loss is introduced to enhance the discriminability and diversity of the embeddings of constituent labels.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • YingMei Guo, Zhiyong Wu, Mingxing Xu
Unlike non-conversation scenes, emotion recognition in dialogues (ERD) poses more complicated challenges due to its interactive nature and intricate contextual information.
no code implementations • 28 Oct 2020 • Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng
Non-autoregressive (NAR) transformer models have achieved significant inference speedup but at the cost of inferior accuracy compared to autoregressive (AR) models in automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
no code implementations • 26 Oct 2020 • Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu
Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weaknesses in assessing a speech utterance: 1) phoneme GOP scores cannot be easily translated into a sentence score with a simple average for effective assessment; 2) the rank-ordering information has not been well exploited in GOP scoring to deliver a robust assessment that correlates well with a human rater's evaluations.
no code implementations • 20 Jun 2020 • Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng
Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.
no code implementations • 26 May 2020 • Dongyang Dai, Li Chen, Yu-Ping Wang, Mu Wang, Rui Xia, Xuchen Song, Zhiyong Wu, Yuxuan Wang
Firstly, the speech synthesis model is pre-trained with both multi-speaker clean data and noisy augmented data; then the pre-trained model is adapted on noisy low-resource new speaker data; finally, by setting the clean speech condition, the model can synthesize the new speaker's clean voice.
1 code implementation • 2020 • Yongkun Li, Zhiyong Wu, Shuai Lin, Hong Xie, Min Lv, Yinlong Xu, John C. S. Lui
Random walk is widely applied to sample large-scale graphs due to its simplicity of implementation and solid theoretical foundations of bias analysis.
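The primitive such samplers build on is the simple unbiased random walk; a minimal sketch over an adjacency-list graph (representation and names are illustrative):

```python
import random

def random_walk(adj, start, length, seed=None):
    """Sample a node sequence of up to `length` steps from a graph given
    as an adjacency list (dict: node -> list of neighbours)."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(length):
        nbrs = adj.get(path[-1], [])
        if not nbrs:                  # dead end: terminate the walk early
            break
        path.append(rng.choice(nbrs))  # uniform transition = unbiased walk
    return path
```

Biased variants (e.g., degree- or weight-proportional transitions) replace the uniform `rng.choice` with a weighted draw, which is where the bias analysis mentioned above comes in.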
1 code implementation • ACL 2020 • Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu
However, this approach of evaluating a language model is undermined by the uncertainty of the amount of knowledge that is learned by the probe itself.
no code implementations • 23 Oct 2019 • Xingchen Song, Guangsen Wang, Zhiyong Wu, Yiheng Huang, Dan Su, Dong Yu, Helen Meng
Our best systems achieve a relative improvement of 11.9% and 8.3% on the TIMIT and WSJ tasks respectively.
no code implementations • 15 Apr 2017 • Zhiqian Zhang, Chenliang Li, Zhiyong Wu, Aixin Sun, Dengpan Ye, Xiangyang Luo
Inspired by the recent success of neural networks in many areas, in this paper, we present a simple but effective neural network framework for next POI recommendation, named NEXT.
no code implementations • 17 Nov 2016 • Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
Hence, traditional methods may fail to distinguish some of the emotions with just one global feature subspace.
no code implementations • 23 Sep 2013 • Xin Zheng, Zhiyong Wu, Helen Meng, Weifeng Li, Lianhong Cai
In this paper, we first present a new variant of Gaussian restricted Boltzmann machine (GRBM) called multivariate Gaussian restricted Boltzmann machine (MGRBM), with its definition and learning algorithm.