Search Results for author: Ning Cheng

Found 49 papers, 6 papers with code

CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding

no code implementations • 15 Nov 2023 • Jianzong Wang, Yimin Deng, ZiQi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference to synthesize a photo-realistic talking video of that person, with head poses controlled by a short video clip and with proper eye blinking embedding.

Talking Face Generation

An In-depth Survey of Large Language Model-based Artificial Intelligence Agents

no code implementations • 23 Sep 2023 • Pengyu Zhao, Zijian Jin, Ning Cheng

Due to the powerful capabilities demonstrated by large language models (LLMs), there has been a recent surge in efforts to integrate them with AI agents to enhance their performance.

Language Modelling, Large Language Model

AOSR-Net: All-in-One Sandstorm Removal Network

no code implementations • 16 Sep 2023 • Yazhong Si, Xulong Zhang, Fan Yang, Jianzong Wang, Ning Cheng, Jing Xiao

Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios.

Image Enhancement, Image Restoration

DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks

no code implementations • 14 Sep 2023 • Zipeng Qi, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

Generating realistic talking faces is a complex and widely discussed task with numerous applications.

Face Generation

Machine Unlearning Methodology base on Stochastic Teacher Network

no code implementations • 28 Aug 2023 • Xulong Zhang, Jianzong Wang, Ning Cheng, Yifu Sun, Chuanyao Zhang, Jing Xiao

The rise of the "right to be forgotten" has prompted research on machine unlearning, which grants data owners the right to actively withdraw data that has been used for model training and requires eliminating the contribution of that data to the model.

Prompt Guided Copy Mechanism for Conversational Question Answering

no code implementations • 7 Aug 2023 • Yong Zhang, Zhitao Li, Jianzong Wang, Yiming Gao, Ning Cheng, Fengying Yu, Jing Xiao

Conversational Question Answering (CQA) is a challenging task that aims to generate natural answers for conversational flow questions.

Conversational Question Answering

Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism

no code implementations • 7 Aug 2023 • Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng, Jing Xiao

Chinese Automatic Speech Recognition (ASR) error correction presents significant challenges due to the Chinese language's unique features, including a large character set and borderless, morpheme-based structure.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

CollabKG: A Learnable Human-Machine-Cooperative Information Extraction Toolkit for (Event) Knowledge Graph Construction

no code implementations • 3 Jul 2023 • Xiang Wei, Yufeng Chen, Ning Cheng, Xingyu Cui, Jinan Xu, Wenjuan Han

In order to construct or extend entity-centric and event-centric knowledge graphs (KG and EKG), the information extraction (IE) annotation toolkit is essential.

graph construction, Knowledge Graphs, +3

On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models

no code implementations • 15 Mar 2023 • Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Zhitao Li, Jing Xiao

Deep neural retrieval models have amply demonstrated their power but estimating the reliability of their predictions remains challenging.

Retrieval

Efficient Uncertainty Estimation with Gaussian Process for Reliable Dialog Response Retrieval

no code implementations • 15 Mar 2023 • Tong Ye, Zhitao Li, Jianzong Wang, Ning Cheng, Jing Xiao

Deep neural networks have achieved remarkable performance in retrieval-based dialogue systems, but they are shown to be ill calibrated.

Conversational Search, Retrieval

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

no code implementations • 14 Mar 2023 • Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Recent expressive text to speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected.

Emotional Speech Synthesis

Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

no code implementations • 14 Mar 2023 • Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

Because they predict all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models.

speech-recognition, Speech Recognition

MetaSpeech: Speech Effects Switch Along with Environment for Metaverse

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Metaverse expands the physical world to a new dimension, and the physical environment and Metaverse environment can be directly connected and entered.

Voice Conversion

Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

In this paper, we propose Adapitch, a multi-speaker TTS method that adapts the supervised module using untranscribed data.

Disentanglement

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao

In this work, we propose two kinds of masking approaches: (1) speech-level masking, which makes the model mask more speech segments than silence segments, and (2) phoneme-level masking, which forces the model to mask all frames of a phoneme instead of phoneme pieces.

Representation Learning, Speaker Recognition

Semi-Supervised Learning Based on Reference Model for Low-resource TTS

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Most previous neural text-to-speech (TTS) methods are based on supervised learning, which means they depend on a large training dataset and struggle to achieve comparable performance under low-resource conditions.

Speech Synthesis

Improving Imbalanced Text Classification with Dynamic Curriculum Learning

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Recent advances in pre-trained language models have improved the performance for text classification tasks.

Scheduling, text-classification, +1

Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar

no code implementations • 13 Oct 2022 • Aolan Sun, Xulong Zhang, Tiandong Ling, Jianzong Wang, Ning Cheng, Jing Xiao

Since the beginning of the COVID-19 pandemic, remote conferencing and school-teaching have become important tools.

Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

no code implementations • 21 Sep 2022 • Shijing Si, Jianzong Wang, Xulong Zhang, Xiaoyang Qu, Ning Cheng, Jing Xiao

Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios.

Contrastive Learning, Voice Conversion

TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training

no code implementations • 8 Aug 2022 • Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao

In this paper, a novel voice conversion framework, named Text Guided AutoVC (TGAVC), is proposed to more effectively separate content and timbre from speech, where an expected content embedding produced from the text transcriptions is designed to guide the extraction of voice content.

Voice Conversion

Adaptive Activation Network For Low Resource Multilingual Speech Recognition

no code implementations • 28 May 2022 • Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao

Existing models mostly establish a bottleneck (BN) layer by pre-training on a large source language and then transferring to the low-resource target language.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Speech Augmentation Based Unsupervised Learning for Keyword Spotting

no code implementations • 28 May 2022 • Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao

In our experiments, with augmentation based unsupervised learning, our KWS model achieves better performance than other unsupervised methods, such as CPC, APC, and MPC.

Keyword Spotting

Self-Attention for Incomplete Utterance Rewriting

no code implementations • 24 Feb 2022 • Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng, Jing Xiao

In this paper, we propose a novel method that directly extracts the coreference and omission relationships from the self-attention weight matrix of the transformer, instead of from word embeddings, and edits the original text accordingly to generate the complete utterance.

Word Embeddings

VU-BERT: A Unified framework for Visual Dialog

no code implementations • 22 Feb 2022 • Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng, Jing Xiao

The visual dialog task attempts to train an agent to answer multi-turn questions given an image, which requires a deep understanding of the interactions between the image and the dialog history.

Language Modelling, Masked Language Modeling, +2

Loss Prediction: End-to-End Active Learning Approach For Speech Recognition

no code implementations • 9 Jul 2021 • Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

End-to-end speech recognition systems usually require huge amounts of labeling resources, while annotating speech data is complicated and expensive.

Active Learning, Automatic Speech Recognition, +2

Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages

no code implementations • 22 Dec 2020 • Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu

To verify its universality over languages, we apply pre-trained models to solve low-resource speech recognition tasks in various spoken languages.

speech-recognition, Speech Recognition

MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution

3 code implementations • 3 Dec 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

In this paper, an efficient network, named location-variable convolution, is proposed to model the dependencies of waveforms.

MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

no code implementations • 13 Aug 2020 • Zhenpeng Zheng, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

The MLNET leveraged multi-branches to extract multiple contextual speech information and investigated an effective attention block to weight the most crucial parts of the context for final classification.

Action Detection, Activity Detection

Large-scale Transfer Learning for Low-resource Spoken Language Understanding

no code implementations • 13 Aug 2020 • Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao

However, the increased complexity of a model can also introduce a high risk of over-fitting, which is a major challenge in SLU tasks due to the limited data available.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

no code implementations • 13 Aug 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

Recent neural speech synthesis systems have gradually focused on the control of prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody and the correlation between prosody and semantics together.

Language Modelling, Prosody Prediction, +1

Integration of Automatic Sentence Segmentation and Lexical Analysis of Ancient Chinese based on BiLSTM-CRF Model

no code implementations • LREC 2020 • Ning Cheng, Bin Li, Liming Xiao, Changwei Xu, Sijia Ge, Xingyue Hao, Minxuan Feng

The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging and named entity recognition.

Lexical Analysis, named-entity-recognition, +5

MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification

no code implementations • 9 Apr 2020 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Most singer identification methods are processed in the frequency domain, which potentially leads to information loss during the spectral transformation.

Artist classification, Music Generation, +1

AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment

2 code implementations • 4 Mar 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Tian Xia, Jing Xiao

Aiming at both high efficiency and high performance, we propose AlignTTS to predict the mel-spectrum in parallel.

GraphTTS: graph-to-sequence modelling in neural text-to-speech

no code implementations • 4 Mar 2020 • Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Jing Xiao

This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms.

Graph Embedding, Graph-to-Sequence, +1
