Search Results for author: Ning Cheng

Found 22 papers, 3 papers with code

Adaptive Activation Network For Low Resource Multilingual Speech Recognition

no code implementations • 28 May 2022 • Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao

The existing models mostly established a bottleneck (BN) layer by pre-training on a large source language, and transferring to the low resource target language.

Automatic Speech Recognition

Speech Augmentation Based Unsupervised Learning for Keyword Spotting

no code implementations • 28 May 2022 • Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao

In our experiments, with augmentation based unsupervised learning, our KWS model achieves better performance than other unsupervised methods, such as CPC, APC, and MPC.

Keyword Spotting

Self-Attention for Incomplete Utterance Rewriting

no code implementations • 24 Feb 2022 • Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng, Jing Xiao

In this paper, we propose a novel method by directly extracting the coreference and omission relationship from the self-attention weight matrix of the transformer instead of word embeddings and edit the original text accordingly to generate the complete utterance.

Word Embeddings

VU-BERT: A Unified framework for Visual Dialog

no code implementations • 22 Feb 2022 • Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng, Jing Xiao

The visual dialog task attempts to train an agent to answer multi-turn questions given an image, which requires the deep understanding of interactions between the image and dialog history.

Language Modelling, Masked Language Modeling

ClsVC: Learning Speech Representations with two different classification tasks.

no code implementations • 29 Sep 2021 • Tang Huaizhen, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Voice conversion (VC) aims to convert one speaker's voice into new speech that sounds as if it were spoken by another speaker.

Classification, Voice Conversion

Loss Prediction: End-to-End Active Learning Approach For Speech Recognition

no code implementations • 9 Jul 2021 • Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

End-to-end speech recognition systems usually require huge amounts of labeling resource, while annotating the speech data is complicated and expensive.

Active Learning, Automatic Speech Recognition

Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages

no code implementations • 22 Dec 2020 • Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu

To verify its universality over languages, we apply pre-trained models to solve low-resource speech recognition tasks in various spoken languages.

Speech Recognition

MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution

4 code implementations • 3 Dec 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

In this paper, an efficient network, named location-variable convolution, is proposed to model the dependencies of waveforms.

Large-scale Transfer Learning for Low-resource Spoken Language Understanding

no code implementations • 13 Aug 2020 • Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao

However, the increased complexity of a model can also introduce a high risk of over-fitting, which is a major challenge in SLU tasks because of the limited data available.

Automatic Speech Recognition, Spoken Language Understanding

Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

no code implementations • 13 Aug 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

Recent neural speech synthesis systems have gradually focused on the control of prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody and the correlation between prosody and semantics together.

Language Modelling, Prosody Prediction

MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

no code implementations • 13 Aug 2020 • Zhenpeng Zheng, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

The MLNET leveraged multi-branches to extract multiple contextual speech information and investigated an effective attention block to weight the most crucial parts of the context for final classification.

Action Detection, Activity Detection

Integration of Automatic Sentence Segmentation and Lexical Analysis of Ancient Chinese based on BiLSTM-CRF Model

no code implementations • LREC 2020 • Ning Cheng, Bin Li, Liming Xiao, Changwei Xu, Sijia Ge, Xingyue Hao, Minxuan Feng

The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging and named entity recognition.

Lexical Analysis, Named Entity Recognition

MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification

no code implementations • 9 Apr 2020 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Most singer identification methods are processed in the frequency domain, which potentially leads to information loss during the spectral transformation.

Artist Classification, Music Generation

GraphTTS: graph-to-sequence modelling in neural text-to-speech

no code implementations • 4 Mar 2020 • Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Jing Xiao

This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms.

Graph Embedding, Graph-to-Sequence

AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment

2 code implementations • 4 Mar 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Tian Xia, Jing Xiao

Targeting both high efficiency and high performance, we propose AlignTTS to predict the mel-spectrum in parallel.
