This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference to synthesize a photo-realistic talking video of that person, with head pose controlled by a short video clip and proper eye-blink embedding.
The Retrieval Question Answering (ReQA) task adopts the retrieval-augmented framework, which is composed of a retriever and a generator.
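As a rough, hedged illustration of such a retrieve-then-generate pipeline (the `retriever` and `generator` callables below are placeholders, not a specific library):

```python
def retrieve_then_generate(question, retriever, generator, top_k=5):
    """Sketch of a retrieval-augmented QA pipeline: the retriever selects
    supporting passages, and the generator conditions on them to produce
    the answer. Both components are placeholder callables."""
    passages = retriever(question, top_k=top_k)   # e.g. BM25 or dense retrieval
    context = "\n".join(passages)
    prompt = f"question: {question}\ncontext: {context}"
    return generator(prompt)                      # e.g. a seq2seq or LLM decoder
```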
Due to the powerful capabilities demonstrated by large language models (LLMs), there has been a recent surge in efforts to integrate them with AI agents to enhance their performance.
Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios.
Generating realistic talking faces is a complex and widely discussed task with numerous applications.
The emergence of the "right to be forgotten" has prompted research on machine unlearning, which grants data owners the right to actively withdraw data that has been used for model training and requires that the contribution of that data be removed from the model.
In the realm of Large Language Models, the balance between instruction data quality and quantity has become a focal point.
Conversational Question Answering (CQA) is a challenging task that aims to generate natural answers to questions in a conversational flow.
Chinese Automatic Speech Recognition (ASR) error correction presents significant challenges due to the Chinese language's unique features, including a large character set and a morpheme-based structure without word boundaries.
To construct or extend entity-centric and event-centric knowledge graphs (KGs and EKGs), an information extraction (IE) annotation toolkit is essential.
Deep neural retrieval models have amply demonstrated their power but estimating the reliability of their predictions remains challenging.
Deep neural networks have achieved remarkable performance in retrieval-based dialogue systems, but they have been shown to be poorly calibrated.
Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles, such as intonation, are neglected.
By predicting all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models.
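To make the efficiency argument concrete, here is a schematic, hedged contrast between the two decoding styles (the `step_fn` and `full_fn` callables are hypothetical stand-ins for model forward passes):

```python
import torch

def autoregressive_decode(step_fn, bos_id, eos_id, max_len=50):
    """One forward pass per generated token: L output tokens cost L passes."""
    tokens = [bos_id]
    for _ in range(max_len):
        next_id = int(step_fn(torch.tensor(tokens)).argmax())
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]

def non_autoregressive_decode(full_fn, encoder_out):
    """All positions are predicted in a single forward pass, then argmax-ed."""
    logits = full_fn(encoder_out)           # (T, vocab) produced in one pass
    return logits.argmax(dim=-1).tolist()
```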
Deep learning methods for classifying EEG signals can accurately identify people's emotions.
Zero-shot information extraction (IE) aims to build IE systems from unannotated text.
The Metaverse extends the physical world into a new dimension, allowing the physical environment and the Metaverse environment to be directly connected and entered.
In this paper, we propose Adapitch, a multi-speaker TTS method that adapts the supervised module using untranscribed data.
In this work, we propose two masking approaches: (1) speech-level masking, which makes the model mask more speech segments than silence segments, and (2) phoneme-level masking, which forces the model to mask all frames of a phoneme rather than partial phoneme pieces.
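A minimal sketch of the two masking strategies, assuming frame-level VAD labels and phoneme alignments are available (the sampling weights and mask ratio are illustrative, not the paper's settings):

```python
import numpy as np

def speech_level_mask(vad_labels, mask_ratio=0.15, rng=None):
    """Pick frames to mask, sampling speech frames more often than silence.
    vad_labels: (T,) array, 1 = speech frame, 0 = silence frame."""
    rng = rng or np.random.default_rng()
    T = len(vad_labels)
    # bias sampling probability towards speech frames (the 3:1 weight is an assumption)
    weights = np.where(vad_labels == 1, 3.0, 1.0)
    weights /= weights.sum()
    idx = rng.choice(T, size=int(mask_ratio * T), replace=False, p=weights)
    mask = np.zeros(T, dtype=bool)
    mask[idx] = True
    return mask

def phoneme_level_mask(phoneme_ids, mask_ratio=0.15, rng=None):
    """Mask whole phonemes: every frame of a selected phoneme is masked together.
    phoneme_ids: (T,) frame-level phoneme alignment (same id = same phoneme)."""
    rng = rng or np.random.default_rng()
    unique_phonemes = np.unique(phoneme_ids)
    n_mask = max(1, int(mask_ratio * len(unique_phonemes)))
    chosen = rng.choice(unique_phonemes, size=n_mask, replace=False)
    return np.isin(phoneme_ids, chosen)
```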
Most previous neural text-to-speech (TTS) methods are based on supervised learning, which means they depend on large training datasets and struggle to achieve comparable performance under low-resource conditions.
Recent advances in pre-trained language models have improved the performance for text classification tasks.
We also find that in the joint CTC-Attention ASR model, the decoder is more sensitive to linguistic information than to acoustic information.
Since the beginning of the COVID-19 pandemic, remote conferencing and online teaching have become important tools.
Buddhism is an influential religion with a long-standing history and profound philosophy.
Nonparallel multi-domain voice conversion methods such as the StarGAN-VC family have been widely applied in many scenarios.
One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic.
In this paper, a novel voice conversion framework, named $\boldsymbol{T}$ext $\boldsymbol{G}$uided $\boldsymbol{A}$utoVC (TGAVC), is proposed to more effectively separate content and timbre from speech, where an expected content embedding produced from the text transcriptions is designed to guide the extraction of voice content.
Existing models mostly establish a bottleneck (BN) layer by pre-training on a large source language and then transfer it to the low-resource target language.
In our experiments, with augmentation-based unsupervised learning, our KWS model achieves better performance than other unsupervised methods such as CPC, APC, and MPC.
In this paper, we propose a novel method that directly extracts coreference and omission relationships from the self-attention weight matrix of the Transformer, rather than from word embeddings, and edits the original text accordingly to generate the complete utterance.
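As a hedged sketch of this idea, the snippet below links each pronoun (or omission slot) to the token that receives the highest self-attention weight; the layer/head choice and the threshold are assumptions, not the paper's exact procedure:

```python
import numpy as np

def resolve_from_attention(attn, tokens, pronoun_positions, threshold=0.2):
    """attn: (T, T) self-attention weights from a chosen layer/head.
    tokens: list of T tokens (dialogue history + current utterance).
    pronoun_positions: indices of pronouns/omission slots in the current utterance.
    Returns a mapping pronoun_index -> antecedent token, taken as the
    highest-weight non-self token above the (assumed) threshold."""
    links = {}
    for p in pronoun_positions:
        weights = attn[p].copy()
        weights[p] = 0.0                 # ignore attention to the pronoun itself
        j = int(np.argmax(weights))
        if weights[j] >= threshold:
            links[p] = tokens[j]
    return links
```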
The visual dialog task attempts to train an agent to answer multi-turn questions about an image, which requires a deep understanding of the interactions between the image and the dialog history.
Voice conversion (VC) aims to convert one speaker's voice into speech that sounds as if it were spoken by another speaker.
This paper investigates a novel task of talking face video generation solely from speech.
End-to-end speech recognition systems usually require huge amounts of labeled data, yet annotating speech data is complicated and expensive.
We evaluated the proposed methods on phoneme classification and speaker recognition tasks.
We propose a novel network structure, called Memory-Self-Attention (MSA) Transducer.
Slot filling and intent detection have become a significant theme in the field of natural language understanding.
To verify its universality over languages, we apply pre-trained models to solve low-resource speech recognition tasks in various spoken languages.
A graph-to-sequence model is proposed, formed by a graph encoder and an attentional decoder.
In this paper, an efficient network, named location-variable convolution, is proposed to model the dependencies of waveforms.
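A rough sketch of the location-variable convolution idea, in which the kernel applied to a stretch of waveform samples is predicted per conditioning frame rather than shared globally; the shapes, names, and per-frame kernel layout below are assumptions for illustration, not the paper's exact operator:

```python
import torch
import torch.nn.functional as F

def location_variable_conv(x, kernels, hop=256):
    """x:       (B, C_in, T) waveform-rate features
    kernels: (B, Fr, C_out, C_in, K) kernels predicted per conditioning frame,
             with Fr = T // hop and K assumed odd.
    Each conditioning frame contributes its own kernel, applied only to the
    hop of samples it governs."""
    B, C_in, T = x.shape
    _, Fr, C_out, _, K = kernels.shape
    outs = []
    for f in range(Fr):
        seg = x[:, :, f * hop:(f + 1) * hop]          # samples governed by frame f
        seg = F.pad(seg, (K // 2, K // 2))            # keep the output length at hop
        out = torch.stack([
            F.conv1d(seg[b:b + 1], kernels[b, f]) for b in range(B)
        ]).squeeze(1)                                 # (B, C_out, hop)
        outs.append(out)
    return torch.cat(outs, dim=-1)                    # (B, C_out, Fr * hop)
```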
The structure of our model is kept concise so that it can be implemented for real-time applications.
MLNET leverages multiple branches to extract diverse contextual speech information and uses an effective attention block to weight the most crucial parts of the context for the final classification.
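A minimal sketch of that multi-branch-plus-attention design (layer sizes, kernel widths, and the pooling scheme are assumptions, not the exact MLNET architecture):

```python
import torch
import torch.nn as nn

class MultiBranchAttention(nn.Module):
    """Branches with different receptive fields extract contextual features,
    and an attention block weights the context frames before classification."""
    def __init__(self, feat_dim=40, hidden=64, n_classes=2):
        super().__init__()
        # different kernel sizes capture different spans of context
        self.branches = nn.ModuleList([
            nn.Conv1d(feat_dim, hidden, kernel_size=k, padding=k // 2)
            for k in (3, 5, 9)
        ])
        self.attn = nn.Linear(hidden * 3, 1)        # scores each context frame
        self.classifier = nn.Linear(hidden * 3, n_classes)

    def forward(self, x):                           # x: (B, T, feat_dim)
        h = torch.cat([b(x.transpose(1, 2)) for b in self.branches], dim=1)
        h = h.transpose(1, 2)                       # (B, T, hidden * 3)
        w = torch.softmax(self.attn(h), dim=1)      # (B, T, 1) frame weights
        pooled = (w * h).sum(dim=1)                 # attention-weighted context
        return self.classifier(pooled)
```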
However, the increased complexity of a model can also introduce a high risk of over-fitting, which is a major challenge in SLU tasks due to the limited available data.
Recent neural speech synthesis systems have gradually focused on the control of prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody and the correlation between prosody and semantics together.
The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging and named entity recognition.
Most singer identification methods are processed in the frequency domain, which potentially leads to information loss during the spectral transformation.
Targeting both high efficiency and high performance, we propose AlignTTS to predict the mel-spectrum in parallel.
This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms.