no code implementations • 18 Jun 2024 • Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, Jing Xiao
Our framework is designed to be communication-efficient: computation can be delegated to the local clients so that the server's computational burden is lightened.
no code implementations • 28 May 2024 • Jianzong Wang, Haoxiang Shi, Kaiyi Luo, xulong Zhang, Ning Cheng, Jing Xiao
For unpaired data, to effectively capture the latent discriminative features, the high-order relationships between unpaired data and anchors, computed by efficient linear reconstruction, are embedded into the latent subspace.
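A minimal sketch of the linear-reconstruction idea described above: each unpaired sample is represented by its reconstruction coefficients over a small set of anchors. All names and dimensions here are illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative setup: 5 anchor points and one unpaired sample in a 16-d space.
rng = np.random.default_rng(1)
anchors = rng.standard_normal((5, 16))
x = rng.standard_normal(16)

# Least-squares coefficients w minimizing ||x - w @ anchors||;
# w captures the sample's relationship to the anchors and can be
# embedded into a latent subspace.
w, *_ = np.linalg.lstsq(anchors.T, x, rcond=None)
reconstruction = w @ anchors
```

The reconstruction is linear in the anchors, which is what keeps this step computationally cheap.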
no code implementations • 28 May 2024 • Haoxiang Shi, xulong Zhang, Ning Cheng, Yong Zhang, Jun Yu, Jing Xiao, Jianzong Wang
Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differences between modalities, resulting in the model being unable to focus on modality-specific emotional information.
no code implementations • 21 May 2024 • Jing Gao, Ning Cheng, Bin Fang, Wenjuan Han
The Transformer model, which initially achieved significant success in natural language processing, has recently shown great potential in tactile perception.
no code implementations • 10 May 2024 • Ning Cheng, Zhaohui Yan, ZiMing Wang, Zhijie Li, Jiaming Yu, Zilong Zheng, Kewei Tu, Jinan Xu, Wenjuan Han
Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias.
no code implementations • 30 Apr 2024 • Sheng Ouyang, Jianzong Wang, Yong Zhang, Zhitao Li, ZiQi Liang, xulong Zhang, Ning Cheng, Jing Xiao
Extractive Question Answering (EQA) in Machine Reading Comprehension (MRC) often faces the challenge of dealing with semantically identical but format-variant inputs.
Extractive Question-Answering • Machine Reading Comprehension +1
no code implementations • 14 Mar 2024 • Ning Cheng, You Li, Jing Gao, Bin Fang, Jinan Xu, Wenjuan Han
Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots.
no code implementations • 8 Mar 2024 • Jianzong Wang, Pengcheng Li, xulong Zhang, Ning Cheng, Jing Xiao
After combining the intent from two domains into a joint representation, the integrated intent representation is fed into a decision layer for classification.
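The fusion-then-decision step described above can be sketched as follows; the dimensions, weights, and function names are hypothetical, chosen only to illustrate concatenating two domain intents and classifying the joint representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: intent vectors from two domains.
dim_a, dim_b, hidden, n_classes = 8, 8, 16, 4
W1 = rng.standard_normal((dim_a + dim_b, hidden))
W2 = rng.standard_normal((hidden, n_classes))

def classify(intent_a, intent_b):
    # Combine the two domain intents into a joint representation,
    # then feed it into a decision layer for classification.
    joint = np.concatenate([intent_a, intent_b], axis=-1)
    h = np.maximum(joint @ W1, 0.0)  # hidden projection with ReLU
    logits = h @ W2                  # decision layer
    return int(np.argmax(logits))

label = classify(rng.standard_normal(dim_a), rng.standard_normal(dim_b))
```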
1 code implementation • 1 Feb 2024 • Ming Li, Yong Zhang, Shwai He, Zhitao Li, Hongyu Zhao, Jianzong Wang, Ning Cheng, Tianyi Zhou
Data filtering for instruction tuning has proved important in improving both the efficiency and performance of the tuning process.
no code implementations • 18 Jan 2024 • Yong Zhang, Hanzhang Li, Zhitao Li, Ning Cheng, Ming Li, Jing Xiao, Jianzong Wang
Large Language Models (LLMs) have shown significant promise in various applications, including zero-shot and few-shot learning.
no code implementations • 16 Jan 2024 • Bingyuan Zhang, xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao, Jianzong Wang
In recent years, the field of talking face generation has attracted considerable attention, with certain methods adept at generating virtual faces that convincingly imitate human expressions.
no code implementations • 16 Jan 2024 • Haobin Tang, xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang
We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels.
no code implementations • 15 Nov 2023 • Jianzong Wang, Yimin Deng, ZiQi Liang, xulong Zhang, Ning Cheng, Jing Xiao
This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference, and synthesizes a photo-realistic talking video of that person, with head poses controlled by a short video clip and proper eye-blinking embedding.
no code implementations • 23 Oct 2023 • Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, Jing Xiao
The Retrieval Question Answering (ReQA) task employs the retrieval-augmented framework, composed of a retriever and generator.
no code implementations • 23 Sep 2023 • Pengyu Zhao, Zijian Jin, Ning Cheng
Due to the powerful capabilities demonstrated by large language models (LLMs), there has been a recent surge in efforts to integrate them with AI agents to enhance performance.
no code implementations • 16 Sep 2023 • Yazhong Si, xulong Zhang, Fan Yang, Jianzong Wang, Ning Cheng, Jing Xiao
Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios.
no code implementations • 14 Sep 2023 • Zipeng Qi, xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang
Generating realistic talking faces is a complex and widely discussed task with numerous applications.
no code implementations • 28 Aug 2023 • xulong Zhang, Jianzong Wang, Ning Cheng, Yifu Sun, Chuanyao Zhang, Jing Xiao
The rise of the "right to be forgotten" has prompted research on machine unlearning, which grants data owners the right to actively withdraw data that has been used for model training and requires that the contribution of that data to the model be eliminated.
2 code implementations • 23 Aug 2023 • Ming Li, Yong Zhang, Zhitao Li, Jiuhai Chen, Lichang Chen, Ning Cheng, Jianzong Wang, Tianyi Zhou, Jing Xiao
In the realm of Large Language Models (LLMs), the balance between instruction data quality and quantity is a focal point.
no code implementations • 7 Aug 2023 • Yong Zhang, Zhitao Li, Jianzong Wang, Yiming Gao, Ning Cheng, Fengying Yu, Jing Xiao
Conversational Question Answering (CQA) is a challenging task that aims to generate natural answers for conversational flow questions.
no code implementations • 7 Aug 2023 • Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng, Jing Xiao
Chinese Automatic Speech Recognition (ASR) error correction presents significant challenges due to the Chinese language's unique features, including a large character set and borderless, morpheme-based structure.
Automatic Speech Recognition (ASR) +1
1 code implementation • 3 Jul 2023 • Xiang Wei, Yufeng Chen, Ning Cheng, Xingyu Cui, Jinan Xu, Wenjuan Han
To construct or extend entity-centric and event-centric knowledge graphs (KG and EKG), an information extraction (IE) annotation toolkit is essential.
no code implementations • 15 Mar 2023 • Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Zhitao Li, Jing Xiao
Deep neural retrieval models have amply demonstrated their power but estimating the reliability of their predictions remains challenging.
no code implementations • 15 Mar 2023 • Tong Ye, Zhitao Li, Jianzong Wang, Ning Cheng, Jing Xiao
Deep neural networks have achieved remarkable performance in retrieval-based dialogue systems, but they are shown to be ill calibrated.
no code implementations • 14 Mar 2023 • Haobin Tang, xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Recent expressive text to speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected.
no code implementations • 14 Mar 2023 • xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao
By predicting all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models.
no code implementations • 14 Mar 2023 • Kexin Zhu, xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Using deep learning methods to classify EEG signals can accurately identify people's emotions.
1 code implementation • 20 Feb 2023 • Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, Yong Jiang, Wenjuan Han
Zero-shot information extraction (IE) aims to build IE systems from the unannotated text.
no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao
We also find that in the joint CTC-Attention ASR model, the decoder is more sensitive to linguistic information than to acoustic information.
Automatic Speech Recognition (ASR) +3
no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
In this paper, we propose Adapitch, a multi-speaker TTS method that adapts the supervised module with untranscribed data.
no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Most previous neural text-to-speech (TTS) methods are based mainly on supervised learning, which means they depend on a large training dataset and struggle to achieve comparable performance under low-resource conditions.
no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Recent advances in pre-trained language models have improved the performance for text classification tasks.
no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao
In this work, we propose two kinds of masking approaches: (1) speech-level masking, which makes the model mask more speech segments than silence segments, and (2) phoneme-level masking, which forces the model to mask the whole frame span of a phoneme rather than phoneme pieces.
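Phoneme-level masking, as described above, can be sketched as follows: given frame-aligned phoneme spans, each phoneme is either masked in full or left untouched, never partially. The function name, span format, and probability are assumptions for illustration.

```python
import numpy as np

def phoneme_level_mask(n_frames, phoneme_spans, mask_prob=0.3, seed=0):
    """Sketch of phoneme-level masking: each phoneme's span of
    frames is masked as a whole, never in pieces."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_frames, dtype=bool)
    for start, end in phoneme_spans:
        if rng.random() < mask_prob:
            mask[start:end] = True  # mask the whole phoneme
    return mask

# Three phonemes covering 12 frames.
spans = [(0, 4), (4, 9), (9, 12)]
m = phoneme_level_mask(12, spans, mask_prob=0.5)

# Every masked region aligns exactly with a phoneme span:
for s, e in spans:
    assert m[s:e].all() or not m[s:e].any()
```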
no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
The Metaverse expands the physical world to a new dimension, where the physical environment and the Metaverse environment can be directly connected and entered.
no code implementations • 13 Oct 2022 • Aolan Sun, xulong Zhang, Tiandong Ling, Jianzong Wang, Ning Cheng, Jing Xiao
Since the beginning of the COVID-19 pandemic, remote conferencing and school-teaching have become important tools.
no code implementations • 30 Sep 2022 • Denghao Li, Yuqiao Zeng, Jianzong Wang, Lingwei Kong, Zhangcheng Huang, Ning Cheng, Xiaoyang Qu, Jing Xiao
Buddhism is an influential religion with a long-standing history and profound philosophy.
no code implementations • 21 Sep 2022 • Shijing Si, Jianzong Wang, xulong Zhang, Xiaoyang Qu, Ning Cheng, Jing Xiao
Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios.
1 code implementation • 18 Aug 2022 • Sicheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng
One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic.
no code implementations • 8 Aug 2022 • Huaizhen Tang, xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao
In this paper, a novel voice conversion framework, named Text Guided AutoVC (TGAVC), is proposed to more effectively separate content and timbre from speech; an expected content embedding, produced from the text transcriptions, is designed to guide the extraction of voice content.
1 code implementation • 27 Jun 2022 • Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Jing Xiao
In this work, we investigate the uncertainty calibration for deep audio classifiers.
no code implementations • 28 May 2022 • Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao
Existing models mostly establish a bottleneck (BN) layer by pre-training on a large source language and transferring it to the low-resource target language.
Automatic Speech Recognition (ASR) +1
no code implementations • 28 May 2022 • Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao
In our experiments, with augmentation based unsupervised learning, our KWS model achieves better performance than other unsupervised methods, such as CPC, APC, and MPC.
no code implementations • 24 Feb 2022 • Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng, Jing Xiao
In this paper, we propose a novel method that directly extracts the coreference and omission relationships from the self-attention weight matrix of the transformer, instead of from word embeddings, and edits the original text accordingly to generate the complete utterance.
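A toy sketch of reading relational links from attention weights, in the spirit of the approach above: for each token, find the source token it attends to most strongly. The matrix values and the idea of using a plain argmax are illustrative assumptions, not the paper's actual extraction rule.

```python
import numpy as np

def most_attended(attn):
    """For each token (row), return the index of the token it
    attends to most strongly in a self-attention weight matrix."""
    return attn.argmax(axis=-1)

# attn[i, j]: attention weight of token i on token j (rows sum to 1).
attn = np.array([
    [0.70, 0.20, 0.10],
    [0.15, 0.05, 0.80],  # token 1 links most strongly to token 2
    [0.10, 0.60, 0.30],
])
links = most_attended(attn)  # one candidate link per token
```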
no code implementations • 22 Feb 2022 • Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng, Jing Xiao
The visual dialog task attempts to train an agent to answer multi-turn questions given an image, which requires the deep understanding of interactions between the image and dialog history.
no code implementations • 29 Sep 2021 • Huaizhen Tang, xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Voice conversion (VC) aims to convert one speaker's voice into new speech that sounds as if spoken by another speaker.
no code implementations • 10 Jul 2021 • Shijing Si, Jianzong Wang, Xiaoyang Qu, Ning Cheng, Wenqi Wei, Xinghua Zhu, Jing Xiao
This paper investigates a novel task of talking face video generation solely from speeches.
no code implementations • 9 Jul 2021 • Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao
End-to-end speech recognition systems usually require huge amounts of labeled data, while annotating speech data is complicated and expensive.
no code implementations • 9 Jul 2021 • Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao
We evaluated the proposed methods on phoneme classification and speaker recognition tasks.
no code implementations • 23 Feb 2021 • Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao
We propose a novel network structure, called Memory-Self-Attention (MSA) Transducer.
no code implementations • 22 Feb 2021 • Yanfei Hui, Jianzong Wang, Ning Cheng, Fengying Yu, Tianbo Wu, Jing Xiao
Slot filling and intent detection have become a significant theme in the field of natural language understanding.
no code implementations • 22 Dec 2020 • Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu
To verify its universality over languages, we apply pre-trained models to solve low-resource speech recognition tasks in various spoken languages.
no code implementations • 3 Dec 2020 • Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Lingwei Kong, Jing Xiao
A graph-to-sequence model is proposed, formed by a graph encoder and an attentional decoder.
3 code implementations • 3 Dec 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao
In this paper, an efficient network, named location-variable convolution, is proposed to model the dependencies of waveforms.
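The core contrast with an ordinary convolution can be sketched as follows: instead of one kernel shared across all positions, each output position uses its own kernel (which, in models of this kind, would be predicted from local conditioning features such as the mel-spectrogram). The function and shapes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def location_variable_conv(x, kernels):
    """Apply a different 1-d kernel at each output position.
    x: waveform of length T; kernels: (T, k) array, one kernel per
    position. An ordinary convolution is the special case where all
    rows of `kernels` are identical."""
    k = kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, (pad, pad))
    return np.array([xp[t:t + k] @ kernels[t] for t in range(len(x))])

T, k = 8, 3
x = np.arange(T, dtype=float)
kernels = np.ones((T, k)) / k  # here: every position averages its window
y = location_variable_conv(x, kernels)
```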
no code implementations • 18 Aug 2020 • Wenqi Wei, Jianzong Wang, Jiteng Ma, Ning Cheng, Jing Xiao
The structure of our model is kept concise so that it can be implemented for real-time applications.
no code implementations • 13 Aug 2020 • Zhenpeng Zheng, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao
The MLNET leverages multiple branches to extract multiple kinds of contextual speech information and an effective attention block to weight the most crucial parts of the context for the final classification.
no code implementations • 13 Aug 2020 • Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao
However, the increased complexity of a model can also introduce high risk of over-fitting, which is a major challenge in SLU tasks due to the limitation of available data.
Automatic Speech Recognition (ASR) +3
no code implementations • 13 Aug 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao
Recent neural speech synthesis systems have gradually focused on the control of prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody and the correlation between prosody and semantics together.
no code implementations • LREC 2020 • Ning Cheng, Bin Li, Liming Xiao, Changwei Xu, Sijia Ge, Xingyue Hao, Minxuan Feng
The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging and named entity recognition.
no code implementations • 9 Apr 2020 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Most singer identification methods are processed in the frequency domain, which potentially leads to information loss during the spectral transformation.
2 code implementations • 4 Mar 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Tian Xia, Jing Xiao
Targeting at both high efficiency and performance, we propose AlignTTS to predict the mel-spectrum in parallel.
no code implementations • 4 Mar 2020 • Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Jing Xiao
This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms.