Search Results for author: Jianzong Wang

Found 103 papers, 10 papers with code

PFID: Privacy First Inference Delegation Framework for LLMs

no code implementations • 18 Jun 2024 • Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, Jing Xiao

Our framework is designed to be communication-efficient: computation can be delegated to the local client so that the server's computation burden is lightened.

Machine Translation

RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval

no code implementations • 28 May 2024 • Jianzong Wang, Haoxiang Shi, Kaiyi Luo, xulong Zhang, Ning Cheng, Jing Xiao

For unpaired data, to effectively capture the latent discriminative features, the high-order relationships between unpaired data and anchors are embedded into the latent subspace, which are computed by efficient linear reconstruction.

Cross-Modal Retrieval Retrieval

Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

no code implementations • 28 May 2024 • Haoxiang Shi, xulong Zhang, Ning Cheng, Yong Zhang, Jun Yu, Jing Xiao, Jianzong Wang

Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differences between modalities, resulting in the model being unable to focus on modality-specific emotional information.

Contrastive Learning Emotion Classification +1

Task-agnostic Decision Transformer for Multi-type Agent Control with Federated Split Training

no code implementations • 22 May 2024 • Zhiyuan Wang, Bokui Chen, Xiaoyang Qu, Zhenhou Hong, Jing Xiao, Jianzong Wang

Our findings underscore the efficacy of the FSDT framework in effectively leveraging distributed offline reinforcement learning data to enable powerful multi-type agent decision systems.

AI Agent Autonomous Driving +4

QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering

no code implementations • 30 Apr 2024 • Sheng Ouyang, Jianzong Wang, Yong Zhang, Zhitao Li, ZiQi Liang, xulong Zhang, Ning Cheng, Jing Xiao

Extractive Question Answering (EQA) in Machine Reading Comprehension (MRC) often faces the challenge of dealing with semantically identical but format-variant inputs.

Extractive Question-Answering Machine Reading Comprehension +1

Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning

no code implementations • 24 Apr 2024 • Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Jing Xiao

Single-model systems often suffer from deficiencies in tasks such as speaker verification (SV) and image classification, relying heavily on partial prior knowledge during decision-making, resulting in suboptimal performance.

Decision Making Image Classification +2

Retrieval-Augmented Audio Deepfake Detection

no code implementations • 22 Apr 2024 • Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang

With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse.

Audio Deepfake Detection DeepFake Detection +6

Medical Speech Symptoms Classification via Disentangled Representation

no code implementations • 8 Mar 2024 • Jianzong Wang, Pengcheng Li, xulong Zhang, Ning Cheng, Jing Xiao

After combining the intent from two domains into a joint representation, the integrated intent representation is fed into a decision layer for classification.

Classification

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

1 code implementation • 1 Feb 2024 • Ming Li, Yong Zhang, Shwai He, Zhitao Li, Hongyu Zhao, Jianzong Wang, Ning Cheng, Tianyi Zhou

Data filtering for instruction tuning has proved important in improving both the efficiency and performance of the tuning process.

Language Modelling

Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers

no code implementations • 24 Jan 2024 • Wei Tao, Shenglin He, Kai Lu, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang, Jing Xiao

In addition, for patches without outlier values, we utilize value-driven quantization search (VDQS) on the feature maps of their following dataflow branches to reduce search time.

Quantization

P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer

no code implementations • 22 Jan 2024 • Zhiyuan Wang, Xiaoyang Qu, Jing Xiao, Bokui Chen, Jianzong Wang

Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks.

Incremental Learning reinforcement-learning +1

Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning

no code implementations • 18 Jan 2024 • Yong Zhang, Hanzhang Li, Zhitao Li, Ning Cheng, Ming Li, Jing Xiao, Jianzong Wang

Large Language Models (LLMs) have shown significant promise in various applications, including zero-shot and few-shot learning.

Few-Shot Learning In-Context Learning +2

ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis

no code implementations • 16 Jan 2024 • Haobin Tang, xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels.

Denoising Emotional Speech Synthesis +1

EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model

no code implementations • 16 Jan 2024 • Bingyuan Zhang, xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao, Jianzong Wang

In recent years, the field of talking faces generation has attracted considerable attention, with certain methods adept at generating virtual faces that convincingly imitate human expressions.

Denoising Talking Face Generation

GAIA: Delving into Gradient-based Attribution Abnormality for Out-of-distribution Detection

1 code implementation • NeurIPS 2023 • Jinggang Chen, Junjie Li, Xiaoyang Qu, Jianzong Wang, Jiguang Wan, Jing Xiao

This perspective is motivated by our observation that gradient-based attribution methods encounter challenges in assigning feature importance to OOD data, thereby yielding divergent explanation patterns.

Feature Importance Out-of-Distribution Detection

CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding

no code implementations • 15 Nov 2023 • Jianzong Wang, Yimin Deng, ZiQi Liang, xulong Zhang, Ning Cheng, Jing Xiao

This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference, to synthesize a photo-realistic people talking video with head poses controlled by a short video clip and proper eye blinking embedding.

Talking Face Generation

AOSR-Net: All-in-One Sandstorm Removal Network

no code implementations • 16 Sep 2023 • Yazhong Si, xulong Zhang, Fan Yang, Jianzong Wang, Ning Cheng, Jing Xiao

Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios.

Image Enhancement Image Restoration

DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks

no code implementations • 14 Sep 2023 • Zipeng Qi, xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

Generating realistic talking faces is a complex and widely discussed task with numerous applications.

Face Generation

Machine Unlearning Methodology base on Stochastic Teacher Network

no code implementations • 28 Aug 2023 • xulong Zhang, Jianzong Wang, Ning Cheng, Yifu Sun, Chuanyao Zhang, Jing Xiao

The rise of the phenomenon of the "right to be forgotten" has prompted research on machine unlearning, which grants data owners the right to actively withdraw data that has been used for model training, and requires the elimination of the contribution of that data to the model.

Machine Unlearning

EdgeMA: Model Adaptation System for Real-Time Video Analytics on Edge Devices

no code implementations • 17 Aug 2023 • Liang Wang, Nan Zhang, Xiaoyang Qu, Jianzong Wang, Jiguang Wan, Guokuan Li, Kaiyu Hu, Guilin Jiang, Jing Xiao

In this paper, we introduce EdgeMA, a practical and efficient video analytics system designed to adapt models to shifts in real-world video streams over time, addressing the data drift problem.

Prompt Guided Copy Mechanism for Conversational Question Answering

no code implementations • 7 Aug 2023 • Yong Zhang, Zhitao Li, Jianzong Wang, Yiming Gao, Ning Cheng, Fengying Yu, Jing Xiao

Conversational Question Answering (CQA) is a challenging task that aims to generate natural answers for conversational flow questions.

Conversational Question Answering

Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism

no code implementations • 7 Aug 2023 • Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng, Jing Xiao

Chinese Automatic Speech Recognition (ASR) error correction presents significant challenges due to the Chinese language's unique features, including a large character set and borderless, morpheme-based structure.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Shoggoth: Towards Efficient Edge-Cloud Collaborative Real-Time Video Inference via Adaptive Online Learning

no code implementations • 27 Jun 2023 • Liang Wang, Kai Lu, Nan Zhang, Xiaoyang Qu, Jianzong Wang, Jiguang Wan, Guokuan Li, Jing Xiao

This paper proposes Shoggoth, an efficient edge-cloud collaborative architecture, for boosting inference performance on real-time video of changing scenes.

Knowledge Distillation

FedET: A Communication-Efficient Federated Class-Incremental Learning Framework Based on Enhanced Transformer

no code implementations • 27 Jun 2023 • Chenghao Liu, Xiaoyang Qu, Jianzong Wang, Jing Xiao

To address local forgetting caused by new classes of new tasks and global forgetting brought by non-i.i.d. (non-independent and identically distributed) class imbalance across different local clients, we proposed an Enhancer distillation method to modify the imbalance between old and new knowledge and repair the non-i.i.d.

class-incremental learning Class Incremental Learning +2

SVVAD: Personal Voice Activity Detection for Speaker Verification

no code implementations • 31 May 2023 • Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao

To address this, we propose a speaker verification-based voice activity detection (SVVAD) framework that can adapt the speech features according to which are most informative for SV.

Action Detection Activity Detection +2

Personalized Federated Learning via Gradient Modulation for Heterogeneous Text Summarization

no code implementations • 23 Apr 2023 • Rongfeng Pan, Jianzong Wang, Lingwei Kong, Zhangcheng Huang, Jing Xiao

To eliminate this concern, we propose a federated learning text summarization scheme, which allows users to share the global model in a cooperative learning manner without sharing raw data.

Personalized Federated Learning Text Summarization

Detecting Out-of-distribution Examples via Class-conditional Impressions Reappearing

no code implementations • 17 Mar 2023 • Jinggang Chen, Xiaoyang Qu, Junjie Li, Jianzong Wang, Jiguang Wan, Jing Xiao

Out-of-distribution (OOD) detection aims at enhancing standard deep neural networks to distinguish anomalous inputs from original training data.

Out of Distribution (OOD) Detection

Efficient Uncertainty Estimation with Gaussian Process for Reliable Dialog Response Retrieval

no code implementations • 15 Mar 2023 • Tong Ye, Zhitao Li, Jianzong Wang, Ning Cheng, Jing Xiao

Deep neural networks have achieved remarkable performance in retrieval-based dialogue systems, but they are shown to be ill calibrated.

Conversational Search Retrieval

On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models

no code implementations • 15 Mar 2023 • Tong Ye, Shijing Si, Jianzong Wang, Ning Cheng, Zhitao Li, Jing Xiao

Deep neural retrieval models have amply demonstrated their power but estimating the reliability of their predictions remains challenging.

Retrieval

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

no code implementations • 14 Mar 2023 • Haobin Tang, xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Recent expressive text to speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected.

Emotional Speech Synthesis Sentence +1

Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

no code implementations • 14 Mar 2023 • xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

Because of predicting all the target tokens in parallel, the non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models.

Position Sentence +2

Feature-Rich Audio Model Inversion for Data-Free Knowledge Distillation Towards General Sound Classification

no code implementations • 14 Mar 2023 • Zuheng Kang, Yayun He, Jianzong Wang, Junqing Peng, Xiaoyang Qu, Jing Xiao

Data-Free Knowledge Distillation (DFKD) has recently attracted growing attention in the academic community, especially with major breakthroughs in computer vision.

Data-free Knowledge Distillation Sound Classification

Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data

no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

In this paper, we proposed Adapitch, a multi-speaker TTS method that makes adaptation of the supervised module with untranscribed data.

Decoder Disentanglement +1

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao

In this work, we propose two kinds of masking approaches: (1) speech-level masking, which makes the model mask more speech segments than silence segments, and (2) phoneme-level masking, which forces the model to mask all frames of a phoneme instead of phoneme pieces.

Representation Learning Speaker Recognition +1

MetaSpeech: Speech Effects Switch Along with Environment for Metaverse

no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Metaverse expands the physical world to a new dimension, and the physical environment and Metaverse environment can be directly connected and entered.

Voice Conversion

Improving Imbalanced Text Classification with Dynamic Curriculum Learning

no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Recent advances in pre-trained language models have improved the performance for text classification tasks.

Scheduling text-classification +1

Semi-Supervised Learning Based on Reference Model for Low-resource TTS

no code implementations • 25 Oct 2022 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Most previous neural text-to-speech (TTS) methods are based on supervised learning, which means they depend on a large training dataset and struggle to achieve comparable performance under low-resource conditions.

Speech Synthesis Text to Speech

Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation

no code implementations • 15 Oct 2022 • Chendong Zhao, Jianzong Wang, Xiaoyang Qu, Haoqian Wang, Jing Xiao

Unsupervised representation learning for speech audios attained impressive performances for speech recognition tasks, particularly when annotated speech is limited.

Domain Adaptation Representation Learning +2

Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar

no code implementations • 13 Oct 2022 • Aolan Sun, xulong Zhang, Tiandong Ling, Jianzong Wang, Ning Cheng, Jing Xiao

Since the beginning of the COVID-19 pandemic, remote conferencing and school-teaching have become important tools.

Text to Speech

Pose Guided Human Image Synthesis with Partially Decoupled GAN

no code implementations • 7 Oct 2022 • Jianhan Wu, Jianzong Wang, Shijing Si, Xiaoyang Qu, Jing Xiao

Most existing methods encode the texture of the whole reference human image into a latent space, and then utilize a decoder to synthesize the image texture of the target pose.

Decoder Long-range modeling +1

Machine Unlearning Method Based On Projection Residual

no code implementations • 30 Sep 2022 • Zihao Cao, Jianzong Wang, Shijing Si, Zhangcheng Huang, Jing Xiao

Even when data is removed from the dataset, the effects of these data persist in the model.

Machine Unlearning

RL-MD: A Novel Reinforcement Learning Approach for DNA Motif Discovery

no code implementations • 30 Sep 2022 • Wen Wang, Jianzong Wang, Shijing Si, Zhangcheng Huang, Jing Xiao

The extraction of sequence patterns from a collection of functionally linked unlabeled DNA sequences is known as DNA motif discovery, and it is a key task in computational biology.

reinforcement-learning Reinforcement Learning (RL)

Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

no code implementations • 21 Sep 2022 • Shijing Si, Jianzong Wang, xulong Zhang, Xiaoyang Qu, Ning Cheng, Jing Xiao

Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios.

Contrastive Learning Voice Conversion

Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation

no code implementations • 24 Aug 2022 • Zhitao Zhu, Shijing Si, Jianzong Wang, Yaodong Yang, Jing Xiao

Deep neural networks can capture the intricate interaction history information between queries and documents, because of their many complicated nonlinear units, allowing them to provide correct search recommendations.

Fairness Information Retrieval +2

TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training

no code implementations • 8 Aug 2022 • Huaizhen Tang, xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao

In this paper, a novel voice conversion framework, named Text Guided AutoVC (TGAVC), is proposed to more effectively separate content and timbre from speech, where an expected content embedding produced based on the text transcriptions is designed to guide the extraction of voice content.

Voice Conversion

SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

no code implementations • 27 Jun 2022 • Zuheng Kang, Junqing Peng, Jianzong Wang, Jing Xiao

Speech emotion recognition (SER) has many challenges, but one of the main challenges is that each framework does not have a unified standard.

Speech Emotion Recognition

A Privacy-Preserving Subgraph-Level Federated Graph Neural Network via Differential Privacy

no code implementations • 7 Jun 2022 • Yeqing Qiu, Chenyu Huang, Jianzong Wang, Zhangcheng Huang, Jing Xiao

Currently, the federated graph neural network (GNN) has attracted a lot of attention due to its wide applications in reality without violating the privacy regulations.

Graph Neural Network Privacy Preserving

Micro-Expression Recognition Based on Attribute Information Embedding and Cross-modal Contrastive Learning

no code implementations • 29 May 2022 • Yanxin Song, Jianzong Wang, Tianbo Wu, Zhangcheng Huang, Jing Xiao

Micro-expressions have the characteristics of short duration and low intensity, and it is difficult to train a high-performance classifier with the limited number of existing micro-expressions.

Attribute Contrastive Learning +2

Adaptive Activation Network For Low Resource Multilingual Speech Recognition

no code implementations • 28 May 2022 • Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao

The existing models mostly established a bottleneck (BN) layer by pre-training on a large source language, and transferring to the low resource target language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Speech Augmentation Based Unsupervised Learning for Keyword Spotting

no code implementations • 28 May 2022 • Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao

In our experiments, with augmentation based unsupervised learning, our KWS model achieves better performance than other unsupervised methods, such as CPC, APC, and MPC.

Keyword Spotting

DT-SV: A Transformer-based Time-domain Approach for Speaker Verification

no code implementations • 26 May 2022 • Nan Zhang, Jianzong Wang, Zhenhou Hong, Chendong Zhao, Xiaoyang Qu, Jing Xiao

Therefore, we propose an approach to derive utterance-level speaker embeddings via a Transformer architecture that uses a novel loss function named diffluence loss to integrate the feature information of different Transformer layers.

Speaker Verification

Leveraging Causal Inference for Explainable Automatic Program Repair

no code implementations • 26 May 2022 • Jianzong Wang, Shijing Si, Zhitao Zhu, Xiaoyang Qu, Zhenhou Hong, Jing Xiao

The experiments on four programming languages (Java, C, Python, and JavaScript) show that CPR can generate causal graphs for reasonable interpretations and boost the performance of bug fixing in automatic program repair.

Bug fixing Causal Inference +3

Federated Non-negative Matrix Factorization for Short Texts Topic Modeling with Mutual Information

no code implementations • 26 May 2022 • Shijing Si, Jianzong Wang, Ruiyi Zhang, Qinliang Su, Jing Xiao

Non-negative matrix factorization (NMF) based topic modeling is widely used in natural language processing (NLP) to uncover hidden topics of short text documents.

Federated Learning text-classification +1

Federated Split BERT for Heterogeneous Text Classification

no code implementations • 26 May 2022 • Zhengyang Li, Shijing Si, Jianzong Wang, Jing Xiao

To address this issue, we propose a framework, FedSplitBERT, which handles heterogeneous data and decreases the communication cost by splitting the BERT encoder layers into local part and global part.

Federated Learning Quantization +2

A Fair Federated Learning Framework With Reinforcement Learning

no code implementations • 26 May 2022 • Yaqi Sun, Shijing Si, Jianzong Wang, Yuhan Dong, Zhitao Zhu, Jing Xiao

More importantly, we apply the Gini coefficient and validation accuracy of clients in each communication round to construct a reward function for the reinforcement learning.

Fairness Federated Learning +3
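
As a rough illustration of the reward described in the excerpt above, the sketch below computes the Gini coefficient over per-client validation accuracies and combines it with the mean accuracy. The function names, the `fairness_weight` parameter, and the "mean accuracy minus Gini penalty" form are assumptions made for illustration; the listing does not state the paper's actual reward formula.

```python
def gini(values):
    """Gini coefficient of non-negative values (0.0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def fairness_reward(client_accuracies, fairness_weight=1.0):
    """Hypothetical reward: mean validation accuracy minus a Gini penalty."""
    mean_acc = sum(client_accuracies) / len(client_accuracies)
    return mean_acc - fairness_weight * gini(client_accuracies)
```

Under this assumed form, two rounds with the same mean accuracy are ranked by how evenly that accuracy is spread across clients: the more uniform round receives the higher reward.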

QSpeech: Low-Qubit Quantum Speech Application Toolkit

1 code implementation • 26 May 2022 • Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Chendong Zhao, Wei Tao, Jing Xiao

However, Quantum Neural Network (QNN) running on low-qubit quantum devices would be difficult since it is based on Variational Quantum Circuit (VQC), which requires many qubits.

Text to Speech

Cali3F: Calibrated Fast Fair Federated Recommendation System

no code implementations • 26 May 2022 • Zhitao Zhu, Shijing Si, Jianzong Wang, Jing Xiao

Specific to recommendation systems, many federated recommendation algorithms have been proposed to realize the privacy-preserving collaborative recommendation.

Fairness Federated Learning +2

Augmentation-induced Consistency Regularization for Classification

no code implementations • 25 May 2022 • Jianhan Wu, Shijing Si, Jianzong Wang, Jing Xiao

In this paper, we propose a consistency regularization framework based on data augmentation, called CR-Aug, which forces the output distributions of different sub models generated by data augmentation to be consistent with each other.

Audio Classification Data Augmentation
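
The consistency idea in the excerpt above can be sketched as a penalty on the disagreement between the output distributions produced for two augmented views of the same input. The symmetric KL divergence used here is an assumption chosen for illustration; the listing does not say which divergence CR-Aug uses, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_a, logits_b):
    """Symmetric KL between predictions for two augmented views (assumed form)."""
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl(p, q) + kl(q, p))
```

In training, a term like this would typically be added to the ordinary classification loss with a weighting coefficient, so that the model is rewarded for agreeing with itself across augmentations.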

Adaptive Few-Shot Learning Algorithm for Rare Sound Event Detection

no code implementations • 24 May 2022 • Chendong Zhao, Jianzong Wang, Leilai Li, Xiaoyang Qu, Jing Xiao

In this work, we propose a novel task-adaptive module which is easy to plant into any metric-based few-shot learning frameworks.

Event Detection Few-Shot Learning +1

Self-Attention for Incomplete Utterance Rewriting

no code implementations • 24 Feb 2022 • Yong Zhang, Zhitao Li, Jianzong Wang, Ning Cheng, Jing Xiao

In this paper, we propose a novel method by directly extracting the coreference and omission relationship from the self-attention weight matrix of the transformer instead of word embeddings and edit the original text accordingly to generate the complete utterance.

Word Embeddings

Towards Speaker Age Estimation with Label Distribution Learning

no code implementations • 23 Feb 2022 • Shijing Si, Jianzong Wang, Junqing Peng, Jing Xiao

To address this, we utilize the ambiguous information among the age labels, convert each age label into a discrete label distribution and leverage the label distribution learning (LDL) method to fit the data.

Age Classification Age Estimation +2
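
The conversion of an age label into a discrete label distribution, mentioned in the excerpt above, can be sketched as follows. The Gaussian shape, the `sigma` value, and the age range are assumptions for illustration only; the listing does not specify how the paper builds its distributions. A KL-divergence helper shows one common way such a target could be compared with a model's predicted distribution.

```python
import math


def age_label_distribution(age, min_age=0, max_age=100, sigma=2.0):
    """Turn a single age label into a discrete distribution over ages.

    Assumed form: a Gaussian bump centred on the true age, normalised to sum to 1.
    """
    ages = range(min_age, max_age + 1)
    weights = [math.exp(-((a - age) ** 2) / (2 * sigma ** 2)) for a in ages]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), skipping bins where the target mass is zero."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )
```

For example, `age_label_distribution(30)` places most of its mass on ages 28 to 32, which lets neighbouring ages contribute to training instead of treating them as entirely wrong.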

VU-BERT: A Unified framework for Visual Dialog

no code implementations • 22 Feb 2022 • Tong Ye, Shijing Si, Jianzong Wang, Rui Wang, Ning Cheng, Jing Xiao

The visual dialog task attempts to train an agent to answer multi-turn questions given an image, which requires the deep understanding of interactions between the image and dialog history.

Language Modelling Masked Language Modeling +2

Loss Prediction: End-to-End Active Learning Approach For Speech Recognition

no code implementations • 9 Jul 2021 • Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

End-to-end speech recognition systems usually require huge amounts of labeling resource, while annotating the speech data is complicated and expensive.

Active Learning Automatic Speech Recognition +2

Federated Learning with Dynamic Transformer for Text to Speech

no code implementations • 9 Jul 2021 • Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Jie Liu, Chendong Zhao, Jing Xiao

Text to speech (TTS) is a crucial task for user interaction, but TTS model training relies on a sizable set of high-quality original datasets.

Federated Learning Text to Speech

Efficient Client Contribution Evaluation for Horizontal Federated Learning

no code implementations • 26 Feb 2021 • Jie Zhao, Xinghua Zhu, Jianzong Wang, Jing Xiao

In this paper an efficient method is proposed to evaluate the contributions of federated participants.

Federated Learning

Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation

no code implementations • 23 Feb 2021 • Xiaoyang Qu, Jianzong Wang, Jing Xiao

We add an activation regularizer and a virtual interpolation method to improve the data generation efficiency.

Knowledge Distillation

MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution

3 code implementations • 3 Dec 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

In this paper, an efficient network, named location-variable convolution, is proposed to model the dependencies of waveforms.

Large-scale Transfer Learning for Low-resource Spoken Language Understanding

no code implementations • 13 Aug 2020 • Xueli Jia, Jianzong Wang, Zhiyong Zhang, Ning Cheng, Jing Xiao

However, the increased complexity of a model can also introduce high risk of over-fitting, which is a major challenge in SLU tasks due to the limitation of available data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

no code implementations • 13 Aug 2020 • Zhenpeng Zheng, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

The MLNET leveraged multi-branches to extract multiple contextual speech information and investigated an effective attention block to weight the most crucial parts of the context for final classification.

Action Detection Activity Detection

Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

no code implementations • 13 Aug 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

Recent neural speech synthesis systems have gradually focused on the control of prosody to improve the quality of synthesized speech, but they rarely consider the variability of prosody and the correlation between prosody and semantics together.

Language Modelling Position +2

MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification

no code implementations • 9 Apr 2020 • xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Most singer identification methods are processed in the frequency domain, which potentially leads to information loss during the spectral transformation.

Artist classification Music Generation +1

AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment

2 code implementations • 4 Mar 2020 • Zhen Zeng, Jianzong Wang, Ning Cheng, Tian Xia, Jing Xiao

Targeting at both high efficiency and performance, we propose AlignTTS to predict the mel-spectrum in parallel.

Text to Speech

GraphTTS: graph-to-sequence modelling in neural text-to-speech

no code implementations • 4 Mar 2020 • Aolan Sun, Jianzong Wang, Ning Cheng, Huayi Peng, Zhen Zeng, Jing Xiao

This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms.

Graph Embedding Graph-to-Sequence +2

A Robust Speaker Clustering Method Based on Discrete Tied Variational Autoencoder

no code implementations • 4 Mar 2020 • Chen Feng, Jianzong Wang, Tongxu Li, Junqing Peng, Jing Xiao

Recently, the speaker clustering model based on aggregation hierarchy cluster (AHC) is a common method to solve two main problems: no preset category number clustering and fix category number clustering.

Clustering

Dynamic Student Classiffication on Memory Networks for Knowledge Tracing

1 code implementation • 22 Mar 2019 • Sein Minn, Michel C. Desmarais, Feida Zhu, Jing Xiao, Jianzong Wang

Knowledge Tracing (KT) is the assessment of student’s knowledge state and predicting whether that student may or may not answer the next problem correctly based on a number of previous practices and outcomes in their learning process.

Knowledge Tracing
