no code implementations • 3 Oct 2024 • Nikita Kuzmin, Hieu-Thi Luong, Jixun Yao, Lei Xie, Kong Aik Lee, Eng Siong Chng
In this work, we describe our submissions for the Voice Privacy Challenge 2024.
no code implementations • 2 Oct 2024 • Yuguang Yang, Yu Pan, Jixun Yao, Xiang Zhang, Jianhao Ye, Hongbin Zhou, Lei Xie, Lei Ma, Jianjun Zhao
Zero-shot voice conversion (VC) aims to transform the source speaker timbre into an arbitrary unseen one without altering the original speech content. While recent zero-shot VC methods have shown remarkable progress, considerable room for improvement remains in speaker similarity and speech naturalness. In this paper, we propose Takin-VC, a novel zero-shot VC framework based on jointly hybrid content and memory-augmented context-aware timbre modeling to tackle this challenge.
no code implementations • 17 Sep 2024 • Hongfei Xue, Wei Ren, Xuelong Geng, Kun Wei, Longhao Li, Qijie Shao, Linju Yang, Kai Diao, Lei Xie
Integrating audio encoders with LLMs through connectors has enabled these models to process and comprehend audio modalities, significantly enhancing speech-to-text tasks, including automatic speech recognition (ASR) and automatic speech translation (AST).
no code implementations • 13 Sep 2024 • Ziqian Wang, Jiayao Sun, Zihan Zhang, Xingchen Li, Jie Liu, Lei Xie
Our proposed system supports both streaming and non-streaming modes.
no code implementations • 9 Sep 2024 • Hongfei Xue, Rong Gong, Mingchen Shao, Xin Xu, Lezhi Wang, Lei Xie, Hui Bu, Jiaming Zhou, Yong Qin, Jun Du, Ming Li, BinBin Zhang, Bin Jia
The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin.
no code implementations • 6 Sep 2024 • Jixun Yao, Nikita Kuzmin, Qing Wang, Pengcheng Guo, Ziqian Ning, Dake Guo, Kong Aik Lee, Eng-Siong Chng, Lei Xie
Our system employs a disentangled neural codec architecture and a serial disentanglement strategy to gradually disentangle the global speaker identity and time-variant linguistic content and paralinguistic information.
no code implementations • 28 Aug 2024 • Ziqian Ning, Shuai Wang, Yuepeng Jiang, Jixun Yao, Lei He, Shifeng Pan, Jie Ding, Lei Xie
Rap, a prominent genre of vocal performance, remains underexplored in vocal generation.
no code implementations • 20 Aug 2024 • Tianyi Xu, Kaixun Huang, Pengcheng Guo, Yu Zhou, Longtao Huang, Hui Xue, Lei Xie
Pre-trained multilingual speech foundation models, like Whisper, have shown impressive performance across different languages.
1 code implementation • 5 Aug 2024 • He Wang, Lei Xie
This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP (Team 237) in the second Chinese Continuous Visual Speech Recognition Challenge (CNVSRC 2024), engaging in all four tracks, including the fixed and open tracks of Single-Speaker VSR Task and Multi-Speaker VSR Task.
no code implementations • 5 Aug 2024 • Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang
StreamVoice+ integrates a semantic encoder and a connector with the original StreamVoice framework, now trained using a non-streaming ASR.
no code implementations • 16 Jul 2024 • Haoyang He, Jiangning Zhang, Guanzhong Tian, Chengjie Wang, Lei Xie
This study explores the recently proposed challenging multi-view Anomaly Detection (AD) task.
no code implementations • 16 Jul 2024 • Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Yuguang Yang, Yu Pan, Lei Xie
Meanwhile, we propose a straightforward anonymization strategy that employs an empty embedding with zero values to simulate the speaker identity concealment process, eliminating the need for conversion to a pseudo-speaker identity and thereby reducing the complexity of the speaker anonymization process.
no code implementations • 8 Jul 2024 • You Wu, Lei Xie
Despite the wealth of single-cell multi-omics data, it remains challenging to predict the consequences of novel genetic and chemical perturbations in the human body.
no code implementations • 12 Jun 2024 • Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi
With speaker-independent semantic tokens to guide the training of the content encoder, the dependency on ASR is removed and the model can operate under extremely small chunks, with cascading errors eliminated.
no code implementations • 12 Jun 2024 • Yue Li, Xinsheng Wang, Li Zhang, Lei Xie
Furthermore, a contrastive learning method is proposed to mitigate the overfitting tendencies in the training of both the fine-tuning-based method and SCDNet.
no code implementations • 11 Jun 2024 • Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, YuanJun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li
The multi-codebook speech codec enables the application of large language models (LLMs) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction.
no code implementations • 11 Jun 2024 • Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, BinBin Zhang, Jun Du, Jia Bin, Ming Li
The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech.
no code implementations • 10 Jun 2024 • Zihan Zhang, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie
Moreover, we introduce a lightweight post-processing module after packet loss restoration to recover speech distortions and remove residual noise in the audio signal.
no code implementations • 9 Jun 2024 • Dake Guo, Xinfa Zhu, Liumeng Xue, Yongmao Zhang, Wenjie Tian, Lei Xie
Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech.
1 code implementation • 9 Jun 2024 • Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, BinBin Zhang, Lei Xie
Furthermore, we have created subsets of varying sizes, categorized by segment quality scores to allow for TTS model training and fine-tuning.
1 code implementation • 5 Jun 2024 • Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong liu
This paper addresses this issue by proposing a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework that is highly extensible for new methods.
no code implementations • 17 May 2024 • Jixun Yao, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie
To address these issues and especially generate more natural and distinctive anonymized speech, we propose a novel speaker anonymization approach that models a matrix related to speaker identity and transforms it into an anonymized singular value transformation-assisted matrix to conceal the original speaker identity.
no code implementations • 6 May 2024 • Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie
Accents represent deviations from standard pronunciation norms, and the multi-task learning framework for simultaneous ASR and accent recognition (AR) has effectively addressed the multi-accent scenarios, making it a prominent solution.
no code implementations • 3 May 2024 • Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie
Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset.
no code implementations • 24 Apr 2024 • Qi Zhang, Weihua Xu, Lei Xie, Hongye Su
Electrolytic hydrogen production serves as not only a vital source of green hydrogen but also a key strategy for addressing renewable energy consumption challenges.
no code implementations • 15 Apr 2024 • Qi Zhang, Lei Xie, Weihua Xu, Hongye Su
A novel robust dynamic variational Bayesian dictionary learning (RDVDL) monitoring approach is proposed to improve the reliability and safety of AWE operation.
no code implementations • 15 Apr 2024 • Qi Zhang, Lei Wang, Weihua Xu, Hongye Su, Lei Xie
Variational inference is used by NSVB-MPC to assess the predictive accuracy and make the necessary corrections to quantify system uncertainty.
2 code implementations • 9 Apr 2024 • Haoyang He, Yuhu Bai, Jiangning Zhang, Qingdong He, Hongxu Chen, Zhenye Gan, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Lei Xie
Recent advancements in anomaly detection have seen the efficacy of CNN- and transformer-based approaches.
no code implementations • 8 Apr 2024 • He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie
Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.
no code implementations • 30 Mar 2024 • Runze Lin, Junghui Chen, Lei Xie, Hongye Su, Biao Huang
This paper provides insights into deep reinforcement learning (DRL) for process control from the perspective of transfer learning.
no code implementations • 29 Feb 2024 • Lei Xie, Qingrun Zeng, Huajun Zhou, Guoqiang Xie, Mingchu Li, Jiahao Huang, Jianan Cui, Hao Chen, Yuanjing Feng
Diffusion MRI tractography is an important tool for identifying and analyzing the intracranial course of cranial nerves (CNs).
no code implementations • 15 Feb 2024 • Wenhao Zhuang, Yuyi Mao, Hengtao He, Lei Xie, Shenghui Song, Yao Ge, Zhi Ding
Orthogonal time frequency space (OTFS) modulation has emerged as a promising solution to support high-mobility wireless communications, for which cost-effective data detectors are critical.
no code implementations • 19 Jan 2024 • Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang
Specifically, to enable streaming capability, StreamVoice employs a fully causal context-aware LM with a temporal-independent acoustic predictor, while alternately processing semantic and acoustic features at each time step of autoregression which eliminates the dependence on complete source speech.
no code implementations • 8 Jan 2024 • Zihan Zhang, Jiayao Sun, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie
Packet loss is a common and unavoidable problem in voice over internet phone (VoIP) systems.
2 code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie
This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, engaging in the fixed and open tracks of Single-Speaker VSR Task, and the open track of Multi-Speaker VSR Task.
no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Pan Zhou, Lei Xie
While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.
no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li
To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.
no code implementations • 3 Jan 2024 • Alou Diakite, Cheng Li, Lei Xie, Yuanjing Feng, Hua Han, Shanshan Wang
Recent research has shown the potential of deep learning in multi-parametric MRI-based visual pathway (VP) segmentation.
no code implementations • 3 Jan 2024 • Hua Han, Cheng Li, Lei Xie, Yuanjing Feng, Alou Diakite, Shanshan Wang
Secondly, we propose a cross-fusion module that further enhances the fusion of information between the two modalities.
1 code implementation • CVPR 2024 • Shiwei Gan, Yafeng Yin, Zhiwei Jiang, Hongkai Wen, Lei Xie, Sanglu Lu
In fact, sign language tasks need to focus on the correlation of different regions in one frame and the interaction of different regions among adjacent frames for identifying a sign sequence.
no code implementations • 15 Dec 2023 • Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie
Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest.
no code implementations • 15 Dec 2023 • Ziqian Wang, Xinfa Zhu, Zihan Zhang, YuanJun Lv, Ning Jiang, Guoqing Zhao, Lei Xie
Given the intrinsic similarity between speech generation and speech enhancement, harnessing semantic information holds potential advantages for speech enhancement tasks.
1 code implementation • 11 Dec 2023 • Haoyang He, Jiangning Zhang, Hongxu Chen, Xuhai Chen, Zhishan Li, Xu Chen, Yabiao Wang, Chengjie Wang, Lei Xie
Reconstruction-based approaches have achieved remarkable outcomes in anomaly detection.
no code implementations • 7 Dec 2023 • Huan Zhao, Li Zhang, Yue Li, Yannan Wang, Hongji Wang, Wei Rao, Qing Wang, Lei Xie
The scarcity of labeled audio-visual datasets is a constraint for training superior audio-visual speaker diarization systems.
no code implementations • 5 Dec 2023 • Shuo Zhang, Lei Xie
Then we compute and update the protein-ligand interaction embedding based on the protein residue-level embeddings and ligand atom-level embeddings, and the geometric constraints in the inferred protein contact map and ligand distance map.
1 code implementation • Scientific Reports 2023 • Shuo Zhang, Yang Liu, Lei Xie
Molecular sciences address a wide range of problems involving molecules of different types and sizes and their complexes.
Ranked #1 on Drug Discovery on QM9
1 code implementation • 5 Nov 2023 • Jiangning Zhang, Haoyang He, Xuhai Chen, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong liu
Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capabilities, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm.
no code implementations • 26 Oct 2023 • Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie
This paper aims to build a multi-speaker expressive TTS system, synthesizing a target speaker's speech with multiple styles and emotions.
no code implementations • 22 Oct 2023 • Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie
By introducing both cross-modal and conversational representations into the decoder, our model retains context over longer sentences without information loss, achieving relative accuracy improvements of 8.8% and 23% on Mandarin conversation datasets HKUST and MagicData-RAMC, respectively, compared to the standard Conformer model.
no code implementations • 7 Oct 2023 • Kaixun Huang, Ao Zhang, BinBin Zhang, Tianyi Xu, Xingchen Song, Lei Xie
However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias.
no code implementations • 7 Oct 2023 • Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie
Utilization of speaker representation has extended the frontier of AEC, thus attracting many researchers' interest in personalized acoustic echo cancellation (PAEC).
no code implementations • 6 Oct 2023 • Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie
A typical neural speech enhancement (SE) approach mainly handles speech and noise mixtures, which is not optimal for singing voice enhancement scenarios.
no code implementations • 4 Oct 2023 • Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie
This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023.
no code implementations • 29 Sep 2023 • Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Jie Liu, Lei Xie
Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally.
no code implementations • 27 Sep 2023 • Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi
Third, the model is unable to effectively address the noise in the unvoiced segments, lowering the sound quality.
no code implementations • 17 Sep 2023 • Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, JingJing Yin, Hongbin Zhou, Heng Lu, Lei Xie
In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts.
no code implementations • 3 Sep 2023 • Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang
In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation.
no code implementations • 5 Aug 2023 • Runze Lin, Yangyang Luo, Xialai Wu, Junghui Chen, Biao Huang, Lei Xie, Hongye Su
The Organic Rankine Cycle (ORC) is widely used in industrial waste heat recovery due to its simple structure and easy maintenance.
no code implementations • 29 Jul 2023 • Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu, Lei Xie
However, such data-efficient approaches have ignored synthesizing emotional aspects of speech due to the challenges of cross-speaker cross-lingual emotion transfer: the heavy entanglement of speaker timbre, emotion, and language factors in the speech signal makes a system produce cross-lingual synthetic speech with an undesired foreign accent and weak emotion expressiveness.
no code implementations • 24 Jul 2023 • Shuo Zhang, Yang Liu, Li Xie, Lei Xie
To combine the DNP descriptor and chemical features in molecules, we construct the Robust Molecular Graph Convolutional Network (RoM-GCN), which is capable of taking both node and edge features into consideration when generating molecule representations.
1 code implementation • 6 Jul 2023 • Yuanjing Feng, Lei Xie, Jingqiang Wang, Qiyuan Tian, Jianzhong He, Qingrun Zeng, Fei Gao
At the global level, the tractography process is simplified as the estimation of bundle-specific tractogram distribution (BTD) coefficients by minimizing the energy optimization model, and is used to characterize the relations between BTD and diffusion tensor vector under the prior guidance by introducing the tractogram bundle information to provide anatomic priors.
no code implementations • 21 Jun 2023 • Renjie Cheng, Zhemin Zhuang, Shuxin Zhuang, Lei Xie, Jingfeng Guo
To address these challenges, we propose a single-layer Transformer network called Multi-Scale Shifted Windows Transformer Networks (MSW-Transformer), which uses a multi-window sliding attention mechanism at different scales to capture features in different dimensions.
no code implementations • 18 Jun 2023 • Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang
An intuitive approach is to follow AudioLM: tokenizing speech into semantic and acoustic tokens by HuBERT and SoundStream, respectively, and converting source semantic tokens to target acoustic tokens conditioned on acoustic tokens of the target speaker.
no code implementations • 1 Jun 2023 • Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie
By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words.
no code implementations • 23 May 2023 • Hongfei Xue, Qijie Shao, Peikun Chen, Pengcheng Guo, Lei Xie, Jie Liu
Different from UniSpeech, UniData2vec replaces the quantized discrete representations with continuous and contextual representations from a teacher model for phonetically-aware pre-training.
no code implementations • 23 May 2023 • Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie
The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token.
no code implementations • 21 May 2023 • Shubo Lv, Xiong Wang, Sining Sun, Long Ma, Lei Xie
Real-world complex acoustic environments, especially those with a low signal-to-noise ratio (SNR), bring tremendous challenges to a keyword spotting (KWS) system.
no code implementations • 21 May 2023 • Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi
Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities.
no code implementations • 21 May 2023 • Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie
In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method.
no code implementations • 12 May 2023 • Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang
Specifically, to flexibly adapt to the dynamic-variant speaker characteristic in the temporal and channel axis of the speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.
no code implementations • 9 May 2023 • Yiming Xu, Dongfang Xu, Lei Xie, Shenghui Song
Different from conventional radar, the cellular network in the integrated sensing and communication (ISAC) system enables collaborative sensing by multiple sensing nodes, e.g., base stations (BSs).
no code implementations • 14 Mar 2023 • Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie
Achieving 0.446 in the final score and 0.517 in the P.835 score, our system ranks 4th in the non-real-time track.
1 code implementation • 13 Mar 2023 • Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie
This paper describes our NPU-Elevoc personalized speech enhancement system (NAPSE) for the 5th Deep Noise Suppression Challenge at ICASSP 2023.
no code implementations • 13 Mar 2023 • Zihan Zhang, Shimin Zhang, Mingshuai Liu, Yanhong Leng, Zhe Han, Li Chen, Lei Xie
This paper describes a Two-step Band-split Neural Network (TBNN) approach for full-band acoustic echo cancellation.
no code implementations • 17 Jan 2023 • Zhanheng Yang, Sining Sun, Xiong Wang, Yike Zhang, Long Ma, Lei Xie
In this paper, we propose an efficient approach to obtain a high-quality contextual list for a unified streaming/non-streaming based E2E model.
no code implementations • 8 Dec 2022 • Cameron Mura, Emma Candelier, Lei Xie
This Special Issue of Biomolecules, commissioned in honor of Dr. Philip E. Bourne, focuses on a new field of biomolecular data science.
no code implementations • 30 Nov 2022 • Yue Li, Li Zhang, Namin Wang, Jie Liu, Lei Xie
Specifically, the weight transfer fine-tuning aims to constrain the distance of the weights between the pre-trained model and the fine-tuned model, which takes advantage of the previously acquired discriminative ability from the large-scale out-domain datasets and avoids catastrophic forgetting and overfitting at the same time.
no code implementations • 19 Nov 2022 • Xinfa Zhu, Yi Lei, Kun Song, Yongmao Zhang, Tao Li, Lei Xie
This paper aims to synthesize the target speaker's speech with desired speaking style and emotion by transferring the style and emotion from reference speech recorded by other speakers.
no code implementations • 16 Nov 2022 • Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang
Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC).
no code implementations • 9 Nov 2022 • Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi
We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features are adopted as the attention query, which result from a prosody encoder with target speaker embedding and normalized pitch and energy of source speech as input.
no code implementations • 6 Nov 2022 • Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie
Background sound is an informative form of art that is helpful in providing a more immersive experience in real-application voice conversion (VC) scenarios.
no code implementations • 6 Nov 2022 • Jixun Yao, Qing Wang, Yi Lei, Pengcheng Guo, Lei Xie, Namin Wang, Jie Liu
By directly scaling the formant and F0, the speaker distinguishability degradation of the anonymized speech caused by the introduction of other speakers is prevented.
1 code implementation • 30 Oct 2022 • Jie Wang, Menglong Xu, Jingyong Hou, BinBin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan
Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices.
1 code implementation • 28 Oct 2022 • Shuo Zhang, Yang Liu, Lei Xie
In this work, we propose a Graph Neural Network (GNN)-based scoring function trained only with the atomic types and coordinates on limited solved RNA 3D structures for distinguishing accurate structural models.
no code implementations • 26 Oct 2022 • Bowen Pang, Huan Zhao, Gaosheng Zhang, Xiaoyue Yang, Yang Sun, Li Zhang, Qing Wang, Lei Xie
In this challenge, we explore three kinds of typical speaker diarization systems: spectral clustering (SC) based diarization, target-speaker voice activity detection (TS-VAD), and end-to-end neural diarization (EEND).
no code implementations • 17 Oct 2022 • Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang
Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signals.
no code implementations • 24 Sep 2022 • Jixun Yao, Qing Wang, Li Zhang, Pengcheng Guo, Yuhao Liang, Lei Xie
Our system consists of four modules, including feature extractor, acoustic model, anonymization module, and neural vocoder.
no code implementations • 14 Sep 2022 • Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie
To alleviate the difficulty in training, we propose to model linguistic and prosodic information by considering cross-sentence, embedded structure in training.
no code implementations • 22 Aug 2022 • Lei Xie, Shenghui Song
As a result, most existing methods require a large number of data samples to achieve an accurate estimate of the covariance matrix for the received signals, based on which a power spectrum is constructed for localization purposes.
1 code implementation • 17 Aug 2022 • Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan
In the metric aspect, we design the new conversational DER (CDER) evaluation metric, which calculates the SD accuracy at the utterance level.
no code implementations • 2 Aug 2022 • Liyan Zheng, Haojie Wang, Jidong Zhai, Muyan Hu, Zixuan Ma, Tuowei Wang, Shizhi Tang, Lei Xie, Kezhao Huang, Zhihao Jia
Boosting the runtime performance of deep neural networks (DNNs) is critical due to their wide adoption in real-world tasks.
no code implementations • 30 Jul 2022 • Lei Xie, Xianghao Yu, S. H. Song
Maneuvering target sensing will be an important service of future vehicular networks, where precise velocity estimation is one of the core tasks.
no code implementations • 3 Jul 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma
Then, during the training of the conversational ASR system, the extractor will be frozen to extract the textual representation of preceding speech, while such representation is used as context fed to the ASR decoder through an attention mechanism.
no code implementations • 15 Jun 2022 • Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su
The ideal goal of voice conversion is to convert the source speaker's speech so that it sounds naturally like the target speaker while maintaining the linguistic content and the prosody of the source speech.
1 code implementation • 6 Jun 2022 • Shuo Zhang, Yang Liu, Lei Xie
On a small-molecule dataset for predicting quantum chemical properties, PaxNet reduces the prediction error by 15% and uses 73% less memory than the best baseline.
no code implementations • 31 Mar 2022 • Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan
As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc.
no code implementations • 31 Mar 2022 • Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie
Therefore, we propose the second approach, WD-SOT, to address alignment errors by introducing a word-level diarization model, which can get rid of such timestamp alignment dependency.
3 code implementations • 29 Mar 2022 • BinBin Zhang, Di wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu
Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model.
no code implementations • 10 Mar 2022 • Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei zha, Yanning Zhang
DeepFake-based digital facial forgery is threatening public media security; when lip manipulation is used in talking face generation, fake video detection becomes even more difficult.
no code implementations • 8 Mar 2022 • Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei zha
Rather than focusing on the unimportant regions of the face image, the proposed AttnWav2Lip model is able to pay more attention to the lip region reconstruction.
no code implementations • 5 Mar 2022 • Junwen Xiong, Peng Zhang, Lei Xie, Wei Huang, Yufei zha, Yanning Zhang
Multi-modal speech separation has exhibited a specific advantage in isolating the target character in multi-talker noisy environments.
1 code implementation • 4 Mar 2022 • Junwen Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei zha
Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scenario understanding.
no code implementations • 16 Feb 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma
Conversational automatic speech recognition (ASR) is a task to recognize conversational speech including multiple speakers.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 21 Jan 2022 • Ryan K. Tan, Yang Liu, Lei Xie
The challenges of harnessing reinforcement learning for systems pharmacology and personalized medicine are discussed.
no code implementations • 2 Jan 2022 • Wendong Gan, Bolong Wen, Ying Yan, Haitao Chen, Zhichao Wang, Hongqiang Du, Lei Xie, Kaixuan Guo, Hai Li
Specifically, a prosody vector is first extracted from a pre-trained VQ-Wav2Vec model, in which rich prosody information is embedded while most speaker and environment information is effectively removed by quantization.
no code implementations • 23 Dec 2021 • Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan
Moreover, the explicit prosody features used in the prosody predicting module can increase the diversity of synthetic speech by adjusting the value of prosody features.
no code implementations • 4 Dec 2021 • Ziqiang Wang, Yimao Sun, Qun Wan, Lei Xie, Ning Liu
Emitter localization is widely applied in the military and civilian fields.
no code implementations • 24 Nov 2021 • Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi
One-shot style transfer is a challenging task, since training on a single utterance makes the model extremely prone to over-fitting the training data, which causes low speaker similarity and a lack of expressiveness.
no code implementations • 23 Nov 2021 • Tian Cai, Li Xie, Muge Chen, Yang Liu, Di He, Shuo Zhang, Cameron Mura, Philip E. Bourne, Lei Xie
Advances in biomedicine are largely fueled by exploring uncharted territories of human biology.
no code implementations • 16 Nov 2021 • Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu
In speech enhancement, complex neural networks have shown promising performance due to their effectiveness in processing the complex-valued spectrum.
1 code implementation • 11 Nov 2021 • Yihui Fu, Yun Liu, Jingdong Li, Dawei Luo, Shubo Lv, Yukai Jv, Lei Xie
Complex spectrum and magnitude are considered as two major features of speech enhancement and dereverberation.
no code implementations • 17 Oct 2021 • Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi
In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates audio waveform from lyrics and musical score.
1 code implementation • 7 Oct 2021 • BinBin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng
In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, with 22400+ hours in total.
Ranked #5 on Speech Recognition on WenetSpeech
no code implementations • 9 Aug 2021 • Xinsheng Wang, Qicong Xie, Jihua Zhu, Lei Xie, Scharenborg
In this paper, we present an automatic method to generate synchronized speech and talking-head videos on the basis of text and a single face image of an arbitrary person as input.
no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su
Specifically, we use explicit labels to represent two typical spontaneous behaviors, filled pause and prolongation, in the acoustic model and develop a neural network based predictor to predict the occurrences of the two behaviors from text.
no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Lei Xie, Dan Su
Current two-stage TTS frameworks typically integrate an acoustic model with a vocoder -- the acoustic model predicts a low-resolution intermediate representation such as the Mel-spectrum while the vocoder generates the waveform from the intermediate representation.
no code implementations • 16 Jun 2021 • Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li
Specifically, prosodic features are used to explicitly model prosody, while a VAE and a reference encoder are used to implicitly model prosody, taking the Mel spectrum and the bottleneck feature as input, respectively.
1 code implementation • 16 Jun 2021 • Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie
Moreover, by including the data of variable numbers of speakers, our model can even outperform the PIT-Conformer AR model with only 1/7 latency, obtaining WERs of 19.9% and 34.3% on WSJ0-2mix and WSJ0-3mix sets.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 16 Jun 2021 • Shubo Lv, Yanxin Hu, Shimin Zhang, Lei Xie
Deep complex convolution recurrent network (DCCRN), which extends CRN with complex structure, has achieved superior performance in MOS evaluation in Interspeech 2020 deep noise suppression challenge (DNS2020).
no code implementations • 5 Apr 2021 • Lei Xie, Zishu He, Jun Tong, Jun Li, Jiangtao Xi
We propose leave-one-out cross-validation (LOOCV) choices for the shrinkage factors to optimize the beamforming performance, referred to as $\text{S}^2$CM-CV and STE-CV.
no code implementations • 5 Apr 2021 • Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu
The challenge consists of two tracks, namely few-shot track and one-shot track, where the participants are required to clone multiple target voices with 100 and 5 samples respectively.
1 code implementation • 2 Apr 2021 • Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang
The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing.
1 code implementation • 31 Mar 2021 • Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-Yi Lee, Lei Xie
Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task.
no code implementations • 17 Mar 2021 • Lei Xie, Zishu He, Jun Tong, Tianle Liu, Jun Li, Jiangtao Xi
This paper investigates regularized estimation of Kronecker-structured covariance matrices (CM) for polarization radar in sea clutter scenarios where the data are assumed to follow the complex, elliptically symmetric (CES) distributions with a Kronecker-structured CM.
1 code implementation • 26 Feb 2021 • Jingyong Hou, Li Zhang, Yihui Fu, Qing Wang, Zhanheng Yang, Qijie Shao, Lei Xie
This paper describes the system developed by the NPU team for the 2020 personalized voice trigger challenge.
no code implementations • 8 Feb 2021 • Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur
Modern wake word detection systems usually rely on neural networks for acoustic modeling.
4 code implementations • 2 Feb 2021 • Zhuoyuan Yao, Di Wu, Xiong Wang, BinBin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei
In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.
1 code implementation • 31 Jan 2021 • Di He, Lei Xie
Thus, CODE-AE provides a useful framework to take advantage of in vitro omics data for developing generalized patient predictive models.
5 code implementations • 10 Dec 2020 • BinBin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei
In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.
Ranked #8 on Speech Recognition on AISHELL-1
1 code implementation • 3 Dec 2020 • Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu
In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.
1 code implementation • 24 Nov 2020 • Qiao Tian, Yi Chen, Zewang Zhang, Heng Lu, LingHui Chen, Lei Xie, Shan Liu
On the one hand, we propose to discriminate the ground-truth waveform from the synthetic one in the frequency domain, rather than only in the time domain, to offer more consistency guarantees.
no code implementations • 17 Nov 2020 • Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie
End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
2 code implementations • 15 Nov 2020 • Shuo Zhang, Yang Liu, Lei Xie
The prediction of physicochemical properties from molecular structures is a crucial task for artificial intelligence aided molecular design.
Ranked #2 on Drug Discovery on QM9
no code implementations • 13 Nov 2020 • Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao
Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.
Sound Audio and Speech Processing
1 code implementation • 4 Nov 2020 • Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, DongYan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez
In this challenge, we open source a sizable speech, keyword, echo and noise corpus for promoting data-driven methods, particularly deep-learning approaches on KWS and SSL.
Sound Audio and Speech Processing
no code implementations • 25 Oct 2020 • Jingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie
The AutoSpeech challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to speech processing tasks.
no code implementations • 9 Oct 2020 • Di He, Lei Xie
An unsolved fundamental problem in biology and ecology is to predict observable traits (phenotypes) from a new genetic constitution (genotype) of an organism under environmental perturbations (e.g., drug treatment).
no code implementations • 7 Sep 2020 • Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie
Previously, we introduced a system, called unmixing, fixed-beamformer and extraction (UFE), that was shown to be effective in addressing the speech overlap problem in conversation transcription.
1 code implementation • 17 Aug 2020 • Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong Zhai, Youhui Zhang, Yunquan Zhang, WenGuang Chen
The de facto HPC benchmark LINPACK cannot reflect AI computing power and I/O performance without a representative workload.
1 code implementation • 12 Aug 2020 • Haohe Liu, Lei Xie, Jian Wu, Geng Yang
We aim to address the major issues in CNN-based high-resolution MSS models: high computational cost and weight sharing between distinctly different bands.
Audio and Speech Processing Sound
7 code implementations • Interspeech 2020 • Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie
Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality.
Ranked #8 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge (PESQ-NB metric)
Speech Enhancement Audio and Speech Processing Sound
no code implementations • 12 Jul 2020 • Xian Shi, Qiangze Feng, Lei Xie
The paper then presents an overview of the results and system performance in the three tracks.
no code implementations • NeurIPS 2020 • Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie
This model additionally has a simple and efficient stop criterion for the end of the transduction, making it able to infer the variable number of output sequences.
Ranked #3 on Speech Separation on WSJ0-4mix
1 code implementation • 21 May 2020 • Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie
Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention.
Sound Audio and Speech Processing
no code implementations • 21 May 2020 • Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie
Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies.
1 code implementation • 17 May 2020 • Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur
Always-on spoken language interfaces, e.g., personal digital assistants, rely on a wake word to start processing spoken input.
9 code implementations • Interspeech 2020 • Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie
In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting high-quality text-to-speech.
Sound Audio and Speech Processing
no code implementations • 28 Apr 2020 • Shan Yang, Yuxuan Wang, Lei Xie
As for the speech-side noise, we propose to learn a noise-independent feature in the auto-regressive decoder through adversarial training and data augmentation, which does not need an extra speech enhancement model.
no code implementations • 29 Oct 2019 • Xinyong Zhou, Hao Che, Xiaorui Wang, Lei Xie
In this paper, we present a cross-lingual voice cloning approach.
1 code implementation • 18 Sep 2019 • Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur
We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.
Ranked #1 on Speech Recognition on Hub5'00 CallHome
Automatic Speech Recognition Automatic Speech Recognition (ASR) +6
1 code implementation • 4 Jul 2019 • Shuo Zhang, Lei Xie
To improve the performance of attention-based GNNs, we propose cardinality preserved attention (CPA) models that can be applied to any kind of attention mechanisms.
Ranked #2 on Graph Classification on RE-M5K