Search Results for author: Lei Xie

In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

speech-recognition Speech Recognition

10,069

The NPU System for the 2020 Personalized Voice Trigger Challenge

1 code implementation • 26 Feb 2021 • Jingyong Hou, Li Zhang, Yihui Fu, Qing Wang, Zhanheng Yang, Qijie Shao, Lei Xie

This paper describes the system developed by the NPU team for the 2020 personalized voice trigger challenge.

Small-Footprint Keyword Spotting Speaker Verification

10,069

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

3 code implementations • 29 Mar 2022 • BinBin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu

Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model.

Language Modelling speech-recognition +1

3,666

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

1 code implementation • 21 May 2020 • Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie

Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention.

Sound Audio and Speech Processing

3,074

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

7 code implementations • Interspeech 2020 • Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie

Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality.

Ranked #4 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge (PESQ-NB metric)

Speech Enhancement Audio and Speech Processing Sound

2,094

The NPU-Elevoc Personalized Speech Enhancement System for ICASSP2023 DNS Challenge

1 code implementation • 13 Mar 2023 • Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie

This paper describes our NPU-Elevoc personalized speech enhancement system (NAPSE) for the 5th Deep Noise Suppression Challenge at ICASSP 2023.

Speech Enhancement

964

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

1 code implementation • 18 Sep 2019 • Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.

Ranked #1 on Speech Recognition on Hub5'00 CallHome

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

941

Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework

4 code implementations • 6 Jul 2017 • Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dong-Yan Huang, Haizhou Li

In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN).

Sound

514

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

1 code implementation • 7 Oct 2021 • BinBin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech, with 22400+ hours in total.

Ranked #5 on Speech Recognition on WenetSpeech

Label Error Detection Optical Character Recognition +4

450

WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit

1 code implementation • 30 Oct 2022 • Jie Wang, Menglong Xu, Jingyong Hou, BinBin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan

Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices.

Keyword Spotting

377

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation • 3 Dec 2020 • Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.

Audio Generation Disentanglement +1

122

Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation

1 code implementation • 11 Nov 2021 • Yihui Fu, Yun Liu, Jingdong Li, Dawei Luo, Shubo Lv, Yukai Jv, Lei Xie

Complex spectrum and magnitude are considered as two major features of speech enhancement and dereverberation.

Speech Enhancement

Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music

1 code implementation • 12 Aug 2020 • Haohe Liu, Lei Xie, Jian Wu, Geng Yang

We aim to address the major issues in CNN-based high-resolution MSS model: high computational cost and weight sharing between distinctly different bands.

Audio and Speech Processing Sound

INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing

1 code implementation • 2 Apr 2021 • Wei Rao, Yihui Fu, Yanxin Hu, Xin Xu, Yvkai Jv, Jiangyu Han, Zhongjie Jiang, Lei Xie, Yannan Wang, Shinji Watanabe, Zheng-Hua Tan, Hui Bu, Tao Yu, Shidong Shang

The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing.

Speech Enhancement Task 2

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

1 code implementation • 24 Nov 2020 • Qiao Tian, Yi Chen, Zewang Zhang, Heng Lu, LingHui Chen, Lei Xie, Shan Liu

On one hand, we propose to discriminate ground-truth waveform from synthetic one in frequency domain for offering more consistency guarantees instead of only in time domain.

Generative Adversarial Network Speech Synthesis

DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection

1 code implementation • 11 Dec 2023 • Haoyang He, Jiangning Zhang, Hongxu Chen, Xuhai Chen, Zhishan Li, Xu Chen, Yabiao Wang, Chengjie Wang, Lei Xie

Reconstruction-based approaches have achieved remarkable outcomes in anomaly detection.

Anomaly Detection Denoising +1

Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures

2 code implementations • 15 Nov 2020 • Shuo Zhang, Yang Liu, Lei Xie

The prediction of physicochemical properties from molecular structures is a crucial task for artificial intelligence aided molecular design.

Ranked #2 on Drug Discovery on QM9

Drug Discovery Formation Energy

Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition

1 code implementation • 27 Mar 2018 • Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie

First, we study the effectiveness of different dereverberation networks (the generator in GAN) and find that LSTM leads to a significant improvement compared with feed-forward DNN and CNN on our dataset.

Robust Speech Recognition Speech Dereverberation +1

AIPerf: Automated machine learning as an AI-HPC benchmark

1 code implementation • 17 Aug 2020 • Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong Zhai, Youhui Zhang, Yunquan Zhang, WenGuang Chen

The de facto HPC benchmark LINPACK cannot reflect AI computing power and I/O performance without a representative workload.

AutoML Benchmarking +1

Efficient and Accurate Physics-aware Multiplex Graph Neural Networks for 3D Small Molecules and Macromolecule Complexes

1 code implementation • 6 Jun 2022 • Shuo Zhang, Yang Liu, Lei Xie

On small molecule dataset for predicting quantum chemical properties, PaxNet reduces the prediction error by 15% and uses 73% less memory than the best baseline.

Molecular Property Prediction Protein-Ligand Affinity Prediction

Physics-aware Graph Neural Network for Accurate RNA 3D Structure Prediction

1 code implementation • 28 Oct 2022 • Shuo Zhang, Yang Liu, Lei Xie

In this work, we propose a Graph Neural Network (GNN)-based scoring function trained only with the atomic types and coordinates on limited solved RNA 3D structures for distinguishing accurate structural models.

Drug Discovery RNA 3D STRUCTURE PREDICTION

A Universal Framework for Accurate and Efficient Geometric Deep Learning of Molecular Systems

1 code implementation • Scientific Reports 2023 • Shuo Zhang, Yang Liu, Lei Xie

Molecular sciences address a wide range of problems involving molecules of different types and sizes and their complexes.

Ranked #1 on Drug Discovery on QM9

Attention-based End-to-End Models for Small-Footprint Keyword Spotting

3 code implementations • 29 Mar 2018 • Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie

In this paper, we propose an attention-based end-to-end neural approach for small-footprint keyword spotting (KWS), which aims to simplify the pipelines of building a production-quality KWS system.

Small-Footprint Keyword Spotting

IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

1 code implementation • 4 Nov 2020 • Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, DongYan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez

In this challenge, we open source a sizable speech, keyword, echo and noise corpus for promoting data-driven methods, particularly deep-learning approaches on KWS and SSL.

Sound Audio and Speech Processing

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

2 code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie

This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, engaging in the fixed and open tracks of Single-Speaker VSR Task, and the open track of Multi-Speaker VSR Task.

speech-recognition Visual Speech Recognition

Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation

1 code implementation • 4 Jul 2019 • Shuo Zhang, Lei Xie

To improve the performance of attention-based GNNs, we propose cardinality preserved attention (CPA) models that can be applied to any kind of attention mechanisms.

Ranked #2 on Graph Classification on RE-M5K

Graph Classification Graph Representation Learning +1

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

1 code implementation • 4 Mar 2022 • Junwen Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha

Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scenario understanding.

Multi-Task Learning Speech Enhancement

Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model

1 code implementation • 27 Mar 2018 • Ke Wang, Junbo Zhang, Yujun Wang, Lei Xie

Speaker adaptation aims to estimate a speaker-specific acoustic model from a speaker-independent one to minimize the mismatch between the training and testing conditions arising from speaker variabilities.

CODE-AE: A Coherent De-confounding Autoencoder for Predicting Patient-Specific Drug Response From Cell Line Transcriptomics

1 code implementation • 31 Jan 2021 • Di He, Lei Xie

Thus, CODE-AE provides a useful framework to take advantage of in vitro omics data for developing generalized patient predictive models.

Transfer Learning

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

1 code implementation • 17 Aug 2022 • Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan

In the metric aspect, we design the new conversational DER (CDER) evaluation metric, which calculates the SD accuracy at the utterance level.

Machine Translation speaker-diarization +1

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines

1 code implementation • 31 Mar 2021 • Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-Yi Lee, Lei Xie

Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task.

AutoML BIG-bench Machine Learning +1

Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain

1 code implementation • 16 Jun 2021 • Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie

Moreover, by including the data of variable numbers of speakers, our model can be even better than the PIT-Conformer AR model with only 1/7 latency, obtaining WERs of 19.9% and 34.3% on WSJ0-2mix and WSJ0-3mix sets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

no code implementations • 10 Jun 2018 • Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li

We also find that it is important to have sufficient speech segment pairs to train the deep CNN for effective acoustic word embeddings.

Dynamic Time Warping Word Embeddings

Training Augmentation with Adversarial Examples for Robust Speech Recognition

no code implementations • 7 Jun 2018 • Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie

This paper explores the use of adversarial examples in training speech recognition systems to increase robustness of deep neural network acoustic models.

Data Augmentation Robust Speech Recognition +1

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition

no code implementations • 16 Jun 2018 • Pengcheng Guo, Hai-Hua Xu, Lei Xie, Eng Siong Chng

In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data.

speech-recognition Speech Recognition

Domain Adversarial Training for Accented Speech Recognition

no code implementations • 7 Jun 2018 • Sining Sun, Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie

In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem.

Accented Speech Recognition Multi-Task Learning +1

Attention-Based End-to-End Speech Recognition on Voice Search

no code implementations • 22 Jul 2017 • Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie

Previous attempts have shown that applying attention-based encoder-decoder to Mandarin speech recognition was quite difficult due to the logographic orthography of Mandarin, the large vocabulary and the conditional dependency of the attention model.

L2 Regularization Language Modelling +3

Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling

no code implementations • 17 Mar 2017 • Wenpeng Li, Bin-Bin Zhang, Lei Xie, Dong Yu

Deep learning models (DLMs) are state-of-the-art techniques in speech recognition.

speech-recognition Speech Recognition

Automatic Prosody Prediction for Chinese Speech Synthesis using BLSTM-RNN and Embedding Features

no code implementations • 2 Nov 2015 • Chuang Ding, Lei Xie, Jie Yan, Weini Zhang, Yang Liu

Prosody affects the naturalness and intelligibility of speech.

Feature Engineering Prosody Prediction +1

A Waveform Representation Framework for High-quality Statistical Parametric Speech Synthesis

no code implementations • 6 Oct 2015 • Bo Fan, Siu Wa Lee, Xiaohai Tian, Lei Xie, Minghui Dong

State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling.

Speech Synthesis Vocal Bursts Intensity Prediction

Bi-objective Optimization for Robust RGB-D Visual Odometry

no code implementations • 27 Nov 2014 • Tao Han, Chao Xu, Ryan Loxton, Lei Xie

This paper considers a new bi-objective optimization formulation for robust RGB-D visual odometry.

Visual Odometry

Exploring RNN-Transducer for Chinese Speech Recognition

no code implementations • 13 Nov 2018 • Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie

End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions

no code implementations • ACL 2013 • Xiaoming Lu, Lei Xie, Cheung-Chi Leung, Bin Ma, Haizhou Li

Speech Recognition

A New GAN-based End-to-End TTS Training Algorithm

no code implementations • 9 Apr 2019 • Haohan Guo, Frank K. Soong, Lei He, Lei Xie

However, the autoregressive module training is affected by the exposure bias, or the mismatch between the different distributions of real and predicted data.

Generative Adversarial Network Sentence +1

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS

no code implementations • 9 Apr 2019 • Haohan Guo, Frank K. Soong, Lei He, Lei Xie

The end-to-end TTS, which can predict speech directly from a given sequence of graphemes or phonemes, has shown improved performance over the conventional TTS.

Sentence

Building a mixed-lingual neural TTS system with only monolingual data

no code implementations • 12 Apr 2019 • Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu

When deploying a Chinese neural text-to-speech (TTS) synthesis system, one of the challenges is to synthesize Chinese utterances with English phrases or words embedded.

A novel cross-lingual voice cloning approach with a few text-free samples

no code implementations • 29 Oct 2019 • Xinyong Zhou, Hao Che, Xiaorui Wang, Lei Xie

In this paper, we present a cross-lingual voice cloning approach.

Translation Voice Cloning

Time Domain Audio Visual Speech Separation

no code implementations • 7 Apr 2019 • Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu

Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as speech recognition and speech enhancement.

Audio and Speech Processing Sound

Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise

no code implementations • 28 Apr 2020 • Shan Yang, Yuxuan Wang, Lei Xie

As for the speech-side noise, we propose to learn a noise-independent feature in the auto-regressive decoder through adversarial training and data augmentation, which does not need an extra speech enhancement model.

Clustering Data Augmentation +5

Simplified Self-Attention for Transformer-based End-to-End Speech Recognition

no code implementations • 21 May 2020 • Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie

Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies.

speech-recognition Speech Recognition

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

no code implementations • NeurIPS 2020 • Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie

This model additionally has a simple and efficient stop criterion for the end of the transduction, making it able to infer the variable number of output sequences.

Ranked #3 on Speech Separation on WSJ0-4mix

speech-recognition Speech Recognition +1

The ASRU 2019 Mandarin-English Code-Switching Speech Recognition Challenge: Open Datasets, Tracks, Methods and Results

no code implementations • 12 Jul 2020 • Xian Shi, Qiangze Feng, Lei Xie

The paper then presents an overview of the results and system performance in the three tracks.

Data Augmentation Language Identification +2

An End-to-end Architecture of Online Multi-channel Speech Separation

no code implementations • 7 Sep 2020 • Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie

Previously, we introduced a system, called unmixing, fixed-beamformer and extraction (UFE), that was shown to be effective in addressing the speech overlap problem in conversation transcription.

speech-recognition Speech Recognition +1

A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine

no code implementations • 9 Oct 2020 • Di He, Lei Xie

An unsolved fundamental problem in biology and ecology is to predict observable traits (phenotypes) from a new genetic constitution (genotype) of an organism under environmental perturbations (e.g., drug treatment).

Domain Adaptation Representation Learning

AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification

no code implementations • 25 Oct 2020 • Jingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie

The AutoSpeech challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to speech processing tasks.

AutoML BIG-bench Machine Learning +1

The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

no code implementations • 13 Nov 2020 • Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao

Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.

Sound Audio and Speech Processing

Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter

no code implementations • 17 Nov 2020 • Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie

End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Wake Word Detection with Streaming Transformers

no code implementations • 8 Feb 2021 • Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Modern wake word detection systems usually rely on neural networks for acoustic modeling.

Regularized Covariance Estimation for Polarization Radar Detection in Compound Gaussian Sea Clutter

no code implementations • 17 Mar 2021 • Lei Xie, Zishu He, Jun Tong, Tianle Liu, Jun Li, Jiangtao Xi

This paper investigates regularized estimation of Kronecker-structured covariance matrices (CM) for polarization radar in sea clutter scenarios where the data are assumed to follow the complex, elliptically symmetric (CES) distributions with a Kronecker-structured CM.

The Multi-speaker Multi-style Voice Cloning Challenge 2021

no code implementations • 5 Apr 2021 • Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu

The challenge consists of two tracks, namely few-shot track and one-shot track, where the participants are required to clone multiple target voices with 100 and 5 samples respectively.

Benchmarking Voice Cloning

Cross-Validated Tuning of Shrinkage Factors for MVDR Beamforming Based on Regularized Covariance Matrix Estimation

no code implementations • 5 Apr 2021 • Lei Xie, Zishu He, Jun Tong, Jun Li, Jiangtao Xi

We propose leave-one-out cross-validation (LOOCV) choices for the shrinkage factors to optimize the beamforming performance, referred to as $\text{S}^2$CM-CV and STE-CV.

The NNI Query-by-Example System for MediaEval 2014

no code implementations • 16 Oct 2014 • Peng Yang, HaiHua Xu, Xiong Xiao, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Jia Yu, Hang Lv, Lei Wang, Su Jun Leow, Bin Ma, Eng Siong Chng, Haizhou Li

For both symbolic and DTW search, partial sequence matching is performed to reduce missing rate, especially for query type 2 and 3.

Ranked #6 on Keyword Spotting on QUESST

Dynamic Time Warping Keyword Spotting +1

Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion

no code implementations • 16 Jun 2021 • Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li

Specifically, prosodic features are used to explicitly model prosody, while VAE and reference encoder are used to implicitly model prosody, which take Mel spectrum and bottleneck feature as input respectively.

Style Transfer Voice Conversion

DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement

no code implementations • 16 Jun 2021 • Shubo Lv, Yanxin Hu, Shimin Zhang, Lei Xie

Deep complex convolution recurrent network (DCCRN), which extends CRN with complex structure, has achieved superior performance in MOS evaluation in Interspeech 2020 deep noise suppression challenge (DNS2020).

Speech Enhancement

The NNI Query-by-Example System for MediaEval 2015

no code implementations • MediaEval 2015 Workshop 2015 • Jingyong Hou, Van Tung Pham, Cheung-Chi Leung, Lei Wang, HaiHua Xu, Hang Lv, Lei Xie, Zhonghua Fu, Chongjia Ni, Xiong Xiao, Hongjie Chen, Shaofei Zhang, Sining Sun, Yougen Yuan, Pengcheng Li, Tin Lay Nwe, Sunil Sivadas, Bin Ma, Eng Siong Chng, Haizhou Li

This paper describes the system developed by the NNI team for the Query-by-Example Search on Speech Task (QUESST) in the MediaEval 2015 evaluation.

Ranked #9 on Keyword Spotting on QUESST

Keyword Spotting

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Lei Xie, Dan Su

Current two-stage TTS framework typically integrates an acoustic model with a vocoder -- the acoustic model predicts a low resolution intermediate representation such as Mel-spectrum while the vocoder generates waveform from the intermediate representation.

Speech Synthesis

Controllable Context-aware Conversational Speech Synthesis

no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su

Specifically, we use explicit labels to represent two typical spontaneous behaviors filled-pause and prolongation in the acoustic model and develop a neural network based predictor to predict the occurrences of the two behaviors from text.

Speech Synthesis

AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person

no code implementations • 9 Aug 2021 • Xinsheng Wang, Qicong Xie, Jihua Zhu, Lei Xie, Scharenborg

In this paper, we present an automatic method to generate synchronized speech and talking-head videos on the basis of text and a single face image of an arbitrary person as input.

Talking Head Generation

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

no code implementations • 17 Oct 2021 • Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi

In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates audio waveform from lyrics and musical score.

Singing Voice Synthesis Variational Inference

S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement

no code implementations • 16 Nov 2021 • Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu

In speech enhancement, complex neural networks have shown promising performance due to their effectiveness in processing complex-valued spectrum.

16k Denoising +2

One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation

no code implementations • 24 Nov 2021 • Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

One-shot style transfer is a challenging task, since training on one utterance makes the model extremely prone to over-fitting the training data and causes low speaker similarity and a lack of expressiveness.

Style Transfer Voice Conversion

Exploration of Dark Chemical Genomics Space via Portal Learning: Applied to Targeting the Undruggable Genome and COVID-19 Anti-Infective Polypharmacology

no code implementations • 23 Nov 2021 • Tian Cai, Li Xie, Muge Chen, Yang Liu, Di He, Shuo Zhang, Cameron Mura, Philip E. Bourne, Lei Xie

Advances in biomedicine are largely fueled by exploring uncharted territories of human biology.

BIG-bench Machine Learning Meta-Learning +2

A Computational Efficient Maximum Likelihood Direct Position Determination Approach for Multiple Emitters Using Angle and Doppler Measurements

no code implementations • 4 Dec 2021 • Ziqiang Wang, Yimao Sun, Qun Wan, Lei Xie, Ning Liu

Emitter localization is widely applied in the military and civilian fields.

Position

Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios

no code implementations • 23 Dec 2021 • Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan

Moreover, the explicit prosody features used in the prosody predicting module can increase the diversity of synthetic speech by adjusting the value of prosody features.

Speech Synthesis Style Transfer +1

IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion

no code implementations • 2 Jan 2022 • Wendong Gan, Bolong Wen, Ying Yan, Haitao Chen, Zhichao Wang, Hongqiang Du, Lei Xie, Kaixuan Guo, Hai Li

Specifically, prosody vector is first extracted from pre-trained VQ-Wav2Vec model, where rich prosody information is embedded while most speaker and environment information are removed effectively by quantization.

Quantization Voice Conversion

Reinforcement Learning for Personalized Drug Discovery and Design for Complex Diseases: A Systems Pharmacology Perspective

no code implementations • 21 Jan 2022 • Ryan K. Tan, Yang Liu, Lei Xie

The challenges of harnessing reinforcement learning for systems pharmacology and personalized medicine are discussed.

Drug Discovery reinforcement-learning +1

Conversational Speech Recognition By Learning Conversation-level Characteristics

no code implementations • 16 Feb 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma

Conversational automatic speech recognition (ASR) is a task to recognize conversational speech including multiple speakers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


Audio-visual speech separation based on joint feature representation with cross-modal attention

no code implementations • 5 Mar 2022 • Junwen Xiong, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang

Multi-modal speech separation has exhibited a specific advantage in isolating the target character in multi-talker noisy environments.

Optical Flow Estimation Speech Separation


Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild

no code implementations • 8 Mar 2022 • Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha

Rather than focusing on the unimportant regions of the face image, the proposed AttnWav2Lip model is able to pay more attention to lip region reconstruction.

Talking Face Generation


An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection

no code implementations • 10 Mar 2022 • Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang

DeepFake-based digital facial forgery threatens public media security; when lip manipulation is used in talking face generation, fake video detection becomes even more difficult.

Decision Making Face Detection +2


Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset

no code implementations • 31 Mar 2022 • Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan

As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3


A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings

no code implementations • 31 Mar 2022 • Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie

Therefore, we propose the second approach, WD-SOT, to address alignment errors by introducing a word-level diarization model, which can get rid of such timestamp alignment dependency.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


End-to-End Voice Conversion with Information Perturbation

no code implementations • 15 Jun 2022 • Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su

The ideal goal of voice conversion is to convert the source speaker's speech to sound naturally like the target speaker while maintaining the linguistic content and the prosody of the source speech.

Voice Conversion


Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

no code implementations • 3 Jul 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma

Then, during the training of the conversational ASR system, the extractor will be frozen to extract the textual representation of preceding speech, while such representation is used as context fed to the ASR decoder through attention mechanism.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


Intelligent Reflecting Surface-Aided Maneuvering Target Sensing: True Velocity Estimation

no code implementations • 30 Jul 2022 • Lei Xie, Xianghao Yu, S. H. Song

Maneuvering target sensing will be an important service of future vehicular networks, where precise velocity estimation is one of the core tasks.


OLLIE: Derivation-based Tensor Program Optimizer

no code implementations • 2 Aug 2022 • Liyan Zheng, Haojie Wang, Jidong Zhai, Muyan Hu, Zixuan Ma, Tuowei Wang, Shizhi Tang, Lei Xie, Kezhao Huang, Zhihao Jia

Boosting the runtime performance of deep neural networks (DNNs) is critical due to their wide adoption in real-world tasks.


Iterative Sparse Recovery based Passive Localization in Perceptive Mobile Networks

no code implementations • 22 Aug 2022 • Lei Xie, Shenghui Song

As a result, most existing methods require a large number of data samples to achieve an accurate estimate of the covariance matrix for the received signals, based on which a power spectrum is constructed for localization purposes.


ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS

no code implementations • 14 Sep 2022 • Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie

To alleviate the difficulty in training, we propose to model linguistic and prosodic information by considering cross-sentence, embedded structure in training.

Position Sentence


NWPU-ASLP System for the VoicePrivacy 2022 Challenge

no code implementations • 24 Sep 2022 • Jixun Yao, Qing Wang, Li Zhang, Pengcheng Guo, Yuhao Liang, Lei Xie

Our system consists of four modules, including feature extractor, acoustic model, anonymization module, and neural vocoder.

Speaker Verification


Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement

no code implementations • 17 Oct 2022 • Shubo Lv, Yihui Fu, Yukai Jv, Lei Xie, Weixin Zhu, Wei Rao, Yannan Wang

Recently, multi-channel speech enhancement has drawn much interest due to the use of spatial information to distinguish target speech from interfering signal.

Denoising Speech Enhancement


TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge

no code implementations • 26 Oct 2022 • Bowen Pang, Huan Zhao, Gaosheng Zhang, Xiaoyue Yang, Yang Sun, Li Zhang, Qing Wang, Lei Xie

In this challenge, we explore three typical kinds of speaker diarization systems: spectral clustering (SC) based diarization, target-speaker voice activity detection (TS-VAD), and end-to-end neural diarization (EEND).

Action Detection Activity Detection +2


Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling

no code implementations • 6 Nov 2022 • Jixun Yao, Qing Wang, Yi Lei, Pengcheng Guo, Lei Xie, Namin Wang, Jie Liu

By directly scaling the formant and F0, the speaker distinguishability degradation of the anonymized speech caused by the introduction of other speakers is prevented.

Speaker Verification


Preserving background sound in noise-robust voice conversion via multi-task learning

no code implementations • 6 Nov 2022 • Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie

Background sound is an informative form of art that is helpful in providing a more immersive experience in real-application voice conversion (VC) scenarios.

Multi-Task Learning Voice Conversion


Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

no code implementations • 9 Nov 2022 • Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi

We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features are adopted as the attention query, which result from a prosody encoder with target speaker embedding and normalized pitch and energy of source speech as input.

Voice Conversion


Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

no code implementations • 16 Nov 2022 • Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC).

Voice Conversion


Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling

no code implementations • 19 Nov 2022 • Xinfa Zhu, Yi Lei, Kun Song, Yongmao Zhang, Tao Li, Lei Xie

This paper aims to synthesize the target speaker's speech with desired speaking style and emotion by transferring the style and emotion from reference speech recorded by other speakers.

Expressive Speech Synthesis


MSV Challenge 2022: NPU-HC Speaker Verification System for Low-resource Indian Languages

no code implementations • 30 Nov 2022 • Yue Li, Li Zhang, Namin Wang, Jie Liu, Lei Xie

Specifically, the weight transfer fine-tuning aims to constrain the distance of the weights between the pre-trained model and the fine-tuned model, which takes advantage of the previously acquired discriminative ability from the large-scale out-domain datasets and avoids catastrophic forgetting and overfitting at the same time.

Speaker Verification


A Tribute to Phil Bourne -- Scientist and Human

no code implementations • 8 Dec 2022 • Cameron Mura, Emma Candelier, Lei Xie

This Special Issue of Biomolecules, commissioned in honor of Dr. Philip E. Bourne, focuses on a new field of biomolecular data science.


Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer

no code implementations • 17 Jan 2023 • Zhanheng Yang, Sining Sun, Xiong Wang, Yike Zhang, Long Ma, Lei Xie

In this paper, we propose an efficient approach to obtain a high quality contextual list for a unified streaming/non-streaming based E2E model.


Two-step Band-split Neural Network Approach for Full-band Residual Echo Suppression

no code implementations • 13 Mar 2023 • Zihan Zhang, Shimin Zhang, Mingshuai Liu, Yanhong Leng, Zhe Han, Li Chen, Lei Xie

This paper describes a Two-step Band-split Neural Network (TBNN) approach for full-band acoustic echo cancellation.

Acoustic echo cancellation


Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge

no code implementations • 14 Mar 2023 • Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie

Achieving 0.446 in the final score and 0.517 in the P.835 score, our system ranks 4th in the non-real-time track.

Vocal Bursts Valence Prediction


Joint BS Selection, User Association, and Beamforming Design for Network Integrated Sensing and Communication

no code implementations • 9 May 2023 • Yiming Xu, Dongfang Xu, Lei Xie, Shenghui Song

Different from conventional radar, the cellular network in the integrated sensing and communication (ISAC) system enables collaborative sensing by multiple sensing nodes, e.g., base stations (BSs).


Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

no code implementations • 12 May 2023 • Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Specifically, to flexibly adapt to the dynamic-variant speaker characteristic in the temporal and channel axis of the speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.

Disentanglement Retrieval +2


Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

no code implementations • 21 May 2023 • Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie

In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method.

speech-recognition Speech Recognition


DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

no code implementations • 21 May 2023 • Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi

Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities.

Data Augmentation Knowledge Distillation +1


DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting

no code implementations • 21 May 2023 • Shubo Lv, Xiong Wang, Sining Sun, Long Ma, Lei Xie

Real-world complex acoustic environments, especially those with a low signal-to-noise ratio (SNR), pose tremendous challenges to a keyword spotting (KWS) system.

Denoising Multi-Task Learning +4


BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

no code implementations • 23 May 2023 • Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie

The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition

no code implementations • 23 May 2023 • Hongfei Xue, Qijie Shao, Peikun Chen, Pengcheng Guo, Lei Xie, Jie Liu

Different from UniSpeech, UniData2vec replaces the quantized discrete representations with continuous and contextual representations from a teacher model for phonetically-aware pre-training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3


Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition

no code implementations • 1 Jun 2023 • Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie

By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words.

speech-recognition Speech Recognition


LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models

no code implementations • 18 Jun 2023 • Zhichao Wang, Yuanzhe Chen, Lei Xie, Qiao Tian, Yuping Wang

An intuitive approach is to follow AudioLM: tokenize speech into semantic and acoustic tokens with HuBERT and SoundStream, respectively, and convert source semantic tokens to target acoustic tokens conditioned on acoustic tokens of the target speaker.

Audio Generation Disentanglement +2


MSW-Transformer: Multi-Scale Shifted Windows Transformer Networks for 12-Lead ECG Classification

no code implementations • 21 Jun 2023 • Renjie Cheng, Zhemin Zhuang, Shuxin Zhuang, Lei Xie, Jingfeng Guo

To address these challenges, we propose a single-layer Transformer network called Multi-Scale Shifted Windows Transformer Networks (MSW-Transformer), which uses a multi-window sliding attention mechanism at different scales to capture features in different dimensions.

Classification ECG Classification


Bundle-specific Tractogram Distribution Estimation Using Higher-order Streamline Differential Equation

no code implementations • 6 Jul 2023 • Yuanjing Feng, Lei Xie, Jingqiang Wang, Jianzhong He, Fei Gao

At the global level, the tractography process is simplified as the estimation of bundle-specific tractogram distribution (BTD) coefficients by minimizing the energy optimization model, and is used to characterize the relations between BTD and diffusion tensor vector under the prior guidance by introducing the tractogram bundle information to provide anatomic priors.


Learning Universal and Robust 3D Molecular Representations with Graph Convolutional Networks

no code implementations • 24 Jul 2023 • Shuo Zhang, Yang Liu, Li Xie, Lei Xie

To combine the DNP descriptor and chemical features in molecules, we construct the Robust Molecular Graph Convolutional Network (RoM-GCN), which can take both node and edge features into consideration when generating molecule representations.


METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

no code implementations • 29 Jul 2023 • Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu, Lei Xie

However, such data-efficient approaches have ignored synthesizing emotional aspects of speech due to the challenges of cross-speaker cross-lingual emotion transfer: the heavy entanglement of speaker timbre, emotion, and language factors in the speech signal makes a system produce cross-lingual synthetic speech with an undesired foreign accent and weak emotion expressiveness.

Disentanglement Quantization +1


Surrogate Empowered Sim2Real Transfer of Deep Reinforcement Learning for ORC Superheat Control

no code implementations • 5 Aug 2023 • Runze Lin, Yangyang Luo, Xialai Wu, Junghui Chen, Biao Huang, Lei Xie, Hongye Su

The Organic Rankine Cycle (ORC) is widely used in industrial waste heat recovery due to its simple structure and easy maintenance.

reinforcement-learning Transfer Learning


MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

no code implementations • 3 Sep 2023 • Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang

In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation.

Data Augmentation Disentanglement +3


PromptVC: Flexible Stylistic Voice Conversion in Latent Space Driven by Natural Language Prompts

no code implementations • 17 Sep 2023 • Jixun Yao, Yuguang Yang, Yi Lei, Ziqian Ning, Yanni Hu, Yu Pan, JingJing Yin, Hongbin Zhou, Heng Lu, Lei Xie

In this study, we propose PromptVC, a novel style voice conversion approach that employs a latent diffusion model to generate a style vector driven by natural language prompts.

Voice Conversion


DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

no code implementations • 27 Sep 2023 • Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi

Third, the model is unable to effectively address the noise in the unvoiced segments, lowering the sound quality.

Knowledge Distillation Voice Conversion


SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition

no code implementations • 29 Sep 2023 • Hongfei Xue, Qijie Shao, Kaixun Huang, Peikun Chen, Lei Xie, Jie Liu

We first analyze the different layers of the SSL model for language-related and content-related information, uncovering layers that show a stronger correlation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


VITS-Based Singing Voice Conversion Leveraging Whisper and multi-scale F0 Modeling

no code implementations • 4 Oct 2023 • Ziqian Ning, Yuepeng Jiang, Zhichao Wang, Bin Zhang, Lei Xie

This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023.

Voice Conversion


MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

no code implementations • 6 Oct 2023 • Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie

A typical neural speech enhancement (SE) approach mainly handles speech and noise mixtures, which is not optimal for singing voice enhancement scenarios.

Music Source Separation Speech Enhancement


An Exploration of Task-decoupling on Two-stage Neural Post Filter for Real-time Personalized Acoustic Echo Cancellation

no code implementations • 7 Oct 2023 • Zihan Zhang, Jiayao Sun, Xianjun Xia, Ziqian Wang, Xiaopeng Yan, Yijian Xiao, Lei Xie

Utilization of speaker representation has extended the frontier of AEC, thus attracting many researchers' interest in personalized acoustic echo cancellation (PAEC).

Acoustic echo cancellation Speech Enhancement


Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

no code implementations • 7 Oct 2023 • Kaixun Huang, Ao Zhang, BinBin Zhang, Tianyi Xu, Xingchen Song, Lei Xie

However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

no code implementations • 22 Oct 2023 • Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie

By introducing both cross-modal and conversational representations into the decoder, our model retains context over longer sentences without information loss, achieving relative accuracy improvements of 8.8% and 23% on Mandarin conversation datasets HKUST and MagicData-RAMC, respectively, compared to the standard Conformer model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning

no code implementations • 26 Oct 2023 • Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie

This paper aims to build an expressive TTS system for multi-speakers, synthesizing a target speaker's speech with multiple styles and emotions.

Contrastive Learning Expressive Speech Synthesis


Protein Language Model-Powered 3D Ligand Binding Site Prediction from Protein Sequence

no code implementations • 5 Dec 2023 • Shuo Zhang, Lei Xie

Then we compute and update the protein-ligand interaction embedding based on the protein residue-level embeddings and ligand atom-level embeddings, and the geometric constraints in the inferred protein contact map and ligand distance map.

Drug Discovery Protein Language Model


Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization

no code implementations • 7 Dec 2023 • Huan Zhao, Li Zhang, Yue Li, Yannan Wang, Hongji Wang, Wei Rao, Qing Wang, Lei Xie

The scarcity of labeled audio-visual datasets is a constraint for training superior audio-visual speaker diarization systems.

speaker-diarization Speaker Diarization


U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

no code implementations • 15 Dec 2023 • Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest.

Keyword Spotting


SELM: Speech Enhancement Using Discrete Tokens and Language Models

no code implementations • 15 Dec 2023 • Ziqian Wang, Xinfa Zhu, Zihan Zhang, YuanJun Lv, Ning Jiang, Guoqing Zhao, Lei Xie

Given the intrinsic similarity between speech generation and speech enhancement, harnessing semantic information holds potential advantages for speech enhancement tasks.

Self-Supervised Learning Speech Enhancement


Modality Exchange Network for Retinogeniculate Visual Pathway Segmentation

no code implementations • 3 Jan 2024 • Hua Han, Cheng Li, Lei Xie, Yuanjing Feng, Alou Diakite, Shanshan Wang

Secondly, we propose a cross-fusion module that further enhances the fusion of information between the two modalities.

Segmentation


LESEN: Label-Efficient deep learning for Multi-parametric MRI-based Visual Pathway Segmentation

no code implementations • 3 Jan 2024 • Alou Diakite, Cheng Li, Lei Xie, Yuanjing Feng, Hua Han, Shanshan Wang

Recent research has shown the potential of deep learning in multi-parametric MRI-based visual pathway (VP) segmentation.

Segmentation


ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3


MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Pan Zhou, Lei Xie

While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.

Audio-Visual Speech Recognition Automatic Speech Recognition +4


BS-PLCNet: Band-split Packet Loss Concealment Network with Multi-task Learning Framework and Multi-discriminators

no code implementations • 8 Jan 2024 • Zihan Zhang, Jiayao Sun, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

Packet loss is a common and unavoidable problem in voice over internet phone (VoIP) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3


StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

no code implementations • 19 Jan 2024 • Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Zhuo Chen, Lei Xie, Yuping Wang, Yuxuan Wang

Specifically, to enable streaming capability, StreamVoice employs a fully causal context-aware LM with a temporal-independent acoustic predictor, while alternately processing semantic and acoustic features at each time step of autoregression which eliminates the dependence on complete source speech.

Language Modelling Voice Conversion


Approximate Message Passing-Enhanced Graph Neural Network for OTFS Data Detection

no code implementations • 15 Feb 2024 • Wenhao Zhuang, Yuyi Mao, Hengtao He, Lei Xie, Shenghui Song, Yao Ge, Zhi Ding

Orthogonal time frequency space (OTFS) modulation has emerged as a promising solution to support high-mobility wireless communications, for which cost-effective data detectors are critical.


Anatomy-guided fiber trajectory distribution estimation for cranial nerves tractography

no code implementations • 29 Feb 2024 • Lei Xie, Qingrun Zeng, Huajun Zhou, Guoqiang Xie, Mingchu Li, Jiahao Huang, Jianan Cui, Hao Chen, Yuanjing Feng

Diffusion MRI tractography is an important tool for identifying and analyzing the intracranial course of cranial nerves (CNs).

Anatomy


Facilitating Reinforcement Learning for Process Control Using Transfer Learning: Perspectives

no code implementations • 30 Mar 2024 • Runze Lin, Junghui Chen, Lei Xie, Hongye Su, Biao Huang

This paper provides insights into deep reinforcement learning (DRL) for process control from the perspective of transfer learning.

reinforcement-learning Transfer Learning


Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

no code implementations • 8 Apr 2024 • He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie

Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.

Lipreading Lip Reading +1


MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection

no code implementations • 9 Apr 2024 • Haoyang He, Yuhu Bai, Jiangning Zhang, Qingdong He, Hongxu Chen, Zhenye Gan, Chengjie Wang, Xiangtai Li, Guanzhong Tian, Lei Xie

Recent advancements in anomaly detection have seen the efficacy of CNN- and transformer-based approaches.

Long-range modeling Unsupervised Anomaly Detection


Dynamic fault detection and diagnosis of industrial alkaline water electrolyzer process with variational Bayesian dictionary learning

no code implementations • 15 Apr 2024 • Qi Zhang, Lei Xie, Weihua Xu, Hongye Su

A novel robust dynamic variational Bayesian dictionary learning (RDVDL) monitoring approach is proposed to improve the reliability and safety of AWE operation.


Nonlinear sparse variational Bayesian learning based model predictive control with application to PEMFC temperature control

no code implementations • 15 Apr 2024 • Qi Zhang, Lei Wang, Weihua Xu, Hongye Su, Lei Xie

Variational inference is used by NSVB-MPC to assess the predictive accuracy and make the necessary corrections to quantify system uncertainty.

