Search Results for author: Lei Xie

Found 79 papers, 24 papers with code

End-to-End Voice Conversion with Information Perturbation

no code implementations 15 Jun 2022 Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su

The ideal goal of voice conversion is to convert the source speaker's speech to sound naturally like the target speaker while maintaining the linguistic content and the prosody of the source speech.

Voice Conversion

Efficient and Accurate Physics-aware Multiplex Graph Neural Networks for 3D Small Molecules and Macromolecule Complexes

no code implementations 6 Jun 2022 Shuo Zhang, Yang Liu, Lei Xie

Recent advances in applying Graph Neural Networks (GNNs) to molecular science have showcased the power of learning three-dimensional (3D) structure representations with GNNs.

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings

no code implementations 31 Mar 2022 Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie

Therefore, we propose the second approach, WD-SOT, to address alignment errors by introducing a word-level diarization model, which can get rid of such timestamp alignment dependency.

Automatic Speech Recognition Speaker Separation

Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational (RAMC) Speech Dataset

no code implementations 31 Mar 2022 Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan

As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc.

Automatic Speech Recognition Speaker Diarization

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

1 code implementation 29 Mar 2022 BinBin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu

Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model.

Speech Recognition

An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection

no code implementations 10 Mar 2022 Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang

DeepFake-based digital facial forgery is threatening public media security; when lip manipulation is used in talking face generation, fake video detection becomes even more difficult.

Decision Making Face Detection +2

Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild

no code implementations 8 Mar 2022 Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha

Rather than focusing on the unimportant regions of the face image, the proposed AttnWav2Lip model is able to pay more attention to the lip region reconstruction.

Talking Face Generation

Audio-visual speech separation based on joint feature representation with cross-modal attention

no code implementations 5 Mar 2022 Junwen Xiong, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang

Multi-modal based speech separation has exhibited a specific advantage on isolating the target character in multi-talker noisy environments.

Optical Flow Estimation Speech Separation

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

no code implementations 4 Mar 2022 Junwen Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha

Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scenario understanding.

Multi-Task Learning Speech Enhancement

Conversational Speech Recognition By Learning Conversation-level Characteristics

no code implementations 16 Feb 2022 Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma

Conversational automatic speech recognition (ASR) is a task to recognize conversational speech including multiple speakers.

Automatic Speech Recognition

IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion

no code implementations 2 Jan 2022 Wendong Gan, Bolong Wen, Ying Yan, Haitao Chen, Zhichao Wang, Hongqiang Du, Lei Xie, Kaixuan Guo, Hai Li

Specifically, prosody vector is first extracted from pre-trained VQ-Wav2Vec model, where rich prosody information is embedded while most speaker and environment information are removed effectively by quantization.

Quantization Voice Conversion

Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios

no code implementations 23 Dec 2021 Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan

Moreover, the explicit prosody features used in the prosody predicting module can increase the diversity of synthetic speech by adjusting the value of prosody features.

Speech Synthesis Style Transfer +1

One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation

no code implementations 24 Nov 2021 Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

One-shot style transfer is a challenging task, since training on one utterance makes the model extremely prone to over-fitting the training data, which causes low speaker similarity and a lack of expressiveness.

Style Transfer Voice Conversion

S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement

no code implementations 16 Nov 2021 Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu

In speech enhancement, complex neural networks have shown promising performance due to their effectiveness in processing complex-valued spectra.

Denoising Speech Denoising +1

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

no code implementations 17 Oct 2021 Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi

In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates audio waveform from lyrics and musical score.

Variational Inference

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

1 code implementation 7 Oct 2021 BinBin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech, with 22400+ hours in total.

Optical Character Recognition Speech Recognition +1

AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person

no code implementations 9 Aug 2021 Xinsheng Wang, Qicong Xie, Jihua Zhu, Lei Xie, Scharenborg

In this paper, we present an automatic method to generate synchronized speech and talking-head videos on the basis of text and a single face image of an arbitrary person as input.

Talking Head Generation

Controllable Context-aware Conversational Speech Synthesis

no code implementations 21 Jun 2021 Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su

Specifically, we use explicit labels to represent two typical spontaneous behaviors filled-pause and prolongation in the acoustic model and develop a neural network based predictor to predict the occurrences of the two behaviors from text.

Speech Synthesis

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

no code implementations 21 Jun 2021 Jian Cong, Shan Yang, Lei Xie, Dan Su

Current two-stage TTS framework typically integrates an acoustic model with a vocoder -- the acoustic model predicts a low resolution intermediate representation such as Mel-spectrum while the vocoder generates waveform from the intermediate representation.

Speech Synthesis

Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain

1 code implementation 16 Jun 2021 Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie

Moreover, by including the data of variable numbers of speakers, our model can be even better than the PIT-Conformer AR model with only 1/7 latency, obtaining WERs of 19.9% and 34.3% on WSJ0-2mix and WSJ0-3mix sets.

Automatic Speech Recognition

Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion

no code implementations 16 Jun 2021 Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li

Specifically, prosodic features are used to explicitly model prosody, while VAE and reference encoder are used to implicitly model prosody, which take Mel spectrum and bottleneck feature as input respectively.

Style Transfer Voice Conversion

DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement

no code implementations 16 Jun 2021 Shubo Lv, Yanxin Hu, Shimin Zhang, Lei Xie

Deep complex convolution recurrent network (DCCRN), which extends CRN with complex structure, has achieved superior performance in MOS evaluation in Interspeech 2020 deep noise suppression challenge (DNS2020).

Speech Enhancement

Cross-Validated Tuning of Shrinkage Factors for MVDR Beamforming Based on Regularized Covariance Matrix Estimation

no code implementations 5 Apr 2021 Lei Xie, Zishu He, Jun Tong, Jun Li, Jiangtao Xi

We propose leave-one-out cross-validation (LOOCV) choices for the shrinkage factors to optimize the beamforming performance, referred to as $\text{S}^2$CM-CV and STE-CV.

The Multi-speaker Multi-style Voice Cloning Challenge 2021

no code implementations 5 Apr 2021 Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu

The challenge consists of two tracks, namely few-shot track and one-shot track, where the participants are required to clone multiple target voices with 100 and 5 samples respectively.

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines

1 code implementation 31 Mar 2021 Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-Yi Lee, Lei Xie

Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task.

AutoML Keyword Spotting

Regularized Covariance Estimation for Polarization Radar Detection in Compound Gaussian Sea Clutter

no code implementations 17 Mar 2021 Lei Xie, Zishu He, Jun Tong, Tianle Liu, Jun Li, Jiangtao Xi

This paper investigates regularized estimation of Kronecker-structured covariance matrices (CM) for polarization radar in sea clutter scenarios where the data are assumed to follow the complex, elliptically symmetric (CES) distributions with a Kronecker-structured CM.

Wake Word Detection with Streaming Transformers

no code implementations 8 Feb 2021 Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Modern wake word detection systems usually rely on neural networks for acoustic modeling.

WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit

3 code implementations 2 Feb 2021 Zhuoyuan Yao, Di Wu, Xiong Wang, BinBin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei

In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

Speech Recognition

CODE-AE: A Coherent De-confounding Autoencoder for Predicting Patient-Specific Drug Response From Cell Line Transcriptomics

1 code implementation 31 Jan 2021 Di He, Lei Xie

Thus, CODE-AE provides a useful framework to take advantage of in vitro omics data for developing generalized patient predictive models.

Transfer Learning

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

5 code implementations 10 Dec 2020 BinBin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei

In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.

Speech Recognition

Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training

1 code implementation 3 Dec 2020 Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu

In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.

Audio Generation Disentanglement +1

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

no code implementations 24 Nov 2020 Qiao Tian, Yi Chen, Zewang Zhang, Heng Lu, LingHui Chen, Lei Xie, Shan Liu

On one hand, we propose to discriminate ground-truth waveform from synthetic one in frequency domain for offering more consistency guarantees instead of only in time domain.

Speech Synthesis

Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter

no code implementations 17 Nov 2020 Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie

End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance.

Automatic Speech Recognition

Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures

1 code implementation 15 Nov 2020 Shuo Zhang, Yang Liu, Lei Xie

The prediction of physicochemical properties from molecular structures is a crucial task for artificial intelligence aided molecular design.

Drug Discovery Formation Energy

The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

no code implementations 13 Nov 2020 Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao

Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.

Sound Audio and Speech Processing

IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

1 code implementation 4 Nov 2020 Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, DongYan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez

In this challenge, we open source a sizable speech, keyword, echo and noise corpus for promoting data-driven methods, particularly deep-learning approaches on KWS and SSL.

Sound Audio and Speech Processing

AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification

no code implementations 25 Oct 2020 Jingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie

The AutoSpeech challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to speech processing tasks.

AutoML General Classification

A Cross-Level Information Transmission Network for Predicting Phenotype from New Genotype: Application to Cancer Precision Medicine

no code implementations 9 Oct 2020 Di He, Lei Xie

An unsolved fundamental problem in biology and ecology is to predict observable traits (phenotypes) from a new genetic constitution (genotype) of an organism under environmental perturbations (e.g., drug treatment).

Domain Adaptation Representation Learning

An End-to-end Architecture of Online Multi-channel Speech Separation

no code implementations 7 Sep 2020 Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie

Previously, we introduced a system, called unmixing, fixed-beamformer and extraction (UFE), that was shown to be effective in addressing the speech overlap problem in conversation transcription.

Speech Recognition Speech Separation

AIPerf: Automated machine learning as an AI-HPC benchmark

1 code implementation 17 Aug 2020 Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong Zhai, Youhui Zhang, Yunquan Zhang, WenGuang Chen

The de facto HPC benchmark LINPACK cannot reflect AI computing power and I/O performance without representative workload.


Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music

1 code implementation 12 Aug 2020 Haohe Liu, Lei Xie, Jian Wu, Geng Yang

We aim to address the major issues in CNN-based high-resolution MSS model: high computational cost and weight sharing between distinctly different bands.

Audio and Speech Processing Sound

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

7 code implementations Interspeech 2020 Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie

Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality.

Speech Enhancement Audio and Speech Processing Sound

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals

no code implementations NeurIPS 2020 Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie

This model additionally has a simple and efficient stop criterion for the end of the transduction, making it able to infer the variable number of output sequences.

Speech Recognition Speech Separation

Simplified Self-Attention for Transformer-based End-to-End Speech Recognition

no code implementations 21 May 2020 Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie

Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies.

Speech Recognition

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

no code implementations 21 May 2020 Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie

Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention.

Sound Audio and Speech Processing

Wake Word Detection with Alignment-Free Lattice-Free MMI

1 code implementation 17 May 2020 Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input.

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

9 code implementations Interspeech 2020 Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie

In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting high-quality text-to-speech.

Sound Audio and Speech Processing

Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise

no code implementations 28 Apr 2020 Shan Yang, Yuxuan Wang, Lei Xie

As for the speech-side noise, we propose to learn a noise-independent feature in the auto-regressive decoder through adversarial training and data augmentation, which does not need an extra speech enhancement model.

Data Augmentation Denoising +3

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

1 code implementation 18 Sep 2019 Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.

Automatic Speech Recognition Data Augmentation +2

Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation

1 code implementation 4 Jul 2019 Shuo Zhang, Lei Xie

To improve the performance of attention-based GNNs, we propose cardinality preserved attention (CPA) models that can be applied to any kind of attention mechanisms.

Graph Classification Graph Representation Learning +1

Building a mixed-lingual neural TTS system with only monolingual data

no code implementations 12 Apr 2019 Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu

When deploying a Chinese neural text-to-speech (TTS) synthesis system, one of the challenges is to synthesize Chinese utterances with English phrases or words embedded.

A New GAN-based End-to-End TTS Training Algorithm

no code implementations 9 Apr 2019 Haohan Guo, Frank K. Soong, Lei He, Lei Xie

However, the autoregressive module training is affected by the exposure bias, or the mismatch between the different distributions of real and predicted data.

Transfer Learning

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS

no code implementations 9 Apr 2019 Haohan Guo, Frank K. Soong, Lei He, Lei Xie

The end-to-end TTS, which can predict speech directly from a given sequence of graphemes or phonemes, has shown improved performance over the conventional TTS.

Time Domain Audio Visual Speech Separation

no code implementations 7 Apr 2019 Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu

Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as speech recognition and speech enhancement.

Audio and Speech Processing Sound

Exploring RNN-Transducer for Chinese Speech Recognition

no code implementations 13 Nov 2018 Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie

End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system.

Automatic Speech Recognition

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition

no code implementations 16 Jun 2018 Pengcheng Guo, Hai-Hua Xu, Lei Xie, Eng Siong Chng

In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data.

Speech Recognition

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

no code implementations 10 Jun 2018 Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li

We also find that it is important to have sufficient speech segment pairs to train the deep CNN for effective acoustic word embeddings.

Dynamic Time Warping Word Embeddings

Domain Adversarial Training for Accented Speech Recognition

no code implementations 7 Jun 2018 Sining Sun, Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie

In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem.

Accented Speech Recognition Multi-Task Learning

Training Augmentation with Adversarial Examples for Robust Speech Recognition

no code implementations 7 Jun 2018 Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie

This paper explores the use of adversarial examples in training speech recognition systems to increase robustness of deep neural network acoustic models.

Data Augmentation Robust Speech Recognition

Attention-based End-to-End Models for Small-Footprint Keyword Spotting

2 code implementations 29 Mar 2018 Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie

In this paper, we propose an attention-based end-to-end neural approach for small-footprint keyword spotting (KWS), which aims to simplify the pipelines of building a production-quality KWS system.

Small-Footprint Keyword Spotting

Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model

1 code implementation 27 Mar 2018 Ke Wang, Junbo Zhang, Yujun Wang, Lei Xie

Speaker adaptation aims to estimate a speaker specific acoustic model from a speaker independent one to minimize the mismatch between the training and testing conditions arisen from speaker variabilities.

Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition

1 code implementation 27 Mar 2018 Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie

First, we study the effectiveness of different dereverberation networks (the generator in GAN) and find that LSTM leads a significant improvement as compared with feed-forward DNN and CNN in our dataset.

Robust Speech Recognition Speech Dereverberation

Attention-Based End-to-End Speech Recognition on Voice Search

no code implementations 22 Jul 2017 Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie

Previous attempts have shown that applying attention-based encoder-decoder to Mandarin speech recognition was quite difficult due to the logographic orthography of Mandarin, the large vocabulary and the conditional dependency of the attention model.

L2 Regularization Speech Recognition

Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework

4 code implementations 6 Jul 2017 Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dong-Yan Huang, Haizhou Li

In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN).


A Waveform Representation Framework for High-quality Statistical Parametric Speech Synthesis

no code implementations 6 Oct 2015 Bo Fan, Siu Wa Lee, Xiaohai Tian, Lei Xie, Minghui Dong

State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling.

Speech Synthesis

Bi-objective Optimization for Robust RGB-D Visual Odometry

no code implementations 27 Nov 2014 Tao Han, Chao Xu, Ryan Loxton, Lei Xie

This paper considers a new bi-objective optimization formulation for robust RGB-D visual odometry.

Visual Odometry
