Search Results for author: Xugang Lu

Found 25 papers, 6 papers with code

Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition

no code implementations29 Jul 2022 Peng Shen, Xugang Lu, Hisashi Kawai

For Mandarin end-to-end (E2E) automatic speech recognition (ASR) tasks, pronunciation-based modeling units improve unit sharing during model training compared to character-based units, but they suffer from homophone ambiguity.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1

Partial Coupling of Optimal Transport for Spoken Language Identification

no code implementations31 Mar 2022 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

To reduce domain discrepancy and thereby improve cross-domain spoken language identification (SLID) performance, we previously proposed a joint distribution alignment (JDA) model based on optimal transport (OT) as an unsupervised domain adaptation (UDA) method.

Language Identification · Spoken language identification +1

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

1 code implementation31 Mar 2022 Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Specifically, the contrast of target features is stretched based on perceptual importance, thereby improving the overall SE performance.

Speech Enhancement
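The snippet above describes stretching the contrast of target features according to perceptual importance. A minimal numpy sketch of band-wise contrast stretching on a log-magnitude spectrogram; the per-band exponents here are illustrative placeholders, not the published perceptually derived values:

```python
import numpy as np

def contrast_stretch(log_mag, gamma):
    """Stretch the contrast of a log-magnitude spectrogram.

    log_mag : (freq_bins, frames) non-negative log1p magnitudes.
    gamma   : (freq_bins,) per-band exponents; gamma > 1 stretches
              contrast more in perceptually important bands.
    """
    peak = log_mag.max() + 1e-8
    # Normalise to [0, 1], apply a per-band power law, restore the scale.
    return peak * (log_mag / peak) ** gamma[:, None]

# Toy example: stretch mid-frequency bands harder than the edge bands.
rng = np.random.default_rng(0)
spec = np.log1p(np.abs(rng.normal(size=(4, 10))))
gamma = np.array([1.0, 1.3, 1.3, 1.0])  # illustrative, not published values
stretched = contrast_stretch(spec, gamma)
```

Bands with exponent 1.0 pass through unchanged, while exponent > 1 pushes low-energy bins further down relative to the peak, increasing contrast.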

TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

no code implementations17 Mar 2022 Ruiteng Zhang, Jianguo Wei, Xugang Lu, Wenhuan Lu, Di Jin, Junhai Xu, Lin Zhang, Yantao Ji, Jianwu Dang

Therefore, in most current state-of-the-art network architectures, only a few branches, corresponding to a limited number of temporal scales, can be designed for speaker embeddings.

Speaker Verification

A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement

no code implementations24 Jan 2022 Tassadaq Hussain, Wei-Chien Wang, Mandar Gogate, Kia Dashtipour, Yu Tsao, Xugang Lu, Adeel Ahsan, Amir Hussain

To address this problem, we propose to integrate a novel temporal attentive-pooling (TAP) mechanism into a conventional convolutional recurrent neural network, termed TAP-CRNN.

Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport

1 code implementation NeurIPS 2021 Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao

This paper presents a novel discriminator-constrained optimal transport network (DOTN) that performs unsupervised domain adaptation for speech enhancement (SE), which is an essential regression task in speech processing.

Speech Enhancement · Unsupervised Domain Adaptation

CS-Rep: Making Speaker Verification Networks Embracing Re-parameterization

1 code implementation26 Oct 2021 Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Lin Zhang, Yantao Ji, Junhai Xu, Xugang Lu

Automatic speaker verification (ASV) systems, which determine whether two speech utterances come from the same speaker, mainly focus on verification accuracy while ignoring inference speed.

Speaker Verification

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

3 code implementations8 Apr 2021 Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.

Speech Enhancement
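The snippet above points at the mismatch between the training cost and human auditory perception; the MetricGAN line of work narrows it by training a discriminator as a learned surrogate for the evaluation metric. A schematic numpy sketch of the two objectives, with illustrative toy numbers (MetricGAN+ itself adds further refinements, such as also training the discriminator on noisy speech):

```python
import numpy as np

def metricgan_d_loss(d_pred, metric_score):
    """Discriminator loss: regress the surrogate's output toward the
    true metric score (e.g., normalised PESQ) of the same utterance."""
    return float(np.mean((d_pred - metric_score) ** 2))

def metricgan_g_loss(d_pred_enhanced, target=1.0):
    """Generator loss: push the surrogate's score for enhanced speech
    toward the best attainable metric value (normalised to 1)."""
    return float(np.mean((d_pred_enhanced - target) ** 2))

# Toy numbers: a surrogate that predicts 0.7 for enhanced speech whose
# true normalised metric score is 0.6.
d_loss = metricgan_d_loss(np.array([0.7]), np.array([0.6]))
g_loss = metricgan_g_loss(np.array([0.7]))
```

Because the surrogate is differentiable, the generator can be optimised toward a non-differentiable perceptual metric through it.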

Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification

no code implementations7 Apr 2021 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

However, most discriminative training for SiamNN considers only the distribution of pairwise sample distances, ignoring the additional discriminative information in the joint distribution of samples.

Binary Classification · feature selection +1

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

no code implementations7 Feb 2021 Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Wen-Chin Huang, Xugang Lu, Yu Tsao

Synthesized speech from articulatory movements can have real-world use for patients with vocal cord disorders, situations requiring silent speech, or in high-noise environments.

Coupling a generative model with a discriminative learning framework for speaker verification

no code implementations9 Jan 2021 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

By initializing the two-branch neural network with the generatively learned parameters of the JB model, we train the model on pairwise samples as a binary discrimination task.

Decision Making · feature selection +1
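The snippet above describes discriminatively fine-tuning a generatively initialised Joint Bayesian (JB) model on pairwise samples. A minimal numpy sketch of the standard bilinear JB scoring form with a binary logistic loss on top; the matrices here are illustrative placeholders, not generatively estimated parameters, and sign conventions vary across implementations:

```python
import numpy as np

def jb_score(x1, x2, A, G):
    """Joint Bayesian verification score for an embedding pair.
    A and G would normally be derived from within- and between-speaker
    covariances; here they are illustrative placeholders."""
    return x1 @ A @ x1 + x2 @ A @ x2 + 2.0 * x1 @ G @ x2

def pair_loss(score, same_speaker):
    """Binary cross-entropy on the pairwise score, mimicking the
    binary discrimination task used for fine-tuning."""
    p = 1.0 / (1.0 + np.exp(-score))
    return -np.log(p) if same_speaker else -np.log(1.0 - p)

# Toy pair: identical embeddings should score high for "same speaker".
d = 4
A = -0.1 * np.eye(d)  # illustrative initial values, not trained ones
G = 0.5 * np.eye(d)
x = np.ones(d)
s_same = jb_score(x, x, A, G)
```

With these toy values the same-speaker loss is far smaller than the different-speaker loss, which is the gradient signal the discriminative stage exploits.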

Unsupervised neural adaptation model based on optimal transport for spoken language identification

no code implementations24 Dec 2020 Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

By jointly minimizing the classification loss on the training set and the adaptation loss on both the training and testing sets, the statistical distribution difference between the training and testing domains is reduced.

Language Identification · Spoken language identification
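In OT-based adaptation of this kind, the adaptation loss is typically the transport cost between source- and target-domain embeddings under an optimal coupling. A minimal numpy sketch using entropy-regularised Sinkhorn iterations (a common, differentiable stand-in, not the paper's exact formulation; all data are random placeholders):

```python
import numpy as np

def sinkhorn(cost, reg=1.0, n_iter=200):
    """Entropy-regularised OT plan between two uniform empirical
    distributions, computed with Sinkhorn iterations."""
    n, m = cost.shape
    a, b = np.ones(n) / n, np.ones(m) / m   # uniform marginals
    K = np.exp(-cost / reg)                  # Gibbs kernel
    v = np.ones(m) / m
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]       # transport plan

# Adaptation loss: transport cost between source and target embeddings.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 3))            # source-domain embeddings
tgt = rng.normal(loc=1.0, size=(6, 3))   # shifted target-domain embeddings
cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
plan = sinkhorn(cost)
adapt_loss = float((plan * cost).sum())
```

Minimising `adapt_loss` alongside the classification loss pulls the two embedding distributions together, which is the mechanism the snippet above summarises.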

A Study of Incorporating Articulatory Movement Information in Speech Enhancement

no code implementations3 Nov 2020 Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Xugang Lu, Yu Tsao

Although deep learning algorithms are widely used for improving speech enhancement (SE) performance, the performance remains limited under highly challenging conditions, such as unseen noise or noise signals having low signal-to-noise ratios (SNRs).

Speech Enhancement

Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement

1 code implementation28 Oct 2020 Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Speech enhancement (SE) aims to improve speech quality and intelligibility, both of which depend on smooth transitions between speech segments that may carry linguistic information, e.g., phones and syllables.

Speech Enhancement

Incorporating Broad Phonetic Information for Speech Enhancement

no code implementations13 Aug 2020 Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao

In noisy conditions, knowing speech contents facilitates listeners to more effectively suppress background noise components and to retrieve pure speech signals.

Denoising · Speech Enhancement

WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement

1 code implementation6 Apr 2020 Tsun-An Hsieh, Hsin-Min Wang, Xugang Lu, Yu Tsao

In WaveCRN, the speech locality feature is captured by a convolutional neural network (CNN), while the temporal sequential property of the locality feature is modeled by stacked simple recurrent units (SRU).

Denoising · Speech Denoising +2
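The snippet above describes stacked simple recurrent units (SRUs) modelling the temporal structure of CNN features. A minimal numpy sketch of one SRU layer, following the commonly cited SRU formulation; the key point is that the matrix multiplies carry no recurrent state, so only the cheap element-wise recursion is sequential. Input and hidden sizes are kept equal so the highway connection needs no projection, and all weights are random placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer(x, W, Wf, Wr, bf, br):
    """One SRU layer over a sequence x of shape (T, d)."""
    xt = x @ W                  # candidate states, all timesteps at once
    f = sigmoid(x @ Wf + bf)    # forget gates
    r = sigmoid(x @ Wr + br)    # reset (highway) gates
    c = np.zeros(W.shape[1])
    h = np.empty_like(xt)
    for t in range(x.shape[0]):
        c = f[t] * c + (1.0 - f[t]) * xt[t]             # light recursion
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]  # highway mix
    return h

# Toy run with random placeholder weights.
rng = np.random.default_rng(0)
T, d = 6, 4
x = rng.normal(size=(T, d))
W = 0.1 * rng.normal(size=(d, d))
Wf = 0.1 * rng.normal(size=(d, d))
Wr = 0.1 * rng.normal(size=(d, d))
bf = np.zeros(d)
br = np.zeros(d)
h = sru_layer(x, W, Wf, Wr, bf, br)
```

Because `xt`, `f`, and `r` are computed for all timesteps in one shot, SRUs parallelise far better than LSTMs, which is why WaveCRN-style models favour them for efficiency.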

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders

no code implementations6 Jan 2020 Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao

The DSDT is built based on a prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT.

Denoising · Speech Enhancement

Cross-scale Attention Model for Acoustic Event Classification

no code implementations27 Dec 2019 Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

However, a potential limitation of the network is that the discriminative features from the bottom layers (which can model the short-range dependency) are smoothed out in the final representation.

Classification · General Classification

Incorporating Symbolic Sequential Modeling for Speech Enhancement

no code implementations30 Apr 2019 Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai

In this study, the symbolic sequences for acoustic signals are obtained as discrete representations with a Vector Quantized Variational Autoencoder algorithm.

Language Modelling · Speech Enhancement

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

no code implementations12 Sep 2017 Szu-Wei Fu, Tao-Wei Wang, Yu Tsao, Xugang Lu, Hisashi Kawai

For example, speech intelligibility is mostly evaluated with the short-time objective intelligibility (STOI) measure, while the frame-based minimum mean square error (MMSE) between estimated and clean speech is widely used to optimize the model.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3
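The mismatch the snippet above describes is between the frame-based MMSE training objective and the short-time, correlation-based STOI evaluation measure. A minimal numpy sketch of the frame-based MMSE side of that mismatch; the frame length and toy signals are illustrative (the paper's contribution is optimising the evaluation metric itself end-to-end rather than this loss):

```python
import numpy as np

def frame_mmse(est, clean, frame_len=4):
    """Frame-based MMSE objective: mean squared error averaged over
    fixed-length frames of the estimated and clean waveforms."""
    n = (len(est) // frame_len) * frame_len
    e = est[:n].reshape(-1, frame_len)
    c = clean[:n].reshape(-1, frame_len)
    return float(np.mean((e - c) ** 2))

# Toy waveforms: a clean signal and a lightly perturbed estimate.
rng = np.random.default_rng(0)
clean = rng.normal(size=(16,))
est = clean + 0.1 * rng.normal(size=(16,))
loss = frame_mmse(est, clean)
```

Since STOI-style measures correlate short-time envelopes rather than penalising pointwise error, driving this loss down need not improve intelligibility proportionally.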

Complex spectrogram enhancement by convolutional neural network with multi-metrics learning

no code implementations27 Apr 2017 Szu-Wei Fu, Ting-yao Hu, Yu Tsao, Xugang Lu

This paper aims to address two issues in current speech enhancement methods: 1) the difficulty of phase estimation; 2) the inability of a single objective function to account for multiple metrics simultaneously.

Speech Enhancement

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

no code implementations7 Mar 2017 Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

Because the fully connected layers involved in deep neural networks (DNNs) and convolutional neural networks (CNNs) may not accurately characterize the local information of speech signals, particularly their high-frequency components, we employed fully convolutional layers to model the waveform.

Denoising · Speech Enhancement
