Search Results for author: Yu Tsao

Found 92 papers, 23 papers with code

A Flexible and Extensible Framework for Multiple Answer Modes Question Answering

no code implementations • ROCLING 2021 • Cheng-Chung Fan, Chia-Chih Kuo, Shang-Bao Luo, Pei-Jun Liao, Kuang-Yu Chang, Chiao-Wei Hsu, Meng-Tse Wu, Shih-Hong Tsai, Tzu-Man Wu, Aleksandra Smolka, Chao-Chun Liang, Hsin-Min Wang, Kuan-Yu Chen, Yu Tsao, Keh-Yih Su

Only a few of them adopt several answer-generation modules to provide different mechanisms; however, they either lack an aggregation mechanism to merge the answers from the various modules or are too complicated to be implemented with neural networks.

Question Answering

Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport

1 code implementation • NeurIPS 2021 • Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao

This paper presents a novel discriminator-constrained optimal transport network (DOTN) that performs unsupervised domain adaptation for speech enhancement (SE), which is an essential regression task in speech processing.

Speech Enhancement · Unsupervised Domain Adaptation

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

no code implementations • 10 Nov 2021 • Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers.

Meta-Learning · Speech Enhancement

HASA-Net: A non-intrusive hearing-aid speech assessment network

no code implementations • 10 Nov 2021 • Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao

Without the need for a clean reference, non-intrusive speech assessment methods have attracted great attention for objective evaluations.

Speech Quality

SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

no code implementations • 8 Nov 2021 • Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

In this paper, a novel sign-exponent-only floating-point network (SEOFP-NET) technique is proposed to compress the model size and accelerate the inference time for speech enhancement, a regression task of speech signal processing.

Model Compression · Speech Enhancement
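As a rough sketch of the sign-exponent-only idea, the snippet below keeps only the sign and exponent bits of float32 weights by zeroing the 23 mantissa bits. This is an illustrative reading of the technique, not the paper's exact rounding or inference-time procedure:

```python
import numpy as np

def sign_exponent_only(weights: np.ndarray) -> np.ndarray:
    # View the float32 weights as raw 32-bit integers, zero the 23
    # mantissa bits, and reinterpret as float32, so each weight keeps
    # only its sign (1 bit) and exponent (8 bits).
    bits = weights.astype(np.float32).view(np.uint32)
    return (bits & np.uint32(0xFF800000)).view(np.float32)

w = np.array([0.37, -1.84, 3.2e-3], dtype=np.float32)
print(sign_exponent_only(w))  # each value collapses to a signed power of two
```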

InQSS: a speech intelligibility assessment model using a multi-task learning network

no code implementations • 4 Nov 2021 • Yu-Wen Chen, Yu Tsao

In this study, we propose InQSS, a speech intelligibility assessment model that uses both spectrogram and scattering coefficients as input features.

Multi-Task Learning

Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

no code implementations • 3 Nov 2021 • Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

More specifically, the MOSA-Net is designed to estimate speech quality, intelligibility, and distortion assessment scores based on a test speech signal as input.

Speech Enhancement · Speech Quality

Speech Enhancement Based on CycleGAN with Noise-informed Training

no code implementations • 19 Oct 2021 • Wen-Yuan Ting, Syu-Siang Wang, Hsin-Li Chang, Borching Su, Yu Tsao

Speech enhancement (SE) approaches can be classified into supervised and unsupervised categories.

Speech Enhancement

MetricGAN-U: Unsupervised speech enhancement/dereverberation based only on noisy/reverberated speech

1 code implementation • 12 Oct 2021 • Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training.

Speech Enhancement · Speech Quality

A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming

1 code implementation • 8 Oct 2021 • Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao

In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.

Transfer Learning

Analyzing the Robustness of Unsupervised Speech Recognition

no code implementations • 7 Oct 2021 • Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao

In this work, we further analyze the training robustness of unsupervised ASR on the domain mismatch scenarios in which the domains of unpaired speech and text are different.

Speech Recognition · Unsupervised Speech Recognition

Mutual Information Continuity-constrained Estimator

no code implementations • 29 Sep 2021 • Tsun-An Hsieh, Cheng Yu, Ying Hung, Chung-Ching Lin, Yu Tsao

Accordingly, we propose Mutual Information Continuity-constrained Estimator (MICE).

Density Estimation

Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion

no code implementations • 8 Sep 2021 • Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Voice conversion (VC) is an effective approach to electrolaryngeal (EL) speech enhancement, a task that aims to improve the quality of the artificial voice from an electrolarynx device.

Dynamic Time Warping · Speech Enhancement · +1

A Study on Speech Enhancement Based on Diffusion Probabilistic Model

no code implementations • 25 Jul 2021 • Yen-Ju Lu, Yu Tsao, Shinji Watanabe

Based on this property, we propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals.

Speech Enhancement

SVSNet: An End-to-end Speaker Voice Similarity Assessment Model

no code implementations • 20 Jul 2021 • Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

Neural evaluation metrics derived for numerous speech generation tasks have recently attracted great attention.

Voice Conversion

Intermittent Speech Recovery

no code implementations • 9 Jun 2021 • Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao, Tei-Wei Kuo

This paper presents a deep-learning-based speech recovery system that reconstructs intermittent speech signals from self-powered IoT devices.

Speech Quality

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

no code implementations • 2 Jun 2021 • Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda

First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into normal speech of a reference speaker as an intermediate product. A nonparallel, frame-wise VC model realized with a variational autoencoder then converts the speaker identity of the reference speech back to that of the patient, and is assumed to preserve the enhanced quality.

Voice Conversion

Multimodal Deep Learning Framework for Image Popularity Prediction on Social Media

no code implementations • 18 May 2021 • Fatma S. Abousaleh, Wen-Huang Cheng, Neng-Hao Yu, Yu Tsao

In this study, motivated by multimodal learning, which uses information from various modalities, and the current success of convolutional neural networks (CNNs) in various fields, we propose a deep learning model, called visual-social convolutional neural network (VSCNN), which predicts the popularity of a posted image by incorporating various types of visual and social features into a unified network model.

Image popularity prediction · Multimodal Deep Learning
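To make the multimodal fusion concrete, here is a minimal two-branch sketch in the spirit of VSCNN, assuming a small CNN encodes the image and an MLP encodes the social features before a joint regression head; all layer sizes are illustrative rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    def __init__(self, social_dim=16):
        super().__init__()
        # Visual branch: a tiny CNN that pools the image to a 16-dim vector.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Social branch: an MLP over social-context features.
        self.social = nn.Sequential(nn.Linear(social_dim, 16), nn.ReLU())
        self.head = nn.Linear(32, 1)  # joint popularity regressor

    def forward(self, img, social):
        # Concatenate both modality embeddings and predict one score.
        return self.head(torch.cat([self.visual(img), self.social(social)], dim=1))
```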

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

2 code implementations • 8 Apr 2021 • Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.

Speech Enhancement

The AS-NU System for the M2VoC Challenge

no code implementations • 7 Apr 2021 • Cheng-Hung Hu, Yi-Chiao Wu, Wen-Chin Huang, Yu-Huai Peng, Yu-Wen Chen, Pin-Jui Ku, Tomoki Toda, Yu Tsao, Hsin-Min Wang

The first track focuses on voice cloning with a relatively small set of 100 target utterances, while the second track uses only 5 target utterances.

Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification

no code implementations • 7 Apr 2021 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

However, in most of the discriminative training for SiamNN, only the distribution of pair-wised sample distances is considered, and the additional discriminative information in joint distribution of samples is ignored.

Feature Selection · Speaker Verification

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

no code implementations • 7 Feb 2021 • Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Wen-Chin Huang, Xugang Lu, Yu Tsao

Synthesized speech from articulatory movements can have real-world use for patients with vocal cord disorders, situations requiring silent speech, or in high-noise environments.

Coupling a generative model with a discriminative learning framework for speaker verification

no code implementations • 9 Jan 2021 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

By initializing the two-branch neural network with the generatively learned model parameters of the JB model, we train the model parameters with the pairwise samples as a binary discrimination task.

Decision Making · Feature Selection · +1
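A minimal sketch of the two-branch (Siamese) setup described above, trained with pair-wise binary discrimination; in the paper the branch parameters are initialized from the generatively trained joint Bayesian (JB) model, whereas this toy version is randomly initialized:

```python
import torch
import torch.nn as nn

class SiameseVerifier(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.branch = nn.Linear(dim, dim)      # shared by both inputs
        self.score = nn.Bilinear(dim, dim, 1)  # pairwise similarity head

    def forward(self, x1, x2):
        # Embed both utterance vectors with the shared branch, then
        # score whether they come from the same speaker.
        h1, h2 = self.branch(x1), self.branch(x2)
        return torch.sigmoid(self.score(h1, h2))

# Training treats same/different-speaker pairs as a binary task:
loss_fn = nn.BCELoss()
```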

Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

1 code implementation • 7 Jan 2021 • Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi

In this study, we propose an attention-based MTL (ATM) approach that integrates MTL and the attention-weighting mechanism to simultaneously realize a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI).

Multi-Task Learning · Speaker Identification · +2

Unsupervised neural adaptation model based on optimal transport for spoken language identification

no code implementations • 24 Dec 2020 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

By minimizing the classification loss on the training set together with the adaptation loss on both the training and testing sets, the statistical distribution difference between the training and testing domains is reduced.

Language Identification · Spoken language identification
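The joint objective can be sketched as a classification loss on labeled training data plus a distribution-alignment penalty between training and testing features; here a simple mean-embedding discrepancy stands in for the paper's optimal-transport-based adaptation loss, and the assumption that the model returns (features, logits) is illustrative:

```python
import torch
import torch.nn.functional as F

def adaptation_loss(model, src_x, src_y, tgt_x, lam=0.1):
    # Assumed interface: model(x) -> (features, logits).
    src_feat, src_logits = model(src_x)
    tgt_feat, _ = model(tgt_x)
    # Supervised loss on the labeled training (source) domain.
    cls_loss = F.cross_entropy(src_logits, src_y)
    # Simple stand-in for the OT loss: align mean feature embeddings
    # of the training and testing domains.
    align_loss = (src_feat.mean(0) - tgt_feat.mean(0)).pow(2).sum()
    return cls_loss + lam * align_loss
```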

Domain-adaptive Fall Detection Using Deep Adversarial Training

no code implementations • 20 Dec 2020 • Kai-Chun Liu, Michael Can, Heng-Cheng Kuo, Chia-Yeh Hsieh, Hsiang-Yun Huang, Chia-Tai Chan, Yu Tsao

The proposed DAFD can transfer knowledge from the source domain to the target domain by minimizing the domain discrepancy to avoid mismatch problems.

Domain Adaptation · Transfer Learning

Speech Enhancement with Zero-Shot Model Selection

no code implementations • 17 Dec 2020 • Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Experimental results confirmed that the proposed ZMOS approach can achieve better performance in both seen and unseen noise types compared to the baseline systems and other model selection systems, which indicates the effectiveness of the proposed approach in providing robust SE performance.

Ensemble Learning · Model Selection · +2

ECG Signal Super-resolution by Considering Reconstruction and Cardiac Arrhythmias Classification Loss

no code implementations • 7 Dec 2020 • Tsai-Min Chen, Yuan-Hong Tsai, Huan-Hsin Tseng, Jhih-Yu Chen, Chih-Han Huang, Guo-Yuan Li, Chun-Yen Shen, Yu Tsao

In this study, we proposed a deep-learning-based ECG signal super-resolution framework (termed ESRNet) to recover compressed ECG signals by considering the joint effect of signal reconstruction and CA classification accuracies.

Electrocardiography (ECG) · General Classification · +1

Speech enhancement guided by contextual articulatory information

no code implementations • 15 Nov 2020 • Yen-Ju Lu, Chia-Yu Chang, Yu Tsao, Jeih-weih Hung

Previous studies have confirmed the effectiveness of leveraging articulatory information to attain improved speech enhancement (SE) performance.

automatic-speech-recognition · Speech Enhancement · +1

STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

1 code implementation • 9 Nov 2020 • Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang

To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net.

A Study of Incorporating Articulatory Movement Information in Speech Enhancement

no code implementations • 3 Nov 2020 • Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Xugang Lu, Yu Tsao

Although deep learning algorithms are widely used for improving speech enhancement (SE) performance, the performance remains limited under highly challenging conditions, such as unseen noise or noise signals having low signal-to-noise ratios (SNRs).

Speech Enhancement · Speech Quality

Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement

1 code implementation • 28 Oct 2020 • Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e. g. phones and syllables.

Speech Enhancement · Speech Quality

The Academia Sinica Systems of Voice Conversion for VCC2020

no code implementations • 6 Oct 2020 • Yu-Huai Peng, Cheng-Hung Hu, Alexander Kang, Hung-Shin Lee, Pin-Yuan Chen, Yu Tsao, Hsin-Min Wang

This paper describes the Academia Sinica systems for the two tasks of Voice Conversion Challenge 2020, namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2).

Voice Conversion

Improved Lite Audio-Visual Speech Enhancement

no code implementations • 30 Aug 2020 • Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao

Experimental results confirm that compared to conventional AVSE systems, iLAVSE can effectively overcome the aforementioned three practical issues and can improve enhancement performance.

Speech Enhancement

CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application

1 code implementation • 21 Aug 2020 • Yu-Wen Chen, Kuo-Hsuan Hung, You-Jin Li, Alexander Chao-Fu Kang, Ya-Hsin Lai, Kai-Chun Liu, Szu-Wei Fu, Syu-Siang Wang, Yu Tsao

In this study, we present a deep learning-based speech signal-processing mobile application, called CITISEN, which can perform three functions: speech enhancement (SE), model adaptation (MA), and acoustic scene conversion (ASC).

Speech Enhancement · Speech Quality · +1

Incorporating Broad Phonetic Information for Speech Enhancement

no code implementations • 13 Aug 2020 • Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao

In noisy conditions, knowing the speech content helps listeners suppress background noise components more effectively and retrieve the pure speech signal.

Denoising · Speech Enhancement · +1

Using Deep Learning and Explainable Artificial Intelligence in Patients' Choices of Hospital Levels

no code implementations • 24 Jun 2020 • Lichin Chen, Yu Tsao, Ji-Tian Sheu

This study also used explainable artificial intelligence methods to interpret the contribution of features for the general public and individuals.

Explainable artificial intelligence

SERIL: Noise Adaptive Speech Enhancement using Regularization-based Incremental Learning

1 code implementation • 24 May 2020 • Chi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin, Hsin-Min Wang, Yu Tsao

The results verify that the SERIL model can effectively adjust itself to new noise environments while overcoming the catastrophic forgetting issue.

Fine-tuning · Incremental Learning · +1

Lite Audio-Visual Speech Enhancement

1 code implementation • 24 May 2020 • Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang

Previous studies have confirmed the effectiveness of incorporating visual information into speech enhancement (SE) systems.

Data Compression · Denoising · +1

MIMO Speech Compression and Enhancement Based on Convolutional Denoising Autoencoder

no code implementations • 24 May 2020 • You-Jin Li, Syu-Siang Wang, Yu Tsao, Borching Su

For speech-related applications in IoT environments, identifying effective methods to handle interference noises and compress the amount of data in transmissions is essential to achieve high-quality services.

Denoising · Speech Quality

WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement

1 code implementation • 6 Apr 2020 • Tsun-An Hsieh, Hsin-Min Wang, Xugang Lu, Yu Tsao

In WaveCRN, the speech locality feature is captured by a convolutional neural network (CNN), while the temporal sequential property of the locality feature is modeled by stacked simple recurrent units (SRU).

Denoising · Speech Denoising · +2
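A condensed sketch of this CNN-plus-recurrent design, with a bidirectional GRU standing in for the stacked simple recurrent units (SRUs) and illustrative layer sizes:

```python
import torch
import torch.nn as nn

class WaveCRNSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # 1-D convolution captures the local (locality) features of the waveform.
        self.encoder = nn.Conv1d(1, channels, kernel_size=96, stride=48)
        # Bidirectional GRU stands in for the stacked SRUs that model
        # the temporal sequence of the locality features.
        self.rnn = nn.GRU(channels, channels, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.mask = nn.Linear(2 * channels, channels)
        self.decoder = nn.ConvTranspose1d(channels, 1, kernel_size=96, stride=48)

    def forward(self, wav):                    # wav: (batch, 1, samples)
        feat = self.encoder(wav)               # (batch, C, frames)
        rnn_out, _ = self.rnn(feat.transpose(1, 2))
        mask = torch.sigmoid(self.mask(rnn_out)).transpose(1, 2)
        return self.decoder(feat * mask)       # enhanced waveform
```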

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

1 code implementation • Interspeech 2020 • Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi

The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments.

Audio and Speech Processing · Sound

Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion

1 code implementation • 22 Jan 2020 • Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

In this paper, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech.

Voice Conversion

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders

no code implementations • 6 Jan 2020 • Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao

The DSDT is built based on a prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT.

Denoising · Speech Enhancement

Cross-scale Attention Model for Acoustic Event Classification

no code implementations • 27 Dec 2019 • Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

However, a potential limitation of the network is that the discriminative features from the bottom layers (which can model the short-range dependency) are smoothed out in the final representation.

Classification · General Classification

MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing

no code implementations • 9 Dec 2019 • Chao-I Tuan, Yuan-Kuei Wu, Hung-Yi Lee, Yu Tsao

Our experimental results first confirmed the robustness of our MiTAS on two types of perturbations in mixed audio.

Speech Separation

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

no code implementations • 22 Nov 2019 • Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung

Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE).

Ensemble Learning · Speech Enhancement

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement

no code implementations • 26 Sep 2019 • Natalie Yu-Hsien Wang, Hsiao-Lan Sharon Wang, Tao-Wei Wang, Szu-Wei Fu, Xugang Lu, Yu Tsao, Hsin-Min Wang

Recently, a time-domain speech enhancement algorithm based on fully convolutional neural networks (FCNs) with a short-time objective intelligibility (STOI)-based objective function (termed FCN(S) for short) has received increasing attention due to its simple structure and effectiveness in restoring clean speech signals from noisy counterparts.

Denoising · Speech Enhancement · +1 · Sound · Audio and Speech Processing

Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques

no code implementations • 31 May 2019 • Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao

In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing weights with fewer cluster centroids.

Denoising · Quantization · +1
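The centroid-based PQ step can be sketched with k-means weight sharing: cluster the weights and replace each one with its nearest centroid, so only a small centroid table plus per-weight cluster indices need to be stored. The cluster count below is illustrative, not the paper's setting:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_weights(weights: np.ndarray, n_clusters: int = 32) -> np.ndarray:
    # Cluster all weight values, then substitute each weight with the
    # centroid of its cluster; storing 32 centroids + indices is far
    # smaller than storing full-precision weights.
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(flat)
    return km.cluster_centers_[km.labels_].reshape(weights.shape)
```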

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

5 code implementations • 13 May 2019 • Szu-Wei Fu, Chien-Feng Liao, Yu Tsao, Shou-De Lin

Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores.

Speech Enhancement
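The core MetricGAN training signal can be sketched as follows: a discriminator is trained to regress the (non-differentiable) normalized score of a black-box metric such as PESQ for enhanced speech, and the generator is trained to push the discriminator's prediction toward the maximum score. This is a hedged paraphrase of the scheme; the `disc` interface and score normalization here are assumptions:

```python
import torch

def metricgan_losses(disc, clean, enhanced, metric_score):
    # D learns to predict the metric score (normalized to [0, 1]) of
    # the enhanced speech, conditioned on the clean reference, and to
    # assign the clean signal the maximum score 1.
    d_loss = (disc(enhanced.detach(), clean) - metric_score).pow(2).mean() \
           + (disc(clean, clean) - 1.0).pow(2).mean()
    # G is rewarded when D believes the enhanced speech scores 1,
    # so the metric itself (via D) guides enhancement.
    g_loss = (disc(enhanced, clean) - 1.0).pow(2).mean()
    return d_loss, g_loss
```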

Learning with Learned Loss Function: Speech Enhancement with Quality-Net to Improve Perceptual Evaluation of Speech Quality

1 code implementation • 6 May 2019 • Szu-Wei Fu, Chien-Feng Liao, Yu Tsao

Utilizing a human-perception-related objective function to train a speech enhancement model has become a popular topic recently.

Speech Enhancement · Speech Quality

Incorporating Symbolic Sequential Modeling for Speech Enhancement

no code implementations • 30 Apr 2019 • Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai

In this study, the symbolic sequences for acoustic signals are obtained as discrete representations with a Vector Quantized Variational Autoencoder algorithm.

Language Modelling · Speech Enhancement · +1
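The discrete-symbol step of a VQ-VAE can be sketched as a nearest-codebook lookup with a straight-through gradient, which is how continuous acoustic features become symbolic sequences; shapes here are illustrative:

```python
import torch

def vector_quantize(z, codebook):
    # z: (batch, time, dim) continuous frame embeddings;
    # codebook: (num_codes, dim) learned code vectors.
    dist = torch.cdist(z, codebook.unsqueeze(0).expand(z.size(0), -1, -1))
    idx = dist.argmin(-1)               # discrete symbol sequence
    z_q = codebook[idx]                 # quantized embeddings
    # Straight-through estimator: forward pass uses z_q, but gradients
    # flow back to z as if quantization were the identity.
    return z + (z_q - z).detach(), idx
```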

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

5 code implementations • 17 Apr 2019 • Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.

Voice Conversion

Boundary-Preserved Deep Denoising of the Stochastic Resonance Enhanced Multiphoton Images

no code implementations • 12 Apr 2019 • Sheng-Yong Niu, Lun-Zhang Guo, Yue Li, Tzung-Dau Wang, Yu Tsao, Tzu-Ming Liu

With the rapid growth of high-speed and deep-tissue imaging in biomedical research, it is urgent to find a robust and effective denoising method that retains morphological features for further texture analysis and segmentation.

Denoising · Texture Classification

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

no code implementations • 27 Nov 2018 • Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang

Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation.

Voice Conversion

Robustness against the channel effect in pathological voice detection

no code implementations • 26 Nov 2018 • Yi-Te Hsu, Zining Zhu, Chi-Te Wang, Shih-Hau Fang, Frank Rudzicz, Yu Tsao

In this study, we propose a detection system for pathological voice, which is robust against the channel effect.

Unsupervised Domain Adaptation

Speech Enhancement Based on Reducing the Detail Portion of Speech Spectrograms in Modulation Domain via Discrete Wavelet Transform

1 code implementation • 8 Nov 2018 • Shih-kuang Lee, Syu-Siang Wang, Yu Tsao, Jeih-weih Hung

The presented DWT-based SE method with various scaling factors for the detail part is evaluated with a subset of Aurora-2 database, and the PESQ metric is used to indicate the quality of processed speech signals.

Speech Enhancement · Speech Quality
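The detail-scaling operation can be sketched with PyWavelets: decompose a modulation-domain sequence into approximation and detail parts, shrink the detail part, and reconstruct. The wavelet choice and scaling factor below are illustrative, not the paper's exact settings:

```python
import numpy as np
import pywt

def scale_detail(sequence: np.ndarray, factor: float = 0.5) -> np.ndarray:
    # One-level DWT splits the sequence into a smooth approximation
    # part and a detail part; attenuating the detail part before the
    # inverse transform reduces the "detail portion" of the spectrogram.
    approx, detail = pywt.dwt(sequence, "db4")
    return pywt.idwt(approx, factor * detail, "db4")
```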

Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech

1 code implementation • 30 Oct 2018 • Li-Wei Chen, Hung-Yi Lee, Yu Tsao

This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed.

Voice Conversion

Voice Conversion Based on Cross-Domain Features Using Variational Auto Encoders

1 code implementation • 29 Aug 2018 • Wen-Chin Huang, Hsin-Te Hwang, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang

An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure of speech in an unsupervised manner.

Voice Conversion

A study on speech enhancement using exponent-only floating point quantized neural network (EOFP-QNN)

no code implementations • 17 Aug 2018 • Yi-Te Hsu, Yu-Chen Lin, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

We evaluated the proposed EOFP quantization technique on two types of neural networks, namely, bidirectional long short-term memory (BLSTM) and fully convolutional neural network (FCN), on a speech enhancement task.

Quantization · Speech Enhancement

Noise Adaptive Speech Enhancement using Domain Adversarial Training

1 code implementation • 19 Jul 2018 • Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang

The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model.

Sound · Audio and Speech Processing

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

no code implementations • 12 Sep 2017 • Szu-Wei Fu, Tao-Wei Wang, Yu Tsao, Xugang Lu, Hisashi Kawai

For example, in measuring speech intelligibility, most evaluation metrics are based on a short-time objective intelligibility (STOI) measure, while the frame-based minimum mean square error (MMSE) between the estimated and clean speech is widely used in optimizing the model.

automatic-speech-recognition · Speech Enhancement · +1
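The mismatch can be made concrete by contrasting the usual frame-based MMSE training loss with an utterance-level, correlation-style objective in the spirit of STOI (real STOI correlates short-time band envelopes; the version below is a simplified stand-in):

```python
import torch

def frame_mmse(est, clean):
    # Conventional objective: mean square error between estimated
    # and clean frames, not what intelligibility metrics measure.
    return (est - clean).pow(2).mean()

def correlation_objective(est, clean, eps=1e-8):
    # STOI-style stand-in: maximize the correlation between the whole
    # estimated and clean utterances (minimize its negative).
    est = est - est.mean()
    clean = clean - clean.mean()
    corr = (est * clean).sum() / (est.norm() * clean.norm() + eps)
    return -corr
```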

Audio-Visual Speech Enhancement based on Multimodal Deep Convolutional Neural Network

no code implementations • 1 Sep 2017 • Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

In the proposed AVDCNN SE model, audio and visual data are first processed using individual CNNs, and then, fused into a joint network to generate enhanced speech at the output layer.

Speech Enhancement

Complex spectrogram enhancement by convolutional neural network with multi-metrics learning

no code implementations • 27 Apr 2017 • Szu-Wei Fu, Ting-yao Hu, Yu Tsao, Xugang Lu

This paper aims to address two issues existing in the current speech enhancement methods: 1) the difficulty of phase estimations; 2) a single objective function cannot consider multiple metrics simultaneously.

Speech Enhancement

Voice Conversion from Unaligned Corpora using Variational Autoencoding Wasserstein Generative Adversarial Networks

1 code implementation • 4 Apr 2017 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios.

Voice Conversion

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

no code implementations • 30 Mar 2017 • Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang

More precisely, the proposed AVDCNN model is structured as an audio-visual encoder-decoder network, in which audio and visual data are first processed using individual CNNs, and then fused into a joint network to generate enhanced speech (the primary task) and reconstructed images (the secondary task) at the output layer.

Multi-Task Learning · Speech Enhancement

Raw Waveform-based Speech Enhancement by Fully Convolutional Networks

no code implementations • 7 Mar 2017 • Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

Because the fully connected layers involved in deep neural networks (DNNs) and convolutional neural networks (CNNs) may not accurately characterize the local information of speech signals, particularly their high-frequency components, we employed fully convolutional layers to model the waveform.

Denoising · Speech Enhancement · +1

Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

3 code implementations • 13 Oct 2016 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora.

Voice Conversion

Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network

no code implementations • 13 Oct 2016 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

In this paper, we propose a dictionary update method for Nonnegative Matrix Factorization (NMF) with high dimensional data in a spectral conversion (SC) task.

Speech Enhancement · Speech Synthesis · +1
