no code implementations • ROCLING 2022 • Shang-Bao Luo, Cheng-Chung Fan, Kuan-Yu Chen, Yu Tsao, Hsin-Min Wang, Keh-Yih Su
This paper also provides a baseline system and shows its performance on this dataset.
no code implementations • ROCLING 2021 • Cheng-Chung Fan, Chia-Chih Kuo, Shang-Bao Luo, Pei-Jun Liao, Kuang-Yu Chang, Chiao-Wei Hsu, Meng-Tse Wu, Shih-Hong Tsai, Tzu-Man Wu, Aleksandra Smolka, Chao-Chun Liang, Hsin-Min Wang, Kuan-Yu Chen, Yu Tsao, Keh-Yih Su
Only a few of them adopt multiple answer-generation modules to provide different mechanisms; however, they either lack an aggregation mechanism to merge the answers from the various modules or are too complicated to be implemented with neural networks.
no code implementations • 16 Mar 2023 • Li-Chin Chen, Jung-Nien Lai, Hung-En Lin, Hsien-Te Chen, Kuo-Hsuan Hung, Yu Tsao
Low back pain (LBP) and sciatica may require surgical therapy when accompanied by severe pain.
no code implementations • 13 Mar 2023 • Li-Chin Chen, Kuo-Hsuan Hung, Yi-Ju Tseng, Hsin-Yao Wang, Tse-Min Lu, Wei-Chieh Huang, Yu Tsao
In this study, we leveraged self-supervised learning (SSL) and transfer learning to overcome the above-mentioned barriers, transferring patient progress trends in cardiovascular laboratory parameters from prevalent cases to the detection of rare or specific cardiovascular events.
no code implementations • 7 Mar 2023 • Tin-Han Chi, Kai-Chun Liu, Chia-Yeh Hsieh, Yu Tsao, Chia-Tai Chan
The experimental results show that PreFallKD can boost the student model during the testing phase, achieving a reliable F1-score (92.66%) and lead time (551.3 ms).
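PreFallKD's teacher-student design rests on knowledge distillation. Below is a minimal PyTorch sketch of the standard temperature-scaled distillation loss; the temperature, weighting, and exact loss terms are generic assumptions, not PreFallKD's published recipe.

```python
# Standard teacher-student knowledge-distillation loss (sketch, not PreFallKD's exact loss).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: student matches the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```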
no code implementations • 3 Feb 2023 • Huan-Hsin Tseng, Hsin-Yi Lin, Kuo-Hsuan Hung, Yu Tsao
The method shows an increase in efficiency and accuracy for domain adaptation.
1 code implementation • 11 Dec 2022 • Yu-Wen Chen, Hsin-Min Wang, Yu Tsao
We converted the script into a speech corpus using two text-to-speech systems.
Automatic Speech Recognition (ASR)
no code implementations • 11 Nov 2022 • Hsin-Yi Lin, Huan-Hsin Tseng, Yu Tsao
It has been shown recently that deep learning based models are effective for speech quality prediction and can outperform traditional metrics from various perspectives.
1 code implementation • Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2022 • Ammarah Hashmi, Sahibzada Adil Shahzad, Wasim Ahmad, Chia Wen Lin, Yu Tsao, Hsin-Min Wang
The recent rapid revolution in Artificial Intelligence (AI) technology has enabled the creation of hyper-realistic deepfakes, and detecting deepfake videos (also known as AI-synthesized videos) has become a critical task.
Ranked #1 on Multimodal Forgery Detection on FakeAVCeleb (using extra training data)
no code implementations • APSIPA ASC 2022 • Sahibzada Adil Shahzad, Ammarah Hashmi, Sarwar Khan, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang
Deepfake technology has advanced considerably, but it is a double-edged sword for the community.
Ranked #1 on DeepFake Detection on FakeAVCeleb
1 code implementation • 2 Nov 2022 • Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.
1 code implementation • 1 Nov 2022 • Chan-Jan Hsu, Ho-Lam Chung, Hung-Yi Lee, Yu Tsao
In spoken language understanding (SLU), a natural solution is concatenating pre-trained speech models (e.g., HuBERT) and pre-trained language models (PLMs, e.g., T5).
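The concatenation described above can be sketched in a few lines with Hugging Face Transformers: HuBERT hidden states are passed through a linear bridge into T5's embedding space. The checkpoint names and the single-layer bridge are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: cascade a pre-trained speech model into a pre-trained language model.
import torch
from transformers import AutoTokenizer, HubertModel, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("t5-small")
speech_encoder = HubertModel.from_pretrained("facebook/hubert-base-ls960")
plm = T5ForConditionalGeneration.from_pretrained("t5-small")
# Linear bridge from HuBERT's hidden size to T5's embedding dimension.
bridge = torch.nn.Linear(speech_encoder.config.hidden_size, plm.config.d_model)

waveform = torch.randn(1, 16000)                       # 1 s of 16 kHz audio (dummy)
speech_states = speech_encoder(waveform).last_hidden_state
labels = tok("intent: set_alarm", return_tensors="pt").input_ids  # toy SLU target
out = plm(inputs_embeds=bridge(speech_states), labels=labels)
print(out.loss)                                        # end-to-end trainable
```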
no code implementations • 31 Oct 2022 • I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou
In summary, our results confirm the effectiveness of our proposed model for the AVSS task with proper fine-tuning strategies, demonstrating that multi-modal self-supervised embeddings obtained from AV-HUBERT can be generalized to audio-visual regression tasks.
Automatic Speech Recognition (ASR)
1 code implementation • 27 Oct 2022 • Fan-Lin Wang, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
In this study, inheriting the use of our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation.
no code implementations • 27 Oct 2022 • Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
The lack of clean speech is a practical challenge to the development of speech enhancement systems: neural network models must then be trained in an unsupervised manner, and an inevitable mismatch arises between the training criterion and the evaluation metric.
1 code implementation • 24 Oct 2022 • Kuan-Chen Wang, Kai-Chun Liu, Sheng-Yu Peng, Yu Tsao
Electrocardiogram (ECG) artifact contamination often occurs in surface electromyography (sEMG) applications when the measured muscles are in proximity to the heart.
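For context, a conventional baseline for this contamination problem is high-pass filtering, since ECG energy concentrates below roughly 30-40 Hz; the sketch below shows that reference approach (not the paper's proposed method) with SciPy.

```python
# Conventional ECG-removal baseline for sEMG: 4th-order Butterworth high-pass at 40 Hz.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 1000                                          # assumed sEMG sampling rate (Hz)
sos = butter(4, 40, btype="highpass", fs=fs, output="sos")
semg = np.random.randn(10 * fs)                    # stand-in contaminated sEMG
semg_clean = sosfiltfilt(sos, semg)                # ECG band attenuated (zero-phase)
```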
no code implementations • 21 Sep 2022 • Yin-Ping Cho, Yu Tsao, Hsin-Min Wang, Yi-Wen Liu
Singing voice synthesis (SVS) is the computer production of a human-like singing voice from given musical scores.
1 code implementation • 19 Jul 2022 • Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, Yanmin Qian, Shinji Watanabe
To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.
Automatic Speech Recognition (ASR)
no code implementations • 18 Jun 2022 • Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin, Chu-Song Chen, Hsin-Min Wang, Yu Tsao
NASTAR uses a feedback mechanism to simulate adaptive training data via a noise extractor and a retrieval model.
no code implementations • 16 Jun 2022 • Li-Chin Chen, Po-Hsun Chen, Richard Tzong-Han Tsai, Yu Tsao
Further, the addition of noisy speech signals is observed to improve quality and intelligibility.
no code implementations • ACL 2022 • Chan-Jan Hsu, Hung-Yi Lee, Yu Tsao
Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks.
1 code implementation • 9 Apr 2022 • Shih-kuang Lee, Yu Tsao, Hsin-Min Wang
This study investigated the cepstrogram properties and demonstrated their effectiveness as powerful countermeasures against replay attacks.
no code implementations • 7 Apr 2022 • Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention.
1 code implementation • 7 Apr 2022 • Kuo-Hsuan Hung, Szu-Wei Fu, Huan-Hsin Tseng, Hsin-Tien Chiang, Yu Tsao, Chii-Wann Lin
We further study the relationship between the noise robustness of SSL representation via clean-noisy distance (CN distance) and the layer importance for SE.
Ranked #7 on Speech Enhancement on VoiceBank + DEMAND
no code implementations • 7 Apr 2022 • Ryandhimas E. Zezario, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net) for predicting the subjective intelligibility scores of HA users.
no code implementations • 1 Apr 2022 • Chiang-Lin Tai, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Children's speech recognition is indispensable but challenging owing to the diversity of children's speech.
1 code implementation • 31 Mar 2022 • Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao
Specifically, the contrast of target features is stretched based on perceptual importance, thereby improving the overall SE performance.
Ranked #4 on Speech Enhancement on VoiceBank + DEMAND
no code implementations • 31 Mar 2022 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
To reduce the domain discrepancy and thereby improve the performance of a cross-domain spoken language identification (SLID) system, we have proposed a joint distribution alignment (JDA) model based on optimal transport (OT) as an unsupervised domain adaptation (UDA) method.
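A hedged sketch of the OT building block, using the POT library's entropic Sinkhorn solver to align source- and target-domain embedding distributions; the paper's JDA model additionally couples label and feature distributions, which this plain feature-level alignment does not capture.

```python
# Feature-distribution alignment with entropic optimal transport (POT library).
import numpy as np
import ot  # pip install pot

src = np.random.randn(64, 128)           # source-domain embeddings (e.g., training)
tgt = np.random.randn(80, 128)           # target-domain embeddings (e.g., testing)

a = np.full(len(src), 1.0 / len(src))    # uniform source weights
b = np.full(len(tgt), 1.0 / len(tgt))    # uniform target weights
M = ot.dist(src, tgt)                    # pairwise squared-Euclidean cost matrix
plan = ot.sinkhorn(a, b, M, reg=0.1)     # entropic-regularized transport plan
ot_loss = float(np.sum(plan * M))        # alignment loss to minimize
```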
1 code implementation • 30 Mar 2022 • Fan-Lin Wang, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
However, domain mismatch between training/test situations due to factors such as speaker, content, channel, and environment remains a severe problem for speech separation.
no code implementations • 28 Mar 2022 • Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, Hsin-Min Wang
Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution of phone events.
1 code implementation • 25 Mar 2022 • Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang
In this paper, a noise-aware training framework based on two cascaded neural structures is proposed to jointly optimize speech enhancement and speech recognition.
Automatic Speech Recognition (ASR)
no code implementations • 22 Feb 2022 • Syu-Siang Wang, Chi-Te Wang, Chih-Chung Lai, Yu Tsao, Shih-Hau Fang
The experiments were conducted on a large-scale database, wherein 1,045 continuous speech samples were collected by the speech clinic of a hospital from 2012 to 2019.
no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Yu Tsao, Pin-Yu Chen
Our experiments on intent classification show that the proposed BERT-QTC model attains competitive results on the Snips and ATIS spoken language datasets.
no code implementations • 14 Feb 2022 • Haibin Wu, Heng-Cheng Kuo, Naijun Zheng, Kuo-Hsuan Hung, Hung-Yi Lee, Yu Tsao, Hsin-Min Wang, Helen Meng
ADD 2022 is also the first challenge to propose the partially fake audio detection task.
no code implementations • 14 Feb 2022 • Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao
Multimodal learning has been proven to be an effective method to improve speech enhancement (SE) performance, especially in challenging situations such as low signal-to-noise ratios, speech noise, or unseen noise types.
no code implementations • 11 Feb 2022 • Tassadaq Hussain, Muhammad Diyan, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Yu Tsao, Amir Hussain
Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are often trained to minimise the feature distance between noise-free speech and enhanced speech signals.
3 code implementations • 10 Feb 2022 • Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao
Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs.
no code implementations • 8 Feb 2022 • Tassadaq Hussain, Muhammad Diyan, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Yu Tsao, Amir Hussain
Current deep learning (DL) based approaches to speech intelligibility enhancement in noisy environments are generally trained to minimise the distance between clean and enhanced speech features.
no code implementations • 24 Jan 2022 • Tassadaq Hussain, Wei-Chien Wang, Mandar Gogate, Kia Dashtipour, Yu Tsao, Xugang Lu, Adeel Ahsan, Amir Hussain
To address this problem, we propose to integrate a novel temporal attentive-pooling (TAP) mechanism into a conventional convolutional recurrent neural network, termed TAP-CRNN.
no code implementations • 7 Dec 2021 • Li-Chin Chen, Ji-Tian Sheu, Yuh-Jue Chuang, Yu Tsao
This study proposes a deep neural network approach to model patients' complex choices of travel distance to access care, an important indicator for policymaking in allocating resources.
no code implementations • 5 Dec 2021 • Heng-Cheng Kuo, Yu-Peng Hsieh, Huan-Hsin Tseng, Chi-Tei Wang, Shih-Hau Fang, Yu Tsao
Conclusion: By deploying factorized convolutional neural networks and domain adversarial training, domain-invariant features can be derived for voice disorder classification with limited resources.
no code implementations • 26 Nov 2021 • Ting-Yang Lu, Kai-Chun Liu, Chia-Yeh Hsieh, Chih-Ya Chang, Yu Tsao, Chia-Tai Chan
Moreover, the features of subtasks provided subtle information related to clinical conditions that is not revealed by the features of a complete task, especially subtasks 1 and 2 of each task.
1 code implementation • NeurIPS 2021 • Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao
This paper presents a novel discriminator-constrained optimal transport network (DOTN) that performs unsupervised domain adaptation for speech enhancement (SE), which is an essential regression task in speech processing.
no code implementations • 10 Nov 2021 • Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao
Without the need for a clean reference, non-intrusive speech assessment methods have attracted great attention for objective evaluations.
no code implementations • 10 Nov 2021 • Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli
Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers.
no code implementations • 8 Nov 2021 • Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo
In this paper, a novel sign-exponent-only floating-point network (SEOFP-NET) technique is proposed to compress the model size and accelerate the inference time for speech enhancement, a regression task of speech signal processing.
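The sign-exponent-only idea can be illustrated by zeroing the 23 mantissa bits of IEEE-754 float32 weights, as in the NumPy sketch below; the actual SEOFP-NET quantizer and its rounding rules may differ.

```python
# Keep only sign (1 bit) and exponent (8 bits) of each float32 weight.
import numpy as np

def seofp_quantize(weights: np.ndarray) -> np.ndarray:
    bits = weights.astype(np.float32).view(np.uint32)
    bits &= np.uint32(0xFF800000)        # zero the 23 mantissa bits
    return bits.view(np.float32)

w = np.array([0.7431, -0.0952, 1.5], dtype=np.float32)
print(seofp_quantize(w))                 # -> [ 0.5   -0.0625  1.    ]
```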
1 code implementation • 4 Nov 2021 • Yu-Wen Chen, Yu Tsao
Speech intelligibility and quality assessment models are essential tools for researchers to evaluate and improve speech processing models.
1 code implementation • 3 Nov 2021 • Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously.
no code implementations • 19 Oct 2021 • Wen-Yuan Ting, Syu-Siang Wang, Hsin-Li Chang, Borching Su, Yu Tsao
Herein, we investigate a potential limitation of the clean-to-noisy conversion part and propose a novel noise-informed training (NIT) approach to improve the performance of the original CycleGAN SE system.
no code implementations • 19 Oct 2021 • Yun-Ju Chan, Chiang-Jen Peng, Syu-Siang Wang, Hsin-Min Wang, Yu Tsao, Tai-Shih Chi
Numerous voice conversion (VC) techniques have been proposed for the conversion of voices among different speakers.
1 code implementation • 12 Oct 2021 • Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao
Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training.
no code implementations • 9 Oct 2021 • Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-Yi Lee, Shinji Watanabe
We select several pretrained speech representations and present the experimental results on various open-source and publicly available corpora for E2E-ASR.
Automatic Speech Recognition (ASR)
1 code implementation • 8 Oct 2021 • Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao
In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.
no code implementations • 7 Oct 2021 • Guan-Ting Lin, Chan-Jan Hsu, Da-Rong Liu, Hung-Yi Lee, Yu Tsao
In this work, we further analyze the training robustness of unsupervised ASR on the domain mismatch scenarios in which the domains of unpaired speech and text are different.
no code implementations • 29 Sep 2021 • Tsun-An Hsieh, Cheng Yu, Ying Hung, Chung-Ching Lin, Yu Tsao
Accordingly, we propose Mutual Information Continuity-constrained Estimator (MICE).
no code implementations • 8 Sep 2021 • Yi-Syuan Liou, Wen-Chin Huang, Ming-Chi Yen, Shu-Wei Tsai, Yu-Huai Peng, Tomoki Toda, Yu Tsao, Hsin-Min Wang
Voice conversion (VC) is an effective approach to electrolaryngeal (EL) speech enhancement, a task that aims to improve the quality of the artificial voice from an electrolarynx device.
1 code implementation • 25 Jul 2021 • Yen-Ju Lu, Yu Tsao, Shinji Watanabe
Based on this property, we propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals.
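DiffuSE builds on the diffusion probabilistic framework, whose forward process corrupts clean speech with Gaussian noise according to a fixed schedule; the sketch below shows that forward sampling step with an illustrative schedule, not DiffuSE's trained reverse process.

```python
# Forward diffusion q(x_t | x_0): progressively corrupt clean speech with noise.
import torch

T = 50
betas = torch.linspace(1e-4, 0.05, T)          # illustrative noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I); return noise as the training target."""
    noise = torch.randn_like(x0)
    abar = alphas_bar[t]
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise, noise

x0 = torch.randn(1, 16000)                      # stand-in clean waveform
x_t, eps = q_sample(x0, t=25)                   # noisy sample and its target noise
```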
no code implementations • 20 Jul 2021 • Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang
Neural evaluation metrics derived for numerous speech generation tasks have recently attracted great attention.
no code implementations • 10 Jun 2021 • Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda
Nowadays, neural vocoders can generate very high-fidelity speech when abundant training data are available.
no code implementations • 9 Jun 2021 • Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao, Tei-Wei Kuo
The incompleteness of speech inputs severely degrades the performance of all the related speech signal processing applications.
no code implementations • 2 Jun 2021 • Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu Tsao, Hsin-Min Wang, Tomoki Toda
First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into the normal speech of a reference speaker as an intermediate product. A nonparallel, frame-wise VC model realized with a variational autoencoder then converts the speaker identity of the reference speech back to that of the patient, and is assumed to be capable of preserving the enhanced quality.
no code implementations • 18 May 2021 • Fatma S. Abousaleh, Wen-Huang Cheng, Neng-Hao Yu, Yu Tsao
In this study, motivated by multimodal learning, which uses information from various modalities, and the current success of convolutional neural networks (CNNs) in various fields, we propose a deep learning model, called visual-social convolutional neural network (VSCNN), which predicts the popularity of a posted image by incorporating various types of visual and social features into a unified network model.
2 code implementations • 8 Apr 2021 • Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao
The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.
Ranked #9 on Speech Enhancement on VoiceBank + DEMAND
no code implementations • 7 Apr 2021 • Cheng-Hung Hu, Yi-Chiao Wu, Wen-Chin Huang, Yu-Huai Peng, Yu-Wen Chen, Pin-Jui Ku, Tomoki Toda, Yu Tsao, Hsin-Min Wang
The first track focuses on using a small number of 100 target utterances for voice cloning, while the second track focuses on using only 5 target utterances for voice cloning.
no code implementations • 7 Apr 2021 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
However, in most discriminative training for SiamNN, only the distribution of pairwise sample distances is considered, and the additional discriminative information in the joint distribution of samples is ignored.
no code implementations • 7 Feb 2021 • Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Wen-Chin Huang, Xugang Lu, Yu Tsao
Synthesized speech from articulatory movements can have real-world use for patients with vocal cord disorders, situations requiring silent speech, or in high-noise environments.
no code implementations • 9 Jan 2021 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
By initializing the two-branch neural network with the generatively learned model parameters of the JB model, we train the model parameters with the pairwise samples as a binary discrimination task.
1 code implementation • 7 Jan 2021 • Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi
In this study, we propose an attention-based MTL (ATM) approach that integrates MTL and the attention-weighting mechanism to simultaneously realize a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI).
no code implementations • 24 Dec 2020 • Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
By minimizing the classification loss on the training data set with the adaptation loss on both training and testing data sets, the statistical distribution difference between training and testing domains is reduced.
no code implementations • 20 Dec 2020 • Kai-Chun Liu, Michael Can, Heng-Cheng Kuo, Chia-Yeh Hsieh, Hsiang-Yun Huang, Chia-Tai Chan, Yu Tsao
The proposed DAFD can transfer knowledge from the source domain to the target domain by minimizing the domain discrepancy to avoid mismatch problems.
1 code implementation • 17 Dec 2020 • Ryandhimas E. Zezario, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
Experimental results confirmed that the proposed ZMOS approach can achieve better performance in both seen and unseen noise types compared to the baseline systems and other model selection systems, which indicates the effectiveness of the proposed approach in providing robust SE performance.
no code implementations • 7 Dec 2020 • Kai-Chun Liu, Kuo-Hsuan Hung, Chia-Yeh Hsieh, Hsiang-Yun Huang, Chia-Tai Chan, Yu Tsao
However, the performance of FD systems is diminished owing to low-resolution (LR) accelerometer signals.
no code implementations • 7 Dec 2020 • Tsai-Min Chen, Yuan-Hong Tsai, Huan-Hsin Tseng, Kai-Chun Liu, Jhih-Yu Chen, Chih-Han Huang, Guo-Yuan Li, Chun-Yen Shen, Yu Tsao
In our experiments, we downsampled the ECG signals from the CPSC2018 dataset and evaluated their HMC accuracies with and without the SRECG.
no code implementations • 15 Nov 2020 • Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao
Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the articulatory properties of the input speech when performing enhancement to attain performance improvements.
Automatic Speech Recognition (ASR)
1 code implementation • 9 Nov 2020 • Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang
To overcome this limitation, we propose a deep learning-based non-intrusive speech intelligibility assessment model, namely STOI-Net.
no code implementations • 3 Nov 2020 • Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Xugang Lu, Yu Tsao
Although deep learning algorithms are widely used for improving speech enhancement (SE) performance, the performance remains limited under highly challenging conditions, such as unseen noise or noise signals having low signal-to-noise ratios (SNRs).
1 code implementation • 28 Oct 2020 • Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao
Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g., phones and syllables.
Ranked #9 on Speech Enhancement on VoiceBank + DEMAND
no code implementations • 6 Oct 2020 • Yu-Huai Peng, Cheng-Hung Hu, Alexander Kang, Hung-Shin Lee, Pin-Yuan Chen, Yu Tsao, Hsin-Min Wang
This paper describes the Academia Sinica systems for the two tasks of Voice Conversion Challenge 2020, namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2).
1 code implementation • 30 Aug 2020 • Shang-Yi Chuang, Hsin-Min Wang, Yu Tsao
Experimental results confirm that compared to conventional AVSE systems, iLAVSE can effectively overcome the aforementioned three practical issues and can improve enhancement performance.
1 code implementation • 21 Aug 2020 • Yu-Wen Chen, Kuo-Hsuan Hung, You-Jin Li, Alexander Chao-Fu Kang, Ya-Hsin Lai, Kai-Chun Liu, Szu-Wei Fu, Syu-Siang Wang, Yu Tsao
CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC), allowing it to be used as a platform for utilizing and evaluating SE models and for flexibly extending them to address various noise environments and users.
no code implementations • 13 Aug 2020 • Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao
In noisy conditions, knowing speech contents facilitates listeners to more effectively suppress background noise components and to retrieve pure speech signals.
no code implementations • 24 Jun 2020 • Lichin Chen, Yu Tsao, Ji-Tian Sheu
This study also used explainable artificial intelligence methods to interpret the contribution of features for the general public and individuals.
no code implementations • 18 Jun 2020 • Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao
The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications.
no code implementations • 24 May 2020 • You-Jin Li, Syu-Siang Wang, Yu Tsao, Borching Su
For speech-related applications in IoT environments, identifying effective methods to handle interference noises and compress the amount of data in transmissions is essential to achieve high-quality services.
1 code implementation • 24 May 2020 • Chi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin, Hsin-Min Wang, Yu Tsao
The results verify that the SERIL model can effectively adjust itself to new noise environments while overcoming the catastrophic forgetting issue.
1 code implementation • 24 May 2020 • Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang
Previous studies have confirmed the effectiveness of incorporating visual information into speech enhancement (SE) systems.
1 code implementation • 6 Apr 2020 • Tsun-An Hsieh, Hsin-Min Wang, Xugang Lu, Yu Tsao
In WaveCRN, the speech locality feature is captured by a convolutional neural network (CNN), while the temporal sequential property of the locality feature is modeled by stacked simple recurrent units (SRU).
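A structural sketch of the CNN-plus-recurrent design described above; the real WaveCRN uses simple recurrent units (SRU), for which torch.nn.GRU serves here only as a readily available stand-in, and all layer sizes are illustrative.

```python
# Convolutional front-end for locality + recurrent layers for temporal structure.
import torch
import torch.nn as nn

class CRNSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # 1-D convolution extracts local waveform features.
        self.conv = nn.Conv1d(1, channels, kernel_size=96, stride=48)
        # Recurrent layers model the temporal evolution of those features
        # (GRU stand-in; WaveCRN itself stacks SRUs).
        self.rnn = nn.GRU(channels, channels, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * channels, channels)

    def forward(self, wav):                       # wav: (batch, samples)
        feats = self.conv(wav.unsqueeze(1))       # (batch, C, frames)
        seq, _ = self.rnn(feats.transpose(1, 2))  # (batch, frames, 2C)
        return self.proj(seq)                     # frame-level representation

out = CRNSketch()(torch.randn(2, 16000))
```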
1 code implementation • Interspeech 2020 • Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi
The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments.
Audio and Speech Processing
Sound
1 code implementation • 22 Jan 2020 • Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang
In this paper, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech.
no code implementations • 6 Jan 2020 • Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao
The DSDT is built based on a prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT.
no code implementations • 27 Dec 2019 • Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai
However, a potential limitation of the network is that the discriminative features from the bottom layers (which can model the short-range dependency) are smoothed out in the final representation.
no code implementations • 9 Dec 2019 • Chao-I Tuan, Yuan-Kuei Wu, Hung-Yi Lee, Yu Tsao
Our experimental results first confirmed the robustness of our MiTAS on two types of perturbations in mixed audio.
no code implementations • 22 Nov 2019 • Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung
Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE).
no code implementations • 19 Nov 2019 • Syu-Siang Wang, Yu-You Liang, Jeih-weih Hung, Yu Tsao, Hsin-Min Wang, Shih-Hau Fang
Speech-related applications deliver inferior performance in complex noise environments.
no code implementations • 5 Nov 2019 • Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling
Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques.
no code implementations • 26 Sep 2019 • Natalie Yu-Hsien Wang, Hsiao-Lan Sharon Wang, Tao-Wei Wang, Szu-Wei Fu, Xugang Lu, Yu Tsao, Hsin-Min Wang
Recently, a time-domain speech enhancement algorithm based on the fully convolutional neural networks (FCN) with a short-time objective intelligibility (STOI)-based objective function (termed FCN(S) in short) has received increasing attention due to its simple structure and effectiveness of restoring clean speech signals from noisy counterparts.
Denoising
Speech Enhancement
Sound
Audio and Speech Processing
no code implementations • 26 Sep 2019 • Chang-Le Liu, Szu-Wei Fu, You-Jin Li, Jen-Wei Huang, Hsin-Min Wang, Yu Tsao
We also propose an extended version of SDFCN, called the residual SDFCN (termed rSDFCN).
no code implementations • 26 Sep 2019 • Rung-Yu Tseng, Tao-Wei Wang, Szu-Wei Fu, Yu Tsao, Chia-Ying Lee
Speech perception is a key to verbal communication.
Speech Enhancement
Sound
Audio and Speech Processing
no code implementations • 31 May 2019 • Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao
In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing weights with fewer cluster centroids.
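The PQ step described above can be sketched as k-means weight sharing: cluster the weights and replace each one with its nearest centroid, so only the cluster indices and a small codebook need be stored. This follows the generic recipe, not the paper's implementation.

```python
# Parameter quantization by k-means weight sharing.
import numpy as np
from sklearn.cluster import KMeans

def quantize_weights(w: np.ndarray, n_clusters: int = 16) -> np.ndarray:
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(w.reshape(-1, 1))
    # Storage cost: log2(n_clusters) bits per weight plus the centroid codebook.
    return km.cluster_centers_[labels].reshape(w.shape)

w = np.random.randn(256, 256).astype(np.float32)
w_q = quantize_weights(w)
print(np.unique(w_q).size)   # at most 16 distinct weight values remain
```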
5 code implementations • 13 May 2019 • Szu-Wei Fu, Chien-Feng Liao, Yu Tsao, Shou-De Lin
Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores.
Ranked #14 on Speech Enhancement on VoiceBank + DEMAND
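A hedged sketch of the MetricGAN idea behind this entry: a discriminator regresses the value of a target metric (e.g., a normalized PESQ score) for enhanced speech, and the generator is trained to drive that predicted score toward its maximum. Both toy networks and the score range are placeholders.

```python
# MetricGAN-style training signals (toy shapes; real models operate on spectrograms).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(257, 257), nn.Sigmoid())  # mask estimator (toy)
D = nn.Sequential(nn.Linear(257, 1))                  # metric predictor (toy)
mse = nn.MSELoss()

noisy = torch.rand(8, 257)                 # toy magnitude frames
enhanced = G(noisy) * noisy                # masked enhancement

# D step: fit D to the measured metric scores of enhanced speech.
true_scores = torch.rand(8, 1)             # stand-in for normalized PESQ in [0, 1]
d_loss = mse(D(enhanced.detach()), true_scores)

# G step: push D's predicted score for enhanced speech toward the maximum (1).
g_loss = mse(D(enhanced), torch.ones(8, 1))
```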
1 code implementation • 6 May 2019 • Szu-Wei Fu, Chien-Feng Liao, Yu Tsao
Utilizing a human-perception-related objective function to train a speech enhancement model has become a popular topic recently.
1 code implementation • 2 May 2019 • Wen-Chin Huang, Yi-Chiao Wu, Chen-Chou Lo, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang
Such a hypothesis implies that during the conversion phase, the latent codes and the converted features in VAE-based VC are in fact source-F0 dependent.
no code implementations • 30 Apr 2019 • Chien-Feng Liao, Yu Tsao, Xugang Lu, Hisashi Kawai
In this study, the symbolic sequences for acoustic signals are obtained as discrete representations with a Vector Quantized Variational Autoencoder algorithm.
6 code implementations • 17 Apr 2019 • Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang
In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.
no code implementations • 12 Apr 2019 • Sheng-Yong Niu, Lun-Zhang Guo, Yue Li, Tzung-Dau Wang, Yu Tsao, Tzu-Ming Liu
With the rapid growth of high-speed and deep-tissue imaging in biomedical research, it is urgent to find a robust and effective denoising method that retains morphological features for further texture analysis and segmentation.
no code implementations • 27 Nov 2018 • Wen-Chin Huang, Yi-Chiao Wu, Hsin-Te Hwang, Patrick Lumban Tobing, Tomoki Hayashi, Kazuhiro Kobayashi, Tomoki Toda, Yu Tsao, Hsin-Min Wang
Conventional WaveNet vocoders are trained with natural acoustic features but conditioned on the converted features in the conversion stage for VC, and such a mismatch often causes significant quality and similarity degradation.
no code implementations • 26 Nov 2018 • Yi-Te Hsu, Zining Zhu, Chi-Te Wang, Shih-Hau Fang, Frank Rudzicz, Yu Tsao
In this study, we propose a detection system for pathological voice, which is robust against the channel effect.
1 code implementation • 8 Nov 2018 • Shih-kuang Lee, Syu-Siang Wang, Yu Tsao, Jeih-weih Hung
The presented DWT-based SE method with various scaling factors for the detail part is evaluated with a subset of Aurora-2 database, and the PESQ metric is used to indicate the quality of processed speech signals.
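A minimal sketch of the manipulation this entry describes, using PyWavelets: decompose the noisy waveform with a one-level DWT, scale down the detail coefficients, and reconstruct. The wavelet choice and scaling factor here are illustrative; the paper sweeps several factors and scores the output with PESQ.

```python
# One-level DWT denoising: attenuate the detail (high-frequency) band.
import numpy as np
import pywt  # pip install PyWavelets

def dwt_denoise(noisy: np.ndarray, scale: float = 0.5,
                wavelet: str = "db4") -> np.ndarray:
    approx, detail = pywt.dwt(noisy, wavelet)      # one-level decomposition
    return pywt.idwt(approx, scale * detail, wavelet)

x = np.random.randn(16000)                         # stand-in noisy waveform
y = dwt_denoise(x, scale=0.5)
```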
1 code implementation • 30 Oct 2018 • Li-Wei Chen, Hung-Yi Lee, Yu Tsao
This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed.
1 code implementation • 29 Aug 2018 • Wen-Chin Huang, Hsin-Te Hwang, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang
An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational autoencoders (VAEs), to model the latent structure of speech in an unsupervised manner.
no code implementations • 17 Aug 2018 • Yi-Te Hsu, Yu-Chen Lin, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo
We evaluated the proposed EOFP quantization technique on two types of neural networks, namely, bidirectional long short-term memory (BLSTM) and fully convolutional neural network (FCN), on a speech enhancement task.
no code implementations • 16 Aug 2018 • Szu-Wei Fu, Yu Tsao, Hsin-Te Hwang, Hsin-Min Wang
The evaluation of utterance-level quality in Quality-Net is based on the frame-level assessment.
1 code implementation • 19 Jul 2018 • Chien-Feng Liao, Yu Tsao, Hung-Yi Lee, Hsin-Min Wang
The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model.
Sound
Audio and Speech Processing
no code implementations • 12 Sep 2017 • Szu-Wei Fu, Tao-Wei Wang, Yu Tsao, Xugang Lu, Hisashi Kawai
For example, in measuring speech intelligibility, most evaluation metrics are based on the short-time objective intelligibility (STOI) measure, while the frame-based minimum mean square error (MMSE) between the estimated and clean speech is widely used in optimizing the model.
Automatic Speech Recognition (ASR)
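The STOI-versus-MMSE contrast in the entry above can be made concrete with the pystoi package: the frame-based mean square error is what the model optimizes, while STOI is what evaluation reports. The signals below are random placeholders.

```python
# Training criterion (MSE) versus evaluation metric (STOI).
import numpy as np
from pystoi import stoi  # pip install pystoi

fs = 16000
clean = np.random.randn(3 * fs)                    # placeholder clean speech
enhanced = clean + 0.1 * np.random.randn(3 * fs)   # placeholder estimate

mmse = np.mean((enhanced - clean) ** 2)            # what training minimizes
score = stoi(clean, enhanced, fs, extended=False)  # what evaluation reports
print(f"MSE={mmse:.4f}  STOI={score:.3f}")
```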
no code implementations • 1 Sep 2017 • Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang
Precisely speaking, the proposed AVDCNN model is structured as an audio-visual encoder-decoder network, in which audio and visual data are first processed using individual CNNs, and then fused into a joint network to generate enhanced speech (the primary task) and reconstructed images (the secondary task) at the output layer.
no code implementations • 27 Apr 2017 • Szu-Wei Fu, Ting-yao Hu, Yu Tsao, Xugang Lu
This paper aims to address two issues existing in the current speech enhancement methods: 1) the difficulty of phase estimations; 2) a single objective function cannot consider multiple metrics simultaneously.
1 code implementation • 4 Apr 2017 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios.
no code implementations • 30 Mar 2017 • Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao, Hsiu-Wen Chang, Hsin-Min Wang
Precisely speaking, the proposed AVDCNN model is structured as an audio-visual encoder-decoder network, in which audio and visual data are first processed using individual CNNs, and then fused into a joint network to generate enhanced speech (the primary task) and reconstructed images (the secondary task) at the output layer.
no code implementations • 7 Mar 2017 • Szu-Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai
Because the fully connected layers, which are involved in deep neural networks (DNN) and convolutional neural networks (CNN), may not accurately characterize the local information of speech signals, particularly with high frequency components, we employed fully convolutional layers to model the waveform.
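A structural sketch of a fully convolutional waveform model in the spirit of this entry: with no fully connected layers, local (including high-frequency) structure is preserved end to end. Layer sizes are illustrative.

```python
# Fully convolutional waveform-to-waveform model (no dense layers).
import torch
import torch.nn as nn

fcn = nn.Sequential(
    nn.Conv1d(1, 30, kernel_size=55, padding=27), nn.LeakyReLU(0.2),
    nn.Conv1d(30, 30, kernel_size=55, padding=27), nn.LeakyReLU(0.2),
    nn.Conv1d(30, 1, kernel_size=55, padding=27),  # maps back to one waveform channel
)
enhanced = fcn(torch.randn(1, 1, 16000))           # same length in and out
print(enhanced.shape)                              # torch.Size([1, 1, 16000])
```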
4 code implementations • 13 Oct 2016 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora.
no code implementations • 13 Oct 2016 • Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang
In this paper, we propose a dictionary update method for Nonnegative Matrix Factorization (NMF) with high dimensional data in a spectral conversion (SC) task.
no code implementations • ROCLINGIJCLCLP 2015 • Chia-Yung Hsu, Jia-Ching Wang, Yu Tsao
Automatic Speech Recognition (ASR)