1 code implementation • ACL 2022 • Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang
Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems.
1 code implementation • 31 Mar 2022 • Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao
Specifically, the contrast of target features is stretched based on perceptual importance, thereby improving the overall SE performance.
Ranked #4 on Speech Enhancement on VoiceBank + DEMAND
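A rough, hypothetical sketch of the contrast-stretching idea described above (not the paper's exact procedure): a band-dependent exponent is applied to a normalized log-power spectrogram so that perceptually important bands are expanded more strongly; the band boundaries and gamma values below are placeholders.

```python
import numpy as np

def perceptual_contrast_stretch(log_power_spec, band_gammas):
    """Stretch the contrast of a log-power spectrogram band by band.

    log_power_spec: array of shape (freq_bins, frames).
    band_gammas:    list of (start_bin, end_bin, gamma); larger gamma stretches
                    contrast more in that band (placeholder values).
    """
    # Normalize to [0, 1] so the exponent acts purely as a contrast control.
    lo, hi = log_power_spec.min(), log_power_spec.max()
    norm = (log_power_spec - lo) / (hi - lo + 1e-8)

    stretched = norm.copy()
    for start, end, gamma in band_gammas:
        stretched[start:end] = norm[start:end] ** gamma

    # Map back to the original dynamic range.
    return stretched * (hi - lo) + lo

# Hypothetical band settings: emphasize mid frequencies more than low/high ones.
spec = np.abs(np.random.randn(257, 100)) ** 2
target = perceptual_contrast_stretch(np.log(spec + 1e-8),
                                     [(0, 40, 1.0), (40, 180, 1.4), (180, 257, 1.1)])
```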
2 code implementations • 10 Feb 2022 • Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao
Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs.
no code implementations • 10 Nov 2021 • Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao
Without the need for a clean reference, non-intrusive speech assessment methods have attracted great attention for objective evaluation.
no code implementations • 10 Nov 2021 • Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli
Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers.
no code implementations • 8 Nov 2021 • Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo
In this paper, a novel sign-exponent-only floating-point network (SEOFP-NET) technique is proposed to compress the model size and accelerate the inference time for speech enhancement, a regression task of speech signal processing.
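A minimal sketch of the sign-exponent-only idea named above: each IEEE-754 float32 weight keeps its sign and exponent bits while the 23 fraction (mantissa) bits are zeroed, so every weight becomes a signed power of two. The bit-masking below is an assumed way to emulate such a format in software, not necessarily the paper's exact procedure.

```python
import numpy as np

def sign_exponent_only(weights):
    """Zero the 23 mantissa bits of float32 weights, keeping sign + exponent.

    Each weight is truncated in magnitude to a signed power of two, emulating a
    sign-exponent-only floating-point representation.
    """
    w = np.asarray(weights, dtype=np.float32)
    bits = w.view(np.uint32)
    masked = bits & np.uint32(0xFF800000)  # 1 sign bit + 8 exponent bits kept
    return masked.view(np.float32)

w = np.array([0.75, -1.3, 0.01, -2.5], dtype=np.float32)
print(sign_exponent_only(w))  # [ 0.5  -1.  0.0078125  -2. ]
```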
1 code implementation • 12 Oct 2021 • Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao
Most deep learning-based speech enhancement models are trained in a supervised manner, which implies that pairs of noisy and clean speech are required during training.
no code implementations • 29 Sep 2021 • Tsun-An Hsieh, Cheng Yu, Ying Hung, Chung-Ching Lin, Yu Tsao
Accordingly, we propose the Mutual Information Continuity-constrained Estimator (MICE).
2 code implementations • 23 Aug 2021 • Cheng Yu, Wenmin Wang
Current deep generative adversarial networks (GANs) can synthesize high-quality (HQ) images, which makes representation learning with GANs attractive.
no code implementations • 9 Jun 2021 • Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao, Tei-Wei Kuo
Incomplete speech inputs severely degrade the performance of related speech signal processing applications.
3 code implementations • 8 Apr 2021 • Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao
The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.
Ranked #9 on Speech Enhancement on VoiceBank + DEMAND
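One way to shrink that discrepancy, sketched below as the general metric-GAN idea (an illustration with toy models, not the exact MetricGAN+ recipe): a discriminator is trained to predict a normalized perceptual score (e.g., PESQ) of the enhanced speech, and the enhancement model is then trained to maximize that predicted score.

```python
import torch
import torch.nn as nn

# Toy stand-ins; real models operate on magnitude spectrograms.
enhancer = nn.Sequential(nn.Linear(257, 257), nn.ReLU(), nn.Linear(257, 257))
metric_net = nn.Sequential(nn.Linear(257, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(enhancer.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(metric_net.parameters(), lr=1e-4)

def train_step(noisy, true_score):
    """noisy: (batch, 257) frames; true_score: measured perceptual score of the
    current enhanced output (e.g., PESQ against clean speech), rescaled to [0, 1]."""
    # 1) Fit the metric predictor to the measured score of the enhanced speech.
    with torch.no_grad():
        enhanced = enhancer(noisy)
    d_loss = (metric_net(enhanced).mean() - true_score) ** 2
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the enhancer so that the predicted score approaches the maximum (1.0).
    g_loss = (metric_net(enhancer(noisy)).mean() - 1.0) ** 2
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```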
no code implementations • 8 Mar 2021 • Robin Ming Chen, Alexis F. Vasseur, Cheng Yu
In dimension $n=2$ and $3$, we show that for any initial datum belonging to a dense subset of the energy space, there exist infinitely many global-in-time admissible weak solutions to the isentropic Euler system whenever $1<\gamma\leq 1+\frac2n$.
Analysis of PDEs 35Q31, 76N10, 35L65
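For reference, the isentropic Euler system referred to above reads, in standard form,
$$
\partial_t \rho + \operatorname{div}(\rho u) = 0, \qquad
\partial_t(\rho u) + \operatorname{div}(\rho u \otimes u) + \nabla p(\rho) = 0, \qquad
p(\rho) = \rho^{\gamma},
$$
where $\rho$ is the density, $u$ the velocity, and $\gamma > 1$ the adiabatic exponent (the pressure law is written up to a multiplicative constant).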
no code implementations • 4 Feb 2021 • Dixi Wang, Cheng Yu, Xinhua Zhao
In this paper, we consider the inviscid limit of inhomogeneous incompressible Navier-Stokes equations under the weak Kolmogorov hypothesis in $\mathbb{R}^3$.
Analysis of PDEs
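For reference, the inhomogeneous incompressible Navier-Stokes system with viscosity $\mu > 0$ (written here in its standard constant-viscosity form; the precise setting in the paper may differ) is
$$
\partial_t \rho + \operatorname{div}(\rho u) = 0, \qquad
\partial_t(\rho u) + \operatorname{div}(\rho u \otimes u) + \nabla p = \mu \Delta u, \qquad
\operatorname{div} u = 0,
$$
and the inviscid limit $\mu \to 0$ formally leads to the inhomogeneous incompressible Euler system.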
no code implementations • 4 Feb 2021 • Liang Guo, Fucai Li, Cheng Yu
In this paper, we establish the existence of dissipative solutions to the compressible isentropic Navier-Stokes equations.
Analysis of PDEs
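For reference, the compressible isentropic Navier-Stokes system in its standard form (the precise viscosity assumptions in the paper may differ) is
$$
\partial_t \rho + \operatorname{div}(\rho u) = 0, \qquad
\partial_t(\rho u) + \operatorname{div}(\rho u \otimes u) + \nabla \rho^{\gamma} = \mu \Delta u + (\mu + \lambda)\nabla \operatorname{div} u,
$$
with shear and bulk viscosity coefficients $\mu$ and $\lambda$.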
1 code implementation • 7 Jan 2021 • Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi
In this study, we propose an attention-based MTL (ATM) approach that integrates MTL and the attention-weighting mechanism to simultaneously realize a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI).
no code implementations • ICCV 2021 • Cheng Yu, Jiansheng Chen, Youze Xue, Yuyang Liu, Weitao Wan, Jiayu Bao, Huimin Ma
Physical-world adversarial attacks based on universal adversarial patches have been shown to mislead deep convolutional neural networks (CNNs), exposing the vulnerability of real-world visual classification systems based on CNNs.
no code implementations • 18 Nov 2020 • Weitao Wan, Jiansheng Chen, Cheng Yu, Tong Wu, Yuanyi Zhong, Ming-Hsuan Yang
In this work, we propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.
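A minimal sketch of a Gaussian mixture classification loss under simplifying assumptions (one Gaussian per class, identity covariance, equal priors; the margin and likelihood-weighting details of the actual GM loss are omitted): logits are negative squared distances to learnable class means, combined with a likelihood term that pulls features toward their own class mean.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianMixtureLoss(nn.Module):
    """Classification loss assuming class-conditional Gaussians in feature space."""
    def __init__(self, feat_dim, num_classes, likelihood_weight=0.1):
        super().__init__()
        self.means = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.likelihood_weight = likelihood_weight

    def forward(self, features, labels):
        # Squared Euclidean distance from each feature to each class mean.
        dist2 = torch.cdist(features, self.means) ** 2            # (batch, classes)
        # Posterior term: softmax over Gaussian log-densities (up to a constant).
        ce = F.cross_entropy(-0.5 * dist2, labels)
        # Likelihood term: pull each feature toward its own class mean.
        lkd = 0.5 * dist2.gather(1, labels.unsqueeze(1)).mean()
        return ce + self.likelihood_weight * lkd

# Usage with hypothetical 128-d features and 10 classes:
criterion = GaussianMixtureLoss(128, 10)
feats, labels = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = criterion(feats, labels)
```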
no code implementations • 15 Nov 2020 • Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao
Previous studies have confirmed that augmenting acoustic features with place/manner articulatory features can guide the speech enhancement (SE) process to consider the articulatory properties of the input speech, thereby attaining performance improvements.
Automatic Speech Recognition (ASR) +5
1 code implementation • 28 Oct 2020 • Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao
Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g., phones and syllables.
Ranked #9 on Speech Enhancement on VoiceBank + DEMAND
no code implementations • 13 Oct 2020 • Cheng Yu, Bo Wang, Bo Yang, Robby T. Tan
Addressing these problems, we introduce a spatio-temporal network for robust 3D human pose estimation.
no code implementations • 18 Jun 2020 • Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao
The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications.
no code implementations • 6 Jan 2020 • Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao
The DSDT is built based on prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along its branch of the DSDT.
no code implementations • 22 Nov 2019 • Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung
Previous studies have shown that integrating video signals as a complementary modality can improve the performance of speech enhancement (SE).
no code implementations • 31 May 2019 • Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao
In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing weights with fewer cluster centroids.
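A minimal sketch of the cluster-centroid idea behind such parameter quantization, using k-means as a stand-in (the paper's exact PQ procedure may differ): the weights are clustered, each weight is replaced by its nearest centroid, and only the centroid table plus small integer indices need to be stored.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_weights(weights, num_centroids=16):
    """Replace each weight with its nearest of `num_centroids` cluster centroids."""
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=num_centroids, n_init=10, random_state=0).fit(flat)
    centroids = km.cluster_centers_.ravel()       # stored once per layer
    indices = km.labels_.astype(np.uint8)         # one small index per weight
    quantized = centroids[indices].reshape(weights.shape)
    return quantized, centroids, indices

w = np.random.randn(256, 256).astype(np.float32)
w_q, centroids, idx = quantize_weights(w, num_centroids=16)
```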