Search Results for author: Cheng Yu

Found 29 papers, 9 papers with code

Using fine-tuning and min lookahead beam search to improve Whisper

no code implementations 19 Sep 2023 Andrea Do, Oscar Brown, Zhengjie Wang, Nikhil Mathew, Zixin Liu, Jawwad Ahmed, Cheng Yu

In addition to a lack of training data on low-resource languages, we identify some limitations in the beam search algorithm used in Whisper.

Cross-Utterance Conditioned VAE for Speech Generation

no code implementations 8 Sep 2023 Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun

Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech.

Speech Synthesis

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

1 code implementation 28 Aug 2023 Yang Liu, Cheng Yu, Lei Shang, Yongyi He, Ziheng Wu, Xingjun Wang, Chao Xu, Haoyu Xie, Weida Wang, Yuze Zhao, Lin Zhu, Chen Cheng, Weitao Chen, Yuan YAO, Wenmeng Zhou, Jiaqi Xu, Qiang Wang, Yingda Chen, Xuansong Xie, Baigui Sun

In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models and a rich set of face-related perceptual understanding models (e.g., face detection, deep face embedding extraction, and facial attribute recognition) to tackle the aforementioned challenges and to generate truthful personalized portraits, with only a handful of portrait images as input.

Attribute, Portrait Generation +1

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

1 code implementation ACL 2022 Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang

Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems.

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

1 code implementation 31 Mar 2022 Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Specifically, the contrast of target features is stretched based on perceptual importance, thereby improving the overall SE performance.

Speech Enhancement

Conditional Diffusion Probabilistic Model for Speech Enhancement

2 code implementations 10 Feb 2022 Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs.

Speech Enhancement, Speech Synthesis

HASA-net: A non-intrusive hearing-aid speech assessment network

no code implementations 10 Nov 2021 Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao

Because they do not require a clean reference, non-intrusive speech assessment methods have attracted considerable attention for objective evaluation.

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

no code implementations 10 Nov 2021 Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers.

Meta-Learning, Speech Enhancement

SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

no code implementations 8 Nov 2021 Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

In this paper, a novel sign-exponent-only floating-point network (SEOFP-NET) technique is proposed to compress the model size and accelerate the inference time for speech enhancement, a regression task of speech signal processing.

Model Compression, Regression +1

MetricGAN-U: Unsupervised speech enhancement/dereverberation based only on noisy/reverberated speech

2 code implementations 12 Oct 2021 Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training.

Speech Enhancement

Mutual Information Continuity-constrained Estimator

no code implementations 29 Sep 2021 Tsun-An Hsieh, Cheng Yu, Ying Hung, Chung-Ching Lin, Yu Tsao

Accordingly, we propose Mutual Information Continuity-constrained Estimator (MICE).

Density Estimation

Diverse Similarity Encoder for Deep GAN Inversion

2 code implementations 23 Aug 2021 Cheng Yu, Wenmin Wang

Current deep generative adversarial networks (GANs) can synthesize high-quality (HQ) images, so learning representation with GANs is favorable.

Image Reconstruction

Speech Recovery for Real-World Self-powered Intermittent Devices

no code implementations 9 Jun 2021 Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao, Tei-Wei Kuo

The incompleteness of speech inputs severely degrades the performance of all the related speech signal processing applications.

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

3 code implementations 8 Apr 2021 Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.

Speech Enhancement

Global ill-posedness for a dense set of initial data to the Isentropic system of gas dynamics

no code implementations 8 Mar 2021 Robin Ming Chen, Alexis F. Vasseur, Cheng Yu

In dimension $n=2$ and $3$, we show that for any initial datum belonging to a dense subset of the energy space, there exist infinitely many global-in-time admissible weak solutions to the isentropic Euler system whenever $1<\gamma\leq 1+\frac2n$.

Analysis of PDEs 35Q31, 76N10, 35L65

Dissipative solutions to the compressible isentropic Navier-Stokes equations

no code implementations 4 Feb 2021 Liang Guo, Ducati Li, Cheng Yu

This paper establishes the existence of dissipative solutions to the compressible isentropic Navier-Stokes equations.

Analysis of PDEs

Inviscid limit of the inhomogeneous incompressible Navier-Stokes equations under the weak Kolmogorov hypothesis in $\mathbb{R}^3$

no code implementations 4 Feb 2021 Dixi Wang, Cheng Yu, Xinhua Zhao

In this paper, we consider the inviscid limit of inhomogeneous incompressible Navier-Stokes equations under the weak Kolmogorov hypothesis in $\mathbb{R}^3$.

Analysis of PDEs

Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

1 code implementation 7 Jan 2021 Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi

In this study, we propose an attention-based MTL (ATM) approach that integrates MTL and the attention-weighting mechanism to simultaneously realize a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI).

Multi-Task Learning, Speaker Identification +1

Defending Against Universal Adversarial Patches by Clipping Feature Norms

no code implementations ICCV 2021 Cheng Yu, Jiansheng Chen, Youze Xue, Yuyang Liu, Weitao Wan, Jiayu Bao, Huimin Ma

Physical-world adversarial attacks based on universal adversarial patches have been shown to mislead deep convolutional neural networks (CNNs), exposing the vulnerability of real-world visual classification systems based on CNNs.

Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information

no code implementations 15 Nov 2020 Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao

Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance.

Automatic Speech Recognition (ASR) +5

Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement

1 code implementation 28 Oct 2020 Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g., phones and syllables.

Speech Enhancement

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

no code implementations 18 Jun 2020 Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications.

Speech Enhancement

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders

no code implementations 6 Jan 2020 Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao

The DSDT is built on prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT.

Denoising, Speech Enhancement

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

no code implementations 22 Nov 2019 Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung

Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE).

Ensemble Learning, Speech Enhancement

Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques

no code implementations 31 May 2019 Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao

In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing weights with fewer cluster centroids.

Denoising, Quantization +1
