Search Results for author: Cheng Yu

Found 29 papers, 9 papers with code

Using fine-tuning and min lookahead beam search to improve Whisper

no code implementations • 19 Sep 2023 • Andrea Do, Oscar Brown, Zhengjie Wang, Nikhil Mathew, Zixin Liu, Jawwad Ahmed, Cheng Yu

In addition to a lack of training data on low-resource languages, we identify some limitations in the beam search algorithm used in Whisper.

Cross-Utterance Conditioned VAE for Speech Generation

no code implementations • 8 Sep 2023 • Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun

Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech.

Speech Synthesis

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

1 code implementation • 28 Aug 2023 • Yang Liu, Cheng Yu, Lei Shang, Yongyi He, Ziheng Wu, Xingjun Wang, Chao Xu, Haoyu Xie, Weida Wang, Yuze Zhao, Lin Zhu, Chen Cheng, Weitao Chen, Yuan Yao, Wenmeng Zhou, Jiaqi Xu, Qiang Wang, Yingda Chen, Xuansong Xie, Baigui Sun

In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models and a rich set of face-related perceptual understanding models (e.g., face detection, deep face embedding extraction, and facial attribute recognition) to tackle the aforementioned challenges and to generate truthful personalized portraits, with only a handful of portrait images as input.

Attribute Portrait Generation +1

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

1 code implementation • ACL 2022 • Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang

Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems.

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

1 code implementation • 31 Mar 2022 • Rong Chao, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Specifically, the contrast of target features is stretched based on perceptual importance, thereby improving the overall SE performance.

Ranked #5 on Speech Enhancement on VoiceBank + DEMAND

Speech Enhancement

Conditional Diffusion Probabilistic Model for Speech Enhancement

2 code implementations • 10 Feb 2022 • Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs.

Speech Enhancement Speech Synthesis

HASA-net: A non-intrusive hearing-aid speech assessment network

no code implementations • 10 Nov 2021 • Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao

Because they do not require a clean reference, non-intrusive speech assessment methods have attracted considerable attention for objective evaluation.

OSSEM: one-shot speaker adaptive speech enhancement using meta learning

no code implementations • 10 Nov 2021 • Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

Although deep learning (DL) has achieved notable progress in speech enhancement (SE), further research is still required for a DL-based SE system to adapt effectively and efficiently to particular speakers.

Meta-Learning Speech Enhancement

SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

no code implementations • 8 Nov 2021 • Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

In this paper, a novel sign-exponent-only floating-point network (SEOFP-NET) technique is proposed to compress the model size and accelerate the inference time for speech enhancement, a regression task of speech signal processing.

Model Compression regression +1

MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

2 code implementations • 12 Oct 2021 • Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

Most of the deep learning-based speech enhancement models are learned in a supervised manner, which implies that pairs of noisy and clean speech are required during training.

Speech Enhancement

Mutual Information Continuity-constrained Estimator

no code implementations • 29 Sep 2021 • Tsun-An Hsieh, Cheng Yu, Ying Hung, Chung-Ching Lin, Yu Tsao

Accordingly, we propose Mutual Information Continuity-constrained Estimator (MICE).

Density Estimation

Diverse Similarity Encoder for Deep GAN Inversion

2 code implementations • 23 Aug 2021 • Cheng Yu, Wenmin Wang

Current deep generative adversarial networks (GANs) can synthesize high-quality (HQ) images, so learning representation with GANs is favorable.

Image Reconstruction

Speech Recovery for Real-World Self-powered Intermittent Devices

no code implementations • 9 Jun 2021 • Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung, Cheng Yu, Harinath Garudadri, Yu Tsao, Tei-Wei Kuo

The incompleteness of speech inputs severely degrades the performance of all the related speech signal processing applications.

MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement

3 code implementations • 8 Apr 2021 • Szu-Wei Fu, Cheng Yu, Tsun-An Hsieh, Peter Plantinga, Mirco Ravanelli, Xugang Lu, Yu Tsao

The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.

Ranked #12 on Speech Enhancement on VoiceBank + DEMAND

Speech Enhancement

Global ill-posedness for a dense set of initial data to the Isentropic system of gas dynamics

no code implementations • 8 Mar 2021 • Robin Ming Chen, Alexis F. Vasseur, Cheng Yu

In dimension $n=2$ and $3$, we show that for any initial datum belonging to a dense subset of the energy space, there exist infinitely many global-in-time admissible weak solutions to the isentropic Euler system whenever $1<\gamma\leq 1+\frac2n$.

Analysis of PDEs 35Q31, 76N10, 35L65

Dissipative solutions to the compressible isentropic Navier-Stokes equations

no code implementations • 4 Feb 2021 • Liang Guo, Ducati Li, Cheng Yu

The existence of dissipative solutions to the compressible isentropic Navier-Stokes equations is established in this paper.

Analysis of PDEs

Inviscid limit of the inhomogeneous incompressible Navier-Stokes equations under the weak Kolmogorov hypothesis in $\mathbb{R}^3$

no code implementations • 4 Feb 2021 • Dixi Wang, Cheng Yu, Xinhua Zhao

In this paper, we consider the inviscid limit of inhomogeneous incompressible Navier-Stokes equations under the weak Kolmogorov hypothesis in $\mathbb{R}^3$.

Analysis of PDEs

Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

1 code implementation • 7 Jan 2021 • Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, Tai-Shih Chi

In this study, we propose an attention-based MTL (ATM) approach that integrates MTL and the attention-weighting mechanism to simultaneously realize a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI).

Multi-Task Learning Speaker Identification +1

Defending Against Universal Adversarial Patches by Clipping Feature Norms

no code implementations • ICCV 2021 • Cheng Yu, Jiansheng Chen, Youze Xue, Yuyang Liu, Weitao Wan, Jiayu Bao, Huimin Ma

Physical-world adversarial attacks based on universal adversarial patches have been proved to be able to mislead deep convolutional neural networks (CNNs), exposing the vulnerability of real-world visual classification systems based on CNNs.

Shaping Deep Feature Space towards Gaussian Mixture for Visual Classification

no code implementations • 18 Nov 2020 • Weitao Wan, Jiansheng Chen, Cheng Yu, Tong Wu, Yuanyi Zhong, Ming-Hsuan Yang

In this work, we propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.

Classification General Classification +1

Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information

no code implementations • 15 Nov 2020 • Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao

Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement

1 code implementation • 28 Oct 2020 • Tsun-An Hsieh, Cheng Yu, Szu-Wei Fu, Xugang Lu, Yu Tsao

Speech enhancement (SE) aims to improve speech quality and intelligibility, which are both related to a smooth transition in speech segments that may carry linguistic information, e.g., phones and syllables.

Ranked #12 on Speech Enhancement on VoiceBank + DEMAND

Speech Enhancement

Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage Optimization

no code implementations • 13 Oct 2020 • Cheng Yu, Bo Wang, Bo Yang, Robby T. Tan

Addressing these problems, we introduce a spatio-temporal network for robust 3D human pose estimation.

2D Pose Estimation Pose Prediction +1

Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing

no code implementations • 18 Jun 2020 • Szu-Wei Fu, Chien-Feng Liao, Tsun-An Hsieh, Kuo-Hsuan Hung, Syu-Siang Wang, Cheng Yu, Heng-Cheng Kuo, Ryandhimas E. Zezario, You-Jin Li, Shang-Yi Chuang, Yen-Ju Lu, Yu Tsao

The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications.

Speech Enhancement

Speech Enhancement based on Denoising Autoencoder with Multi-branched Encoders

no code implementations • 6 Jan 2020 • Cheng Yu, Ryandhimas E. Zezario, Jonathan Sherman, Yi-Yen Hsieh, Xugang Lu, Hsin-Min Wang, Yu Tsao

The DSDT is built based on a prior knowledge of speech and noisy conditions (the speaker, environment, and signal factors are considered in this paper), where each component of the multi-branched encoder performs a particular mapping from noisy to clean speech along the branch in the DSDT.

Denoising Speech Enhancement

Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

no code implementations • 22 Nov 2019 • Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung

Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE).

Ensemble Learning Speech Enhancement

Increasing Compactness Of Deep Learning Based Speech Enhancement Models With Parameter Pruning And Quantization Techniques

no code implementations • 31 May 2019 • Jyun-Yi Wu, Cheng Yu, Szu-Wei Fu, Chih-Ting Liu, Shao-Yi Chien, Yu Tsao

In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing weights with fewer cluster centroids.

Denoising Quantization +1

Pairwise FastText Classifier for Entity Disambiguation

no code implementations • ALTA 2016 • Cheng Yu, Bing Chu, Rohit Ram, James Aichinger, Lizhen Qu, Hanna Suominen

Entity Disambiguation General Classification +2

An Introduction to BLCU Personal Attributes Extraction System

no code implementations • WS 2014 • Dong Yu, Cheng Yu, Qin Qu, Gongbo Tang, Chunhua Liu, Yue Tian, Jing Yi
