Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

no code implementations • 6 Dec 2023 • Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu

Specifically, we leverage the latent representation obtained from text input as our prior, and build a fully tractable Schrodinger bridge between it and the ground-truth mel-spectrogram, leading to a data-to-data process.

Improving Diffusion Models for ECG Imputation with an Augmented Template Prior

no code implementations • 24 Oct 2023 • Alexander Jenkins, Zehua Chen, Fu Siong Ng, Danilo Mandic

In this work, to improve the imputation and forecasting accuracy for ECG with probabilistic models, we present a template-guided denoising diffusion probabilistic model (DDPM), PulseDiff, which is conditioned on an informative prior for a range of health conditions.

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

3 code implementations • 29 Jan 2023 • Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

1 code implementation • 30 Dec 2022 • Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic

Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples.


BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

1 code implementation • 30 May 2022 • Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu

Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples.

InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training

no code implementations • 8 Feb 2022 • Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao

In this paper, we propose InferGrad, a diffusion model for vocoders that incorporates the inference process into training to reduce the number of inference iterations while maintaining high generation quality.


A Granular Sieving Algorithm for Deterministic Global Optimization

no code implementations • 14 Jul 2021 • Tao Qian, Lei Dai, Liming Zhang, Zehua Chen

With a straightforward mathematical formulation applicable to both univariate and multivariate objective functions, the global minimum value and all the global minimizers are located through two decreasing sequences of compact sets in the domain and range spaces, respectively.

A-FMI: Learning Attributions from Deep Networks via Feature Map Importance

no code implementations • 12 Apr 2021 • An Zhang, Xiang Wang, Chengfang Fang, Jie Shi, Tat-Seng Chua, Zehua Chen

Gradient-based attribution methods can aid in the understanding of convolutional neural networks (CNNs).

