3 code implementations • 29 Jan 2023 • Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley
By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.
Ranked #9 on Audio Generation on AudioCaps
1 code implementation • 30 May 2022 • Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu
Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples.
1 code implementation • 30 Dec 2022 • Zehua Chen, Yihan Wu, Yichong Leng, Jiawei Chen, Haohe Liu, Xu Tan, Yang Cui, Ke Wang, Lei He, Sheng Zhao, Jiang Bian, Danilo Mandic
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples.
no code implementations • 12 Apr 2021 • An Zhang, Xiang Wang, Chengfang Fang, Jie Shi, Tat-Seng Chua, Zehua Chen
Gradient-based attribution methods can aid in the understanding of convolutional neural networks (CNNs).
no code implementations • 14 Jul 2021 • Tao Qian, Lei Dai, Liming Zhang, Zehua Chen
With straightforward mathematical formulation applicable to both univariate and multivariate objective functions, the global minimum value and all the global minimizers are located through two decreasing sequences of compact sets in, respectively, the domain and range spaces.
no code implementations • 8 Feb 2022 • Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao
In this paper, we propose InferGrad, a diffusion model for vocoders that incorporates the inference process into training to reduce the number of inference iterations while maintaining high generation quality.
no code implementations • 24 Oct 2023 • Alexander Jenkins, Zehua Chen, Fu Siong Ng, Danilo Mandic
In this work, to improve the imputation and forecasting accuracy for ECG with probabilistic models, we present a template-guided denoising diffusion probabilistic model (DDPM), PulseDiff, which is conditioned on an informative prior for a range of health conditions.
no code implementations • 6 Dec 2023 • Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu
Specifically, we leverage the latent representation obtained from text input as our prior, and build a fully tractable Schrödinger bridge between it and the ground-truth mel-spectrogram, leading to a data-to-data process.