1 code implementation • 9 Sep 2024 • Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng
While previous works have explored various aspects of song generation, such as singing voice, vocal composition, and instrumental arrangement, generating songs with both vocals and accompaniment from given lyrics remains a significant challenge, hindering the application of music generation models in the real world.
1 code implementation • 26 Aug 2024 • Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, Shangda Wu, Shih-Lun Wu, Shuqi Dai, Shun Lei, Shiyin Kang, Simon Dixon, Wenhu Chen, Wenhao Huang, Xingjian Du, Xingwei Qu, Xu Tan, Yizhi Li, Zeyue Tian, Zhiyong Wu, Zhizheng Wu, Ziyang Ma, Ziyu Wang
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music.
2 code implementations • 21 Apr 2022 • Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao
FastDiff also enables a sampling speed 58x faster than real time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time.
Ranked #7 on Text-To-Speech Synthesis on LJSpeech (using extra training data)
1 code implementation • ICLR 2022 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can be trained with a novel bilateral modeling objective.
Ranked #1 on Speech Synthesis on LJSpeech
no code implementations • 29 Sep 2021 • Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Zhou Zhao, Yi Ren
Learning speech representations that generalize to unseen samples in different domains remains a challenge of ever-increasing importance.
no code implementations • 26 Aug 2021 • Max W. Y. Lam, Jun Wang, Rongjie Huang, Dan Su, Dong Yu
In this paper, we propose novel bilateral denoising diffusion models (BDDMs), which take significantly fewer steps to generate high-quality samples.
no code implementations • 8 Jun 2021 • Max W. Y. Lam, Jun Wang, Chao Weng, Dan Su, Dong Yu
End-to-end speech recognition generally uses hand-engineered acoustic features as input and excludes the feature extraction module from its joint optimization.
no code implementations • 2 Mar 2021 • Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu
We study the cocktail party problem and propose a novel attention network called Tune-In, short for "training under negative environments with interference".
2 code implementations • 1 Mar 2021 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers.
Ranked #8 on Speech Separation on WSJ0-3mix
no code implementations • 1 Mar 2021 • Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu
To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC).
2 code implementations • 13 Jan 2021 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
Recent research on time-domain audio separation networks (TasNets) has brought great success to speech separation.
Ranked #15 on Speech Separation on WSJ0-2mix
no code implementations • 28 Oct 2019 • Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu
Deep-learning-based speech separation models suffer from poor generalization: even state-of-the-art models can fail abruptly when evaluated under mismatched conditions.