1 code implementation • 1 Jun 2024 • Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, RuiQi Li, Zhou Zhao
By employing a non-autoregressive vector field estimator based on a feed-forward transformer and channel-level cross-modal feature fusion with strong temporal alignment, our model generates audio that is highly synchronized with the input video.
Ranked #4 on Video-to-Sound Generation on VGG-Sound
no code implementations • 14 Apr 2024 • Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, RuiQi Li, Fuming You, Zhou Zhao, Zhimeng Zhang
A song is a combination of singing voice and accompaniment.
1 code implementation • 18 Mar 2024 • Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, RuiQi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao
Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly.
1 code implementation • 13 Oct 2021 • Fuming You, Jingjing Li, Lei Zhu, Ke Lu, Zhi Chen, Zi Huang
To address these problems, we investigate domain adaptive semantic segmentation without source data, which assumes that the model is pre-trained on the source domain and then adapted to the target domain without further access to the source data.
no code implementations • 6 Oct 2021 • Fuming You, Jingjing Li, Zhou Zhao
A previous solution is test-time normalization, which substitutes the source statistics in BN layers with the target batch statistics.
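To make the statistic substitution concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single scalar channel, and the names `batch_norm`, `batch_stats`, `source_mean`, `source_var`, and `target_batch` are hypothetical, as are the numeric values. It contrasts normalizing a target batch with the stored source statistics against normalizing it with statistics computed from the target batch itself.

```python
import math

def batch_norm(batch, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a 1-D batch of scalar activations with the given statistics."""
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

def batch_stats(batch):
    """Return the mean and biased variance of the current batch."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return mean, var

# Hypothetical running statistics stored in a BN layer after source training.
source_mean, source_var = 0.0, 1.0

# Hypothetical target-domain activations whose distribution has shifted.
target_batch = [4.0, 6.0, 8.0, 10.0]

# Standard inference: reuse the source statistics on target data.
out_source = batch_norm(target_batch, source_mean, source_var)

# Test-time normalization: substitute statistics of the target batch.
target_mean, target_var = batch_stats(target_batch)
out_target = batch_norm(target_batch, target_mean, target_var)
```

With the source statistics, the normalized activations stay far from zero mean because the target data is shifted; with the target batch statistics, they are re-centered around zero with roughly unit variance.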