no code implementations • 23 Apr 2024 • Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue
FlashSpeech can generate speech efficiently in one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation.
1 code implementation • 23 Mar 2024 • Nishant Kumar, Ziyan Tao, Jaikirat Singh, Yang Li, Peiwen Sun, Binghui Zhao, Stefan Gumhold
Image fusion typically employs non-invertible neural networks to merge multiple source images into a single fused image.
no code implementations • 12 Dec 2023 • Peiwen Sun, Yifan Zhang, Zishan Liu, Donghao Chen, Honggang Zhang
Vanilla fusion methods still dominate a large share of mainstream audio-visual tasks.
no code implementations • 9 Sep 2022 • Peiwen Sun, Shanshan Zhang, Zishan Liu, Yougen Yuan, Taotao Zhang, Honggang Zhang, Pengfei Hu
It has already been observed that audio-visual embeddings are more robust than unimodal embeddings for person verification.