1 code implementation • 12 Nov 2024 • Yiyang Ma, Xingchao Liu, Xiaokang Chen, Wen Liu, Chengyue Wu, Zhiyu Wu, Zizheng Pan, Zhenda Xie, Haowei Zhang, Xingkai Yu, Liang Zhao, Yisong Wang, Jiaying Liu, Chong Ruan
To further improve the performance of our unified model, we adopt two key strategies: (i) decoupling the understanding and generation encoders, and (ii) aligning their representations during unified training.
1 code implementation • 17 Oct 2024 • Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo
In this paper, we introduce Janus, an autoregressive framework that unifies multimodal understanding and generation.
no code implementations • 16 Sep 2024 • Lehong Wu, Lilang Lin, Jiahang Zhang, Yiyang Ma, Jiaying Liu
For the first time, we leverage diffusion models as effective skeleton representation learners.
no code implementations • 7 Apr 2024 • Yiyang Ma, Wenhan Yang, Jiaying Liu
We build a diffusion model and design a novel paradigm that combines it with an end-to-end decoder, where the decoder is responsible for transmitting the privileged information extracted at the encoder side.
1 code implementation • 25 Jan 2024 • Jialu Sui, Yiyang Ma, Wenhan Yang, Xiaokang Zhang, Man-on Pun, Jiaying Liu
The presence of cloud layers severely compromises the quality and effectiveness of optical remote sensing (RS) images.
no code implementations • 24 May 2023 • Yiyang Ma, Huan Yang, Wenhan Yang, Jianlong Fu, Jiaying Liu
Diffusion models, a class of powerful generative models, have produced impressive results on image super-resolution (SR) tasks.
no code implementations • 16 Mar 2023 • Yiyang Ma, Huan Yang, Wenjing Wang, Jianlong Fu, Jiaying Liu
Language-guided image generation has recently achieved great success through the use of diffusion models.
1 code implementation • CVPR 2023 • Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo
To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion) with two coupled denoising autoencoders.
1 code implementation • 7 Sep 2022 • Yiyang Ma, Huan Yang, Bei Liu, Jianlong Fu, Jiaying Liu
To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) that leverages two powerful pre-trained models, CLIP and StyleGAN.
no code implementations • 27 Jul 2022 • Shixing Yu, Yiyang Ma, Wenhan Yang, Wei Xiang, Jiaying Liu
Extensive qualitative and quantitative evaluations, as well as ablation studies, demonstrate that, by introducing meta-learning into our framework in this well-designed way, our method not only outperforms state-of-the-art frame interpolation approaches but also supports interpolation at arbitrary time steps.