no code implementations • 22 Mar 2024 • Zhichao Wei, Qingkun Su, Long Qin, Weizhi Wang
The CLS embeddings serve two purposes: they augment the text embeddings and, together with the patch embeddings, are used to derive a small number of detail-rich subject embeddings. Both are efficiently integrated into the diffusion model through a well-designed multimodal cross-attention mechanism.
no code implementations • 14 Jul 2023 • Zuozhuo Dai, Fangtao Shao, Qingkun Su, Zilong Dong, Siyu Zhu
In the second stage, we propose a novel decoupled video-text cross-attention module to capture fine-grained multimodal information in the spatial and temporal dimensions.
no code implementations • 17 Oct 2020 • Rakesh Shrestha, Zhiwen Fan, Qingkun Su, Zuozhuo Dai, Siyu Zhu, Ping Tan
Deep-learning-based 3D shape generation methods generally use latent features extracted from color images to encode the semantics of objects and guide the shape generation process.
no code implementations • 20 Nov 2018 • Lei Li, Changqing Zou, Youyi Zheng, Qingkun Su, Hongbo Fu, Chiew-Lan Tai
To bridge the gap between these two spaces in neural networks, we propose a neural line rasterization module that converts the vector sketch, along with the attention estimated by an RNN, into a bitmap image, which is subsequently consumed by a CNN.
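
As a rough illustration of what converting a vector sketch plus per-point attention into a bitmap involves, the sketch below rasterizes a polyline onto a small grid and writes the interpolated attention weight into each pixel it touches. This is not the paper's neural line rasterization module: it is a plain, non-differentiable rasterizer, and the function name `rasterize_stroke`, its parameters, the unit-square coordinate convention, and the max-blending rule are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), assumed to lie in the unit square [0, 1]


def rasterize_stroke(
    points: Sequence[Point],
    weights: Sequence[float],
    size: int = 8,
    samples_per_segment: int = 16,
) -> List[List[float]]:
    """Rasterize a polyline onto a size x size grid (hypothetical helper).

    Each segment is sampled at evenly spaced positions. A pixel stores the
    largest interpolated attention weight of any sample that lands on it.
    A stroke with fewer than two points has no segments and yields an
    empty canvas.
    """
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")

    canvas = [[0.0] * size for _ in range(size)]
    for i in range(len(points) - 1):
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        w0, w1 = weights[i], weights[i + 1]
        for s in range(samples_per_segment + 1):
            t = s / samples_per_segment
            # Linearly interpolate position and attention along the segment.
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            w = w0 + t * (w1 - w0)
            # Map to pixel indices, clamping samples on or past the border.
            col = min(size - 1, max(0, int(x * size)))
            row = min(size - 1, max(0, int(y * size)))
            canvas[row][col] = max(canvas[row][col], w)
    return canvas


if __name__ == "__main__":
    # A horizontal stroke across the middle of a 4 x 4 canvas.
    for row in rasterize_stroke([(0.0, 0.5), (1.0, 0.5)], [1.0, 1.0], size=4):
        print(row)
```

A module trained end to end would need a differentiable formulation so that gradients can flow from the CNN back to the stroke parameters and the attention; the hard pixel assignment used here does not provide that.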