no code implementations • 10 Oct 2023 • Song Wen, Guian Fang, Renrui Zhang, Peng Gao, Hao Dong, Dimitris Metaxas
However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately align with input texts describing multiple objects, variable attributes, and intricate spatial relationships.
2 code implementations • 7 Sep 2023 • Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao
During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder.
1 code implementation • 8 Jun 2023 • Ligong Han, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei Ren, Ruijiang Gao, Anastasis Stathopoulos, Xiaoxiao He, Yuxiao Chen, Di Liu, Qilong Zhangli, Jindong Jiang, Zhaoyang Xia, Akash Srivastava, Dimitris Metaxas
Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales, enabling real image editing with cross-attention control.
2 code implementations • CVPR 2023 • Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S. -H. Gary Chan
To achieve faster networks, we revisit popular operators and demonstrate that such low FLOPS is mainly due to frequent memory access of the operators, especially the depthwise convolution.
no code implementations • 21 Mar 2022 • Di Liu, Yunhe Gao, Qilong Zhangli, Ligong Han, Xiaoxiao He, Zhaoyang Xia, Song Wen, Qi Chang, Zhennan Yan, Mu Zhou, Dimitris Metaxas
Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis.
no code implementations • 6 Mar 2022 • Qilong Zhangli, Jingru Yi, Di Liu, Xiaoxiao He, Zhaoyang Xia, Qi Chang, Ligong Han, Yunhe Gao, Song Wen, Haiming Tang, He Wang, Mu Zhou, Dimitris Metaxas
Top-down instance segmentation framework has shown its superiority in object detection compared to the bottom-up framework.
no code implementations • 20 May 2021 • Haoyue Bai, Song Wen, S. -H. Gary Chan
The classification branch extracts global group priors by learning correlations among image clusters.
no code implementations • 12 Jan 2021 • Jierun Chen, Song Wen, S. -H. Gary Chan
In this paper, we propose and study Wild-JDD, a novel learning framework for joint demosaicking and denoising in the wild.
1 code implementation • 9 Sep 2019 • Haoyue Bai, Song Wen, S. -H. Gary Chan
Designing a general crowd counting algorithm applicable to a wide range of crowd images is challenging, mainly due to the possibly large variation in object scales and the presence of many isolated small clusters.