no code implementations • 7 Jan 2025 • Tianyu Cui, Jinbin Bai, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ye Shi
Recent research has revealed that the modality gap generally exists in the representation of contrastive learning-based multi-modal systems, undermining the reliability of cross-modality metrics like CLIPScore.
no code implementations • 6 Jan 2025 • Hui Sun, Shiyin Lu, Huanyu Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ming Li
Existing methods, such as uniform frame sampling and query-frame matching, do not capture all of these principles.
no code implementations • 25 Dec 2024 • Lunhao Duan, Shanshan Zhao, Wenjun Yan, Yinglun Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Mingming Gong, Gui-Song Xia
Recently, text-to-image generation models have achieved remarkable advancements, particularly with diffusion models facilitating high-quality image synthesis from textual descriptions.
no code implementations • 9 Dec 2024 • Yi-Kai Zhang, Xu-Xiang Zhong, Shiyin Lu, Qing-Guo Chen, De-Chuan Zhan, Han-Jia Ye
The rapid advancements in Large Language Models (LLMs) have significantly expanded their applications, ranging from multilingual support to domain-specific tasks and multimodal integration.
no code implementations • 4 Dec 2024 • Tianyu Chang, Xiaohao Chen. Zhichao Wei, Xuanpu Zhang, Qing-Guo Chen, Weihua Luo, Xun Yang
Then, based on the pre-acquired sparse frame-cloth and frame-frame point alignments, we design the point-enhanced spatial attention (PSA) and point-enhanced temporal attention (PTA) to further improve the try-on accuracy and video coherence of the mask-free model.
1 code implementation • 10 Oct 2024 • Jinbin Bai, Tian Ye, Wei Chow, Enxin Song, Qing-Guo Chen, Xiangtai Li, Zhen Dong, Lei Zhu, Shuicheng Yan
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL.
no code implementations • 11 Jun 2024 • Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang
In this study, we propose an inference trajectory optimization framework based on the preference data extracted from decision trees to address this limitation.
2 code implementations • 5 Jun 2024 • Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye
Initially, image and text inputs are aligned with visual learners operating alongside the main attention, balancing focus on visual elements.
2 code implementations • 4 Jun 2024 • Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye
In this paper, we introduce Parrot, a novel method that utilizes textual guidance to drive visual token alignment at the language level.
2 code implementations • 31 May 2024 • Shiyin Lu, Yang Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Han-Jia Ye
However, the misalignment between two embedding strategies in MLLMs -- the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly by the vision encoder -- makes challenges for a more seamless fusion of visual and textual information.
1 code implementation • 11 May 2024 • Xiangyu Wu, Qing-Yuan Jiang, Yang Yang, Yi-Feng Wu, Qing-Guo Chen, Jianfeng Lu
Then, a co-learning strategy with a dual-adapter module is designed to transfer visual knowledge from pseudo-visual prompt to text prompt, enhancing their visual representation abilities.
1 code implementation • 8 Aug 2022 • Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qing-Guo Chen, Bing Wang
SAM supports efficient training and real-time inference for user behavior sequences with lengths on the scale of thousands.
no code implementations • 29 Sep 2021 • Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qing-Guo Chen, Bing Wang
Extensive empirical studies show that our method outperforms various state-of-the-art sequential modeling methods on both public and industrial datasets for long sequential user behavior modeling.
no code implementations • 30 Jul 2020 • He Huang, Yuanwei Chen, Wei Tang, Wenhao Zheng, Qing-Guo Chen, Yao Hu, Philip Yu
On the other hand, there is a large semantic gap between seen and unseen classes in the existing multi-label classification datasets.
no code implementations • 8 Jan 2020 • Shu-Ting Shi, Wenhao Zheng, Jun Tang, Qing-Guo Chen, Yao Hu, Jianke Zhu, Ming Li
Click-through rate (CTR) prediction is an essential task in industrial applications such as video recommendation.