Search Results for author: Qing-Guo Chen

Found 15 papers, 6 papers with code

Evaluating Image Caption via Cycle-consistent Text-to-Image Generation

no code implementations · 7 Jan 2025 · Tianyu Cui, Jinbin Bai, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ye Shi

Recent research has revealed that the modality gap generally exists in the representation of contrastive learning-based multi-modal systems, undermining the reliability of cross-modality metrics like CLIPScore.

Contrastive Learning, Diversity (+2 more)

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs

no code implementations · 6 Jan 2025 · Hui Sun, Shiyin Lu, Huanyu Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ming Li

Existing methods, such as uniform frame sampling and query-frame matching, do not capture all of these principles.

Diversity

UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation

no code implementations · 25 Dec 2024 · Lunhao Duan, Shanshan Zhao, Wenjun Yan, Yinglun Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Mingming Gong, Gui-Song Xia

Recently, text-to-image generation models have achieved remarkable advancements, particularly with diffusion models facilitating high-quality image synthesis from textual descriptions.

Text-to-Image Generation

OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions

no code implementations · 9 Dec 2024 · Yi-Kai Zhang, Xu-Xiang Zhong, Shiyin Lu, Qing-Guo Chen, De-Chuan Zhan, Han-Jia Ye

The rapid advancements in Large Language Models (LLMs) have significantly expanded their applications, ranging from multilingual support to domain-specific tasks and multimodal integration.

Benchmarking, Language Modeling (+2 more)

PEMF-VVTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm

no code implementations · 4 Dec 2024 · Tianyu Chang, Xiaohao Chen, Zhichao Wei, Xuanpu Zhang, Qing-Guo Chen, Weihua Luo, Xun Yang

Then, based on the pre-acquired sparse frame-cloth and frame-frame point alignments, we design the point-enhanced spatial attention (PSA) and point-enhanced temporal attention (PTA) to further improve the try-on accuracy and video coherence of the mask-free model.

Virtual Try-on

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

1 code implementation · 10 Oct 2024 · Jinbin Bai, Tian Ye, Wei Chow, Enxin Song, Qing-Guo Chen, Xiangtai Li, Zhen Dong, Lei Zhu, Shuicheng Yan

We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL.

Feature Compression, Image Generation

Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

no code implementations · 11 Jun 2024 · Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang

In this study, we propose an inference trajectory optimization framework based on the preference data extracted from decision trees to address this limitation.

Wings: Learning Multimodal LLMs without Text-only Forgetting

2 code implementations · 5 Jun 2024 · Yi-Kai Zhang, Shiyin Lu, Yang Li, Yanqing Ma, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

Initially, image and text inputs are aligned with visual learners operating alongside the main attention, balancing focus on visual elements.

Question Answering, Visual Question Answering

Parrot: Multilingual Visual Instruction Tuning

2 code implementations · 4 Jun 2024 · Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

In this paper, we introduce Parrot, a novel method that utilizes textual guidance to drive visual token alignment at the language level.

Ovis: Structural Embedding Alignment for Multimodal Large Language Model

2 code implementations · 31 May 2024 · Shiyin Lu, Yang Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Han-Jia Ye

However, the misalignment between two embedding strategies in MLLMs (the structural textual embeddings based on an embedding look-up table and the continuous embeddings generated directly by the vision encoder) poses challenges for a more seamless fusion of visual and textual information.

Language Modeling, Multimodal Large Language Model (+1 more)

TAI++: Text as Image for Multi-Label Image Classification by Co-Learning Transferable Prompt

1 code implementation · 11 May 2024 · Xiangyu Wu, Qing-Yuan Jiang, Yang Yang, Yi-Feng Wu, Qing-Guo Chen, Jianfeng Lu

Then, a co-learning strategy with a dual-adapter module is designed to transfer visual knowledge from pseudo-visual prompt to text prompt, enhancing their visual representation abilities.

Diversity, Multi-Label Image Classification (+1 more)

Iterative Memory Network for Long Sequential User Behavior Modeling in Recommender Systems

no code implementations · 29 Sep 2021 · Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qing-Guo Chen, Bing Wang

Extensive empirical studies show that our method outperforms various state-of-the-art sequential modeling methods on both public and industrial datasets for long sequential user behavior modeling.

Recommendation Systems

Multi-label Zero-shot Classification by Learning to Transfer from External Knowledge

no code implementations · 30 Jul 2020 · He Huang, Yuanwei Chen, Wei Tang, Wenhao Zheng, Qing-Guo Chen, Yao Hu, Philip Yu

On the other hand, there is a large semantic gap between seen and unseen classes in the existing multi-label classification datasets.

Classification, General Classification (+4 more)