Search Results for author: Chenhang Cui

Found 12 papers, 9 papers with code

Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

1 code implementation • 18 Nov 2024 • Chenhang Cui, Gelei Deng, An Zhang, Jingnan Zheng, Yicong Li, Lianli Gao, Tianwei Zhang, Tat-Seng Chua

Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications.

Response Generation

Dual-Optimized Adaptive Graph Reconstruction for Multi-View Graph Clustering

no code implementations • 30 Oct 2024 • Zichen Wen, Tianyi Wu, Yazhou Ren, Yawen Ling, Chenhang Cui, Xiaorong Pu, Lifang He

It mainly aims to reconstruct the graph structure adapted to traditional GNNs to deal with heterophilous graph issues while maintaining the advantages of traditional GNNs.

Clustering, Graph Clustering, +1

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment

no code implementations • 18 Oct 2024 • Chenhang Cui, An Zhang, Yiyang Zhou, Zhaorun Chen, Gelei Deng, Huaxiu Yao, Tat-Seng Chua

The recent advancements in large language models (LLMs) and pre-trained vision models have accelerated the development of vision-language large models (VLLMs), enhancing the interaction between visual and linguistic modalities.

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

1 code implementation • 14 Oct 2024 • Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, Wenhao Zheng, Zhaorun Chen, Chenhang Cui, Mingyu Ding, Linjie Li, Lijuan Wang, Huaxiu Yao

Extensive experiments demonstrate the effectiveness of our benchmark and metrics in providing a comprehensive evaluation of interleaved LVLMs.

Multiple-choice

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

1 code implementation • 5 Jul 2024 • Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

Compared with open-source VLMs, smaller-sized scoring models can provide better feedback regarding text-image alignment and image quality, while VLMs provide more accurate feedback regarding safety and generation bias due to their stronger reasoning capabilities.

Hallucination, Text-to-Image Generation

Calibrated Self-Rewarding Vision Language Models

1 code implementation • 23 May 2024 • Yiyang Zhou, Zhiyuan Fan, Dongjie Cheng, Sihan Yang, Zhaorun Chen, Chenhang Cui, Xiyao Wang, Yun Li, Linjun Zhang, Huaxiu Yao

In the reward modeling, we employ a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on visual input.

Hallucination, Language Modelling, +1

Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

1 code implementation • 18 Feb 2024 • Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

This procedure is not perfect and can cause the model to hallucinate - provide answers that do not accurately reflect the image, even when the core LLM is highly factual and the vision backbone has sufficiently complete representations.

Hallucination, Instruction Following, +1

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

1 code implementation • 27 Nov 2023 • Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie

Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.

Adversarial Robustness, Visual Question Answering (VQA), +1

Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges

1 code implementation • 6 Nov 2023 • Chenhang Cui, Yiyang Zhou, Xinyu Yang, Shirley Wu, Linjun Zhang, James Zou, Huaxiu Yao

To bridge this gap, we introduce a new benchmark, namely, the Bias and Interference Challenges in Visual Language Models (Bingo).

Hallucination

Bright Channel Prior Attention for Multispectral Pedestrian Detection

no code implementations • 22 May 2023 • Chenhang Cui, Jinyu Xie, Yechenhao Yang

The method uses the V-channel of the HSV image of the thermal image as an attention map to trigger the unsupervised auto-encoder for visible light images, which gradually emphasizes pedestrian features across layers.

Image Enhancement, object-detection, +2

Deep Multi-View Subspace Clustering with Anchor Graph

1 code implementation • 11 May 2023 • Chenhang Cui, Yazhou Ren, Jingyu Pu, Xiaorong Pu, Lifang He

To significantly reduce the complexity, we construct an anchor graph with small size for each view.

Clustering, Contrastive Learning, +1
