Search Results for author: Xiaoyong Zhu

Found 5 papers, 0 papers with code

"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

no code implementations • 17 Feb 2025 • Jihao Gu, Yingyao Wang, Pi Bu, Chen Wang, ZiMing Wang, Tengtao Song, Donglai Wei, Jiale Yuan, Yingxiu Zhao, Yancheng He, Shilong Li, Jiaheng Liu, Meng Cao, Jun Song, Yingshui Tan, Xiang Li, Wenbo Su, Zhicheng Zheng, Xiaoyong Zhu, Bo Zheng

The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it difficult to fully assess these models' knowledge capacity and reliability.

Object Recognition • Question Answering • +1

Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models

no code implementations • 17 Feb 2025 • Yingshui Tan, Yilei Jiang, Yanshi Li, Jiaheng Liu, Xingyuan Bu, Wenbo Su, Xiangyu Yue, Xiaoyong Zhu, Bo Zheng

Fine-tuning large language models (LLMs) based on human preferences, commonly achieved through reinforcement learning from human feedback (RLHF), has been effective in improving their performance.

Safety Alignment
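
The entry above concerns fine-tuning LLMs with RLHF while balancing helpfulness against safety. As a purely illustrative aid, the toy Python sketch below blends a helpfulness score and a safety score into a single reward; the function name, weighting scheme, and threshold are assumptions made for this sketch and are not taken from the paper.

```python
# Toy illustration of a helpfulness-safety trade-off in an RLHF-style reward.
# The blending scheme and names are illustrative assumptions, not the paper's method.
def combined_reward(helpfulness: float, safety: float,
                    safety_weight: float = 0.5, safety_threshold: float = 0.0) -> float:
    """Blend a helpfulness score with a safety score.

    Responses whose safety score falls below the threshold are penalized
    regardless of how helpful they are, which is one simple way to keep
    a policy from trading safety away for helpfulness.
    """
    if safety < safety_threshold:
        return safety  # hard penalty: unsafe responses get a negative reward
    return (1.0 - safety_weight) * helpfulness + safety_weight * safety

# Example: a very helpful but unsafe response scores worse than a
# moderately helpful, safe one.
print(combined_reward(helpfulness=0.9, safety=-0.8))  # -0.8
print(combined_reward(helpfulness=0.5, safety=0.6))   # 0.55
```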

PSA-VLM: Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment

no code implementations • 18 Nov 2024 • Zhendong Liu, Yuanbi Nie, Yingshui Tan, Jiaheng Liu, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng

However, recent research shows that the visual modality in VLMs is highly vulnerable, allowing attackers to bypass the safety alignment of the underlying LLM through visually transmitted content and launch harmful attacks.

Language Modeling • Language Modelling • +1

Safety Alignment for Vision Language Models

no code implementations • 22 May 2024 • Zhendong Liu, Yuanbi Nie, Yingshui Tan, Xiangyu Yue, Qiushi Cui, Chongjun Wang, Xiaoyong Zhu, Bo Zheng

To address this issue, we enhance the visual-modality safety alignment of existing VLMs by adding safety modules, including a safety projector, safety tokens, and a safety head, through a two-stage training process, effectively improving the model's defense against risky images.

Red Teaming • Safety Alignment
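
The abstract above names three added components: a safety projector, safety tokens, and a safety head, trained in two stages. Below is a minimal, hypothetical PyTorch sketch of how such modules could be arranged around features from a frozen vision encoder; the class name, dimensions, and pooling choice are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: wiring a safety projector, learnable safety tokens,
# and a safety head around a VLM's visual features. Names and shapes are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class SafetyModules(nn.Module):
    def __init__(self, vision_dim=1024, llm_dim=4096, num_safety_tokens=8, num_risk_classes=2):
        super().__init__()
        # Safety projector: maps vision features into the LLM embedding space.
        self.safety_projector = nn.Linear(vision_dim, llm_dim)
        # Learnable safety tokens prepended to the projected visual sequence.
        self.safety_tokens = nn.Parameter(torch.randn(num_safety_tokens, llm_dim) * 0.02)
        # Safety head: classifies whether the visual content is risky.
        self.safety_head = nn.Linear(llm_dim, num_risk_classes)

    def forward(self, vision_features):
        # vision_features: (batch, num_patches, vision_dim) from a frozen vision encoder
        projected = self.safety_projector(vision_features)             # (B, P, llm_dim)
        tokens = self.safety_tokens.unsqueeze(0).expand(projected.size(0), -1, -1)
        sequence = torch.cat([tokens, projected], dim=1)               # prepend safety tokens
        risk_logits = self.safety_head(sequence.mean(dim=1))           # pooled risk prediction
        return sequence, risk_logits

# Usage: the returned sequence would be fed to the LLM alongside text embeddings,
# while risk_logits could supervise a safety-classification loss during training.
features = torch.randn(2, 256, 1024)
seq, logits = SafetyModules()(features)
print(seq.shape, logits.shape)  # torch.Size([2, 264, 4096]) torch.Size([2, 2])
```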
