RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models, leaving the community without foundational knowledge of how to build high-quality feedback with open-source MLLMs. In this work, we introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm. RLAIF-V maximally exploits open-source MLLMs from two perspectives: high-quality feedback data generation for preference learning, and self-feedback guidance for inference-time scaling. Extensive experiments on six benchmarks, in both automatic and human evaluation, show that RLAIF-V substantially enhances model trustworthiness at both preference learning and inference time. RLAIF-V 7B reduces object hallucination by 80.7% and overall hallucination by 33.7%. Remarkably, RLAIF-V 12B further reveals the self-alignment potential of open-source MLLMs: the model can learn from its own feedback to achieve super GPT-4V trustworthiness.
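The two stages can be pictured with a short sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: `generate_candidates` and `hallucination_score` are hypothetical helpers standing in for the paper's response-scoring procedure, in which sampled responses are assessed for hallucination by an open-source MLLM against the image.

```python
# Minimal sketch of the two ways RLAIF-V uses open-source feedback.
# generate_candidates and hallucination_score are hypothetical stand-ins,
# passed in as parameters; lower scores mean fewer hallucinations.
from itertools import combinations


def build_preference_pairs(image, question, generate_candidates,
                           hallucination_score, n=10):
    """Stage 1 (preference learning): score sampled responses with an
    open-source labeler and pair them up as preference training data."""
    candidates = generate_candidates(image, question, n=n)
    scored = [(r, hallucination_score(image, question, r)) for r in candidates]
    pairs = []
    for (a, score_a), (b, score_b) in combinations(scored, 2):
        if score_a != score_b:  # a lower hallucination score is preferred
            chosen, rejected = (a, b) if score_a < score_b else (b, a)
            pairs.append({"question": question,
                          "chosen": chosen, "rejected": rejected})
    return pairs


def best_of_n(image, question, generate_candidates,
              hallucination_score, n=8):
    """Stage 2 (inference-time scaling): rerank the model's own samples
    with its own feedback signal and keep the most trustworthy one."""
    candidates = generate_candidates(image, question, n=n)
    return min(candidates, key=lambda r: hallucination_score(image, question, r))
```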

Datasets


Introduced in the Paper:

RLAIF-V Dataset

Used in the Paper:

MMStar
Object HalBench
RLHF-V Dataset
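
For reference, a minimal sketch of loading the released preference data, assuming it is hosted on the Hugging Face Hub under the id "openbmb/RLAIF-V-Dataset" (the exact id and column names are assumptions; check the project page):

```python
# Assumes the Hugging Face Hub id "openbmb/RLAIF-V-Dataset"; verify before use.
from datasets import load_dataset

ds = load_dataset("openbmb/RLAIF-V-Dataset", split="train")
print(ds.column_names)  # inspect the available fields
print(ds[0])            # one preference example
```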

Results from the Paper


Task                       Dataset          Model        Metric              Value  Global Rank
Visual Question Answering  AMBER            RLAIF-V 12B  Accuracy            88     #1
Visual Question Answering  AMBER            RLAIF-V 12B  F1                  90.9   #1
Visual Question Answering  MMHal-Bench      RLAIF-V 7B   Score               3.06   #2
Visual Question Answering  MMHal-Bench      RLAIF-V 7B   Hallucination Rate  29.2   #1
Visual Question Answering  MMHal-Bench      RLAIF-V 12B  Score               3.36   #1
Visual Question Answering  MMHal-Bench      RLAIF-V 12B  Hallucination Rate  29.2   #1
Image Captioning           Object HalBench  RLAIF-V 12B  chair_s             3.3    #3
Image Captioning           Object HalBench  RLAIF-V 12B  chair_i             1.8    #3
Image Captioning           Object HalBench  RLAIF-V 7B   chair_s             8.5    #2
Image Captioning           Object HalBench  RLAIF-V 7B   chair_i             4.3    #2
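
The chair_s and chair_i numbers above are the CHAIR hallucination metrics (Rohrbach et al., 2018): chair_s is the fraction of captions containing at least one hallucinated object, and chair_i is the fraction of all object mentions that are hallucinated. A minimal sketch, assuming object mentions have already been extracted and matched against ground-truth annotations:

```python
# Minimal CHAIR computation; assumes object extraction is already done.
def chair_metrics(samples):
    """samples: list of (mentioned_objects, ground_truth_objects) per caption.

    chair_s: % of captions with at least one hallucinated object.
    chair_i: % of all object mentions that are hallucinated.
    """
    hallucinated_captions = 0
    hallucinated_mentions = 0
    total_mentions = 0
    for mentioned, truth in samples:
        bad = [obj for obj in mentioned if obj not in truth]
        hallucinated_captions += bool(bad)
        hallucinated_mentions += len(bad)
        total_mentions += len(mentioned)
    chair_s = 100 * hallucinated_captions / len(samples)
    chair_i = 100 * hallucinated_mentions / max(total_mentions, 1)
    return chair_s, chair_i


# Example: one clean caption, one that hallucinates "dog".
print(chair_metrics([(["cat", "sofa"], {"cat", "sofa"}),
                     (["cat", "dog"], {"cat"})]))  # -> (50.0, 25.0)
```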

Methods


No methods listed for this paper.