Hallucination Evaluation

20 papers with code • 0 benchmarks • 3 datasets

Evaluate the ability of LLMs to generate hallucination-free text, or assess their capability to recognize hallucinations.
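
The recognition side of the task is typically scored as a binary classification problem: given a source (or reference knowledge) and a model response, a judge decides whether the response is hallucinated, and its labels are compared against human annotations. The sketch below is a generic illustration of that setup, not the protocol of any specific benchmark listed here; the `judge` callable and the toy data are placeholders.

```python
# Minimal sketch of a discrimination-style hallucination evaluation:
# a judge labels each response as hallucinated or not, and we score
# its labels against gold annotations.
from typing import Callable, Dict, List


def evaluate_hallucination_recognition(
    judge: Callable[[str, str], bool],   # judge(knowledge, response) -> True if hallucinated
    samples: List[Dict[str, object]],    # each: {"knowledge", "response", "hallucinated"}
) -> float:
    """Return the accuracy of the judge's hallucination labels."""
    correct = 0
    for s in samples:
        predicted = judge(s["knowledge"], s["response"])
        correct += int(predicted == s["hallucinated"])
    return correct / max(len(samples), 1)


if __name__ == "__main__":
    # Toy judge: flags a response as hallucinated if it is not literally
    # contained in the knowledge string (a stand-in for a real LLM judge).
    toy_judge = lambda knowledge, response: response not in knowledge
    data = [
        {"knowledge": "The Eiffel Tower is in Paris.",
         "response": "The Eiffel Tower is in Paris.", "hallucinated": False},
        {"knowledge": "The Eiffel Tower is in Paris.",
         "response": "The Eiffel Tower is in Rome.", "hallucinated": True},
    ]
    print(f"accuracy = {evaluate_hallucination_recognition(toy_judge, data):.2f}")
```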

Most implemented papers

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

tianyi-lab/hallusionbench CVPR 2024

Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs.

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

RUCAIBox/HaluEval 19 May 2023

Large language models (LLMs), such as ChatGPT, are prone to generating hallucinations, i.e., content that conflicts with the source or cannot be verified against factual knowledge.

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

hiyouga/llama-factory 25 Dec 2023

Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families.
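
As a rough intuition for induce-then-contrast decoding (a generic contrastive-decoding sketch, not the paper's exact formulation): given next-token logits from the original model and from a deliberately hallucination-prone variant, the induced model's preferences are subtracted out before sampling. The weight `alpha` and both logit arrays below are illustrative.

```python
import numpy as np


def contrastive_next_token_logits(base_logits: np.ndarray,
                                  induced_logits: np.ndarray,
                                  alpha: float = 1.0) -> np.ndarray:
    """Down-weight tokens favored by the hallucination-induced model.

    A generic contrastive-decoding score: amplify the base model's logits
    and subtract the induced model's, so tokens the induced model prefers
    (likely hallucinations) lose probability mass.
    """
    return (1.0 + alpha) * base_logits - alpha * induced_logits


# Toy example over a 4-token vocabulary.
base = np.array([2.0, 1.0, 0.5, 0.1])      # original model's logits
induced = np.array([0.2, 2.5, 0.4, 0.1])   # hallucination-induced model's logits
print(contrastive_next_token_logits(base, induced, alpha=1.0).argmax())  # token 0
```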

AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

wuxiyang1996/AutoHallusion 16 Jun 2024

This motivates the development of AutoHallusion, the first automated benchmark generation approach that employs several key strategies to create a diverse range of hallucination examples.

MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models

wyl-willing/MindMap 17 Aug 2023

Large language models (LLMs) have achieved remarkable performance in natural language understanding and generation tasks.

Evaluation and Analysis of Hallucination in Large Vision-Language Models

junyangwang0410/haelm 29 Aug 2023

In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.
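
In broad strokes, an LLM-based evaluator of this kind prompts a judging LLM with a reference description and the model's answer and asks for a hallucination verdict. The snippet below is a generic sketch under that assumption, not HaELM's actual prompts or pipeline; `JUDGE_PROMPT`, `llm_judge_hallucination`, and the stubbed `llm` are all illustrative.

```python
from typing import Callable

# Illustrative judging prompt; a real framework would use its own template.
JUDGE_PROMPT = """You are evaluating a vision-language model's answer.
Reference description of the image: {reference}
Model answer: {answer}
Does the answer mention objects or attributes that contradict or are absent
from the reference description? Reply with "yes" or "no" only."""


def llm_judge_hallucination(reference: str,
                            answer: str,
                            llm: Callable[[str], str]) -> bool:
    """Return True if the judging LLM deems the answer hallucinated."""
    reply = llm(JUDGE_PROMPT.format(reference=reference, answer=answer))
    return reply.strip().lower().startswith("yes")


# Example with a stubbed LLM; in practice `llm` would wrap an actual model call.
stub_llm = lambda prompt: "no"
print(llm_judge_hallucination("A dog sleeping on a red couch.",
                              "A dog is resting on a couch.", stub_llm))
```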

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

yiyangzhou/lure 1 Oct 2023

Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages.

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

junyangwang0410/amber 13 Nov 2023

Despite significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) still face the serious challenge of hallucinations, which can lead to harmful consequences.

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

casszhao/prunehall 15 Nov 2023

Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate.

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

yuqifan1117/hallucidoctor CVPR 2024

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.