Hallucination Evaluation
20 papers with code • 0 benchmarks • 3 datasets
Evaluate the ability of LLMs to generate text free of hallucinations, or assess their capability to recognize hallucinations.
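As a rough illustration of the recognition side of this task, the sketch below scores how often a model correctly flags answers that are unsupported by a source document. The dataset fields and the `ask_model` call are assumptions for illustration, not any particular benchmark's API.

```python
# Minimal sketch of a hallucination-recognition evaluation loop (assumed data format).

def ask_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under evaluation."""
    raise NotImplementedError

def recognition_accuracy(samples) -> float:
    """Each sample pairs a source document with a candidate answer and a gold
    label: True if the answer is hallucinated, False otherwise."""
    correct = 0
    for s in samples:
        prompt = (
            "Document:\n" + s["document"] + "\n\n"
            "Answer:\n" + s["answer"] + "\n\n"
            "Does the answer contain information not supported by the document? "
            "Reply Yes or No."
        )
        predicted_hallucinated = ask_model(prompt).strip().lower().startswith("yes")
        correct += int(predicted_hallucinated == s["is_hallucinated"])
    return correct / len(samples)
```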
Most implemented papers
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs.
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Large language models (LLMs), such as ChatGPT, are prone to generating hallucinations, i.e., content that conflicts with the source or cannot be verified against factual knowledge.
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families.
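For intuition, here is a minimal sketch of contrastive-decoding-style logit adjustment in the spirit of ICD, where a factual model's logits are contrasted against those of a model induced to hallucinate. The weighting scheme and the toy logits are illustrative assumptions, not the paper's exact formulation.

```python
# Contrastive-decoding-style sketch: penalize tokens favored by a hallucination-prone model.
import numpy as np

def contrastive_logits(base_logits: np.ndarray,
                       induced_logits: np.ndarray,
                       alpha: float = 1.0) -> np.ndarray:
    """Amplify the base model and subtract the hallucination-induced model."""
    return (1.0 + alpha) * base_logits - alpha * induced_logits

# Toy usage over a 5-token vocabulary.
base = np.array([2.0, 1.0, 0.5, 0.1, -1.0])      # factual model
induced = np.array([0.5, 2.5, 0.4, 0.1, -1.0])   # hallucination-prone model
adjusted = contrastive_logits(base, induced, alpha=0.5)
next_token = int(np.argmax(adjusted))
```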
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
This motivates the development of AutoHallusion, the first automated benchmark generation approach that employs several key strategies to create a diverse range of hallucination examples.
MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models
Large language models (LLMs) have achieved remarkable performance in natural language understanding and generation tasks.
Evaluation and Analysis of Hallucination in Large Vision-Language Models
In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.
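As a rough sketch of an LLM-as-judge setup like the one HaELM describes, the snippet below estimates a hallucination rate by asking a judge model whether each response conflicts with a reference description of the image. The prompt wording and the `judge_llm` call are assumptions, not the framework's actual interface.

```python
# LLM-as-judge sketch: flag responses that conflict with a reference description.

def judge_llm(prompt: str) -> str:
    """Placeholder for a call to the judge model."""
    raise NotImplementedError

def hallucination_rate(examples) -> float:
    """`examples` holds (reference_description, lvlm_response) pairs."""
    flagged = 0
    for reference, response in examples:
        prompt = (
            "Reference description of the image:\n" + reference + "\n\n"
            "Model response:\n" + response + "\n\n"
            "Does the response describe anything that conflicts with the reference? "
            "Reply Yes or No."
        )
        flagged += int(judge_llm(prompt).strip().lower().startswith("yes"))
    return flagged / len(examples)
```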
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages.
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.
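For a sense of what LLM-free evaluation can look like, the sketch below flags objects mentioned in a generated caption that are absent from the image annotations. The object vocabulary and the substring matching rule are simplifications for illustration, not AMBER's actual pipeline.

```python
# LLM-free object-hallucination check: compare mentioned objects with annotated objects.

OBJECT_VOCAB = {"dog", "cat", "person", "car", "bicycle", "tree"}  # assumed vocabulary

def hallucinated_objects(response: str, ground_truth_objects: set) -> set:
    """Objects mentioned in the response that are absent from the annotations."""
    mentioned = {obj for obj in OBJECT_VOCAB if obj in response.lower()}
    return mentioned - ground_truth_objects

# Toy usage: the caption mentions a dog that is not annotated in the image.
caption = "A person rides a bicycle past a dog."
annotated = {"person", "bicycle", "tree"}
print(hallucinated_objects(caption, annotated))  # {'dog'}
```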
Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate.
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.