Hallucination Evaluation

20 papers with code • 0 benchmarks • 3 datasets

Evaluate the ability of LLMs to generate hallucination-free text, or assess their capability to recognize hallucinations.
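
The recognition side of the task is typically scored as a binary classification problem: given a source (or reference knowledge) and a model response, a judge decides whether the response is hallucinated, and its labels are compared against human annotations. The sketch below is a generic illustration of that setup, not the protocol of any specific benchmark listed here; the `judge` callable and the toy data are placeholders.

```python
# Minimal sketch of a discrimination-style hallucination evaluation:
# a judge labels each response as hallucinated or not, and we score
# its labels against gold annotations.
from typing import Callable, Dict, List


def evaluate_hallucination_recognition(
    judge: Callable[[str, str], bool],   # judge(knowledge, response) -> True if hallucinated
    samples: List[Dict[str, object]],    # each: {"knowledge", "response", "hallucinated"}
) -> float:
    """Return the accuracy of the judge's hallucination labels."""
    correct = 0
    for s in samples:
        predicted = judge(s["knowledge"], s["response"])
        correct += int(predicted == s["hallucinated"])
    return correct / max(len(samples), 1)


if __name__ == "__main__":
    # Toy judge: flags a response as hallucinated if it is not literally
    # contained in the knowledge string (a stand-in for a real LLM judge).
    toy_judge = lambda knowledge, response: response not in knowledge
    data = [
        {"knowledge": "The Eiffel Tower is in Paris.",
         "response": "The Eiffel Tower is in Paris.", "hallucinated": False},
        {"knowledge": "The Eiffel Tower is in Paris.",
         "response": "The Eiffel Tower is in Rome.", "hallucinated": True},
    ]
    print(f"accuracy = {evaluate_hallucination_recognition(toy_judge, data):.2f}")
```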

Most implemented papers

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

tianyi-lab/hallusionbench CVPR 2024

Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs.

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

RUCAIBox/HaluEval 19 May 2023

Large language models (LLMs), such as ChatGPT, are prone to generating hallucinations, i.e., content that conflicts with the source or cannot be verified against factual knowledge.

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

hiyouga/llama-factory 25 Dec 2023

Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families.
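
As a rough intuition for induce-then-contrast decoding (a generic contrastive-decoding sketch, not the paper's exact formulation): given next-token logits from the original model and from a deliberately hallucination-prone variant, the induced model's preferences are subtracted out before sampling. The weight `alpha` and both logit arrays below are illustrative.

```python
import numpy as np


def contrastive_next_token_logits(base_logits: np.ndarray,
                                  induced_logits: np.ndarray,
                                  alpha: float = 1.0) -> np.ndarray:
    """Down-weight tokens favored by the hallucination-induced model.

    A generic contrastive-decoding score: amplify the base model's logits
    and subtract the induced model's, so tokens the induced model prefers
    (likely hallucinations) lose probability mass.
    """
    return (1.0 + alpha) * base_logits - alpha * induced_logits


# Toy example over a 4-token vocabulary.
base = np.array([2.0, 1.0, 0.5, 0.1])      # original model's logits
induced = np.array([0.2, 2.5, 0.4, 0.1])   # hallucination-induced model's logits
print(contrastive_next_token_logits(base, induced, alpha=1.0).argmax())  # token 0
```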

AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

wuxiyang1996/AutoHallusion 16 Jun 2024

This motivates the development of AutoHallusion, the first automated benchmark generation approach that employs several key strategies to create a diverse range of hallucination examples.

MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models

wyl-willing/MindMap 17 Aug 2023

Large language models (LLMs) have achieved remarkable performance in natural language understanding and generation tasks.

Evaluation and Analysis of Hallucination in Large Vision-Language Models

junyangwang0410/haelm 29 Aug 2023

In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.
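
In broad strokes, an LLM-based evaluator of this kind prompts a judging LLM with a reference description and the model's answer and asks for a hallucination verdict. The snippet below is a generic sketch under that assumption, not HaELM's actual prompts or pipeline; `JUDGE_PROMPT`, `llm_judge_hallucination`, and the stubbed `llm` are all illustrative.

```python
from typing import Callable

# Illustrative judging prompt; a real framework would use its own template.
JUDGE_PROMPT = """You are evaluating a vision-language model's answer.
Reference description of the image: {reference}
Model answer: {answer}
Does the answer mention objects or attributes that contradict or are absent
from the reference description? Reply with "yes" or "no" only."""


def llm_judge_hallucination(reference: str,
                            answer: str,
                            llm: Callable[[str], str]) -> bool:
    """Return True if the judging LLM deems the answer hallucinated."""
    reply = llm(JUDGE_PROMPT.format(reference=reference, answer=answer))
    return reply.strip().lower().startswith("yes")


# Example with a stubbed LLM; in practice `llm` would wrap an actual model call.
stub_llm = lambda prompt: "no"
print(llm_judge_hallucination("A dog sleeping on a red couch.",
                              "A dog is resting on a couch.", stub_llm))
```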

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

yiyangzhou/lure 1 Oct 2023

Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages.

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

junyangwang0410/amber 13 Nov 2023

Despite significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) still face the serious challenge of hallucinations, which can lead to harmful consequences.

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

casszhao/prunehall 15 Nov 2023

Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate.

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

yuqifan1117/hallucidoctor CVPR 2024

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.