TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space

27 Feb 2024 · Shaolei Zhang, Tian Yu, Yang Feng

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks. However, they sometimes produce hallucinations, in particular generating untruthful responses even when they possess the correct knowledge. In this paper, we propose TruthX, an inference-time method that elicits the truthfulness of LLMs by editing their internal representations in a truthful space. TruthX employs an auto-encoder to map an LLM's representations into semantic and truthful latent spaces, respectively, and applies contrastive learning to identify a truthful editing direction within the truthful space. During inference, TruthX edits the LLM's internal representations in the truthful space, effectively enhancing the truthfulness of the LLM. Experiments show that TruthX improves the truthfulness of 13 advanced LLMs by an average of 20% on the TruthfulQA benchmark. Further analyses suggest that the truthful space acquired by TruthX plays a pivotal role in controlling whether the LLM produces truthful or hallucinatory responses.
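The abstract describes an inference-time edit: hidden states are projected into a truthful latent space, shifted along a learned direction, and the shift is mapped back onto the LLM's representations. The following PyTorch sketch illustrates that idea under stated assumptions; the module names, dimensions, and the simple residual edit rule are illustrative and do not reproduce the authors' released implementation or training procedure (e.g., how contrastive learning derives the direction).

```python
# Minimal sketch of latent-space editing of LLM hidden states.
# All module names, sizes, and the edit rule are assumptions for illustration,
# not the authors' actual TruthX implementation.
import torch
import torch.nn as nn


class TruthfulEditor(nn.Module):
    def __init__(self, hidden_dim: int, latent_dim: int):
        super().__init__()
        # Shared encoder plus two projection heads: one for semantic content,
        # one for the "truthful" latent space (assumed architecture).
        self.encoder = nn.Linear(hidden_dim, latent_dim)
        self.semantic_head = nn.Linear(latent_dim, latent_dim)
        self.truthful_head = nn.Linear(latent_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, hidden_dim)
        # Editing direction in the truthful space; in the paper this is
        # identified via contrastive learning on truthful vs. hallucinated samples.
        self.truthful_direction = nn.Parameter(torch.randn(latent_dim))

    def edit(self, hidden_states: torch.Tensor, strength: float = 1.0) -> torch.Tensor:
        """Shift hidden states along the truthful direction at inference time."""
        z = self.encoder(hidden_states)
        z_truth = self.truthful_head(z)
        direction = self.truthful_direction / self.truthful_direction.norm()
        z_edited = z_truth + strength * direction        # move toward "truthful"
        # Apply the edit as a residual on the original hidden states.
        delta = self.decoder(z_edited) - self.decoder(z_truth)
        return hidden_states + delta


# Usage: edit a batch of hidden states taken from one transformer layer.
editor = TruthfulEditor(hidden_dim=4096, latent_dim=1024)
h = torch.randn(2, 16, 4096)            # (batch, seq_len, hidden_dim)
h_edited = editor.edit(h, strength=2.0)
print(h_edited.shape)                   # torch.Size([2, 16, 4096])
```

In practice such an editor would be hooked into selected attention and feed-forward layers of the LLM during generation, with the editing strength controlling the trade-off between truthfulness and generative quality.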


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Question Answering | TruthfulQA | Mistral-7B-Instruct-v0.2 + TruthX | MC1 | 0.56 | # 2 |
| Question Answering | TruthfulQA | Mistral-7B-Instruct-v0.2 + TruthX | MC2 | 0.75 | # 1 |
| Question Answering | TruthfulQA | LLaMa-2-7B-Chat + TruthX | MC1 | 0.54 | # 3 |
| Question Answering | TruthfulQA | LLaMa-2-7B-Chat + TruthX | MC2 | 0.74 | # 2 |
