TruthfulQA

26 papers with code • 0 benchmarks • 0 datasets

TruthfulQA is a benchmark for measuring whether language models give truthful answers to questions that many humans would answer falsely because of misconceptions or false beliefs; it spans categories such as health, law, finance, and politics, and is evaluated in both generation and multiple-choice settings.

Most implemented papers

RLHF Workflow: From Reward Modeling to Online RLHF

RLHFlow/RLHF-Reward-Modeling 13 May 2024

In this technical report, we present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF), which the recent large language model (LLM) literature widely reports to outperform its offline counterpart by a large margin.

TruthfulQA: Measuring How Models Mimic Human Falsehoods

sylinrl/truthfulqa ACL 2022

We crafted questions that some humans would answer falsely due to a false belief or misconception.
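A minimal sketch of how the benchmark might be inspected and scored, assuming the Hugging Face hub copy of the dataset (`truthful_qa`, `multiple_choice` configuration) and a placeholder scoring function; a real evaluation would score each answer choice with the model under test.

```python
# Sketch: load TruthfulQA's multiple-choice split and score a trivial baseline.
# Assumes the Hugging Face hub dataset "truthful_qa" (config "multiple_choice");
# swap in your model's log-likelihood scoring where indicated.
from datasets import load_dataset

ds = load_dataset("truthful_qa", "multiple_choice", split="validation")

def score_answer(question: str, answer: str) -> float:
    # Placeholder scorer: a real evaluation would return the model's
    # log-likelihood of `answer` given `question`.
    return float(len(answer))

correct = 0
for row in ds:
    choices = row["mc1_targets"]["choices"]
    labels = row["mc1_targets"]["labels"]  # 1 marks the truthful answer
    scores = [score_answer(row["question"], c) for c in choices]
    if labels[scores.index(max(scores))] == 1:
        correct += 1

print(f"MC1 accuracy: {correct / len(ds):.3f}")
```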

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

voidism/dola 7 Sep 2023

Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining.
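The mechanism named in the title, contrasting the final ("mature") layer's next-token distribution with an earlier ("premature") layer's at decoding time, can be sketched roughly as below; the model, layer index, and omission of the paper's adaptive plausibility constraint are all simplifying assumptions.

```python
# Rough sketch of DoLa-style decoding: contrast the final layer's next-token
# distribution with an earlier layer's. Model and layer choice are illustrative;
# projecting an early layer straight through the LM head is a simplification.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper uses LLaMA-family models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def dola_next_token_logits(input_ids, premature_layer=6):
    with torch.no_grad():
        out = model(input_ids)
    hidden = out.hidden_states                      # tuple: embeddings + each layer
    lm_head = model.get_output_embeddings()
    final_logits = lm_head(hidden[-1][:, -1, :])
    early_logits = lm_head(hidden[premature_layer][:, -1, :])
    # Contrast: log p_final - log p_early, in the spirit of contrastive decoding.
    return torch.log_softmax(final_logits, -1) - torch.log_softmax(early_logits, -1)

ids = tok("The capital of France is", return_tensors="pt").input_ids
print(tok.decode(dola_next_token_logits(ids).argmax(-1)))
```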

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

hiyouga/llama-factory 25 Dec 2023

Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families.
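An induce-then-contrast decoding (ICD) step can be sketched as penalizing tokens favored by a deliberately hallucination-prone copy of the model; the checkpoints and the penalty weight alpha below are same-vocabulary stand-ins, not the paper's released models.

```python
# Sketch of induce-then-contrast decoding: down-weight next-token choices that a
# hallucination-induced model prefers. "distilgpt2" merely stands in for the
# induced model here; alpha is an illustrative contrast strength.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
factual = AutoModelForCausalLM.from_pretrained("gpt2")               # original model
hallucinating = AutoModelForCausalLM.from_pretrained("distilgpt2")   # stand-in for the induced model

def icd_next_token_logits(input_ids, alpha=1.0):
    with torch.no_grad():
        lp_fact = torch.log_softmax(factual(input_ids).logits[:, -1, :], -1)
        lp_hall = torch.log_softmax(hallucinating(input_ids).logits[:, -1, :], -1)
    # Amplify what the factual model prefers relative to the induced model.
    return lp_fact - alpha * lp_hall

ids = tok("The first person to walk on the Moon was", return_tensors="pt").input_ids
print(tok.decode(icd_next_token_logits(ids).argmax(-1)))
```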

Tuning Language Models by Proxy

alisawuffles/proxy-tuning 16 Jan 2024

Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors.
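Proxy tuning steers a large untuned model at decoding time with the logit offset between a small tuned "expert" and its small untuned "anti-expert". The sketch below only shows that arithmetic; the checkpoints are same-vocabulary stand-ins, since in practice the expert is a tuned version of the anti-expert.

```python
# Sketch of proxy tuning: base logits plus the (expert - anti-expert) offset from a
# pair of small models. Checkpoints below are illustrative stand-ins that share a
# vocabulary; the real expert would be a tuned variant of the anti-expert.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
base = AutoModelForCausalLM.from_pretrained("gpt2-medium")    # large, untuned
expert = AutoModelForCausalLM.from_pretrained("distilgpt2")   # stand-in for the small tuned model
antiexpert = AutoModelForCausalLM.from_pretrained("gpt2")     # small untuned counterpart

def proxy_tuned_logits(input_ids):
    with torch.no_grad():
        b = base(input_ids).logits[:, -1, :]
        e = expert(input_ids).logits[:, -1, :]
        a = antiexpert(input_ids).logits[:, -1, :]
    return b + (e - a)   # apply the small models' tuning delta at decoding time

ids = tok("Question: Is the Earth flat?\nAnswer:", return_tensors="pt").input_ids
print(tok.decode(proxy_tuned_logits(ids).argmax(-1)))
```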

Measuring Reliability of Large Language Models through Semantic Consistency

harshraj172/measuring-reliability-of-llms 10 Nov 2022

While large pretrained language models (PLMs) demonstrate incredible fluency and performance on many natural language tasks, recent work has shown that well-performing PLMs are very sensitive to the prompts that are fed into them.
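One way to quantify that sensitivity is to ask semantically equivalent paraphrases of a question and measure pairwise agreement of the answers. The sketch below uses exact-match agreement and a placeholder model call; the paper measures semantic rather than purely lexical consistency.

```python
# Sketch of a consistency check across paraphrased prompts: query the model with
# semantically equivalent questions and report pairwise answer agreement.
# `answer` is a placeholder for the language model under test; exact match is a
# simplification of semantic equivalence.
from itertools import combinations

def answer(prompt: str) -> str:
    # Placeholder: call the language model under test here.
    return "Paris"

paraphrases = [
    "What is the capital of France?",
    "France's capital city is called what?",
    "Which city serves as the capital of France?",
]

answers = [answer(p) for p in paraphrases]
pairs = list(combinations(answers, 2))
consistency = sum(a == b for a, b in pairs) / len(pairs)
print(f"Pairwise consistency: {consistency:.2f}")
```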

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

likenneth/honest_llama NeurIPS 2023

This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark.
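The intervention itself shifts hidden activations along truth-related directions during decoding. A minimal sketch of the hook mechanics is below; in the paper the directions come from linear probes on selected attention heads, whereas here a random vector, a single transformer block, and the strength alpha are purely illustrative.

```python
# Sketch of inference-time-intervention mechanics: add a "truthful direction" to a
# chosen layer's hidden states during the forward pass. Direction, layer, and alpha
# are illustrative; the paper derives directions from probes on attention heads.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

layer = model.transformer.h[6]                 # illustrative layer choice
direction = torch.randn(model.config.n_embd)
direction = direction / direction.norm()
alpha = 5.0                                    # intervention strength (illustrative)

def shift_hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = layer.register_forward_hook(shift_hook)
ids = tok("The Great Wall of China is visible from space:", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```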

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment

declare-lab/red-instruct 18 Aug 2023

In this work, we propose a new safety evaluation benchmark RED-EVAL that carries out red-teaming.

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

ucsc-vlaa/sight-beyond-text 13 Sep 2023

Multi-modal large language models (MLLMs) are trained on top of large language models (LLMs), with an enhanced capability to comprehend multi-modal inputs and generate textual responses.

RAIN: Your Language Models Can Align Themselves without Finetuning

SafeAILab/RAIN 13 Sep 2023

We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting.
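That self-evaluate-and-rewind idea can be sketched as a loop that samples a candidate response, lets the same model judge it, and rewinds (discards and resamples) when the judgment falls below a threshold. The generation and scoring functions below are placeholders, and RAIN itself searches over token segments rather than whole responses.

```python
# Sketch of a self-evaluate-and-rewind loop in the spirit of RAIN. Both
# `generate_candidate` and `self_evaluate` are placeholders for calls to the same
# underlying language model; threshold and rewind budget are illustrative.
import random

def generate_candidate(prompt: str) -> str:
    # Placeholder: sample a continuation from the language model.
    return random.choice(["A harmless, helpful answer.", "A problematic answer."])

def self_evaluate(prompt: str, response: str) -> float:
    # Placeholder: have the model judge its own response, e.g. the probability it
    # assigns to "this response is harmless and helpful".
    return 0.9 if "harmless" in response else 0.2

def rain_style_generate(prompt: str, threshold: float = 0.5, max_rewinds: int = 8) -> str:
    best, best_score = "", float("-inf")
    for _ in range(max_rewinds):
        candidate = generate_candidate(prompt)
        score = self_evaluate(prompt, candidate)
        if score >= threshold:
            return candidate                   # accept: judged aligned enough
        if score > best_score:                 # otherwise rewind and resample
            best, best_score = candidate, score
    return best

print(rain_style_generate("How do I stay safe online?"))
```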