TruthfulQA
26 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
RLHF Workflow: From Reward Modeling to Online RLHF
We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature.
TruthfulQA: Measuring How Models Mimic Human Falsehoods
We crafted questions that some humans would answer falsely due to a false belief or misconception.
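The benchmark includes a multiple-choice setting scored by the log-likelihood a model assigns to each answer choice. Below is a minimal sketch of that kind of scoring with a Hugging Face causal LM; the model choice, prompt template, and answer options are illustrative, not the official evaluation harness.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; the benchmark itself is model-agnostic.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def answer_logprob(question: str, answer: str) -> float:
    """Sum of log-probabilities the model assigns to `answer` given `question`."""
    prompt_ids = tok(f"Q: {question}\nA:", return_tensors="pt").input_ids
    answer_ids = tok(" " + answer, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, answer_ids], dim=1)
    logprobs = model(ids).logits[:, :-1].log_softmax(-1)
    targets = ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Score only the positions that predict the answer tokens.
    return token_lp[:, -answer_ids.shape[1]:].sum().item()

question = "What happens if you swallow watermelon seeds?"
choices = ["Nothing in particular happens.", "A watermelon grows in your stomach."]
print(max(choices, key=lambda c: answer_logprob(question, c)))
```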
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining.
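The core idea is to contrast the next-token distribution from a late ("mature") layer against that of an early ("premature") layer. The sketch below shows that contrast for a single step, assuming a Hugging Face GPT-2 model; the fixed early layer and the direct reuse of the LM head on unnormalized early hidden states are simplifications (the paper proposes selecting the contrast layer dynamically).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; DoLa was evaluated on larger LLMs.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def contrasted_next_token_logits(prompt: str, early_layer: int = 6) -> torch.Tensor:
    """Subtract an early layer's log-probabilities from the final layer's."""
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model(ids, output_hidden_states=True)
    hidden = out.hidden_states                          # embeddings + one entry per block
    late = model.lm_head(hidden[-1][:, -1])             # "mature" logits (final layer)
    early = model.lm_head(hidden[early_layer][:, -1])   # "premature" logits (early layer)
    return late.log_softmax(-1) - early.log_softmax(-1)

logits = contrasted_next_token_logits("The capital of Australia is")
print(tok.decode([logits.argmax(-1).item()]))
```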
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families.
Tuning Language Models by Proxy
Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors.
Measuring Reliability of Large Language Models through Semantic Consistency
While large pretrained language models (PLMs) demonstrate incredible fluency and performance on many natural language tasks, recent work has shown that well-performing PLMs are very sensitive to what prompts are fed into them.
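One way to operationalize this kind of reliability check is to ask the same question in several paraphrased forms and measure how much the answers agree. The helper below is a hedged sketch of that idea; plain token-overlap (Jaccard) agreement is used as a stand-in for the semantic-consistency measures the paper studies, and `answer_fn` can wrap any model.

```python
from itertools import combinations
from typing import Callable, List

def consistency_score(answer_fn: Callable[[str], str], paraphrases: List[str]) -> float:
    """Mean pairwise agreement of a model's answers across paraphrased prompts."""
    answers = [answer_fn(p) for p in paraphrases]

    def jaccard(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(len(ta | tb), 1)

    pairs = list(combinations(answers, 2))
    return sum(jaccard(a, b) for a, b in pairs) / max(len(pairs), 1)

# Illustrative usage with a trivial stand-in "model".
paraphrases = [
    "Who wrote Hamlet?",
    "Hamlet was written by whom?",
    "Name the author of Hamlet.",
]
print(consistency_score(lambda p: "William Shakespeare", paraphrases))
```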
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark.
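The intervention shifts activations along truth-correlated directions during the forward pass. The sketch below applies a steering vector to one transformer block of GPT-2 via a forward hook; the random direction, strength, and layer are placeholders, and steering a whole block output (rather than individual attention heads probed for truthfulness, as in the paper) is a simplification.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Placeholder "truthful direction": a random unit vector used only to show
# where the intervention is applied; real directions are learned from probes.
direction = torch.randn(model.config.hidden_size)
direction = direction / direction.norm()
alpha = 5.0        # intervention strength (illustrative)
layer_idx = 8      # transformer block to steer (illustrative)

def steer(module, inputs, output):
    """Shift the block's hidden states along `direction` during the forward pass."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[layer_idx].register_forward_hook(steer)
ids = tok("The Great Wall of China is visible from space:", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=20, pad_token_id=tok.eos_token_id)[0]))
handle.remove()
```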
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
In this work, we propose a new safety evaluation benchmark RED-EVAL that carries out red-teaming.
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics
Multi-modal large language models (MLLMs) are trained based on large language models (LLMs), with an enhanced capability to comprehend multi-modal inputs and generate textual responses.
RAIN: Your Language Models Can Align Themselves without Finetuning
We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting.
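A heavily simplified sketch of that generate / self-evaluate / rewind loop is below, assuming a Hugging Face causal LM. The real method searches over token segments rather than whole responses, and the "harmless vs. harmful" likelihood comparison here is only an illustrative self-evaluation signal, not the paper's scoring scheme.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def generate(prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40, do_sample=True, top_p=0.9,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

def self_evaluate(prompt: str, response: str) -> float:
    """Crude self-evaluation: compare the model's preference for judging the
    response 'harmless' versus 'harmful' as the next token."""
    template = f"Question: {prompt}\nAnswer: {response}\nThe answer is"
    ids = tok(template, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    good = tok(" harmless").input_ids[0]
    bad = tok(" harmful").input_ids[0]
    return (logits[good] - logits[bad]).item()

def rain_style_generate(prompt: str, max_rewinds: int = 3) -> str:
    """Generate, self-evaluate, and rewind (resample) until the score passes."""
    best, best_score = None, float("-inf")
    for _ in range(max_rewinds):
        candidate = generate(prompt)
        score = self_evaluate(prompt, candidate)
        if score > best_score:
            best, best_score = candidate, score
        if score > 0:          # accept; otherwise rewind and try again
            return candidate
    return best

print(rain_style_generate("How do I make my code more secure?"))
```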