Factual Inconsistency Benchmark (FIB) is a benchmark that focuses on the task of summarization. Specifically, it compares the scores an LLM assigns to a factually consistent summary versus a factually inconsistent summary of an input news article. The factually consistent summaries are human-written reference summaries that have been manually verified as factually consistent.
Source: Evaluating the Factual Consistency of Large Language Models Through Summarization
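To make the evaluation concrete, below is a minimal sketch of the binary comparison FIB performs, assuming average per-token log-likelihood of the summary given the article as the scoring function (the paper also studies alternative scoring functions, e.g. PMI-based ones). The model name, prompt template, and example format here are illustrative assumptions, not FIB's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; FIB evaluates a range of LLMs.
MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def summary_score(article: str, summary: str) -> float:
    """Average per-token log-likelihood of `summary` conditioned on `article`.

    One plausible scoring function; the prompt format is a placeholder and
    assumes the tokenizer tokenizes the prompt prefix consistently.
    """
    prompt = f"Article: {article}\nSummary:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + summary, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-prob of each token given all preceding tokens.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    # Keep only the log-probs of the summary's tokens.
    n_prompt = prompt_ids.size(1)
    summary_lp = token_lp[n_prompt - 1:]
    return summary_lp.mean().item()


def fib_accuracy(examples) -> float:
    """Fraction of examples where the consistent summary outscores the
    inconsistent one -- the quantity FIB reports.

    `examples` is assumed to be an iterable of
    (article, consistent_summary, inconsistent_summary) triples.
    """
    wins = 0
    for article, consistent, inconsistent in examples:
        if summary_score(article, consistent) > summary_score(article, inconsistent):
            wins += 1
    return wins / len(examples)
```

A model is then judged by how often it prefers the verified reference summary: higher `fib_accuracy` means the model's likelihoods align better with factual consistency.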