Factual Inconsistency Benchmark (FIB) is a benchmark that focuses on the task of summarization. Specifically, it compares the scores an LLM assigns to a factually consistent summary versus a factually inconsistent summary of an input news article. The factually consistent summaries are human-written reference summaries that have been manually verified as factually consistent.
Source: Evaluating the Factual Consistency of Large Language Models Through Summarization
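To make the evaluation concrete, below is a minimal sketch of the binary comparison FIB performs, assuming average per-token log-likelihood of the summary given the article as the scoring function (the paper also studies alternative scoring functions, e.g. PMI-based ones). The model name, prompt template, and example format here are illustrative assumptions, not FIB's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; FIB evaluates a range of LLMs.
MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def summary_score(article: str, summary: str) -> float:
    """Average per-token log-likelihood of `summary` conditioned on `article`.

    One plausible scoring function; the prompt format is a placeholder and
    assumes the tokenizer tokenizes the prompt prefix consistently.
    """
    prompt = f"Article: {article}\nSummary:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + " " + summary, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-prob of each token given all preceding tokens.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    # Keep only the log-probs of the summary's tokens.
    n_prompt = prompt_ids.size(1)
    summary_lp = token_lp[n_prompt - 1:]
    return summary_lp.mean().item()


def fib_accuracy(examples) -> float:
    """Fraction of examples where the consistent summary outscores the
    inconsistent one -- the quantity FIB reports.

    `examples` is assumed to be an iterable of
    (article, consistent_summary, inconsistent_summary) triples.
    """
    wins = 0
    for article, consistent, inconsistent in examples:
        if summary_score(article, consistent) > summary_score(article, inconsistent):
            wins += 1
    return wins / len(examples)
```

A model is then judged by how often it prefers the verified reference summary: higher `fib_accuracy` means the model's likelihoods align better with factual consistency.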