Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning

1 code implementation · 7 Nov 2023 · Ruosen Li, Xinya Du

Neural models, including large language models (LLMs), achieve superior performance on multi-hop question-answering.

Multi-hop Question Answering, Question Answering

FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models

1 code implementation · 2 Nov 2023 · Liqiang Jing, Ruosen Li, Yunmo Chen, Mengzhao Jia, Xinya Du

We introduce FAITHSCORE (Faithfulness to Atomic Image Facts Score), a reference-free and fine-grained evaluation metric that measures the faithfulness of free-form answers generated by large vision-language models (LVLMs).

Descriptive, Instruction Following

PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations

no code implementations · 6 Jul 2023 · Ruosen Li, Teerth Patel, Xinya Du

Specifically, we propose (1) the peer rank (PR) algorithm, which takes into account each peer LLM's pairwise preferences over all answer pairs and outputs a final ranking of models; and (2) peer discussion (PD), in which we prompt two LLMs to discuss and try to reach mutual agreement on their preferences between two answers.

Language Modelling, Large Language Model
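
The peer rank idea in the PRD entry above (aggregate every peer reviewer's pairwise preferences into one ranking) can be illustrated with a minimal sketch. This is a simplified illustration, not the paper's exact algorithm: the function name `peer_rank`, the preference encoding (1.0, 0.0, 0.5), the fixed iteration count, and the rule that a reviewer's weight equals its own normalized score are all assumptions made for the example.

```python
def peer_rank(prefs, models, iters=50):
    """Rank models from peer pairwise preferences (illustrative sketch).

    prefs[reviewer][(a, b)] is 1.0 if the reviewer prefers model a's answer
    over model b's, 0.0 if it prefers b's, and 0.5 for a tie (assumed encoding).
    """
    # Start with every reviewer weighted equally.
    weights = {m: 1.0 / len(models) for m in models}
    for _ in range(iters):
        scores = {m: 0.0 for m in models}
        for reviewer, judgments in prefs.items():
            w = weights[reviewer]
            for (a, b), p in judgments.items():
                # Each judgment splits one weighted vote between the pair.
                scores[a] += w * p
                scores[b] += w * (1.0 - p)
        total = sum(scores.values())
        if total == 0:
            break  # no usable judgments; keep the previous weights
        # Assumption: a model's reviewing weight is its normalized score.
        weights = {m: scores[m] / total for m in models}
    ranking = sorted(models, key=lambda m: weights[m], reverse=True)
    return ranking, weights


# Hypothetical toy data: three peers that all prefer A over B over C.
models = ["A", "B", "C"]
judgments = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0}
prefs = {m: dict(judgments) for m in models}
ranking, weights = peer_rank(prefs, models)
```

In this toy case every reviewer agrees, so the weights settle after the first pass; with conflicting reviewers, the reweighting step lets higher-scoring models have more say in the final ranking.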

