TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Question Answering	HotpotQA	Beam Retrieval	ANS-EM	0.727	# 1
Question Answering	HotpotQA	Beam Retrieval	ANS-F1	0.850	# 1
Question Answering	HotpotQA	Beam Retrieval	SUP-EM	0.663	# 1
Question Answering	HotpotQA	Beam Retrieval	SUP-F1	0.901	# 1
Question Answering	HotpotQA	Beam Retrieval	JOINT-EM	0.505	# 1
Question Answering	HotpotQA	Beam Retrieval	JOINT-F1	0.775	# 1
Multi-hop Question Answering	MuSiQue-Ans	Beam Retrieval	An	69.2	# 1
Multi-hop Question Answering	MuSiQue-Ans	Beam Retrieval	Sp	91.4	# 1


End-to-End Beam Retrieval for Multi-Hop Question Answering

17 Aug 2023 · Jiahao Zhang, Haiyang Zhang, Dongmei Zhang, Yong Liu, Shen Huang

Multi-hop question answering (QA) involves finding multiple relevant passages and step-by-step reasoning to answer complex questions, indicating a retrieve-and-read paradigm. However, previous retrievers were customized for two-hop questions, and most of them were trained separately across different hops, resulting in a lack of supervision over the entire multi-hop retrieval process and leading to poor performance in complicated scenarios beyond two hops. In this work, we introduce Beam Retrieval, an end-to-end beam retrieval framework for multi-hop QA. This approach models the multi-hop retrieval process in an end-to-end manner by jointly optimizing an encoder and two classification heads across all hops. Moreover, Beam Retrieval maintains multiple partial hypotheses of relevant passages at each step, expanding the search space and reducing the risk of missing relevant passages. To establish a complete QA system, we incorporate a supervised reader or a large language model (LLM). Experimental results demonstrate that Beam Retrieval achieves a nearly 50% improvement compared with baselines on challenging MuSiQue-Ans, and it also surpasses all previous retrievers on HotpotQA and achieves 99.9% precision on 2WikiMultiHopQA. Providing high-quality context, Beam Retrieval helps our supervised reader achieve new state-of-the-art performance and substantially improves the few-shot QA performance of LLMs.
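The beam idea described in the abstract, keeping several partial passage chains alive at each hop instead of committing to a single one, can be illustrated with a small sketch. This is not the authors' implementation: `beam_retrieve` and `overlap_score` are hypothetical names, and the toy lexical-overlap scorer only stands in for the jointly trained encoder and classification heads that the paper uses to score passage chains.

```python
from typing import Callable, List, Sequence, Tuple

# Indices of the passages selected so far, in hop order.
Chain = Tuple[int, ...]


def beam_retrieve(
    question: str,
    passages: Sequence[str],
    score_chain: Callable[[str, List[str]], float],
    beam_size: int = 2,
    num_hops: int = 2,
) -> List[Tuple[Chain, float]]:
    """Return up to `beam_size` passage chains after `num_hops` hops.

    `score_chain` is an assumed interface: it maps a question and a
    list of passages (one per hop so far) to a relevance score.
    """
    beams: List[Tuple[Chain, float]] = [((), 0.0)]
    for _ in range(num_hops):
        candidates: List[Tuple[Chain, float]] = []
        for chain, _prev_score in beams:
            for idx in range(len(passages)):
                if idx in chain:
                    continue  # do not select the same passage twice
                new_chain = chain + (idx,)
                texts = [passages[i] for i in new_chain]
                candidates.append((new_chain, score_chain(question, texts)))
        if not candidates:
            break  # fewer passages than hops: keep the current beams
        # Keep only the highest-scoring partial hypotheses.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


def overlap_score(question: str, chain: List[str]) -> float:
    """Toy scorer (an assumption): share of question tokens found in the chain."""
    q_tokens = set(question.lower().split())
    covered = set(" ".join(chain).lower().split())
    return len(q_tokens & covered) / max(len(q_tokens), 1)


if __name__ == "__main__":
    docs = [
        "alice wrote the novel",
        "the novel was adapted by bob",
        "cats sleep a lot",
    ]
    for chain, score in beam_retrieve(
        "who adapted the novel alice wrote", docs, overlap_score
    ):
        print(chain, round(score, 3))
```

With `beam_size=1` the loop reduces to greedy hop-by-hop selection. A larger beam lets a chain whose first passage scored lower survive long enough to be re-ranked once later hops are taken into account, which is the risk reduction the abstract refers to.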

Code

canghongjian/beam_retriever official

Alab-NII/2wikimultihop official

Tasks

Language Modelling

Large Language Model

Multi-hop Question Answering

Question Answering

Retrieval

Datasets

HotpotQA

2WikiMultiHopQA

MuSiQue-Ans

Results from the Paper

Ranked #1 on Question Answering on HotpotQA

Methods

Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • GELU • GPT-3 • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Weight Decay
