As demonstrated by GPT-3 and T5, transformers grow in capability as parameter spaces become larger and larger. However, for tasks that require a large amount of knowledge, non-parametric memory allows model capacity to grow dramatically with a sub-linear increase in computational cost and GPU memory requirements. Recent models such as RAG and REALM have introduced retrieval into conditional generation. These models incorporate neural initial retrieval from a corpus of passages. We build on this line of research, proposing Re2G, which combines both neural initial retrieval and reranking into a BART-based sequence-to-sequence generation model. Our reranking approach also permits merging retrieval results from sources with incomparable scores, enabling an ensemble of BM25 and neural initial retrieval. To train our system end-to-end, we introduce a novel variation of knowledge distillation to train the initial retrieval, reranker, and generation using only ground truth on the target sequence output. We find large gains on four diverse tasks: zero-shot slot filling, question answering, fact-checking, and dialog, with relative gains of 9% to 34% over the previous state-of-the-art on the KILT leaderboard. We make our code available as open source at https://github.com/IBM/kgi-slot-filling/tree/re2g.
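The ensemble idea above can be sketched in a few lines: because BM25 scores and dense inner-product scores live on incomparable scales, the merge step discards the retrievers' own scores, takes the union of their top-k candidates, and lets a single reranker (a cross-encoder in Re2G) assign fresh, comparable scores. The function and variable names below are hypothetical illustrations, not the paper's actual code; the toy score table stands in for a real cross-encoder.

```python
# Hypothetical sketch of ensemble-then-rerank merging. Assumptions:
# bm25_hits / dense_hits are lists of (passage_id, score) sorted by score,
# and rerank_score is a callable (query, passage_id) -> float that plays
# the role of a cross-encoder reranker.

def rerank_ensemble(query, bm25_hits, dense_hits, rerank_score, k=5):
    """Return the top-k passage ids under the reranker's single score scale."""
    # Union of candidates; the original retrieval scores are dropped because
    # BM25 and dense scores cannot be compared directly.
    candidates = ({pid for pid, _ in bm25_hits[:k]}
                  | {pid for pid, _ in dense_hits[:k]})
    return sorted(candidates,
                  key=lambda pid: rerank_score(query, pid),
                  reverse=True)[:k]


if __name__ == "__main__":
    bm25 = [("p1", 12.3), ("p2", 9.8)]    # term-match score scale
    dense = [("p3", 78.1), ("p1", 70.4)]  # inner-product score scale
    toy_scores = {"p1": 0.9, "p2": 0.2, "p3": 0.7}  # stand-in reranker
    print(rerank_ensemble("q", bm25, dense, lambda q, p: toy_scores[p], k=2))
    # -> ['p1', 'p3']
```

Note that "p1" is surfaced by both retrievers but counted once, and the final ordering depends only on the reranker, which is what makes merging sources with incomparable scores well defined.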

NAACL 2022

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Fact Verification | KILT: FEVER | Re2G | KILT-AC | 78.53 | #1 |
| Fact Verification | KILT: FEVER | Re2G | R-Prec | 88.92 | #1 |
| Fact Verification | KILT: FEVER | Re2G | Recall@5 | 92.52 | #1 |
| Fact Verification | KILT: FEVER | Re2G | Accuracy | 89.55 | #1 |
| Open-Domain Question Answering | KILT: Natural Questions | Re2G | KILT-EM | 43.56 | #1 |
| Open-Domain Question Answering | KILT: Natural Questions | Re2G | R-Prec | 70.78 | #1 |
| Open-Domain Question Answering | KILT: Natural Questions | Re2G | Recall@5 | 76.63 | #1 |
| Open-Domain Question Answering | KILT: Natural Questions | Re2G | EM | 51.73 | #2 |
| Open-Domain Question Answering | KILT: Natural Questions | Re2G | F1 | 60.97 | #2 |
| Open-Domain Question Answering | KILT: Natural Questions | Re2G | KILT-F1 | 49.8 | #1 |
| Slot Filling | KILT: T-REx | Re2G | KILT-AC | 75.84 | #1 |
| Slot Filling | KILT: T-REx | Re2G | R-Prec | 80.7 | #2 |
| Slot Filling | KILT: T-REx | Re2G | Recall@5 | 89.0 | #2 |
| Slot Filling | KILT: T-REx | Re2G | Accuracy | 87.68 | #1 |
| Slot Filling | KILT: T-REx | Re2G | F1 | 89.93 | #1 |
| Slot Filling | KILT: T-REx | Re2G | KILT-F1 | 77.05 | #1 |
| Open-Domain Question Answering | KILT: TriviaQA | Re2G | KILT-EM | 57.91 | #1 |
| Open-Domain Question Answering | KILT: TriviaQA | Re2G | R-Prec | 72.68 | #1 |
| Open-Domain Question Answering | KILT: TriviaQA | Re2G | Recall@5 | 74.23 | #4 |
| Open-Domain Question Answering | KILT: TriviaQA | Re2G | EM | 76.27 | #1 |
| Open-Domain Question Answering | KILT: TriviaQA | Re2G | F1 | 81.4 | #1 |
| Open-Domain Question Answering | KILT: TriviaQA | Re2G | KILT-F1 | 61.78 | #1 |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | Re2G | KILT-RL | 11.39 | #2 |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | Re2G | R-Prec | 60.1 | #3 |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | Re2G | Recall@5 | 79.98 | #2 |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | Re2G | ROUGE-L | 16.76 | #2 |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | Re2G | F1 | 18.9 | #2 |
| Open-Domain Dialog | KILT: Wizard of Wikipedia | Re2G | KILT-F1 | 12.98 | #2 |