TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Natural Language Inference	ANLI test	KiC-770M	A1	36.30	# 14
Natural Language Inference	ANLI test	KiC-770M	A2	35.00	# 19
Natural Language Inference	ANLI test	KiC-770M	A3	37.60	# 20
Question Answering	COPA	KiC-770M	Accuracy	85.30	# 28
Sentence Completion	HellaSwag	KiC-770M	Accuracy	29.6	# 84
Natural Language Inference	RTE	KiC-770M	Accuracy	74.00	# 46
Question Answering	StoryCloze	KiC-770M	Accuracy	94.40	# 5
Coreference Resolution	Winograd Schema Challenge	KiC-770M	Accuracy	65.40	# 42
Common Sense Reasoning	WinoGrande	KiC-770M	Accuracy	55.30	# 60
Word Sense Disambiguation	Words in Context	KiC-770M	Accuracy	52.40	# 26

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/knowledge-in-context-towards-knowledgeable/question-answering-on-storycloze)](https://paperswithcode.com/sota/question-answering-on-storycloze?p=knowledge-in-context-towards-knowledgeable)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/knowledge-in-context-towards-knowledgeable/natural-language-inference-on-anli-test)](https://paperswithcode.com/sota/natural-language-inference-on-anli-test?p=knowledge-in-context-towards-knowledgeable)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/knowledge-in-context-towards-knowledgeable/word-sense-disambiguation-on-words-in-context)](https://paperswithcode.com/sota/word-sense-disambiguation-on-words-in-context?p=knowledge-in-context-towards-knowledgeable)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/knowledge-in-context-towards-knowledgeable/question-answering-on-copa)](https://paperswithcode.com/sota/question-answering-on-copa?p=knowledge-in-context-towards-knowledgeable)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/knowledge-in-context-towards-knowledgeable/coreference-resolution-on-winograd-schema)](https://paperswithcode.com/sota/coreference-resolution-on-winograd-schema?p=knowledge-in-context-towards-knowledgeable)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/knowledge-in-context-towards-knowledgeable/natural-language-inference-on-rte)](https://paperswithcode.com/sota/natural-language-inference-on-rte?p=knowledge-in-context-towards-knowledgeable)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/knowledge-in-context-towards-knowledgeable/common-sense-reasoning-on-winogrande)](https://paperswithcode.com/sota/common-sense-reasoning-on-winogrande?p=knowledge-in-context-towards-knowledgeable)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/knowledge-in-context-towards-knowledgeable/sentence-completion-on-hellaswag)](https://paperswithcode.com/sota/sentence-completion-on-hellaswag?p=knowledge-in-context-towards-knowledgeable)`

Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models

28 Oct 2022 · Xiaoman Pan, Wenlin Yao, Hongming Zhang, Dian Yu, Dong Yu, Jianshu Chen

Fully-parametric language models generally require a huge number of model parameters to store the knowledge necessary for solving multiple natural language tasks in zero/few-shot settings. In addition, it is hard for such models to adapt to evolving world knowledge without costly model re-training. In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory. Specifically, the external memory contains six different types of knowledge: entity, dictionary, commonsense, event, script, and causality knowledge. For each input instance, the KiC model adaptively selects a knowledge type and retrieves the most helpful pieces of knowledge. The input instance, along with its knowledge augmentation, is fed into a text-to-text model (e.g., T5) to generate the output answer, where both the input and the output are in natural language form after prompting. Interestingly, we find that KiC can be identified as a special mixture-of-experts (MoE) model, where the knowledge selector plays the role of a router that determines the sequence-to-expert assignment in MoE. This key observation inspires us to develop a novel algorithm for training KiC with an instance-adaptive knowledge selector. As a knowledge-rich semi-parametric language model, KiC needs only a much smaller parametric part to achieve superior zero-shot performance on unseen tasks. By evaluating on 40+ different tasks, we show that KiC_Large with 770M parameters easily outperforms large language models (LMs) that are 4-39x larger by a large margin. We also demonstrate that KiC exhibits emergent abilities at a much smaller model scale compared to the fully-parametric models.
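
The select-then-retrieve flow described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the memory entries, the word-overlap scoring, the function names, and the `[knowledge: ...]` prompt format are all assumptions made for illustration. In KiC the selector is learned together with the text-to-text model, whereas here a simple overlap count stands in for the router, and the final generation step with a model such as T5 is left out.

```python
import re

# The six knowledge types named in the abstract.
KNOWLEDGE_TYPES = ["entity", "dictionary", "commonsense", "event", "script", "causality"]

# Hypothetical toy memory; a real external memory would be far larger.
MEMORY = {
    "entity": ["Paris is the capital of France."],
    "dictionary": ["bank: the land alongside a river.", "bank: a financial institution."],
    "commonsense": ["People use umbrellas to stay dry in the rain."],
    "event": ["After a storm, streets are often flooded."],
    "script": ["Going to a restaurant: order food, eat, pay the bill."],
    "causality": ["Heavy rain causes flooding."],
}


def tokens(text):
    """Lowercase word set used by the toy overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(a, b):
    """Number of distinct words shared by two strings."""
    return len(tokens(a) & tokens(b))


def select_knowledge_type(query):
    """Stand-in for the knowledge selector: a top-1 router over knowledge types.

    Each type is scored by its best-matching memory entry (an assumption;
    the paper trains this selector rather than using word overlap).
    """
    scores = {k: max(overlap(query, e) for e in MEMORY[k]) for k in KNOWLEDGE_TYPES}
    return max(KNOWLEDGE_TYPES, key=lambda k: scores[k])


def retrieve(query, ktype, top_k=1):
    """Return the top_k entries of the chosen type, ranked by word overlap."""
    ranked = sorted(MEMORY[ktype], key=lambda e: overlap(query, e), reverse=True)
    return ranked[:top_k]


def build_augmented_input(query, top_k=1):
    """Pick a knowledge type, retrieve from it, and append the result to the input."""
    ktype = select_knowledge_type(query)
    pieces = retrieve(query, ktype, top_k)
    return ktype, f"{query} [knowledge: {' '.join(pieces)}]"


if __name__ == "__main__":
    for q in ["What is the capital of France?", "Why does heavy rain lead to flooding?"]:
        print(build_augmented_input(q))
```

The string returned by `build_augmented_input` is what would be passed to the text-to-text model. Routing each instance to exactly one knowledge type mirrors the sequence-to-expert assignment that the abstract compares to an MoE router.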

Tasks

Common Sense Reasoning

Coreference Resolution

Language Modelling

Natural Language Inference

Natural Language Inference (Zero-Shot)

Question Answering

Sentence Completion

Word Sense Disambiguation

World Knowledge

Datasets

GLUE

IMDb Movie Reviews

MRPC

MMLU

HellaSwag

BoolQ

PIQA

OpenBookQA

WinoGrande

WSC

COPA

ANLI

WikiQA

WiC

PAWS

QASC

CosmosQA

SciQ

WikiHop

DREAM

RTE

StoryCloze

QuaRTz

WIQA

Results from the Paper

Ranked #5 on Question Answering on StoryCloze

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Question Answering	COPA	KiC-770M	Accuracy	85.30	# 28
Sentence Completion	HellaSwag	KiC-770M	Accuracy	29.6	# 84
Question Answering	StoryCloze	KiC-770M	Accuracy	94.40	# 5
Coreference Resolution	Winograd Schema Challenge	KiC-770M	Accuracy	65.40	# 42

Results from Other Papers

Task	Dataset	Model	Metric Name	Metric Value	Rank
Natural Language Inference	ANLI test	KiC-770M	A1	36.30	# 14
Natural Language Inference	ANLI test	KiC-770M	A2	35.00	# 19
Natural Language Inference	ANLI test	KiC-770M	A3	37.60	# 20
Natural Language Inference	RTE	KiC-770M	Accuracy	74.00	# 46
Common Sense Reasoning	WinoGrande	KiC-770M	Accuracy	55.30	# 60
Word Sense Disambiguation	Words in Context	KiC-770M	Accuracy	52.40	# 26
