Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Entity Typing	Open Entity	K-Adapter (fac-adapter)	F1	77.6916	#1
Entity Typing	Open Entity	K-Adapter (fac-adapter)	Precision	79.6712	#1
Entity Typing	Open Entity	K-Adapter (fac-adapter)	Recall	75.8081	#2
Entity Typing	Open Entity	K-Adapter (fac-adapter + lin-adapter)	F1	77.6127	#2
Entity Typing	Open Entity	K-Adapter (fac-adapter + lin-adapter)	Precision	78.9956	#2
Entity Typing	Open Entity	K-Adapter (fac-adapter + lin-adapter)	Recall	76.2774	#1
Relation Classification	TACRED	K-Adapter	F1	72.0	#13
Relation Classification	TACRED	RoBERTa	F1	71.3	#8
Relation Extraction	TACRED	K-ADAPTER (F+L)	F1	72.04	#13
Relation Extraction	TACRED	K-ADAPTER (F+L)	F1 (1% Few-Shot)	13.8	#5
Relation Extraction	TACRED	K-ADAPTER (F+L)	F1 (5% Few-Shot)	45.1	#4
Relation Extraction	TACRED	K-ADAPTER (F+L)	F1 (10% Few-Shot)	56.0	#4
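
As a quick consistency check on the Open Entity rows above, F1 can be recomputed as the harmonic mean of the listed precision and recall. This is a minimal sketch: the function name `f1_from_precision_recall` is hypothetical, and it assumes the reported F1 is derived directly from the reported precision and recall (the listing does not state the averaging scheme).

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the Open Entity rows above (in percent).
fac_f1 = f1_from_precision_recall(79.6712, 75.8081)
fac_lin_f1 = f1_from_precision_recall(78.9956, 76.2774)

print(f"fac-adapter F1: {fac_f1:.4f}")
print(f"fac-adapter + lin-adapter F1: {fac_lin_f1:.4f}")
```

Under that assumption, both recomputed values agree with the listed F1 scores (77.6916 and 77.6127) to within 0.001.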

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/k-adapter-infusing-knowledge-into-pre-trained/entity-typing-on-open-entity)](https://paperswithcode.com/sota/entity-typing-on-open-entity?p=k-adapter-infusing-knowledge-into-pre-trained)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/k-adapter-infusing-knowledge-into-pre-trained/relation-classification-on-tacred-1)](https://paperswithcode.com/sota/relation-classification-on-tacred-1?p=k-adapter-infusing-knowledge-into-pre-trained)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/k-adapter-infusing-knowledge-into-pre-trained/relation-extraction-on-tacred)](https://paperswithcode.com/sota/relation-extraction-on-tacred?p=k-adapter-infusing-knowledge-into-pre-trained)`

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

Findings (ACL) 2021 · Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou

We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, previously injected knowledge is flushed away. To address this, we propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models. Taking RoBERTa as the backbone model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, so multiple adapters can be trained efficiently in a distributed way. As a case study, we inject two kinds of knowledge in this work: (1) factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata and (2) linguistic knowledge obtained via dependency parsing. Results on three knowledge-driven tasks (relation classification, entity typing, and question answering) demonstrate that each adapter improves performance and that combining both adapters brings further improvements. Further analysis indicates that K-Adapter captures more versatile knowledge than RoBERTa.
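
The plug-in idea in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the classes `FrozenBackbone` and `Adapter`, the single dense layer per module, the squared-error update, and combining features by concatenation are all assumptions made for illustration. The sketch only shows that training one adapter leaves the backbone and the other adapter unchanged.

```python
import copy
import random

random.seed(0)


def init(rows, cols):
    """Random dense weights stored as a list of rows."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Apply a dense layer (list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class FrozenBackbone:
    """Stand-in for the pre-trained model; its weights are never updated."""

    def __init__(self, dim):
        self.weights = init(dim, dim)

    def forward(self, x):
        return linear(self.weights, x)


class Adapter:
    """Small trainable module that reads the backbone's hidden features."""

    def __init__(self, dim):
        self.weights = init(dim, dim)

    def forward(self, hidden):
        return linear(self.weights, hidden)

    def train_step(self, hidden, target, lr=0.1):
        # One SGD step on squared error; only this adapter's weights change.
        out = self.forward(hidden)
        for i, row in enumerate(self.weights):
            err = out[i] - target[i]
            for j in range(len(row)):
                row[j] -= lr * 2 * err * hidden[j]


def combined_features(backbone, adapters, x):
    """Concatenate backbone features with each adapter's output (an assumption)."""
    hidden = backbone.forward(x)
    features = list(hidden)
    for adapter in adapters:
        features += adapter.forward(hidden)
    return features


dim = 3
backbone = FrozenBackbone(dim)
fac_adapter, lin_adapter = Adapter(dim), Adapter(dim)

x = [1.0, -0.5, 0.25]
backbone_before = copy.deepcopy(backbone.weights)
lin_before = copy.deepcopy(lin_adapter.weights)

# Train only the factual adapter on a toy target.
hidden = backbone.forward(x)
for _ in range(20):
    fac_adapter.train_step(hidden, target=[1.0, 0.0, 0.0])

features = combined_features(backbone, [fac_adapter, lin_adapter], x)
print(len(features))
```

Because each adapter only reads the frozen backbone's features and never another adapter's output, the two adapters in this sketch could be trained in separate processes and plugged in together afterwards.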

Code

microsoft/K-Adapter

151

stevekgyang/sccl

Tasks

Dependency Parsing

Entity Typing

Question Answering

Relation Classification

Relation Extraction

Datasets

LAMA

TACRED

SearchQA

T-REx

CosmosQA

QUASAR-T

QUASAR

Open Entity

Results from the Paper

Ranked #1 on Entity Typing on Open Entity

Methods

Adam • Attention Dropout • BERT • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • RoBERTa • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece
