K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge...
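The abstract describes keeping the original pre-trained parameters fixed and attaching trainable adapter modules that carry the injected knowledge. Below is a minimal sketch of that idea in PyTorch with the Hugging Face transformers library; the bottleneck size, the backbone layers tapped, and the simple additive fusion are illustrative assumptions, not the paper's exact K-Adapter architecture.

```python
# Sketch: frozen RoBERTa backbone plus small trainable adapter blocks.
# Assumptions (not from the paper): bottleneck width, tapped layers, additive fusion.
import torch
import torch.nn as nn
from transformers import RobertaModel


class AdapterLayer(nn.Module):
    """Bottleneck block applied to hidden states of the frozen backbone."""

    def __init__(self, hidden_size: int, bottleneck: int = 128):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the original representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


class KnowledgeAdapterModel(nn.Module):
    """Frozen pre-trained encoder with trainable adapters on selected layers."""

    def __init__(self, tap_layers=(0, 11, 23)):
        super().__init__()
        self.backbone = RobertaModel.from_pretrained("roberta-large")
        for p in self.backbone.parameters():
            p.requires_grad = False  # original parameters stay fixed
        hidden = self.backbone.config.hidden_size
        self.tap_layers = tap_layers
        self.adapters = nn.ModuleList([AdapterLayer(hidden) for _ in tap_layers])

    def forward(self, input_ids, attention_mask=None):
        out = self.backbone(input_ids, attention_mask=attention_mask,
                            output_hidden_states=True)
        # Apply each adapter to the hidden states of its tapped backbone layer
        # and combine with the final backbone output (simple sum here).
        fused = out.last_hidden_state
        for adapter, idx in zip(self.adapters, self.tap_layers):
            fused = fused + adapter(out.hidden_states[idx + 1])
        return fused
```

In the paper's setup, each kind of infused knowledge (e.g., factual, linguistic) gets its own adapter trained with the backbone frozen, so adapters can be trained independently and combined for downstream tasks such as entity typing and relation extraction.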


Results from the Paper


TASK                 DATASET      MODEL                                   METRICS (GLOBAL RANK)
Entity Typing        Open Entity  K-Adapter (fac-adapter + lin-adapter)   F1 77.6127 (#2), Precision 78.9956 (#2), Recall 76.2774 (#1)
Entity Typing        Open Entity  K-Adapter (fac-adapter)                 F1 77.6916 (#1), Precision 79.6712 (#1), Recall 75.8081 (#2)
Relation Extraction  TACRED       K-Adapter (F+L)                         F1 72.04 (#5)

Methods used in the Paper


METHOD                           TYPE
Residual Connection              Skip Connections
Attention Dropout                Regularization
Linear Warmup With Linear Decay  Learning Rate Schedules
Weight Decay                     Regularization
RoBERTa                          Transformers
GELU                             Activation Functions
Dense Connections                Feedforward Networks
Adam                             Stochastic Optimization
WordPiece                        Subword Segmentation
Softmax                          Output Functions
Dropout                          Regularization
Multi-Head Attention             Attention Modules
Layer Normalization              Normalization
Scaled Dot-Product Attention     Attention Mechanisms
BERT                             Language Models