DeBERTa

Introduced by He et al. in DeBERTa: Decoding-enhanced BERT with Disentangled Attention

DeBERTa is a Transformer-based neural language model that aims to improve the BERT and RoBERTa models with two techniques: a disentangled attention mechanism and an enhanced mask decoder. In the disentangled attention mechanism, each word is represented by two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. The enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pre-training. In addition, a new virtual adversarial training method is used for fine-tuning to improve the model’s generalization on downstream tasks.
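The disentangled attention described above can be illustrated with a small NumPy sketch. It follows the score decomposition given in the DeBERTa paper (content-to-content, content-to-position, and position-to-content terms, scaled by 1/sqrt(3d)), but it is a simplified single-head, unbatched version and not the authors' implementation. The function names, the `W` dictionary of projection matrices, and the toy dimensions are assumptions made for illustration.

```python
import numpy as np


def relative_position(n, k):
    """Return delta[i, j]: relative distance from token i to token j, bucketed into [0, 2k)."""
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]  # i - j
    # 0 when i - j <= -k, 2k - 1 when i - j >= k, and i - j + k otherwise.
    return np.clip(diff + k, 0, 2 * k - 1)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def disentangled_attention(H, P, W, k):
    """Single-head disentangled attention (illustrative sketch).

    H: (n, d_model) content vectors, one per token.
    P: (2k, d_model) relative position embeddings, shared by all tokens.
    W: dict of (d_model, d) projections with hypothetical keys
       "q_c", "k_c", "v_c" (content) and "q_r", "k_r" (relative position).
    """
    n = H.shape[0]
    d = W["q_c"].shape[1]
    Qc, Kc, Vc = H @ W["q_c"], H @ W["k_c"], H @ W["v_c"]
    Qr, Kr = P @ W["q_r"], P @ W["k_r"]
    delta = relative_position(n, k)

    # Content-to-content: Qc_i . Kc_j
    c2c = Qc @ Kc.T
    # Content-to-position: Qc_i . Kr_{delta(i, j)}
    c2p = np.take_along_axis(Qc @ Kr.T, delta, axis=1)
    # Position-to-content: Kc_j . Qr_{delta(j, i)}; gather by row j, then transpose.
    p2c = np.take_along_axis(Kc @ Qr.T, delta, axis=1).T

    scores = (c2c + c2p + p2c) / np.sqrt(3 * d)
    weights = softmax(scores)
    return weights @ Vc, weights


# Toy usage with random inputs (all sizes are arbitrary).
rng = np.random.default_rng(0)
n, d_model, d, k = 6, 8, 4, 3
H = rng.normal(size=(n, d_model))
P = rng.normal(size=(2 * k, d_model))
W = {name: rng.normal(size=(d_model, d)) for name in ("q_c", "k_c", "v_c", "q_r", "k_r")}
out, weights = disentangled_attention(H, P, W, k)
```

The sketch keeps content and position in separate projections until the scores are summed, which is the sense in which the attention is "disentangled". Multi-head splitting, batching, masking, and dropout are omitted.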

Source: DeBERTa: Decoding-enhanced BERT with Disentangled Attention



Tasks

Task	Papers	Share
Language Modelling	21	14.58%
Sentence	10	6.94%
Question Answering	8	5.56%
Natural Language Understanding	7	4.86%
Natural Language Inference	6	4.17%
Sentiment Analysis	5	3.47%
Named Entity Recognition (NER)	4	2.78%
Self-Supervised Learning	3	2.08%
Large Language Model	3	2.08%


Components

Component	Type
Disentangled Attention Mechanism	Attention Mechanisms

Categories


Transformers

Autoencoding Transformers