DeBERTa is a Transformer-based neural language model that aims to improve on the BERT and RoBERTa models with two techniques: a disentangled attention mechanism and an enhanced mask decoder. In the disentangled attention mechanism, each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. The enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pre-training. In addition, a new virtual adversarial training method is used for fine-tuning to improve the model's generalization on downstream tasks.
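To make the disentangled attention mechanism concrete, below is a minimal, single-head sketch in PyTorch of how the three score terms (content-to-content, content-to-position, position-to-content) can be combined. The class name, the `max_rel_pos` clipping parameter, and the simplified relative-position embedding table are illustrative assumptions, not the authors' released implementation.

```python
import math
import torch
import torch.nn as nn


class DisentangledAttentionSketch(nn.Module):
    """Single-head sketch of DeBERTa-style disentangled attention."""

    def __init__(self, dim: int, max_rel_pos: int = 8):
        super().__init__()
        self.max_rel_pos = max_rel_pos
        # Projections for the content vectors (standard Q/K/V).
        self.q_c = nn.Linear(dim, dim)
        self.k_c = nn.Linear(dim, dim)
        self.v_c = nn.Linear(dim, dim)
        # Separate projections applied to the relative-position embeddings.
        self.q_r = nn.Linear(dim, dim)
        self.k_r = nn.Linear(dim, dim)
        # Shared relative-position embedding table for clipped distances in [-k, k].
        self.rel_emb = nn.Embedding(2 * max_rel_pos + 1, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) content representations.
        b, n, d = x.shape
        q_c, k_c, v_c = self.q_c(x), self.k_c(x), self.v_c(x)

        # Clipped relative distances delta(i, j) = clip(i - j), shifted to [0, 2k].
        pos = torch.arange(n, device=x.device)
        rel = (pos[:, None] - pos[None, :]).clamp(-self.max_rel_pos, self.max_rel_pos)
        idx = (rel + self.max_rel_pos).unsqueeze(0).expand(b, n, n)

        # Project the relative-position embedding table.
        p = self.rel_emb.weight                                   # (2k + 1, dim)
        q_r, k_r = self.q_r(p), self.k_r(p)

        # The three disentangled score terms.
        c2c = q_c @ k_c.transpose(-1, -2)                         # content-to-content
        c2p = torch.gather(q_c @ k_r.transpose(-1, -2), -1, idx)  # content-to-position
        p2c = torch.gather(k_c @ q_r.transpose(-1, -2), -1, idx)  # position-to-content
        p2c = p2c.transpose(-1, -2)                               # re-index as (i, j)

        # The summed scores are scaled by 1 / sqrt(3d) since three terms are added.
        scores = (c2c + c2p + p2c) / math.sqrt(3 * d)
        return scores.softmax(dim=-1) @ v_c


# Tiny usage example on random inputs (hypothetical dimensions).
attn = DisentangledAttentionSketch(dim=64)
out = attn(torch.randn(2, 10, 64))   # -> shape (2, 10, 64)
```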
Source: DeBERTa: Decoding-enhanced BERT with Disentangled Attention
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 25 | 10.64% |
| Language Modeling | 19 | 8.09% |
| Sentence | 11 | 4.68% |
| Natural Language Inference | 9 | 3.83% |
| Question Answering | 9 | 3.83% |
| Sentiment Analysis | 8 | 3.40% |
| Natural Language Understanding | 8 | 3.40% |
| Large Language Model | 6 | 2.55% |
| Named Entity Recognition (NER) | 5 | 2.13% |
| Component | Type |
|---|---|
| Disentangled Attention | Attention Mechanisms |