Attention Dropout is a type of dropout used in attention-based architectures, where elements of the attention weights (the softmax output in the attention equation) are randomly dropped. For example, for scaled dot-product attention, we would drop elements from the first term:
$$ {\text{Attention}}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V $$
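As a minimal sketch of how this is typically done (assuming PyTorch; the function name, tensor shapes, and the `p_drop` parameter below are illustrative, not taken from the original text), dropout is applied to the softmax weights before they are multiplied by $V$:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention_with_dropout(q, k, v, p_drop=0.1, training=True):
    # q, k, v: (batch, heads, seq_len, d_k); illustrative shapes, not from the source
    d_k = q.size(-1)
    # First term of the attention equation: softmax(QK^T / sqrt(d_k))
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    attn = torch.softmax(scores, dim=-1)
    # Attention dropout: randomly zero entries of the attention weights
    attn = F.dropout(attn, p=p_drop, training=training)
    return torch.matmul(attn, v)

# Example usage with random inputs
q = k = v = torch.randn(2, 4, 16, 32)  # (batch, heads, seq_len, d_k)
out = scaled_dot_product_attention_with_dropout(q, k, v, p_drop=0.1)
```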
The table below lists the tasks most commonly addressed by papers that use attention dropout, with each task's share of those papers.

Task | Papers | Share |
---|---|---|
Language Modelling | 93 | 12.48% |
Large Language Model | 34 | 4.56% |
Sentiment Analysis | 28 | 3.76% |
Text Classification | 26 | 3.49% |
Retrieval | 26 | 3.49% |
Question Answering | 25 | 3.36% |
Classification | 21 | 2.82% |
Prompt Engineering | 19 | 2.55% |
Decision Making | 17 | 2.28% |