Attention Dropout is a type of dropout used in attention-based architectures, where elements of the softmax output in the attention equation are randomly dropped. For example, for scaled dot-product attention, we would drop elements from the softmax term (the first factor of the product below):
$$ {\text{Attention}}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V $$
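As a minimal sketch of how this looks in practice (assuming a PyTorch setting; the function name and the `dropout_p` value here are illustrative, not prescribed by any particular paper), dropout is applied to the attention weights after the softmax and before the multiplication with $V$:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, dropout_p=0.1, training=True):
    """Scaled dot-product attention with Attention Dropout:
    dropout applied to the softmax attention weights."""
    d_k = q.size(-1)
    # Raw attention scores: QK^T / sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    # Attention weights: the softmax term in the equation above
    weights = F.softmax(scores, dim=-1)
    # Attention Dropout: randomly zero elements of the softmax output
    weights = F.dropout(weights, p=dropout_p, training=training)
    return weights @ v

# Example: batch of 2 sequences, length 5, model dimension 16
q = k = v = torch.randn(2, 5, 16)
out = scaled_dot_product_attention(q, k, v, dropout_p=0.1)
print(out.shape)  # torch.Size([2, 5, 16])
```

At inference time (`training=False`), the dropout call is a no-op, so the full attention distribution is used.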
The tasks in which Attention Dropout is most frequently used, by share of papers:

| Task | Papers | Share |
|---|---|---|
| Retrieval | 76 | 9.13% |
| Language Modelling | 66 | 7.93% |
| Question Answering | 48 | 5.77% |
| Large Language Model | 38 | 4.57% |
| Sentence | 25 | 3.00% |
| Text Generation | 23 | 2.76% |
| In-Context Learning | 22 | 2.64% |
| Prompt Engineering | 16 | 1.92% |
| Information Retrieval | 16 | 1.92% |