Attention Dropout

Attention Dropout is a form of dropout used in attention-based architectures, where elements are randomly dropped from the output of the softmax, i.e. the attention weights. For example, in scaled dot-product attention, dropout is applied to the softmax factor before it is multiplied by $V$:

$$ {\text{Attention}}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V $$
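A minimal sketch of how this can look in PyTorch (the class name, the `dropout_p` rate, and the single-head tensor layout are illustrative assumptions, not taken from the source): `nn.Dropout` is applied to the softmax output, so individual attention weights are zeroed at random during training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledDotProductAttention(nn.Module):
    """Scaled dot-product attention with dropout on the attention weights."""

    def __init__(self, dropout_p: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout_p)

    def forward(self, q, k, v, mask=None):
        d_k = q.size(-1)
        # Raw attention scores: QK^T / sqrt(d_k)
        scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        # Softmax over the key dimension gives the attention weights
        attn = F.softmax(scores, dim=-1)
        # Attention dropout: randomly zero individual attention weights
        attn = self.dropout(attn)
        return torch.matmul(attn, v), attn


# Example usage with illustrative shapes (batch=2, seq_len=5, d_k=64)
layer = ScaledDotProductAttention(dropout_p=0.1)
q = k = v = torch.randn(2, 5, 64)
out, weights = layer(q, k, v)
```

Note that `nn.Dropout` rescales the surviving weights by $1/(1-p)$, so rows of the attention matrix no longer sum to one during training; this is how attention dropout is commonly applied in Transformer implementations.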

Tasks


| Task | Papers | Share |
| --- | --- | --- |
| Language Modelling | 91 | 13.56% |
| Question Answering | 33 | 4.92% |
| Text Generation | 27 | 4.02% |
| Text Classification | 23 | 3.43% |
| Sentiment Analysis | 17 | 2.53% |
| Speech Recognition | 14 | 2.09% |
| Machine Translation | 14 | 2.09% |
| Few-Shot Learning | 12 | 1.79% |
| Knowledge Distillation | 12 | 1.79% |

Components


| Component | Type |
| --- | --- |
| Dropout | Regularization |

Categories

Regularization