Embedding Dropout is equivalent to performing dropout on the embedding matrix at the word level, where the dropout mask is broadcast across all of the word vector's dimensions. The remaining non-dropped-out word embeddings are scaled by $\frac{1}{1-p_{e}}$, where $p_{e}$ is the probability of embedding dropout. As the dropout is applied to the embedding matrix used for a full forward and backward pass, all occurrences of a specific word disappear within that pass, which is equivalent to performing variational dropout on the connection between the one-hot embedding and the embedding lookup.
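A minimal NumPy sketch of the idea described above: one Bernoulli draw per word zeroes out that word's entire vector, and the surviving rows are rescaled by $1/(1-p_e)$. Function and argument names here are illustrative assumptions, not taken from the papers' released code.

```python
import numpy as np

def embedding_dropout(embedding_matrix, p_e, rng=None):
    """Word-level embedding dropout (illustrative sketch).

    Drops entire word vectors with probability p_e and rescales the
    remaining rows by 1 / (1 - p_e) so the expected embedding is unchanged.
    """
    rng = rng or np.random.default_rng()
    vocab_size = embedding_matrix.shape[0]
    # One Bernoulli draw per word; the (vocab_size, 1) mask is broadcast
    # across the embedding dimension, so a word is kept or dropped whole.
    mask = (rng.random((vocab_size, 1)) >= p_e).astype(embedding_matrix.dtype)
    return embedding_matrix * mask / (1.0 - p_e)
```

Because the mask is applied to the embedding matrix itself (not to individual lookups), every occurrence of a dropped word within the same forward/backward pass sees the same zeroed vector, matching the variational-dropout view above.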

Source: Merity et al., Regularizing and Optimizing LSTM Language Models

Source: Gal and Ghahramani, A Theoretically Grounded Application of Dropout in Recurrent Neural Networks


Task                    Papers  Share
Language Modelling      22      18.18%
General Classification  15      12.40%
Text Classification     13      10.74%
Sentiment Analysis      9       7.44%
Classification          7       5.79%
Test                    5       4.13%
Language Identification 4       3.31%
Machine Translation     4       3.31%
Hate Speech Detection   3       2.48%
