Embedding Dropout performs dropout on the embedding matrix at the word level: a single dropout decision is made per word and broadcast across that word's entire embedding vector. The remaining (non-dropped) word embeddings are scaled by $\frac{1}{1-p_{e}}$, where $p_{e}$ is the embedding dropout probability. Because the dropout is applied to the embedding matrix used for the full forward and backward pass, every occurrence of a dropped word disappears within that pass. This is equivalent to performing variational dropout on the connection between the one-hot encoding and the embedding lookup.
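The mechanism above can be sketched in a few lines. This is a minimal NumPy illustration (not the authors' implementation): one Bernoulli draw per word *type* zeroes that word's entire row of the embedding matrix, survivors are rescaled by $\frac{1}{1-p_e}$, and the lookup then happens against the masked matrix, so repeated occurrences of a word share the same fate within a pass.

```python
import numpy as np

def embedding_dropout(embedding, token_ids, p_e, training=True, rng=None):
    """Word-level dropout on an embedding matrix (illustrative sketch).

    embedding : (vocab_size, dim) array of word vectors.
    token_ids : integer array of token indices to look up.
    p_e       : probability of dropping an entire word embedding.
    """
    if not training or p_e == 0.0:
        return embedding[token_ids]
    rng = np.random.default_rng() if rng is None else rng
    vocab_size = embedding.shape[0]
    # One draw per word type; the (vocab_size, 1) shape broadcasts the
    # keep/drop decision across the whole embedding vector of that word.
    keep = rng.random((vocab_size, 1)) >= p_e
    # Inverted dropout: scale survivors by 1 / (1 - p_e).
    masked = embedding * keep / (1.0 - p_e)
    return masked[token_ids]

# Usage: with p_e = 0.5, every row of the output is either all zeros
# (word dropped) or the original vector scaled by 2. Repeated tokens
# (index 3 below) always receive the same mask within one pass.
E = np.ones((10, 4))
out = embedding_dropout(E, np.array([3, 3, 5]), 0.5,
                        rng=np.random.default_rng(0))
```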
Source: Merity et al., Regularizing and Optimizing LSTM Language Models
Source: Gal and Ghahramani, A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Task | Papers | Share |
---|---|---|
Language Modelling | 23 | 17.56% |
General Classification | 15 | 11.45% |
Text Classification | 14 | 10.69% |
Sentiment Analysis | 9 | 6.87% |
Classification | 8 | 6.11% |
Translation | 5 | 3.82% |
Language Identification | 4 | 3.05% |
Machine Translation | 4 | 3.05% |
Hate Speech Detection | 3 | 2.29% |