A Gated Linear Unit, or GLU, computes:
$$ \text{GLU}\left(a, b\right) = a\otimes \sigma\left(b\right) $$
It is used in natural language processing architectures, for example the Gated CNN, where $b$ is the gate that controls what information from $a$ is passed up to the following layer. Intuitively, for a language modeling task, the gating mechanism allows selection of the words or features that are important for predicting the next word. The GLU is non-linear, but it provides a linear path for the gradient, which mitigates the vanishing gradient problem.
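As a concrete illustration, here is a minimal NumPy sketch of the formula above. The function names and the interpretation of $\otimes$ as an element-wise product are assumptions for illustration, not code from the paper:

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def glu(a, b):
    """GLU(a, b) = a ⊗ σ(b).

    `a` carries the candidate activations and `b` acts as the gate:
    σ(b) lies in (0, 1), so it scales how much of each element of `a`
    flows to the next layer. ⊗ is taken here to be the element-wise
    product, consistent with its use in the Gated CNN.
    """
    return a * sigmoid(b)

# In the Gated CNN, a and b come from two convolutions over the same
# input; random tensors of matching shape stand in for them here.
a = np.random.randn(4, 8)
b = np.random.randn(4, 8)
out = glu(a, b)
print(out.shape)  # (4, 8)
```

For reference, PyTorch ships this operation as `torch.nn.functional.glu`, which splits its input tensor in half along a given dimension to form $a$ and $b$.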
Source: Language Modeling with Gated Convolutional Networks
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 53 | 7.66% |
| Question Answering | 45 | 6.50% |
| Text Generation | 38 | 5.49% |
| Machine Translation | 21 | 3.03% |
| Pretrained Language Models | 19 | 2.75% |
| Natural Language Understanding | 19 | 2.75% |
| Abstractive Text Summarization | 17 | 2.46% |
| Semantic Parsing | 16 | 2.31% |
| Natural Language Inference | 13 | 1.88% |