SwiGLU is an activation function, a variant of the Gated Linear Unit (GLU). It is defined as:
$$ \text{SwiGLU}\left(x, W, V, b, c, \beta\right) = \text{Swish}_{\beta}\left(xW + b\right) \otimes \left(xV + c\right) $$
where $\text{Swish}_{\beta}(z) = z \, \sigma(\beta z)$ and $\otimes$ denotes element-wise multiplication.
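A minimal PyTorch sketch of this definition (the module name `SwiGLU` and the dimensions `d_in`, `d_hidden` are illustrative, not from the source; `nn.Linear` supplies the bias terms $b$ and $c$):

```python
import torch
import torch.nn as nn

class SwiGLU(nn.Module):
    """SwiGLU(x) = Swish_beta(xW + b) * (xV + c), element-wise."""

    def __init__(self, d_in: int, d_hidden: int, beta: float = 1.0):
        super().__init__()
        self.w = nn.Linear(d_in, d_hidden)  # computes xW + b (gate branch)
        self.v = nn.Linear(d_in, d_hidden)  # computes xV + c (linear branch)
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = self.w(x)
        # Swish_beta(z) = z * sigmoid(beta * z); for beta = 1 this is SiLU
        return gate * torch.sigmoid(self.beta * gate) * self.v(x)

# Usage: project a 512-dim input to a 1024-dim gated hidden representation
layer = SwiGLU(d_in=512, d_hidden=1024)
y = layer(torch.randn(2, 512))  # y.shape == (2, 1024)
```

With `beta` fixed at 1, the gate reduces to `torch.nn.functional.silu`, which is how SwiGLU is commonly implemented in practice.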
Source: *GLU Variants Improve Transformer*
Tasks addressed by papers that use SwiGLU:

| Task | Papers | Share |
|---|---|---|
| Language Modelling | 3 | 10.00% |
| Code Generation | 2 | 6.67% |
| Multiple Choice Question Answering (MCQA) | 2 | 6.67% |
| Multi-task Language Understanding | 2 | 6.67% |
| Question Answering | 2 | 6.67% |
| Large Language Model | 1 | 3.33% |
| Quantization | 1 | 3.33% |
| Arithmetic Reasoning | 1 | 3.33% |
| Math Word Problem Solving | 1 | 3.33% |