Swish is an activation function, $f(x) = x \cdot \text{sigmoid}(\beta x)$, where $\beta$ is a learnable parameter. Nearly all implementations omit the learnable parameter $\beta$, in which case the activation function reduces to $f(x) = x\sigma(x)$ ("Swish-1").
The function $x\sigma(x)$ is exactly the SiLU, which was introduced by other authors before Swish. See Gaussian Error Linear Units (GELUs), where the SiLU (Sigmoid Linear Unit) was originally coined, and see Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning and Swish: a Self-Gated Activation Function, where the same activation function was later experimented with.
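As a concrete illustration, below is a minimal PyTorch sketch of the activation defined above. The `Swish` class name and its `beta`/`trainable` arguments are illustrative choices, not an API from the paper or from any particular library; with `trainable=False` and `beta=1.0` it reduces to Swish-1 / SiLU (equivalent to `torch.nn.functional.silu`).

```python
import torch
import torch.nn as nn


class Swish(nn.Module):
    """Swish activation: f(x) = x * sigmoid(beta * x).

    With trainable=False and beta=1.0 this is Swish-1 / SiLU.
    """

    def __init__(self, beta: float = 1.0, trainable: bool = False):
        super().__init__()
        beta_t = torch.tensor(float(beta))
        if trainable:
            # beta is learned jointly with the network weights
            self.beta = nn.Parameter(beta_t)
        else:
            # fixed beta, registered as a buffer so it follows .to(device)
            self.register_buffer("beta", beta_t)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)


# Example usage:
# act = Swish(trainable=True)
# y = act(torch.randn(4, 8))
```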
Source: Searching for Activation Functions
Task | Papers | Share |
---|---|---|
Image Classification | 46 | 15.65% |
Object Detection | 27 | 9.18% |
General Classification | 26 | 8.84% |
Semantic Segmentation | 16 | 5.44% |
Instance Segmentation | 8 | 2.72% |
Image Generation | 5 | 1.70% |
Multi-Task Learning | 5 | 1.70% |
Quantization | 4 | 1.36% |
Self-Supervised Learning | 4 | 1.36% |