Activation Functions

Swish

Introduced by Ramachandran et al. in Searching for Activation Functions

Swish is an activation function, $f(x) = x \cdot \text{sigmoid}(\beta x)$, where $\beta$ is a learnable parameter. Nearly all implementations omit the learnable parameter and fix $\beta = 1$, in which case the activation function reduces to $x\sigma(x)$ ("Swish-1").
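
As a minimal sketch (not code from the paper), a Swish module with a learnable $\beta$ could look like the following in PyTorch; fixing $\beta = 1$ and not training it recovers Swish-1:

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Swish activation: f(x) = x * sigmoid(beta * x).

    With trainable=False and beta=1.0 this reduces to Swish-1,
    i.e. the SiLU, x * sigmoid(x).
    """
    def __init__(self, beta: float = 1.0, trainable: bool = True):
        super().__init__()
        if trainable:
            # beta is learned jointly with the network's other weights
            self.beta = nn.Parameter(torch.tensor(beta))
        else:
            # fixed beta, stored as a non-trainable buffer
            self.register_buffer("beta", torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)
```

For the fixed $\beta = 1$ case, PyTorch also provides this activation directly as `torch.nn.SiLU`.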

The function $x\sigma(x)$ is identical to the SiLU, which was introduced by other authors before Swish. See Gaussian Error Linear Units (GELUs), where the SiLU (Sigmoid Linear Unit) was originally coined, and see Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning and Swish: a Self-Gated Activation Function, where the same activation function was experimented with later.

Source: Searching for Activation Functions

Tasks


Task                    Papers  Share
Image Classification    79      12.64%
Object Detection        34      5.44%
Classification          29      4.64%
Semantic Segmentation   27      4.32%
General Classification  27      4.32%
Instance Segmentation   11      1.76%
Decoder                 10      1.60%
Object                  9       1.44%
Multi-Task Learning     9       1.44%