Activation Functions

Swish

Introduced by Ramachandran et al. in Searching for Activation Functions

Swish is an activation function, $f(x) = x \cdot \text{sigmoid}(\beta x)$, where $\beta$ is a learnable parameter. Nearly all implementations omit the learnable parameter, effectively fixing $\beta = 1$, in which case the activation function becomes $x\sigma(x)$ ("Swish-1").
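
As a minimal sketch of this definition (assuming PyTorch; the class name and the `trainable` flag are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

class Swish(nn.Module):
    """Swish: f(x) = x * sigmoid(beta * x); beta = 1 gives Swish-1 / SiLU."""

    def __init__(self, beta: float = 1.0, trainable: bool = False):
        super().__init__()
        beta_t = torch.tensor(float(beta))
        if trainable:
            # beta is learned jointly with the network's other weights
            self.beta = nn.Parameter(beta_t)
        else:
            # fixed beta; a buffer so it follows the module across devices
            self.register_buffer("beta", beta_t)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)

act = Swish(trainable=True)
y = act(torch.randn(2, 3))
```

With `trainable=True`, $\beta$ receives gradients like any other parameter; with the default fixed $\beta = 1$ the module reduces to Swish-1.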

The function $x\sigma(x)$ is exactly the SiLU, which was introduced by other authors before Swish. See Gaussian Error Linear Units (GELUs), where the name SiLU (Sigmoid Linear Unit) was originally coined, and see Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning and Swish: a Self-Gated Activation Function, where the same activation function was experimented with later.
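
The equivalence is exact, not approximate. In PyTorch, for instance, the built-in SiLU matches $x\sigma(x)$ directly (a quick check, assuming a PyTorch version that ships `F.silu`):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8)
# Swish-1 and SiLU are the same function: x * sigmoid(x)
assert torch.allclose(F.silu(x), x * torch.sigmoid(x))
```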

Source: Searching for Activation Functions


Tasks


Task                     Papers   Share
Image Classification         74  13.43%
Object Detection             34   6.17%
General Classification       27   4.90%
Classification               25   4.54%
Semantic Segmentation        24   4.36%
Instance Segmentation        11   2.00%
Decoder                       9   1.63%
Multi-Task Learning           9   1.63%
Quantization                  7   1.27%
