Gaussian Error Linear Units

Introduced by Hendrycks et al. in Gaussian Error Linear Units (GELUs)

The Gaussian Error Linear Unit, or GELU, is an activation function defined as $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their percentile, rather than gating them by their sign as ReLUs do ($x\mathbf{1}_{x>0}$). Consequently, the GELU can be thought of as a smoother ReLU.

$$\text{GELU}\left(x\right) = x{P}\left(X\leq{x}\right) = x\Phi\left(x\right) = x \cdot \frac{1}{2}\left[1 + \text{erf}(x/\sqrt{2})\right],$$ if $X\sim \mathcal{N}(0,1)$.
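As a minimal sketch of the formula above, the exact GELU can be evaluated with the error function from Python's standard library. The function names `gelu_exact` and `relu` are illustrative choices, not taken from the paper or from any library.

```python
import math


def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi written via erf as in the formula above."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    """ReLU(x) = x * 1[x > 0], shown only for comparison."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    # Print both activations side by side on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  gelu={gelu_exact(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike the ReLU, the GELU returns small negative values for moderately negative inputs, because $\Phi(x)$ shrinks the input smoothly instead of cutting it off at zero.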

One can approximate the GELU with $0.5x\left(1+\tanh\left[\sqrt{2/\pi}\left(x + 0.044715x^{3}\right)\right]\right)$ or $x\sigma\left(1.702x\right)$, but PyTorch's exact implementation is fast enough that these approximations may be unnecessary. (See also the SiLU, $x\sigma(x)$, which was coined in the same paper that introduced the GELU.)
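The two approximations can be compared against the erf-based definition with a short standard-library sketch. The function names and the evaluation grid (the interval $[-5, 5]$ in steps of 0.01) are assumptions made for illustration only.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), computed with the error function."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation: 0.5x(1 + tanh[sqrt(2/pi)(x + 0.044715x^3)])."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_sigmoid(x: float) -> float:
    """Sigmoid approximation: x * sigma(1.702x)."""
    return x / (1.0 + math.exp(-1.702 * x))


# Illustrative grid: 1001 evenly spaced points on [-5, 5].
xs = [i / 100.0 for i in range(-500, 501)]
max_err_tanh = max(abs(gelu_tanh(x) - gelu_exact(x)) for x in xs)
max_err_sigmoid = max(abs(gelu_sigmoid(x) - gelu_exact(x)) for x in xs)

if __name__ == "__main__":
    print(f"max |tanh approx - exact|    = {max_err_tanh:.6f}")
    print(f"max |sigmoid approx - exact| = {max_err_sigmoid:.6f}")
```

On this grid the tanh form tracks the exact function more closely than the sigmoid form, which is cheaper to evaluate but coarser.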

GELUs are used in GPT-3, BERT, and most other Transformers.

Source: Gaussian Error Linear Units (GELUs)

Tasks


| Task | Papers | Share |
|---|---|---|
| Retrieval | 75 | 9.08% |
| Language Modelling | 69 | 8.35% |
| Question Answering | 48 | 5.81% |
| Large Language Model | 38 | 4.60% |
| Sentence | 26 | 3.15% |
| Text Generation | 24 | 2.91% |
| In-Context Learning | 22 | 2.66% |
| Information Retrieval | 16 | 1.94% |
| Prompt Engineering | 16 | 1.94% |
