Activation Functions

Squared ReLU is an activation function used in the feedforward block of the Transformer layers in the Primer architecture. It is simply the ReLU activation squared: $y = \max(x, 0)^2$.

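A minimal sketch of the activation in NumPy; the function name `squared_relu` is ours, not from the paper:

```python
import numpy as np

def squared_relu(x: np.ndarray) -> np.ndarray:
    """Squared ReLU: max(x, 0) ** 2, applied element-wise."""
    return np.maximum(x, 0.0) ** 2

# Negative inputs map to 0; positive inputs grow quadratically.
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(squared_relu(x))  # [0.   0.   0.   0.25 4.  ]
```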
The effectiveness of higher-order polynomials can also be observed in other strong Transformer nonlinearities, such as GLU variants like ReGLU and point-wise activations like approximate GELU. However, squared ReLU has drastically different asymptotics as $x \rightarrow \infty$ compared to the most commonly used activation functions: ReLU, GELU and Swish. Squared ReLU does overlap significantly with ReGLU, and is in fact equivalent to it when ReGLU's $U$ and $V$ weight matrices are tied and the squared ReLU is immediately preceded by a linear transformation with weight matrix $U$. This leads the authors to believe that squared ReLUs capture the benefits of these GLU variants while being simpler, requiring no additional parameters, and delivering better quality.

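A quick numerical check of this equivalence, assuming the usual ReGLU form $\mathrm{ReGLU}(x) = \mathrm{ReLU}(xU) \odot (xV)$; the shapes and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))    # batch of inputs
U = rng.normal(size=(8, 16))   # shared projection weight

# ReGLU with V tied to U: ReLU(xU) * (xU)
reglu_tied = np.maximum(x @ U, 0.0) * (x @ U)

# Squared ReLU applied after the same linear transformation xU
sq_relu = np.maximum(x @ U, 0.0) ** 2

print(np.allclose(reglu_tied, sq_relu))  # True
```

The identity holds element-wise because $\mathrm{ReLU}(z) \cdot z = \mathrm{ReLU}(z)^2$ for every real $z$.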
Source: Primer: Searching for Efficient Transformers for Language Modeling
