A Boom Layer is a type of feedforward layer that is closely related to the feedforward layers used in Transformers. The layer takes a vector $v \in \mathbb{R}^{H}$ and uses a matrix multiplication followed by a GeLU activation to produce a vector $u \in \mathbb{R}^{N \cdot H}$. We then break $u$ into $N$ vectors of dimension $H$ and sum them, producing $w \in \mathbb{R}^{H}$. Replacing the down-projection with this sum reduces computation and removes an entire matrix of parameters compared to traditional down-projection layers.
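The description above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the function names (`gelu`, `boom`) and the tanh approximation of GeLU are assumptions made here for clarity.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GeLU activation (an assumption of this sketch)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def boom(v, W, b, N):
    """Boom layer sketch.

    v: (H,) input vector
    W: (H, N*H) up-projection weights -- the only weight matrix;
       there is no down-projection matrix
    b: (N*H,) bias
    """
    H = v.shape[0]
    u = gelu(v @ W + b)            # u in R^{N*H}
    w = u.reshape(N, H).sum(axis=0)  # break u into N vectors of size H and sum
    return w

# usage: H=8, N=4 are arbitrary illustrative sizes
rng = np.random.default_rng(0)
H, N = 8, 4
v = rng.standard_normal(H)
W = rng.standard_normal((H, N * H))
b = np.zeros(N * H)
w = boom(v, W, b, N)
```

Note that the sum over the $N$ chunks returns the output to dimension $H$ without a second $NH \times H$ weight matrix, which is where the parameter saving comes from.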

The figure shows the Boom Layer in the context of the SHA-RNN architecture from the original paper.

Source: Single Headed Attention RNN: Stop Thinking With Your Head
