 Gated Linear Networks

# Gaussian Gated Linear Network

Introduced by Budden et al. in Gaussian Gated Linear Networks

Gaussian Gated Linear Network, or G-GLN, is a multi-variate extension to the recently proposed GLN family of deep neural networks by reformulating the GLN neuron as a gated product of Gaussians. This Gaussian Gated Linear Network (G-GLN) formulation exploits the fact that exponential family densities are closed under multiplication, a property that has seen much use in Gaussian Process and related literature. Similar to the Bernoulli GLN, every neuron in the G-GLN directly predicts the target distribution.

Precisely, a G-GLN is a feed-forward network of data-dependent distributions. Each neuron calculates the sufficient statistics $\left(\mu, \sigma_{2}\right)$ for its associated PDF using its active weights, given those emitted by neurons in the preceding layer. It consists of consists of $L+1$ layers indexed by $i \in{0, \ldots, L}$ with $K_{i}$ neurons in each layer. The weight space for a neuron in layer $i$ is denoted by $\mathcal{W}_{i}$; the subscript is needed since the dimension of the weight space depends on $K_{i-1}$. Each neuron/distribution is indexed by its position in the network when laid out on a grid; for example, $f_{i k}$ refers to the family of PDFs defined by the $k$ th neuron in the $i$ th layer. Similarly, $c_{i k}$ refers to the context function associated with each neuron in layers $i \geq 1$, and $\mu_{i k}$ and $\sigma_{i k}^{2}$ (or $\Sigma_{i k}$ in the multivariate case) referring to the sufficient statistics for each Gaussian PDF.

There are two types of input to neurons in the network. The first is the side information, which can be thought of as the input features, and is used to determine the weights used by each neuron via half-space gating. The second is the input to the neuron, which is the PDFs output by the previous layer, or in the case of layer 0, some provided base models. To apply a G-GLN in a supervised learning setting, we need to map the sequence of input-label pairs $\left(x_{t}, y_{t}\right)$ for $t=1,2, \ldots$ onto a sequence of (side information, base Gaussian PDFs, label) triplets $\left(z_{t},\left(f_{0 i}\right)_{i}, y_{t}\right)$. The side information $z_{t}$ is set to the (potentially normalized) input features $x_{t}$. The Gaussian PDFs for layer 0 will generally include the necessary base Gaussian PDFs to span the target range, and optionally some base prediction PDFs that capture domain-specific knowledge.

#### Papers

Paper Code Results Date Stars