Gated Linear Networks

# Gated Linear Network

Introduced by Veness et al. in Gated Linear Networks

A Gated Linear Network, or GLN, is a type of backpropagation-free neural architecture. What distinguishes GLNs from contemporary neural networks is the distributed and local nature of their credit assignment mechanism; each neuron directly predicts the target, forgoing the ability to learn feature representations in favor of rapid online learning. Individual neurons can model nonlinear functions via the use of data-dependent gating in conjunction with online convex optimization.

GLNs are feedforward networks composed of many layers of gated geometric mixing neurons (see figure). Each neuron in a given layer outputs a gated geometric mixture of the predictions from the previous layer, and the final layer consists of a single neuron. In a supervised learning setting, a GLN is trained on (side information, base predictions, label) triplets $\left(z_{t}, p_{t}, x_{t}\right)_{t=1,2,3, \ldots}$ derived from input-label pairs $\left(z_{t}, x_{t}\right)$. Each neuron receives two types of input. The first is the side information $z_{t}$, which can be thought of as the input features. The second is the neuron's input proper: the predictions output by the previous layer or, in the case of layer 0, some (optionally) provided base predictions $p_{t}$ that will typically be a function of $z_{t}$. Each neuron also takes in a constant bias prediction, which helps empirically and is essential for universality guarantees.
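The forward pass of a single neuron described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binary targets, writes the geometric mixture as a sigmoid of the weighted sum of input logits, and uses a halfspace gating function to pick the weight vector from the side information. The helper names (`halfspace_context`, `geo_mix`, `neuron_forward`) and all numeric values are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def halfspace_context(z, hyperplanes):
    """Map side information z to an integer context index.

    Each (normal, offset) pair contributes one bit: 1 if z lies on the
    positive side of the hyperplane, else 0. This particular gating
    function is an assumption made for illustration.
    """
    index = 0
    for normal, offset in hyperplanes:
        bit = 1 if sum(n * zi for n, zi in zip(normal, z)) > offset else 0
        index = (index << 1) | bit
    return index

def geo_mix(weights, probs):
    """Geometric mixture: sigmoid of the weighted sum of input logits."""
    return sigmoid(sum(w * logit(p) for w, p in zip(weights, probs)))

def neuron_forward(weight_table, hyperplanes, z, probs):
    """Select the weight vector for z's context, then mix the inputs."""
    c = halfspace_context(z, hyperplanes)
    return geo_mix(weight_table[c], probs)

# Hypothetical example: one hyperplane gives two contexts.
hyperplanes = [([1.0, 0.0], 0.0)]
weight_table = [[0.5, 0.5], [1.0, 0.0]]  # one weight vector per context
z = [0.3, -1.2]                          # side information
probs = [0.8, 0.6]                       # previous-layer predictions (bias omitted for brevity)
print(neuron_forward(weight_table, hyperplanes, z, probs))
```

Here `z` falls on the positive side of the hyperplane, so the second weight vector is active and the neuron passes through the first input prediction. A different `z` would activate a different weight vector, which is how a neuron that is linear in the logits can still represent a nonlinear function of the side information.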

Weights are learnt in a Gated Linear Network using Online Gradient Descent (OGD) locally at each neuron. The key observation is that because each neuron $(i, k)$ in layers $i>0$ is itself a gated geometric mixture, all of these neurons can be thought of as individually predicting the target. Given side information $z$, each neuron $(i, k)$ suffers a loss that is convex in its active weights $u:=w_{i k c_{i k}(z)}$: $$\ell_{t}(u):=-\log \left(\operatorname{GEO}_{u}\left(x_{t} ; p_{i-1}\right)\right)$$
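A single local update can be sketched as below. The sketch assumes binary targets and the sigmoid-of-weighted-logits form of the geometric mixture, in which case the gradient of the loss with respect to the active weights is the prediction error times the input logits. The learning rate `lr` and the clipping bound `w_max` are hypothetical values chosen for illustration, and the function names are not taken from the paper.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def local_loss(u, probs, x):
    """Log loss -log GEO_u(x; p) for a binary target x in {0, 1}."""
    p1 = sigmoid(sum(w * logit(p) for w, p in zip(u, probs)))
    return -math.log(p1 if x == 1 else 1.0 - p1)

def ogd_step(u, probs, x, lr=0.1, w_max=5.0):
    """One Online Gradient Descent step on the active weights u.

    Gradient of the log loss w.r.t. u: (prediction - x) * logit(probs).
    After the step, each weight is clipped to [-w_max, w_max]
    (an assumed bound that keeps the weights in a box).
    """
    pred = sigmoid(sum(w * logit(p) for w, p in zip(u, probs)))
    new_u = []
    for w, p in zip(u, probs):
        w = w - lr * (pred - x) * logit(p)
        new_u.append(max(-w_max, min(w_max, w)))
    return new_u

# Hypothetical example: one update toward the label x = 1.
u = [0.5, 0.5]
probs = [0.8, 0.6]   # predictions from the previous layer
before = local_loss(u, probs, 1)
u = ogd_step(u, probs, 1)
after = local_loss(u, probs, 1)
print(before, after)
```

Only the weight vector selected by the gating function for the current $z$ is updated, and the update uses nothing beyond the neuron's own inputs and the label, so no error signal has to be propagated backward through the network.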

Source: Gated Linear Networks
