# Highway Layer

Introduced by Srivastava et al. in Highway Networks

A **Highway Layer** provides an information highway across layers: alongside its usual nonlinear transform, it can carry its input through to its output unchanged. Learned gating units regulate how much of the output comes from the transform and how much is carried over.

A plain feedforward neural network typically consists of $L$ layers where the $l$th layer ($l \in \{1, 2, \dots, L\}$) applies a nonlinear transform $H$ (parameterized by $\mathbf{W_{H,l}}$) on its input $\mathbf{x_{l}}$ to produce its output $\mathbf{y_{l}}$. Thus, $\mathbf{x_{1}}$ is the input to the network and $\mathbf{y_{L}}$ is the network's output. Omitting the layer index and biases for clarity,

$$ \mathbf{y} = H\left(\mathbf{x},\mathbf{W_{H}}\right) $$

$H$ is usually an affine transform followed by a non-linear activation function, but in general it may take other forms.
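As a minimal sketch of the common case described above, the snippet below implements $H$ as an affine map followed by a nonlinearity. The name `plain_layer`, the choice of `tanh`, and the example weights are illustrative assumptions, not taken from the paper.

```python
import math


def plain_layer(x, W, b):
    """y = H(x, W_H): an affine map followed by tanh (an assumed nonlinearity).

    W is a list of rows, so each output unit is tanh(row . x + bias).
    """
    return [math.tanh(sum(w * x_j for w, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


# Hypothetical 2-dimensional example.
y = plain_layer([1.0, 2.0], [[0.5, -0.25], [0.0, 1.0]], [0.0, -1.0])
```

Here the first unit receives a pre-activation of 0 and the second a pre-activation of 1, so `y` holds `tanh(0)` and `tanh(1)`.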

For a highway network, we additionally define two nonlinear transforms $T\left(\mathbf{x},\mathbf{W_{T}}\right)$ and $C\left(\mathbf{x},\mathbf{W_{C}}\right)$ such that:

$$ \mathbf{y} = H\left(\mathbf{x},\mathbf{W_{H}}\right) \cdot T\left(\mathbf{x},\mathbf{W_{T}}\right) + \mathbf{x} \cdot C\left(\mathbf{x},\mathbf{W_{C}}\right) $$

We refer to $T$ as the transform gate and $C$ as the carry gate, since they express how much of the output is produced by transforming the input and carrying it, respectively. In the original paper, the authors set $C = 1 - T$, giving:

$$ \mathbf{y} = H\left(\mathbf{x},\mathbf{W_{H}}\right) \cdot T\left(\mathbf{x},\mathbf{W_{T}}\right) + \mathbf{x} \cdot \left(1 - T\left(\mathbf{x},\mathbf{W_{T}}\right)\right) $$
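As a quick check of what the gate does, substituting the two extreme gate values into the equation above gives the limiting cases. The products are taken elementwise, so $\mathbf{x}$, $\mathbf{y}$, $H$ and $T$ must all have the same dimensionality.

$$ \mathbf{y} = \begin{cases} \mathbf{x}, & \text{if } T\left(\mathbf{x},\mathbf{W_{T}}\right) = \mathbf{0} \\ H\left(\mathbf{x},\mathbf{W_{H}}\right), & \text{if } T\left(\mathbf{x},\mathbf{W_{T}}\right) = \mathbf{1} \end{cases} $$

With the gate fully closed the layer passes its input through unchanged; with the gate fully open it behaves like a plain layer. Intermediate gate values blend the two.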

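The authors define the transform gate as:

$$ T\left(\mathbf{x}\right) = \sigma\left(\mathbf{W_{T}}^{T}\mathbf{x} + \mathbf{b_{T}}\right) $$

where $\sigma$ is the logistic sigmoid, $\mathbf{W_{T}}$ is the gate's weight matrix, and $\mathbf{b_{T}}$ is its bias vector.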
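The following is a minimal, dependency-free sketch of a single highway layer with $C = 1 - T$ and a sigmoid transform gate. The helper names (`sigmoid`, `affine`, `highway_layer`), the use of `tanh` inside $H$, and all numeric values are assumptions made for illustration. The weight arguments are stored as lists of rows, so `W_T` in the code plays the role of $\mathbf{W_{T}}^{T}$ in the formula.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, x, b):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * x_j for w, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def highway_layer(x, W_H, b_H, W_T, b_T):
    """y = H(x) * T(x) + x * (1 - T(x)), combined elementwise."""
    h = [math.tanh(z) for z in affine(W_H, x, b_H)]  # H: tanh is an assumption
    t = [sigmoid(z) for z in affine(W_T, x, b_T)]    # T: transform gate
    return [h_i * t_i + x_i * (1.0 - t_i) for h_i, t_i, x_i in zip(h, t, x)]


# Hypothetical 3-dimensional example; input and output sizes must match.
x = [0.5, -1.0, 2.0]
W_H = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [-0.5, 0.1, 0.1]]
W_T = [[0.1, 0.0, -0.1], [0.2, 0.1, 0.0], [0.0, -0.2, 0.1]]
b_H = [0.0, 0.0, 0.0]

y_carry = highway_layer(x, W_H, b_H, W_T, [-20.0] * 3)     # gate nearly closed
y_transform = highway_layer(x, W_H, b_H, W_T, [20.0] * 3)  # gate nearly open
```

The extreme gate biases are chosen only to make the behaviour visible: with a strongly negative bias the output is approximately the input `x`, and with a strongly positive bias it is approximately the plain transform `tanh(W_H x + b_H)`.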

Image: Sik-Ho Tsang

Source: Highway Networks

| Task | Papers | Share |
|---|---|---|
| Speech Synthesis | 35 | 30.97% |
| Text-To-Speech Synthesis | 12 | 10.62% |
| Speech Recognition | 9 | 7.96% |
| Language Modelling | 7 | 6.19% |
| Speech Quality | 4 | 3.54% |
| Expressive Speech Synthesis | 3 | 2.65% |
| General Classification | 3 | 2.65% |
| Question Answering | 3 | 2.65% |
| Speaker Verification | 2 | 1.77% |