A Highway Layer augments a standard neural network layer with an "information highway": a gated shortcut that lets input flow directly to deeper layers. It is characterised by the use of gating units that regulate how much information is transformed and how much is carried through unchanged.
A plain feedforward neural network typically consists of $L$ layers, where the $l$th layer ($l \in \{1, 2, \dots, L\}$) applies a nonlinear transform $H$ (parameterized by $\mathbf{W_{H,l}}$) to its input $\mathbf{x_{l}}$ to produce its output $\mathbf{y_{l}}$. Thus, $\mathbf{x_{1}}$ is the input to the network and $\mathbf{y_{L}}$ is the network's output. Omitting the layer index and biases for clarity,
$$ \mathbf{y} = H\left(\mathbf{x},\mathbf{W_{H}}\right) $$
$H$ is usually an affine transform followed by a non-linear activation function, but in general it may take other forms.
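As a concrete illustration, a plain layer of this form can be sketched in a few lines of NumPy. The choice of $\tanh$ as the activation is an assumption for the example; as noted above, $H$ may take other forms.

```python
import numpy as np

def plain_layer(x, W_H, b_H):
    # One plain feedforward layer: an affine transform followed by a
    # nonlinearity (tanh here; the activation choice is an assumption).
    return np.tanh(W_H @ x + b_H)

# Applying the layer to a random input; stacking L such layers
# would feed each output y_l in as the next input x_{l+1}.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W_H = rng.standard_normal((8, 8)) * 0.1
b_H = np.zeros(8)
y = plain_layer(x, W_H, b_H)
```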
For a highway network, we additionally define two nonlinear transforms $T\left(\mathbf{x},\mathbf{W_{T}}\right)$ and $C\left(\mathbf{x},\mathbf{W_{C}}\right)$ such that:
$$ \mathbf{y} = H\left(\mathbf{x},\mathbf{W_{H}}\right) \cdot T\left(\mathbf{x},\mathbf{W_{T}}\right) + \mathbf{x} \cdot C\left(\mathbf{x},\mathbf{W_{C}}\right) $$
We refer to $T$ as the transform gate and $C$ as the carry gate, since they express how much of the output is produced by transforming the input versus carrying it through unchanged. In the original paper, the authors set $C = 1 - T$, giving:
$$ \mathbf{y} = H\left(\mathbf{x},\mathbf{W_{H}}\right) \cdot T\left(\mathbf{x},\mathbf{W_{T}}\right) + \mathbf{x} \cdot \left(1-T\left(\mathbf{x},\mathbf{W_{T}}\right)\right) $$
The authors set:
$$ T\left(\mathbf{x}\right) = \sigma\left(\mathbf{W_{T}}^{T}\mathbf{x} + \mathbf{b_{T}}\right) $$
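Putting the pieces together, a single highway layer can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: $H$ is taken to be affine-plus-$\tanh$ (an assumption; the formulation leaves $H$ general), and $T$ is the sigmoid gate from the equation above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    # H: affine transform + tanh (tanh is an assumption for this sketch).
    H = np.tanh(W_H @ x + b_H)
    # T: affine transform + sigmoid, the transform gate defined above.
    T = sigmoid(W_T @ x + b_T)
    # Elementwise gating with carry gate C = 1 - T:
    # y = H(x) * T(x) + x * (1 - T(x))
    return H * T + x * (1.0 - T)

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
W_H = rng.standard_normal((d, d)) * 0.1
W_T = rng.standard_normal((d, d)) * 0.1
b_H = np.zeros(d)
# A strongly negative transform-gate bias pushes T toward 0,
# so the carry path dominates and the layer passes x through nearly unchanged.
b_T = np.full(d, -10.0)
y = highway_layer(x, W_H, b_H, W_T, b_T)
```

The original paper initializes the transform-gate bias $\mathbf{b_{T}}$ to a negative value so that the network is initially biased toward carry behaviour, which is what the negative bias in the example demonstrates.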
Source: Highway Networks
| Task | Papers | Share |
|---|---|---|
| Speech Synthesis | 44 | 23.91% |
| Text-To-Speech Synthesis | 15 | 8.15% |
| Decoder | 12 | 6.52% |
| Speech Recognition | 10 | 5.43% |
| Language Modelling | 8 | 4.35% |
| Sentence | 6 | 3.26% |
| Voice Cloning | 5 | 2.72% |
| Voice Conversion | 4 | 2.17% |
| Translation | 3 | 1.63% |