Introduced by Hermann et al. in Teaching Machines to Read and Comprehend

The Deep LSTM Reader is a neural network for reading comprehension. We feed the document one word at a time into a Deep LSTM encoder; after a delimiter, we then feed the query into the same encoder. The model therefore processes each document-query pair as a single long sequence. Given the embedded document and query, the network predicts which token in the document answers the query.
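As a minimal sketch of the input construction described above, the following Python snippet joins a document and a query into one token sequence around the delimiter. The whitespace tokenization, the example sentences, and the helper names `build_input` and `candidate_answers` are illustrative assumptions, not part of the original model description.

```python
DELIMITER = "|||"  # delimiter token separating document and query


def build_input(document, query):
    """Return document tokens, then the delimiter, then query tokens."""
    return list(document) + [DELIMITER] + list(query)


def candidate_answers(document):
    """The model picks its answer among tokens that occur in the document."""
    return sorted(set(document))


# Hypothetical example data; whitespace tokenization is an assumption.
document = "the cat sat on the mat".split()
query = "where did the cat sit".split()

sequence = build_input(document, query)
candidates = candidate_answers(document)
```

The encoder then consumes `sequence` one token at a time, so the query is read with the document already summarized in the recurrent state.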

The model consists of a Deep LSTM cell with skip connections from each input $x\left(t\right)$ to every hidden layer, and from every hidden layer to the output $y\left(t\right)$:

$$x'\left(t, k\right) = x\left(t\right)||y'\left(t, k - 1\right) \text{, } y\left(t\right) = y'\left(t, 1\right)|| \dots ||y'\left(t, K\right)$$

$$i\left(t, k\right) = \sigma\left(W_{kxi}x'\left(t, k\right) + W_{khi}h\left(t - 1, k\right) + W_{kci}c\left(t - 1, k\right) + b_{ki}\right)$$

$$f\left(t, k\right) = \sigma\left(W_{kxf}x\left(t\right) + W_{khf}h\left(t - 1, k\right) + W_{kcf}c\left(t - 1, k\right) + b_{kf}\right)$$

$$c\left(t, k\right) = f\left(t, k\right)c\left(t - 1, k\right) + i\left(t, k\right)\text{tanh}\left(W_{kxc}x'\left(t, k\right) + W_{khc}h\left(t - 1, k\right) + b_{kc}\right)$$

$$o\left(t, k\right) = \sigma\left(W_{kxo}x'\left(t, k\right) + W_{kho}h\left(t - 1, k\right) + W_{kco}c\left(t, k\right) + b_{ko}\right)$$

$$h\left(t, k\right) = o\left(t, k\right)\text{tanh}\left(c\left(t, k\right)\right)$$

$$y'\left(t, k\right) = W_{ky}h\left(t, k\right) + b_{ky}$$

where || indicates vector concatenation, $h\left(t, k\right)$ is the hidden state and $c\left(t, k\right)$ the cell state for layer $k$ at time $t$, and $i$, $f$, $o$ are the input, forget, and output gates respectively, each passed through the logistic sigmoid $\sigma$. Thus our Deep LSTM Reader is defined by $g^{\text{LSTM}}\left(d, q\right) = y\left(|d|+|q|\right)$, with input $x\left(t\right)$ the concatenation of $d$ and $q$ separated by the delimiter |||.
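To make the recurrence concrete, here is a small pure-Python sketch of one forward pass through the stacked cell. It follows the equations as printed, including the forget gate reading $x\left(t\right)$ directly. Several details are assumptions made for illustration only: $y'\left(t, 0\right)$ is treated as empty for the first layer, the peephole weights $W_{kc\cdot}$ are full matrices, states start at zero, and the dimensions, random initialization, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_layer(x_dim, in_dim, hid, out_dim, rng):
    """Random parameters for one layer k (illustrative initialization)."""
    p = {}
    for gate in "ifco":
        # The forget gate reads x(t); the other terms read x'(t, k).
        cols = x_dim if gate == "f" else in_dim
        p["Wx" + gate] = rand_matrix(hid, cols, rng)
        p["Wh" + gate] = rand_matrix(hid, hid, rng)
        p["b" + gate] = [0.0] * hid
    for gate in "ifo":  # peephole weights W_kci, W_kcf, W_kco
        p["Wc" + gate] = rand_matrix(hid, hid, rng)
    p["Wy"] = rand_matrix(out_dim, hid, rng)
    p["by"] = [0.0] * out_dim
    return p


def layer_step(p, x, x_in, h_prev, c_prev):
    """One time step of layer k; returns h(t,k), c(t,k), y'(t,k)."""
    def affine(gate, inp, cell=None):
        z = [a + b + bias for a, b, bias in zip(
            matvec(p["Wx" + gate], inp), matvec(p["Wh" + gate], h_prev),
            p["b" + gate])]
        if cell is not None:
            z = [a + b for a, b in zip(z, matvec(p["Wc" + gate], cell))]
        return z

    i = [sigmoid(z) for z in affine("i", x_in, c_prev)]
    f = [sigmoid(z) for z in affine("f", x, c_prev)]
    g = [math.tanh(z) for z in affine("c", x_in)]
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    o = [sigmoid(z) for z in affine("o", x_in, c)]  # peeks at the new c(t,k)
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    y = [a + b for a, b in zip(matvec(p["Wy"], h), p["by"])]
    return h, c, y


def deep_lstm_step(layers, x, state):
    """Run all K layers for one input x(t); returns y(t) and the new state."""
    y_below, outputs, new_state = [], [], []
    for p, (h_prev, c_prev) in zip(layers, state):
        x_in = x + y_below  # x'(t,k) = x(t) || y'(t,k-1)
        h, c, y_k = layer_step(p, x, x_in, h_prev, c_prev)
        new_state.append((h, c))
        outputs.extend(y_k)  # y(t) = y'(t,1) || ... || y'(t,K)
        y_below = y_k
    return outputs, new_state


def encode(layers, inputs, hid):
    """Return y at the final time step of the input sequence."""
    state = [([0.0] * hid, [0.0] * hid) for _ in layers]
    y = []
    for x in inputs:
        y, state = deep_lstm_step(layers, x, state)
    return y


rng = random.Random(0)
X_DIM, HID, OUT, K = 4, 3, 2, 2
layers = [init_layer(X_DIM, X_DIM if k == 0 else X_DIM + OUT, HID, OUT, rng)
          for k in range(K)]
inputs = [[rng.uniform(-1, 1) for _ in range(X_DIM)] for _ in range(5)]
g = encode(layers, inputs, HID)
```

Here `g` plays the role of $g^{\text{LSTM}}\left(d, q\right)$: the concatenated per-layer outputs after the last token of the joint sequence has been read. A practical implementation would use a tensor library and trained embeddings rather than Python lists and random inputs.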
