Integrating Attention Feedback into the Recurrent Neural Network

29 Sep 2021 · Heng Li

In this paper, an improved long short-term memory (LSTM) structure, called hidden attention long short-term memory (HA-LSTM), is proposed to reduce long-term memory loss when the LSTM network is updated at each time step. HA-LSTM differs from the standard LSTM structure in that a sliding controller based on scaled dot-product attention is introduced into the structure. The sliding attention controller gives the traditional attention mechanism a time-varying property, which makes it better suited to time-series analysis tasks. Traditionally, the inputs to the attention mechanism are the hidden state vectors from all time steps; in contrast, the input to the sliding attention controller is limited to a fixed-length window, so it uses less memory than the traditional attention mechanism. In addition, because HA-LSTM integrates the attention mechanism into the standard LSTM structure, it combines the advantages of both. Unlike most works, which compute attention unilaterally after collecting the hidden state vectors from all time steps, the gate vectors and the cell state vector of HA-LSTM are updated based on feedback from the attention mechanism. This feedback property helps the cell state vector retain valuable information. To evaluate the performance of HA-LSTM, experiments are conducted on four text benchmark datasets for text classification tasks. The proposed model is compared with two classic models and a recent state-of-the-art model, and most outcomes indicate that HA-LSTM is superior to the other structures.
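The abstract describes the mechanism only at a high level, so below is a minimal PyTorch sketch of the idea: scaled dot-product attention computed over a range-limited sliding window of recent hidden states, whose output feeds back into the gate and cell state updates of an LSTM cell. All names here (`HALSTMCell`, `window_size`) and the exact way the attention vector enters the gate equations are assumptions for illustration; the paper's actual formulation may differ.

```python
# A hedged sketch of the HA-LSTM idea: an LSTM cell whose gates receive
# feedback from scaled dot-product attention over a sliding window of
# recent hidden states. The gating formula and names are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class HALSTMCell(nn.Module):
    """LSTM cell with a sliding scaled dot-product attention controller."""

    def __init__(self, input_size: int, hidden_size: int, window_size: int = 8):
        super().__init__()
        self.hidden_size = hidden_size
        self.window_size = window_size  # range-limited attention input
        # Standard LSTM gate weights, extended with an extra input slot for
        # the attention feedback vector a_t (assumed formulation).
        self.gates = nn.Linear(input_size + 2 * hidden_size, 4 * hidden_size)

    def attention(self, h_t: torch.Tensor, window: torch.Tensor) -> torch.Tensor:
        # Scaled dot-product attention: query = current hidden state,
        # keys/values = the last `window_size` hidden states.
        scores = torch.einsum("bh,bkh->bk", h_t, window) / self.hidden_size ** 0.5
        weights = F.softmax(scores, dim=-1)
        return torch.einsum("bk,bkh->bh", weights, window)

    def forward(self, x_seq: torch.Tensor):
        # x_seq: (batch, seq_len, input_size)
        batch, seq_len, _ = x_seq.shape
        h = x_seq.new_zeros(batch, self.hidden_size)
        c = x_seq.new_zeros(batch, self.hidden_size)
        history = [h]  # sliding buffer of hidden states
        outputs = []
        for t in range(seq_len):
            window = torch.stack(history[-self.window_size:], dim=1)
            a = self.attention(h, window)  # attention feedback vector
            # Gates are conditioned on the input, the hidden state, AND the
            # attention feedback, so the cell state update is steered by
            # the sliding controller rather than computed unilaterally.
            i, f, g, o = self.gates(
                torch.cat([x_seq[:, t], h, a], dim=-1)
            ).chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            history.append(h)
            outputs.append(h)
        return torch.stack(outputs, dim=1), (h, c)


# Usage: run a batch of token-embedding sequences through the cell.
cell = HALSTMCell(input_size=32, hidden_size=64, window_size=8)
out, (h_n, c_n) = cell(torch.randn(4, 20, 32))
print(out.shape)  # torch.Size([4, 20, 64])
```

Keeping only the last `window_size` hidden states bounds both the attention computation and the memory footprint per step, which is the range-limited property the abstract emphasizes over attention across all time steps.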
