Attention models have been extensively studied to improve NLP tasks such as
machine comprehension, via both question-aware passage attention models and
self-matching attention models. Our research proposes a phase conductor
(PhaseCond) that advances attention models in two meaningful ways. First,
PhaseCond, an architecture of multi-layered attention models, consists of
multiple phases, each implementing a stack of attention layers that produce
passage representations and a stack of inner or outer fusion layers that
regulate the information flow (see the architecture sketch below).
Second, we extend and improve the dot-product attention function for PhaseCond
by simultaneously encoding multiple question and passage embedding layers from
different perspectives (one plausible formulation is sketched below). We
demonstrate the effectiveness of the proposed PhaseCond model on the SQuAD
dataset, showing that it significantly outperforms both state-of-the-art
single-layered and multi-layered attention models. We deepen our results with
new findings from detailed qualitative analysis and visualized examples that
illustrate the dynamic changes through the layers of multi-layered attention
models.
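
To make the two-stack phase structure concrete, the following is a minimal,
hypothetical PyTorch sketch of one phase: a stack of attention layers producing
question-aware passage representations, followed by gated fusion layers that
regulate how much attended information flows forward. The module names, the
gated-fusion form, and all hyperparameters are our assumptions for
illustration, not the authors' implementation.

# Illustrative sketch only (not the paper's code): one PhaseCond-style phase.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLayer(nn.Module):
    """Question-aware passage attention via scaled dot products (assumed form)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, passage: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
        # passage: (batch, p_len, dim), question: (batch, q_len, dim)
        scores = self.proj(passage) @ question.transpose(1, 2) / passage.size(-1) ** 0.5
        weights = F.softmax(scores, dim=-1)   # attend over question positions
        return weights @ question             # question-aware passage representation

class FusionLayer(nn.Module):
    """Gated fusion regulating the information flow (an assumed instantiation)."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, passage: torch.Tensor, attended: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([passage, attended], dim=-1)))
        return g * attended + (1 - g) * passage

class Phase(nn.Module):
    """One phase: a stack of attention layers, then a stack of fusion layers."""
    def __init__(self, dim: int, n_attn: int = 2, n_fuse: int = 1):
        super().__init__()
        self.attn_layers = nn.ModuleList(AttentionLayer(dim) for _ in range(n_attn))
        self.fuse_layers = nn.ModuleList(FusionLayer(dim) for _ in range(n_fuse))

    def forward(self, passage: torch.Tensor, question: torch.Tensor) -> torch.Tensor:
        attended = passage
        for attn in self.attn_layers:
            attended = attn(attended, question)
        for fuse in self.fuse_layers:
            passage = fuse(passage, attended)
        return passage

# Usage (shapes only):
#   phase = Phase(dim=128)
#   p, q = torch.randn(2, 100, 128), torch.randn(2, 12, 128)
#   out = phase(p, q)   # (2, 100, 128)

Under this reading, a question-aware phase and a self-matching phase (where the
passage attends to itself) would simply be chained, each refining the passage
representation produced by the previous phase.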
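As one hedged reading of the multi-perspective dot-product attention: assuming
$K$ embedding layers (e.g., word- and character-level) yielding $q_i^{(k)}$ for
question word $i$ and $p_j^{(k)}$ for passage word $j$, with per-layer
projections $W^{(k)}$ that we introduce purely for illustration, the layers
could be encoded simultaneously by summing per-layer dot products before the
softmax:

\[
  s_{ij} \;=\; \sum_{k=1}^{K} \big(W^{(k)} q_i^{(k)}\big)^{\top} \big(W^{(k)} p_j^{(k)}\big),
  \qquad
  \alpha_{ij} \;=\; \frac{\exp(s_{ij})}{\sum_{j'} \exp(s_{ij'})}.
\]

This reduces to standard dot-product attention when $K = 1$ and $W^{(1)}$ is
the identity; the paper's exact combination may differ.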