Recently, neural machine translation has achieved remarkable progress by
introducing well-designed deep neural networks into its encoder-decoder
framework. From the optimization perspective, most of these deep architectures
adopt residual connections to improve learning in both the encoder and decoder,
and advanced attention connections are applied as well.
Inspired by the success of the DenseNet model in computer vision, in this paper
we propose a densely connected NMT architecture (DenseNMT) that trains more
efficiently. DenseNMT not only creates new features through dense connections
in both the encoder and decoder, but also uses a dense attention structure to
improve attention quality. Experiments on multiple datasets show that the
DenseNMT architecture is more competitive and trains more efficiently.
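To make the dense-connection idea concrete, the following is a minimal NumPy sketch of DenseNet-style feature creation: each layer receives the concatenation of all earlier feature maps along the feature dimension. The layer shapes and the toy random "layers" here are invented for illustration and are not the paper's exact parameterization.

```python
import numpy as np

def dense_encoder(x, layers):
    """DenseNet-style stacking (a sketch): each layer sees the
    concatenation of the input and all earlier layer outputs."""
    features = [x]
    for layer in layers:
        h = layer(np.concatenate(features, axis=-1))
        features.append(h)
    # the block's output is the concatenation of everything produced
    return np.concatenate(features, axis=-1)

# Toy "layers": random linear maps, each producing d_new features.
rng = np.random.default_rng(0)
seq_len, d_in, d_new = 5, 8, 4

def make_layer(d_in_total):
    W = rng.standard_normal((d_in_total, d_new))
    return lambda h: np.tanh(h @ W)

# Layer i's input width grows by d_new per preceding layer.
layers = [make_layer(d_in + i * d_new) for i in range(3)]
x = rng.standard_normal((seq_len, d_in))
out = dense_encoder(x, layers)
# Output width grows as d_in + num_layers * d_new.
assert out.shape == (seq_len, d_in + 3 * d_new)
```

The key contrast with residual connections is that features are concatenated rather than summed, so every layer retains direct access to all earlier representations.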