Improving Machine Translation by Searching Skip Connections Efficiently

1 Jan 2021 · Chen Yang, Houfeng Wang

As a widely used neural network model in NLP (natural language processing), the Transformer achieves state-of-the-art performance on several translation tasks. The Transformer has a fixed skip connection architecture among its layers, but the influence of other possible skip connection architectures has not been explored thoroughly. We search over different skip connection architectures to discover better ones for different datasets. To make trying different skip connection architectures efficient, we apply the idea of network morphism and add skip connections as a fine-tuning procedure. Our fine-tuning method outperforms the best models trained on the same or smaller datasets on WMT'16 En-De, WMT'14 En-Fr, and WMT'18 En-De with 226M back-translation sentences. We also experiment with transferring the searched skip connection architectures to new Transformer models.
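The abstract describes adding skip connections via network morphism so that the modified model initially computes the same function as the original and can then be fine-tuned. Below is a minimal PyTorch-style sketch of that idea under stated assumptions: a zero-initialised scalar gate makes the new connection function-preserving at insertion time. All class, parameter, and variable names are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class MorphedSkipConnection(nn.Module):
    """Function-preserving skip connection (a network-morphism-style sketch).

    The gate starts at zero, so adding this connection leaves the model's
    output unchanged; fine-tuning can then move the gate away from zero.
    """

    def __init__(self):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init => identity morphism

    def forward(self, later_input: torch.Tensor, earlier_output: torch.Tensor) -> torch.Tensor:
        return later_input + self.gate * earlier_output


class EncoderWithSearchedSkips(nn.Module):
    """Hypothetical encoder stack where candidate skip connections between
    non-adjacent layers are inserted as morphisms and then fine-tuned,
    instead of training each candidate architecture from scratch."""

    def __init__(self, layers: nn.ModuleList, skip_pairs):
        super().__init__()
        self.layers = layers
        self.skip_pairs = skip_pairs  # list of (source_layer, target_layer) indices
        self.skips = nn.ModuleDict({
            f"{s}_{t}": MorphedSkipConnection() for s, t in skip_pairs
        })

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = []
        for t, layer in enumerate(self.layers):
            # Mix in outputs of earlier layers routed to this layer.
            for s, tgt in self.skip_pairs:
                if tgt == t:
                    x = self.skips[f"{s}_{t}"](x, outputs[s])
            x = layer(x)
            outputs.append(x)
        return x
```

Because the gates are zero at insertion, each candidate skip architecture starts from the pretrained model's behaviour, which is what makes the search a fine-tuning procedure rather than repeated training from scratch.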
