Learning Monotonic Alignments with Source-Aware GMM Attention

Transformers with soft attention have been widely adopted for various sequence-to-sequence tasks. While soft attention is effective for learning semantic similarities between queries and keys based on their contents, it does not explicitly model the order of elements in the sequences, which is crucial for monotonic sequence-to-sequence tasks. Learning monotonic alignments between input and output sequences could be beneficial for long-form and online inference applications, which remain challenging for the conventional soft attention algorithm. In this paper, we focus on monotonic sequence-to-sequence tasks and propose Source-Aware Gaussian Mixture Model (SAGMM) attention, in which the attention scores are computed monotonically while considering both the content and the order of the source sequence. We show experimentally that the proposed attention mechanism addresses online and long-form speech recognition without degrading performance on offline in-distribution speech recognition.
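To make the idea of monotonic GMM-based alignment concrete, below is a minimal sketch of Graves-style GMM attention, the family of mechanisms SAGMM builds on: mixture means are only allowed to move forward along the source axis, so the alignment is monotonic by construction. This is not the paper's Source-Aware variant; the conditioning on source content is omitted, and all module and parameter names are illustrative assumptions.

```python
# Sketch of GMM attention with monotonically advancing mixture means.
# NOT the paper's SAGMM: the source-aware (content-conditioned) part is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GMMAttention(nn.Module):
    def __init__(self, query_dim: int, num_mixtures: int = 5):
        super().__init__()
        # Predict (weight, shift, scale) logits for each mixture from the query.
        self.param_proj = nn.Linear(query_dim, 3 * num_mixtures)
        self.num_mixtures = num_mixtures

    def forward(self, query, values, prev_mu):
        # query:   (batch, query_dim)      current decoder state
        # values:  (batch, src_len, dim)   encoder outputs
        # prev_mu: (batch, num_mixtures)   mixture means from the previous step
        batch, src_len, _ = values.shape
        w_logit, delta_logit, sigma_logit = self.param_proj(query).chunk(3, dim=-1)

        w = torch.softmax(w_logit, dim=-1)       # mixture weights
        delta = F.softplus(delta_logit)          # non-negative step size
        sigma = F.softplus(sigma_logit) + 1e-4   # positive std-dev

        # Monotonicity: means can only move forward along the source axis.
        mu = prev_mu + delta                     # (batch, num_mixtures)

        # Evaluate each Gaussian at every source position j = 0..src_len-1.
        j = torch.arange(src_len, device=values.device).view(1, src_len, 1)
        scores = w.unsqueeze(1) * torch.exp(
            -0.5 * ((j - mu.unsqueeze(1)) / sigma.unsqueeze(1)) ** 2
        )
        alpha = scores.sum(dim=-1)               # (batch, src_len) attention weights

        context = torch.bmm(alpha.unsqueeze(1), values).squeeze(1)
        return context, alpha, mu


# Usage: step through decoding while the alignment advances monotonically.
attn = GMMAttention(query_dim=256, num_mixtures=5)
values = torch.randn(2, 100, 256)   # encoder outputs
mu = torch.zeros(2, 5)              # alignment starts at source position 0
for _ in range(3):
    query = torch.randn(2, 256)
    context, alpha, mu = attn(query, values, mu)
```

Because the means never decrease, the attention window can only slide forward over the source, which is what enables streaming (online) decoding and keeps alignments stable on long-form inputs.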
