Learning Monotonic Alignments with Source-Aware GMM Attention

Transformers with soft attention have been widely adopted for various sequence-to-sequence tasks. While soft attention is effective for learning semantic similarities between queries and keys based on their contents, it does not explicitly model the order of elements in a sequence, which is crucial for monotonic sequence-to-sequence tasks. Learning monotonic alignments between input and output sequences could be beneficial for long-form and online inference applications, which remain challenging for the conventional soft attention algorithm. In this paper, we focus on monotonic sequence-to-sequence tasks and propose Source-Aware Gaussian Mixture Model (SAGMM) attention, in which the attention scores are calculated monotonically, considering both the content and the order of the source sequence. We show through experiments that the proposed attention mechanism solves the online and long-form speech recognition problems without degrading performance on offline in-distribution speech recognition.
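
To make the idea of monotonically calculated attention scores more concrete, the following is a minimal sketch of a generic position-based GMM attention step, in which each mixture mean can only move forward along the source. It is not the SAGMM formulation from the paper: the abstract does not give the equations, and the source-aware (content-dependent) part is not reproduced here. The function name `gmm_attention_step`, its parameters, and the softplus parameterization are all assumptions made for illustration.

```python
import numpy as np


def gmm_attention_step(prev_means, delta_raw, sigma_raw, weight_logits, source_len):
    """One decoder step of a generic GMM attention (illustrative, not SAGMM).

    prev_means, delta_raw, sigma_raw, weight_logits: arrays of shape (K,)
    holding the previous means and the raw (unconstrained) mixture
    parameters. In a real model the raw values would be predicted by the
    network; here they are plain inputs (an assumption).
    Returns (attn, means): attention over source positions, shape
    (source_len,), and the updated means, shape (K,).
    """
    # Softplus makes every mean increment positive, so means never move back.
    delta = np.logaddexp(0.0, delta_raw)
    means = prev_means + delta

    # Softplus plus a small floor keeps the standard deviations positive.
    sigma = np.logaddexp(0.0, sigma_raw) + 1e-3

    # Softmax over mixture weights.
    w = np.exp(weight_logits - np.max(weight_logits))
    w = w / w.sum()

    # Gaussian score of every source position under every component: (T, K).
    positions = np.arange(source_len, dtype=float)[:, None]
    dens = np.exp(-0.5 * ((positions - means) / sigma) ** 2)

    # Mix the components, then normalize over source positions.
    scores = dens @ w
    attn = scores / (scores.sum() + 1e-8)
    return attn, means


# Usage: two consecutive decoder steps with K = 3 components.
K, T = 3, 20
means0 = np.zeros(K)
raw = np.zeros(K)
attn1, means1 = gmm_attention_step(means0, raw, raw, raw, T)
attn2, means2 = gmm_attention_step(means1, raw, raw, raw, T)
```

Because the means accumulate positive increments, the attention mass can only shift toward later source positions from one decoder step to the next, which is the monotonic behavior that long-form and online decoding rely on.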
