Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization

Despite successes across a broad range of applications, Transformers have limited capability for systematic generalization. The situation is especially frustrating for algorithmic tasks, where they often fail to find intuitive solutions that can be expressed simply in terms of attention patterns. Here we propose two modifications to the Transformer architecture, copy gate and geometric attention, which facilitate learning such intuitive and interpretable solutions to algorithmic problems. Our novel Transformer, called Transformer Control Flow (TCF), achieves 100% length-generalization accuracy on the classic compositional table lookup task. The resulting attention and gating patterns are interpretable, demonstrating that the model implements adaptive control flow.
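
To make the two mechanisms concrete, here is a minimal PyTorch sketch, not the paper's implementation: the names `geometric_weights` and `CopyGatedLayer`, the placement of the gate after the attention output, and the nearest-first ordering used in the geometric weighting are assumptions made for illustration. The copy gate lets each position either apply the layer's update or pass its input through unchanged; the geometric weighting assigns key k the probability that k matches and every closer key does not.

```python
import torch
import torch.nn as nn

def geometric_weights(logits: torch.Tensor, query_pos: int) -> torch.Tensor:
    """Geometric attention weights for one query over `seq` key positions.

    Key k gets weight p_k * prod(1 - p_j) over all keys j strictly closer
    to the query, so the nearest confident match dominates (sketch only).
    """
    p = torch.sigmoid(logits)                        # per-key match probability
    dist = (torch.arange(logits.numel()) - query_pos).abs()
    order = torch.sort(dist, stable=True).indices    # nearest key first
    p_near_first = p[order]
    miss = torch.cumprod(1.0 - p_near_first, dim=0)  # prob. all closer keys missed
    w_near_first = p_near_first * torch.cat([torch.ones(1), miss[:-1]])
    w = torch.empty_like(w_near_first)
    w[order] = w_near_first                          # undo the distance sort
    return w

class CopyGatedLayer(nn.Module):
    """Transformer encoder layer with a per-position copy gate (sketch)."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm = nn.LayerNorm(d_model)
        self.gate_proj = nn.Linear(d_model, 1)       # one scalar gate per position

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        # Standard self-attention stands in here for the paper's
        # geometric attention, which this sketch does not wire in.
        a, _ = self.attn(h, h, h)
        u = self.ffn(a)
        g = torch.sigmoid(self.gate_proj(a))         # g→1: update, g→0: copy input
        return g * u + (1.0 - g) * x

layer = CopyGatedLayer(d_model=64, n_heads=4, d_ff=128)
out = layer(torch.randn(2, 10, 64))                  # (batch, seq, d_model)
print(geometric_weights(torch.randn(10), query_pos=3).sum())  # total weight ≤ 1
```

The point of the gating design is that a closed gate (g ≈ 0) passes a position's input through the layer untouched, so each position can decide at every step whether to take part in the computation, which is what allows the learned gating patterns to be read as control flow.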
