Topic-aware Contextualized Transformers

1 Jan 2021  ·  Ruiying Lu, Bo Chen, Dandan Guo, Dongsheng Wang, Mingyuan Zhou ·

Trained on disjoint fixed-length segments, Transformers successfully transform static word embeddings into contextualized word representations. However, they often restrict the context of a token to the segment it resides in, neglecting the flow of contextual information across segments and failing to capture longer-term dependencies beyond the predefined segment length. This paper uses a probabilistic deep topic model to provide contextualized embeddings at both the token and segment levels. It also introduces topic self-attention and a contextual next-word-embedding-guided topic select-attention, injecting contextualized topic information into Transformer-based architectures. Moving beyond conventional Transformers, which ignore longer-range word dependencies and contextualize their word representations only at the segment level, the proposed method not only captures the global semantic coherence of all segments and global word co-occurrence patterns, but also enriches the representation of each token by adapting it to its local context, which is not limited to the segment the token resides in and can be flexibly defined according to the task. Experiments on various corpora show that, while adding only a few extra parameters, the proposed topic-aware contextualized Transformers consistently outperform their conventional counterparts and can be used to generate coherent sentences and paragraphs.
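
The abstract describes the topic-injection mechanism only at a high level. Below is a minimal, hypothetical PyTorch sketch of one way segment-level topic embeddings from an external topic model could be mixed into a Transformer's token states via a select-attention step; the module name TopicSelectAttention, the dimensions, and the residual design are illustrative assumptions, not the paper's actual implementation.

    # Hypothetical sketch: let each token attend over topic embeddings produced
    # by an external topic model and add the selected topic context residually.
    # All names and dimensions are illustrative, not the paper's implementation.
    import torch
    import torch.nn as nn


    class TopicSelectAttention(nn.Module):
        """Cross-attention of tokens over topic embeddings, injected residually."""

        def __init__(self, d_model: int, d_topic: int):
            super().__init__()
            self.query = nn.Linear(d_model, d_topic)   # project tokens into topic space
            self.value = nn.Linear(d_topic, d_model)   # project topic context back
            self.norm = nn.LayerNorm(d_model)

        def forward(self, hidden: torch.Tensor, topics: torch.Tensor) -> torch.Tensor:
            # hidden: (batch, seq_len, d_model) token states from a Transformer layer
            # topics: (batch, n_topics, d_topic) topic embeddings for the current context
            q = self.query(hidden)                                        # (B, S, d_topic)
            scores = q @ topics.transpose(1, 2) / topics.size(-1) ** 0.5  # (B, S, n_topics)
            weights = torch.softmax(scores, dim=-1)                       # topic selection weights
            topic_context = weights @ topics                              # (B, S, d_topic)
            return self.norm(hidden + self.value(topic_context))          # residual injection


    # Toy usage: 2 segments of 16 tokens, 64-dim token states, 50 topics of size 32.
    layer = TopicSelectAttention(d_model=64, d_topic=32)
    hidden = torch.randn(2, 16, 64)
    topics = torch.randn(2, 50, 32)
    out = layer(hidden, topics)    # shape: (2, 16, 64)

Because the topic embeddings can summarize context beyond the current segment, such a layer adds only a small number of parameters (two linear projections and a layer norm) on top of a standard Transformer block.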
