Single Layers of Attention Suffice to Predict Protein Contacts

The established approach to protein contact prediction frames the task as graph selection, extracting contacts by estimating the parameters of a Potts model. A second approach has recently emerged that leverages large pretrained Transformers, producing contacts by combining attention maps from various heads. In this work, we provide evidence that these approaches are not as different as they initially seem, establishing a theoretical connection between attention and Potts models. To do so, we introduce a simplified attention model called factored attention. On the one hand, factored attention is a direct simplification of the multihead scaled dot-product attention used in the Transformer. On the other hand, it defines a valid pairwise Markov random field and includes Potts models as a sparse special case. Examining factored attention allows us to explore the relative merits of each model class when learning contacts from an aligned protein family. We then empirically assess factored attention by training it on a wide range of alignments of individual protein families, and we further compare to a large pretrained Transformer trained on a corpus of unaligned protein sequences. We find that a single layer of attention is comparable to state-of-the-art Potts models at contact prediction. Taken together, these results motivate training Transformers on large protein datasets.

ICLR Workshop 2021
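To make the stated connection concrete, here is a rough sketch in our own notation (an assumption based on the abstract, not the authors' exact formulation). A Potts model over an aligned sequence x of length L scores configurations with one free coupling matrix per position pair,

$$E(x) = \sum_{i} h_i(x_i) + \sum_{i \neq j} J_{ij}(x_i, x_j),$$

whereas a factored-attention layer would parameterize the couplings through attention maps that depend only on position and value matrices that depend only on amino-acid identity,

$$J_{ij}(a, b) \approx \sum_{h=1}^{H} \operatorname{softmax}\!\left(\frac{Q^{(h)} K^{(h)\top}}{\sqrt{d}}\right)_{ij} W_V^{(h)}(a, b).$$

With one head per position pair and hard attention, this reduces to an unconstrained Potts parameterization, which is consistent with the abstract's statement that Potts models appear as a sparse special case.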
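Along the same lines, the following is a minimal, hypothetical sketch (not the authors' code) of how a single factored-attention layer could turn learned parameters into a predicted contact map. The shapes, the Frobenius-norm scoring of coupling blocks, and the average product correction (APC) step are assumptions borrowed from standard Potts-style contact extraction.

```python
# Hypothetical sketch: factored-attention couplings -> predicted contact map (not the authors' code).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def factored_attention_couplings(Q, K, V):
    """Q, K: (H, L, d) positional queries/keys; V: (H, A, A) symmetric value matrices.
    Returns couplings W of shape (L, L, A, A) with
    W[i, j] = sum_h softmax(Q K^T / sqrt(d))[h, i, j] * V[h]."""
    d = Q.shape[-1]
    attn = softmax(np.einsum("hid,hjd->hij", Q, K) / np.sqrt(d))  # (H, L, L) attention maps
    return np.einsum("hij,hab->ijab", attn, V)                    # (L, L, A, A) couplings

def contact_map(W):
    """Score each coupling block by its Frobenius norm, symmetrize,
    and apply the average product correction (APC) used in Potts-style pipelines."""
    S = np.linalg.norm(W, axis=(2, 3))       # (L, L) coupling strengths
    S = 0.5 * (S + S.T)
    np.fill_diagonal(S, 0.0)
    return S - np.outer(S.sum(1), S.sum(0)) / S.sum()

# Toy usage with random parameters; in practice Q, K, V would be fit to a family alignment.
rng = np.random.default_rng(0)
L, A, H, d = 64, 21, 32, 16                  # alignment length, alphabet size, heads, head dim
Q, K = rng.normal(size=(H, L, d)), rng.normal(size=(H, L, d))
V = rng.normal(size=(H, A, A))
V = 0.5 * (V + V.transpose(0, 2, 1))         # symmetric value (coupling) matrices
C = contact_map(factored_attention_couplings(Q, K, V))
print(C.shape)                               # (64, 64) grid of predicted contact scores
```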
