Don’t throw away that linear head: Few-shot protein fitness prediction with generative models
Predicting the fitness, i.e. functional value, of a protein sequence is an important and challenging task in biology, particularly due to the scarcity of assay-labeled data. Traditional approaches utilize transfer learning from evolutionary data, yet discard useful information from a generative model’s learned probability distribution. We propose generative fitness fine-tuning, termed gf-tuning, to utilize the generative model’s log probabilities as logits for a pairwise ranking loss---allowing for the full distribution learned in unsupervised training to be repurposed for fine-tuning on assay-labeled fitness data. We demonstrate that gf-tuning achieves better performance than existing baselines across a variety of few-shot fitness prediction settings, including both low homology and highly epistatic systems as well as generalizing from single to multiple mutations. Generative fitness finetuning offers an effective strategy for few-shot fitness prediction which could enable advances to better understand and engineer proteins.
PDF Abstract