Don’t throw away that linear head: Few-shot protein fitness prediction with generative models

29 Sep 2021 · Ben Krause, Nikhil Naik, Wenhao Liu, Ali Madani

Predicting the fitness, i.e. functional value, of a protein sequence is an important and challenging task in biology, particularly due to the scarcity of assay-labeled data. Traditional approaches utilize transfer learning from evolutionary data, yet discard useful information from a generative model’s learned probability distribution. We propose generative fitness fine-tuning, termed gf-tuning, which uses the generative model’s log probabilities as logits for a pairwise ranking loss, allowing the full distribution learned in unsupervised training to be repurposed for fine-tuning on assay-labeled fitness data. We demonstrate that gf-tuning achieves better performance than existing baselines across a variety of few-shot fitness prediction settings, including both low-homology and highly epistatic systems, as well as generalization from single to multiple mutations. Generative fitness fine-tuning offers an effective strategy for few-shot fitness prediction that could enable advances in understanding and engineering proteins.
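
As a rough illustration of the idea in the abstract, the sketch below treats each sequence's log probability under the generative model as a score and applies a pairwise ranking loss to those scores. The abstract does not state the exact form of the loss, so the logistic (Bradley-Terry style) form used here, the function names, and the toy numbers are all assumptions rather than the authors' implementation. In a real setting the log probabilities would come from the generative model and the loss would be backpropagated through it.

```python
import math
from itertools import combinations


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_ranking_loss(log_probs, fitness):
    """Mean logistic ranking loss over all pairs with distinct fitness labels.

    log_probs: model log probability of each sequence (used as the logit/score).
    fitness:   assay-measured fitness of each sequence.
    Assumed loss form (not taken from the paper): -log sigmoid(s_hi - s_lo),
    where s_hi is the score of the fitter sequence in the pair.
    """
    losses = []
    for i, j in combinations(range(len(fitness)), 2):
        if fitness[i] == fitness[j]:
            continue  # ties carry no ranking signal
        hi, lo = (i, j) if fitness[i] > fitness[j] else (j, i)
        losses.append(-log_sigmoid(log_probs[hi] - log_probs[lo]))
    return sum(losses) / len(losses) if losses else 0.0


# Toy example with made-up numbers: three variants and their measured fitness.
toy_fitness = [3.0, 2.0, 1.0]
agreeing_scores = [-10.0, -12.0, -15.0]  # model ranks variants like the assay
reversed_scores = [-15.0, -12.0, -10.0]  # model ranks variants in reverse

print(pairwise_ranking_loss(agreeing_scores, toy_fitness))
print(pairwise_ranking_loss(reversed_scores, toy_fitness))
```

The loss is small when the model assigns higher log probability to the fitter sequence in each pair and large when the ordering is reversed, so minimizing it pushes the model's likelihood ranking toward the assay ranking. Only the relative order of the fitness labels matters here, not their scale.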

