Protein Design

46 papers with code • 2 benchmarks • 3 datasets

Formally, given a user's design requirements, a model must generate protein amino acid sequences that satisfy those requirements.

Most implemented papers

ProGen2: Exploring the Boundaries of Protein Language Models

salesforce/progen 27 Jun 2022

Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial intelligence-driven protein design.

Learning from Protein Structure with Geometric Vector Perceptrons

drorlab/gvp-pytorch ICLR 2021

Learning on 3D structures of large biomolecules is emerging as a distinct area in machine learning, but there has yet to emerge a unifying network architecture that simultaneously leverages the graph-structured and geometric aspects of the problem domain.

RITA: a Study on Scaling Up Generative Protein Sequence Models

lightonai/rita 11 May 2022

In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database.

Geometry-Complete Diffusion for 3D Molecule Generation and Optimization

bioinfomachinelearning/bio-diffusion 8 Feb 2023

However, such methods are unable to learn important geometric and physical properties of 3D molecules during molecular graph generation, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which negatively impacts their ability to effectively scale to datasets of large 3D molecules.

X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design

ericlbuehler/mistral.rs 11 Feb 2024

Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks.
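
The gating idea in this abstract can be illustrated with a toy single-layer sketch. It is not the X-LoRA implementation: the function name `xlora_like_layer`, the linear gating matrix `G`, the softmax normalisation of the mixing weights, and all shapes are assumptions chosen for illustration. It only shows how a hidden state might produce per-adapter weights that scale each low-rank update before it is added to the frozen base layer.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    peak = max(z)
    exps = [math.exp(x - peak) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def xlora_like_layer(h, W, adapters, G):
    """Hypothetical gated mixture of LoRA adapters for one layer.

    h        : hidden state vector (length d_in)
    W        : frozen base weight (d_out x d_in)
    adapters : list of (A, B) pairs, A is (rank x d_in), B is (d_out x rank)
    G        : assumed linear gating head (n_adapters x d_in)
    """
    # Mixing weights are computed from the hidden state itself.
    scalings = softmax(matvec(G, h))
    out = matvec(W, h)
    for s, (A, B) in zip(scalings, adapters):
        delta = matvec(B, matvec(A, h))  # low-rank update B(Ah)
        out = [o + s * d for o, d in zip(out, delta)]
    return out, scalings


rng = random.Random(0)
d_in, d_out, rank, n_adapters = 4, 3, 2, 3
W = rand_matrix(d_out, d_in, rng)
adapters = [
    (rand_matrix(rank, d_in, rng), rand_matrix(d_out, rank, rng))
    for _ in range(n_adapters)
]
G = rand_matrix(n_adapters, d_in, rng)
h = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

out, scalings = xlora_like_layer(h, W, adapters, G)
print("mixing weights:", scalings)
print("layer output:", out)
```

Because the weights depend on `h`, different inputs can draw on different adapters at the same layer, which is the behaviour the abstract describes as dynamic mixing.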

Variational auto-encoding of protein sequences

samsinai/VAE_protein_function 9 Dec 2017

Here we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect protein function.

Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction

mindspore-ai/mindscience 20 Aug 2022

Data-driven predictive methods which can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development.

TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation

linzy19/taxdiff 27 Feb 2024

In this work, we propose TaxDiff, a taxonomic-guided diffusion model for controllable protein sequence generation that combines biological species information with the generative capabilities of diffusion models to generate structurally stable proteins within the sequence space.

mGPfusion: Predicting protein stability changes with Gaussian process kernel learning and data fusion

emmijokinen/mgpfusion 8 Feb 2018

We introduce a Bayesian data fusion model that re-calibrates the experimental and in silico data sources and then learns a predictive GP model from the combined data.

Conditioning by adaptive sampling for robust design

jacquesboitreaud/optimol 29 Jan 2019

We assume access to one or more, potentially black box, stochastic "oracle" predictive functions, each of which maps from input (e.g., protein sequences) design space to a distribution over a property of interest (e.g., protein fluorescence).
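
As a rough illustration of what such an oracle interface could look like, the sketch below maps a protein sequence to a Gaussian distribution over a scalar property and then computes the probability that the property exceeds a target threshold. Everything here is hypothetical: `toy_oracle`, its hydrophobic-fraction scoring rule, the fixed standard deviation, and the Gaussian form are assumptions for illustration, not the oracle or method used in the paper.

```python
import math


def toy_oracle(sequence):
    """Hypothetical stochastic oracle.

    Returns the (mean, std) of an assumed Gaussian distribution over a
    property of interest. The scoring rule is invented for this example.
    """
    hydrophobic = set("AILMFVW")
    frac = sum(aa in hydrophobic for aa in sequence) / len(sequence)
    mean = 2.0 * frac  # made-up relationship between sequence and property
    std = 0.5          # made-up, constant predictive uncertainty
    return mean, std


def prob_exceeds(mean, std, threshold):
    """P(Y >= threshold) for Y ~ Normal(mean, std**2)."""
    z = (threshold - mean) / (std * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


candidates = ["AILMFVWA", "GSTNQDEK", "AIGSLMTN"]
threshold = 1.0
for seq in candidates:
    mean, std = toy_oracle(seq)
    p = prob_exceeds(mean, std, threshold)
    print(f"{seq}: mean={mean:.2f}, P(property >= {threshold}) = {p:.3f}")
```

Returning a distribution rather than a point estimate lets a design procedure weigh candidates by how likely they are to meet the target, instead of trusting a single predicted value.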