Protein Function Prediction
24 papers with code • 3 benchmarks • 2 datasets
In GO term prediction, a model is given a function-prediction instruction and a protein sequence, and it characterizes the protein's function using Gene Ontology (GO) terms drawn from three domains: cellular component, biological process, and molecular function.
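As a rough illustration of the three-domain setup, the sketch below encodes a protein's GO annotations as one multi-hot vector per domain, which is one common way to build targets for a multi-label classifier. The tiny `GO_VOCAB` and the `encode_annotations` helper are hypothetical names introduced for this example. Real vocabularies contain thousands of terms per domain, and instruction-based models may generate term descriptions as text instead.

```python
from typing import Dict, Iterable, List

# Hypothetical toy vocabulary (an assumption for illustration only).
# The three keys mirror the three GO domains named in the task description.
GO_VOCAB: Dict[str, List[str]] = {
    "molecular_function": ["GO:0003677", "GO:0005524"],  # DNA binding, ATP binding
    "biological_process": ["GO:0006355"],  # regulation of DNA-templated transcription
    "cellular_component": ["GO:0005634", "GO:0005737"],  # nucleus, cytoplasm
}


def encode_annotations(terms: Iterable[str]) -> Dict[str, List[int]]:
    """Return one multi-hot label vector per GO domain.

    Position i in a domain's vector is 1 if the i-th term of that domain's
    vocabulary is among the protein's annotations, and 0 otherwise.
    Terms outside the toy vocabulary are silently ignored.
    """
    present = set(terms)
    return {
        domain: [1 if term in present else 0 for term in vocab]
        for domain, vocab in GO_VOCAB.items()
    }


labels = encode_annotations(["GO:0003677", "GO:0005634"])
```

Here `labels` marks DNA binding under molecular function and nucleus under cellular component, with the biological process vector left all zeros. Keeping the domains separate makes it straightforward to evaluate each one on its own.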
Most implemented papers
Multi-Scale Representation Learning on Proteins
This paper introduces a multi-scale graph construction of a protein -- HoloProt -- connecting surface to structure and sequence.
Robust deep learning based protein sequence design using ProteinMPNN
While deep learning has revolutionized protein structure prediction, almost all experimentally characterized de novo protein designs have been generated using physically based approaches such as Rosetta.
PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding
However, there is a lack of a standard benchmark to evaluate the performance of different methods, which hinders the progress of deep learning in this field.
Deep learning-based rapid generation of broadly reactive antibodies against SARS-CoV-2 and its Omicron variant
The COVID-19 pandemic has been ongoing for nearly two and a half years, and new variants of concern (VOCs) of SARS-CoV-2 continue to emerge, which urges the development of broadly neutralizing antibodies.
Galactica: A Large Language Model for Science
We believe these results demonstrate the potential for language models as a new interface for science.
EurNet: Efficient Multi-Range Relational Modeling of Spatial Multi-Relational Data
We study EurNets in two important domains for image and protein structure modeling.
Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling
As opposed to scaling up protein language models (PLMs), we seek to improve performance via protein-specific optimization.
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields.
MD-HIT: Machine learning for materials property prediction with dataset redundancy control
This redundancy issue is well known in bioinformatics for protein function prediction, where a redundancy reduction procedure (CD-Hit) is always applied to reduce sample redundancy by ensuring that no pair of samples has a sequence similarity greater than a given threshold.
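The thresholding idea can be sketched with a greedy filter: a sequence is kept only if its similarity to every already-kept sequence is at or below the threshold. This is a minimal illustration, not CD-HIT's actual algorithm or MD-HIT's implementation. The `similarity` function uses Python's `difflib.SequenceMatcher` ratio as a stand-in for alignment-based sequence identity, and both function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1].

    Assumption: difflib's ratio replaces the alignment-based sequence
    identity that real redundancy-reduction tools compute.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def reduce_redundancy(seqs: List[str], threshold: float) -> List[str]:
    """Greedily keep sequences so that each newly kept one scores at most
    `threshold` against every sequence kept before it."""
    kept: List[str] = []
    # Visit longer sequences first so they act as representatives.
    # sorted() is stable, so ties keep their input order.
    for seq in sorted(seqs, key=len, reverse=True):
        if all(similarity(seq, rep) <= threshold for rep in kept):
            kept.append(seq)
    return kept


sequences = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGGSSSSPP"]
non_redundant = reduce_redundancy(sequences, threshold=0.8)
```

In this toy input the second sequence differs from the first by a single residue, so it is dropped, while the unrelated third sequence is kept. The pairwise comparison is quadratic in the number of kept sequences, which is acceptable for a sketch but would need indexing or filtering heuristics for large datasets.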
Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers
These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate function prediction of existing as well as first-to-see proteins.