Protein Function Prediction
24 papers with code • 3 benchmarks • 2 datasets
For GO terms prediction, given the specific function prediction instruction and a protein sequence, models characterize the protein functions using the GO terms presented in three different domains (cellular component, biological process, and molecular function).
Latest papers
Structure-Informed Protein Language Model
To address this issue, we introduce the integration of remote homology detection to distill structural information into protein language models without requiring explicit protein structures as input.
Endowing Protein Language Models with Structural Knowledge
Drawing from recent advances in graph transformers, our approach refines the self-attention mechanisms of pretrained language transformers by integrating structural information with structure extractor modules.
Insights Into the Inner Workings of Transformer Models for Protein Function Prediction
Motivation: We explored how explainable artificial intelligence (XAI) can help to shed light into the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients such that latent representations inside of transformer models, which were finetuned to Gene Ontology term and Enzyme Commission number prediction, can be inspected too.
Biomedical Knowledge Graph Embeddings with Negative Statements
Explicitly considering negative statements has been shown to improve performance on tasks such as entity summarization and question answering or domain-specific tasks such as protein function prediction.
Prot2Text: Multimodal Protein's Function Generation with GNNs and Transformers
These results highlight the transformative impact of multimodal models, specifically the fusion of GNNs and LLMs, empowering researchers with powerful tools for more accurate function prediction of existing as well as first-to-see proteins.
MD-HIT: Machine learning for materials property prediction with dataset redundancy control
This issue is well known in the field of bioinformatics for protein function prediction, in which a redundancy reduction procedure (CD-Hit) is always applied to reduce the sample redundancy by ensuring no pair of samples has a sequence similarity greater than a given threshold.
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields.
A Systematic Study of Joint Representation Learning on Protein Sequences and Structures
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
Linear-scaling kernels for protein sequences and small molecules outperform deep learning while providing uncertainty quantitation and improved interpretability
We compare the performance of xGPR with the reported performance of various deep learning models on 20 benchmarks, including small molecule, protein sequence and tabular data.
Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling
As opposed to scaling-up protein language models (PLMs), we seek improving performance via protein-specific optimization.