Protein Design
47 papers with code • 2 benchmarks • 3 datasets
Formally, given a user's design requirements, a model must generate protein amino-acid sequences that satisfy those requirements.
Most implemented papers
ProGen2: Exploring the Boundaries of Protein Language Models
Attention-based models trained on protein sequences have demonstrated incredible success at classification and generation tasks relevant for artificial intelligence-driven protein design.
Learning from Protein Structure with Geometric Vector Perceptrons
Learning on 3D structures of large biomolecules is emerging as a distinct area in machine learning, but there has yet to emerge a unifying network architecture that simultaneously leverages the graph-structured and geometric aspects of the problem domain.
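A minimal NumPy sketch of the geometric-vector-perceptron idea described above: scalar and vector features are updated jointly, with vector norms (which are rotation-invariant) feeding the scalar channel and only linear maps plus norm-based gating touching the vector channel, so the vector outputs stay rotation-equivariant. This is an illustrative simplification, not the paper's exact layer; all weight names are hypothetical.

```python
import numpy as np

def gvp(s, V, Wh, Wm, Wv):
    """Simplified geometric vector perceptron (illustrative only).
    s: (n_s,) scalar features; V: (n_v, 3) vector features."""
    Vh = Wh @ V                                  # (h, 3) intermediate vectors
    norms = np.linalg.norm(Vh, axis=-1)          # rotation-invariant norms
    s_out = np.maximum(Wm @ np.concatenate([s, norms]), 0.0)  # ReLU scalar update
    Vu = Wv @ Vh                                 # (n_vout, 3)
    gate = 1.0 / (1.0 + np.exp(-np.linalg.norm(Vu, axis=-1)))  # sigmoid gate on norms
    V_out = Vu * gate[:, None]                   # equivariant vector update
    return s_out, V_out
```

Because every operation on the vector channel commutes with rotations, rotating the input vectors rotates the output vectors identically and leaves the scalar outputs unchanged, which is the property the paper exploits for learning on 3D structures.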
RITA: a Study on Scaling Up Generative Protein Sequence Models
In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database.
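The autoregressive generation used by models like RITA can be sketched as a simple sampling loop: at each step the model emits a distribution over the 20 amino acids plus a stop token, and a residue is drawn from it. Here `toy_probs` is a hypothetical stand-in for the model's softmax output, not anything from the paper.

```python
import numpy as np

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

def sample_sequence(next_token_probs, max_len=50, seed=0):
    """Autoregressive sampling loop; `next_token_probs(prefix)` stands in
    for a trained language model's next-token distribution."""
    rng = np.random.default_rng(seed)
    seq = []
    for _ in range(max_len):
        p = next_token_probs(seq)          # distribution over 20 AAs + stop
        i = rng.choice(len(p), p=p)
        if i == len(AMINO_ACIDS):          # stop token ends the sequence
            break
        seq.append(AMINO_ACIDS[i])
    return "".join(seq)

def toy_probs(prefix):
    # hypothetical stand-in: uniform over residues, 5% chance to stop
    p = np.full(len(AMINO_ACIDS) + 1, 0.95 / len(AMINO_ACIDS))
    p[-1] = 0.05
    return p
```

A real model would condition `next_token_probs` on the prefix (and, for ProGen2-style control, on conditioning tags); the loop itself is unchanged.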
Geometry-Complete Diffusion for 3D Molecule Generation and Optimization
Importantly, we demonstrate that the geometry-complete denoising process of GCDM learned for 3D molecule generation enables the model to generate a significant proportion of valid and energetically-stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn.
X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design
Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks.
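The gating strategy described above can be sketched as a softmax over adapter scores computed from the hidden state, with each low-rank (LoRA) update scaled by its gate weight. This is a schematic of the mixing idea only; the weight names and shapes are illustrative, not X-LoRA's actual implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def xlora_layer(x, h, W0, adapters, Wg):
    """Sketch of X-LoRA-style mixing: gating weights computed from the
    hidden state h scale each low-rank adapter's contribution.
    adapters: list of (A, B) pairs, where delta-W = B @ A (LoRA)."""
    alpha = softmax(Wg @ h)                   # one gate weight per adapter
    y = W0 @ x                                # frozen base-layer output
    for a, (A, B) in zip(alpha, adapters):
        y = y + a * (B @ (A @ x))             # weighted low-rank update
    return y
```

Because the gates depend on the hidden state, different inputs activate different adapter mixtures layer by layer, which is the "never-before-used deep layer-wise combinations" the abstract refers to.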
Variational auto-encoding of protein sequences
Here we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect protein function.
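The core machinery of a variational auto-encoder of sequences is the reparameterized Gaussian latent and its KL regularizer. The sketch below shows those two pieces with a toy linear encoder head (the real encoder is a neural network over the encoded sequence; `W_mu` and `W_logvar` are hypothetical names).

```python
import numpy as np

def encode(x, W_mu, W_logvar):
    """Toy linear encoder head: maps an encoded sequence x to the
    parameters of a diagonal Gaussian over the latent z."""
    return W_mu @ x, W_logvar @ x

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu, logvar
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # KL(q(z|x) || N(0, I)): the regularizer in the VAE objective
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
```

For mutation-effect prediction, the paper's idea is that the trained model's (approximate) sequence likelihood scores variants: mutations that move a sequence to low-probability regions of the learned distribution are predicted to be deleterious.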
Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction
Data-driven predictive methods which can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development.
TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation
In this work, we propose TaxDiff, a taxonomic-guided diffusion model for controllable protein sequence generation that combines biological species information with the generative capabilities of diffusion models to generate structurally stable proteins within the sequence space.
mGPfusion: Predicting protein stability changes with Gaussian process kernel learning and data fusion
We introduce a Bayesian data fusion model that re-calibrates the experimental and in silico data sources and then learns a predictive GP model from the combined data.
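One way to picture the data-fusion idea is a single GP fit on the pooled experimental and in-silico points, with a larger noise variance on the in-silico source so the model trusts experiments more. This is a deliberately simplified sketch: the paper's model also recalibrates the in-silico labels, which is omitted here, and all parameter values below are illustrative.

```python
import numpy as np

def rbf(X1, X2, ell=1.0):
    # squared-exponential kernel
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def gp_fused_mean(X_exp, y_exp, X_sim, y_sim, X_new,
                  noise_exp=0.01, noise_sim=0.25, ell=1.0):
    """GP posterior mean on pooled data with per-source noise variances
    (higher noise on simulated points down-weights them)."""
    X = np.vstack([X_exp, X_sim])
    y = np.concatenate([y_exp, y_sim])
    noise = np.concatenate([np.full(len(X_exp), noise_exp),
                            np.full(len(X_sim), noise_sim)])
    K = rbf(X, X, ell) + np.diag(noise)
    return rbf(X_new, X, ell) @ np.linalg.solve(K, y)
```

With this weighting, a systematic bias in the simulated labels perturbs the fit only mildly near experimental points, while the simulated data still informs regions with no experimental coverage.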
Conditioning by adaptive sampling for robust design
We assume access to one or more, potentially black box, stochastic "oracle" predictive functions, each of which maps from the input design space (e.g., protein sequences) to a distribution over a property of interest (e.g., protein fluorescence).
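The iterative structure of such oracle-guided design can be sketched with a cross-entropy-method-style loop: sample designs from a simple generative model, score them with a (possibly noisy) black-box oracle, and refit the sampler to the top quantile. The paper's CbAS method is more careful (it conditions on the desired property and regularizes toward a prior), so treat this as a schematic of the adaptive-sampling loop only; the toy oracle and all parameters are hypothetical.

```python
import numpy as np

def adaptive_sampling(oracle, mu0=0.0, sigma0=3.0, n=200,
                      iters=20, quantile=0.8, seed=0):
    """Cross-entropy-style adaptive sampling: refit a 1-D Gaussian
    'generative model' to the elite fraction of oracle scores."""
    rng = np.random.default_rng(seed)
    mu, sigma = mu0, sigma0
    for _ in range(iters):
        x = rng.normal(mu, sigma, size=n)      # sample candidate designs
        scores = oracle(x)                     # black-box, stochastic scores
        cut = np.quantile(scores, quantile)
        elite = x[scores >= cut]               # keep the top quantile
        mu, sigma = elite.mean(), max(elite.std(), 1e-3)
    return mu

# hypothetical noisy oracle with its optimum at x = 2
def noisy_oracle(x):
    return -(x - 2.0) ** 2 + 0.1 * np.random.default_rng(1).normal(size=x.shape)
```

Each iteration shifts the sampler toward high-scoring regions while the stochastic oracle's noise is averaged out by the elite statistics.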