Abstract
Protein engineering holds substantial promise for designing proteins with customized functions, yet the vast landscape of potential mutations, set against limited laboratory capacity, constrains the discovery of optimal sequences. To address this, we present the μProtein framework, which accelerates protein engineering by combining μFormer, a deep learning model for accurate mutational effect prediction, with μSearch, a reinforcement learning algorithm designed to efficiently navigate the protein fitness landscape using μFormer as an oracle. By modelling epistatic interactions and employing a multi-step search strategy, μProtein leverages single-mutation data to predict optimal sequences carrying complex, multi-amino-acid mutations. Beyond strong performance on benchmark datasets, μProtein, trained solely on single-mutation data, identified high-gain-of-function multi-point mutants of the enzyme β-lactamase in wet-laboratory experiments, surpassing one of the highest known activity levels. These results demonstrate μProtein's capability to discover impactful mutations across the vast protein sequence space, offering a robust and efficient approach to protein optimization.
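The oracle-guided search described in the abstract can be illustrated, in highly simplified form, as a loop that proposes point mutations and keeps candidates the fitness model scores highest. This is a minimal sketch, not the authors' implementation: the `toy_oracle` stands in for a learned predictor such as μFormer, and the greedy acceptance rule stands in for the μSearch reinforcement learning policy.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_oracle(seq: str) -> float:
    # Stand-in for a learned fitness model: rewards an arbitrary
    # (made-up) target residue pattern, purely for illustration.
    return sum(1.0 for i, aa in enumerate(seq) if aa == AMINO_ACIDS[i % 20])

def propose_mutation(seq: str, rng: random.Random) -> str:
    # Substitute one randomly chosen residue with a different amino acid.
    pos = rng.randrange(len(seq))
    new_aa = rng.choice(AMINO_ACIDS.replace(seq[pos], ""))
    return seq[:pos] + new_aa + seq[pos + 1:]

def greedy_search(wild_type: str, steps: int = 200, seed: int = 0) -> str:
    # Accumulate mutations over many steps, accepting only improvements;
    # repeated accepted steps yield multi-point mutants from single moves.
    rng = random.Random(seed)
    best, best_score = wild_type, toy_oracle(wild_type)
    for _ in range(steps):
        candidate = propose_mutation(best, rng)
        score = toy_oracle(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

optimized = greedy_search("MKT" * 5)
```

A reinforcement learning search such as μSearch differs from this greedy sketch in that it learns a proposal policy from oracle feedback rather than sampling mutations uniformly.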
Data availability
The FLIP benchmark is publicly available at http://data.bioembeddings.com/public/FLIP/. The ProteinGym v0.1 data collection can be accessed via Hugging Face at https://huggingface.co/datasets/OATML-Markslab/ProteinGym_v0.1. The dataset under our split scheme is available via figshare at https://doi.org/10.6084/m9.figshare.26892355 (ref. 69). The list of ESBLs is available at http://bldb.eu/ (ref. 44). The fitness scores for β-lactamase variants against cefotaxime, validated by our wet-laboratory experiments, are provided in the Supplementary Information. Source data are provided with this paper.
Code availability
The code to reproduce the results of this paper is publicly available via GitHub at https://github.com/microsoft/Mu-Protein and via Zenodo at https://doi.org/10.5281/zenodo.15836168 (ref. 76). The data-split scheme is available via figshare at https://doi.org/10.6084/m9.figshare.26892355 (ref. 69).
References
Miton, C. M. & Tokuriki, N. Insertions and deletions (indels): a missing piece of the protein engineering jigsaw. Biochemistry 62, 148–157 (2023).
Gray, V. E., Hause, R. J., Luebeck, J., Shendure, J. & Fowler, D. M. Quantitative missense variant effect prediction using large-scale mutagenesis data. Cell Syst. 6, 116–124 (2018).
Riesselman, A. J., Ingraham, J. B. & Marks, D. S. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822 (2018).
Biswas, S., Khimulya, G., Alley, E. C., Esvelt, K. M. & Church, G. M. Low-n protein engineering with data-efficient deep learning. Nat. Methods 18, 389–396 (2021).
Notin, P. et al. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In Proc. 39th International Conference on Machine Learning 16990–17017 (PMLR, 2022).
Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).
Weile, J. & Roth, F. P. Multiplexed assays of variant effects contribute to a growing genotype–phenotype atlas. Hum. Genet. 137, 665–678 (2018).
Wittmund, M., Cadet, F. & Davari, M. D. Learning epistasis and residue coevolution patterns: current trends and future perspectives for advancing enzyme engineering. ACS Catal. 12, 14243–14263 (2022).
Judge, A. et al. Network of epistatic interactions in an enzyme active site revealed by large-scale deep mutational scanning. Proc. Natl Acad. Sci. USA 121, e2313513121 (2024).
Gelman, S., Fahlberg, S. A., Heinzelman, P., Romero, P. A. & Gitter, A. Neural networks to learn protein sequence–function relationships from deep mutational scanning data. Proc. Natl Acad. Sci. USA 118, e2104878118 (2021).
Kim, H. Y. & Kim, D. Prediction of mutation effects using a deep temporal convolutional network. Bioinformatics 36, 2047–2052 (2020).
Shanehsazzadeh, A., Belanger, D. & Dohan, D. Is transfer learning necessary for protein landscape prediction? In Proc. Machine Learning for Structural Biology Workshop in the Thirty-Fourth Annual Conference on Neural Information Processing Systems (NeurIPS) https://www.mlsb.io/papers/MLSB2020_Is_Transfer_Learning_Necessary.pdf (2020).
Yang, K. K., Lu, A. X. & Fusi, N. Convolutions are competitive with transformers for protein sequence pretraining. Cell Syst. 15, 286–294.e2 (2024).
Luo, Y. et al. ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat. Commun. 12, 5743 (2021).
Hsu, C., Nisonoff, H., Fannjiang, C. & Listgarten, J. Learning protein fitness models from evolutionary and assay-labeled data. Nat. Biotechnol. 40, 1114–1122 (2022).
Sim, N.-L. et al. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 40, W452–W457 (2012).
Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).
Frazer, J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 4171–4186 (Association for Computational Linguistics, 2019).
Floridi, L. & Chiriatti, M. GPT-3: its nature, scope, limits, and consequences. Minds Mach. 30, 681–694 (2020).
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).
Hie, B., Zhong, E. D., Berger, B. & Bryson, B. Learning the language of viral evolution and escape. Science 371, 284–288 (2021).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. Adv. Neural Inf. Process. Syst. 34, 29287–29303 (2021).
Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).
He, L. et al. Pre-training co-evolutionary protein representation via a pairwise masked language model. Preprint at https://arxiv.org/abs/2110.15527 (2021).
Dallago, C. et al. FLIP: benchmark tasks in fitness landscape inference for proteins. Preprint at bioRxiv https://doi.org/10.1101/2021.11.09.467890 (2021).
Hie, B. L. et al. Efficient evolution of human antibodies from general protein language models. Nat. Biotechnol. 41, 275–283 (2023).
Lu, H. et al. Machine learning-aided engineering of hydrolases for PET depolymerization. Nature 604, 662–667 (2022).
Hughes, D. & Andersson, D. I. Evolutionary trajectories to antibiotic resistance. Annu. Rev. Microbiol. 71, 579–596 (2017).
Wang, X., Zhang, H. & Chen, X. Drug resistance and combating drug resistance in cancer. Cancer Drug Resist. 2, 141 (2019).
Notin, P., Weitzman, R., Marks, D. & Gal, Y. ProteinNPT: improving protein property prediction and design with non-parametric transformers. Adv. Neural Inf. Process. Syst. 36, 33529–33563 (2023).
Zhao, J., Zhang, C. & Luo, Y. Contrastive fitness learning: reprogramming protein language models for low-n learning of protein fitness landscape. In Proc. 32nd International Conference on Pattern Recognition 470–474 (Springer, 2024).
Bryant, D. H. et al. Deep diversification of an AAV capsid protein by machine learning. Nat. Biotechnol. 39, 691–696 (2021).
Jacquier, H. et al. Capturing the mutational landscape of the beta-lactamase TEM-1. Proc. Natl Acad. Sci. USA 110, 13067–13072 (2013).
Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016).
Seuma, M., Faure, A. J., Badia, M., Lehner, B. & Bolognesi, B. The genetic landscape for amyloid beta fibril nucleation accurately discriminates familial Alzheimer’s disease mutations. eLife 10, e63364 (2021).
Faure, A. J. et al. Mapping the energetic and allosteric landscapes of protein binding domains. Nature 604, 175–183 (2022).
Araya, C. L. et al. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc. Natl Acad. Sci. USA 109, 16858–16863 (2012).
Melamed, D., Young, D. L., Gamble, C. E., Miller, C. R. & Fields, S. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA 19, 1537–1551 (2013).
Olson, C. A., Wu, N. C. & Sun, R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24, 2643–2651 (2014).
Starr, T. N. & Thornton, J. W. Epistasis in protein evolution. Protein Sci. 25, 1204–1218 (2016).
Sailer, Z. R. & Harms, M. J. High-order epistasis shapes evolutionary trajectories. PLoS Comput. Biol. 13, e1005541 (2017).
Naas, T. et al. Beta-lactamase database (BLDB)–structure and function. J. Enzyme Inhib. Med. Chem. 32, 917–919 (2017).
Stiffler, M. A., Hekstra, D. R. & Ranganathan, R. Evolvability as a function of purifying selection in TEM-1 β-lactamase. Cell 160, 882–892 (2015).
Weinreich, D. M., Delaney, N. F., DePristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006).
Sinai, S. et al. AdaLead: a simple and robust adaptive greedy search algorithm for sequence design. Preprint at https://arxiv.org/abs/2010.02141 (2020).
Barrera, L. A. et al. Survey of variation in human transcription factors reveals prevalent DNA binding changes. Science 351, 1450–1454 (2016).
Chaudhury, S., Lyskov, S. & Gray, J. J. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics 26, 689–691 (2010).
Ogden, P. J., Kelsic, E. D., Sinai, S. & Church, G. M. Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design. Science 366, 1139–1143 (2019).
Lorenz, R. et al. ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
Angermueller, C. et al. Model-based reinforcement learning for biological sequence design. In Proc. International Conference on Learning Representations https://openreview.net/forum?id=HklxbgBKvr (ICLR, 2020).
Brookes, D., Park, H. & Listgarten, J. Conditioning by adaptive sampling for robust design. In Proc. 36th International Conference on Machine Learning 773–782 (PMLR, 2019).
Hansen, N. The CMA evolution strategy: a tutorial. Preprint at https://arxiv.org/abs/1604.00772 (2016).
Kirjner, A. et al. Improving protein optimization with smoothed fitness landscapes. In Proc. International Conference on Learning Representations https://openreview.net/forum?id=rxlF2Zv8x0 (ICLR, 2024).
Wang, Y. et al. Self-play reinforcement learning guides protein engineering. Nat. Mach. Intell. 5, 845–860 (2023).
Wittmann, B. J., Yue, Y. & Arnold, F. H. Informed training set design enables efficient machine learning-assisted directed protein evolution. Cell Syst. 12, 1026–1045 (2021).
Qiu, Y., Hu, J. & Wei, G.-W. Cluster learning-assisted directed evolution. Nat. Comput. Sci. 1, 809–818 (2021).
De Visser, J. A. G. & Krug, J. Empirical fitness landscapes and the predictability of evolution. Nat. Rev. Genet. 15, 480–490 (2014).
Tokuriki, N. & Tawfik, D. S. Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol. 19, 596–604 (2009).
Lenski, R. E., Barrick, J. E. & Ofria, C. Balancing robustness and evolvability. PLoS Biol. 4, e428 (2006).
Buel, G. R. & Walters, K. J. Can AlphaFold2 predict the impact of missense mutations on structure? Nat. Struct. Mol. Biol. 29, 1–2 (2022).
Wu, L. et al. SPRoBERTa: protein embedding learning with local fragment modeling. Brief. Bioinform. 23, bbac365 (2022).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).
Hie, B., Zhong, E., Bryson, B. & Berger, B. Learning mutational semantics. Adv. Neural Inf. Process. Syst. 33, 9109–9121 (2020).
Bileschi, M. L. et al. Using deep learning to annotate the protein universe. Nat. Biotechnol. 40, 932–937 (2022).
Kulmanov, M. & Hoehndorf, R. DeepGOPlus: improved protein function prediction from sequence. Bioinformatics 36, 422–429 (2020).
Elnaggar, A. et al. ProtTrans: towards cracking the language of life’s code through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2021).
He, L., Deng, P. & Liu, G. μFormer encoder. figshare https://doi.org/10.6084/m9.figshare.26892355 (2024).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).
OpenAI. Introducing ChatGPT; https://openai.com/blog/chatgpt (2022).
Ouyang, L. et al. Training language models to follow instructions with human feedback. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022) https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf (2022).
Firnberg, E., Labonte, J. W., Gray, J. J. & Ostermeier, M. A comprehensive, high-resolution map of a gene’s fitness landscape. Mol. Biol. Evol. 31, 1581–1592 (2014).
Gonzalez, C. E. & Ostermeier, M. Pervasive pairwise intragenic epistasis among sequential mutations in TEM-1 β-lactamase. J. Mol. Biol. 431, 1981–1992 (2019).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
He, L., Deng, P. & Liu, G. microsoft/Mu-Protein: MuProtein 0.1.1. Zenodo https://doi.org/10.5281/zenodo.15836168 (2025).
Acknowledgements
We gratefully acknowledge M. Ostermeier for generously providing detailed information on the plasmids pSkunk3-TEM-1 and pTS42, and for sharing the pTS42 plasmid itself. We also thank T. Peng for developing the demonstration webpage and J. Bai for creating the graphic illustrations (the copyright for which is held by Microsoft, as the work was completed during her employment). Lastly, we thank B. Kruft and U. Munir for their invaluable support in program management and coordination.
Author information
Authors and Affiliations
Contributions
Conceptualization: L.H., H.S. and P.D. Methodology and modelling: L.H., H.S., G.L., Z.Z., Y.J., F.J. and L.W. Data curation: L.H., H.S. and P.D. Result interpretation: P.D., H.L., L.H. and C.C. Writing—original draft: L.H., P.D., H.S. and G.L. Writing—review: H.L., T.Q. and C.C. Supervision: T.-Y.L. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Yuchi Qiu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Ablation and analysis of μFormer components.
a) Ablation study to evaluate the importance of each component in μFormer. The change in performance after removing various components, relative to the full model, is shown. Negative numbers (blue) indicate a loss of performance and positive numbers (red) indicate an improvement in performance. The last row displays the average performance change over 9 proteins. The plus/minus signs at the bottom indicate the presence/removal of the corresponding component. b) Spearman ρ statistics on 3 FLIP GB1 datasets of μFormer, ECNet, and their variants. ECNet w/ μFormer encoder replaces the language model in ECNet with μFormer’s language model. μFormer-S (Methods) is a variation with a model size similar to ECNet. 1-vs-rest: a train-test split where single-point mutants are used for training and multi-point mutants are reserved for testing. 2-vs-rest: a train-test split where single- and double-point mutants are used for training and all higher-order mutants are reserved for testing. 3-vs-rest: a train-test split where single-, double-, and triple-point mutants are used for training and all higher-order mutants are reserved for testing. See Supplementary Notes for details.
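The k-vs-rest splits described in the caption above can be sketched as a partition of variants by mutation count. This is an illustrative assumption, not the FLIP benchmark code: the colon-separated variant encoding (e.g. "A23T:G56S") and the `k_vs_rest_split` helper are hypothetical.

```python
def mutation_count(variant: str) -> int:
    # Variants are assumed to be encoded as colon-separated substitutions,
    # e.g. "A23T:G56S" is a double mutant (illustration only).
    return len(variant.split(":"))

def k_vs_rest_split(variants, k):
    """Train on mutants with at most k substitutions; test on higher-order ones."""
    train = [v for v in variants if mutation_count(v[0]) <= k]
    test = [v for v in variants if mutation_count(v[0]) > k]
    return train, test

# 1-vs-rest: single mutants train, multi-point mutants test.
data = [("A23T", 0.8), ("A23T:G56S", 1.1), ("A23T:G56S:L12V", 0.3)]
train, test = k_vs_rest_split(data, 1)
```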
Extended Data Fig. 2 Analysis of μProtein.
a) Performance of μFormer and Ridge on GB1 double mutants with varying training data size. Here, μFormer is a μFormer variation with a smaller supervised scorer module size (μFormer-SS). Training data ratio indicates the number of residues used for training versus the total number of amino acids in GB1. The training data size equals 209, 418, 627, 836, and 1045 for 20%, 40%, 60%, 80%, and 100%, respectively. All scores were evaluated on GB1 saturated double mutants (n=535,917). Center: mean. Error bands: standard deviation. Five experiments were performed for each setting, with training data selected at random. b) Illustration of test data split, using a protein of 10 residues and the 40% setting as an example. 2/2 unseen: neither of the mutated residues in double mutants is seen by the model. 1/2 unseen: one and only one of the mutated residues in double mutants is seen by the model. c) Performance of μFormer and Ridge on different splits of GB1 double mutants. Training data split criteria are the same as in a). Center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. Five experiments were performed for each setting, with training data selected at random.
Supplementary information
Supplementary Information
Supplementary Notes 1–3, Figs. 1–11 and Tables 1–3.
Supplementary Data 1
Source data for Supplementary Figs. 1, 3, 5, 8, 9 and 11.
Supplementary Table 4
Source data for valid experiment results.
Source data
Source Data Figs. 2–6 and Extended Data Figs. 1 and 2
Source data for Figs. 2–6 and Extended Data Figs. 1 and 2.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, H., He, L., Deng, P. et al. Accelerating protein engineering with fitness landscape modelling and reinforcement learning. Nat Mach Intell 7, 1446–1460 (2025). https://doi.org/10.1038/s42256-025-01103-w
This article is cited by
- An integrated framework to accelerate protein design through mutagenesis. Nature Machine Intelligence (2025)