  • Review Article
  • Published:

Generalist biological artificial intelligence in modeling the language of life

Abstract

Generalist biological artificial intelligence (GBAI) represents a transformative approach to modeling the ‘language of life’: the flow of information from DNA to cellular function. This Review synthesizes rapid advances in biological AI to interpret and generate DNA, RNA, proteins and cellular systems. We chart a course toward comprehensive systems that can concurrently process and predict across these domains, performing several critical biological tasks simultaneously. Substantial opportunities lie in synergizing language and structural AI, leveraging specialized models and improving AI agents for autonomous discovery. Once challenges in data, biological complexity, scalability and experimental validation are addressed, GBAI has the potential to deepen our understanding of disease pathways and biomarkers, advance automated therapeutic design and evaluation, and integrate within virtual cells to meaningfully simulate biological activity.

Fig. 1: Vision for GBAI.
Fig. 2: Applications of biological AI across different dimensions of cellular processing.
Fig. 3: Overview of the challenges faced by current biological AI algorithms.


    Article  CAS  PubMed  Google Scholar 

  168. Tang, Z., Somia, N., Yu, Y. & Koo, P. K. Evaluating the representational power of pre-trained DNA language models for regulatory genomics. Genome Biol. 26, 203 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  169. Tsishyn, M., Hermans, P., Rooman, M. & Pucci, F. Residue conservation and solvent accessibility are (almost) all you need for predicting mutational effects in proteins. Bioinformatics 41, btaf322 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  170. Atti, S. & Subramaniam, S. Fundamental limitations of foundation models in single-cell transcriptomics. Preprint at bioRxiv https://doi.org/10.1101/2025.06.26.661767 (2025).

  171. Kedzierska, K. Z., Crawford, L., Amini, A. P. & Lu, A. X. Zero-shot evaluation reveals limitations of single-cell foundation models. Genome Biol. 26, 101 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  172. Ahlmann-Eltze, C., Huber, W. & Anders, S. Deep-learning-based gene perturbation effect prediction does not yet outperform simple linear baselines. Nat. Methods 22, 1657–1661 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  173. Märtens, K., Donovan-Maiye, R. & Ferkinghoff-Borg, J. Enhancing generative perturbation models with LLM-informed gene embeddings. In Proc. ICLR 2024 Workshop on Machine Learning for Genomics Explorations (eds Theis, F. et al.) (2024).

  174. Csendes, G., Sanz, G., Szalay, K. Z. & Szalai, B. Benchmarking foundation cell models for post-perturbation RNA-seq prediction. BMC Genomics 26, 393 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  175. Liu, Z. et al. Genbench: a benchmarking suite for systematic evaluation of genomic foundation models. Preprint at arXiv https://doi.org/10.48550/arXiv.2406.01627 (2024).

  176. Gao, Z. et al. PFMBench: protein foundation model benchmark. Preprint at arXiv https://doi.org/10.48550/arXiv.2506.14796 (2025).

  177. Qiu, P. et al. BioLLM: a standardized framework for integrating and benchmarking single-cell foundation models. Patterns (N.Y.) 6, 101326 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  178. Theodoris, C. V. Perspectives on benchmarking foundation models for network biology. Quant. Biol. 12, 335–338 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  179. Fishman, V. et al. GENA-LM: a family of open-source foundational DNA language models for long sequences. Nucleic Acids Res. 53, gkae1310 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  180. Koonin, E. V., Wolf, Y. I. & Karev, G. P. The structure of the protein universe and genome evolution. Nature 420, 218–223 (2002).

    Article  CAS  PubMed  Google Scholar 

  181. Rood, J. E., Hupalowska, A. & Regev, A. Toward a foundation model of causal cell and tissue biology with a perturbation cell and tissue atlas. Cell 187, 4520–4545 (2024).

    Article  CAS  PubMed  Google Scholar 

  182. Li, C. et al. Benchmarking AI models for in silico gene perturbation of cells. Preprint at bioRxiv https://doi.org/10.1101/2024.12.20.629581 (2024).

  183. Yuan, B. et al. CellBox: interpretable machine learning for perturbation biology with application to the design of cancer combination therapy. Cell Syst. 12, 128–140 (2021).

    Article  CAS  PubMed  Google Scholar 

  184. Qian, L. et al. AI-empowered perturbation proteomics for complex biological systems. Cell Genomics 4, 100691 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  185. Pearce, J. D. et al. A cross-species generative cell atlas across 1.5 billion years of evolution: the TranscriptFormer Single-cell Model. Preprint at bioRxiv https://doi.org/10.1101/2025.04.25.650731(2025).

  186. Zhang, Q., Stelzer, A. C., Fisher, C. K. & Al-Hashimi, H. M. Visualizing spatially correlated dynamics that directs RNA conformational transitions. Nature 450, 1263–1267 (2007).

    Article  CAS  PubMed  Google Scholar 

  187. Rozenblatt-Rosen, O., Stubbington, M. J. T., Regev, A. & Teichmann, S. A. The Human Cell Atlas: from vision to reality. Nature 550, 451–453 (2017).

    Article  CAS  PubMed  Google Scholar 

  188. Börner, K. et al. Human BioMolecular Atlas Program (HuBMAP): 3D Human Reference Atlas construction and usage. Nat. Methods 22, 845–860 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  189. Rozenblatt-Rosen, O. et al. The Human Tumor Atlas Network: charting tumor transitions across space and time at single-cell resolution. Cell 181, 236–249 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  190. Coleman, K. et al. Resolving tissue complexity by multimodal spatial omics modeling with MISO. Nat. Methods 22, 530–538 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  191. Long, Y. et al. Deciphering spatial domains from spatial multi-omics with SpatialGlue. Nat. Methods 21, 1658–1667 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  192. Zeng, Y. et al. Imputing spatial transcriptomics through gene network constructed from protein language model. Commun. Biol. 7, 1271 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  193. Chen, T. et al. SELF-Former: multi-scale gene filtration transformer for single-cell spatial reconstruction. Brief. Bioinform. 25, bbae523 (2024).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  194. Schroeder, A. et al. Scaling up spatial transcriptomics for large-sized tissues: uncovering cellular-level tissue architecture beyond conventional platforms with iSCALE. Nat Methods 22, 1911–1922 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  195. Gandin, V. et al. Deep-tissue transcriptomics and subcellular imaging at high spatial resolution. Science 388, eadq2084 (2025).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  196. U.S. Food & Drug Administration. FDA announces plan to phase out animal testing requirement for monoclonal antibodies and other drugs. https://www.fda.gov/news-events/press-announcements/fda-announces-plan-phase-out-animal-testing-requirement-monoclonal-antibodies-and-other-drugs (2025).

  197. Kim, J., Koo, B.-K. & Knoblich, J. A. Human organoids: model systems for human biology and medicine. Nat. Rev. Mol. Cell Biol. 21, 571–584 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  198. Passaro, S. et al. Boltz-2: towards accurate and efficient binding affinity prediction. Preprint at bioRxiv https://doi.org/10.1101/2025.06.14.659707 (2025).

  199. Wechsler, H. (ed.). Neural Networks for Perception pp. 65–93 (Academic Press, 1992).

  200. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction pp. 485–585 (Springer, 2008).

  201. Cord, M. & Cunningham, P. (eds). Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval pp. 21–49 (Springer, 2008).

  202. O’shea, K. & Nash, R. An introduction to convolutional neural networks. Preprint at arXiv https://doi.org/10.48550/arXiv.1511.08458 (2015).

  203. Xing, F., Xie, Y., Su, H., Liu, F. & Yang, L. Deep learning in microscopy image analysis: A survey. IEEE Trans. Neural Netw. Learn. Syst. 29, 4550–4568 (2017).

    Article  Google Scholar 

  204. Banerji, S. & Mitra, S. Deep learning in histopathology: a review. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 12, e1439 (2022).

    Article  Google Scholar 

  205. Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In 9th Int. Conf. Learning Representations (ICLR) (2021).

  206. Devlin, J., Chang, M. W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies Vol. 1 (eds Burstein, J. et al.) 4171–4186 (Association for Computational Linguistics, 2019).

Download references

Author information

Contributions

V.M.R., E.J.T. and P.R. conceptualized the study. V.M.R. performed the investigation and contributed to writing the original draft. V.M.R., S.Z., B.S.P., P.D.H., B.W., J.Z., M.Z., E.J.T. and P.R. contributed to writing, reviewing and editing. S.Z. and V.M.R. were responsible for visualization. E.J.T. and P.R. were responsible for supervision.

Corresponding authors

Correspondence to Eric J. Topol or Pranav Rajpurkar.

Ethics declarations

Competing interests

P.D.H. acknowledges outside interest as a cofounder of Stylus Medicine, Terrain Biosciences and Monet AI; serves on the board of directors at Stylus Medicine; is a board observer at EvolutionaryScale and Terrain Biosciences; is a scientific advisory board member at Arbor Biosciences and Veda Bio; and is an advisor to NFDG, Varda Space and Vial Health. J.Z. is an advisor for Amgen, Together AI, InVision and Fidocure. E.J.T. reports receiving personal fees from Abridge, Danaher, Mercor and Flagship Pioneering. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rao, V.M., Zhang, S., Plosky, B.S. et al. Generalist biological artificial intelligence in modeling the language of life. Nat Biotechnol (2026). https://doi.org/10.1038/s41587-026-03064-w

  • DOI: https://doi.org/10.1038/s41587-026-03064-w
