Perspective

Convergence of machine learning and genomics for precision oncology

Abstract

The number of data points per patient considered at the point of care in precision cancer medicine continues to grow, and translating these observations into clinical insights is increasingly challenging. Interpretation is a time-intensive and laborious process for oncology professionals and molecular tumour boards. As large clinicogenomic datasets and data-sharing protocols mature alongside machine learning methods, molecular diagnostic workflows have an opportunity to integrate these tools. Such integration can help extract more information from next-generation sequencing data, enhance cancer variant interpretation, streamline case review and generate therapeutic hypotheses for biomarker-negative patients at the point of care. Although machine learning holds promise for precision oncology, responsible implementation and model evaluation remain essential for clinical adoption.

Fig. 1: Opportunities for machine learning along a precision oncology molecular diagnostic workflow.
Fig. 2: Data sharing enables higher sample sizes to facilitate clinicogenomic discovery and model training.
Fig. 3: Patient similarity approaches for precision cancer medicine.
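Fig. 3 concerns patient similarity approaches. As one minimal illustration, and not the method used in this Perspective, a similarity score between two patients' somatic alteration profiles can be computed with the Jaccard index (shared altered genes divided by all altered genes) and used to rank previously profiled patients against a new case. The cohort, the patient identifiers P1 to P4 and the gene sets below are hypothetical; practical approaches may weight features, use other metrics or incorporate clinical variables.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def most_similar(
    query: Set[str], cohort: Dict[str, Set[str]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank cohort patients by Jaccard similarity to the query profile, highest first."""
    scores = [(pid, jaccard(query, genes)) for pid, genes in cohort.items()]
    # Sort by descending score; break ties by patient ID so output is deterministic
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical cohort: each patient is represented by the set of genes altered in their tumour
    cohort = {
        "P1": {"KRAS", "TP53", "SMAD4"},
        "P2": {"EGFR", "TP53"},
        "P3": {"KRAS", "CDKN2A", "TP53", "SMAD4"},
        "P4": {"BRAF"},
    }
    query = {"KRAS", "TP53", "CDKN2A"}  # hypothetical new patient
    for pid, score in most_similar(query, cohort, k=2):
        print(pid, round(score, 3))  # prints "P3 0.75" then "P1 0.5"
```

A binary set representation keeps the example simple, but it ignores variant-level detail (for example, which KRAS allele is present) and treats every gene as equally informative.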


References

  1. Suehnholz, S. P. et al. Quantifying the expanding landscape of clinical actionability for patients with cancer. Cancer Discov. 14, 49–65 (2023).

    Article  PubMed Central  Google Scholar 

  2. Horak, P. & Fröhling, S. Measuring progress in precision oncology. Cancer Discov. 14, 18–19 (2024).

    Article  PubMed  Google Scholar 

  3. Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1–16 (2017).

    Article  Google Scholar 

  4. Griffith, M. et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat. Genet. 49, 170–174 (2017).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  5. Reardon, B. et al. Integrating molecular profiles into clinical frameworks through the Molecular Oncology Almanac to prospectively guide precision oncology. Nat. Cancer 2, 1102–1112 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  6. Luchini, C., Lawlor, R. T., Milella, M. & Scarpa, A. Molecular tumor boards in clinical practice. Trends Cancer 6, 738–744 (2020).

    Article  PubMed  Google Scholar 

  7. Gladstone, B. P. et al. Systematic review and meta-analysis of molecular tumor board data on clinical effectiveness and evaluation gaps. NPJ Precis. Oncol. 9, 96 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  8. Nichetti, F. et al. Real-world outcomes of molecular tumor board treatment recommendations. JCO Precis. Oncol. 9, e2400387 (2025).

    Article  PubMed  Google Scholar 

  9. The AACR Project GENIE Consortium et al. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov. 7, 818–831 (2017).

    Article  PubMed Central  Google Scholar 

  10. Pugh, T. J. et al. AACR project GENIE: 100,000 cases and beyond. Cancer Discov. 12, 2044–2057 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  11. Wang, S. & Ye, K. Deep-learning based representation and recognition for genome variants — from SNVs to structural variants. Natl Sci. Rev. 11, nwae335 (2024).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  12. Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018). This paper, the publication of DeepVariant, brought the proliferation of machine learning to bioinformatics, demonstrating that traditional heuristic and statistical approaches to variant calling could be outperformed.

    Article  PubMed  CAS  Google Scholar 

  13. Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  14. AlDubayan, S. H. et al. Detection of pathogenic variants with germline genetic testing using deep learning vs standard methods in patients with prostate cancer and melanoma. JAMA 324, 1957–1969 (2020).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  15. Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37, 561–566 (2019).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  16. Olson, N. D. et al. PrecisionFDA Truth Challenge V2: calling variants from short and long reads in difficult-to-map regions. Cell Genom. 2, 100129 (2022). This paper illustrates the methodological shift of variant callers towards using machine learning while also highlighting challenge areas for future developers.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  17. Mandiracioglu, B. et al. ECOLE: learning to call copy number variants on whole exome sequencing data. Nat. Commun. 15, 132 (2024).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  18. Popic, V. et al. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat. Methods 20, 559–568 (2023).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  19. Behera, S. et al. Comprehensive genome analysis and variant detection at scale using DRAGEN. Nat. Biotechnol. 43, 1177–1191 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  20. Yi, R., Chang, P.-C., Baid, G. & Carroll, A. Learning from data-rich problems: a case study on genetic variant calling. Preprint at https://doi.org/10.48550/arXiv.1911.05151 (2019).

  21. Scheffler, K. et al. Somatic small-variant calling methods in Illumina DRAGENTM Secondary Analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.03.23.534011 (2023).

  22. Park, J. et al. Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02839-x (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  23. Betschart, R. O. et al. Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment. Sci. Rep. 12, 21502 (2022).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  24. Roy, S. et al. Standards and guidelines for validating next-generation sequencing bioinformatics pipelines. J. Mol. Diagn. 20, 4–27 (2018).

    Article  PubMed  CAS  Google Scholar 

  25. van de Haar, J. et al. ESMO recommendations on clinical reporting of genomic test results for solid cancers. Ann. Oncol. 35, 954–967 (2024).

    Article  PubMed  Google Scholar 

  26. Eilbeck, K. et al. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 6, R44 (2005).

    Article  PubMed  PubMed Central  Google Scholar 

  27. den Dunnen, J. T. et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 37, 564–569 (2016).

    Article  Google Scholar 

  28. Holmes, J. B., Moyer, E., Phan, L., Maglott, D. & Kattman, B. SPDI: data model for variants and applications at NCBI. Bioinformatics 36, 1902–1907 (2020).

    Article  PubMed  CAS  Google Scholar 

  29. Wang, M. et al. hgvs: a python package for manipulating sequence variants using HGVS nomenclature: 2018 update. Hum. Mutat. 39, 1803–1813 (2018).

    Article  PubMed  Google Scholar 

  30. Lefter, M. et al. Mutalyzer 2: next generation HGVS nomenclature checker. Bioinformatics 37, 2811–2817 (2021).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  31. van Giffen, B., Herhausen, D. & Fahse, T. Overcoming the pitfalls and perils of algorithms: a classification of machine learning biases and mitigation methods. J. Bus. Res. 144, 93–106 (2022).

    Article  Google Scholar 

  32. Singh, D. & Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 97, 105524 (2020).

    Article  Google Scholar 

  33. Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 50, D20–D26 (2022).

    Article  PubMed  CAS  Google Scholar 

  34. Freeman, P. J., Hart, R. K., Gretton, L. J., Brookes, A. J. & Dalgleish, R. VariantValidator: accurate validation, mapping, and formatting of sequence variation descriptions. Hum. Mutat. 39, 61–68 (2018).

    Article  PubMed  Google Scholar 

  35. Freeman, P. J. et al. Standardizing variant naming in literature with VariantValidator to increase diagnostic rates. Nat. Genet. 56, 2284–2286 (2024).

    Article  PubMed  CAS  Google Scholar 

  36. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  37. Wagner, A. H. et al. The GA4GH Variation Representation Specification: a computational framework for variation representation and federated identification. Cell Genom. 1, 100027 (2021). This paper shows that VRS enables semantically precise, computable variant representation that facilitates further downstream bioinformatic applications and machine learning models.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  38. Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).

    Article  PubMed  CAS  Google Scholar 

  39. Arbesfeld, J. A. et al. Mapping MAVE data for use in human genomics applications. Genome Biol. 26, 179 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  40. Pagel, K. A. et al. Integrated informatics analysis of cancer-related variants. JCO Clin. Cancer Inform. 4, 310–317 (2020).

    Article  PubMed  Google Scholar 

  41. Bruijn, I. et al. Genome Nexus: a comprehensive resource for the annotation and interpretation of genomic variants in cancer. JCO Clin. Cancer Inform. 6, e2100144 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  42. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  43. Durkie, M. et al. ACGS Best Practice Guidelines for Variant Classification in Rare Disease (ACGS, 2024).

  44. Horak, P. et al. Standards for the classification of pathogenicity of somatic variants in cancer (oncogenicity): joint recommendations of Clinical Genome Resource (ClinGen), Cancer Genomics Consortium (CGC), and Variant Interpretation for Cancer Consortium (VICC). Genet. Med. 24, 986–998 (2022).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  45. Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).

    Article  PubMed  CAS  Google Scholar 

  46. Brandes, N., Goldman, G., Wang, C. H., Ye, C. J. & Ntranos, V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat. Genet. 55, 1512–1522 (2023).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  47. Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023). This paper shows DeepMind’s AlphaMissense and introduces it as a transformative deep learning model for missense variant effect prediction that was rigorously evaluated for its utility within pathogenicity assessments.

    Article  PubMed  CAS  Google Scholar 

  48. Kurtovic-Kozaric, A. et al. Comprehensive evaluation of AlphaMissense predictions by evidence quantification for variants of uncertain significance. Front. Genet. 15, 1487608 (2024).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  49. Muiños, F., Martínez-Jiménez, F., Pich, O., Gonzalez-Perez, A. & Lopez-Bigas, N. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).

    Article  PubMed  Google Scholar 

  50. Demajo, S. et al. Identification of clonal hematopoiesis driver mutations through in silico saturation mutagenesis. Cancer Discov. 14, 1717–1731 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  51. Vihinen, M. Problems in variation interpretation guidelines and in their implementation in computational tools. Mol. Genet. Genom. Med. 8, e1206 (2020).

    Article  Google Scholar 

  52. Fayer, S. et al. Closing the gap: systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 108, 2248–2258 (2021).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  53. Rubin, A. F. et al. MaveDB 2024: a curated community database with over seven million variant effects from multiplexed functional assays. Genome Biol. 26, 13 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  54. Arafeh, R., Shibue, T., Dempster, J. M., Hahn, W. C. & Vazquez, F. The present and future of the cancer dependency map. Nat. Rev. Cancer 25, 59–73 (2025).

    Article  PubMed  CAS  Google Scholar 

  55. Brixi, G. et al. Genome modeling and design across all domains of life with Evo 2. Preprint at bioRxiv https://doi.org/10.1101/2025.02.18.638918 (2025).

  56. Avsec, Ž. et al. AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model. Preprint at bioRxiv https://doi.org/10.1101/2025.06.25.661532 (2025).

  57. Li, M. M. et al. Standards and guidelines for the interpretation and reporting of sequence variants in cancer. J. Mol. Diagn. 19, 4–23 (2017).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  58. Mateo, J. et al. A framework to rank genomic alterations as targets for cancer precision medicine: the ESMO Scale for Clinical Actionability of Molecular Targets (ESCAT). Ann. Oncol. 29, 1895–1902 (2018).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  59. He, M. M. et al. Variant Interpretation for Cancer (VIC): a computational tool for assessing clinical impacts of somatic variants. Genome Med. 11, 53 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  60. Li, Q. et al. CancerVar: an artificial intelligence-empowered platform for clinical interpretation of somatic mutations in cancer. Sci. Adv. 8, eabj1624 (2022).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  61. Ruzicka, J. et al. Clinical evaluation of an AI system for streamlined variant interpretation in genetic testing. Preprint at medRxiv https://doi.org/10.1101/2025.02.04.25321641 (2025).

  62. Lammert, J. et al. Large language models for precision oncology: clinical decision support through expert-guided learning. J. Clin. Oncol. 42, e13609 (2024).

    Article  Google Scholar 

  63. Klein, H. et al. MatchMiner: an open-source platform for cancer precision medicine. NPJ Precis. Oncol. 6, 69 (2022). The authors introduce a clinical trial matching platform and a structured format for enrolment criteria to facilitate clinical trial matching for precision oncology, addressing a historically intractable problem within the field.

    Article  PubMed  PubMed Central  Google Scholar 

  64. Lotter, W. et al. Artificial intelligence in oncology: current landscape, challenges, and future directions. Cancer Discov. 14, 711–726 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  65. Wong, C. et al. Scaling clinical trial matching using large language models: a case study in oncology. In Proc. 8th Machine Learning for Healthcare Conference 846–862 (PMLR, 2023).

  66. Jin, Q. et al. Matching patients to clinical trials with large language models. Nat. Commun. 15, 9074 (2024).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  67. Cerami, E. et al. MatchMiner-AI: an open-source solution for cancer clinical trial matching. Preprint at https://doi.org/10.48550/arXiv.2412.17228 (2024).

  68. Reisle, C. et al. Evaluating language models for biomedical fact-checking: a benchmark dataset for cancer variant interpretation verification. Preprint at bioRxiv https://doi.org/10.1101/2025.09.10.675443 (2025).

  69. Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems 33, 9459–9474 (Curran Associates, 2020).

  70. Jun, H. et al. Implementing a context-augmented large language model to guide precision cancer medicine. Preprint at medRxiv https://doi.org/10.1101/2025.05.09.25327312 (2025).

  71. Schick, T. et al. Toolformer: language models can teach themselves to use tools. In Advances in Neural Information Processing Systems 36, 68539–68551 (Curran Associates, 2023).

  72. Yao, S. et al. ReAct: synergizing reasoning and acting in language models. Preprint at https://doi.org/10.48550/arXiv.2210.03629 (2023).

  73. Gao, S. et al. TxAgent: an AI agent for therapeutic reasoning across a universe of tools. Preprint at https://doi.org/10.48550/arXiv.2503.10970 (2025).

  74. Ferber, D. et al. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nat. Cancer 6, 1337–1349 (2025). This study is one of the most prominent illustrations of agentic AI systems being applied to precision oncology to support a wide array of clinical decision-making tasks.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  75. Benary, M. et al. Leveraging large language models for decision support in personalized oncology. JAMA Netw. Open 6, e2343689 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  76. Verlingue, L. et al. Artificial intelligence in oncology: ensuring safe and effective integration of language models in clinical practice. Lancet Reg. Health Eur. 46, 101064 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  77. Elemento, O., Khozin, S. & Sternberg, C. N. The use of artificial intelligence for cancer therapeutic decision-making. NEJM AI 2, AIra2401164 (2025).

    Article  Google Scholar 

  78. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  79. Yang, K., Qinami, K., Fei-Fei, L., Deng, J. & Russakovsky, O. Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. In Proc. 2020 Conference on Fairness, Accountability, and Transparency 547–558 (ACM, 2020).

  80. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  81. Acebedo, A. et al. Collaborating across sectors in service of open science, precision oncology, and patients: an overview of the AACR Project GENIE (Genomics Evidence Neoplasia Information Exchange) Biopharma Collaborative (BPC). ESMO Real World Data Digit. Oncol. 7, 100097 (2025).

    Article  Google Scholar 

  82. Painter, C. A. et al. The Angiosarcoma Project: enabling genomic and clinical discoveries in a rare cancer through patient-partnered research. Nat. Med. 26, 181–187 (2020).

    Article  PubMed  CAS  Google Scholar 

  83. Crowdis, J. et al. A patient-driven clinicogenomic partnership for metastatic prostate cancer. Cell Genom. 2, 100169 (2022).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  84. Lee, E., Jung, S. Y., Hwang, H. J. & Jung, J. Patient-level cancer prediction models from a nationwide patient cohort: model development and validation. JMIR Med. Inform. 9, e29807 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  85. Placido, D. et al. A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat. Med. 29, 1113–1122 (2023).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  86. Buk Cardoso, L. et al. Machine learning for predicting survival of colorectal cancer patients. Sci. Rep. 13, 8874 (2023).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  87. Moon, I. et al. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat. Med. 29, 2057–2067 (2023).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  88. Jee, J. et al. Automated real-world data integration improves cancer outcome prediction. Nature 636, 728–736 (2024). This paper shows MSKCC leveraging their data warehouse to develop a machine learning model to predict clinical outcomes, a paradigm that will continue to define clinicogenomic discoveries in the near term.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  89. Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 1–7 (2020).

    Article  Google Scholar 

  90. Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  91. Pati, S. et al. Federated learning enables big data for rare cancer boundary detection. Nat. Commun. 13, 7346 (2022).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  92. Brauneck, A. et al. Federated machine learning in data-protection-compliant research. Nat. Mach. Intell. 5, 2–4 (2023).

    Article  Google Scholar 

  93. Ogier du Terrail, J. et al. Federated learning for predicting histological response to neoadjuvant chemotherapy in triple-negative breast cancer. Nat. Med. 29, 135–146 (2023).

    Article  PubMed  CAS  Google Scholar 

  94. Stark, Z. et al. A call to action to scale up research and clinical genomic data sharing. Nat. Rev. Genet. 26, 141–147 (2024). This study outline several steps to data sharing and harmonization that can enable clinicogenomic datasets of thousands of patients with cancer, enabling biological discovery and machine learning models that generalize across institutions.

    Article  PubMed  Google Scholar 

  95. Fiume, M. et al. Federated discovery and sharing of genomic data using Beacons. Nat. Biotechnol. 37, 220–224 (2019). This study describes the Beacon protocol of GA4GH for federated data sharing, and it has become ubiquitous with federated learning within genomics.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  96. Elhussein, A., Baymuradov, U., Elhadad, N., Natarajan, K. & Gürsoy, G. A framework for sharing of clinical and genetic data for precision medicine applications. Nat. Med. 30, 3578–3589 (2024).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  97. Cho, H. et al. Secure and federated genome-wide association studies for biobank-scale datasets. Nat. Genet. 57, 809–814 (2025).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  98. Hanser, T. et al. Data-driven federated learning in drug discovery with knowledge distillation. Nat. Mach. Intell. 7, 423–436 (2025).

    Article  Google Scholar 

  99. Riba, M. et al. The 1+Million Genomes Minimal Dataset for Cancer. Nat. Genet. 56, 733–736 (2024).

    Article  PubMed  CAS  Google Scholar 

  100. Kehl, K. L. et al. Assessment of deep natural language processing in ascertaining oncologic outcomes from radiology reports. JAMA Oncol. 5, 1421–1429 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  101. Kehl, K. L. et al. Natural language processing to ascertain cancer outcomes from medical oncologist notes. JCO Clin. Cancer Inform. 4, 680–690 (2020).

    Article  PubMed  Google Scholar 

  102. Sushil, M. et al. CORAL: expert-curated oncology reports to advance language model inference. NEJM AI 1, AIdbp2300110 (2024).

    Article  Google Scholar 

  103. Hoes, L. R. et al. Patients with rare cancers in the Drug Rediscovery Protocol (DRUP) benefit from genomics-guided treatment. Clin. Cancer Res. 28, 1402–1411 (2022).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  104. Helland, Å et al. Improving public cancer care by implementing precision medicine in Norway: IMPRESS-Norway. J. Transl. Med. 20, 225 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  105. Mohammad, S. F. H. et al. The evolution of precision oncology: the ongoing impact of the Drug Rediscovery Protocol (DRUP). Acta Oncol. 63, 34885 (2024).

    Google Scholar 

  106. Nikolski, M. et al. Roadmap for a European cancer data management and precision medicine infrastructure. Nat. Cancer 5, 367–372 (2024).

    Article  PubMed  Google Scholar 

  107. Sweeney, S. M. et al. Challenges to using big data in cancer. Cancer Res. 83, 1175–1182 (2023).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  108. Seligson, N. D. et al. Recommendations for patient similarity classes: results of the AMIA 2019 Workshop on Defining Patient Similarity. J. Am. Med. Inform. Assoc. 27, 1808–1812 (2020). This study provides a conceptual roadmap for the development and implementation of patient similarity approaches within medicine broadly.

    Article  PubMed  PubMed Central  Google Scholar 

  109. Allam, A., Dittberner, M., Sintsova, A., Brodbeck, D. & Krauthammer, M. Patient similarity analysis with longitudinal health data. Preprint at https://doi.org/10.48550/arXiv.2005.06630 (2020).

  110. Jia, Z., Zeng, X., Duan, H., Lu, X. & Li, H. A patient-similarity-based model for diagnostic prediction. Int. J. Med. Inf. 135, 104073 (2020).

    Article  Google Scholar 

  111. Navaz, A. N. et al. A novel patient similarity network (PSN) framework based on multi-model deep learning for precision medicine. J. Pers. Med. 12, 768 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  112. Wang, N. et al. Sequential data-based patient similarity framework for patient outcome prediction: algorithm development. J. Med. Internet Res. 24, e30720 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  113. Savcisens, G. et al. Using sequences of life-events to predict human lives. Nat. Comput. Sci. 4, 43–56 (2023). This study excellently illustrates the power of sequence models to model temporal relationships while maintaining interpretability.

    Article  PubMed  Google Scholar 

  114. Manuilova, I. et al. Identifications of similarity metrics for patients with cancer: protocol for a scoping review. JMIR Res. Protoc. 13, e58705 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  115. Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  116. Osipov, A. et al. The molecular twin artificial-intelligence platform integrates multi-omic data to predict outcomes for pancreatic adenocarcinoma patients. Nat. Cancer 5, 299–314 (2024).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  117. Najgebauer, H. et al. CELLector: genomics-guided selection of cancer in vitro models. Cell Syst. 10, 424–432.e6 (2020).

    Article  PubMed  CAS  Google Scholar 

  118. Sinha, R., Luna, A., Schultz, N. & Sander, C. A pan-cancer survey of cell line tumor similarity by feature-weighted molecular profiles. Cell Rep. Methods 1, 100039 (2021).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  119. Zhao, Y. et al. CUP-AI-Dx: a tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence. EBioMedicine 61, 103030 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  120. Vibert, J. et al. Identification of tissue of origin and guided therapeutic applications in cancers of unknown primary using deep learning and RNA sequencing (TransCUPtomics). J. Mol. Diagn. 23, 1380–1392 (2021).

    Article  PubMed  CAS  Google Scholar 

  121. Darmofal, M. et al. Deep-learning model for tumor-type prediction using targeted clinical genomic sequencing data. Cancer Discov. 14, 1064–1081 (2024).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  122. Bick, A. G. et al. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).

    Article  Google Scholar 

  123. Subhashini, R. & Kumar, V. J. S. Evaluating the performance of similarity measures used in document clustering and information retrieval. In Proc. First International Conference on Integrated Intelligent Computing 27–31 (IEEE, 2010).

  124. Parimbelli, E., Marini, S., Sacchi, L. & Bellazzi, R. Patient similarity for precision medicine: a systematic review. J. Biomed. Inform. 83, 87–96 (2018).

    Article  PubMed  CAS  Google Scholar 

  125. Cross, J. L., Choma, M. A. & Onofrey, J. A. Bias in medical AI: implications for clinical decision-making. PLoS Digit. Health 3, e0000651 (2024). This study outlines several biases that must be considered for successful AI applications within medicine broadly, especially model developers.

    Article  PubMed  PubMed Central  Google Scholar 

  126. Collins, G. S. et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ 384, e074819 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  127. Hantel, A. et al. Perspectives of oncologists on the ethical implications of using artificial intelligence for cancer care. JAMA Netw. Open 7, e244077 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  128. Dai, L., Zhu, H. & Liu, D. Patient similarity: methods and applications. Preprint at https://doi.org/10.48550/arXiv.2012.01976 (2020).

  129. Aldrighetti, C. M., Niemierko, A., Van Allen, E., Willers, H. & Kamran, S. C. Racial and ethnic disparities among participants in precision oncology clinical studies. JAMA Netw. Open 4, e2133205 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  130. Kamran, S. C. et al. Tumor mutations across racial groups in a real-world data registry. JCO Precis. Oncol. 5, 1654–1658 (2021).

    Article  PubMed  Google Scholar 

  131. Cheung, A. T. M. et al. Racial and ethnic disparities in a real-world precision oncology data registry. NPJ Precis. Oncol. 7, 1–6 (2023).

    Google Scholar 

  132. Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  133. Kehl, K. L. et al. Shareable artificial intelligence to extract cancer outcomes from electronic health records for precision oncology research. Nat. Commun. 15, 1–11 (2024).

    Article  Google Scholar 

  134. Ehrmann, D. E., Joshi, S., Goodfellow, S. D., Mazwi, M. L. & Eytan, D. Making machine learning matter to clinicians: model actionability in medical decision-making. NPJ Digit. Med. 6, 1–5 (2023).

    Article  Google Scholar 

  135. Vaccaro, M., Almaatouq, A. & Malone, T. When combinations of humans and AI are useful: a systematic review and meta-analysis. Nat. Hum. Behav. 8, 2293–2303 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  136. Riley, R. D. et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ 384, e074820 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  137. Riley, R. D. et al. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ 384, e074821 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  138. la Roi-Teeuw, H. M. et al. Don’t be misled: 3 misconceptions about external validation of clinical prediction models. J. Clin. Epidemiol. 172, 111387 (2024).

  139. Petersen, C. et al. Recommendations for the safe, effective use of adaptive CDS in the US healthcare system: an AMIA position paper. J. Am. Med. Inform. Assoc. 28, 677–684 (2021).

  140. Ong, J. C. L. et al. Medical ethics of large language models in medicine. NEJM AI 1, AIra2400038 (2024).

  141. Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3, e745–e750 (2021). This critical review encourages model developers to focus on model validation instead of interpretability.

  142. Gilbert, S. & Kather, J. N. Guardrails for the use of generalist AI in cancer care. Nat. Rev. Cancer 24, 357–358 (2024).

  143. Zhou, L. et al. Larger and more instructable language models become less reliable. Nature 634, 61–68 (2024).

  144. Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).

  145. Lipkova, J. & Kather, J. N. The age of foundation models. Nat. Rev. Clin. Oncol. 21, 769–770 (2024).

  146. Okun, S. A., Lu, D., Sew, K., Subramaniam, A. & Lockwood, W. W. MET activation in lung cancer and response to targeted therapies. Cancers 17, 281 (2025).

  147. Rodon, J. et al. Genomic and transcriptomic profiling expands precision cancer medicine: the WINTHER trial. Nat. Med. 25, 751–758 (2019).

  148. Vaske, O. M. et al. Comparative tumor RNA sequencing analysis for difficult-to-treat pediatric and young adult patients with cancer. JAMA Netw. Open 2, e1913968 (2019).

  149. Wong, M. et al. Whole genome, transcriptome and methylome profiling enhances actionable target discovery in high-risk pediatric cancer. Nat. Med. 26, 1742–1753 (2020).

  150. Yates, J. & Van Allen, E. M. New horizons at the interface of artificial intelligence and translational cancer research. Cancer Cell 43, 708–727 (2025).

  151. Rehm, H. L. et al. GA4GH: international policies and standards for data sharing across genomic research and healthcare. Cell Genom. 1, 100029 (2021).

  152. Shick, A. A. et al. Transparency of artificial intelligence/machine learning-enabled medical devices. NPJ Digit. Med. 7, 1–4 (2024).

  153. Bonneville, R. et al. Landscape of microsatellite instability across 39 cancer types. JCO Precis. Oncol. 1, 1–15 (2017).

  154. Nguyen, L. et al. Pan-cancer landscape of homologous recombination deficiency. Nat. Commun. 11, 5584 (2020).

  155. Jia, P. et al. MSIsensor-pro: fast, accurate, and matched-normal-sample-free detection of microsatellite instability. Genom. Proteom. Bioinform. 18, 65–71 (2020).

  156. Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2014).

  157. Ziegler, J. et al. A deep multiple instance learning framework improves microsatellite instability detection from tumor next generation sequencing. Nat. Commun. 16, 136 (2025). This paper presents a deep learning model that improves MSI detection performance relative to standard bioinformatic tools while also enabling tissue conservation.

  158. Sztupinszki, Z. et al. Migrating the SNP array-based homologous recombination deficiency measures to next generation sequencing data of breast cancer. NPJ Breast Cancer 4, 1–4 (2018).

  159. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  160. Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).

  161. Díaz-Gay, M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, btad756 (2023).

  162. Gulhan, D. C., Lee, J. J.-K., Melloni, G. E. M., Cortés-Ciriano, I. & Park, P. J. Detecting the mutational signature of homologous recombination deficiency in clinical samples. Nat. Genet. 51, 912–919 (2019).

  163. Laprovitera, N. et al. Cancer of unknown primary: challenges and progress in clinical management. Cancers 13, 451 (2021).

  164. Belenkaya, R. et al. Extending the OMOP common data model and standardized vocabularies to support observational cancer research. JCO Clin. Cancer Inform. 5, 12–20 (2021).

Acknowledgements

We thank T. Feldman, S. Jiang, H. Jun, J. Karam, T. O’Meara, W. Mei, T. Pappa, J. Park, D. Reshef, E. Saad, M. Shady and M. Vergara for the many conversations involving the content of this Perspective; H. Jun, M. Shady and A. H. Wagner for reading the text and providing valued feedback; and J. Han for providing helpful citations about the adoption of DRAGEN.

Author information

Contributions

All authors contributed equally to all aspects of the article.

Corresponding authors

Correspondence to Aedin C. Culhane or Eliezer M. Van Allen.

Ethics declarations

Competing interests

B.R. has filed institutional patents on methods for clinical interpretation. E.M.V.A. has received research support (to institution) from Novartis, BMS, Sanofi and NextPoint. E.M.V.A. serves as a consultant or on scientific advisory boards of Novartis Institute for Biomedical Research, Serinus Bio and TracerBio. E.M.V.A. has equity in Tango Therapeutics, Genome Medical, Genomic Life, Enara Bio, Manifold Bio, Microsoft, Monte Rosa, Riva Therapeutics, Serinus Bio, Syapse and Tracer Bio. E.M.V.A. has received speaking fees from TD Cowen. E.M.V.A. has filed institutional patents on chromatin mutations and immunotherapy response and on methods for clinical interpretation, and has intermittent legal consulting on patents for Foley Hoag LLP. E.M.V.A. serves on the editorial board of Science Advances. A.C.C. declares no competing interests.

Peer review

Peer review information

Nature Reviews Cancer thanks Amalio Telenti and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

2016 PrecisionFDA Truth Challenge: https://precision.fda.gov/challenges/truth/results

AACR Project GENIE: https://www.aacr.org/professionals/research/aacr-project-genie/

PrecisionFDA Truth Challenge V2: https://precision.fda.gov/challenges/10/results

UK Biobank: https://www.ukbiobank.ac.uk/

Glossary

Ancillary models

Supplementary models used to complement a primary model, often used to improve interpretability.

Autoencoder

A type of neural network model that encodes input data into a compressed representation and then decodes it; during training, it aims to minimize the difference between the input and the decoded output, known as the reconstruction error.

Basket trials

Clinical trials that enrol patients based on the presence of specific genomic alterations, regardless of cancer type.

Clinicogenomic datasets

Datasets that contain both clinical and genomic data for the same patients.

Data warehouses

Structured, centralized repositories of data that are curated and maintained by institutions and laboratories to manage their clinical and genomic data in an integrated manner.

Embeddings

Compressed representations of input data learned by machine learning models.

Federated learning

A machine learning approach in which a model is trained on data from multiple institutions without the underlying data being shared between participants.

Fine-tuning

Further training of a pretrained, often general-purpose, model on a smaller, specialized dataset to adapt it for a specific application.

Hard-filtering approaches

The filtering of sequence variants using predefined thresholds on metrics such as allelic fraction, mapping quality and sequencing depth.

ImageNet

A publicly available database, developed in the late 2000s, that contained over 12 million manually annotated images at the time of publication and served as a foundational benchmark and training dataset for subsequent deep learning models in object recognition and image classification.

Immortal time bias

A bias occurring within clinicogenomic cohorts, wherein outcome analyses do not account for patients who died before sequencing and, thus, could not have been included in the cohort.

Model hallucinations

Outputs produced by a machine learning model that are incorrect or misleading and do not seem to be directly based on training data; the term is often used in the context of large language models and generative artificial intelligence.

Molecular foundation models

Deep learning models trained on a vast array of information relating to molecular biology, which machine learning developers often use as a basis for more specialized models.

Multiplexed assays of variant effects

(MAVEs). High-throughput assays that assess the functional impact of thousands of sequence variants in parallel.

Nearest-neighbour comparisons

A framework for identifying and ordering data points by relative proximity within a representation space defined by a chosen similarity metric.

Patient similarity

The process of identifying and comparing patients who share key traits, such as clinical or genomic features, within the context of precision oncology.

Segmental duplications

Long stretches of DNA (typically at least 1 kb) that share high sequence identity (typically at least 90%) and appear multiple times within a genome.

Sequence model

A type of neural network that is designed to model data that are provided in the order of their occurrence (in sequence).

Shapley values

A metric used to assess the relative importance of input features to the output prediction of a model.

Transfer learning

Adapting a pretrained model for a different but related task.
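To make the federated learning entry concrete, the sketch below implements a simplified form of federated averaging (FedAvg), one common scheme, not a method described in this Perspective. It assumes a toy one-parameter linear model (y ≈ w·x) trained by full-batch gradient descent at each site; the functions `local_update` and `federated_averaging`, the learning rate, the number of rounds and the site data are all hypothetical. Only the fitted weight leaves each site, and the server averages those weights in proportion to each site's sample size.

```python
def local_update(weight, data, lr=0.01, epochs=5):
    """Run a few epochs of full-batch gradient descent on one site's private data.

    Model: y ~ weight * x; loss: mean squared error over the site's (x, y) pairs.
    """
    w = weight
    n = len(data)
    for _ in range(epochs):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in data) / n
        w -= lr * grad
    return w


def federated_averaging(sites, rounds=50, init=0.0):
    """Simplified FedAvg: each round, every site trains locally from the current
    shared weight, and the server averages the returned weights, weighted by the
    number of samples at each site. Only weights, never raw data, are shared."""
    w = init
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        local_weights = [local_update(w, data) for data in sites]
        w = sum(lw * len(data) for lw, data in zip(local_weights, sites)) / total
    return w


if __name__ == "__main__":
    # Two hypothetical sites whose private data both follow y = 3x.
    site_a = [(1.0, 3.0), (2.0, 6.0)]
    site_b = [(1.0, 3.0), (3.0, 9.0), (2.0, 6.0)]
    print(round(federated_averaging([site_a, site_b]), 4))  # 3.0
```

Weighting by sample size gives larger sites proportionally more influence on the shared model, while the raw (x, y) pairs stay where they were collected.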
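The hard-filtering approaches defined in the glossary can be sketched as a set of independent threshold checks on each variant call. This is a minimal illustration, assuming a hypothetical `Variant` record and hypothetical cut-offs (allelic fraction ≥ 0.05, mapping quality ≥ 30, depth ≥ 20); these values are not taken from the article or from the defaults of any particular variant caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A called sequence variant with the QC metrics named in the glossary."""
    name: str
    allelic_fraction: float  # fraction of reads supporting the alternate allele
    mapping_quality: float   # e.g. mean mapping quality of supporting reads
    depth: int               # total read depth at the locus


# Hypothetical cut-offs chosen for illustration only.
MIN_ALLELIC_FRACTION = 0.05
MIN_MAPPING_QUALITY = 30.0
MIN_DEPTH = 20


def passes_hard_filters(v: Variant) -> bool:
    """Return True only if every metric meets its predefined threshold."""
    return (
        v.allelic_fraction >= MIN_ALLELIC_FRACTION
        and v.mapping_quality >= MIN_MAPPING_QUALITY
        and v.depth >= MIN_DEPTH
    )


def hard_filter(variants):
    """Split variants into (kept, rejected) lists, preserving input order."""
    kept = [v for v in variants if passes_hard_filters(v)]
    rejected = [v for v in variants if not passes_hard_filters(v)]
    return kept, rejected


if __name__ == "__main__":
    calls = [
        Variant("var_a", 0.32, 60.0, 150),
        Variant("var_b", 0.02, 60.0, 400),  # fails: low allelic fraction
        Variant("var_c", 0.40, 12.0, 90),   # fails: low mapping quality
        Variant("var_d", 0.25, 55.0, 8),    # fails: low depth
    ]
    kept, rejected = hard_filter(calls)
    print([v.name for v in kept])      # ['var_a']
    print([v.name for v in rejected])  # ['var_b', 'var_c', 'var_d']
```

Each threshold acts on its own, so a call that fails any single metric is discarded however strong its other metrics are; learned filtering approaches instead weigh the metrics jointly.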
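Nearest-neighbour comparisons for patient similarity can be illustrated with a small cohort in which each patient is represented by a vector of binary alteration indicators. Cosine similarity is used here as the chosen similarity metric purely as an assumption; Euclidean distance or learned embeddings could be substituted. The patient IDs, genes and feature vectors are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero profile as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_neighbours(query, cohort, k):
    """Rank cohort patients by similarity to `query` and return the top k
    as (patient_id, similarity) pairs, most similar first."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in cohort.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical binary indicators for alterations in four unnamed genes.
    cohort = {
        "P1": [1, 0, 1, 1],
        "P2": [0, 1, 0, 1],
        "P3": [1, 0, 1, 0],
    }
    query = [1, 0, 1, 0]
    for pid, sim in nearest_neighbours(query, cohort, k=2):
        print(pid, round(sim, 3))
```

Here P3, whose profile matches the query exactly, ranks first, followed by P1, which shares two of its three alterations with the query; P2 shares none and falls outside the top two.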
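Shapley values, as defined in the glossary, can be computed exactly for a model with only a few input features by enumerating every coalition of features. This sketch assumes a baseline value function (features outside a coalition are set to hypothetical baseline values) and a toy model; exhaustive enumeration scales exponentially with the number of features, which is why tools such as the tree explainers described in ref. 132 rely on more efficient algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature of `x` for `model`.

    Assumed value function: a coalition S is scored by calling `model` on a point
    that keeps features in S at their values in `x` and sets all others to
    `baseline`. Cost grows as 2**n, so this is only practical for small n.
    """
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


if __name__ == "__main__":
    # Toy model with an interaction between features 0 and 2.
    def toy_model(p):
        return 2 * p[0] + p[1] + 3 * p[0] * p[2]

    print(shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0]))
```

For this toy model the values are approximately 3.5, 1.0 and 1.5: the interaction term 3·x0·x2 is split equally between features 0 and 2, and the values sum to f(x) - f(baseline), the efficiency property of Shapley values.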

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Reardon, B., Culhane, A.C. & Van Allen, E.M. Convergence of machine learning and genomics for precision oncology. Nat Rev Cancer (2026). https://doi.org/10.1038/s41568-025-00897-6

DOI: https://doi.org/10.1038/s41568-025-00897-6
