Abstract
The number of data points per patient considered at the point of care in precision cancer medicine continues to increase, bringing a growing challenge: translating these observations into clinical insights, a time-intensive and laborious process for oncology professionals and molecular tumour boards. As large clinicogenomic datasets and data-sharing protocols mature alongside machine learning methods, molecular diagnostic workflows have an opportunity to integrate these tools. This integration can help extract more information from next-generation sequencing data, enhance cancer variant interpretation, streamline case review and generate therapeutic hypotheses for biomarker-negative patients at the point of care. Although machine learning holds promise for precision oncology, responsible implementation and model evaluation remain essential for clinical adoption.
References
Suehnholz, S. P. et al. Quantifying the expanding landscape of clinical actionability for patients with cancer. Cancer Discov. 14, 49–65 (2023).
Horak, P. & Fröhling, S. Measuring progress in precision oncology. Cancer Discov. 14, 18–19 (2024).
Chakravarty, D. et al. OncoKB: a precision oncology knowledge base. JCO Precis. Oncol. 1, 1–16 (2017).
Griffith, M. et al. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat. Genet. 49, 170–174 (2017).
Reardon, B. et al. Integrating molecular profiles into clinical frameworks through the Molecular Oncology Almanac to prospectively guide precision oncology. Nat. Cancer 2, 1102–1112 (2021).
Luchini, C., Lawlor, R. T., Milella, M. & Scarpa, A. Molecular tumor boards in clinical practice. Trends Cancer 6, 738–744 (2020).
Gladstone, B. P. et al. Systematic review and meta-analysis of molecular tumor board data on clinical effectiveness and evaluation gaps. NPJ Precis. Oncol. 9, 96 (2025).
Nichetti, F. et al. Real-world outcomes of molecular tumor board treatment recommendations. JCO Precis. Oncol. 9, e2400387 (2025).
The AACR Project GENIE Consortium et al. AACR Project GENIE: powering precision medicine through an international consortium. Cancer Discov. 7, 818–831 (2017).
Pugh, T. J. et al. AACR project GENIE: 100,000 cases and beyond. Cancer Discov. 12, 2044–2057 (2022).
Wang, S. & Ye, K. Deep-learning based representation and recognition for genome variants — from SNVs to structural variants. Natl Sci. Rev. 11, nwae335 (2024).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018). This paper, which introduced DeepVariant, helped bring machine learning into mainstream bioinformatics by demonstrating that traditional heuristic and statistical approaches to variant calling could be outperformed.
Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).
AlDubayan, S. H. et al. Detection of pathogenic variants with germline genetic testing using deep learning vs standard methods in patients with prostate cancer and melanoma. JAMA 324, 1957–1969 (2020).
Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37, 561–566 (2019).
Olson, N. D. et al. PrecisionFDA Truth Challenge V2: calling variants from short and long reads in difficult-to-map regions. Cell Genom. 2, 100129 (2022). This paper illustrates the methodological shift of variant callers towards using machine learning while also highlighting challenge areas for future developers.
Mandiracioglu, B. et al. ECOLE: learning to call copy number variants on whole exome sequencing data. Nat. Commun. 15, 132 (2024).
Popic, V. et al. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat. Methods 20, 559–568 (2023).
Behera, S. et al. Comprehensive genome analysis and variant detection at scale using DRAGEN. Nat. Biotechnol. 43, 1177–1191 (2024).
Yi, R., Chang, P.-C., Baid, G. & Carroll, A. Learning from data-rich problems: a case study on genetic variant calling. Preprint at https://doi.org/10.48550/arXiv.1911.05151 (2019).
Scheffler, K. et al. Somatic small-variant calling methods in Illumina DRAGEN™ Secondary Analysis. Preprint at bioRxiv https://doi.org/10.1101/2023.03.23.534011 (2023).
Park, J. et al. Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02839-x (2025).
Betschart, R. O. et al. Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment. Sci. Rep. 12, 21502 (2022).
Roy, S. et al. Standards and guidelines for validating next-generation sequencing bioinformatics pipelines. J. Mol. Diagn. 20, 4–27 (2018).
van de Haar, J. et al. ESMO recommendations on clinical reporting of genomic test results for solid cancers. Ann. Oncol. 35, 954–967 (2024).
Eilbeck, K. et al. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 6, R44 (2005).
den Dunnen, J. T. et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 37, 564–569 (2016).
Holmes, J. B., Moyer, E., Phan, L., Maglott, D. & Kattman, B. SPDI: data model for variants and applications at NCBI. Bioinformatics 36, 1902–1907 (2020).
Wang, M. et al. hgvs: a python package for manipulating sequence variants using HGVS nomenclature: 2018 update. Hum. Mutat. 39, 1803–1813 (2018).
Lefter, M. et al. Mutalyzer 2: next generation HGVS nomenclature checker. Bioinformatics 37, 2811–2817 (2021).
van Giffen, B., Herhausen, D. & Fahse, T. Overcoming the pitfalls and perils of algorithms: a classification of machine learning biases and mitigation methods. J. Bus. Res. 144, 93–106 (2022).
Singh, D. & Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 97, 105524 (2020).
Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 50, D20–D26 (2022).
Freeman, P. J., Hart, R. K., Gretton, L. J., Brookes, A. J. & Dalgleish, R. VariantValidator: accurate validation, mapping, and formatting of sequence variation descriptions. Hum. Mutat. 39, 61–68 (2018).
Freeman, P. J. et al. Standardizing variant naming in literature with VariantValidator to increase diagnostic rates. Nat. Genet. 56, 2284–2286 (2024).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Wagner, A. H. et al. The GA4GH Variation Representation Specification: a computational framework for variation representation and federated identification. Cell Genom. 1, 100027 (2021). This paper shows that VRS enables semantically precise, computable variant representation that facilitates further downstream bioinformatic applications and machine learning models.
Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
Arbesfeld, J. A. et al. Mapping MAVE data for use in human genomics applications. Genome Biol. 26, 179 (2025).
Pagel, K. A. et al. Integrated informatics analysis of cancer-related variants. JCO Clin. Cancer Inform. 4, 310–317 (2020).
de Bruijn, I. et al. Genome Nexus: a comprehensive resource for the annotation and interpretation of genomic variants in cancer. JCO Clin. Cancer Inform. 6, e2100144 (2022).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Durkie, M. et al. ACGS Best Practice Guidelines for Variant Classification in Rare Disease (ACGS, 2024).
Horak, P. et al. Standards for the classification of pathogenicity of somatic variants in cancer (oncogenicity): joint recommendations of Clinical Genome Resource (ClinGen), Cancer Genomics Consortium (CGC), and Variant Interpretation for Cancer Consortium (VICC). Genet. Med. 24, 986–998 (2022).
Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).
Brandes, N., Goldman, G., Wang, C. H., Ye, C. J. & Ntranos, V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat. Genet. 55, 1512–1522 (2023).
Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023). This paper introduces DeepMind's AlphaMissense, a transformative deep learning model for missense variant effect prediction that was rigorously evaluated for its utility within pathogenicity assessments.
Kurtovic-Kozaric, A. et al. Comprehensive evaluation of AlphaMissense predictions by evidence quantification for variants of uncertain significance. Front. Genet. 15, 1487608 (2024).
Muiños, F., Martínez-Jiménez, F., Pich, O., Gonzalez-Perez, A. & Lopez-Bigas, N. In silico saturation mutagenesis of cancer genes. Nature 596, 428–432 (2021).
Demajo, S. et al. Identification of clonal hematopoiesis driver mutations through in silico saturation mutagenesis. Cancer Discov. 14, 1717–1731 (2024).
Vihinen, M. Problems in variation interpretation guidelines and in their implementation in computational tools. Mol. Genet. Genom. Med. 8, e1206 (2020).
Fayer, S. et al. Closing the gap: systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 108, 2248–2258 (2021).
Rubin, A. F. et al. MaveDB 2024: a curated community database with over seven million variant effects from multiplexed functional assays. Genome Biol. 26, 13 (2025).
Arafeh, R., Shibue, T., Dempster, J. M., Hahn, W. C. & Vazquez, F. The present and future of the cancer dependency map. Nat. Rev. Cancer 25, 59–73 (2025).
Brixi, G. et al. Genome modeling and design across all domains of life with Evo 2. Preprint at bioRxiv https://doi.org/10.1101/2025.02.18.638918 (2025).
Avsec, Ž. et al. AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model. Preprint at bioRxiv https://doi.org/10.1101/2025.06.25.661532 (2025).
Li, M. M. et al. Standards and guidelines for the interpretation and reporting of sequence variants in cancer. J. Mol. Diagn. 19, 4–23 (2017).
Mateo, J. et al. A framework to rank genomic alterations as targets for cancer precision medicine: the ESMO Scale for Clinical Actionability of Molecular Targets (ESCAT). Ann. Oncol. 29, 1895–1902 (2018).
He, M. M. et al. Variant Interpretation for Cancer (VIC): a computational tool for assessing clinical impacts of somatic variants. Genome Med. 11, 53 (2019).
Li, Q. et al. CancerVar: an artificial intelligence-empowered platform for clinical interpretation of somatic mutations in cancer. Sci. Adv. 8, eabj1624 (2022).
Ruzicka, J. et al. Clinical evaluation of an AI system for streamlined variant interpretation in genetic testing. Preprint at medRxiv https://doi.org/10.1101/2025.02.04.25321641 (2025).
Lammert, J. et al. Large language models for precision oncology: clinical decision support through expert-guided learning. J. Clin. Oncol. 42, e13609 (2024).
Klein, H. et al. MatchMiner: an open-source platform for cancer precision medicine. NPJ Precis. Oncol. 6, 69 (2022). The authors introduce a clinical trial matching platform and a structured format for enrolment criteria to facilitate clinical trial matching for precision oncology, addressing a historically intractable problem within the field.
Lotter, W. et al. Artificial intelligence in oncology: current landscape, challenges, and future directions. Cancer Discov. 14, 711–726 (2024).
Wong, C. et al. Scaling clinical trial matching using large language models: a case study in oncology. In Proc. 8th Machine Learning for Healthcare Conference 846–862 (PMLR, 2023).
Jin, Q. et al. Matching patients to clinical trials with large language models. Nat. Commun. 15, 9074 (2024).
Cerami, E. et al. MatchMiner-AI: an open-source solution for cancer clinical trial matching. Preprint at https://doi.org/10.48550/arXiv.2412.17228 (2024).
Reisle, C. et al. Evaluating language models for biomedical fact-checking: a benchmark dataset for cancer variant interpretation verification. Preprint at bioRxiv https://doi.org/10.1101/2025.09.10.675443 (2025).
Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems 33, 9459–9474 (Curran Associates, 2020).
Jun, H. et al. Implementing a context-augmented large language model to guide precision cancer medicine. Preprint at medRxiv https://doi.org/10.1101/2025.05.09.25327312 (2025).
Schick, T. et al. Toolformer: language models can teach themselves to use tools. In Advances in Neural Information Processing Systems 36, 68539–68551 (Curran Associates, 2023).
Yao, S. et al. ReAct: synergizing reasoning and acting in language models. Preprint at https://doi.org/10.48550/arXiv.2210.03629 (2023).
Gao, S. et al. TxAgent: an AI agent for therapeutic reasoning across a universe of tools. Preprint at https://doi.org/10.48550/arXiv.2503.10970 (2025).
Ferber, D. et al. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nat. Cancer 6, 1337–1349 (2025). This study is one of the most prominent illustrations of agentic AI systems being applied to precision oncology to support a wide array of clinical decision-making tasks.
Benary, M. et al. Leveraging large language models for decision support in personalized oncology. JAMA Netw. Open 6, e2343689 (2023).
Verlingue, L. et al. Artificial intelligence in oncology: ensuring safe and effective integration of language models in clinical practice. Lancet Reg. Health Eur. 46, 101064 (2024).
Elemento, O., Khozin, S. & Sternberg, C. N. The use of artificial intelligence for cancer therapeutic decision-making. NEJM AI 2, AIra2401164 (2025).
Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
Yang, K., Qinami, K., Fei-Fei, L., Deng, J. & Russakovsky, O. Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. In Proc. 2020 Conference on Fairness, Accountability, and Transparency 547–558 (ACM, 2020).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Acebedo, A. et al. Collaborating across sectors in service of open science, precision oncology, and patients: an overview of the AACR Project GENIE (Genomics Evidence Neoplasia Information Exchange) Biopharma Collaborative (BPC). ESMO Real World Data Digit. Oncol. 7, 100097 (2025).
Painter, C. A. et al. The Angiosarcoma Project: enabling genomic and clinical discoveries in a rare cancer through patient-partnered research. Nat. Med. 26, 181–187 (2020).
Crowdis, J. et al. A patient-driven clinicogenomic partnership for metastatic prostate cancer. Cell Genom. 2, 100169 (2022).
Lee, E., Jung, S. Y., Hwang, H. J. & Jung, J. Patient-level cancer prediction models from a nationwide patient cohort: model development and validation. JMIR Med. Inform. 9, e29807 (2021).
Placido, D. et al. A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories. Nat. Med. 29, 1113–1122 (2023).
Buk Cardoso, L. et al. Machine learning for predicting survival of colorectal cancer patients. Sci. Rep. 13, 8874 (2023).
Moon, I. et al. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat. Med. 29, 2057–2067 (2023).
Jee, J. et al. Automated real-world data integration improves cancer outcome prediction. Nature 636, 728–736 (2024). This paper shows MSKCC leveraging their data warehouse to develop a machine learning model to predict clinical outcomes, a paradigm that will continue to define clinicogenomic discoveries in the near term.
Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 1–7 (2020).
Sheller, M. J. et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 12598 (2020).
Pati, S. et al. Federated learning enables big data for rare cancer boundary detection. Nat. Commun. 13, 7346 (2022).
Brauneck, A. et al. Federated machine learning in data-protection-compliant research. Nat. Mach. Intell. 5, 2–4 (2023).
Ogier du Terrail, J. et al. Federated learning for predicting histological response to neoadjuvant chemotherapy in triple-negative breast cancer. Nat. Med. 29, 135–146 (2023).
Stark, Z. et al. A call to action to scale up research and clinical genomic data sharing. Nat. Rev. Genet. 26, 141–147 (2024). This study outlines several steps towards data sharing and harmonization that can enable clinicogenomic datasets of thousands of patients with cancer, supporting biological discovery and machine learning models that generalize across institutions.
Fiume, M. et al. Federated discovery and sharing of genomic data using Beacons. Nat. Biotechnol. 37, 220–224 (2019). This study describes the GA4GH Beacon protocol for federated data sharing, which has become synonymous with federated approaches within genomics.
Elhussein, A., Baymuradov, U., Elhadad, N., Natarajan, K. & Gürsoy, G. A framework for sharing of clinical and genetic data for precision medicine applications. Nat. Med. 30, 3578–3589 (2024).
Cho, H. et al. Secure and federated genome-wide association studies for biobank-scale datasets. Nat. Genet. 57, 809–814 (2025).
Hanser, T. et al. Data-driven federated learning in drug discovery with knowledge distillation. Nat. Mach. Intell. 7, 423–436 (2025).
Riba, M. et al. The 1+Million Genomes Minimal Dataset for Cancer. Nat. Genet. 56, 733–736 (2024).
Kehl, K. L. et al. Assessment of deep natural language processing in ascertaining oncologic outcomes from radiology reports. JAMA Oncol. 5, 1421–1429 (2019).
Kehl, K. L. et al. Natural language processing to ascertain cancer outcomes from medical oncologist notes. JCO Clin. Cancer Inform. 4, 680–690 (2020).
Sushil, M. et al. CORAL: expert-curated oncology reports to advance language model inference. NEJM AI 1, AIdbp2300110 (2024).
Hoes, L. R. et al. Patients with rare cancers in the Drug Rediscovery Protocol (DRUP) benefit from genomics-guided treatment. Clin. Cancer Res. 28, 1402–1411 (2022).
Helland, Å. et al. Improving public cancer care by implementing precision medicine in Norway: IMPRESS-Norway. J. Transl. Med. 20, 225 (2022).
Mohammad, S. F. H. et al. The evolution of precision oncology: the ongoing impact of the Drug Rediscovery Protocol (DRUP). Acta Oncol. 63, 34885 (2024).
Nikolski, M. et al. Roadmap for a European cancer data management and precision medicine infrastructure. Nat. Cancer 5, 367–372 (2024).
Sweeney, S. M. et al. Challenges to using big data in cancer. Cancer Res. 83, 1175–1182 (2023).
Seligson, N. D. et al. Recommendations for patient similarity classes: results of the AMIA 2019 Workshop on Defining Patient Similarity. J. Am. Med. Inform. Assoc. 27, 1808–1812 (2020). This study provides a conceptual roadmap for the development and implementation of patient similarity approaches within medicine broadly.
Allam, A., Dittberner, M., Sintsova, A., Brodbeck, D. & Krauthammer, M. Patient similarity analysis with longitudinal health data. Preprint at https://doi.org/10.48550/arXiv.2005.06630 (2020).
Jia, Z., Zeng, X., Duan, H., Lu, X. & Li, H. A patient-similarity-based model for diagnostic prediction. Int. J. Med. Inf. 135, 104073 (2020).
Navaz, A. N. et al. A novel patient similarity network (PSN) framework based on multi-model deep learning for precision medicine. J. Pers. Med. 12, 768 (2022).
Wang, N. et al. Sequential data-based patient similarity framework for patient outcome prediction: algorithm development. J. Med. Internet Res. 24, e30720 (2022).
Savcisens, G. et al. Using sequences of life-events to predict human lives. Nat. Comput. Sci. 4, 43–56 (2023). This study excellently illustrates the power of sequence models to model temporal relationships while maintaining interpretability.
Manuilova, I. et al. Identifications of similarity metrics for patients with cancer: protocol for a scoping review. JMIR Res. Protoc. 13, e58705 (2024).
Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).
Osipov, A. et al. The molecular twin artificial-intelligence platform integrates multi-omic data to predict outcomes for pancreatic adenocarcinoma patients. Nat. Cancer 5, 299–314 (2024).
Najgebauer, H. et al. CELLector: genomics-guided selection of cancer in vitro models. Cell Syst. 10, 424–432.e6 (2020).
Sinha, R., Luna, A., Schultz, N. & Sander, C. A pan-cancer survey of cell line tumor similarity by feature-weighted molecular profiles. Cell Rep. Methods 1, 100039 (2021).
Zhao, Y. et al. CUP-AI-Dx: a tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence. EBioMedicine 61, 103030 (2020).
Vibert, J. et al. Identification of tissue of origin and guided therapeutic applications in cancers of unknown primary using deep learning and RNA sequencing (TransCUPtomics). J. Mol. Diagn. 23, 1380–1392 (2021).
Darmofal, M. et al. Deep-learning model for tumor-type prediction using targeted clinical genomic sequencing data. Cancer Discov. 14, 1064–1081 (2024).
Bick, A. G. et al. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).
Subhashini, R. & Kumar, V. J. S. Evaluating the performance of similarity measures used in document clustering and information retrieval. In Proc. First International Conference on Integrated Intelligent Computing 27–31 (IEEE, 2010).
Parimbelli, E., Marini, S., Sacchi, L. & Bellazzi, R. Patient similarity for precision medicine: a systematic review. J. Biomed. Inform. 83, 87–96 (2018).
Cross, J. L., Choma, M. A. & Onofrey, J. A. Bias in medical AI: implications for clinical decision-making. PLoS Digit. Health 3, e0000651 (2024). This study outlines several biases that must be considered, especially by model developers, for successful AI applications within medicine broadly.
Collins, G. S. et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ 384, e074819 (2024).
Hantel, A. et al. Perspectives of oncologists on the ethical implications of using artificial intelligence for cancer care. JAMA Netw. Open 7, e244077 (2024).
Dai, L., Zhu, H. & Liu, D. Patient similarity: methods and applications. Preprint at https://doi.org/10.48550/arXiv.2012.01976 (2020).
Aldrighetti, C. M., Niemierko, A., Van Allen, E., Willers, H. & Kamran, S. C. Racial and ethnic disparities among participants in precision oncology clinical studies. JAMA Netw. Open 4, e2133205 (2021).
Kamran, S. C. et al. Tumor mutations across racial groups in a real-world data registry. JCO Precis. Oncol. 5, 1654–1658 (2021).
Cheung, A. T. M. et al. Racial and ethnic disparities in a real-world precision oncology data registry. NPJ Precis. Oncol. 7, 1–6 (2023).
Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).
Kehl, K. L. et al. Shareable artificial intelligence to extract cancer outcomes from electronic health records for precision oncology research. Nat. Commun. 15, 1–11 (2024).
Ehrmann, D. E., Joshi, S., Goodfellow, S. D., Mazwi, M. L. & Eytan, D. Making machine learning matter to clinicians: model actionability in medical decision-making. NPJ Digit. Med. 6, 1–5 (2023).
Vaccaro, M., Almaatouq, A. & Malone, T. When combinations of humans and AI are useful: a systematic review and meta-analysis. Nat. Hum. Behav. 8, 2293–2303 (2024).
Riley, R. D. et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ 384, e074820 (2024).
Riley, R. D. et al. Evaluation of clinical prediction models (part 3): calculating the sample size required for an external validation study. BMJ 384, e074821 (2024).
la Roi-Teeuw, H. M. et al. Don’t be misled: 3 misconceptions about external validation of clinical prediction models. J. Clin. Epidemiol. 172, 111387 (2024).
Petersen, C. et al. Recommendations for the safe, effective use of adaptive CDS in the US healthcare system: an AMIA position paper. J. Am. Med. Inform. Assoc. 28, 677–684 (2021).
Ong, J. C. L. et al. Medical ethics of large language models in medicine. NEJM AI 1, AIra2400038 (2024).
Ghassemi, M., Oakden-Rayner, L. & Beam, A. L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3, e745–e750 (2021). This critical review encourages model developers to focus on model validation instead of interpretability.
Gilbert, S. & Kather, J. N. Guardrails for the use of generalist AI in cancer care. Nat. Rev. Cancer 24, 357–358 (2024).
Zhou, L. et al. Larger and more instructable language models become less reliable. Nature 634, 61–68 (2024).
Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).
Lipkova, J. & Kather, J. N. The age of foundation models. Nat. Rev. Clin. Oncol. 21, 769–770 (2024).
Okun, S. A., Lu, D., Sew, K., Subramaniam, A. & Lockwood, W. W. MET activation in lung cancer and response to targeted therapies. Cancers 17, 281 (2025).
Rodon, J. et al. Genomic and transcriptomic profiling expands precision cancer medicine: the WINTHER trial. Nat. Med. 25, 751–758 (2019).
Vaske, O. M. et al. Comparative tumor RNA sequencing analysis for difficult-to-treat pediatric and young adult patients with cancer. JAMA Netw. Open 2, e1913968 (2019).
Wong, M. et al. Whole genome, transcriptome and methylome profiling enhances actionable target discovery in high-risk pediatric cancer. Nat. Med. 26, 1742–1753 (2020).
Yates, J. & Van Allen, E. M. New horizons at the interface of artificial intelligence and translational cancer research. Cancer Cell 43, 708–727 (2025).
Rehm, H. L. et al. GA4GH: international policies and standards for data sharing across genomic research and healthcare. Cell Genom. 1, 100029 (2021).
Shick, A. A. et al. Transparency of artificial intelligence/machine learning-enabled medical devices. NPJ Digit. Med. 7, 1–4 (2024).
Bonneville, R. et al. Landscape of microsatellite instability across 39 cancer types. JCO Precis. Oncol. 1, 1–15 (2017).
Nguyen, L. et al. Pan-cancer landscape of homologous recombination deficiency. Nat. Commun. 11, 5584 (2020).
Jia, P. et al. MSIsensor-pro: fast, accurate, and matched-normal-sample-free detection of microsatellite instability. Genom. Proteom. Bioinform. 18, 65–71 (2020).
Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2014).
Ziegler, J. et al. A deep multiple instance learning framework improves microsatellite instability detection from tumor next generation sequencing. Nat. Commun. 16, 136 (2025). This paper presents a deep learning model that improves MSI detection relative to standard bioinformatic tools while also enabling tissue conservation.
Sztupinszki, Z. et al. Migrating the SNP array-based homologous recombination deficiency measures to next generation sequencing data of breast cancer. NPJ Breast Cancer 4, 1–4 (2018).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. deconstructSigs: delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17, 31 (2016).
Díaz-Gay, M. et al. Assigning mutational signatures to individual samples and individual somatic mutations with SigProfilerAssignment. Bioinformatics 39, btad756 (2023).
Gulhan, D. C., Lee, J. J.-K., Melloni, G. E. M., Cortés-Ciriano, I. & Park, P. J. Detecting the mutational signature of homologous recombination deficiency in clinical samples. Nat. Genet. 51, 912–919 (2019).
Laprovitera, N. et al. Cancer of unknown primary: challenges and progress in clinical management. Cancers 13, 451 (2021).
Belenkaya, R. et al. Extending the OMOP common data model and standardized vocabularies to support observational cancer research. JCO Clin. Cancer Inform. 5, 12–20 (2021).
Acknowledgements
We thank T. Feldman, S. Jiang, H. Jun, J. Karam, T. O’Meara, W. Mei, T. Pappa, J. Park, D. Reshef, E. Saad, M. Shady and M. Vergara for the many conversations involving the content of this Perspective; H. Jun, M. Shady and A. H. Wagner for reading the text and providing valued feedback; and J. Han for providing helpful citations about the adoption of DRAGEN.
Author information
Authors and Affiliations
Contributions
All authors contributed equally to all aspects of the article.
Corresponding authors
Ethics declarations
Competing interests
B.R. has filed institutional patents on methods for clinical interpretation. E.M.V.A. has received research support (to institution) from Novartis, BMS, Sanofi and NextPoint. E.M.V.A. serves as a consultant or on scientific advisory boards of Novartis Institute for Biomedical Research, Serinus Bio and TracerBio. E.M.V.A. has equity in Tango Therapeutics, Genome Medical, Genomic Life, Enara Bio, Manifold Bio, Microsoft, Monte Rosa, Riva Therapeutics, Serinus Bio, Syapse and Tracer Bio. E.M.V.A. has received speaking fees from TD Cowen. E.M.V.A. has filed institutional patents on chromatin mutations and immunotherapy response and on methods for clinical interpretation, and has intermittent legal consulting on patents for Foley Hoag LLP. E.M.V.A. serves on the editorial board of Science Advances. A.C.C. declares no competing interests.
Peer review
Peer review information
Nature Reviews Cancer thanks Amalio Telenti and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
2016 PrecisionFDA Truth Challenge: https://precision.fda.gov/challenges/truth/results
AACR Project GENIE: https://www.aacr.org/professionals/research/aacr-project-genie/
PrecisionFDA Truth Challenge V2: https://precision.fda.gov/challenges/10/results
UK Biobank: https://www.ukbiobank.ac.uk/
Glossary
- Ancillary models: Supplementary models used to complement a primary model, often to improve interpretability.
- Autoencoder: A type of neural network that encodes input data into a compressed representation and then decodes it; training aims to minimize the difference between the input and the decoded data, known as the reconstruction error.
- Basket trials: Clinical trials that enrol patients based on the presence of specific genomic alterations, regardless of cancer type.
- Clinicogenomic datasets: Datasets that contain both clinical and genomic data for the same patients.
- Data warehouses: Structured, centralized repositories that institutions and laboratories curate and maintain to manage their clinical and genomic data in an integrated manner.
- Embeddings: Compressed representations of input data learned by machine learning models.
- Federated learning: A machine learning approach in which models are trained on data from multiple institutions without the underlying data being shared between participants.
- Fine-tuning: Further training of a pretrained, often general-purpose model on a smaller, specialized dataset to adapt it to a specific application.
- Hard-filtering approaches: The filtering of sequence variants using predefined thresholds on metrics such as allelic fraction, mapping quality and sequencing depth.
- ImageNet: A publicly available image database developed in the late 2000s, containing over 12 million manually annotated images at the time of publication, that served as a foundational benchmark and training dataset for subsequent deep learning models in object recognition and image classification.
- Immortal time bias: A bias in clinicogenomic cohorts in which outcome analyses do not account for patients who died before sequencing and who, thus, were never included in the cohort.
- Model hallucinations: Outputs produced by a machine learning model that are incorrect or misleading and do not seem to be directly based on training data; the term is most often used in the context of large language models and generative artificial intelligence.
- Molecular foundation models: Deep learning models trained on a vast array of information relating to molecular biology, which machine learning developers often use as a basis for more specialized models.
- Multiplexed assays of variant effects (MAVEs): High-throughput assays that assess the functional impact of thousands of sequence variants in parallel.
- Nearest-neighbour comparisons: A framework for identifying and ordering data points by relative proximity within a representation space defined by a chosen similarity metric.
- Patient similarity: The process of identifying and comparing patients who share key traits, such as clinical or genomic features, within the context of precision oncology.
- Segmental duplications: Long stretches of DNA (typically at least 1 kb) with high sequence identity (typically at least 90%) that occur at multiple locations within a genome.
- Sequence model: A type of neural network designed to model data provided in the order of their occurrence (in sequence).
- Shapley values: A metric used to assess the relative contribution of each input feature to a model's output prediction.
- Transfer learning: Adapting a pretrained model for a different but related task.
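The federated learning entry above can be illustrated with federated averaging (FedAvg), one common federated algorithm chosen here as an assumption rather than the method of any study cited in this article. In this minimal sketch, two hypothetical sites each fit a linear model y ≈ w0 + w1·x by gradient descent on their own data; only the fitted parameters, weighted by site sample size, leave each site and are averaged into a global model. The site names, data and learning rate are invented for illustration.

```python
def local_update(weights: tuple[float, float], data: list[tuple[float, float]],
                 lr: float = 0.1, epochs: int = 20) -> tuple[float, float]:
    """Gradient descent on mean squared error for y ~ w0 + w1*x, using only this site's data."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x - y) for x, y in data) * 2 / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) * 2 / n
        w0 -= lr * g0
        w1 -= lr * g1
    return (w0, w1)

def federated_average(site_weights: list[tuple[float, float]],
                      site_sizes: list[int]) -> tuple[float, float]:
    """Average parameters across sites, weighting each site by its number of samples."""
    total = sum(site_sizes)
    return tuple(
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(len(site_weights[0]))
    )

# Hypothetical private datasets; both follow y = 1 + 2x but never leave their site.
sites = {
    "site_1": [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)],
    "site_2": [(-2.0, -3.0), (2.0, 5.0)],
}

global_w = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_w, data) for data in sites.values()]  # runs at each site
    sizes = [len(data) for data in sites.values()]                       # only counts are shared
    global_w = federated_average(updates, sizes)                         # runs at the coordinator

print(global_w)
```

Because the toy sites share the same underlying relationship, the global parameters converge towards (1, 2); with heterogeneous real-world sites, FedAvg convergence and fairness across institutions need separate evaluation.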
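The hard-filtering approaches defined in the glossary can be sketched as a fixed set of threshold checks applied to each variant call. The `VariantCall` record, the threshold values and the example calls below are hypothetical illustrations, not parameters from any validated clinical pipeline; in practice thresholds are tuned and validated per assay.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariantCall:
    """A simplified variant call carrying the metrics used for hard filtering."""
    variant_id: str
    allele_fraction: float   # fraction of reads supporting the alternate allele
    mapping_quality: float   # e.g. mean mapping quality of supporting reads
    depth: int               # total read depth at the locus

# Illustrative thresholds only (assumptions, not recommendations).
MIN_ALLELE_FRACTION = 0.05
MIN_MAPPING_QUALITY = 30.0
MIN_DEPTH = 100

def passes_hard_filters(call: VariantCall) -> bool:
    """Keep a call only if it meets every predefined threshold."""
    return (
        call.allele_fraction >= MIN_ALLELE_FRACTION
        and call.mapping_quality >= MIN_MAPPING_QUALITY
        and call.depth >= MIN_DEPTH
    )

def hard_filter(calls: list[VariantCall]) -> list[VariantCall]:
    return [call for call in calls if passes_hard_filters(call)]

calls = [
    VariantCall("var1", 0.32, 60.0, 450),
    VariantCall("var2", 0.03, 60.0, 500),  # fails: low allele fraction
    VariantCall("var3", 0.20, 12.0, 300),  # fails: low mapping quality
    VariantCall("var4", 0.15, 55.0, 80),   # fails: low depth
]
kept = hard_filter(calls)
print([call.variant_id for call in kept])  # ['var1']
```

Each threshold acts independently, so a call that is borderline on several metrics is treated the same as one that is borderline on a single metric; this rigidity is one motivation for learned variant-calling models.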
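The immortal time bias entry can be made concrete with a toy calculation. The survival times and the fixed six-month sequencing time below are invented, and censoring is ignored for simplicity; the sketch only shows that restricting a cohort to patients who lived long enough to be sequenced inflates survival measured from diagnosis.

```python
from statistics import mean

# Hypothetical months from diagnosis to death for ten patients (no censoring).
survival_from_diagnosis = [2, 3, 5, 8, 10, 12, 18, 24, 36, 48]

# Assumption: every patient would be sequenced 6 months after diagnosis.
SEQUENCING_MONTH = 6

def alive_at_sequencing(death_month: float, sequencing_month: float = SEQUENCING_MONTH) -> bool:
    """Only patients still alive when sequencing occurs enter the clinicogenomic cohort."""
    return death_month > sequencing_month

sequenced_cohort = [t for t in survival_from_diagnosis if alive_at_sequencing(t)]

true_mean = mean(survival_from_diagnosis)  # all diagnosed patients
naive_mean = mean(sequenced_cohort)        # sequenced patients only, still timed from diagnosis

print(f"All diagnosed patients: {true_mean:.1f} months")   # 16.6
print(f"Sequenced cohort only:  {naive_mean:.1f} months")  # 22.3
```

Survival analyses that handle left truncation, for example by letting each patient enter the risk set only at the time of sequencing (delayed entry), are one standard way to reduce this bias.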
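The nearest-neighbour comparisons and patient similarity entries can be illustrated by ranking reference patients according to the similarity of their embeddings to a query patient. Cosine similarity is used here as one possible metric (an assumption); the patient identifiers and three-dimensional embeddings are invented, whereas real embeddings would be produced by a trained model and have many more dimensions.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def nearest_neighbours(query: Sequence[float], reference: dict[str, list[float]],
                       k: int) -> list[tuple[str, float]]:
    """Return the k reference patients most similar to the query, most similar first."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in reference.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical patient embeddings keyed by invented identifiers.
reference = {
    "patient_A": [1.0, 0.0, 0.0],
    "patient_B": [0.8, 0.3, 0.0],
    "patient_C": [0.0, 1.0, 0.0],
    "patient_D": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]

top = nearest_neighbours(query, reference, k=2)
for pid, score in top:
    print(pid, round(score, 3))
```

A brute-force scan like this is adequate for small cohorts; larger reference sets usually call for approximate nearest-neighbour indexes, and the choice of similarity metric should itself be validated.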
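Shapley values, as defined in the glossary, can be computed exactly for a model with only a few features by enumerating every coalition of features. This sketch assumes that a feature "absent" from a coalition is replaced by a single baseline value, which is one of several conventions; `toy_model` is a hypothetical stand-in for a trained predictor. Practical tools rely on approximations or model-specific algorithms such as TreeSHAP, because exact enumeration scales exponentially with the number of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

def shapley_values(model: Callable[[list[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values by enumerating all feature subsets (cost grows as 2**n)."""
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def coalition_value(present: set[int]) -> float:
        # Features in the coalition take their observed values; the rest take baseline values.
        z = [x[j] if j in present else baseline[j] for j in range(n)]
        return model(z)

    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalitions of size 0..n-1 drawn from the other features
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi += weight * (coalition_value(s | {i}) - coalition_value(s))
        values.append(phi)
    return values

def toy_model(z: list[float]) -> float:
    # Hypothetical score: two additive effects plus an interaction between features 0 and 2.
    return 0.5 * z[0] + 2.0 * z[1] + z[0] * z[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])
```

Here the interaction term is split equally between features 0 and 2, and the values sum to toy_model(x) - toy_model(baseline), the efficiency property that makes Shapley values attractive for attributing a single prediction.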
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Reardon, B., Culhane, A.C. & Van Allen, E.M. Convergence of machine learning and genomics for precision oncology. Nat Rev Cancer (2026). https://doi.org/10.1038/s41568-025-00897-6
DOI: https://doi.org/10.1038/s41568-025-00897-6


