
  • Review Article

Artificial intelligence agents in cancer research and oncology

Abstract

Since 2022, artificial intelligence (AI) methods have progressed far beyond their established capabilities of data classification and prediction. Large language models (LLMs) can perform logical reasoning, enabling them to plan and orchestrate complex workflows. Combined with the ability to act upon their environment, this planning capability allows LLMs to function as agents: (semi-)autonomous systems capable of sensing, learning and acting upon their environments. As such, they can interact with external knowledge sources or external software and can execute sequences of tasks with minimal or no human input. In cancer research and oncology, evidence for the capability of AI agents is rapidly emerging. From autonomously optimizing drug design and development to proposing therapeutic strategies for clinical cases, AI agents can handle complex, multistep problems that were not addressable by previous generations of AI systems. Despite these rapid developments, many translational and clinical cancer researchers still lack clarity regarding the precise capabilities, limitations and ethical or regulatory frameworks associated with AI agents. Here we provide a primer on AI agents for cancer researchers and oncologists. We illustrate how this technology is set apart from and goes beyond traditional AI systems. We discuss existing and emerging applications in cancer research and address real-world challenges from the perspective of academic, clinical and industrial research.
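The sense-plan-act loop described above can be sketched in a few lines of code. The following is a minimal, hedged illustration only: the scripted `fake_llm`, the `search_pubmed` tool and the example task are hypothetical stand-ins for a real LLM API and real external software, not part of any system discussed in this article.

```python
# Minimal sketch of an LLM agent loop (ReAct-style): the model alternates
# between reasoning, calling tools and observing the results until it can
# produce a final answer. All names below are illustrative assumptions.

def search_pubmed(query):
    # Hypothetical tool: a real agent would query an external database here.
    return "KRAS G12C inhibitors: sotorasib, adagrasib"

TOOLS = {"search_pubmed": search_pubmed}

def fake_llm(history):
    # Stand-in for a real LLM: emits one tool call, then a final answer.
    if not any(step[0] == "observation" for step in history):
        return ("action", "search_pubmed", "KRAS G12C inhibitors")
    return ("final", "Approved KRAS G12C inhibitors include sotorasib and adagrasib.")

def run_agent(task, llm, tools, max_steps=5):
    history = [("task", task)]
    for _ in range(max_steps):
        step = llm(history)
        if step[0] == "final":
            return step[1]
        _, tool_name, tool_input = step
        observation = tools[tool_name](tool_input)  # act on the environment
        history.append(("observation", observation))  # sense the result
    return "max steps reached"

answer = run_agent("Which KRAS G12C inhibitors are approved?", fake_llm, TOOLS)
```

The key design point is the loop itself: the model's output determines whether another tool is invoked or the task terminates, which is what distinguishes an agent from a single-shot LLM call.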


Fig. 1: Types of artificial intelligence agent architectures.
Fig. 2: Artificial intelligence agents in cancer research.
Fig. 3: A multi-agent framework for oncological treatment decisions, which integrates diverse medical data.



Acknowledgements

J.N.K. is supported by the German Cancer Aid (DECADE, 70115166), the German Federal Ministry of Education and Research (PEARL, 01KD2104C; CAMINO, 01EO2101; TRANSFORM LIVER, 031L0312A; TANGERINE, 01KT2302 through ERA-NET TRANSCAN; Come2Data, 16DKZ2044A; DEEP-HCC, 031L0315A), the German Academic Exchange Service (SECAI, 57616814), the European Union's Horizon Europe research and innovation programme (ODELIA, 101057091; GENIAL, 101096312), the European Research Council (ERC; NADIR, 101114631), the National Institutes of Health (EPICO, R01 CA263318) and the National Institute for Health and Care Research (NIHR, NIHR203331) Leeds Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. This work was funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

Author information


Contributions

D.T., L.C.A., F.M. and J.N.K. researched data for the article. All authors contributed substantially to discussion of the content. D.T., S.A., J.Z., L.C.A. and J.N.K. wrote the article. All authors reviewed and/or edited the manuscript before submission.

Corresponding author

Correspondence to Jakob Nikolas Kather.

Ethics declarations

Competing interests

J.N.K. declares consulting services for Bioptimus, France; Panakeia, UK; AstraZeneca, UK; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI, Germany; Synagen, Germany; and Ignition Lab, Germany; has received an institutional research grant from GSK and AstraZeneca; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer and Fresenius. D.T. received honoraria for lectures from Bayer, GE, Roche, AstraZeneca and Philips and holds shares in StratifAI GmbH, Germany, and in Synagen GmbH, Germany. F.M. is a scientific adviser for and holds shares in Modella AI and is an adviser for Danaher. S.A. is an employee of Alphabet and may own stock as part of the standard compensation package. J.Z. and L.C.A. declare no competing interests.

Peer review

Peer review information

Nature Reviews Cancer thanks Anant Madabhushi, Wayne Zhao and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Chain-of-thought reasoning

A prompting technique that encourages language models to generate intermediate reasoning steps before arriving at a final answer, improving performance on complex tasks.
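The glossary entry above can be made concrete with a toy prompt-construction sketch. The exact eliciting phrase and the example question are illustrative assumptions; in practice any phrasing that requests intermediate steps has a similar effect.

```python
# Sketch of chain-of-thought prompting: the same question is wrapped so the
# model is asked to show intermediate reasoning before its final answer.
# The phrasing below is an illustrative assumption, not a fixed standard.

def direct_prompt(question):
    return f"{question}\nAnswer:"

def cot_prompt(question):
    return f"{question}\nLet's think step by step, then give the final answer."

q = "A tumour doubles in volume every 90 days. How many doublings occur in a year?"
print(cot_prompt(q))
```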

Contraindications

Clinical conditions or factors that make a particular treatment or procedure inadvisable because of potential harm to the patient.

Deep learning

A subset of machine learning that uses artificial neural networks with multiple layers to learn hierarchical representations of data.

Differential diagnoses

A systematic process of distinguishing between diseases or conditions that share similar clinical features to identify the most likely diagnosis.

Edge case

An unusual or extreme scenario that occurs at the boundaries of normal operating conditions, often revealing limitations in system performance.

Hyperparameters

Configuration settings defined before model training that control the learning process, such as learning rate, batch size and network architecture choices.

Large language model

(LLM). A type of artificial intelligence model trained on vast amounts of text data to understand and generate human language, capable of performing diverse language tasks without task-specific training.

Multi-turn conversation

A dialogue consisting of multiple exchanges between a user and an AI system, in which context from previous turns informs subsequent responses.

Natural language processing

(NLP). A field of artificial intelligence focused on enabling computers to understand, interpret and generate human language.

Parsing documents

The computational process of analysing and extracting structured information from unstructured or semi-structured text documents.

Precompiled reports

Standardized documents generated in advance or from templates, typically containing structured clinical or research data ready for review.

Reinforcement learning

A machine learning paradigm in which an agent learns to make decisions by receiving feedback in the form of rewards or penalties based on its actions.

Token

The basic unit of text processed by a language model, which may represent a word, subword or character depending on the tokenization scheme.
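How a word splits into tokens depends on the tokenizer's vocabulary, as the definition notes. The toy vocabulary and greedy longest-match segmentation below are a simplified stand-in for the learned subword schemes (such as byte-pair encoding) used by real LLMs.

```python
# Sketch of subword tokenization: the same word yields different tokens
# depending on the vocabulary. Greedy longest-match is a simplification of
# real BPE tokenizers; the vocabulary here is an illustrative assumption.

VOCAB = {"onco", "logy", "on", "c", "o", "l", "g", "y"}  # toy subword vocabulary

def subword_tokenize(word, vocab):
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest match first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])          # fall back to a single character
            i += 1
    return tokens

print(subword_tokenize("oncology", VOCAB))
```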

Transformer architecture

A neural network design that uses self-attention mechanisms to process sequential data in parallel, forming the foundation of modern LLMs.
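The self-attention mechanism named in this definition can be written out directly: every position scores every other position, the scores are softmax-normalized and used to mix the value vectors. This single-head sketch omits the learned query/key/value projections of a real transformer; the toy embeddings are illustrative assumptions.

```python
# Sketch of (scaled dot-product) self-attention, the core transformer
# operation: each token attends to all tokens via softmax-normalized dot
# products. Single head, no learned projections; inputs are toy values.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(queries, keys, values):
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much this token attends to each token
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token embeddings
y = self_attention(x, x, x)               # self-attention: Q = K = V = x
```

Because each output row is a convex combination of the value vectors, the computation for every position is independent and can run in parallel, which is the property the definition highlights.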

Vision language model

An AI model capable of processing and relating both visual information (such as images) and textual data within a unified framework.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Truhn, D., Azizi, S., Zou, J. et al. Artificial intelligence agents in cancer research and oncology. Nat Rev Cancer 26, 256–269 (2026). https://doi.org/10.1038/s41568-025-00900-0

