Incorporating large language models as clinical decision support in oncology: the Woollie model

Heydari, Kimia; Enichen, Elizabeth J.; Li, Ben; Kvedar, Joseph C.

doi:10.1038/s41746-025-01941-3

Editorial
Open access
Published: 18 August 2025

Kimia Heydari¹, Elizabeth J. Enichen¹, Ben Li²,³ & Joseph C. Kvedar¹,⁴

npj Digital Medicine volume 8, Article number: 529 (2025)


Integrating large language models (LLMs) into oncology holds promise for clinical decision support. Woollie is an LLM recently developed by Zhu et al., fine-tuned using radiology impression notes from Memorial Sloan Kettering Cancer Center and externally validated on UCSF oncology datasets. This methodology prioritizes data accuracy, guards against catastrophic forgetting, and yields strong performance in predicting the progression of various cancer types. This work establishes a foundation for reliable, scalable, and equitable applications of LLMs in oncology.

Introduction

The effectiveness of oncology treatments depends on how cancer responds, as observed through radiological or pathological assessments. Tracking tumor regression in response to chemotherapy via serial radiologic imaging is critical for assessing treatment efficacy and guiding ongoing clinical management¹. However, these important data points are frequently documented as real-world data in non-standardized and unstructured formats, which makes them difficult to access and interpret, especially when leveraging large language models (LLMs) in oncology². Beyond these documentation problems, subspecialty knowledge barriers and privacy concerns about deploying closed-source LLMs in clinical settings further complicate the integration of LLMs in oncology³.

Still, LLMs could serve as a helpful decision support tool for clinicians in oncology. The rapidly expanding medical literature presents a challenge to oncologists seeking optimized and targeted cancer therapies for their patients. LLMs can also facilitate large-scale, systematic analysis of tumor progression data, potentially informing both individualized care and public health strategies by identifying patterns in metastasis and treatment efficacy⁴. Developing prompts for LLMs to derive clinical factors has proven efficient for extracting and collating crucial information from large medical records⁵. By streamlining data extraction from clinical reports (such as radiology interpretations or progress notes) in a reproducible manner, LLMs can reduce manual labor and thereby alleviate time constraints on clinicians⁶.
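To make the idea of prompt-based extraction concrete, the following is a minimal sketch rather than the method of any cited study. The prompt wording, the three status labels, and the `stub_model` stand-in (which replaces a real LLM call) are all assumptions made for illustration.

```python
import json
from typing import Callable

# Hypothetical label set; real studies define their own categories.
ALLOWED_STATUS = {"progression", "no_progression", "indeterminate"}

# Hypothetical prompt wording, not taken from any cited paper.
PROMPT_TEMPLATE = (
    "Read the radiology impression below and answer with JSON only, "
    'in the form {{"status": "<progression|no_progression|indeterminate>"}}.\n\n'
    "Impression:\n{impression}\n"
)


def build_prompt(impression: str) -> str:
    """Insert the free-text impression into the fixed prompt template."""
    return PROMPT_TEMPLATE.format(impression=impression.strip())


def parse_status(raw_reply: str) -> str:
    """Validate the model reply; fall back to 'indeterminate' on bad output."""
    try:
        status = json.loads(raw_reply).get("status")
    except (json.JSONDecodeError, AttributeError):
        return "indeterminate"
    if isinstance(status, str) and status in ALLOWED_STATUS:
        return status
    return "indeterminate"


def extract_status(impression: str, model: Callable[[str], str]) -> str:
    """Run one impression through a model callable and return a clean label."""
    return parse_status(model(build_prompt(impression)))


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call: a keyword rule, used only for this demo."""
    if "increased in size" in prompt.lower():
        return '{"status": "progression"}'
    return "not json"


label = extract_status("Hepatic lesions have increased in size.", stub_model)
```

Restricting the reply to a fixed label set and rejecting anything else is one simple way to keep extraction reproducible across many reports; unparseable replies are flagged as indeterminate instead of being guessed.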

The Woollie training model

Considering privacy concerns surrounding the implementation of LLMs in clinical decision-making, the increasing need for specialized oncology knowledge, and the complexity of extracting and collating real-world data, Menglei Zhu et al. developed Woollie, a dedicated LLM that is trained on real-world data from Memorial Sloan Kettering Cancer Center and is thereby specialized for interpreting oncological radiology reports². In their paper entitled “Large Language Model Trained on Clinical Oncology Data Predicts Cancer Progression,” Zhu et al. demonstrate that this model surpasses existing LLMs on medical knowledge benchmarks, including PubMedQA, MedMCQA, and USMLE⁷. In addition, Zhu et al. extended their validation to an independent dataset of 600 radiology impressions from 600 unique patients at the University of California, San Francisco medical center. Given the complexity and breadth of oncological knowledge, the authors enhanced Woollie’s analytical skills through a stacked alignment process. In this process, the LLM is trained across various interdependent fields relevant to cancer care and cancer progression: training starts from a foundation model, which is then fine-tuned with increasingly domain-specific databases and checked with validation tools such as recent medical benchmarks and external datasets. This approach is warranted by the complexity and depth of oncological data and is intended to ensure that increased specialization preserves foundational knowledge. As a result, the model demonstrated excellent performance in predicting the progression of various cancer types, including lung, breast, and prostate cancer (AUROC 0.97 and 0.88 on internal and external validation data, respectively).
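For readers less familiar with the AUROC metric reported above, the short sketch below shows one standard way to compute it: as the fraction of positive/negative case pairs in which the positive case receives the higher score. The labels and scores are made-up toy values, not data from Zhu et al., and the function name `auroc` is chosen here for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case (label 1)
    gets a higher score than a randomly chosen negative case (label 0);
    tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, invented data: 1 = progression documented, 0 = no progression.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.97 (internal) and 0.88 (external) values should be read.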

A strategy against catastrophic forgetting

Catastrophic forgetting occurs when an LLM loses previously learned knowledge while acquiring the new knowledge it needs to perform well on downstream tasks, especially in fields requiring complex subspecialty knowledge⁸. A core contribution of Zhu et al.’s work is their meticulous training approach: by employing the aforementioned stacked alignment process, they minimize catastrophic forgetting while expanding the model’s specialized knowledge base⁸. This training approach is designed so that Woollie preserves general domain competencies of reasoning, conversation, and information extraction while each successive model iteration builds on the previous one to enhance medical domain proficiency. This capacity is critical when incorporating LLMs in oncology, where confidently delivered but incorrect information can have severe consequences for patient care and trust. When combined with consistent attainment of clinical performance benchmarks, this robust training strategy suggests that AI models can be integrated safely into clinical decision-making for cancer progression prediction.
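One simple way to check for catastrophic forgetting across successive training stages is to track a general-domain benchmark alongside the specialty benchmark and measure how far the general score falls. The sketch below illustrates that bookkeeping only; the stage descriptions, the scores, and the 0.05 tolerance are invented for illustration and are not results or settings from Zhu et al.

```python
from typing import Dict, List

# Hypothetical benchmark accuracies after each training stage (invented numbers).
stage_scores: List[Dict[str, float]] = [
    {"general": 0.70, "oncology": 0.55},  # starting foundation model
    {"general": 0.69, "oncology": 0.75},  # after a first, broader medical stage
    {"general": 0.68, "oncology": 0.90},  # after a final, oncology-specific stage
]


def forgetting(scores: List[Dict[str, float]], benchmark: str) -> float:
    """Drop from the best earlier score to the final score (0.0 if no drop)."""
    if len(scores) < 2:
        raise ValueError("need at least two stages to measure forgetting")
    history = [stage[benchmark] for stage in scores]
    return max(0.0, max(history[:-1]) - history[-1])


def retains(scores: List[Dict[str, float]], benchmark: str,
            tolerance: float = 0.05) -> bool:
    """True if the final model lost no more than `tolerance` on the benchmark."""
    return forgetting(scores, benchmark) <= tolerance


general_drop = forgetting(stage_scores, "general")    # roughly 0.02 in this toy case
oncology_drop = forgetting(stage_scores, "oncology")  # 0.0, the score only improved
```

In this toy example the specialty score rises at every stage while the general score slips only slightly, which is the pattern a training strategy aimed at limiting forgetting is meant to produce.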

Potential for scalable cancer data

Given recent advances in training LLMs to retain broad knowledge while gaining domain-specific expertise in oncology, models like Woollie could be utilized alongside other LLMs to scale and systematize knowledge about cancer progression across different cancer centers⁹. Verification and validation of Woollie data across multiple cancer sites will strengthen the generalizability of this model. When aligned with established AI governance frameworks for clinical care, operations, and research in oncology, the Woollie model can be scaled responsibly and ethically across multiple institutions, both nationally and globally¹⁰. By thoroughly integrating a wide range of clinical protocols and perspectives across cancer care institutions, scaled LLMs can limit non-evidence-based variation in clinical recommendations and therefore promote more equitable cancer care worldwide. From a public health standpoint, access to such systematized data provides an opportunity to enhance population-level insights into cancer progression.

Conclusion

Efforts to support and systematize clinical decision-making must preserve clinical accuracy, as sustaining patients’ trust and confidence is critical, especially in cancer care. Due to both robust model training and external validation against UCSF datasets, Woollie is an LLM that prioritizes data accuracy and safeguards against catastrophic forgetting in delivering clinical decision support for predicting cancer progression. This advancement paves the way for making cancer care more scalable and equitable.

Data availability

No datasets were generated or analysed during the current study.

References

1. Ko, CC., Yeh, LR. & Kuo, YT. et al. Imaging biomarkers for evaluating tumor response: RECIST and beyond. Biomark Res. 9, 52, https://doi.org/10.1186/s40364-021-00306-8 (2021).
2. Zhu, M. et al. Large language model trained on clinical oncology data predicts cancer progression. npj Digit. Med. 8, 397 (2025).
3. Chen, S. et al. Use of artificial intelligence Chatbots for cancer treatment information. JAMA Oncol. 9, 1459–1462 (2023).
4. Fountzilas, E., Pearce, T., Baysal, M. A., Chakraborty, A. & Tsimberidou, A. M. Convergence of evolving artificial intelligence and machine learning techniques in precision oncology. npj Digit. Med. 8, 75 (2025).
5. Choi, H. S., Song, J. Y., Shin, K. H., Chang, J. H. & Jang, B. S. Developing prompts from large language model for extracting clinical information from pathology and ultrasound reports in breast cancer. Radiat. Oncol. J. 41, 209–216 (2023).
6. Huang, J. et al. A critical assessment of using ChatGPT for extracting structured data from clinical notes. npj Digit. Med. 7, 106 (2024).
7. Singhal, K. et al. Toward expert-level medical question answering with large language models. Nat. Med. 31, 943–950 (2025).
8. Mermillod, M., Aurélia, B. & Patrick, B. The stability-plasticity dilemma: Investigating the continuum from catastrophic forgetting to age-limited learning effects. Front. Psychol. 4, 504 (2013).
9. Lavery, J. A. et al. A scalable quality assurance process for curating oncology electronic health records: the project GENIE biopharma collaborative approach. JCO Clin. Cancer Inform. 6, e2100105 (2022).
10. Stetson, P. D. et al. Responsible artificial intelligence governance in oncology. npj Digit. Med. 8, 407 (2025).

Acknowledgements

This editorial did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Harvard Medical School, Boston, MA, USA
Kimia Heydari, Elizabeth J. Enichen & Joseph C. Kvedar
Division of Vascular Surgery, University of Toronto, Toronto, ON, Canada
Ben Li
Temerty Centre for Artificial Intelligence Research and Education in Medicine, University of Toronto, Toronto, ON, Canada
Ben Li
Massachusetts General Hospital, Harvard University, Boston, MA, USA
Joseph C. Kvedar

Contributions

K.H. wrote the first draft of the manuscript. E.J.E., B.L. and J.C.K. provided time-sensitive, critical revisions. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Kimia Heydari.

Ethics declarations

Competing interests

J.C.K. is the editor-in-chief of npj Digital Medicine. All other authors declare no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Heydari, K., Enichen, E.J., Li, B. et al. Incorporating large language models as clinical decision support in oncology: the Woollie model. npj Digit. Med. 8, 529 (2025). https://doi.org/10.1038/s41746-025-01941-3

Received: 31 July 2025
Accepted: 08 August 2025
Published: 18 August 2025
Version of record: 18 August 2025
DOI: https://doi.org/10.1038/s41746-025-01941-3
