Towards human-centric intelligent treatment planning for radiation therapy

Jafar, Adnan; Jia, Xun

doi:10.1038/s41746-026-02339-5


Perspective
Open access
Published: 10 January 2026


npj Digital Medicine volume 9, Article number: 155 (2026)


Abstract

Current radiation therapy treatment planning is limited by suboptimal plan quality, inefficiency, and high costs. This perspective paper explores the complexity of treatment planning and introduces Human-Centric Intelligent Treatment Planning (HCITP), an AI-driven framework under human oversight, which integrates clinical guidelines, automates plan generation, and enables direct interaction with planners. We expect that HCITP will enhance efficiency, potentially reducing planning time to minutes, and will deliver personalized, high-quality plans. Challenges and potential solutions are discussed.


Introduction

Cancer is the second leading cause of death globally, with 18.74 million new cases and 9.7 million cancer-related deaths reported in 2022¹. Radiation therapy (RT), which uses high-energy radiation to damage cancer cell DNA², is a cornerstone of cancer treatment, benefiting more than two-thirds of cancer patients, either as a standalone therapy or in combination with other modalities like surgery or chemotherapy. Modern RT techniques, such as intensity-modulated radiotherapy and volumetric-modulated arc therapy, enable precise control of a medical linear accelerator (LINAC) for radiation delivery that conforms to the tumor shape while sparing healthy tissues, resulting in reduced toxicity compared to conventional methods, as demonstrated in numerous clinical studies across diverse cancer types^3,4,5,6.

The success of RT critically depends on treatment planning, a foundational step that determines the LINAC control parameters specifying its operations, such as beam angle, radiation dose rate, and multi-leaf collimator motions, to deliver the intended radiation dose (Fig. 1)⁷. Plans must satisfy two criteria: deliverability, which ensures physical execution by the LINAC, and acceptability, which confirms alignment with treatment intent. Achieving these criteria currently relies on collaboration between planners and plan evaluators, e.g., physicians and medical physicists, using a Treatment Planning System (TPS), specialized software that models radiation production and its interaction with patient-specific anatomy based on fundamental physics principles, and generates plans through mathematical optimization. Despite being the standard practice, this workflow suffers from suboptimal plan quality, low efficiency, and high costs, all of which negatively impact healthcare outcomes.

**Fig. 1: Relationship between control parameters, LINAC, and dose distribution in RT.**

Historically, TPSs were designed to handle radiation physics modeling and plan optimization, delegating operational aspects to planners and allowing physicians to focus on patient care. However, a fundamental limitation of this workflow is the TPS’s lack of intelligence, requiring extensive human input. In recent years, Artificial Intelligence (AI) has significantly transformed medicine, including RT. Advances have shown remarkable progress in decision-making⁸, outperforming humans in complex tasks^9,10. Building on these advancements, there is a growing opportunity to address the challenges in treatment planning. In this perspective article, we aim to shed some light on the complexities of the treatment planning process and potential solutions with AI-based decision-making capabilities. Such solutions have the potential to streamline the planning process, overcoming the limitations of the current practice and generating substantial impacts.

Current treatment planning practice and its limitations

Treatment planning begins with a preparation stage, which includes the fusion of multimodal images to extract relevant clinical information, delineation of targets and organs at risk (OARs), and definition of the prescription specifying dose objectives for the targets and tolerance limits for the OARs (Fig. 2). This is followed by the plan generation stage, during which the treatment plan is created, and then by auxiliary steps that support treatment delivery, such as preparing setup instructions and planning documents. The current perspective focuses specifically on the plan generation stage within this overall workflow.

**Fig. 2: Current treatment planning workflow.**

The current practice for the plan generation stage follows an iterative process involving two primary interactions (Fig. 2). The first interaction is between a planner and TPS. After the planner defines dose distribution objectives in the objective function, the TPS solves the optimization problem while adhering to the LINAC’s physical constraints. The planner then repeatedly refines the objectives, guiding the TPS toward a plan that balances clinical objectives with technical feasibility. The second interaction involves the planner and the plan evaluators—typically the physician, who assesses the plan’s alignment with the clinical intent, and the medical physicist, who reviews its technical aspects. Feedback is then provided to the planner to further refine the plan. This cumbersome workflow presents significant limitations (Fig. 2).

Suboptimal plan quality undermines treatment outcomes

The optimal patient-specific plan is unknown, requiring planners to repeatedly interact with the TPS to explore the large solution space. The resulting plan quality heavily depends on human factors, including the planner’s experience, planner-evaluator communication, and the time allocated for planning^11,12. Suboptimal plans, e.g., those with unnecessarily high dose to healthy tissues, are frequently accepted unknowingly¹³. An analysis of the RTOG-0126 clinical trial found that 9.1% of patients received plans with a 10% higher normal tissue complication risk than necessary, which could have been avoided with better planning¹⁴. Such plans worsen outcomes. In head-and-neck cancer, suboptimal plans have been associated with a 20% lower 2-year overall survival and a 24% higher 2-year local-regional failure rate¹⁵.

Low planning efficiency delays treatment and impacts outcomes

The trial-and-error interaction between the planner and the TPS requires hours to generate a plan, while additional evaluator-planner iterations can extend this process to days or longer for complex cases. This tedious process prolongs the interval between diagnosis and RT initiation, which significantly impedes treatment outcomes. For example, in high-grade gliomas, each day of delay increases the risk of death by 2%¹⁶, and in head and neck cancer, RT delays can reduce loco-regional control by up to 12–14% per week¹⁷. Additionally, delayed planning increases the chance of anatomical changes during the waiting period, making the plans for the initial anatomy suboptimal at the time of delivery, while also exacerbating patient anxiety and distress. Notably, with the rising global incidence of cancer¹⁸, a 15% increase in new cases could lead to a 22.5% rise in waiting times¹⁹, highlighting the urgent need to streamline RT planning processes and mitigate treatment delays.

High costs burden healthcare systems

The current planning paradigm requires hospitals to hire professional planners, with a minimum ratio of one per 250 patients annually²⁰. This translates into significant costs for training ($145k per person)²¹, salaries (median $140k per person in the US in 2023), and other expenses that are ultimately passed on to patients and healthcare systems.
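To make the scale of these figures concrete, the arithmetic can be sketched as follows. This is a rough illustration only: the 1,000-patient clinic is hypothetical, treating the training cost as a one-time expense per planner is an assumption, and the function name `planner_costs` is introduced here for illustration.

```python
import math

# Figures quoted in the text above.
PATIENTS_PER_PLANNER = 250     # minimum ratio: one planner per 250 patients annually
MEDIAN_SALARY_USD = 140_000    # median US salary per planner (2023)
TRAINING_COST_USD = 145_000    # training cost per planner

def planner_costs(annual_patients):
    """Estimate minimum planner headcount and the associated costs.

    Assumption: training is paid once per planner, salaries every year.
    """
    planners = math.ceil(annual_patients / PATIENTS_PER_PLANNER)
    return {
        "planners": planners,
        "annual_salaries_usd": planners * MEDIAN_SALARY_USD,
        "one_time_training_usd": planners * TRAINING_COST_USD,
    }

# Hypothetical clinic treating 1,000 patients per year.
example = planner_costs(1_000)
print(example)
```

Under these assumptions, the hypothetical clinic would need at least four planners, before counting the other expenses mentioned above.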

These limitations are particularly pronounced in adaptive RT^22,23, which frequently adjusts treatment plans to account for anatomical changes. Replanning tasks demand stringent plan quality under tight time constraints. In online adaptive RT, where planning occurs while the patient is on the treatment couch, planning must be completed within minutes, a daunting task under the current practice. The limitations are further amplified in low- and middle-income countries, where more than 50% of cancer patients requiring RT lack access to treatment²⁴. While efforts have been made to establish basic RT infrastructure like LINACs, the scarcity of trained personnel for treatment planning remains a critical bottleneck.

Existing efforts using AI to advance treatment planning

Substantial efforts have been made to address these limitations over the years. For example, knowledge-based planning builds predictive models to derive patient-specific optimal dose-volume histograms (DVHs), a widely used measure representing the radiation dose distributions within specific structures for evaluating plan quality, to guide treatment planning²⁵. In recent years, studies have incorporated AI technologies in this area. Our literature review (workflow in Supplementary Fig. 1) identified existing studies, which can be broadly categorized into two groups.
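For readers less familiar with DVHs, a minimal sketch of a cumulative DVH is given here. It assumes that all voxels in a structure have equal volume and that the dose to each voxel is already known; the function name `cumulative_dvh` and the toy dose values are illustrative and do not come from any TPS.

```python
def cumulative_dvh(voxel_doses_gy, dose_levels_gy):
    """For each dose level, return the fraction of the structure's volume
    that receives at least that dose (equal voxel volumes assumed)."""
    n = len(voxel_doses_gy)
    if n == 0:
        raise ValueError("structure has no voxels")
    return [
        sum(1 for dose in voxel_doses_gy if dose >= level) / n
        for level in dose_levels_gy
    ]

# Hypothetical structure with five voxels (doses in Gy).
doses = [10.0, 20.0, 30.0, 40.0, 50.0]
levels = [0.0, 25.0, 50.0, 60.0]
dvh = cumulative_dvh(doses, levels)

for level, fraction in zip(levels, dvh):
    print(f"V{level:g}Gy = {fraction:.0%}")
```

Each point on the curve answers a question of the form "what fraction of this organ receives at least X Gy?", which is the form in which many dose constraints are expressed.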

The first group included studies focusing on the acceptability criterion (Supplementary Table 1). Most studies leveraged deep neural networks to predict optimal dose distributions tailored to a patient’s anatomy^26,27. Yet, a key challenge remained—the deliverability of the predicted dose. As a result, these predictions primarily served as guidance for planners, who must use the TPS to approximate the predicted dose. This group also included studies that developed metrics to assess plan acceptability, providing additional guidance during treatment planning²⁸.

The second group of studies emphasized the deliverability criterion (Supplementary Table 2). To replicate the decision-making process of human planners, researchers employed reinforcement learning (RL) and other techniques to develop virtual planners capable of operating the TPS^29,30,31,32. These virtual planners have demonstrated performance comparable to human planners in head-to-head treatment planning competitions organized by scientific societies^33,34. More recently, Large Language Models (LLMs) were explored for autonomously adjusting organ priority weights³⁵. Additionally, studies attempted to directly predict LINAC control parameters based on patient anatomy³⁶ using a deep Q-network method³⁷.

Based on the literature review, existing attempts have primarily focused on addressing the two key criteria, deliverability and acceptability, separately. Moreover, a critical gap remains: these AI-based tools lack mechanisms for seamless interaction with physicians to incorporate their feedback, which is essential, as physicians are ultimately responsible for plan approval. With recent advances in AI demonstrating remarkable progress in decision-making and human-AI interaction⁸, it is both timely and feasible to rebuild the treatment planning paradigm.

Human-centric intelligent treatment planning

Overall scheme

We envision the next-generation treatment planning workflow (Fig. 3), called Human-Centric Intelligent Treatment Planning (HCITP), enabled by an agentic virtual planner composed of three decision-making modules (highlighted in green) to augment the TPS and interact directly with the human evaluator. Specifically, once the planning preparation stage is completed, HCITP immediately generates a preliminary treatment plan for review. The human evaluators, typically the physician focusing on clinical aspects and the medical physicist addressing technical considerations, provide feedback to refine the plan. This feedback is communicated directly to the virtual planner for implementation. The resulting iterative loop between the evaluators and the virtual planner facilitates rapid completion of the planning workflow while maintaining high plan quality. Notably, HCITP serves as a tool to facilitate treatment planning, with the final responsibility for plan approval and conflict resolution always resting with the physician to ensure that clinical priorities and patient-specific considerations are upheld.

**Fig. 3: New treatment planning workflow enabled by HCITP.**

At a high level, HCITP comprises three purposefully designed core modules. The first one is the Evaluation Module, responsible for assessing the quality of the treatment plan. Built on foundation models (FMs) with explainable AI techniques³⁸, it evaluates the plan with respect to clinical guidelines, physician preferences, as well as practical considerations. FMs refer to large-scale machine learning models trained on broad and diverse data that can be adapted to a wide range of downstream tasks with minimal task-specific tuning, serving as a versatile backbone for many AI applications³⁹. With FMs, the Evaluation Module processes multimodal data, including clinical protocols, technical guidelines, medical images of various modalities, treatment plans, clinical notes, etc., to generate contextualized embedded states for evidence-driven, case-adapted treatment plan assessment. In addition, the module incorporates physician preferences, such as trade-offs among organ doses. While a plan may meet standard quality guidelines, it can still be rejected due to individual physician preferences. To address this, the Evaluation Module encodes these preferences based on historically approved plans, enabling HCITP to prioritize the plans most likely to receive physician approval. Furthermore, this module also assesses practical aspects of the plan related to deliverability, such as delivery time and the plan modulation factor, which reflects the complexity of a plan and hence the level of accuracy required by the LINAC to precisely deliver it. When building this module, explainable AI can be employed to make the decision-making processes transparent and understandable to humans³⁸. We also envision that a key feature of this module is continual learning, allowing it to monitor and integrate the up-to-date clinical and technical guidelines, ensuring evaluations remain aligned with current standards.

The FM-based Evaluation Module can be built on task-specific encoders, for example, language models such as BioBERT⁴⁰ and PubMedBERT⁴¹ for clinical guidelines, imaging models such as nnU-Net⁴² and Swin-UNETR⁴³ for medical images, and NLP tools such as cTAKES⁴⁴ and ClinicalBERT⁴⁵ for clinical notes. Multimodal integration can be achieved through fusion at low-level features (early fusion), at the encoder level (mid fusion), or at the decision stage (late fusion), as well as through hybrid approaches, enabling the FM to reason jointly across modalities and to support case-adapted plan assessment for the Execution Module.
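The difference between early and late fusion can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the modality names, the feature vectors, the per-modality scores, and the weights are hypothetical placeholders, and a real system would obtain them from trained encoders rather than hard-coded numbers.

```python
def early_fusion(features_by_modality):
    """Early fusion: concatenate low-level feature vectors into one input
    that a single downstream model would consume."""
    fused = []
    for modality in sorted(features_by_modality):  # fixed, reproducible order
        fused.extend(features_by_modality[modality])
    return fused

def late_fusion(scores_by_modality, weights_by_modality):
    """Late fusion: combine per-modality decision scores by weighted average."""
    if set(scores_by_modality) != set(weights_by_modality):
        raise ValueError("scores and weights must cover the same modalities")
    total_weight = sum(weights_by_modality.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        scores_by_modality[m] * weights_by_modality[m] for m in scores_by_modality
    )
    return weighted / total_weight

# Hypothetical encoder outputs for one plan.
features = {"guidelines": [0.1, 0.2], "images": [0.3], "notes": [0.4, 0.5]}
scores = {"guidelines": 0.9, "images": 0.7, "notes": 0.5}
weights = {"guidelines": 2.0, "images": 1.0, "notes": 1.0}

fused_vector = early_fusion(features)
fused_score = late_fusion(scores, weights)
print(fused_vector, round(fused_score, 3))
```

Late fusion keeps each encoder independent, which makes it easier to swap one out, whereas early fusion lets a single model learn interactions between modalities at the cost of tighter coupling.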

The second, the Execution Module, replicates the decision-making capability of human planners and autonomously operates the TPS, aiming at generating deliverable treatment plans under the guidance of the Evaluation Module. This module can be built using RL, a machine learning technique in which an agent learns to make decisions through interactions with an environment. The goal is to discover policies that achieve specific objectives, such as treatment planning in HCITP, by maximizing a reward function, which serves as a numerical signal that reflects how favorable each decision is with respect to the defined goals⁴⁶. The reward function is derived from the Evaluation Module to quantify how well the treatment plan satisfies both clinical criteria and practical considerations. Training this module should incorporate human experience in operating the TPS. To enhance versatility, FMs may be used as the underlying architecture. The training dataset should include cases with diverse tumor sites, patient anatomies, and clinical conditions.
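One way such a reward signal could be shaped is sketched below. This is not the authors' design: the weighted blend of clinical and practical scores, the use of score improvement as the per-step reward, and all function names and numbers are assumptions chosen to illustrate the general RL idea of maximizing a discounted sum of rewards.

```python
def evaluation_score(clinical, practical, clinical_weight=0.7):
    """Blend a clinical score and a practical score (each in [0, 1]).

    Hypothetical stand-in for the Evaluation Module's output.
    """
    if not 0.0 <= clinical_weight <= 1.0:
        raise ValueError("clinical_weight must lie in [0, 1]")
    return clinical_weight * clinical + (1.0 - clinical_weight) * practical

def step_reward(score_before, score_after):
    """Reward one TPS adjustment by how much it improved the plan score."""
    return score_after - score_before

def discounted_return(rewards, gamma=0.9):
    """Discounted sum of rewards, the quantity an RL agent tries to maximize."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical plan scores after three successive planner actions.
plan_scores = [0.50, 0.60, 0.80, 0.80]
rewards = [step_reward(a, b) for a, b in zip(plan_scores, plan_scores[1:])]
episode_return = discounted_return(rewards, gamma=0.5)
print(rewards, episode_return)
```

Rewarding improvement rather than the absolute score is one common shaping choice: it credits the specific action that moved the plan forward, which may help the agent learn which TPS adjustments matter.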

The third one is the Conversation Module powered by LLMs and speech recognition technologies. Its purpose is to keep humans in the loop under a smooth workflow by enabling interactive feedback and guidance throughout the planning process, hence the term human-centric in HCITP. This module enables real-time bi-directional communication with the evaluator by summarizing feedback on plan quality and prompting for clarification when needed. In contrast to the current clinical workflow, where the human evaluator’s feedback reaches the TPS indirectly through the human planner, the direct interaction between evaluators and the TPS ensures that clear, actionable input is relayed to the planning process, supporting real-time dynamic plan refinement.

Advantages of human-centric intelligent treatment planning

By establishing an AI-augmented treatment planning workflow under human oversight, HCITP holds several advantages over existing approaches and addresses key challenges (Table 1).

Table 1 Key features of HCITP and its advantages compared to the current treatment planning workflow


Leveraging the few-shot learning capabilities of FMs, which stem from extensive pre-training, and their ability to process and contextualize multimodal data, HCITP is designed to handle planning across various cancer sites while continuously incorporating up-to-date plan evaluation criteria. Integrating physician preferences with clinical guidelines ensures that plans are optimized not only for clinical quality but also for individual patient needs and physician-specific standards, enhancing personalization.

In terms of generating plans to meet treatment intent, the exploratory nature of RL enables the Execution Module to uncover novel planning strategies, potentially pushing the boundaries of achievable plan quality beyond existing clinical practices. This also provides educational value by offering insights into optimal treatment plans and planning strategies. The deliverability criterion is maintained through the direct integration of the Execution Module with the TPS, ensuring adherence to physics principles, machine constraints, and other practical requirements.

HCITP also streamlines the workflow by allowing human evaluators to provide direct, natural, and intuitive feedback to the planning process, which is dynamically processed by the Evaluation Module and then passed on to the Execution Module. This maintains human oversight for the treatment planning process and eliminates the intensive and iterative task of planners manually interpreting and encoding evaluators’ feedback in the current clinical workflow, holding the potential to reduce planning time from days to minutes. The resulting reduction in planning time will shorten the interval between diagnosis and treatment initiation, critical for improving outcomes, particularly in rapidly progressing tumors.

Finally, by reducing reliance on human planners, HCITP has the potential to lower costs and expand access to RT services, especially in resource-limited settings, ultimately enhancing global cancer care.

Notably, previous studies have explored similar concepts of human-centric RT planning, albeit under different terminologies^47,48,49. A particularly relevant analogy has been drawn between aviation and RT. In aviation, automation has been seamlessly integrated under pilot oversight, shifting the pilot’s role from direct control to system management while maintaining their critical decision-making authority⁴⁷. Similarly, in RT, automation is expected to enhance treatment planning without diminishing the essential roles of human experts. However, a key distinction in the current RT planning workflow lies in the division of human roles: planners generate treatment plans, while physicians approve them and provide feedback. This introduces an added layer of complexity: planner-physician communication, unlike the pilot model in aviation. To address this, HCITP redefines the workflow by positioning the physician as the central human component, directly interacting with AI automation through the Conversation Module, thereby streamlining the process and reducing inefficiencies.

Considerations on human-centric intelligent treatment planning

Given the revolutionary nature of HCITP, there are foreseeable challenges that call for our prompt actions toward the effective development of this system.

Challenges related to technology development

Model training

Well-validated, trusted data form the foundation for training the HCITP model^50,51. As with developing any AI-driven system, collecting and processing such data presents significant challenges. Training the Evaluation Module should include clinical and practical guidelines on plan evaluation. Because it also assesses plans in the context of physician preferences, data collection efforts should include gathering physician-specific prior multimodal treatment plans paired with corresponding physician decisions. Planners’ actions in operating the TPS in the current practice and conversation data between physicians and human planners may be collected to train the Execution Module and the Conversation Module. Meanwhile, powerful generative models, like diffusion models, may be employed to synthesize data, reducing the burden of extensive data acquisition. Yet expert review by RT professionals is necessary to verify the plausibility and clinical relevance of the data. As the HCITP modules are developed, the integration of explainable AI techniques is essential to ensure transparency, trustworthiness, and reliability in the clinical decision-making process for RT treatment planning^52,53.

From the computational standpoint, training the Execution Module via the RL framework requires repeated interactions with the TPS to learn an optimal policy for operating it. This is computationally intensive, as the solution space expands rapidly when exploring complex operation strategies that experienced human planners can master. The challenge becomes even more significant for anatomically complex cancer sites. To mitigate these issues, enhancing the computational power of the TPS to accelerate the solution of plan optimization problems is essential. Augmenting the RL training process with human experience in planning decisions can guide the RL agent’s exploration and facilitate faster convergence⁵⁴.

Variability in acceptability and deliverability

Treatment plans in current clinical practice often exhibit substantial variability in both acceptability (e.g., plan quality and clinical trade-offs) and deliverability (e.g., machine limitations and treatment complexity). This poses a challenge for training HCITP, as it introduces ambiguity in defining what constitutes an optimal plan. The variation in acceptability is multifaceted. One major factor is the lack of a definitive ground truth for plan quality evaluation. With HCITP, state-of-the-art clinical guidelines can be integrated. Additionally, HCITP will learn physician preferences for plan acceptance. The Evaluation Module, trained in this way, will provide guidance to the Execution Module, ensuring greater consistency in generated plans. Another key factor contributing to variability is the acceptance of suboptimal plans due to time constraints or ineffective communication between planners and physicians. HCITP’s streamlined workflow facilitates the pursuit of optimal plans, thereby reducing quality variations. Moreover, by incorporating clinical guidelines and enabling physicians to explore a broader range of plans, HCITP can offer valuable educational opportunities, helping to mitigate variations driven by individual human factors. Regarding variability in deliverability, the Evaluation Module will be trained not only to assess plan quality from a clinical perspective but also to account for other practical factors, such as plan modulation factors, delivery time considering patient tolerance, beam angles appropriate for immobilization devices to prevent collisions, and more. Recognizing this variability, HCITP development will likely need to be iterative. Early, controlled implementation can standardize planning strategies, reduce unwarranted variation, and support the system’s continual refinement.

Generalization

While ensuring generalization across datasets for diverse populations is critical, in treatment planning, generalization also refers to the ability to perform this task for a wide range of tumor sites. Unlike human planners, who are trained to handle various disease sites, existing virtual planners are developed for specific cancer types, limiting their scalability and versatility. An RL-based Execution Module can be effectively trained to incorporate broad knowledge in operating the TPS and be fine-tuned for different tumor sites.

HCITP leverages published clinical guidelines, e.g., those from the American Society for Radiation Oncology and the European Society for Radiotherapy and Oncology, as part of its FM pre-training to ensure broad generalizability. However, alignment with local datasets and institutional protocols should not be neglected for safe and clinically relevant deployment. This alignment can be achieved through strategies such as fine-tuning on de-identified local data, federated learning across institutions, or feedback loops that allow the model to continuously adapt to local practice patterns.

Continual learning

RT and treatment planning continuously evolve to accommodate advancements such as new treatment guidelines and innovative delivery approaches of LINACs. To keep pace with them, HCITP should be designed to seamlessly monitor and integrate with society guidelines and diverse treatment delivery technologies. When new physicians join an institution, the Evaluation Module needs to be updated to learn their preferences using their initial clinical cases. In these scenarios, transfer learning can be employed to reduce the effort required for training and implementation. Additionally, regular audits on data quality are necessary to detect and address emerging biases or performance deficiencies. A robust feedback loop should be incorporated, allowing users to provide input during routine clinical practice to refine and enhance the system’s performance.

Challenges related to clinical implementation

Model development and deployment

Implementing HCITP requires significant investment, such as computational and data resources for training, as well as infrastructure to support model inference at deployment. This may not be feasible universally across hospitals, especially in resource-constrained settings globally. To address this challenge, we envision using lightweight models that can run locally on multiple GPUs, with the option to leverage cloud resources when necessary. Rather than full model training, a more practical approach involves using lightweight post-training techniques.

It is important to acknowledge that HCITP’s performance may not always be perfect, particularly in clinically complex scenarios. One such example is re-irradiation, where a patient has received prior treatments, and the previously delivered dose must be accurately transferred to the current planning stage to define appropriate dose objectives and assess plan quality⁵⁵. HCITP is intentionally designed to preserve human oversight, with physicians serving as the ultimate decision-makers. Similar to current clinical practice, where physicians communicate with planners during plan review, the Conversation Module is envisioned to facilitate efficient physician-system interaction for plan evaluation. In cases where direct physician involvement is limited by clinical workload, appropriately trained physician delegates may perform preliminary plan evaluations, thereby reducing the required effort from physicians.

The AI modules relieve human planners of repetitive and routine tasks, enabling them to focus on complex and nonstandard cases that remain beyond AI’s current capability, such as those involving unusual anatomy or intricate dose distribution requirements. Planners will also continue to perform essential auxiliary tasks that facilitate treatment delivery, though some of these may be progressively automated with future technological and workflow advancements. Furthermore, their roles are expected to evolve toward evaluating AI-generated plans for quality and compliance, aligning their responsibilities more closely with those of medical physicists serving as human evaluators within the HCITP framework.

Meanwhile, overemphasizing human-centeredness may inadvertently limit plan quality improvement and educational opportunities. To prevent this, the Evaluation Module should prioritize ensuring that the latest planning guidelines are followed, promoting consistency and improving plan quality across institutions. Additionally, the workflow efficiency gained through HCITP will allow physicians more time to thoroughly review and refine plans. By observing a broader range of plans and exploring the solution space, physicians can better identify the optimal plans for individual patients. This process also provides valuable educational opportunities.

Evaluation

High-quality representative datasets must be collected, and infrastructure must be built to support evaluation. A well-defined pathway of evaluation should be established, starting with offline virtual testing on large-scale independent datasets, followed by pilot studies. Rigorous uncertainty estimation, e.g., via ensembles and Monte Carlo dropout⁵⁶, calibration, and robustness testing should be performed. Ultimately, a prospective evaluation, akin to multi-center clinical trials, should be conducted to objectively measure the overall impact of HCITP on patient care and healthcare delivery. Post-deployment, regular audits with diverse clinical data are necessary to monitor and sustain safety and performance.
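As a simple illustration of ensemble-based uncertainty estimation, the sketch below summarizes the predictions of several independently trained models by their mean and spread, and flags a case for closer human review when the spread is large. The prediction values, the review threshold, and the function name `ensemble_summary` are hypothetical; Monte Carlo dropout would produce a similar set of predictions from repeated stochastic passes through a single model.

```python
from statistics import mean, pstdev

def ensemble_summary(predictions, review_threshold):
    """Summarize ensemble predictions for one plan.

    Returns (mean prediction, standard deviation across members,
    whether the disagreement exceeds the review threshold).
    """
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    spread = pstdev(predictions)
    return mean(predictions), spread, spread > review_threshold

# Hypothetical plan-quality scores from three ensemble members.
agreeing = ensemble_summary([0.80, 0.82, 0.78], review_threshold=0.05)
disagreeing = ensemble_summary([0.90, 0.50, 0.70], review_threshold=0.05)
print(agreeing)
print(disagreeing)
```

A large spread does not say which member is right; it only signals that the models disagree, which is the kind of case where human judgment is most valuable.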

While technical metrics, such as cumulative rewards and convergence rates, can provide insights into the performance of AI models, it is more important to design task-based metrics to assess HCITP’s performance in a contextualized setting. For example, plan quality can be measured using established numerical models to measure impact on healthcare outcomes⁵⁷. Health economics models may be employed to evaluate the cost-effectiveness of HCITP implementation⁵⁸. For explainability, HCITP’s strategies in generating plans and evaluating them can be compared against those of expert humans to validate their effectiveness and alignment with clinical expertise.

The treatment planning workflow encompasses many steps (Fig. 2), and its overall efficiency and impact are ultimately determined by the performance of all components. While HCITP aims to enhance automation and intelligence within the plan generation stage, the benefits of this advancement may be constrained by upstream processes, including image fusion, target and OAR delineation, and prescription definition, as well as downstream tasks, such as plan documentation. We acknowledge this limitation and look forward to future AI developments that extend automation and decision support to these steps^59,60.

Safety and privacy

FMs can sometimes lead to hallucinations or incorrect outputs⁶¹, posing risks to patient safety. Risks may also arise from improper explorations in RL model training, poorly designed reward functions, and biased training data, all of which can result in suboptimal or discriminatory actions. Differences in LINAC and TPS functions, compatibility, and dose modeling accuracy may introduce systematic biases during model training. Additionally, adversarial attacks on internal training data could lead to harmful or misleading outputs. As for privacy, large-scale AI systems, particularly FMs, are often trained on vast datasets that may contain personal information, raising concerns about data privacy. Malicious actors could exploit vulnerabilities, such as prompting tricks, to manipulate the models into revealing sensitive protected health information, thereby violating confidentiality standards. Such breaches could lead to legal consequences, erosion of trust in healthcare technologies, and increased patient reluctance to consent to AI-assisted care.

Several strategies can help mitigate these issues. For instance, combining chain-of-thought prompting, which guides the model to reason step-by-step, with self-consistency, which improves reliability by generating multiple reasoning paths and selecting the most frequent or confident response, has been shown to enhance LLM reasoning accuracy by 5–10%^62,63. Retrieval-augmented generation can further improve model responses by incorporating relevant external information. Additionally, guardrails such as regular model evaluations, adversarial testing, and continuous monitoring post-deployment are essential.
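The self-consistency step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of refs. 62,63: `sample_answer` is a hypothetical callable standing in for one chain-of-thought query to an LLM that returns only the final answer, and `make_stub_sampler` is a hypothetical helper that replays canned answers so the example runs without a model.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_consistent_answer(
    sample_answer: Callable[[str], str],
    prompt: str,
    n_paths: int = 5,
) -> Tuple[str, float]:
    """Sample several reasoning paths and return the most frequent
    final answer together with its share of the votes."""
    if n_paths < 1:
        raise ValueError("n_paths must be at least 1")
    # Normalize answers so trivial formatting differences do not split votes.
    answers: List[str] = [
        sample_answer(prompt).strip().lower() for _ in range(n_paths)
    ]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_paths


def make_stub_sampler(canned: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM call: replays canned answers in order."""
    it = iter(canned)
    return lambda prompt: next(it)


if __name__ == "__main__":
    sampler = make_stub_sampler(["Yes", "yes ", "no", "YES", "no"])
    answer, share = self_consistent_answer(
        sampler, "Is the dose constraint met?", n_paths=5
    )
    print(answer, share)
```

The vote share gives a rough agreement signal; a deployment might, for example, route low-agreement answers to a human reviewer, although the threshold for doing so would need to be validated.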

In addition, the increased software integration required to incorporate HCITP into the clinical workflow introduces broader cybersecurity risks. Modern RT systems already operate within a complex network of interconnected hardware and software, and expanding this ecosystem with AI-driven components heightens exposure to potential vulnerabilities. Proactive countermeasures are therefore essential. Incorporating systematic security testing, such as fuzz testing⁶⁴, along with enforcing strong input validation and hardening of data interfaces, can help identify and mitigate vulnerabilities before deployment. Providing routine cybersecurity training for RT staff and establishing clear incident-response procedures will help ensure the overall safety and robustness of the system.
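As a rough illustration of how input validation and fuzz testing fit together, the sketch below validates a planning request and then feeds the validator randomly assembled malformed inputs. Everything here is an assumption made for illustration: the field names `dose_gy` and `fractions`, the numeric limits, and the functions themselves are hypothetical and carry no clinical meaning. It is a simple random fuzzer, not a coverage-guided one.

```python
import random
from typing import Any, Dict


def validate_request(raw: Any) -> Dict[str, Any]:
    """Return a cleaned request or raise ValueError.
    Field names and limits are illustrative placeholders only."""
    if not isinstance(raw, dict):
        raise ValueError("request must be a mapping")
    dose = raw.get("dose_gy")
    fractions = raw.get("fractions")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(dose, bool) or not isinstance(dose, (int, float)):
        raise ValueError("dose_gy must be a number")
    if isinstance(fractions, bool) or not isinstance(fractions, int):
        raise ValueError("fractions must be an integer")
    # NaN and infinity fail this chained comparison and are rejected.
    if not (0 < dose <= 100):
        raise ValueError("dose_gy out of range")
    if not (1 <= fractions <= 50):
        raise ValueError("fractions out of range")
    return {"dose_gy": float(dose), "fractions": fractions}


def fuzz_validator(n_cases: int = 1000, seed: int = 0) -> int:
    """Feed random field combinations to the validator.
    Returns how many requests were accepted; any exception other than
    ValueError propagates and signals a bug in the validator."""
    rng = random.Random(seed)
    pool = [None, True, -1, 0, 1, 30, 10**6, 0.0, 2.5, 60.0,
            float("nan"), float("inf"), "60", "", [], {}]
    accepted = 0
    for _ in range(n_cases):
        raw = {"dose_gy": rng.choice(pool), "fractions": rng.choice(pool)}
        try:
            clean = validate_request(raw)
        except ValueError:
            continue  # Rejecting malformed input is the expected behavior.
        # Every accepted request must satisfy the stated invariants.
        assert 0 < clean["dose_gy"] <= 100
        assert 1 <= clean["fractions"] <= 50
        accepted += 1
    return accepted


if __name__ == "__main__":
    print("accepted requests:", fuzz_validator())
```

The design choice worth noting is that the validator fails closed: anything it cannot positively confirm as well-formed is rejected, so the fuzzer only has to check that accepted requests satisfy the invariants and that no unexpected exception type escapes.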

Legal considerations and clinical adoption

While potentially revolutionizing RT treatment planning, HCITP also raises a critical question common to AI-based healthcare systems: who should be held accountable for errors it makes? In the HCITP framework, much like the current practice, physicians retain the authority to approve or reject plans. This ensures that physicians remain ultimately accountable for the validity of the plans they approve.

As with the adoption of other AI techniques in healthcare, government guidelines are essential to establish clear roles and responsibilities for all parties involved. Software manufacturers must prioritize creating reliable AI systems, rigorously testing them across diverse datasets to ensure robustness, and transparently disclosing system limitations. Users, in turn, should undergo comprehensive training to effectively interpret AI-generated recommendations and validate their applicability before implementation. During operation, identified errors should be reported. By fostering collaboration between users and software manufacturers, and strengthening these efforts through robust legislation, the risks associated with HCITP can be minimized. To obtain regulatory approval, vendors developing HCITP systems must demonstrate compliance with medical-device standards (e.g., U.S. FDA requirements), providing validated evidence of accuracy, transparency, reproducibility, and robustness in treatment planning, while also addressing ethical and privacy concerns.

It is essential to strategize a roadmap that builds trust among key stakeholders, including patients, clinicians, administrators, regulators, and vendors. This roadmap should prioritize focused technology development on the key attributes outlined above, be supported by comprehensive multi-site validation that benchmarks performance against expert practice, and ensure alignment with regulatory and ethical standards. Throughout development and deployment, continuous human oversight with clearly defined responsibilities should be maintained. Following pilots and controlled trials, a phased roll-out with structured user training should be implemented, and objective evidence of effectiveness should be reported routinely to guide scale-up and continuous improvement. Early, sustained stakeholder engagement will help streamline approval, foster confidence, and increase the likelihood of successful adoption.

Conclusion

This perspective paper outlines key challenges in current RT treatment planning, particularly the lack of intelligence within existing TPSs. As a solution, we envision HCITP as a unified, agentic AI-powered framework that integrates decision-making capabilities while preserving human oversight to ensure quality and safety. Unlike prior efforts that address isolated aspects of treatment planning, HCITP aims to harmonize these solutions into a single workflow. We look forward to future developments in this area, highlighting the potential for HCITP to enhance personalized treatment planning, increase access to RT, and drive significant improvements in clinical practice.

Data availability

No datasets were generated or analyzed during the current study.

References

Filho, A. M. et al. The GLOBOCAN 2022 cancer estimates: data sources, methods, and a snapshot of the cancer burden worldwide. Int. J. Cancer 156, 1336–1346 (2025).
Hall, E. J. et al. Radiobiology for the Radiologist, Vol. 6 (Lippincott Williams & Wilkins, 2006).
Palma, D. et al. Volumetric modulated arc therapy for delivery of prostate radiotherapy: comparison with intensity-modulated radiotherapy and three-dimensional conformal radiotherapy. Int. J. Radiat. Oncol. Biol. Phys. 72, 996–1001 (2008).
Gupta, T. et al. Three-dimensional conformal radiotherapy (3D-CRT) versus intensity modulated radiation therapy (IMRT) in squamous cell carcinoma of the head and neck: a randomized controlled trial. Radiother. Oncol. 104, 343–348 (2012).
Tribius, S. & Bergelt, C. Intensity-modulated radiotherapy versus conventional and 3D conformal radiotherapy in patients with head and neck cancer: is there a worthwhile quality of life gain? Cancer Treat. Rev. 37, 511–519 (2011).
Nutting, C. M. et al. Parotid-sparing intensity modulated versus conventional radiotherapy in head and neck cancer (PARSPORT): a phase 3 multicentre randomised controlled trial. Lancet Oncol. 12, 127–136 (2011).
Khan, F. M. & Gibbons, J. P. Khan’s The Physics of Radiation Therapy (Lippincott Williams & Wilkins, 2014).
Duan, Y., Edwards, J. S. & Dwivedi, Y. K. Artificial intelligence for decision making in the era of big data–evolution, challenges and research agenda. Int. J. Inf. Manag. 48, 63–71 (2019).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Fawzi, A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610, 47–53 (2022).
Batumalai, V., Jameson, M. G., Forstner, D. F., Vial, P. & Holloway, L. C. How important is dosimetrist experience for intensity modulated radiation therapy? A comparative analysis of a head and neck case. Pract. Radiat. Oncol. 3, e99–e106 (2013).
Nelms, B. E. et al. Variation in external beam treatment plan quality: an inter-institutional study of planners and planning systems. Pract. Radiat. Oncol. 2, 296–305 (2012).
Das, I. J. et al. Intensity-modulated radiation therapy dose prescription, recording, and delivery: patterns of variability among institutions and treatment planning systems. J. Natl. Cancer Inst. 100, 300–307 (2008).
Moore, K. L. et al. Quantifying unnecessary normal tissue complication risks due to suboptimal planning: a secondary study of RTOG 0126. Int. J. Radiat. Oncol. Biol. Phys. 92, 228–235 (2015).
Peters, L. J. et al. Critical impact of radiotherapy protocol compliance and quality in the treatment of advanced head and neck cancer: results from TROG 02.02. J. Clin. Oncol. 28, 2996–3001 (2010).
Do, V., Gebski, V. & Barton, M. B. The effect of waiting for radiotherapy for grade III/IV gliomas. Radiother. Oncol. 57, 131–136 (2000).
Ferreira, J. A. G., Jaén Olasolo, J., Azinovic, I. & Jeremic, B. Effect of radiotherapy delay in overall treatment time on local control and survival in head and neck cancer: review of the literature. Rep. Pract. Oncol. Radiother. 20, 328–339 (2015).
Soerjomataram, I. & Bray, F. Planning for tomorrow: global cancer incidence and the role of prevention 2020–2070. Nat. Rev. Clin. Oncol. 18, 663–672 (2021).
Babashov, V. et al. Reducing patient waiting times for radiation therapy and improving the treatment planning process: a discrete-event simulation model (radiation treatment planning). Clin. Oncol. 29, 385–391 (2017).
American Society for Radiation Oncology. Safety Is No Accident: A Framework for Quality Radiation Oncology and Care (American Society for Radiation Oncology, 2012).
Van Dyk, J., Zubizarreta, E. & Lievens, Y. Cost evaluation to optimise radiation therapy implementation in different income settings: a time-driven activity-based analysis. Radiother. Oncol. 125, 178–185 (2017).
Yan, D., Vicini, F., Wong, J. & Martinez, A. Adaptive radiation therapy. Phys. Med. Biol. 42, 123 (1997).
Li, N. et al. Automatic treatment plan re-optimization for adaptive radiotherapy guided with the initial plan DVHs. Phys. Med. Biol. 58, 8725 (2013).
Zubizarreta, E., Fidarova, E., Healy, B. & Rosenblatt, E. Need for radiotherapy in low and middle income countries–the silent crisis continues. Clin. Oncol. 27, 107–114 (2015).
Ge, Y. & Wu, Q. J. Knowledge-based planning for intensity-modulated radiation therapy: a review of data-driven approaches. Med. Phys. 46, 2760–2775 (2019).
Nguyen, D. et al. A feasibility study for predicting optimal radiation therapy dose distributions of prostate cancer patients from patient anatomy using deep learning. Sci. Rep. 9, 1076 (2019).
Chen, X., Men, K., Li, Y., Yi, J. & Dai, J. A feasibility study on an automated method to generate patient-specific dose distributions for radiotherapy using deep learning. Med. Phys. 46, 56–64 (2019).
Gao, Y., Shen, C., Gonzalez, Y. & Jia, X. Modeling physician’s preference in treatment plan approval of stereotactic body radiation therapy of prostate cancer. Phys. Med. Biol. 67, 115012 (2022).
Shen, C. et al. Intelligent inverse treatment planning via deep reinforcement learning, a proof-of-principle study in high dose-rate brachytherapy for cervical cancer. Phys. Med. Biol. 64, 115013 (2019).
Shen, C., Chen, L., Gonzalez, Y. & Jia, X. Improving efficiency of training a virtual treatment planner network via knowledge-guided deep reinforcement learning for intelligent automatic treatment planning of radiotherapy. Med. Phys. 48, 1909–1920 (2021).
Wang, H., Bai, X., Wang, Y., Lu, Y. & Wang, B. An integrated solution of deep reinforcement learning for automatic IMRT treatment planning in non-small-cell lung cancer. Front. Oncol. 13, 1124458 (2023).
Yang, D. et al. Understanding and modeling human-AI interaction of artificial intelligence tool in radiation oncology clinic using deep neural network: a feasibility study using three year prospective data. Phys. Med. Biol. 69, 225018 (2024).
Gao, Y., Park, Y. K. & Jia, X. Human-like intelligent automatic treatment planning of head and neck cancer radiation therapy. Phys. Med. Biol. 69, 115049 (2024).
Gao, Y., Shen, C., Jia, X. & Park, Y. K. Implementation and evaluation of an intelligent automatic treatment planning robot for prostate cancer stereotactic body radiation therapy. Radiother. Oncol. 184, 109685 (2023).
Liu, S. et al. Automated radiotherapy treatment planning guided by GPT-4Vision. Phys. Med. Biol. 70, 155002 (2025).
Hrinivich, W. T. & Lee, J. Artificial intelligence-based radiotherapy machine parameter optimization using reinforcement learning. Med. Phys. 47, 6140–6150 (2020).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Holzinger, A., Saranti, A., Molnar, C., Biecek, P. & Samek, W. Explainable AI methods—a brief overview. In Proc. International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers, 13–38 (Springer, 2022).
Zhou, C. et al. A comprehensive survey on pretrained foundation models: a history from BERT to ChatGPT. Int. J. Mach. Learn. Cybern. 2024, 1–65 (2024).
Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2020).
Gu, Y. et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthc. 3, 1–23 (2021).
Bareja, R. et al. nnU-Net–based segmentation of tumor subcompartments in pediatric medulloblastoma using multiparametric MRI: a multi-institutional study. Radiol. Artif. Intell. 6, e230115 (2024).
Hatamizadeh, A. et al. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. In Proc. International MICCAI Brainlesion Workshop, 272–284 (Springer, 2021).
Savova, G. K. et al. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J. Am. Med. Inform. Assoc. 17, 507–513 (2010).
Huang, K., Altosaar, J. & Ranganath, R. ClinicalBERT: modeling clinical notes and predicting hospital readmission. Preprint at https://arxiv.org/abs/1904.05342 (2019).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).
Callens, D. et al. Is full-automation in radiotherapy treatment planning ready for take off? Radiother. Oncol. 201, 110546 (2024).
Sheng, Y. et al. Artificial intelligence applications in intensity modulated radiation treatment planning: an overview. Quant. Imaging Med. Surg. 11, 4859 (2021).
European Society for Radiotherapy and Oncology (ESTRO). AI for the Fully Automated Radiotherapy Treatment. https://www.estro.org/Workshops/2023-Physics-Workshop-Science-in-Development/AI-for-the-fully-automated-radiotherapy-treatment (2023).
Chen, I., Johansson, F. D. & Sontag, D. Why is my classifier discriminatory? Advances in Neural Information Processing Systems, 31, (2018).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med. 29, 1930–1940 (2023).
Zhao, H. et al. Explainability for large language models: a survey. ACM Trans. Intell. Syst. Technol. 15, 1–38 (2024).
Grondman, I., Busoniu, L., Lopes, G. A. & Babuska, R. A survey of actor-critic reinforcement learning: standard and natural policy gradients. IEEE Trans. Syst. Man Cybern. C (Appl. Rev.) 42, 1291–1307 (2012).
Beddok, A. et al. Reirradiation: standards, challenges, and patient-focused strategies across tumor types. CA Cancer J. Clin. 75, 630–666 (2025).
Abdar, M. et al. A review of uncertainty quantification in deep learning: techniques, applications and challenges. Inf. Fusion 76, 243–297 (2021).
Bentzen, S. M. et al. Quantitative analyses of normal tissue effects in the clinic (QUANTEC): an introduction to the scientific issues. Int. J. Radiat. Oncol. Biol. Phys. 76, S3–S9 (2010).
Lievens, Y. & Grau, C. Health economics in radiation oncology: introducing the ESTRO HERO project. Radiother. Oncol. 103, 109–112 (2012).
Oh, Y. et al. LLM-driven multimodal target volume contouring in radiation oncology. Nat. Commun. 15, 9186 (2024).
Rajendran, P. et al. Autodelineation of treatment target volume for radiation therapy using large language model-aided multimodal learning. Int. J. Radiat. Oncol. Biol. Phys. 121, 230–240 (2025).
Xu, Z., Jain, S. & Kankanhalli, M. Hallucination is inevitable: an innate limitation of large language models. Preprint at https://arxiv.org/abs/2401.11817 (2024).
Huang, J. et al. Large language models can self-improve. In Proc. 2023 Conference on Empirical Methods in Natural Language Processing, 1051–1068 (2023).
Wang, X. et al. Self-consistency improves chain of thought reasoning in language models. In Proc. International Conference on Learning Representations (ICLR). https://webdocs.cs.ualberta.ca/dale/papers/iclr23b.pdf (2023).
Sutton, M., Greene, A. & Amini, P. Fuzzing: Brute Force Vulnerability Discovery (Pearson, 2007).

Acknowledgements

This work was supported in part by NIH grants R01CA227289, R01CA254377, R37CA214639, and R01EB032716.

Author information

Authors and Affiliations

Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, MD, USA
Adnan Jafar & Xun Jia

Contributions

A.J. and X.J. wrote the main manuscript text and prepared figures. Both authors have read and approved the manuscript.

Corresponding author

Correspondence to Xun Jia.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jafar, A., Jia, X. Towards human-centric intelligent treatment planning for radiation therapy. npj Digit. Med. 9, 155 (2026). https://doi.org/10.1038/s41746-026-02339-5

Received: 31 December 2024
Accepted: 01 January 2026
Published: 10 January 2026
Version of record: 12 February 2026
DOI: https://doi.org/10.1038/s41746-026-02339-5
