Introduction

Lung cancer remains a significant challenge for patients and healthcare systems worldwide. It is the leading cause of cancer-related deaths and ranks as the third most common cancer globally, accounting for around 1.8 million deaths1. Most lung cancer cases are detected at advanced stages, where treatment is mainly palliative, resulting in poor survival outcomes2. In South Korea, lung cancer screening is conducted through the National Lung Cancer Screening Program (NLCSP), which targets individuals aged 54 to 74 years with a smoking history of at least 30 pack-years, providing biennial low-dose computed tomography (LDCT) scans.

The integration of artificial intelligence (AI) has greatly enhanced the capabilities of computer-aided detection (CAD) systems, broadening their application across various medical imaging exams, including mammography, brain CT scans, and chest radiography or CT scans. These systems are now employed for multiple purposes, such as detecting lesions, providing differential diagnoses, prioritizing urgent images, and extracting imaging biomarkers3,4. The incorporation of deep learning technology into CAD systems has significantly improved the performance of CAD algorithms in analyzing chest radiography5,6. Multiple AI-driven CAD systems have been shown to markedly enhance the performance of radiologists as secondary readers6,7.

Chest radiography, the most frequently performed radiologic examination in clinical practice, is the primary method for ruling out chest diseases, assessing the effectiveness of treatments (e.g., for pneumonia, tuberculosis, or lung cancer), and monitoring patients with chest abnormalities. Additionally, it provides an early opportunity to detect both symptomatic and asymptomatic cases of lung cancer8. Although multiple large randomized trials have shown that chest radiography does not reduce lung cancer mortality9, it remains widely used for screening various lung diseases, including pulmonary tuberculosis, other chest infections, and lung cancer10,11. Specifically, chest radiography is frequently employed for health checkups among the general population in certain countries12,13. Furthermore, retrospective studies have suggested the potential of AI-based CAD systems to enhance the role of chest radiography in lung cancer screening12,14,15.

Despite the potential benefits of AI algorithms in evaluating lung cancer risk, it remains uncertain how far integrating AI software for lung cancer detection improves diagnostic accuracy and, in turn, affects healthcare outcomes and overall costs. Therefore, the objective of this study was to assess the cost-effectiveness of implementing a commercial AI-based computer-aided detection (CAD)–integrated picture archiving and communication system (PACS) for identifying actionable lung nodules on chest radiographs among participants undergoing health checkups. To this end, we compared five mutually exclusive screening strategies: no screening, chest X-ray (CXR), AI-assisted CXR, low-dose computed tomography (LDCT), and AI-assisted LDCT. Each strategy was modeled separately, meaning that individuals in the simulation received only one screening modality (not both CXR and LDCT simultaneously). This design allowed us to evaluate and compare the incremental costs and benefits of each option.

Methods

Study overview

In this study, we constructed a mixed model combining a decision tree and a Markov model to evaluate the cost-effectiveness of AI-based diagnostic software for lung cancer patients. We created four scenarios, each comprising 10,000 individuals in South Korea aged 54–74, 40–80, 50–80, and 60–80 years old, respectively. The population distributions used for each scenario are summarized in Supplementary Table 3. Scenario 1 was based on the current Korean lung cancer screening guideline16, targeting individuals aged 54–74 years old, while Scenarios 2 to 4 were expanded to a wider range based on the United States Preventive Services Task Force (USPSTF) recommendation statement for lung cancer screening17. Each scenario includes demographic distributions of non-smokers (0 pack-years), light-smokers (more than 0 but fewer than 30 pack-years), and heavy-smokers (30 or more pack-years) in Korea. The screening age range refers to the age group eligible for lung cancer screening, while the time horizon represents the total period over which costs and health outcomes are tracked. Beyond the screening age, the modeled time horizons extend to capture full lifetime effects, corresponding to 20, 40, 30, and 20 years for each scenario, respectively. Quality-adjusted life years (QALYs), which combine quantity of life (life-years gained) and quality of life (health-related quality of life, HRQOL, expressed as a utility value), were used as the effectiveness measure. A healthcare system perspective was adopted in this study according to the economic evaluation guidelines in Korea18: only medical costs were included, while non-medical costs (transportation and nursing care) and indirect costs (time costs) were excluded. The average cost-effectiveness ratio (ACER) and the incremental cost-effectiveness ratio (ICER) were calculated as the outcome measures to evaluate cost-effectiveness. A discount rate of 4.5% was applied to both QALYs and costs according to the Korean guideline18.
A willingness-to-pay (WTP) threshold of $32,409.9 per QALY gained was used, reflecting one times the gross domestic product per capita of South Korea in 2022 (ref. 19). Overall, this study followed the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) guideline (Supplementary Table 1)20. The study procedures were reviewed and approved by the Institutional Review Board of the National Health Insurance Service Ilsan Hospital (IRB number: 2023-04-021). As this study was based solely on previously published, de-identified data, it did not involve direct human participants or any identifiable personal information. Therefore, informed consent was not required. All methods were carried out in accordance with relevant guidelines and regulations. The input parameters used in the model were extracted from published literature, as detailed in Table 1.
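The discounting described above can be illustrated with a minimal Python sketch. It is not the TreeAge implementation used in the study: the function names, the convention that the first year is undiscounted, and the 20-year stream of 0.9 QALYs per year are assumptions made purely for illustration. Only the 4.5% rate and the WTP threshold come from the text.

```python
DISCOUNT_RATE = 0.045    # from the text: 4.5% for both QALYs and costs
WTP_PER_QALY = 32409.9   # from the text: USD per QALY gained


def discount_factor(rate: float, year: int) -> float:
    """Present-value factor for an amount accrued `year` years from now."""
    return 1.0 / (1.0 + rate) ** year


def present_value(stream, rate: float = DISCOUNT_RATE) -> float:
    """Discounted sum of a yearly stream.

    Assumption: index 0 is the first year and is left undiscounted.
    """
    return sum(x * discount_factor(rate, t) for t, x in enumerate(stream))


# Hypothetical stream: 0.9 QALYs per year over a 20-year horizon.
qalys = [0.9] * 20
discounted_qalys = present_value(qalys)

print(f"undiscounted QALYs: {sum(qalys):.2f}")
print(f"discounted QALYs:   {discounted_qalys:.2f}")
```

The same function can be applied to a yearly cost stream, since the text applies one rate to both costs and QALYs.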

Table 1 Model input parameters.

Model structure

The analytic software program TreeAge Pro 2022 (TreeAge Software, Williamstown, MA, USA) was used to compare the lifetime cost-effectiveness of diagnosing lung cancer with and without AI. Specifically, we compared five comparators in all scenarios: (1) no screening, (2) CXR, (3) CXR + AI, (4) LDCT, and (5) LDCT + AI. Step (A) in Fig. 1 represents these five comparators. For each alternative option, we distributed the population into non-smokers, light-smokers, and heavy-smokers based on the Korean smoking status, as illustrated in step (B) (population distributions) of Fig. 1. Since lung cancer incidence varies by smoking status, we incorporated the hazard ratio for lung cancer incidence into the transition probabilities from step (B) (population distributions) to step (C) (detection & diagnosis sensitivity) in Fig. 1. Then we divided the simulation into detected and undetected cases based on sensitivity and specificity data, which correspond to step (C) (detection & diagnosis sensitivity). Besides the no-screening strategy, all other options followed the Korean national screening guidelines, which recommend biennial screening (every two years). Finally, step (D) (Markov model) in Fig. 1 represents the lifetime disease progression component with a one-year cycle length. The model includes the smoking characteristics of individuals, all-cause mortality, and disease-specific mortality, but it is unable to track patients’ previous health states due to the memoryless property inherent in the Markov assumption21. The selection of comparators, diagnostic accuracy parameters, and clinical prognosis assumptions were derived from previous studies22,23,24,25.
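To make the one-year cycle and the memoryless property concrete, the following Python sketch traces a cohort through a deliberately simplified three-state Markov model. The state names and all transition probabilities are hypothetical placeholders, not the inputs in Table 1, and the study's actual model (built in TreeAge Pro) distinguishes more states, including local, regional, and distant disease.

```python
STATES = ("healthy", "lung_cancer", "dead")

# Hypothetical annual transition probabilities (each row sums to 1).
# These are illustrative values only, NOT the study's parameters.
P = {
    "healthy":     {"healthy": 0.97, "lung_cancer": 0.01, "dead": 0.02},
    "lung_cancer": {"healthy": 0.00, "lung_cancer": 0.80, "dead": 0.20},
    "dead":        {"healthy": 0.00, "lung_cancer": 0.00, "dead": 1.00},
}


def step(dist):
    """Advance the cohort by one one-year cycle.

    The next distribution depends only on the current one, which is the
    memoryless (Markov) property: no earlier history is consulted.
    """
    nxt = {s: 0.0 for s in STATES}
    for src, share in dist.items():
        for dst, p in P[src].items():
            nxt[dst] += share * p
    return nxt


def run(dist, cycles):
    """Return the cohort trace: the starting distribution plus one entry per cycle."""
    trace = [dist]
    for _ in range(cycles):
        dist = step(dist)
        trace.append(dist)
    return trace


trace = run({"healthy": 1.0, "lung_cancer": 0.0, "dead": 0.0}, cycles=20)
for year in (0, 1, 10, 20):
    print(year, {s: round(v, 4) for s, v in trace[year].items()})
```

Costs and QALYs per cycle would then be obtained by weighting each state's share by its cost and utility.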

Fig. 1

Markov model.

Intervention and comparators

We compared the five strategies to identify the potential effects of AI-based detection. In the no screening group, the population is not exposed to regular lung cancer screening but is detected based on the current national probabilities once cancer has occurred. In the other four strategies, we hypothesized that regular cancer screening is conducted using CXR, CXR + AI, LDCT, and LDCT + AI options. The major differences between these comparators are the sensitivity and specificity of detecting lung cancer and its stages.

Input variables

The sensitivity and specificity values for CXR and LDCT were primarily obtained from large randomized trials and meta-analyses14,26. Performance values for AI-assisted CXR were derived from recent clinical validation studies of deep learning–based algorithms in chest radiography22,26. Hazard ratios for lung cancer incidence and mortality among light- and heavy-smokers were modeled relative to non-smokers, who served as the reference group. Utility weights for lung cancer stages were drawn from a previously published study30. Utility was defined as a health-related quality-of-life (HRQOL) weight ranging from 0 (death) to 1 (perfect health). In the absence of prior data, the utility for the disease-free (healthy) state was assumed to be 0.9, while the utility for death was set to 0. A full list of data sources is provided in Table 1. For the AI-assisted strategies, diagnostic performance estimates were derived from published studies specific to each modality. The CXR + AI strategy was modeled as AI acting independently, based on the study by Nam et al.14, which evaluated a deep learning algorithm for lung cancer detection without radiologist input, reflecting initiatives to implement AI-based CXR as a scalable first-line screening tool. In contrast, the LDCT + AI strategy assumed that radiologists interpreted CT scans with AI assistance as a confirmatory step, consistent with current clinical practice, regulatory frameworks in Korea, and evidence from a systematic review by Wang et al.26. The sensitivity parameter was highest for the LDCT + AI option, followed by the LDCT strategy, while the specificity parameter was highest for the CXR + AI option. Based on these sensitivity and specificity parameters, individuals with lung cancer were divided into detected (true-positive) and undetected (false-negative) groups in the simulation model.
False-negative cases were explicitly incorporated and assumed to remain undiagnosed, progressing naturally through the Markov process without the benefit of early detection. The proportions of detected and undetected cases were determined from the diagnostic performance values summarized in Table 1. Detailed cost data for lung cancer patients in Gyeonggi province were obtained from the National Cancer Center. Age-specific annual medical cost data, including the initial year (first year after diagnosis) and incremental (annual follow-up) costs, were used for local, regional, and distant stages. All costs were converted to 2022 USD using an exchange rate of 1 USD = 1265 KRW. “Additional costs” refer to non-medical expenditures, including LDCT, CXR, and AI utilization and maintenance fees. Background mortality was incorporated into the Markov model based on age-specific life tables. Detailed mortality rates applied to each cycle are provided in Supplementary Table 2.
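The split of lung cancer cases into detected and undetected groups can be sketched in Python as follows. The sensitivities are the values quoted in the Discussion of this paper (CXR 0.470, CXR + AI 0.696, LDCT 0.810); the count of 100 cancer cases and the function name are hypothetical and serve only to show the arithmetic.

```python
# Sensitivities as quoted in the Discussion of this paper.
SENSITIVITY = {"CXR": 0.470, "CXR + AI": 0.696, "LDCT": 0.810}


def split_cases(n_cancer: float, sensitivity: float):
    """Split true cancer cases into detected (TP) and undetected (FN)."""
    detected = n_cancer * sensitivity
    undetected = n_cancer - detected
    return detected, undetected


N_CANCER = 100  # hypothetical number of lung cancer cases in a cohort

results = {name: split_cases(N_CANCER, s) for name, s in SENSITIVITY.items()}
for name, (tp, fn) in results.items():
    print(f"{name:9s} detected={tp:5.1f} undetected={fn:5.1f}")
```

In the model described above, the undetected group would continue through the Markov process without the benefit of early detection.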

Statistical analyses

In the base-case analysis, we evaluated the cost-effectiveness of each alternative using the ACER and ICER, applying an annual discount rate of 4.5%. ACER is calculated by dividing the total cost of a given strategy by its effectiveness, while ICER is determined by dividing the difference in costs between two strategies by the difference in their effectiveness (QALY). If the ICER is lower than the WTP threshold, the alternative is considered cost-effective. To address potential inaccuracies at the beginning or end of each cycle, a half-cycle correction was applied to both QALYs and costs. Sensitivity analysis is essential to reduce uncertainty inherent in economic evaluations. We conducted a deterministic sensitivity analysis (DSA) using a range of 90% to 110% for each parameter and a probabilistic sensitivity analysis (PSA) using distributions detailed in Table 1. The DSA results highlighted the ICER variations based on single parameter changes, allowing us to pinpoint the most sensitive input variable. Additionally, we performed a Monte Carlo simulation with 10,000 iterations for the PSA, which randomly sampled values to determine the percentage of scenarios in which different strategies were optimal. The results were presented using incremental cost-effectiveness (ICE) scatterplots and cost-effectiveness acceptability curves (CEAC), which visually represent the findings.
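The ACER and ICER definitions above, together with one common form of half-cycle correction, can be written as a short Python sketch. The helper names and the example cost and QALY figures are hypothetical, and the trapezoidal correction shown here is an assumption about the general technique; the exact correction applied by TreeAge Pro in this study may differ. Note also that comparing the ICER with the WTP threshold is only meaningful when the new strategy is more effective than the reference.

```python
WTP_PER_QALY = 32409.9  # from the text: USD per QALY gained


def acer(cost: float, qalys: float) -> float:
    """Average cost-effectiveness ratio: total cost per QALY of one strategy."""
    return cost / qalys


def icer(cost_new: float, qaly_new: float, cost_ref: float, qaly_ref: float) -> float:
    """Incremental cost-effectiveness ratio of a new strategy versus a reference."""
    d_qaly = qaly_new - qaly_ref
    if d_qaly == 0:
        raise ValueError("ICER is undefined when incremental QALYs are zero")
    return (cost_new - cost_ref) / d_qaly


def half_cycle_corrected(values) -> float:
    """Trapezoidal sum over cycles: first and last entries get half weight."""
    if len(values) < 2:
        return float(sum(values))
    return sum(values) - 0.5 * (values[0] + values[-1])


def is_cost_effective(icer_value: float, wtp: float = WTP_PER_QALY) -> bool:
    """Decision rule from the text: cost-effective if the ICER is below the WTP."""
    return icer_value < wtp


# Hypothetical totals per person, for illustration only.
value = icer(cost_new=1150.0, qaly_new=10.015, cost_ref=1000.0, qaly_ref=10.0)
print(f"ICER = {value:,.0f} USD per QALY; cost-effective: {is_cost_effective(value)}")
```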

Results

Base-case analysis

Table 2 describes the overall results of simulating Scenarios 1 to 4, and Fig. 2 displays the cost and effectiveness of each strategy. Although Table 2 presents ICER-based comparisons, we also report ACER values to describe overall cost-effectiveness relative to no screening. Compared to the no screening option, using CXR + AI in diagnosing lung cancer showed ACERs of $12,927 per QALY gained in Scenario 1, $15,601 per QALY gained in Scenario 2, $13,190 per QALY gained in Scenario 3, and $9950 per QALY gained in Scenario 4. When comparing incremental changes between strategies, the ICERs for CXR + AI compared to the CXR strategy were $9491, $10,030, $9552, and $8679 per QALY gained in Scenarios 1 to 4, respectively. Using LDCT + AI instead of CXR + AI resulted in an ICER of $599,724 per QALY in Scenario 1, far exceeding the commonly accepted WTP threshold of $32,410 per QALY gained in Korea.

Table 2 Results of the base case analysis: Scenario 1 to 4.
Fig. 2

Results of the cost-effectiveness analysis for Scenario 1: 54–74 years old.

Sensitivity analysis

Deterministic sensitivity analysis (DSA) was conducted to identify the most sensitive input parameters by varying each parameter value from the base-case analysis. Figure 3 presents a tornado diagram comparing CXR + AI to the no screening option for Scenario 1, while Supplementary Figs. 1 to 3 display the results for Scenarios 2 to 4. In most scenarios, the start age of lung cancer screening was the parameter with the greatest influence on the ICER; ICER values tended to be more favorable for older populations in all scenarios. The initial incidence of lung cancer by stage (regional, distant) was the next most influential variable. In the Monte Carlo simulation, we assessed the combined effect of the input parameters by sampling their values at random. Figure 4 presents the incremental cost-effectiveness (ICE) scatter plot of 10,000 Monte Carlo iterations comparing all five strategies simultaneously. The CXR + AI strategy demonstrated the highest probability of being the most cost-effective option among all alternatives, with a 91.8% probability of being optimal in Scenario 1. The CXR + AI alternative was the optimal cost-effective strategy with probabilities of 83.9%, 81.8%, and 76.2% in Scenarios 2 to 4, respectively (Supplementary Figs. 4 to 6). In Scenario 1, the probability of CXR + AI being cost-effective remained above 50% for WTP thresholds up to $140,000, as shown in the CEAC graph (Fig. 5).
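How a probabilistic sensitivity analysis yields points on a cost-effectiveness acceptability curve can be sketched in Python as below. This is a simplified pairwise illustration, not the study's five-strategy analysis: the normal distributions, their means and standard deviations, and the function names are all hypothetical, whereas the study sampled from the distributions listed in Table 1. A draw counts as cost-effective when its net monetary benefit, WTP × incremental QALYs minus incremental cost, is positive.

```python
import random


def simulate_increments(n: int, seed: int = 0):
    """Draw (incremental cost, incremental QALY) pairs for one comparison.

    Both distributions are hypothetical placeholders for illustration.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        d_cost = rng.gauss(150.0, 40.0)    # hypothetical incremental cost (USD)
        d_qaly = rng.gauss(0.015, 0.005)   # hypothetical incremental QALYs
        draws.append((d_cost, d_qaly))
    return draws


def prob_cost_effective(draws, wtp: float) -> float:
    """Share of draws with positive net monetary benefit at this WTP."""
    wins = sum(1 for d_cost, d_qaly in draws if wtp * d_qaly - d_cost > 0)
    return wins / len(draws)


draws = simulate_increments(10_000)  # 10,000 iterations, as in the text

# One CEAC point per willingness-to-pay value.
ceac = {wtp: prob_cost_effective(draws, wtp)
        for wtp in (0.0, 10_000.0, 32_409.9, 100_000.0)}
for wtp, p in ceac.items():
    print(f"WTP {wtp:>9,.1f}: P(cost-effective) = {p:.3f}")
```

With more than two strategies, the analogous step is to compute the net monetary benefit of every strategy in each draw and record which one is highest.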

Fig. 3

Tornado diagram for Scenario 1: 54–74 years old. Abbreviations: pI_initial_2regional_all, initial population of lung cancer (regional); pI_initial_3distant_all, initial population of lung cancer (distant); u_0healthy, utility for healthy state; u_1local, utility for lung cancer (local); pA_TP_3CXR_AI, sensitivity of CXR + AI; u_4diseasefree, utility after treated; pI_initial_1local_all, initial population of lung cancer (local); u_3distant, utility for lung cancer (distant); cAdditional_CXR, cost for CXR; u_2regional, utility for lung cancer (regional); pM_recurrence, probability of lung cancer recurrence; p_detection, probability of detecting lung cancer naturally (assumed); pA_TN_3CXR_AI, specificity of CXR + AI; pS_cure_2regional_all, probability of being treated from lung cancer (regional); pS_cure_1local_all, probability of being treated from lung cancer (local); pS_cure_3distant_all, probability of being treated from lung cancer (distant); cAdditional_AI, cost for AI software; cAdditional_LDCT, cost for LDCT; pA_TP_4LDCT, sensitivity of LDCT; pA_TP_2CXR, sensitivity of CXR; pA_TP_5LDCT_AI, sensitivity of LDCT + AI; pA_TN_4LDCT, specificity of LDCT; pA_TN_2CXR, specificity of CXR; pA_TN_5LDCT_AI, specificity of LDCT + AI; u_5undetected, utility for undetected lung cancer.

Fig. 4

Incremental cost-effectiveness (ICE) scatterplot for Scenario 1: 54–74 years old.

Fig. 5

Cost-effectiveness acceptability curve (CEAC) for Scenario 1.

Discussion

While AI has been applied in diagnosis and therapeutic decision-making for lung cancer screening, there is limited evidence regarding its cost-effectiveness. To the best of our knowledge, this is the first economic evaluation of the cost-effectiveness of an AI-based CAD–integrated PACS diagnostic aid as a medical device compared with standard care from a healthcare system perspective. Our model distinguished between AI acting independently and AI assisting radiologists. This distinction is important, as standalone AI may enhance scalability and reduce workforce demand, whereas collaborative human–AI interpretation has been shown to achieve higher diagnostic accuracy, particularly for complex modalities such as LDCT. These differences in integration approach may influence not only clinical outcomes but also the cost-effectiveness of screening programs. The results showed that the AI-based CAD–PACS aid yielded an incremental health gain of 0.015 QALY per patient, with an incremental cost-effectiveness ratio of $9491 per QALY. A Markov microsimulation model was developed to make the best use of data from important randomized controlled trials and observational studies, while also accounting for the diversity and variability of the health checkup population. The findings indicated that, for the CXR option, lung cancer screening using an AI-based digital device led to a modest increase in healthcare costs while providing improved health benefits. As a result, the AI-based digital device was deemed highly cost-effective for lung cancer screening, based on a willingness-to-pay threshold of $32,409.9 per QALY gained. In contrast, LDCT showed higher costs but limited incremental benefits, which may explain why the LDCT strategy appeared more expensive yet slightly more effective than LDCT + AI. This difference is primarily attributed to random variation within the Markov simulation rather than a true clinical advantage.
Furthermore, CXR-based screening demonstrated greater cost-effectiveness than LDCT, largely due to the lower screening cost and the relatively low lung cancer incidence in the Korean population, which reduces the marginal QALY gain achievable by more intensive LDCT screening. The benefit of adding AI was also minimal for LDCT, since LDCT already has very high baseline sensitivity, leaving limited room for measurable improvement. Consequently, the incremental cost of AI-assisted LDCT outweighed its marginal clinical gain. Notably, although the baseline sensitivity of CXR alone (0.470) was nearly half that of LDCT (0.810), AI integration improved the sensitivity of CXR to 0.696, substantially narrowing the diagnostic gap. This improvement, combined with the lower screening and follow-up costs of chest radiography, resulted in favorable cost-effectiveness for the CXR + AI strategy, even with lower absolute sensitivity. While subgroup differences may exist depending on age or smoking history, this finding highlights that moderate diagnostic enhancement through AI can lead to substantial economic benefits when applied to a general population. The specificity of the CXR + AI model, however, was derived from a single study and may have been slightly overestimated, which we acknowledge as a limitation.

ICERs exhibited extreme sensitivity to the assumptions made regarding the lung cancer mortality benefit associated with screening, both during and after the active screening phase, as observed in trials. Variations in this parameter led to the widest range of ICER values in one-way sensitivity analyses, indicating that the cost-effectiveness of lung cancer screening in Korea relies heavily on achieving a mortality benefit that is at least equivalent to that observed in the trials. At an indicative willingness-to-pay threshold of approximately $30,000 in Korea, 76.2–91.8% of simulations in the probabilistic sensitivity analysis resulted in ICERs that could be considered cost-effective.

This evaluation is the first to integrate Korean trends in lung cancer screening rates, applying them to a general population and screening prevalence model. This model was used to estimate both the number of Koreans currently undergoing screening and those who are not, while also accounting for the competing risks of lung cancer-related and all-cause mortality. We also incorporated updated, comprehensive health-system costs associated with lung cancer, estimated in a large population-based cohort study linked to routinely collected, administrative health databases. Preliminary cost estimates31 and ours indicate that the overall healthcare costs for treating distant lung cancer have nearly doubled. It is noteworthy that systemic therapy costs contribute only a portion of the overall expenses. Screening can potentially mitigate the higher costs and lower survival rates associated with later-stage disease, thereby potentially enhancing cost-effectiveness. Our findings also suggest that variations in the cost of LDCT scanning substantially influence the ICER, highlighting the importance of cost management in large-scale screening programs. The cost of an LDCT scan could potentially be reduced in a large-scale screening program. Our threshold analysis demonstrated that lung cancer screening would be deemed cost-effective in our base case if the scan price was set at $128, considering a willingness-to-pay threshold of $32,409.9 per QALY gained.

High false-positive rates caused by benign intrapulmonary lymph nodes or non-calcified granulomas, overdiagnosis, and the potential risk of radiation-induced cancer from prolonged exposure are notable concerns associated with LDCT in lung cancer screening32. These challenges persist as the most critical obstacles in the application of LDCT for this purpose33. Although CAD techniques demonstrate high sensitivity in detecting lung cancer nodules, they often come with comparatively low specificity34. Implementation of CAD systems in clinics for lung cancer screening is recommended. Studies have indicated that false-positive results in lung cancer screening decrease with each millimeter increase in the threshold nodule size35. Trial data from NLST, PLCO, and others have shown that annual lung cancer screening reduces lung cancer mortality by 11–21%, while biennial screening reduces it by only 6.5–9.6%. Triennial screening has limited effectiveness in reducing lung cancer mortality. Furthermore, more frequent LDCT screening leads to increased false-positive results36.

Conventional screening chest radiography has failed to yield positive results in several studies37. However, the integration of digital chest radiography along with computer-aided diagnostic techniques and highly quantum-efficient detectors38,39, as well as AI-based detection algorithms shows promise in improving the visualization of pulmonary structures and enhancing detection accuracy. Our study demonstrated that an AI-based lung cancer screening approach proved to be more cost-effective and sensitive compared to traditional methods. While lung cancer screening with chest radiography presents challenges such as the occasional oversight of lung cancer lesions by radiologists40, specialized training for interpreting chest radiographs in lung cancer screening settings can be advantageous41. Additionally, relying solely on AI readings can be beneficial, especially in areas where there is a lack of radiologists. This approach is not only cost-effective but may also contribute to reducing mortality. Digitalized chest radiography is readily accessible and cost-effective, offering minimal radiation exposure to participants. While LDCT boasts higher sensitivity in lung cancer screening for detecting small nodules, the lack of financial resources presents obstacles to implementing lung cancer screening using LDCT.

Our study has several limitations. Firstly, the current Korean national lung cancer screening program targets individuals aged 54–74 years with a smoking history of 30 pack-years or more, whereas our model simulated a broader population including non-smokers, light-smokers, and heavy-smokers to evaluate population-wide cost-effectiveness. This difference in the target population may limit the direct comparability of our results with the existing national program, although it provides meaningful insights into potential outcomes if eligibility criteria were expanded in the future. Secondly, we did not account for the effects of smoking cessation among patients detected in the screening program due to a lack of available information. It is noteworthy that mortality rates double when patients fail to quit smoking after the early detection of lung cancer. Considering that additional benefits of smoking cessation have been documented in various cost-effectiveness studies of lung cancer screening42, the inclusion of smoking cessation effects should be considered in future research. Thirdly, uncertainty in the model input parameters introduces ambiguity into the study’s conclusions. However, by employing sensitivity analyses, particularly probabilistic sensitivity analysis, we demonstrated that the outcomes remained largely consistent with our primary findings. Specifically, we confirmed that increasing screening rates across various age ranges is cost-effective, even when parameter values were adjusted within reasonable bounds. Furthermore, uncertainties are compounded by the possibility of negotiations influencing unit prices, which contribute to the overall expenses of the digital AI-based platform, particularly if the intervention were to be introduced.
Fourthly, our analysis was conducted from a healthcare system perspective; however, it is crucial to recognize that lung cancer imposes substantial health-related and economic burdens not only on individual patients but also on society as a whole. Anticipated increases in broader societal costs in the coming years are largely attributed to shifts in the demographic composition of the population, leading to an aging society. In addition, the performance of AI models may vary depending on the dataset and training process, which could limit generalizability across populations. Our model incorporated estimates from peer-reviewed studies to reflect realistic performance levels in clinical settings. Importantly, the economic value of AI-assisted screening is highly sensitive to diagnostic accuracy: greater sensitivity enhances early detection benefits, whereas reduced specificity could increase downstream costs from false positives. Therefore, robust external validation of AI models is critical to ensure both clinical and economic applicability. Moreover, the specificity value for the CXR + AI strategy was derived from a single validation study, which may overestimate its diagnostic performance compared to results reported in broader multicenter analyses. This limitation highlights the need for additional large-scale evidence to confirm the reproducibility of CXR-based AI performance across diverse populations.

Conclusion

The study’s findings suggest that AI-based computer-aided detection for lung cancer screening has significant potential to improve outcomes and to be cost-effective on a large scale. This highlights the importance of integrating such detection tools into routine medical practice to improve the lung cancer screening process.