Introduction

Positron emission tomography (PET) myocardial perfusion imaging (MPI), especially when combined with computed tomography (CT), plays a pivotal role in assessing patients with suspected coronary artery disease (CAD). The wealth of data provided by cardiac PET/CT includes: (1) perfusion imaging, which depicts the extent of ischemia; (2) measurements of absolute myocardial blood flow (MBF), quantifying the volume of blood flow per unit time per myocardial mass (typically mL/min/g) and enhancing diagnostic accuracy beyond perfusion imaging alone1,2,3,4,5; (3) functional information; and (4) coronary artery calcium (CAC), a crucial indicator of atherosclerotic burden, all of which offer significant insights for risk stratification6. The latter can be derived automatically from CT attenuation correction (CTAC) scans, which are routinely acquired as part of MPI exams on modern PET/CT systems2,7.

However, currently, the assessment of CAD with PET/CT MPI does not optimally leverage the combined diagnostic power of these various imaging markers, including CAC3,4,8,9. Recognizing this gap, we developed an artificial intelligence (AI) model that integrates a parsimonious set of 10 common PET MPI parameters—features often used intuitively but not systematically combined in clinical practice. While clinicians assess these parameters during interpretation, synthesizing all relevant data into a single interpretation can be challenging. In fact, there remains debate regarding the best methods to integrate just three features (CAC, relative perfusion, and blood flow)10, highlighting the need for an objective, automated approach.

Our AI approach harnesses deep learning and machine learning to comprehensively integrate and analyze key PET–CT parameters—including calcium burden, perfusion, MBF, and functional metrics—enabling a more robust assessment of CAD. We based our model on standard PET imaging measurements rather than raw images to ensure generalizability, interpretability, and transparency—critical factors for clinical adoption. Furthermore, we rigorously validated our model on a large external testing cohort from three sites to confirm its robustness. To our knowledge, this work represents the first multicenter, externally validated, AI-driven cardiac PET MPI analysis.

Results

Study population

Patient characteristics are shown in Table 1. Among the 1664 participants, 1085 (65%) were male, and the median age was 68 years (interquartile range [IQR] 61–75). Of these patients, 386 were allocated for model training and optimization, while 1278 from separate institutions were used for external testing. The training data were complete, containing no missing values, whereas the percentages of missing values in the testing set were all ≤1% (Supplementary Table 1). For univariate comparisons in the external testing set, missing values of total perfusion deficit (TPD) were imputed from the corresponding summed scores as score/68 × 100%, and missing values of flows and CAC were imputed with their medians. The prevalence of CAD was significantly higher in the internal training set than in the external testing set (61% vs. 53%, p = 0.004).
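The imputation rules described above can be illustrated with a minimal Python sketch. It is not the study code: the function names are hypothetical, missing values are assumed to be encoded as None, and the denominator of 68 is taken from the text (it presumably corresponds to the maximum summed score of the 17-segment model with segments scored 0 to 4).

```python
from statistics import median

MAX_SUMMED_SCORE = 68  # from the text; presumably 17 segments x maximum score of 4


def tpd_from_summed_score(summed_score):
    """Express a summed score as a percentage, used as a stand-in for TPD."""
    return summed_score / MAX_SUMMED_SCORE * 100.0


def impute_tpd(tpd_values, summed_scores):
    """Replace missing TPD values (None) with score/68 x 100%."""
    return [
        tpd_from_summed_score(score) if tpd is None else tpd
        for tpd, score in zip(tpd_values, summed_scores)
    ]


def impute_median(values):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]
```

For instance, a missing stress TPD paired with a summed stress score of 17 would be imputed as 25%.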

Table 1 Patient characteristics

Myocardial imaging analysis parameters

In both the internal training and external testing sets, patients with CAD demonstrated significantly higher median ischemic TPD, stress TPD, and transient ischemic dilation ratio (TID) compared to those without CAD (p < 0.001 for all in the testing set; p < 0.001 for TPDs and p = 0.022 for TID in the training set) (Table 2). Additionally, patients with CAD had notably lower stress MBF (p < 0.001), myocardial flow reserve (MFR) (p < 0.001), and median stress ejection fraction (p = 0.011 in training, p < 0.001 in testing).

Table 2 Parameters included in the AI model from the training set (n = 386) and external testing set (n = 1278)

Coronary artery calcium

Of the internal training cohort (n = 386), CAC was 0 in 58 (15.0%) patients, >0–100 in 58 (15.0%), >100–400 in 79 (20.5%), and >400 in 191 (49.5%). In the external testing set (n = 1278), CAC was 0 in 288 (22.5%) patients, >0–100 in 231 (18.1%), >100–400 in 264 (20.7%), and >400 in 495 (38.7%) (Table 1).
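As an illustration of the four CAC categories used above, the following sketch bins a CAC score and tabulates category percentages. The function names are hypothetical and the code is only meant to make the bin boundaries explicit.

```python
from collections import Counter


def cac_category(score):
    """Map a CAC score to one of the four categories used in the text."""
    if score < 0:
        raise ValueError("CAC score cannot be negative")
    if score == 0:
        return "0"
    if score <= 100:
        return ">0-100"
    if score <= 400:
        return ">100-400"
    return ">400"


def category_percentages(scores):
    """Return the percentage of patients in each CAC category."""
    counts = Counter(cac_category(s) for s in scores)
    return {name: 100.0 * n / len(scores) for name, n in counts.items()}
```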

Model performance

In external testing, we compared the AI model with clinical measurements (Fig. 1). The comprehensive machine learning model (area under the receiver operating characteristic curve [AUC] 0.83 [0.81–0.85]) outperformed the clinical score from experienced physicians (0.80 [0.77–0.82], p = 0.02), ischemic TPD (0.79 [0.76–0.81], p < 0.001), MFR (0.75 [0.72–0.78], p < 0.001), stress MBF (0.75 [0.72–0.77], p < 0.001), and CAC (0.69 [0.66–0.72], p < 0.001) (Fig. 2a, Supplementary Table 2). Figure 2b shows that the AI model achieved significantly higher sensitivity (p < 0.001) than quantitative thresholds, including a minimum 3-vessel MFR < 2, stress MBF < 1.8, and ischemic TPD ≥ 5%, when matched for specificity in the external testing set. The AI thresholds and diagnostic accuracies are reported in Supplementary Table 3. This improvement was driven by the model’s ability to identify more high-risk patients (Duke categories 6 and 5) than the quantitative measurements at matched specificity (Supplementary Table 4). Supplementary Table 5 reports performance stratified by site and isotope, further highlighting the generalizability of the model. This consistency across tracers is likely because we utilized tracer-specific normal limits for perfusion quantification and dedicated kinetic models for flow quantification, which ensured compatible quantitative values across the two tracers. We also compared the Extreme Gradient Boosting (XGBoost) model with a conventional logistic regression model (Supplementary Table 6), obtaining similar results. Calibration performance is presented in Supplementary Fig. 1.

Fig. 1: Central illustration.

Artificial intelligence (AI) model integrating fully automated PET flow, perfusion quantification, and deep learning (DL)-based coronary artery calcium in all patients undergoing myocardial perfusion imaging positron emission tomography/computed tomography (PET/CT). Receiver-operating characteristics curve for coronary artery disease (CAD) diagnosis and area under the receiver-operating characteristic curve values of the AI model vs. clinical score, ischemic total perfusion deficit (TPD), myocardial flow reserve (MFR), and coronary calcium in external validation assessment (n = 1278). Clinical score was defined as the summed difference score from the 17-segment AHA model, reflecting standardized myocardial perfusion abnormalities. Created in BioRender. Zhang, W. (2025) https://BioRender.com/y27z326. LSTM long short-term memory network, XGBoost extreme gradient boosting.

Fig. 2: AI model performance.

a Receiver-operating characteristics curve for coronary artery disease (CAD) and areas under the curve (AUC) values. The AI imaging model incorporates sex, coronary artery calcium score, rest and stress left ventricular ejection fraction (LVEF), rate pressure product (RPP), myocardial blood flow (MBF), myocardial flow reserve (MFR), and perfusion measurements derived from PET imaging. b Sensitivity analysis comparing AI and quantitative methods for predicting CAD. Performance was assessed on the external validation set, with specificity matched between AI predictions and quantitative analysis. MFR and MBF were selected from the minimum of the 3 vessels. Notably, the AI model shows significant improvement over MFR, MBF, and iTPD in terms of sensitivity. AI artificial intelligence, TID transient ischemic dilation, iTPD ischemic total perfusion deficit.

According to Shapley additive explanations (SHAP) values, the top features driving the prediction were ischemic and stress TPD, CAC, and MFR (Supplementary Fig. 2). In a predefined subgroup analysis, the AI model exhibited comparable performance across various subgroups: female and male patients (AUC 0.83 vs. 0.83, p = 0.65), patients classified as non-obese (body mass index [BMI] < 30 kg/m2) and obese (BMI ≥ 30 kg/m2) (AUC 0.84 vs. 0.81, p = 0.10), and older (age ≥ 65 years) and younger (age < 65 years) (AUC 0.84 vs. 0.81, p = 0.36) (Supplementary Fig. 3). Additionally, the AI model was evaluated for its performance among patients not referred to 180-day invasive coronary angiography (ICA), with outcomes assessed against major adverse cardiovascular events up to 5 years (Supplementary Fig. 4).

Individualized inference by the AI model

In Fig. 3, two cases illustrate the ability of the AI model to integrate multiple PET MPI parameters for CAD diagnosis. Figure 3a demonstrates how the model accurately identifies obstructive CAD, emphasizing MFR as a key driver. Figure 3b highlights the capability of the model to correctly rule out CAD despite borderline TPD and clinician-assigned stress scores. These examples demonstrate the model’s effectiveness in leveraging integrated biomarkers to align predictions with patient outcomes.

Fig. 3: Examples of PET/CT imaging in predicted CAD patients.

Examples of patients undergoing PET/CT myocardial perfusion imaging predicted to have CAD. Bar plots demonstrate the top 5 parameters with the highest influence on per-patient prediction of CAD. a Case where patient has abnormal MFR and stress MBF, borderline stress TPD, zero CAC, left main ≥50% and ≥70% in other three arteries (99% in LAD, 70% in distal LCX, and 70% in distal RCA); AI model estimated the likelihood of CAD to be high in agreement with the clinical score. b Case where patient has normal MFR and stress MBF, borderline ischemic and stress TPD, zero CAC, no actual CAD (20% stenosis in mid RCA; no stenosis elsewhere). The clinical score indicated an abnormal scan. The AI model correctly estimated the likelihood of CAD to be low, aligning with the patient’s actual condition. While iTPD for the patient is borderline, the presence of other factors, such as normal blood flow, reduces the probability of the disease. BMI body mass index, CAC coronary calcium score, CAD coronary artery disease, CT computed tomography, LCX left circumflex artery, LAD left anterior descending, LM left main, LVEF left ventricular ejection fraction, MBF myocardial blood flow, MFR myocardial flow reserve, PET positron emission tomography, RPP rate pressure product, TID transient ischemic dilation, TPD total perfusion deficit, VD vessel disease.

Discussion

PET/CT, with its unique ability to quantify absolute MBF, allows robust detection of obstructive coronary stenoses across the spectrum of CAD, including diffuse atherosclerosis, microvascular disease, or multivessel disease. Consequently, PET is the fastest-growing non-invasive cardiac imaging test11. However, to take full advantage of PET/CT, it is necessary to combine complex data, which is currently not performed systematically.

In this study, we developed, tested, and externally validated an interpretable AI model that integrates key quantitative PET MPI parameters—including stress/rest perfusion, myocardial flow, ejection fraction, AI-derived calcium scores, and sex—into a unified assessment. In multicenter external validation, our AI approach outperformed both expert physicians (provided with complete imaging/clinical data during assessment) and conventional quantitative methods. The AI model demonstrated improved sensitivity and superior detection of high-risk patients, highlighting its strong generalizability despite inter-center variability. To avoid the perception of AI as a ‘black box’, we have provided both patient-level and model-level explanations by highlighting the most influential factors in predicting significant CAD, enhancing physician understanding and confidence in AI-driven predictions. Collectively, these characteristics underscore the tremendous potential of our AI model for a real-world, practical approach to diagnosis, incorporating all major factors currently considered by physicians, including calcium derived by deep learning from CT attenuation maps.

To date, several single-site studies have investigated the benefit of integrating subsets of PET/CT MPI parameters3,4,9,12,13,14,15,16. Initial studies focused on optimizing risk stratification by combining MBF and perfusion data3,12,16. Gupta et al. established that MFR serves as a stronger predictor of cardiac mortality than maximal MBF in patients with stable CAD, with integrated assessment of MFR and MBF identifying distinct prognostic risk profiles, particularly highlighting MFR impairment as a key mortality indicator9. Gould et al. proposed the coronary flow capacity measure, which integrates regional stress MBF and MFR into one variable14. Using data from a single center, Singh et al. developed an explainable deep learning model for all-cause mortality prediction, integrating polar maps of stress and rest perfusion, MBF, MFR, and spill-over fraction combined with cardiac volumes, singular indices, and sex15. These prior studies have consistently demonstrated that combining multiparametric PET data improves risk stratification, with more comprehensive models outperforming approaches relying on only two or three variables.

Few studies have attempted to improve the diagnostic accuracy of cardiac PET/CT through such an integrative approach3,6,15. Poitrasson-Rivière et al. demonstrated that combining MFR and relative perfusion improves the detection of obstructive CAD4. Beyond flow and perfusion measures, assessments of the coronary atherosclerotic burden can provide additional information. Zampella et al. demonstrated that combining CAC, MBF, and MFR provides incremental information about the presence of CAD17. Similarly, Brodov et al. showed that integrating per-vessel ischemic TPD with CAC improves CAD detection13. While previous studies combining PET and CAC data relied on dedicated ECG-gated CT and manual expert analysis, in this study, we obtained CAC data automatically from PET CTAC. By leveraging CTAC for disease burden assessments, our approach can be employed for all PET MPI studies rather than being limited to exams that include a dedicated CAC scan. Our study is the first AI-empowered multicenter PET analysis with rigorous external validation. While previous efforts relied on single-center data and hence employed prospective validation, we developed our model using data from one site and tested its performance on unseen data from three independent high-volume PET centers. This provides a critical evaluation of the external generalizability of our approach, which is further strengthened by the utilization of PET imaging measurements that vary less between institutions compared to actual PET images18.

In our study, XGBoost was compared against conventional logistic regression. Both models performed similarly in terms of discrimination and calibration, consistent with prior evidence that machine learning often does not outperform well-specified regression models when sample size and predictors are limited19. Nonetheless, XGBoost offers advantages that may become more relevant in larger cohorts or with richer feature sets20,21,22.

Traditional clinical studies combining multiple PET parameters required separate threshold values for key parameters and relied on conventional statistics3,12. In contrast, our AI model integrates multidimensional data as continuous variables, delivering a probability-based assessment of obstructive CAD without relying on arbitrary thresholds. The results presented in this study suggest that a holistic AI-based approach improves overall diagnostic sensitivity and enables more accurate identification of high-risk patients who may be missed by individual quantitative metrics. Using SHAP values, the model highlights parameters influencing CAD predictions for individual patients (Fig. 3), enabling physicians to validate and understand case-specific diagnostic factors. This explainable AI model enhances clinical workflows by providing quantitative, automated, and objective insights, ultimately improving the efficiency of PET/CT MPI-based CAD prediction. By offering patient-specific diagnostic insights, it empowers physicians with comprehensive and actionable decision support at the point of care.

This study has a few limitations. While it was a retrospective study that only included four sites, it represents the largest PET/MPI and CAC study with invasive angiography data used as a gold standard. At this stage, the model is intended strictly for research use and is not approved for clinical deployment. Further external validation, ideally through prospective studies evaluating workflow integration and clinical impact, will be required before regulatory consideration and real-world translation. While most processing was automated, there was a quality control step of myocardial contours performed by an experienced technologist. Another limitation is the potential for referral bias, as only 10% of the overall PET MPI cohort underwent ICA within 180 days. These patients have higher rates of medical comorbidities and more abnormal perfusion findings compared to the overall PET MPI referral population. However, these selection criteria are unavoidable in cases in which the presence or absence of obstructive CAD needs to be confirmed by invasive testing23,24. Additionally, although invasive angiography serves as a traditional diagnostic endpoint, it is important to acknowledge the possibility of significant disease being present despite negative angiographic findings. For example, flow assessment by PET MPI can detect microvascular dysfunction and diffuse atherosclerosis without significant angiographic stenosis, an aspect that was not addressed in the current study. A further limitation is that our reference standard was based solely on anatomical stenosis thresholds (≥50% left main, ≥70% other vessels), as fractional flow reserve (FFR) and instantaneous wave-free ratio (iFR) measurements were not available in our cohort25. This approach may not fully capture the physiological significance of coronary lesions or account for microvascular dysfunction. Prior studies have demonstrated substantial discordance between anatomical severity and functional ischemia, underscoring the importance of future work incorporating functional indices26,27. Finally, race analysis was excluded from this study due to the limited availability of data on individuals from racial and ethnic minority groups, such as Black individuals.

This study introduces a novel AI model that integrates PET MPI parameters with deep learning-derived calcium scores, significantly enhancing CAD diagnosis. It outperforms experienced physicians, offers robust and interpretable assessments, and represents the first multicenter study with external validation for AI-driven cardiac PET MPI analysis.

Methods

Study population

In our multi-center study involving 17,348 patients undergoing cardiac PET/CT from the REgistry of Fast Myocardial Perfusion Imaging with NExt generation PET (REFINE PET), we retrospectively enrolled 1664 patients from four sites who had suspected CAD, underwent 82Rubidium or 13N-ammonia PET MPI, and ICA within 180 days from the PET/CT (Supplementary Fig. 5). CAD was defined as left main diameter stenosis ≥50% or ≥70% diameter stenosis in other epicardial arteries, based on clinical ICA readings and review of the images by an experienced physician. At each site, perfusion, quantitative flow, function, and CTAC data were interpreted by local readers, and reports were sent to the ordering physicians to guide decisions on whether ICA was appropriate. Patients with prior myocardial infarction, percutaneous coronary intervention (PCI), and coronary artery bypass graft (CABG) were excluded from the analysis (Table 1). For patients who underwent several exams within the study period, only the initial exam was considered. Data from a single site, comprising 386 patients, were used for model training and optimization, while data from three additional sites, totaling 1278 patients, were reserved for external testing. Clinical and demographic data were collected on the day of the MPI scan. All ICA referrals followed routine clinical practice, with decisions based on patients’ symptom status, past medical history, and PET perfusion/flow findings from clinical reports, particularly the severity of flow reduction and perfusion abnormality23.

Institutional review board (IRB) approval was obtained at each site, and the study complies with the Declaration of Helsinki. Sites either obtained written informed consent or a waiver of consent for the use of the de-identified data.

PET protocol

For all patients, same-day rest and pharmacologic stress 82Rubidium or 13N-ammonia PET MPI studies were conducted using a Biograph mCT 64 PET/CT scanner (Siemens Healthineers), a Discovery RX scanner (GE Healthcare), or a Discovery 710 scanner (GE Healthcare). A 6-min rest list-mode acquisition was started immediately before administering weight-based doses of 82Rubidium or 13N-ammonia. Pharmacologic stress was induced using regadenoson (n = 1489), adenosine (n = 131), dobutamine (n = 25), or dipyridamole (n = 19). Concurrent with the beginning of the injection, a 6-min stress imaging acquisition was initiated. Prior to each PET acquisition for rest and stress, a low-dose helical CT scan was performed for attenuation correction, as detailed previously28.

PET imaging and quantification

All PET imaging variables, including myocardial perfusion, blood flow, ejection fraction, and TID, were computed automatically in batch mode at the core laboratory with dedicated software (QPET, Cedars-Sinai Medical Center, Los Angeles, CA), ensuring standardized measurements between sites29,30,31. Rest and stress relative perfusion were quantified using TPD. Ischemic TPD was defined as stress TPD − rest TPD, and normal myocardial perfusion was defined as stress TPD < 5%32. Rest and stress MBF were measured using a 1-tissue compartment kinetic model for 82Rubidium PET and a 2-compartment model for 13N-ammonia PET29,30,31. MBF and the spillover fraction from the blood to the myocardium were determined via numeric optimization. Stress and rest flow values, expressed in units of mL/g/min, were computed locally at each pixel on the polar map. Minimal segmental stress MBF (stress MBF) was used in the training of the AI model33. Specifically, stress MBF and MFR were derived from the 17-segment AHA model using the lowest segmental value. The rate–pressure product (RPP) was derived by multiplying the heart rate (bpm) by the systolic blood pressure (mmHg) and was then used to normalize the rest MBF within the angiographic group using the formula (rest MBF × average RPP)/RPP. Heart rate and blood pressure used for calculating the RPP were obtained at rest. The average RPP value in the angiographic group was 8500 bpm mmHg34,35. MFR was computed in each segment as the ratio of stress MBF to rest MBF (rest flow adjusted by RPP) for the same segmental anatomy.
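The arithmetic in this subsection can be summarized in a short Python sketch. It follows the formulas stated in the text (ischemic TPD, RPP, RPP-normalized rest MBF, segmental MFR, and the lowest segmental value); the function names are hypothetical and the sketch does not reproduce the QPET software.

```python
AVERAGE_RPP = 8500.0  # average RPP in the angiographic group (bpm x mmHg), from the text


def ischemic_tpd(stress_tpd, rest_tpd):
    """Ischemic TPD is the difference between stress TPD and rest TPD."""
    return stress_tpd - rest_tpd


def rate_pressure_product(heart_rate_bpm, systolic_bp_mmhg):
    """RPP is resting heart rate multiplied by resting systolic blood pressure."""
    return heart_rate_bpm * systolic_bp_mmhg


def normalize_rest_mbf(rest_mbf, rpp, average_rpp=AVERAGE_RPP):
    """Adjust rest MBF as (rest MBF x average RPP) / RPP."""
    return rest_mbf * average_rpp / rpp


def segmental_mfr(stress_mbf, rest_mbf, rpp):
    """MFR per segment: stress MBF divided by RPP-adjusted rest MBF."""
    return [s / normalize_rest_mbf(r, rpp) for s, r in zip(stress_mbf, rest_mbf)]


def minimal_segmental_values(stress_mbf, rest_mbf, rpp):
    """Lowest segmental stress MBF and lowest segmental MFR."""
    return min(stress_mbf), min(segmental_mfr(stress_mbf, rest_mbf, rpp))
```

A patient whose resting RPP is twice the group average has the rest MBF halved before MFR is computed, so the resulting MFR is twice the unadjusted ratio.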

Clinical scoring

PET/CT scans were visually assessed during clinical reporting by experienced physicians at each site, with knowledge of all available data, including stress and rest perfusion imaging, gated functional data, myocardial flow reserve, all other quantitative information, CT images, and clinical information. The final visual assessment was performed by summed stress scores (SSS), summed rest scores (SRS), and summed difference score (SDS) using the 17-segment American Heart Association model. The SDS was used as the final clinical score due to its reliability and standardization in summarizing myocardial perfusion abnormalities.

AI coronary artery calcium scoring

We used our previously validated deep learning (DL) model for CAC segmentation and scoring2,36. This model was trained and internally validated on data from three centers, including a total of 9543 scans: 1827 electrocardiographically gated CAC scans and 7716 CTAC maps2. Using the established deep learning segmentation method, we automatically derived CAC scores from CTAC maps.

Classification model and internal model testing

XGBoost models (version 1.7.3), a leading machine learning approach, were used for CAD diagnosis20. Initially, 10-fold cross-validation was applied across the training dataset, where each fold allocated 90% of the data for model training and the remaining 10% for internal validation. To optimize model performance, hyper-parameter tuning was conducted within each fold through grid search, selecting the optimal hyper-parameter configuration from all tested combinations. In each 10-fold subset, an internal 10-fold cross-validation regimen further allocated 90% of the subset data for fitting and 10% for tuning, maximizing training data utilization and mitigating overfitting (Supplementary Fig. 6).
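The nested cross-validation scheme described above can be sketched as follows. This is a structural illustration in plain Python, not the study pipeline: `fit_and_score` is a hypothetical callback standing in for fitting an XGBoost model with a given hyper-parameter configuration and scoring it, and the shuffling seeds are arbitrary assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nested_cv(n_samples, grid, fit_and_score, k_outer=10, k_inner=10):
    """Return (best_params, held-out score) for each outer fold.

    fit_and_score(params, train_idx, eval_idx) -> float is a hypothetical
    callback that fits a model on train_idx and scores it on eval_idx.
    """
    results = []
    outer_folds = k_fold_indices(n_samples, k_outer)
    for i, outer_val in enumerate(outer_folds):
        # 90% of the data for training, 10% held out for internal validation
        outer_train = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        inner_folds = k_fold_indices(len(outer_train), k_inner, seed=i + 1)

        def inner_score(params):
            # Inner loop: 90% of the outer-training subset for fitting, 10% for tuning
            scores = []
            for m, tune_pos in enumerate(inner_folds):
                tune = [outer_train[p] for p in tune_pos]
                fit = [outer_train[p] for n, fold in enumerate(inner_folds)
                       if n != m for p in fold]
                scores.append(fit_and_score(params, fit, tune))
            return sum(scores) / len(scores)

        best = max(grid, key=inner_score)  # grid search over all tested combinations
        results.append((best, fit_and_score(best, outer_train, outer_val)))
    return results
```

Because tuning only ever sees the outer-training subset, the score on each held-out outer fold is not influenced by the hyper-parameter selection.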

For the subsequent model evaluation phase, an external testing set was used. This additional testing set, originating from three sites distinct from the training site, was employed to rigorously assess the predictive performance on unseen data, providing an unbiased evaluation. The final model tested was created using the optimal hyperparameters obtained through grid search, retrained on the entire training set, and then tested on this external testing set.

The AI model outputs a continuous likelihood score between 0 and 1, with a higher value indicating a higher probability of being diagnosed as CAD. For binary classification, likelihoods were converted into binary outcomes using thresholds calibrated to the specificity of an established clinical metric (such as ischemic TPD). This threshold ensures the AI model identifies a similar proportion of patients as “normal,” allowing for clinically meaningful comparison between AI predictions and traditional assessments.
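A minimal sketch of specificity-matched thresholding is given below. It assumes that a patient is called positive when the score is at or above the threshold and that labels are coded 1 for CAD and 0 for no CAD; the function names and the toy data in the usage lines are invented for illustration and are not study data.

```python
def specificity(scores, labels, threshold):
    """Fraction of patients without CAD whose score falls below the threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s < threshold for s in negatives) / len(negatives)


def sensitivity(scores, labels, threshold):
    """Fraction of patients with CAD whose score is at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def threshold_matching_specificity(scores, labels, target_specificity):
    """Smallest candidate threshold whose specificity reaches the target."""
    candidates = sorted(set(scores)) + [float("inf")]
    for t in candidates:
        if specificity(scores, labels, t) >= target_specificity:
            return t


# Toy example (invented values): match the AI threshold to ischemic TPD >= 5%
labels = [0, 0, 0, 1, 0, 1, 1]
ai_likelihood = [0.1, 0.2, 0.4, 0.6, 0.7, 0.9, 0.8]
ischemic_tpd_pct = [1, 2, 6, 7, 3, 9, 4]

target = specificity(ischemic_tpd_pct, labels, 5)
ai_threshold = threshold_matching_specificity(ai_likelihood, labels, target)
```

With both methods held at the same specificity, their sensitivities can then be compared directly.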

The diagnostic performance of the holistic AI model was evaluated using AUC and sensitivity analysis. Sensitivity was compared by adjusting prediction thresholds to match specificity across methods, then assessing the corresponding sensitivity. Comparisons focused on high-risk groups in the external testing set, defined by Duke category 6 (≥50% left main stenosis, ≥70% 3-vessel disease, or 2-vessel CAD involving proximal left anterior descending (LAD)) and Duke category 5 (2-vessel severe stenosis excluding proximal LAD, one-vessel severe stenosis in proximal LAD, or ≥50% 3-vessel moderate stenosis)37. Quantitative analysis includes minimum 3-vessel MFR < 212, minimum 3-vessel stress MBF < 1.838, and ischemic TPD > 5%13.

Additionally, subgroup analyses were conducted to gain further insights into the performance of the AI models. Subgroups were categorized based on sex (male, female), age (<65, ≥65 years), and BMI (<30 kg/m2, ≥30 kg/m2)39. Due to limited data for racial and ethnic minorities, the race-based subgroup analysis was excluded.

Models

Six models were used for the CAD diagnosis: (1) CAC derived from CTAC maps; (2) ischemic TPD; (3) stress MBF; (4) MFR; (5) summed difference score (clinical score) obtained during clinical reading by an experienced physician; and (6) the AI model, which uses PET/CT-derived data only, without additional clinical variables. The AI model incorporated CAC along with stress MBF, MFR, ischemic TPD, stress TPD, stress and rest left ventricular ejection fraction (LVEF), RPP, TID, and sex, resulting in a total of 10 image-derived features to mimic clinical practice (Table 2). The model is designed to be applicable regardless of age, BMI, or other demographic factors.

Model explainability

The predictive power of model variables was assessed using XGBoost feature importance, measured by information gain to quantify accuracy improvement from each feature20. Features were ranked by overall importance, with the most influential at the top. SHAP, a game-theory method, explained how features contributed to individual predictions40. In SHAP plots, dots represent the SHAP value of each patient for a feature, with positive values increasing predictions and negative values decreasing them. The x-axis distance from zero indicates the magnitude of influence.
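To make the idea behind per-patient SHAP values concrete, the sketch below computes exact Shapley values for a small model by enumerating feature coalitions. It is a didactic illustration only: features absent from a coalition are set to a baseline value (an assumption made here for simplicity), whereas the SHAP library uses dedicated, more efficient algorithms for tree ensembles. The function name is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input.

    Features outside a coalition are set to their baseline value, which is
    one simple way to define the value of a coalition.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

The contributions sum to the difference between the prediction for the patient and the prediction for the baseline, which is why positive values push the predicted likelihood up and negative values push it down. The enumeration grows exponentially with the number of features, so it is only practical for small feature sets.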

Statistical analysis

We assessed the distribution of data using the Shapiro–Wilk test. Categorical variables were reported as n (%), continuous variables with a normal distribution as mean ± standard deviation (SD), and continuous variables with a non-normal distribution as median with IQR [Q1–Q3]. We used Pearson’s χ2 test to assess differences between categorical variables, Student’s t-test for continuous variables with a normal distribution, and the Mann–Whitney–Wilcoxon test for continuous variables with a non-normal distribution. Predictions of CAD by ischemia, stress MBF, MFR, coronary calcium, and AI models were assessed by pairwise comparisons of the AUC with the DeLong test41. A two-tailed p-value of <0.05 was considered statistically significant. All statistical analyses were performed with pandas (version 2.1.1), NumPy (version 1.24.3), SciPy (version 1.11.4), lifelines (version 0.28.0), and scikit-learn (version 1.3.0) in Python 3.11.5 (Python Software Foundation, Wilmington, DE, USA), as well as the “nricens” package (version 1.6) in R version 4.3.2 (R Foundation for Statistical Computing, Vienna, Austria).
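For readers unfamiliar with the DeLong test, a compact plain-Python sketch of the paired comparison of two AUCs is shown below. It follows the standard formulation based on placement values and is not the code used in the study; the function names are hypothetical, labels are assumed to be coded 1 for CAD and 0 for no CAD, and the quadratic-time loops are intended for clarity rather than speed.

```python
from math import erfc, sqrt


def _psi(x, y):
    """Kernel of the Mann-Whitney statistic: 1 if x > y, 0.5 if tied, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(scores, labels):
    """Placement values for positives (v10) and negatives (v01)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Return (auc_a, auc_b, z, two-sided p) for two scores on the same patients."""
    v10_a, v01_a = _placements(scores_a, labels)
    v10_b, v01_b = _placements(scores_b, labels)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    variance = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(v10_a)
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(v01_a)
    )
    if variance <= 0:
        raise ValueError("AUC difference has zero variance; test is undefined")
    z = (auc_a - auc_b) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value under the normal approximation
    return auc_a, auc_b, z, p_value
```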