Introduction

Alzheimer’s disease (AD) is the predominant form of dementia, representing the majority of cases across aging populations1. Epidemiological studies have demonstrated that AD incidence increases substantially with age, with a marked acceleration in risk after the age of 65 years2. The pathological hallmarks of AD include the accumulation of β-amyloid (Aβ) protein fragments in the extracellular space and the presence of intracellular neurofibrillary tangles composed of abnormally phosphorylated tau protein3,4. Of particular clinical significance is the observation that Aβ deposition can begin decades before the manifestation of cognitive symptoms, highlighting a crucial window of opportunity for therapeutic intervention5. This extended preclinical phase emphasizes the critical importance of identifying amyloid status early in the disease process, potentially allowing for more effective treatment strategies before significant cognitive decline occurs.

Two standard approaches are available for detecting amyloid-β (Aβ) pathology in the brain. One method involves molecular imaging through positron emission tomography (PET) using various amyloid-specific radiotracers, including 11C-Pittsburgh compound B (PiB)6, 18F-florbetapir7, 18F-florbetaben8, and 18F-flutemetamol9. The other method utilizes the analysis of cerebrospinal fluid (CSF) to measure Aβ concentrations10. Although these methods are well-established for clinical diagnosis, significant practical limitations exist. Amyloid PET scanning, while highly specific, remains beyond reach for many patients due to its substantial cost and limited availability at specialized centers. CSF analysis, while diagnostically accurate, has limitations due to its requirement for invasive lumbar puncture, a procedure that many patients are reluctant to undergo. These limitations highlight the need for more accessible approaches to determine amyloid status.

Previous studies have explored the potential of 18F-fluorodeoxyglucose (FDG) PET, a widely available functional imaging modality, for predicting Aβ positivity11,12,13. Recently, dual-phase protocols, which enable acquisition of two different functional images from a single radiotracer injection, have been widely adopted by many institutions and incorporated into their routine amyloid PET protocols. This approach offers practical advantages by reducing costs and avoiding duplicate radiation exposure compared to separate amyloid and FDG PET scans. Studies have shown that early-phase amyloid PET provides metabolic information highly comparable to FDG PET. Meyer et al. demonstrated that the R1 parameter, representing relative perfusion derived from pharmacokinetic analysis of early-phase PiB data, strongly correlates with FDG uptake14. Peretti et al. found a high correlation between early PiB perfusion and FDG in AD15. Hsiao et al. showed that early-phase florbetapir perfusion correlated well with FDG metabolism, particularly in regions vulnerable to AD16, and Boccalini et al. recently confirmed that both early florbetapir and flutemetamol provide similar diagnostic performance to FDG in discriminating AD patients17. This consistent evidence for tight coupling between FDG PET metabolism and early-phase amyloid PET perfusion provides the foundation for using both approaches to evaluate neuronal dysfunction and predict amyloid status.

In particular, early-phase amyloid PET offers an advantage of simultaneously acquiring metabolism and amyloid information from a single scan, while most previous FDG PET-based prediction models have relied on FDG and amyloid PET scans acquired at different timepoints. In this study, we developed machine learning models using early-phase 18F-florbetaben PET from a single-institution cohort comprising cognitively normal (NC), mild cognitive impairment (MCI), and dementia samples to predict Aβ positivity. While not aiming to replace delayed-phase amyloid imaging, our approach explores whether metabolism information from early-phase scanning can contribute to amyloid status prediction. Furthermore, we constructed machine learning models based on various clinical indicators, including APOE genotype and cognitive test results, and sought to enhance performance by developing an integrated model that combines both early PET imaging and clinical indicators. Additionally, we investigated whether early-phase amyloid PET can predict cognitive status, maximizing the clinical utility of dual-phase imaging protocols. Through this comprehensive approach, we also aim to provide insights into the potential application of FDG PET-based prediction models, given the established correlation between early-phase amyloid PET and FDG PET metabolism information.

Materials and methods

Patient cohort

Between June 2018 and July 2021, 176 subjects who completed dual-phase 18F-FBB PET scanning were retrospectively included in this study. Diagnostic classification was based on comprehensive clinical evaluations including medical history, neurological examinations, laboratory tests, neuropsychological assessments, and PET results. The diagnostic criteria for MCI and AD were based on those proposed by the National Institute on Aging and Alzheimer’s Association (NIA-AA)18. Aβ status was determined based upon visual assessment of 18F-FBB PET scan results19. This study was conducted in accordance with the Declaration of Helsinki (2013 version) and received approval from the Institutional Review Board (IRB) of Ewha Womans University Mokdong Hospital (IRB No. EUMC 2024-07-013). Due to the retrospective nature of the study, the requirement for informed consent was waived by the IRB of Ewha Womans University Mokdong Hospital.

18F-FBB PET imaging

All 18F-FBB PET/CT examinations were conducted following our institution’s standardized protocol. We acquired PET images in three-dimensional (3D) list-mode using a dedicated PET/CT scanner (Biograph mCT, Siemens) following a bolus injection of 308.12 ± 10.93 MBq 18F-FBB. Image acquisition was performed over 10 min for the early phase (0–10 min post-injection) and 20 min for the delayed-phase (90–110 min post-injection). Brain CT was performed using spiral acquisition with the following parameters: 120 kV, 30 mAs, and 1.0 mm slice thickness. The CT data were used for PET attenuation correction. To minimize motion artifacts, subjects’ heads were immobilized using a head holder and vacuum cushion fixation system. Standard PET data from dual-phase scans were reconstructed into a 128 × 128 matrix (voxel size: 3.18 × 3.18 × 2.02 mm³) using the scanner’s built-in 3D ordered subset expectation maximization algorithm (4 iterations, 12 subsets, 5-mm Gaussian filter). The reconstructed data were then averaged into single frames for both early and delayed-phases.

Aβ status

Aβ status was determined from delayed-phase 18F-FBB PET scans (90–110 min post-injection), where two expert nuclear medicine physicians independently performed visual assessments while blinded to all clinical information20,21. The visual assessment involved evaluating tracer uptake in four target cortical regions (lateral temporal, frontal, and parietal cortices, and posterior cingulate cortex/precuneus) using a three-point scale for regional cortical tracer uptake (RCTU): 1 indicating no uptake, 2 indicating moderate uptake, and 3 indicating pronounced uptake. Global brain amyloid burden was then classified using a three-tier system based on RCTU scores: level 1 representing RCTU score 1 in all four regions, level 2 indicating RCTU score 2 in any region without score 3, and level 3 showing RCTU score 3 in at least one region. Subjects were classified as Aβ positive (Aβ+) if their global amyloid burden score was 2 or 3, and Aβ negative (Aβ−) otherwise. Additionally, as a quantitative measure of amyloid burden, global standardized uptake value ratios (SUVR) were obtained using Neurophet software (http://neurophet.com/, South Korea).
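The visual read rule described above can be expressed as a small decision function. The following is a minimal sketch for illustration only, not the software used in the study; the function names and region keys are hypothetical labels introduced here.

```python
# Illustrative sketch of the RCTU-based read rule described in the text.
# Function names and region keys are hypothetical, not taken from the study.

REGIONS = (
    "lateral_temporal",
    "frontal",
    "parietal",
    "posterior_cingulate_precuneus",
)


def global_amyloid_level(rctu):
    """Map four regional RCTU scores (1-3) to a global burden level (1-3)."""
    scores = [rctu[region] for region in REGIONS]
    if any(score not in (1, 2, 3) for score in scores):
        raise ValueError("RCTU scores must be 1, 2, or 3")
    if max(scores) == 3:
        return 3  # pronounced uptake in at least one region
    if max(scores) == 2:
        return 2  # moderate uptake in some region, no region scored 3
    return 1      # no uptake in any of the four regions


def is_amyloid_positive(rctu):
    """A global level of 2 or 3 is read as amyloid positive."""
    return global_amyloid_level(rctu) >= 2


if __name__ == "__main__":
    read = {
        "lateral_temporal": 1,
        "frontal": 2,
        "parietal": 1,
        "posterior_cingulate_precuneus": 1,
    }
    print(global_amyloid_level(read), is_amyloid_positive(read))
```

A single region with moderate uptake is therefore enough for a positive read, which is why the level is driven by the maximum regional score rather than an average.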

Early phase PET and clinical features for a predictive model

To develop a machine learning model for predicting Aβ positivity, we utilized early-phase PET and clinical features. From early-phase 18F-FBB PET scans (0–10 min post-injection), we extracted brain region-specific SUVR as imaging features. SUVR measurements were obtained using Neurophet software (http://neurophet.com/, South Korea), an artificial intelligence (AI)-based software that provides automated quantification by directly processing reconstructed DICOM (Digital Imaging and Communications in Medicine) PET images via co-registered MRIs, using the whole cerebellum as the reference region. All reconstructed PET images underwent spatial normalization to Montreal Neurological Institute (MNI) standard space using individual MRI data. Automatic quantitative analyses were performed using volumes of interest defined on atlas templates from the Automated Anatomical Labeling Atlas 3 (AAL3).
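The regional SUVR used as an imaging feature is, at its core, a ratio of mean uptake in a target region to mean uptake in the reference region (here the whole cerebellum). The sketch below shows only that ratio. It assumes that voxel values have already been extracted per atlas region after spatial normalization, which in the study is handled by the commercial software; the function name and the example values are hypothetical.

```python
# Minimal sketch of the SUVR ratio only; spatial normalization and atlas
# labelling are assumed to have been done upstream. Names are hypothetical.
from statistics import mean


def regional_suvr(region_uptake, reference_uptake):
    """Return {region: mean regional uptake / mean reference uptake}."""
    reference_mean = mean(reference_uptake)
    if reference_mean <= 0:
        raise ValueError("reference region uptake must be positive")
    return {
        name: mean(voxels) / reference_mean
        for name, voxels in region_uptake.items()
    }


if __name__ == "__main__":
    # Made-up uptake values in arbitrary units, for illustration only.
    cerebellum = [2.0, 2.0, 2.0, 2.0]
    regions = {
        "hippocampus": [1.5, 1.75],
        "entorhinal": [1.25, 1.5],
    }
    print(regional_suvr(regions, cerebellum))
```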

To predict Aβ positivity, we used clinical data encompassing multiple diagnostic and demographic factors related to cognitive function, lifestyle, and brain health. Specific variables included indicators of APOE genotype, global cognition assessed by the Korean version of the Mini-Mental Status Examination (MMSE), global Clinical Dementia Rating (CDR), CDR Sum of Boxes (CDRSB), Geriatric Depression Scale (GDS), functional assessment by Instrumental Activities of Daily Living (IADL), and a detailed neuropsychological battery of Seoul Neuropsychological Screening Battery-II (SNSB) across nine subdomains. Additionally, patient characteristics such as years of education, handedness, and family history of dementia were included, alongside lifestyle factors like smoking and alcohol consumption. Family history of dementia was defined as having parents or siblings with any form of dementia, not limited to AD specifically.

Data preprocessing

Several preprocessing steps were undertaken to prepare the dataset for machine learning analysis. Categorical variables were transformed into numerical format using One-Hot Encoding, which creates one binary column per category. Numerical features were standardized using the StandardScaler to a mean of zero and a standard deviation of one, so that all features share a common scale. The StandardScaler was fitted on the training data and then applied to both the training and test datasets. The dataset was randomly divided into training and testing sets, with 80% allocated for model training and 20% reserved for testing. The same fixed partition was used in all experiments so that results are reproducible and comparable across models.
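The key point in the preprocessing above is that scaling statistics come from the training set only and are then reused on the test set. The following plain-Python sketch illustrates that idea together with a fixed-seed 80/20 split and one-hot encoding. It is not the study's code: the seed value, the rounding of the test-set size, the helper names, and the example category labels are all assumptions made for illustration.

```python
# Plain-Python sketch of the preprocessing ideas described in the text.
# Seed, helper names, and rounding rule are assumptions, not from the study.
import math
import random
from statistics import mean, pstdev


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed and return (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_fraction)  # assumed rounding rule
    return indices[n_test:], indices[:n_test]


def one_hot(values):
    """Encode a categorical column as one binary column per category."""
    categories = sorted(set(values))
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return rows, categories


def fit_scaler(train_values):
    """Learn mean and population standard deviation from training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # avoid dividing by zero
    return mu, sigma


def transform(values, mu, sigma):
    """Apply previously fitted scaling parameters to any data split."""
    return [(value - mu) / sigma for value in values]


if __name__ == "__main__":
    train_idx, test_idx = split_indices(176)
    print(len(train_idx), len(test_idx))
    mu, sigma = fit_scaler([24.0, 27.0, 30.0])   # e.g. training-set scores
    print(transform([21.0, 27.0], mu, sigma))    # test data reuse mu, sigma
```

Fitting the scaler on the training portion only keeps information about the test subjects out of the model inputs, which is what makes the held-out evaluation meaningful.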

Comprehensive evaluation of machine learning models

For predicting Aβ positivity, we implemented and compared multiple machine learning models, including Random Forest, K-Nearest Neighbors, Naive Bayes, Decision Tree, Gradient Boosting, Bagging Classifier, and XGBoost. The performance of each model was comprehensively evaluated using several metrics. We assessed accuracy, defined as the proportion of correctly predicted instances to the total instances in the dataset, and the Receiver Operating Characteristic Area Under the Curve (ROC AUC), which evaluates the model’s ability to differentiate between positive and negative classes by plotting the true positive rate against the false positive rate at various threshold settings. Additionally, we measured recall (sensitivity), which quantifies the proportion of true positive predictions relative to the total actual positives, and precision, which is the proportion of positive predictions that are correct. The F1 score was calculated to provide a balanced measure between precision and recall. Through this comprehensive evaluation process, we aimed to identify the most effective model for classifying Aβ positivity in clinical applications.
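The metric definitions above can be made concrete with a short sketch. The confusion-matrix counts in the usage example are hypothetical and are not the study's actual confusion matrix; the function names are likewise introduced here for illustration. ROC AUC is computed in its rank-based form, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
# Sketch of the evaluation metrics defined in the text. Counts and scores
# in the usage example are hypothetical, not results from the study.


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(classification_metrics(tp=14, fp=2, fn=5, tn=15))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With these hypothetical counts, precision is high while recall is lower, the same pattern the F1 score is meant to summarize in a single number.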

To further validate the performance of the machine learning models, our study evaluated them under two distinct scenarios. In the first scenario, we utilized only early-phase PET imaging features to predict Aβ positivity. This approach allowed us to assess the standalone diagnostic power of PET imaging. In the second scenario, we combined early-phase PET imaging features with clinical data, including demographic and neuropsychological test results, to evaluate whether integrating clinical information enhances prediction accuracy.

To identify the most influential features contributing to the predictions made by each model, we calculated the feature importance scores for the top 10 features per model. Feature importance for tree-based models was determined using their inherent feature importance scoring mechanisms, which assess the contribution of each feature to the reduction of impurity during model training. For non-tree-based models, we employed permutation importance, a model-agnostic method that evaluates the importance of each feature by measuring the decrease in model performance when the feature’s values are randomly permuted. This systematic approach enabled a comprehensive analysis of the critical factors influencing the predictions across all models.
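Permutation importance, as described above, can be sketched in a few lines. The version below is a simplified plain-Python illustration rather than the implementation used in the study: it scores a model by accuracy, shuffles one feature column at a time, and reports the mean drop in accuracy. The toy model, the data, the number of repeats, and the seed are all hypothetical.

```python
# Simplified sketch of permutation importance. The toy model and data are
# hypothetical; the study's actual models and settings are not reproduced.
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model output equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j replaced by shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy classifier that looks only at feature 0 and ignores feature 1.
    toy_model = lambda row: int(row[0] > 0.5)
    X = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 2.0], [0.3, 4.0], [0.7, 6.0]]
    y = [0, 1, 0, 1, 0, 1]
    print(permutation_importance(toy_model, X, y))
```

Because the toy classifier never reads feature 1, shuffling that column cannot change its predictions, so its importance is exactly zero, whereas shuffling feature 0 can only leave accuracy unchanged or reduce it.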

Statistical analysis

All statistical analyses were conducted using appropriate statistical software. To compare demographic and clinical characteristics between Aβ-negative and Aβ-positive groups, we performed independent two-sample t-tests for continuous variables and chi-square tests for categorical variables.

For comparisons of SUVR values between Aβ-negative and Aβ-positive groups, independent two-sample t-tests were performed. If assumptions of normality were violated, the Mann-Whitney U test was used. A p-value of < 0.05 was considered statistically significant.
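For readers less familiar with the two group-comparison tests named above, the sketch below computes only their test statistics: the pooled-variance t statistic of the independent two-sample t-test and the Mann-Whitney U statistic. Converting either statistic to a p-value requires the corresponding reference distribution, which is normally taken from a statistical package; the study does not name the software it used, and the function names and sample values here are hypothetical.

```python
# Sketch of the two test statistics only; p-values are not computed here.
# Function names and example values are hypothetical.
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """t statistic of the independent two-sample t-test (equal variances)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def mann_whitney_u(a, b):
    """Return (U_a, U_b); ties between samples count as half."""
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )
    return u_a, len(a) * len(b) - u_a


if __name__ == "__main__":
    group_pos = [1.0, 2.0, 3.0]   # made-up values, illustration only
    group_neg = [2.0, 3.0, 4.0]
    print(pooled_t_statistic(group_pos, group_neg))
    print(mann_whitney_u(group_pos, group_neg))
```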

Results

Subject information collected for this study is summarized in Table 1. Evaluation of model performance for predicting Aβ positivity was conducted across two distinct datasets: early-phase PET only, and combination of early-phase PET and clinical data. The results are summarized in the following subsections. The table showing the clinical characteristics of participants stratified by amyloid PET positivity according to the training and test sets can be found in Supplemental Table 1.

Table 1 Clinical characteristics of participants stratified by amyloid PET positivity.

Early-phase PET only model performance for predicting Aβ positivity

The accuracy scores presented for each classifier correspond to their predictions on the test data, ensuring an unbiased evaluation of model performance. Table 2 summarizes the performance metrics of various classifiers evaluated on the early-phase PET-only dataset. Random Forest achieved the highest accuracy at 80.56%, with an AUC of 0.83, recall of 0.74, precision of 0.88, and F1 score of 0.80. XGBoost followed with an accuracy of 77.78%, a comparable AUC of 0.81, recall of 0.68, precision of 0.87, and F1 score of 0.76.

K-Nearest Neighbors (KNN) reached an accuracy of 75.00%, with a somewhat lower AUC of 0.78 and recall of 0.58 but a high precision of 0.92, resulting in an F1 score of 0.71. The Gradient Boosting classifier matched KNN’s accuracy at 75.00%, with a higher ROC AUC of 0.82; its recall of 0.63, precision of 0.86, and F1 score of 0.73 indicated slightly more balanced performance.

The Bagging Classifier achieved a lower accuracy of 69.44%, with a comparable AUC of 0.82 and precision of 0.79, but its recall remained at 0.58, resulting in an F1 score of 0.67. Naive Bayes and Decision Tree both achieved the lowest accuracy at 66.67%, with AUC values of 0.77 and 0.67, respectively. Naive Bayes showed higher precision of 0.89 but low recall of 0.42, leading to an F1 score of 0.57, while Decision Tree had slightly more balanced recall of 0.58 and precision of 0.73, resulting in an F1 score of 0.65.

Table 2 This table summarizes the effectiveness of different machine learning classifiers for predicting Aβ positivity using early-phase PET only. The metrics include accuracy, AUC, recall, precision, and F1 score for each model.

When extracting the top 10 important features, temporal and limbic regions showed high predictive value, with hippocampus SUVR showing the highest importance (0.086), followed by isthmus of cingulate SUVR (0.051), entorhinal SUVR (0.038), parahippocampal SUVR (0.031), and temporal pole SUVR (0.029). Figure 1 shows a bar chart illustrating the relative importance of these early-phase PET features in predicting Aβ positivity.

Fig. 1 Bar chart illustrating the importance of early-phase PET features in predicting Aβ positivity.

When the three most important SUVR features from the early-phase PET-only model were compared by Aβ status, hippocampus SUVR, isthmus of cingulate SUVR, and entorhinal SUVR were all significantly lower in the amyloid-positive group (positivity 1) than in the amyloid-negative group (positivity 0). Hippocampus SUVR was higher in the positivity 0 group (mean: 0.84, range: 0.67–1.18) than in the positivity 1 group (mean: 0.80, range: 0.68–1.12). Similarly, isthmus of cingulate SUVR was higher in the positivity 0 group (mean: 1.02, range: 0.85–1.16) than in the positivity 1 group (mean: 0.98, range: 0.79–1.70). Entorhinal SUVR was also higher in the positivity 0 group (mean: 0.70, range: 0.50–0.95) than in the positivity 1 group (mean: 0.66, range: 0.43–1.05). These differences between positivity groups are summarized in Fig. 2. The figure showing the top 10 features that most influenced model accuracy can be found in Supplemental Fig. 1.

Fig. 2 Distribution of SUVR values for hippocampus, isthmus of cingulate, and entorhinal regions shown as box plots. (a) Hippocampus SUVR (p < 0.05), (b) Isthmus of cingulate SUVR (p < 0.05), and (c) Entorhinal SUVR (p < 0.05), illustrating the differences between Positivity 0 and Positivity 1 in each region.

Combination of early-phase PET and clinical data model performance for predicting Aβ positivity

Table 3 summarizes the performance metrics of various classifiers evaluated on the combined dataset of early-phase PET and clinical data. Gradient Boosting achieved the highest accuracy at 88.89%, with an outstanding AUC of 0.92 and precision of 1.00. It also demonstrated a comparable recall of 0.79 and the highest F1 score of 0.88.

Table 3 This table summarizes the effectiveness of different machine learning classifiers for predicting Aβ positivity using a combination of early-phase PET and clinical data. The metrics include accuracy, ROC AUC, recall, precision, and F1 score for each model.

Random Forest, Naive Bayes, and XGBoost each showed competitive performance, achieving 83.33% accuracy, AUC values of 0.92 (Random Forest and XGBoost), and 0.89 (Naive Bayes). These models exhibited balanced recall of 0.74 and precision of 0.93, leading to comparable F1 scores of 0.82.

Bagging Classifier also achieved 83.33% accuracy but had a slightly lower AUC of 0.84 compared to the top-performing models. Despite its relatively lower recall of 0.68, it maintained perfect precision of 1.00 and a comparable F1 score of 0.81.

KNN and Decision Tree delivered lower accuracy at 77.78%, with AUC values of 0.79 (KNN) and 0.78 (Decision Tree). KNN demonstrated higher recall of 0.84 compared to 0.68 of Decision Tree, but both showed moderate performance with F1 scores of 0.80 (KNN) and 0.76 (Decision Tree).

The combined model identified APOE genotype (0.2485), medial orbitofrontal SUVR (0.0996), and hippocampus SUVR (0.0663) as the most important features. Other notable contributors included posterior transverse temporal SUVR (0.0302), age (0.0301), and Digit Span Forward test as a measure of cognitive performance (0.0212). Additional imaging features, such as pars opercularis SUVR (0.0203), accumbens SUVR (0.0154), and temporal pole SUVR (0.0152), along with the Rey Complex Figure test as a measure of cognitive performance (0.0194), were also nominated. Figure 3 shows a bar chart illustrating the relative importance of these early-phase PET combined with clinical features in predicting Aβ positivity.

Fig. 3 Bar chart showing the importance of early-phase PET and clinical features in predicting Aβ positivity.

When the leading SUVR features from the combined model were compared by Aβ status, neither medial orbitofrontal SUVR (p = 0.50) nor posterior transverse temporal SUVR (p = 0.39) differed significantly between amyloid positivity groups. Medial orbitofrontal SUVR was similar in the positivity 0 group (mean: 1.01, range: 0.77–1.33) and the positivity 1 group (mean: 1.00, range: 0.81–1.81). Posterior transverse temporal SUVR was likewise similar in the positivity 0 group (mean: 1.17, range: 0.99–1.53) and the positivity 1 group (mean: 1.15, range: 0.88–2.05). The distributions for both groups are shown as box plots in Fig. 4.

Fig. 4 Distribution of SUVR values for medial orbitofrontal and posterior transverse temporal regions shown as box plots. (a) Medial orbitofrontal SUVR (p = 0.50) and (b) Posterior transverse temporal SUVR (p = 0.39), illustrating the differences between Positivity 0 and Positivity 1 in each region.

Performance analysis across diagnostic groups

To evaluate how our models perform across different cognitive status groups, we conducted a subgroup analysis stratifying subjects into NC, MCI, and dementia groups. Table 4 presents the classification performance metrics across these diagnostic categories.

Table 4 Classification performance of different models across diagnostic groups. The table presents accuracy, ROC AUC, recall, precision, and F1 scores in NC, MCI, and dementia groups.

In the NC group, while overall accuracy remained relatively high (mostly around 88%), the models showed variable performance in predicting Aβ positivity. Random Forest, XGBoost, and K-Nearest Neighbors achieved the highest accuracy at 88%, though with moderate recall (0.50) and precision (0.75) scores.

The MCI group demonstrated notably strong classification performance. Most classifiers achieved accuracy above 90%, with Gradient Boosting, Random Forest, and XGBoost showing particularly robust performance (95% accuracy). These models also maintained high recall (0.95) and precision (0.96), resulting in F1 scores of 0.94.

In the dementia group, the models showed the highest classification performance, with Gradient Boosting and Decision Tree achieving perfect accuracy (100%) and maximum scores across all metrics (recall, precision, and F1 score of 1.00). Other classifiers like Random Forest also performed well (89% accuracy, F1 score 0.87).

Early-phase PET model performance for predicting cognitive status

The performance of classifiers in distinguishing cognitive status groups (MCI, dementia) from early-phase PET information in the total group is shown in Table 5. Most classifiers showed accuracy above 80%, with Decision Tree achieving the highest accuracy at 83.33%. F1 scores also showed good performance across most classifiers, ranging from 0.82 to 0.90.

Table 5 This table summarizes the effectiveness of different machine learning classifiers for predicting cognitive status using early-phase PET only. The metrics include accuracy, AUC, recall, precision, and F1 score for each model.

In the subgroup analysis by Aβ status, the Aβ-positive group showed generally higher classification performance (Supplemental Table 2). Most classifiers achieved accuracy above 88%, with Random Forest, Naive Bayes, XGBoost, Bagging Classifier, and K-Nearest Neighbors achieving 100% Recall. F1 scores were excellent, ranging from 0.90 to 0.94.

The Aβ-negative group showed relatively lower classification performance (Supplemental Table 3). Accuracy ranged from 40–65%, with Naive Bayes showing the highest accuracy at 65%. ROC AUC ranged from 0.41–0.62, with K-Nearest Neighbors showing the highest at 0.62. F1 scores ranged from 0.45–0.72, showing lower performance compared to the other group.

In an additional analysis incorporating amyloid burden (global SUVR measured from delayed-phase PET) with early-phase PET features, the Decision Tree classifier exhibited the highest accuracy of 86.11%, while other classifiers showed performance similar to or slightly improved compared to the early-phase PET-only model (Supplemental Table 4). Feature importance analysis revealed that the amyloid burden variable had the greatest importance (0.0902) followed by early-phase PET features in predicting cognitive status (Supplemental Fig. 2).

Discussion

In this study, we developed and validated machine learning models for predicting Aβ positivity using two different approaches: early-phase PET only and a combined model of early-phase PET with clinical data. Additionally, we investigated the utility of early-phase PET in predicting cognitive status. The early-phase PET only model achieved moderate performance in Aβ positivity prediction, with Random Forest showing the highest accuracy of 80.56% among various classifiers. The combined model of early-phase PET and clinical data showed superior performance in Aβ positivity prediction, reaching 88.89% accuracy using Gradient Boosting, suggesting synergistic effects of multimodal features. For cognitive status prediction, early-phase PET demonstrated robust performance with most classifiers achieving accuracy above 80%. Notably, Decision Tree achieved the highest accuracy at 83.33% with excellent recall (0.90) and precision (0.90). Interestingly, subgroup analysis revealed that prediction performance varied by Aβ status: the model showed particularly strong performance in Aβ-positive subjects (accuracy > 88%) compared to Aβ-negative subjects (accuracy 40–65%). This differential performance suggests that cognitive status in Aβ-positive cases may be more tightly coupled with characteristic patterns of metabolic change detectable in early-phase imaging. This finding not only validates the clinical utility of early-phase PET but also provides insights into the relationship between amyloid pathology, metabolism, and cognitive status. This performance compares favorably to previous studies. Using a machine learning approach, Park et al. achieved 81.1% accuracy using a logistic regression classifier with MMSE scores and regional MRI volumes as features22. Kim et al. reported 73.3% accuracy with a deep learning-based model applied to FDG PET data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and Korean Brain Aging Study for the Early Diagnosis and Prediction of AD (KBASE) cohorts11. Alongi et al. achieved 79.56% accuracy using discriminant analysis to predict amyloid positivity13. In this study, we proposed a high-performance model by utilizing both early-phase FBB PET and clinical data, while also demonstrating its utility in cognitive status prediction.

The early-phase PET Random Forest model revealed that medial temporal lobe structures were predominant among the top features for predicting Aβ positivity. The hippocampus showed the highest importance (0.086), followed by the isthmus of cingulate (0.051), entorhinal cortex (0.038), parahippocampal (0.031), and temporal pole SUVR (0.029). Moreover, these regions showed significantly lower SUVR values in the amyloid positivity group compared to the negative group, particularly in hippocampus, isthmus of cingulate, and entorhinal cortex (Fig. 2). This finding is particularly significant as neurofibrillary tangles in the entorhinal cortex and hippocampus are the cardinal neuropathological features of AD23. Our results are consistent with recent findings by Kang et al., who reported that hypometabolism in bilateral hippocampus and entorhinal cortex on FDG-PET was independently associated with AD, suggesting that entorhinal-hippocampal hypometabolism could be a characteristic feature of AD-related neurodegeneration24. Alongi et al. reported that hypometabolism in the limbic area, which includes the hippocampus, parahippocampal gyrus, and entorhinal cortex, showed the best performance in predicting amyloid PET positivity on FDG PET12. Rasi et al. revealed that radiomics features, particularly the Short Run Low Grey Level Emphasis feature, which measures the distribution of short runs of low grey-level values in an image, extracted from the hippocampus, inferior parietal lobule, and isthmus cingulate regions played a significant role in predicting amyloid positivity13. Additionally, the isthmus of cingulate, which connects the posterior cingulate cortex to the parahippocampal gyrus, ranked as a feature of considerable importance in our amyloid positivity prediction model. Such findings regarding the isthmus of cingulate align with previous studies highlighting the significance of the posterior cingulate region in the process of amyloid accumulation: notably, the posterior cingulate cortex is well-established as a hub of amyloid deposition in AD, being particularly vulnerable to early accumulation of amyloid β-protein, with reports indicating that amyloid deposition in this region is associated with alterations in functional connectivity25,26.

The combined early-phase PET and clinical data model achieved a higher accuracy of 88.89% for predicting Aβ positivity, which is a considerable improvement over the early-phase PET-only model at 80.56%. APOE genotype emerged as the most significant predictor (0.2485), followed by medial orbitofrontal SUVR (0.0996) and hippocampus SUVR (0.0663). The high importance of APOE genotype in the combined model reflects its established role as a crucial risk factor for Aβ accumulation27. The contribution of APOE genotype in constructing Aβ positivity prediction models has been confirmed in several studies. Hwang et al. reported that the best-performing machine learning model combining MRI-based ROI and APOE achieved an accuracy of 0.798 for predicting Aβ positivity28. Tosun et al. also demonstrated that the joint prediction performance of MRI features and APOE genotype was significantly better than when each predictor was used individually29. Following APOE genotype, the medial orbitofrontal cortex and hippocampal SUVR emerged as the most important features, both of which are subregions of the limbic system. Nestor et al. found a significant decrease in metabolism across the limbic structural network in mild AD patients30. The role of limbic damage in AD has been suggested to be closely related to memory impairment, but the exact role and mechanism are still controversial and have not been fully elucidated. However, the significant contribution of the limbic subregions identified in our study (medial orbitofrontal cortex and hippocampus) to the Aβ positivity prediction model suggests that limbic metabolic decreases may be strongly associated with Aβ accumulation in AD.

Our subgroup analysis across diagnostic groups revealed interesting patterns in the performance of our models in predicting Aβ positivity. The NC group, while maintaining relatively high overall accuracy (88%), showed limited performance in predicting Aβ-positive cases, with moderate recall (0.50) and precision (0.75) scores. This could be attributed to more subtle metabolic changes in cognitively normal individuals, where early pathological changes may not yet have manifested as distinct metabolic patterns. However, for the NC group, validation in a larger cohort would be necessary given the particularly small number of Aβ-positive cases in our study. In contrast, the MCI group demonstrated notably robust classification performance. Most classifiers achieved accuracy above 90%, with Gradient Boosting, Random Forest, and XGBoost showing particularly strong performance (95% accuracy). This suggests that metabolic alterations associated with Aβ pathology may be more pronounced and distinctive in the MCI stage, potentially reflecting a critical period where pathological changes are sufficiently established to create recognizable patterns while cognitive function is not yet severely impaired. The dementia group showed the highest classification performance, with Gradient Boosting and Decision Tree achieving perfect accuracy (100%) and maximum scores across all metrics. This exceptional performance likely reflects more advanced pathological changes manifesting as distinct metabolic patterns in established dementia. The strong predictive performance in both MCI and dementia groups suggests that our model is particularly effective in stages where cognitive impairment is present, possibly due to more pronounced metabolic alterations associated with disease progression.

The machine learning models in this study were selected to address the specific challenges of predicting Aβ positivity. We systematically applied a diverse set of classifiers, including Random Forest and Gradient Boosting, each optimized through hyperparameter tuning, and achieved robust predictive performance with two distinct approaches: early-phase PET alone and a combined model. These methods were chosen for their suitability for structured medical data with varying feature types, supporting accurate and interpretable outcomes.
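As a rough illustration of hyperparameter tuning, the sketch below runs a grid search with k-fold cross-validation using only the Python standard library. It makes several assumptions that are not drawn from the study: a toy k-nearest-neighbour classifier stands in for the tree-based classifiers named above, the single tuned hyperparameter is the neighbour count `k`, and the two-feature data are synthetic. All function names are hypothetical.

```python
import random


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def cv_accuracy(X, y, k, n_folds=5):
    """Mean accuracy over n_folds interleaved folds (every n_folds-th sample)."""
    n = len(X)
    fold_scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / n_folds


def grid_search(X, y, k_grid, n_folds=5):
    """Score every candidate k by cross-validation and return the best one."""
    scores = {k: cv_accuracy(X, y, k, n_folds) for k in k_grid}
    best_k = max(scores, key=scores.get)  # ties resolve to the earliest k in the grid
    return best_k, scores


# Synthetic data: two Gaussian clusters standing in for the two classes.
rng = random.Random(0)
X, y = [], []
for i in range(60):
    label = i % 2
    center = 1.0 if label else -1.0
    X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
    y.append(label)

best_k, scores = grid_search(X, y, k_grid=[1, 3, 5, 7])
print("best k:", best_k)
print("cross-validated accuracy per k:", scores)
```

In practice, a library routine such as scikit-learn's `GridSearchCV` would usually replace the hand-written loops. With a small sample, the final performance estimate should come from data that was not used to select the hyperparameters.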

Our methodological approach emphasizes feature selection and importance analysis for identifying key predictors of Aβ pathology. While the relevance of medial temporal lobe structures and APOE genotype in AD is well established, our approach demonstrates how these known factors can be systematically identified and prioritized using machine learning techniques. It not only confirms previous findings but also provides a reproducible framework for analyzing complex, multimodal datasets in AD research. Because it relies on standard machine learning techniques and publicly available tools, the framework can be easily replicated, extended to incorporate additional biomarkers, or applied to other neurodegenerative diseases. This enhances model interpretability and provides a foundation for developing more sophisticated clinical decision support systems. Furthermore, our study illustrates the potential of machine learning to refine early diagnostic processes for AD. By systematically identifying and prioritizing key features across modalities (imaging, genetic, and clinical data), we provide a data-driven approach to understanding the multifaceted nature of Aβ pathology, which could be particularly valuable for personalizing interventions and improving risk stratification in AD research and clinical practice.

This study utilized dual-phase amyloid PET protocols to demonstrate the potential of combining early-phase perfusion data, which strongly correlates with FDG PET metabolism information, with clinical indicators to predict amyloid status. In particular, early-phase amyloid PET offers the advantage of simultaneously acquiring metabolism and amyloid information from a single scan, while most previous FDG PET-based prediction models have relied on FDG and amyloid PET scans acquired at different timepoints11,12,13. This approach has important implications as our findings suggest that similar predictive models could be developed using FDG PET, which is more widely available and cost-effective than amyloid PET. Moreover, the early-phase data from amyloid PET could potentially serve additional clinical purposes beyond amyloid status prediction, such as predicting cognitive status trajectories.

The dual-phase protocol offers practical advantages by enabling acquisition of both metabolism and amyloid information from a single radiotracer injection, reducing costs and radiation exposure compared to separate amyloid and FDG PET scans. By developing models that maximize the utility of early-phase data, we aim to extract multiple clinically valuable insights from these scanning sessions, enhancing the overall value proposition of amyloid PET imaging while potentially informing future FDG PET-based prediction approaches.

A growing body of evidence supports a strong correlation between early-phase amyloid PET and FDG PET metabolism information. Boccalini et al.17 showed that early-phase florbetapir and flutemetamol images provide similar diagnostic performance to FDG PET in discriminating AD patients. Peretti et al.15 found a high correlation between early PiB perfusion and FDG metabolism in AD. Hsiao et al.16 reported that early-phase florbetapir perfusion correlated well with FDG metabolism, particularly in regions vulnerable to AD. Son et al. investigated the clinical utility of early-phase FBB PET as an alternative to FDG PET31 and found significant correlations between early-phase FBB and FDG PET images across all brain regions, suggesting that early-phase FBB PET could serve as a surrogate biomarker for dementia diagnosis. These consistent findings provide robust evidence for the tight coupling between FDG PET metabolism and early-phase amyloid PET perfusion. By leveraging this relationship, our study explores the development of prediction models that could potentially be translated to the more widely available and cost-effective FDG PET imaging.

As mentioned in the introduction, various amyloid-specific radiotracers are widely used in clinical practice, and our study utilized FBB. Further validation is needed to determine whether our model can be applied to early-phase images of other amyloid tracers, such as florbetapir and flutemetamol, as these tracers may exhibit slight differences in their binding properties and kinetics despite targeting the same pathological hallmark of AD, namely Aβ plaques32. Such validation would enhance the generalizability and clinical utility of our approach. Nonetheless, considering the high extraction fraction of lipophilic radiotracers into the brain33, which is common among these tracers, we anticipate that our model, which relies on early-phase amyloid PET image features reflecting cerebral perfusion and metabolism, would likely demonstrate similar predictive capabilities across different amyloid imaging agents.

Our study has several limitations. First, we did not directly validate early-phase PET against FDG PET in our cohort. However, many studies have already demonstrated that these imaging modalities provide similar information on cerebral blood flow and metabolism. Second, our machine learning models were trained using early-phase PET data collected from a single center with a relatively small sample size. Unlike FDG PET, which benefits from large databases such as ADNI, early-phase amyloid PET currently lacks an extensive multi-center repository. Therefore, future validation in a large-scale multi-center study will be necessary to confirm the generalizability and robustness of our findings. Lastly, from a technical perspective, although we employed robust classifiers and hyperparameter tuning, such models, particularly when trained on high-dimensional data, may perform worse when applied to larger, more heterogeneous populations. Furthermore, the absence of external validation using multi-center datasets limits our ability to assess the model’s robustness in real-world clinical settings. Addressing these limitations through future work with larger datasets and multi-center collaborations will be essential for ensuring the clinical utility and scalability of the proposed approach.

Conclusion

This study demonstrates the effectiveness of machine learning models for predicting Aβ positivity, using early-phase PET imaging alone and a combined approach that integrates early-phase PET with clinical data. The models trained on early-phase PET alone achieved moderate accuracy, with Random Forest showing the best performance and identifying medial temporal lobe structures, such as the hippocampus and entorhinal cortex, as critical predictors. These findings align with established neurodegenerative changes associated with Alzheimer’s disease.

Notably, the combined model not only improved predictive accuracy but also provided insights into the complementary roles of genetic, imaging, and cognitive features in Alzheimer’s pathology. These results reinforce the potential of machine learning to enhance early diagnostic capabilities and personalized interventions in Alzheimer’s disease. Furthermore, the framework developed in this study offers adaptability for broader applications, including incorporating additional biomarkers and extending to other neurodegenerative disorders.