XGBoost models based on non imaging features for the prediction of mild cognitive impairment in older adults

Fernández-Blázquez, Miguel A.; Ruiz-Sánchez de León, José M.; Sanz-Blasco, Rubén; Verche, Emilio; Ávila-Villanueva, Marina; Gil-Moreno, María José; Montenegro-Peña, Mercedes; Terrón, Carmen; Fernández-García, Cristina; Gómez-Ramírez, Jaime

doi:10.1038/s41598-025-14832-0

Published: 13 August 2025

Scientific Reports volume 15, Article number: 29732 (2025)

Abstract

The global increase in dementia cases highlights the importance of early detection and intervention, particularly for individuals at risk of mild cognitive impairment (MCI), a precursor to dementia. The aim of this study is to develop and validate machine learning (ML) models based on non-imaging features to predict the risk of MCI conversion in cognitively healthy older adults over a three-year period. Using data from 845 participants aged 65 to 87 years, we built five eXtreme Gradient Boosting (XGBoost) models of increasing complexity, incorporating demographic, self-reported, medical, and cognitive variables. The models were trained and evaluated using robust preprocessing techniques, including multiple imputation for missing data, Synthetic Minority Oversampling Technique (SMOTE) for class balancing, and SHapley Additive exPlanations (SHAP) for interpretability. Model performance improved with the inclusion of cognitive assessments, with the most comprehensive model (Model 5) achieving the highest accuracy (86%) and area under the curve (AUC = 0.8359). Feature importance analysis revealed that variables such as memory tests, depressive symptoms, and age were significant predictors of MCI conversion. In addition, an online risk calculator has been developed and made available free of charge to facilitate clinical use and provide a practical, cost-effective tool for early detection in diverse healthcare settings (https://aimar-project.shinyapps.io/MCI-risk-calculator/). This study highlights the potential of non-imaging ML models for early detection of MCI and emphasizes their accessibility and clinical utility. Future research should focus on validating these models in different populations and examining their integration with personalized intervention strategies to reduce dementia risk.

Introduction

The progressive aging of the global population is a phenomenon driven by increased life expectancy and declining birth rates, leading to a demographic shift with significant implications for public health systems worldwide and the maintenance of well-being and quality of life standards. According to the United Nations (2020), the number of people aged 65 years or older is expected to double from 703 million in 2019 to over 1.5 billion in 2050¹, representing a growing portion of the population vulnerable to age-related diseases, including dementia, and in particular Alzheimer’s disease (AD). Dementia affects millions of individuals worldwide, leading to significant cognitive decline and impacting daily functioning and quality of life². It not only causes a progressive and irreversible deterioration of the patient’s mental faculties but is also the leading cause of dependency and, collaterally, represents a substantial economic, social, and personal cost. Its prevalence is expected to rise substantially, with estimates suggesting that the number of people with dementia will triple in the next 30 years, from 50 million to 152 million by 2050³. The situation is even more concerning considering that this type of dementia does not appear to be reversible once diagnosed, and there have been no truly significant scientific advances in this regard for over 20 years. However, the onset of dementia does not occur suddenly; on the contrary, the initial pathophysiological and cognitive changes associated with dementia, although subtle, can be observed up to two decades before clinical diagnosis in the absence of clearly perceptible symptoms^4,5. Therefore, there is a large window of time in which it is possible to identify individuals at risk of future dementia and then subject them to personalized prevention programs. The problem is that there are currently no accurate tools for early detection of these individuals in preclinical stages. 
It is precisely these pre-dementia stages that are of utmost importance. Implementing effective therapeutic interventions at these stages could potentially halt or even reverse disease progression⁶.

Apart from age, which is the main known risk factor for the development of dementia, potentially modifiable risk factors have been identified that show a consistent association with AD and other related dementias. The Lancet Commission on Dementia Prevention, Intervention, and Care has been studying this issue in recent years. Since its inception in 2017, the Commission has assessed evidence related to potentially modifiable dementia risk factors, synthesizing scientific data from various meta-analyses and intervention studies. The 2020 report found the potential for dementia risk reduction by addressing twelve established risk factors across the lifespan: low education, hearing loss, hypertension, smoking, obesity, depression, physical inactivity, diabetes, excessive alcohol consumption, traumatic brain injury, air pollution, and social isolation⁷. The recent 2024 report reaffirms the role of the previously identified risk factors and adds two more: midlife elevated LDL cholesterol and untreated vision loss⁸. These findings suggest that, through a combination of factors, it would be possible to identify individuals at high risk of developing dementia very early. This combination of factors could be non-linear and, therefore, not intuitive. For example, unlike in younger age groups, from the age of 85, elevated levels of hemoglobin A1c⁹, cholesterol¹⁰, or homocysteine¹¹ are associated with a lower risk of dementia. Thus, to unravel the role and actual weight of each factor in the onset of dementia, computational techniques based on artificial intelligence (AI) should be used, as they allow for handling massive data and developing dynamic algorithms that help trace personalized evolutionary trajectories¹².

Recent advancements in AI and machine learning (ML) offer promising alternatives for early dementia diagnosis using non-imaging-based features. ML has emerged as a powerful tool in medical diagnostics, offering the potential to analyze complex datasets and identify patterns that may not be apparent to human observers. In the context of dementia diagnosis, ML algorithms can be trained on large datasets containing demographic, cognitive, socioeconomic, clinical information, lifestyles, and neuroimaging to identify patterns indicative of dementia and develop predictive models. By leveraging ML, it is possible to develop diagnostic tools that are not only accurate but also cost-effective and easily integrated into clinical practice to enable early intervention in dementia¹³. In fact, in the last few years, several studies have been published that build predictive models using AI and ML to identify individuals at high risk of dementia. These risk prediction models have been developed primarily based on a combination of cognitive test scores, sociodemographic, health, lifestyle, and environmental variables, with varying levels of accuracy in distinguishing outcomes¹⁴. For example, Barnes et al. (2009) developed one of the first models to consider a late-life dementia risk index¹⁵. The model used logistic regression on 3,375 subjects and features such as demographics, cognition, physical activity, neuroimaging, social behavior, among others, to identify individuals at risk of dementia within six years. Exalto et al. (2013) reported a Cox proportional hazards model using a combination of 45 predictors from self-reported data, medication, laboratory data, and clinical histories in a cohort of 29,961 individuals with type 2 diabetes for a 10-year dementia risk¹⁶. Nori et al. (2019) used ML with a large dataset including over 10,000 features to predict 5-year dementia risk¹⁷. 
A more recent model presented a multi-diagnostic approach using structural magnetic resonance imaging (MRI) and ML to diagnose mild cognitive impairment (MCI) and AD¹⁸. The researchers used data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS), training classifiers, such as support vector machines (SVMs) and neural networks. The classifiers demonstrated high accuracy, with the “healthy controls (HC) vs. AD” classifier achieving a balanced accuracy of 90.6%.

Although the results of these models are promising, a major limitation is that clinicians may find the algorithms very difficult to apply, as their availability via online applications is quite limited. Another weakness is that it may not be possible to collect, for each patient, the large set of variables these models require, which usually includes MRI, cognitive tests, self-reported data, laboratory results, etc. A final limitation is that most ML algorithms rely on imaging techniques such as MRI and positron emission tomography (PET), which may not be readily available to physicians, especially in primary care. By contrast, non-imaging features such as demographic information, cognitive test results, clinical history, and lifestyle are often easier and less expensive to collect than imaging data, although the relationship between all of these features is not yet fully understood.

Objectives of our study

The main goal of the study was to develop ML models to predict MCI in people over 65 years of age over a three-year period. For this purpose, only non-imaging features such as demographic, clinical and cognitive variables as well as lifestyle data were analyzed. All of this data can be collected relatively easily through a short clinical interview and the use of specific cognitive tests. A total of five prediction models were developed, which included a smaller or larger number of variables depending on the ease of collecting information.

The specific objectives were:

1. To identify the most relevant features for MCI diagnosis.
2. To develop and train five ML models using these features.
3. To evaluate the performance of the models using multiple metrics and select the best hyperparameters.
4. To implement the most powerful models as an online diagnostic tool and make them available to primary and specialized healthcare professionals to assist them in data-driven clinical decision-making.

Developing an online ML diagnostic tool for MCI offers several potential benefits for physicians. First, it can improve the accessibility and affordability of early dementia diagnosis, particularly in low-resource settings. Second, it can facilitate early intervention and management, potentially slowing the progression of the disease and improving patient outcomes. Finally, it can reduce the burden on healthcare systems by enabling more efficient and cost-effective screening and diagnosis.

Methods

In this study, we adopted a robust machine learning framework that leverages the eXtreme Gradient Boosting (XGBoost) algorithm to predict conversion to MCI. As shown in Fig. 1, the methodology encompasses several critical stages, including data split, data preparation, feature engineering, model training with hyperparameter optimization, evaluation, and interpretability analysis. We built all of our machine-learning models in R v.4.2.1.

Data source

The participants of this study comprised 845 cognitively healthy individuals aged 65 to 87 years at baseline, living at home, without relevant psychiatric, neurological or systemic disorders. All of them were part of two equivalent datasets from two previously funded projects, namely “Elderly Relatives of Alzheimer’s Disease (ERAD): Influence of Subjective Cognitive Decline” and “Factors associated with healthy and pathologically aging in a sample of elderly people over 90 in the city of Madrid (MADRID + 90)”. The main goal of the two projects was both to improve the early detection and prevention of MCI and dementia and to increase knowledge about the determinants of healthy aging¹⁹.

After signing the informed consent form, all participants underwent a detailed assessment protocol, which was usually completed within four hours with appropriate breaks. The full assessment included a semi-structured clinical interview and a neurological and neuropsychological examination. Our dataset therefore contains information on a variety of variables grouped into the following categories (Supplementary Table S1): sociodemographic, self-reported from a clinical interview, medical examination, and cognitive assessment.

The participants were followed for three years. Clinical diagnoses were always agreed upon in consensus discussions between neurologists and neuropsychologists. When available, MRIs were considered to exclude the presence of macroscopic lesions or significant vascular damage. Each participant was then independently diagnosed after each visit based on their age, gender, cognitive reserve, functional information, and cognitive scores. However, the diagnoses were not based on psychometrically stable threshold values, but rather on clinical impressions. The NIA-AA criteria were used to diagnose core MCI²⁰, which included: (i) self- or informant-reported cognitive complaints; (ii) objective evidence of impairment in one or more cognitive domains; (iii) preserved independence in functional abilities; and (iv) absence of dementia. All participants were diagnosed as cognitively healthy at baseline.

Data preprocessing

First, the dataset was divided into training and testing sets (80% and 20%, respectively). Then, the essential R packages were loaded, including xgboost²¹ for model training, ParBayesianOptimization²² for hyperparameter tuning, caret²³ for data partitioning, pROC²⁴ for performance metrics, ggplot2²⁵ for visualization, and additional packages such as mice²⁶, smotefamily, and ROSE²⁷ for data preprocessing and balancing the training dataset. A seed value of 42 was set to ensure that our results were reproducible across different runs.
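
The split described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function name stratified_split and the toy class counts are hypothetical, and stratifying by class is an assumption, since the text only states the 80/20 ratio and the seed.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 845 participants, 120 converters (illustrative counts only).
labels = ["Yes"] * 120 + ["No"] * 725
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting before any imputation or oversampling keeps the test set untouched by steps that learn from the data.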

The dataset included various features along with the target variable, mci_conversion, which was transformed into a factor with levels “No” and “Yes” to represent non-conversion and conversion to MCI, respectively. The features were cleaned and prepared, which included identifying numerical and categorical variables. We then conducted a preliminary analysis of the baseline data to characterize their distribution and to examine the nature and distribution of missing values. Next, we performed a multiple imputation procedure under a fully conditional specification method to impute values that are as close as possible to the ideal predicted observations. These imputed values were generated based on existing variables. The imputation procedure replaced each missing observation with a set of plausible values representing the uncertainty about the appropriate value to be imputed. The process was repeated five times, yielding five imputed data sets whose coefficients varied from set to set. The imputed data sets were analyzed using the usual full data procedure. Distributions of imputed and observed values were compared using density plots, showing similar patterns across all variables. Convergence was monitored across 20 iterations and confirmed using a convergence threshold of 1e-4. In addition, a sensitivity analysis comparing results across different imputation models showed comparable conclusions, supporting the robustness of our approach. Finally, the results of these analyses were combined to draw valid statistical inferences from the data. The mice package for R was used to perform the multiple imputation procedure²⁶. No normalization of variables was applied because the tree models in the various experiments did not show significant improvements when these transformations were added to the pipeline.

Selection of variables

A two-step process was used to select the variables that were finally included in the prediction models. First, a set of variables was selected if scientific evidence and expert knowledge gave them a relevant role in the diagnosis of MCI (e.g. age, education, SCD, etc.). Second, a permutational feature importance method was applied: two random variables were generated and added to the original dataset, the model was trained, and the feature importance was calculated. Variables were selected if their feature importance exceeded that of at least one of the added random variables.
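
The second step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the absolute Pearson correlation stands in for the model-based importance they computed, and the names abs_corr, select_with_probes, and the toy columns are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return abs(sxy) / (sxx * syy) ** 0.5


def select_with_probes(features, target, importance=abs_corr, n_probes=2, seed=42):
    """Keep features whose importance exceeds at least one random probe."""
    rng = random.Random(seed)
    n = len(target)
    probes = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_probes)]
    # Beating at least one probe is the same as beating the weakest probe.
    bar = min(importance(probe, target) for probe in probes)
    return [name for name, column in features.items()
            if importance(column, target) > bar]


# Toy data: one informative column and one constant column (both hypothetical).
n = 200
target = [i % 2 for i in range(n)]
features = {
    "memory_score": [10 - 3 * t + 0.1 * (i % 5) for i, t in enumerate(target)],
    "constant": [1.0] * n,
}
selected = select_with_probes(features, target)
print(selected)
```

The random probes give a data-driven noise floor: a feature that cannot outperform pure noise is unlikely to help the model.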

Class imbalance

Our dataset exhibited class imbalance, with a higher proportion of cognitively healthy stable individuals than MCI converters. To address this, we applied the Synthetic Minority Oversampling Technique (SMOTE), a widely used machine learning method, to the training set. SMOTE generates synthetic samples for the minority class to balance the class distribution and thereby improve classifier performance. Because each new sample is created by interpolating between a minority-class observation and one of its nearest neighbors, rather than by duplicating existing observations, the algorithm yields new, unique samples and reduces the risk of overfitting.
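
The core idea of SMOTE can be sketched in a few lines of Python. This is a simplified illustration, not the smotefamily implementation the authors used: the function name smote, the value of k, and the toy points are assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=42):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class with two numeric features (values are illustrative).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2], [1.8, 2.1], [1.4, 2.4]]
synthetic = smote(minority, n_new=10, k=3)
print(len(synthetic))
```

Oversampling is applied to the training set only, so that synthetic samples never leak into the data used for evaluation.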

Model training

After experimenting with several machine-learning algorithms (i.e., random forest, logistic regression, and gradient boosting), we chose XGBoost, a high-performance supervised learning algorithm based on decision trees, designed to optimize both speed and performance. It is based on the principle of boosting, a technique that combines multiple weak models (in this case, decision trees) to create a stronger model²¹. The key idea is to sequentially add trees to the model, where each new tree aims to correct the errors made by the previous trees. This iterative process continues until a specified number of trees has been added or the model’s performance stops improving. Previous literature has found that, compared to other gradient-boosting algorithms, XGBoost provides an optimal balance between accuracy and training speed²⁸.
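
The sequential error-correction idea behind boosting can be shown with a deliberately simplified Python sketch. It uses one-split regression trees (stumps) and squared error on a single feature; XGBoost itself adds regularization, second-order gradient information, and many engineering optimizations that are not reproduced here. All names and data are hypothetical.

```python
def fit_stump(x, residual):
    """Fit the best single-split regression stump to the current residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residual) if xi <= threshold]
        right = [r for xi, r in zip(x, residual) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_trees=50, learning_rate=0.3):
    """Add stumps one at a time, each fitted to the errors left so far."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees, losses = [], []
    for _ in range(n_trees):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(x, residual)
        trees.append(tree)
        pred = [pi + learning_rate * tree(xi) for xi, pi in zip(x, pred)]
        losses.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))

    def predict(xi):
        return base + learning_rate * sum(tree(xi) for tree in trees)

    return predict, losses


# Toy one-dimensional data (illustrative only).
x = list(range(10))
y = [0, 0, 0, 1, 1, 1, 3, 3, 3, 3]
predict, losses = boost(x, y)
print(losses[0], losses[-1])
```

The training loss shrinks as trees are added, which is why a stopping rule based on held-out data is needed to avoid fitting noise.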

We built five XGBoost models with the goal of predicting MCI conversion over a three-year period. The XGBoost classifier had two potential classes eligible for its output: stable cognitively healthy individuals and MCI converters. The reasons for developing the five prediction models, which incorporate increasing complexity and a larger number of variables, are described in detail below. Initially it was considered appropriate to design predictive models based on the potential availability of necessary information. Some features can be easily and directly obtained from a basic clinical interview conducted by any healthcare professional. However, other features require more effort and a certain level of clinical knowledge, as they require either a short medical examination or the application of specific cognitive tests. By integrating all these possibilities, the development of the following five models was deemed appropriate:

Model 1: Demographics and self-reported features.
Model 2: Demographics, self-reported and medical features.
Model 3: Demographics, self-reported and cognitive screening features.
Model 4: Demographics, self-reported, medical and cognitive screening features.
Model 5: Demographics, self-reported, medical, cognitive screening and further cognitive assessment features.
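
The five model definitions above amount to a mapping from each model to the feature groups it uses. A minimal Python sketch of that mapping follows; the column names in GROUP_COLUMNS are hypothetical placeholders and do not correspond to the study's actual variables.

```python
# Feature groups used by each model, as listed in the text.
MODEL_GROUPS = {
    "Model 1": ["demographic", "self_reported"],
    "Model 2": ["demographic", "self_reported", "medical"],
    "Model 3": ["demographic", "self_reported", "cognitive_screening"],
    "Model 4": ["demographic", "self_reported", "medical", "cognitive_screening"],
    "Model 5": ["demographic", "self_reported", "medical",
                "cognitive_screening", "further_cognitive"],
}

# Hypothetical placeholder columns; the real variables are listed in Table 2.
GROUP_COLUMNS = {
    "demographic": ["demo_1", "demo_2"],
    "self_reported": ["self_1", "self_2", "self_3"],
    "medical": ["med_1"],
    "cognitive_screening": ["screen_1", "screen_2"],
    "further_cognitive": ["cog_1", "cog_2"],
}


def columns_for(model_name):
    """Return the flat list of columns a given model is trained on."""
    return [column
            for group in MODEL_GROUPS[model_name]
            for column in GROUP_COLUMNS[group]]


for name in MODEL_GROUPS:
    print(name, len(columns_for(name)))
```

Keeping the groups in one place makes it easy to train every model with the same pipeline and change only the column subset.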

Hyperparameter optimization and cross-validation

Hyperparameter optimization plays a crucial role in machine learning model development, particularly for complex algorithms like XGBoost, which contains over thirty configurable hyperparameters. These parameters significantly influence model performance and must be carefully tuned to account for the unique characteristics of each dataset. Our study implemented a comprehensive optimization and validation strategy that integrated Bayesian optimization with robust cross-validation methodology.

The optimization process was conducted using the ParBayesianOptimization package, with an objective function designed to maximize the area under the receiver operating characteristic curve (AUC). The process began with an initial data split of 80–20 for training and testing sets, respectively. Within the training phase, we implemented 5-fold cross-validation to ensure robust performance estimation during hyperparameter optimization. Our optimization strategy focused on six key hyperparameters selected for their significant impact on model complexity and generalization ability: maximum depth (ranging from 3 to 10), learning rate (between 0.01 and 0.3), gamma (from 0 to 0.2), column sampling ratio (between 0.6 and 1.0), minimum child weight (from 1 to 10), and subsample ratios (between 0.6 and 1.0).

The Bayesian optimization process consisted of 20 total iterations, beginning with an exploration phase of 10 random configurations to establish a baseline understanding of the hyperparameter space. The subsequent optimization phase was guided by the “expected improvement” acquisition function, systematically exploring promising regions of the parameter space. The objective function implementation used ‘binary:logistic’ with AUC as the evaluation metric, returning the maximum AUC score achieved during cross-validation for each parameter combination evaluated.
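
The search space and the initial exploration phase can be sketched in Python as follows. Only the random exploration step is shown; the Bayesian phase with the expected-improvement acquisition function is not implemented. The parameter keys are assumptions mapped onto common XGBoost names (for instance, colsample_bytree for the column sampling ratio), and treating only max_depth as an integer is also an assumption.

```python
import random

# Ranges taken from the text; key names are assumed XGBoost-style identifiers.
SEARCH_SPACE = {
    "max_depth": (3, 10),
    "eta": (0.01, 0.3),
    "gamma": (0.0, 0.2),
    "colsample_bytree": (0.6, 1.0),
    "min_child_weight": (1.0, 10.0),
    "subsample": (0.6, 1.0),
}
INTEGER_PARAMS = {"max_depth"}


def sample_config(rng):
    """Draw one configuration uniformly at random from the search space."""
    config = {}
    for name, (low, high) in SEARCH_SPACE.items():
        if name in INTEGER_PARAMS:
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            config[name] = rng.uniform(low, high)
    return config


rng = random.Random(42)
initial_configs = [sample_config(rng) for _ in range(10)]  # exploration phase
print(initial_configs[0])
```

In the full procedure, each configuration would be scored by cross-validated AUC and the scores would guide where the optimizer samples next.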

To ensure model robustness and prevent overfitting, several safeguards were implemented. First, an early stopping mechanism halted training if no improvement was observed after 10 rounds. Second, parameter validation checks returned -Inf for invalid configurations, steering the optimization toward feasible solutions. Third, bootstrap validation (n = 1000) was used to calculate 95% confidence intervals for all performance metrics. Fourth, cross-validation was applied throughout the optimization process to ensure reliable performance estimation. This integrated approach to hyperparameter optimization and validation was designed to deliver both strong model performance and reliable generalization capability. The combination of Bayesian optimization with rigorous cross-validation provided a robust framework for model development while maintaining computational efficiency.
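
The bootstrap step can be illustrated with a short Python sketch. It assumes a percentile bootstrap over test-set cases, which the text does not specify; the function names and the toy predictions are hypothetical and are not the study's data.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for a performance metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy test set: 169 cases, 146 classified correctly (illustrative only).
y_true = ["No"] * 169
y_pred = ["No"] * 146 + ["Yes"] * 23
lower, upper = bootstrap_ci(y_true, y_pred, accuracy)
print(round(lower, 3), round(upper, 3))
```

The same function works for any metric that takes true and predicted labels, so one resampling loop can serve all reported measures.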

Final model training and evaluation

Upon identifying the optimal hyperparameters, the final XGBoost model was trained. The model was configured using binary logistic loss as the objective function for training. This loss function transforms the model’s outputs through a logistic function to produce probabilities between 0 and 1, making it appropriate for our binary classification task of predicting MCI conversion. Discrimination was evaluated using the AUC metric, which measures the model’s ability to distinguish between the two classes. The model was configured with a maximum limit of 100 boosting rounds, with the actual number of iterations determined dynamically by the model’s performance on the validation set. Predictions on the test set were generated, and a threshold of 0.5 was applied to classify predictions as “Yes” or “No”. Classification performance was then evaluated using a confusion matrix to compare predicted classifications against actual outcomes. Key performance metrics, including accuracy, precision, recall, F1 score, and AUC, were calculated to provide a comprehensive evaluation of the model’s predictive power. The ROC curve, plotted using the ggplot2 package, visualized the trade-off between true positive rate (sensitivity) and false positive rate (1-specificity).
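
The evaluation step can be made concrete with a small Python sketch that applies the 0.5 threshold, builds the confusion matrix counts, and derives the reported metrics. Treating “Yes” as the positive class and using a greater-than-or-equal comparison at the threshold are assumptions, and the toy labels and probabilities are illustrative.

```python
def confusion_counts(y_true, y_prob, threshold=0.5, positive="Yes"):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        actual_positive = label == positive
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, y_prob, positive="Yes"):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [p for y, p in zip(y_true, y_prob) if y == positive]
    neg = [p for y, p in zip(y_true, y_prob) if y != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only).
y_true = ["Yes", "No", "Yes", "No", "No"]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.1]
metrics = summary_metrics(*confusion_counts(y_true, y_prob))
model_auc = auc(y_true, y_prob)
print(metrics, model_auc)
```

Unlike accuracy, precision, and recall, the AUC does not depend on the chosen threshold, which is why the two kinds of measure can move in different directions.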

Feature importance and SHAP analysis

XGBoost classifiers also allow for individual input variable importance to be evaluated, providing insights into which features had the most significant impact on model predictions. To further enhance model interpretability, we conducted feature selection using SHapley Additive exPlanations (SHAP)²⁹, a unified, model-agnostic framework for evaluating variable importance at both the model and individual prediction levels. SHAP values, which offer a detailed view of how individual features contribute to each prediction, were calculated using the SHAPforxgboost package³⁰. The SHAP summary plot depicted the distribution of SHAP values across all features, highlighting those with the most substantial impact on predictions. This analysis not only aids in validating the model’s decisions with domain knowledge but also enhances transparency and trust in the model’s predictions.
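
To give an intuition for what a SHAP value is, the following Python sketch computes exact Shapley values for a tiny model by enumerating every feature ordering against a single reference point. This brute-force approach is only feasible for a handful of features and is not the tree-specific algorithm used by SHAPforxgboost; the toy scoring function, its weights, and the input values are all hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Each feature's value is its average marginal contribution when features
    are switched from the baseline to x in every possible order.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = x[j]
            updated = model(current)
            phi[j] += updated - previous
            previous = updated
    return [value / len(orders) for value in phi]


def toy_risk(features):
    """Hypothetical linear risk score over (age, depression score, recall)."""
    age, depression, recall = features
    return 0.03 * age + 0.10 * depression - 0.08 * recall


x = [80, 8, 15]          # individual being explained (illustrative)
baseline = [70, 2, 25]   # reference individual (illustrative)
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

The contributions always sum to the difference between the prediction for the individual and the prediction for the reference point, which is the additivity property that makes SHAP summary plots interpretable.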

Ethics approval

Ethical approval for the ERAD and MADRID + 90 studies was granted by the Research Ethics Committee of the Hospital Universitario Clínico San Carlos (Madrid, Spain) and the Ethics Committee of the Carlos III Institute of Health (Madrid, Spain), respectively. Written informed consent was obtained from all participants. The authors confirm that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the 1964 Helsinki Declaration and its later amendments. All methods were performed in accordance with the relevant guidelines and regulations.

Results

The study commenced with an examination of baseline characteristics between cognitively stable individuals and those who eventually converted to MCI over time (14.2% of the sample). As depicted in Table 1, the baseline characteristics were analyzed using t-tests for continuous variables and chi-square tests for categorical variables. These assessments provided foundational insights into the demographic and clinical attributes differentiating the two groups at the study’s onset.

Table 1 Baseline characteristics of the sample for cognitively healthy individuals and MCI converters over time.

Preprocessing and selection of variables

The original dataset comprised 219 variables, necessitating substantial preprocessing and cleaning to enhance analytical clarity and model efficiency. A total of 148 variables were excluded due to a high incidence of missing values, redundancy, or lack of informativeness. This reduction process resulted in a refined dataset containing 71 variables. These underwent multiple imputation to handle missing data.

An important step in preprocessing was the application of the permutational feature importance method, shown in Fig. 2. This technique prioritized features based on their predictive power and revealed cognitive variables, particularly immediate free recall on the Free and Cued Selective Reminding Test (FCSRT), as very informative. Integration of demographic, self-reported, and medical parameters was also found to be crucial in predicting MCI risk. Expert knowledge and these analytical insights guided the selection of 29 features for building the predictive models, as detailed in Table 2. In the table, these features are categorized into demographic, self-reported, medical, cognitive screening and further cognitive assessment groups. Model 1, the simplest, uses 13 features from the demographic and self-reported categories. In contrast, Model 5, the most complex, includes all 29 features and integrates demographic, self-reported, medical, and complete cognitive data.

Table 2 List of features used to build the five predictive models grouped by their typology.

Model performance

The study developed five predictive models, each with different levels of complexity, to predict the conversion to MCI. To this end, we introduced a robust machine learning framework using XGBoost classifiers. The performance of each model was improved through careful hyperparameter tuning. The process included optimization of parameters such as maximum depth, learning rate (eta), gamma, column sampling ratio, minimum child weight, and subsample ratios. These parameters were fine-tuned using Bayesian optimization, which efficiently explored the hyperparameter space to find the optimal settings for each model configuration. Table 3 shows the best combination of hyperparameters for each model.

Table 3 Best combination of hyperparameters by each model.

As shown in Table 4, Model 1 achieved 82.25% accuracy, 93.10% recall, 87.10% precision, an 89.99% F1-score, and an AUC of 0.6687, indicating solid baseline performance. Model 2, which included additional medical parameters, showed slight improvements in accuracy (82.30%), precision (87.58%), and AUC (0.7293), with a recall of 92.41% and an F1-score of 89.90%, reflecting improved discrimination ability. Model 3, which included cognitive screening, showed an accuracy of 82.35%, recall of 89.66%, precision of 89.66%, F1-score of 89.66%, and AUC of 0.7682, highlighting the crucial influence of cognitive features. Model 4, which included medical and cognitive screening measures, showed better overall performance, with an accuracy of 82.84%, a recall of 86.84%, precision of 93.62%, F1-score of 89.93%, and AUC of 0.7744. Finally, Model 5, the most comprehensive, achieved the highest accuracy (86.39%), precision (94.48%), F1-score (92.22%), and AUC (0.8359), with a recall of 90.13%, demonstrating superior balance and robustness in predictive performance by integrating a full spectrum of demographic, medical, and cognitive data. Confusion matrices and ROC curves for all models are shown in Supplementary Fig. S1.

Table 4 Predictive performance of the five models for MCI prediction.

Model validation and overfitting assessment

To assess how well our models generalize, we analyzed training and validation learning curves. Misclassification error on the training set was low for small sample sizes and then stabilized, while validation performance improved as the sample size increased, reaching final validation errors between 0.10 and 0.15. This pattern indicates that our models maintain their predictive power when applied to new, unseen data. Detailed visualizations and metrics are provided in Supplementary Fig. S2.

Feature importance and SHAP values

The SHAP value analysis revealed distinct patterns of feature importance across five increasingly complex models. As shown in Fig. 3, GDS emerged as the dominant predictor across Models 1–4, demonstrating consistent and robust influence in MCI prediction. Age, SCD, and family history of AD also maintained notable relevance across different model configurations. Clinical self-reported parameters demonstrated model-specific importance (e.g. hyperlipidemia in Model 1, hypertension in Model 2, diabetes in Model 4), but their influence lacked consistency across different model configurations.

In Model 1, beyond the prominence of GDS, hyperlipidemia made a significant contribution among self-reported factors. In Model 2, which incorporated medical variables, right-hand tapping emerged as the most influential medical variable. Hypertension also made a notable contribution in this model configuration. However, the introduction of cognitive screening variables in Model 3 marked a significant inflection point. MMSE and the semantic fluency test (animals) positioned themselves as crucial predictors alongside GDS, while previously important variables assumed secondary roles. This pattern persisted in Model 4, where diabetes emerged as a relevant clinical factor. Model 5, the most comprehensive, revealed a significant shift in predictor hierarchy. For the first time, immediate free episodic memory performance surpassed GDS as the most important predictor. Information processing speed also emerged as a crucial predictor. Notably, even in this more complex model, both GDS and family history of AD maintained their relevance as important predictors.

Discussion

This study implements an advanced machine learning framework utilizing the XGBoost algorithm to predict the conversion to MCI in cognitively healthy older adults. Five predictive models were developed, each progressively adding complexity by incorporating demographic, medical, and cognitive information. The methodology addresses challenges such as class imbalance and missing values by employing modern techniques like SMOTE and multiple imputation. The results highlight that the multimodal approach and the use of tools like SHAP for interpretability significantly improve predictive accuracy. An original aspect of this work is the provision of a free online calculator (https://aimar-project.shinyapps.io/MCI-risk-calculator/), enabling interested readers to apply the model in practice (instructions and more details about the calculator can be found in the Supplementary Note). This study represents progress toward the early detection of MCI and the personalization of preventive interventions.
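For readers unfamiliar with SMOTE, the core idea can be sketched in a few lines of Python: each synthetic minority case is placed at a random point on the segment between a real minority case and one of its nearest minority neighbours. This is a simplified illustration of the general technique, not the authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions made for the example.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between neighbours.

    minority: list of equal-length numeric tuples (minority-class cases only).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base case, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + u * (q - b) for b, q in zip(base, neighbour)))
    return synthetic

# Toy minority class: four cases with two features each.
minority_cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_cases = smote(minority_cases, n_synthetic=6, k=2, seed=42)
for case in new_cases:
    print(tuple(round(v, 3) for v in case))
```

Oversampling of this kind is normally applied to the training folds only, so that synthetic cases never appear in the data used to evaluate the model.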

Main findings

In terms of performance, Models 1 and 2 demonstrated solid baseline accuracy using demographic, self-reported, and basic medical data that practitioners can acquire easily. The incorporation of cognitive screening measures in Models 3 and 4, and of more complex cognitive tests in Model 5, led to significant improvements in prediction accuracy. Thus, our results highlight the importance of integrating a wide range of data types. Importantly, supplementary analyses showed no evidence of overfitting in any of the five models.

Distinct patterns of feature importance across models were revealed. GDS emerged as the dominant predictor across Models 1–4, while immediate free episodic memory performance became the most influential factor in Model 5. This progression highlights how the integration of increasingly sophisticated cognitive assessments enhances predictive accuracy while maintaining the relevance of affective symptoms. Although demographic and self-reported information provide a useful baseline, our comprehensive models that integrate cognitive assessments demonstrated superior prognostic utility in identifying individuals at risk for MCI. This finding aligns with the broader literature emphasizing the importance of cognitive performance in predicting MCI³⁸,³⁹. In our study, cognitive variables, particularly memory tests and digit-symbol substitution tasks, were identified as the most critical predictors in the most complex model.

The consistent importance of GDS across models, coupled with the emergence of memory performance as the leading predictor in Model 5, suggests that both affective and cognitive domains play crucial roles in MCI prediction. Notably, family history of AD maintained its relevance as an important predictor even in the most comprehensive model, highlighting the value of considering genetic risk factors. Moreover, while clinical parameters showed model-specific importance (e.g., hyperlipidemia, hypertension, diabetes), their influence varied across different model configurations.

Variable importance and clinical implications

Our SHAP analysis revealed a hierarchical pattern of predictive factors that provides valuable insights for clinical practice. The basic model identified three critical predictors that should trigger initial clinical attention: GDS (with scores ≥ 5 warranting special attention), age (particularly significant for patients over 70 years), and the presence of cognitive complaints, especially when these factors appear in combination. Family history of AD also maintained consistent importance across models, suggesting its value as an early screening indicator.

Building upon these foundational indicators, our analysis revealed that certain self-reported clinical parameters showed model-specific importance (hyperlipidemia in Model 1, hypertension in Model 2, diabetes in Model 4). Right-hand tapping emerged as the most relevant medical variable in Model 2. These factors should be evaluated as secondary indicators, especially in patients already showing basic risk factors.

Our more sophisticated Models 3–5 highlighted specific cognitive patterns that deserve particular attention in clinical settings. The introduction of cognitive screening measures (MMSE and animals) in Model 3 marked a significant shift in predictive patterns. In Model 5, the most comprehensive, immediate free episodic memory performance emerged as the strongest predictor, surpassing GDS, while information processing speed also showed crucial predictive value.

Based on these findings, we propose a stepwise evaluation approach for practitioners: (i) Initial screening focusing on depressive symptoms, advanced age, and positive family history of AD; (ii) Secondary assessment incorporating specific clinical parameters based on individual risk profiles; and (iii) Comprehensive cognitive evaluation, with particular emphasis on episodic memory and processing speed (i.e. FCSRT and digit-symbol scores below normative ranges). Moreover, it would be advisable to continuously monitor patients presenting multiple risk factors, especially those showing high-risk combinations identified in Models 3–5. This structured approach emphasizes the importance of both affective symptoms and cognitive performance in MCI prediction, while acknowledging the role of demographic and clinical factors in creating a comprehensive risk profile.
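The stepwise approach can be made concrete with a small Python sketch that groups a patient's risk indicators into the three tiers described above. It is purely illustrative and is not a validated clinical rule or the logic of the online calculator. The `Patient` fields and the `risk_profile` function are hypothetical, the GDS ≥ 5 and age > 70 cut-offs are the ones mentioned earlier in this Discussion, and treating "multiple risk factors" as two or more flags is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patient:
    age: int
    gds: int                                  # Geriatric Depression Scale score
    family_history_ad: bool
    hyperlipidemia: bool = False
    hypertension: bool = False
    diabetes: bool = False
    memory_below_norm: Optional[bool] = None  # None means not yet tested
    speed_below_norm: Optional[bool] = None   # None means not yet tested

def risk_profile(p):
    """Group raised indicators by evaluation tier; flag patients for monitoring."""
    tiers = {"initial": [], "secondary": [], "cognitive": []}
    # (i) Initial screening
    if p.gds >= 5:
        tiers["initial"].append("depressive symptoms (GDS >= 5)")
    if p.age > 70:
        tiers["initial"].append("age over 70")
    if p.family_history_ad:
        tiers["initial"].append("family history of AD")
    # (ii) Secondary clinical parameters
    for name in ("hyperlipidemia", "hypertension", "diabetes"):
        if getattr(p, name):
            tiers["secondary"].append(name)
    # (iii) Comprehensive cognitive evaluation, when available
    if p.memory_below_norm:
        tiers["cognitive"].append("episodic memory below normative range")
    if p.speed_below_norm:
        tiers["cognitive"].append("processing speed below normative range")
    # Assumption: "multiple risk factors" means two or more raised flags.
    monitor = sum(len(flags) for flags in tiers.values()) >= 2
    return tiers, monitor

example = Patient(age=74, gds=6, family_history_ad=False, hypertension=True)
profile, needs_monitoring = risk_profile(example)
print(profile)
print("continuous monitoring suggested:", needs_monitoring)
```

Keeping untested cognitive measures as `None` rather than `False` distinguishes "not assessed" from "assessed and normal", which matters when the simpler models are used in settings without cognitive testing.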

Clinical utility and accessibility

The findings of this study highlight the critical importance of early detection and targeted prevention strategies in addressing the growing burden of MCI and dementia. The successful implementation of an online risk calculator provides a practical solution for clinicians. By relying exclusively on non-imaging features, the calculator overcomes the cost, accessibility, and infrastructure barriers typically associated with neuroimaging-based approaches. In particular, the availability of simpler models (e.g., Model 1) broadens the accessibility of these tools to primary care settings, where cognitive assessments or specialized medical equipment may not be readily available. However, the incorporation of cognitive variables in Models 3–5 significantly enhances predictive accuracy, making them more suitable for specialized settings or when cognitive testing can be performed. These findings suggest that the tool’s flexibility allows it to be adapted to different healthcare contexts, from resource-limited environments to advanced clinical facilities.

Ethical and data protection implications

The implementation of machine learning models for predicting cognitive decline raises critical ethical considerations that extend beyond technical performance. While our models demonstrate robust predictive accuracy and stable validation performance, this high accuracy carries significant responsibility in clinical practice. The psychological impact of early MCI diagnosis demands careful handling of both positive and negative predictions, necessitating clear protocols for result communication and appropriate support systems. Furthermore, ensuring fairness across different demographic groups is paramount, requiring regular validation to prevent potential healthcare disparities and maintain trust in clinical settings.

From a data protection perspective, the implementation of these predictive models demands a comprehensive framework that addresses both privacy concerns and practical clinical integration. This includes robust security protocols for data transmission and storage, strict adherence to GDPR and HIPAA requirements, and clear procedures for data retention and deletion. The successful integration of these models in clinical practice requires maintaining a balance between algorithmic predictions and human oversight, ensuring that AI augments rather than replaces clinical judgment. Additionally, healthcare providers must be equipped with proper training and support systems to effectively communicate AI-derived predictions while maintaining transparency about the model’s capabilities and limitations. This approach ensures that technology serves to enhance patient care while protecting individual privacy and maintaining ethical standards in medical practice.

Limitations

While the results of this study are promising, several limitations must be acknowledged. First, the study cohort was derived from two datasets in Madrid, Spain, which may limit the generalizability of the findings to other populations. Validation in diverse cohorts and geographic regions is essential to ensure broader applicability. To address this geographical limitation, we are planning a validation strategy with two parallel approaches: first, external retrospective validation using established international longitudinal databases (such as the National Alzheimer’s Coordinating Center (NACC) and the Survey of Health, Ageing and Retirement in Europe (SHARE)) that contain variables equivalent to those in our study; and second, prospective clinical validation using patient data from multiple Spanish medical centers. This dual approach will enable both model refinement through the incorporation of new data and verification of generalizability across different geographical and demographic contexts.

Second, while our models demonstrate strong predictive performance within a three-year window, we acknowledge that this follow-up period may not fully capture the progression of cognitive decline, particularly in individuals with slower disease trajectories. Longer follow-up periods could provide valuable insights into different progression patterns and potentially improve the calibration and predictive performance of our models. This limitation highlights the importance of future longitudinal studies with extended follow-up periods to validate and refine these predictive tools.

Finally, while our models incorporating cognitive assessments demonstrate superior predictive performance, we acknowledge the potential implementation challenges these assessments may present in various clinical settings. To address this limitation, we have developed a flexible, multi-tiered approach that accommodates different levels of assessment capability. For settings where comprehensive cognitive testing may not be feasible, such as primary care, our simpler models (Models 1 and 2) provide reliable predictive performance using readily available clinical and demographic data. In any event, several emerging alternatives could complement or enhance traditional cognitive assessments in future iterations of these models. Digital physiological markers from wearable devices have shown promising correlations with cognitive function and offer the advantage of continuous, passive monitoring without requiring specialized testing environments. The incorporation of cognitive reserve proxies, such as educational attainment and occupational history, could also provide valuable predictive information without direct cognitive testing.

Broader implications and future research

The comparative analysis across our five models reveals two key insights. First, cognitive features were found to be the dominant predictors, particularly those assessing memory, which emphasizes their critical role in predicting MCI conversion. Second, there is also an interplay of demographic, self-reported, and medical factors, particularly in simpler models. This interplay suggests that while cognitive decline is central to MCI, other factors such as age, depression, cognitive complaints, family history of AD, and cardiovascular health significantly contribute to risk stratification.

According to Livingston et al. (2024), 45% of future dementia cases could potentially be prevented if the 14 risk factors identified by the Lancet Commission on Dementia Prevention, Intervention, and Care were eliminated⁸. This emphasizes the need to integrate machine learning frameworks, such as the one developed in this study, into public health initiatives aimed at reducing modifiable risk factors. Future research should explore how predictive models can be combined with personalized interventions to mitigate these risks at an individual level. Furthermore, the development of user-friendly tools, such as the online calculator proposed in this work, will facilitate the practical application of advanced algorithms, empowering healthcare professionals and policymakers to make data-driven decisions. Expanding the dataset to include diverse populations and longitudinal outcomes beyond three years could enhance the generalizability and robustness of predictive models. Extended follow-up studies could enable the development of more sophisticated time-dependent models capable of predicting cognitive decline across various timeframes. Such models could potentially be calibrated to provide predictions for different temporal horizons (e.g., 3-year, 5-year, and 10-year risk), offering clinicians more comprehensive prognostic information. Additionally, longer follow-up periods would allow for the development of dynamic prediction models that could be updated as new patient data becomes available, potentially improving the accuracy and clinical utility of the predictions over time. Finally, interdisciplinary collaboration between clinicians, researchers, and technology developers is essential to ensure that these innovations translate effectively into real-world applications, fostering a global effort to reduce the incidence of dementia.

Conclusion

The early diagnosis of dementia, particularly AD, is critical for effective intervention and management. Traditional diagnostic methods, while effective, have several limitations that make them less feasible for widespread use. ML offers a promising alternative, with the potential to yield accurate and interpretable diagnostic tools that are accessible online and rely on non-imaging features. This study aims to address the gaps in current research by developing and validating a non-imaging-based diagnostic tool for dementia that represents a significant step toward early detection and personalized prevention of cognitive decline. With further validation and refinement, this tool could play a pivotal role in mitigating the growing global burden of dementia.

Data availability

The datasets generated during the current study are not publicly available due to privacy restrictions but are available from the corresponding author on reasonable request.

References

1. United Nations Department of Economic and Social Affairs. World Population Ageing 2019. United Nations, Department of Economic and Social Affairs, Population Division (2020). https://doi.org/10.18356/6A8968EF-EN
2. Better, M. A. Alzheimer’s disease facts and figures. Alzheimers Dement. 19, 1598–1695 (2023).
3. World Health Organization. Global Action Plan on the Public Health Response To Dementia 2017–2025. Geneva: World Health Organization (World Health Organization, 2017).
4. Hampel, H., Lista, S. & Khachaturian, Z. S. Development of biomarkers to chart all Alzheimer’s disease stages: The Royal road to cutting the therapeutic gordian knot. Alzheimers Dement. 8, 312–336 (2012).
5. Porsteinsson, A. P., Isaacson, R. S., Knox, S., Sabbagh, M. N. & Rubino, I. Diagnosis of early Alzheimer’s disease: Clinical practice in 2021. J. Prev. Alzheimers Dis. 8, 371–386 (2021).
6. Sanz-Blasco, R. et al. Transition from mild cognitive impairment to normal cognition: Determining the predictors of reversion with multi-state Markov models. Alzheimers Dement. 18, 1177–1185 (2022).
7. Livingston, G. et al. Dementia prevention, intervention, and care. Lancet 390, 2673–2734 (2017).
8. Livingston, G. et al. Dementia prevention, intervention, and care: 2024 report of the lancet standing commission. Lancet 404, 572–628 (2024).
9. Huang, J. et al. Haemoglobin A(1c) and cognitive function in very old, cognitively intact men. Age Ageing 41, 125–128 (2012).
10. West, R. et al. Better memory functioning associated with higher total and low-density lipoprotein cholesterol levels in very elderly subjects without the Apolipoprotein e4 allele. Am. J. Geriatr. Psychiatry 16, 781–785 (2008).
11. West, R. K. et al. Homocysteine and cognitive function in very elderly nondemented subjects. Am. J. Geriatr. Psychiatry 19, 673–677 (2011).
12. Gómez-Ramírez, J., Ávila-Villanueva, M. & Fernández-Blázquez, M. Á. Selecting the most important self-assessed features for predicting conversion to mild cognitive impairment with random forest and permutation-based methods. Sci. Rep. 10 (2020).
13. Wang, H. et al. Develop a diagnostic tool for dementia using machine learning and non-imaging features. Front. Aging Neurosci. 14 (2022).
14. Stephan, B. C. M., Kurth, T., Matthews, F. E., Brayne, C. & Dufouil, C. Dementia risk prediction in the population: Are screening models accurate? Nat. Rev. Neurol. 6, 318–326 (2010).
15. Barnes, D. E. et al. Predicting risk of dementia in older adults: The late-life dementia risk index. Neurology 73, 173–179 (2009).
16. Exalto, L. G. et al. Risk score for prediction of 10 year dementia risk in individuals with type 2 diabetes: A cohort study. Lancet Diabetes Endocrinol. 1, 183–190 (2013).
17. Nori, V. S., Hane, C. A., Martin, D. C., Kravetz, A. D. & Sanghavi, D. M. Identifying incident dementia by applying machine learning to a very large administrative claims dataset. PLoS One 14 (2019).
18. Diogo, V. S., Ferreira, H. A. & Prata, D. Early diagnosis of Alzheimer’s disease using machine learning: A multi-diagnostic, generalizable approach. Alzheimers Res. Ther. 14 (2022).
19. Fernández-Blázquez, M. A. et al. MADRID + 90 study on factors associated with longevity: Study design and preliminary data. PLoS One 16 (2021).
20. Albert, M. S. et al. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 270–279 (2011).
21. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Dis 13–17-August-2016, 785–794 (2016).
22. Snoek, J., Larochelle, H. & Adams, R. P. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst. 4, 2951–2959 (2012).
23. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
24. Robin, X. et al. pROC: An open-source package for R and S + to analyze and compare ROC curves. BMC Bioinform. 12 (2011).
25. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).
26. van Buuren, S. & Groothuis-Oudshoorn, K. Mice: Multivariate imputation by chained equations in R. J. Stat. Softw. 45, 1–67 (2011).
27. Lunardon, N., Menardi, G. & Torelli, N. ROSE: A package for binary imbalanced learning. R J. 6, 79–89 (2014).
28. Bentéjac, C., Csörgő, A. & Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 54, 1937–1967 (2021).
29. Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 4766–4775 (2017).
30. Yang, L. & Just, A. SHAPforxgboost: SHAP Plots for ‘XGBoost’. R package version 0.1.3. Preprint at https://CRAN.R-project.org/package=SHAPforxgboost (2023).
31. Fernández-Blázquez, M. A., Ávila-Villanueva, M. & Medina, M. The dimensional structure of subjective cognitive decline. Neuromethods 137, 45–62 (2018).
32. Yesavage, J. et al. Development and validation of a geriatric depression screening scale: A preliminary report. J. Psychiatr. Res. 17, 37–49 (1983).
33. Rabin, R. & De Charro, F. EQ-5D: A measure of health status from the EuroQol group. Ann. Med. 33, 337–343 (2001).
34. Folstein, M. F., Folstein, S. E. & McHugh, P. R. Mini-mental state: A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12, 189–198 (1975).
35. Buschke, H. Cued recall in amnesia. J. Clin. Neuropsychol. 6, 433–440 (1984).
36. Rey, A. L’examen psychologique Dans les Cas d’encéphalopathie traumatique. Arch. Psychol. (Geneve) 28, 215–285 (1941).
37. Wechsler, D. Wechsler Adult Intelligence Scale-III (The Psychological Corporation, 1997).
38. Belleville, S., Fouquet, C., Hudon, C., Zomahoun, H. T. V. & Croteau, J. Neuropsychological measures that predict progression from mild cognitive impairment to Alzheimer’s type dementia in older adults: A systematic review and meta-analysis. Neuropsychol. Rev. 27, 328–353 (2017).
39. Gomar, J. J., Bobes-Bascaran, M. T., Conejero-Goldberg, C., Davies, P. & Goldberg, T. E. Utility of combinations of biomarkers, cognitive markers, and risk factors to predict conversion from mild cognitive impairment to Alzheimer disease in patients in the Alzheimer’s disease neuroimaging initiative. Arch. Gen. Psychiatry 68, 961–969 (2011).

Funding

This work was supported by multiple funding sources. The collection of the database used in this study was made possible through funding from the Spanish Ministry of Science, Innovation and Universities (project reference: RTI2018-098762-B-C31) and from the Fundación General de la Universidad de Salamanca (FGUSAL) through the Centro Internacional sobre el Envejecimiento (CENIE) under the Grant 0348_CIE_6_E by Fondos FEDER EU. The data analysis, including the application of machine learning techniques, was carried out within the framework of the project funded by the Spanish Ministry of Science and Innovation through the 2022 Knowledge Generation Projects call (project reference: PID2022-141966OB-I00).

Author information

Authors and Affiliations

Department of Biological and Health Psychology, School of Psychology, Universidad Autónoma de Madrid, C/ Iván Pavlov, 6, Madrid, 28049, Spain
Miguel A. Fernández-Blázquez & Marina Ávila-Villanueva
Department of Experimental Psychology, Cognitive Processes and Speech Therapy, Universidad Complutense de Madrid, Madrid, Spain
José M. Ruiz-Sánchez de León, Rubén Sanz-Blasco & Mercedes Montenegro-Peña
Department of Psychobiology and Methodology in Behavioural Sciences, Universidad Complutense de Madrid, Madrid, Spain
Emilio Verche
Health Research Institute of the San Carlos Clinic Hospital (IdISSC), Madrid, Spain
María José Gil-Moreno
Neurology Department, Hospital Clínico San Carlos, Madrid, Spain
María José Gil-Moreno
Centre for the Prevention of Cognitive Impairment, Madrid Salud, Madrid City Council, Madrid, Spain
Mercedes Montenegro-Peña
Nuestra Señora del Rosario University Hospital, Madrid, Spain
Carmen Terrón
Sanitas La Zarzuela University Hospital, Madrid, Spain
Carmen Terrón
Cognitive Impairment Unit, Neurology Service, Sanitas La Moraleja University Hospital, Madrid, Spain
Cristina Fernández-García
Psychophysiology and Neuroimaging Group, Institute of Biomedical Research Cadiz (INiBICA), Cadiz, Spain
Jaime Gómez-Ramírez

Contributions

M.A.F-B. and J.M.R-SdL. conceptualized and designed the study. M.J.G-M., M.M-P., C.T., and C.F-G. were responsible for data collection. M.A-V., E.V., and R.S-B. managed the database. M.A.F-B. and J.G-R. performed the data analysis, including the application of machine learning techniques and the development of the risk calculator. J.M.R-SdL., E.V., and R.S-B. contributed to the interpretation of the results and provided critical revisions of the manuscript. M.A-V. and J.G-R. participated in drafting and final editing of the manuscript. M.A.F-B. supervised the overall project. All authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to Miguel A. Fernández-Blázquez.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Fernández-Blázquez, M.A., Ruiz-Sánchez de León, J.M., Sanz-Blasco, R. et al. XGBoost models based on non imaging features for the prediction of mild cognitive impairment in older adults. Sci Rep 15, 29732 (2025). https://doi.org/10.1038/s41598-025-14832-0

Received: 14 March 2025
Accepted: 04 August 2025
Published: 13 August 2025
DOI: https://doi.org/10.1038/s41598-025-14832-0
