An integrated predictive model for Alzheimer’s disease progression from cognitively normal subjects using generated MRI and interpretable AI

Aghaei, Atefe; Moghaddam, Mohsen Ebrahimi

doi:10.1038/s41598-025-13478-2

Published: 04 August 2025

Scientific Reports volume 15, Article number: 28340 (2025)

Abstract

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that begins with subtle cognitive changes and advances to severe impairment. Early diagnosis is crucial for effective intervention and management. In this study, we propose an integrated framework that leverages ensemble transfer learning, generative modeling, and automatic ROI extraction techniques to predict the progression of Alzheimer’s disease from cognitively normal (CN) subjects. Using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, we employ a three-stage process: (1) estimating the probability of transitioning from CN to mild cognitive impairment (MCI) using ensemble transfer learning, (2) generating future MRI images using a Vision Transformer-based Generative Adversarial Network (ViT-GAN) to simulate disease progression after two years, and (3) predicting AD using a 3D convolutional neural network (CNN), with probabilities calibrated by isotonic regression and critical regions of interest (ROIs) interpreted with Gradient-weighted Class Activation Mapping (Grad-CAM). The two-year horizon reflects the data available in the training phase; the method itself is general and may work for horizons of three years or more when sufficient data for simulating the corresponding brain changes is available. Our approach addresses the challenge of limited longitudinal data by creating high-quality synthetic images and improves model transparency by identifying key brain regions involved in disease progression. The proposed method achieves an accuracy of 0.85 and an F1-score of 0.86 in CN to AD prediction up to 10 years ahead, offering a potential tool for early diagnosis and personalized intervention strategies in Alzheimer’s disease.

Introduction

Alzheimer’s disease is an acquired, generalized, and usually progressive disorder that, according to some references, unfolds in three distinct stages¹. The first stage, known as the preclinical phase, involves subtle changes in the brain, blood, and cerebrospinal fluid without noticeable symptoms in the patient². This stage can begin 20 years before symptoms become evident³. The second stage, called mild cognitive impairment (MCI), is characterized by noticeable symptoms affecting cognitive abilities, although these do not significantly impact daily life. Not all individuals with MCI progress to Alzheimer’s disease, but it is estimated that 10-15 percent of them do so each year^4,5. MCI patients are further categorized into progressive MCI (pMCI), who develop Alzheimer’s during a follow-up period (usually 1 to 3 years), and stable MCI (sMCI), who do not⁶. The final stage, Alzheimer’s dementia, involves clear symptoms of memory, cognitive, and behavioral impairments that significantly interfere with daily functioning⁷.

Diagnosing Alzheimer’s disease typically involves a comprehensive medical evaluation, including a medical history review, mental state test, physical tests, and neuroimaging techniques such as structural MRI, functional MRI, and PET. Among the biomarkers used for Alzheimer’s diagnosis and prediction, MRI is the most common neuroimaging technique for identifying Alzheimer’s-related brain atrophy⁸.

In recent years, with the advent of deep learning, methods such as convolutional neural networks (CNNs), generative models, and Transformers have been increasingly utilized for medical image processing, including Alzheimer’s diagnosis and prediction. A variety of input representations are used to diagnose or predict Alzheimer’s disease: three-dimensional MRI images of the whole brain^9,10,11, two-dimensional slices extracted from the 3D MRI^12,13,14, several three-dimensional patches cut from the image^15,16,17, and regions of interest (ROIs) focused on known regions^18,19,20. On the one hand, using the 3D volume of the brain increases the computational complexity; on the other hand, converting this volume into 2D slices or 3D patches causes the loss of some important image information²¹.

ROI-based methods focus on specific brain regions such as the hippocampus, but might miss disease characteristics that vary across stages. Therefore, some papers select many brain regions as ROIs. For example, in²² 134 regions are selected for diagnosing Alzheimer’s disease, and some of them are ultimately identified as the most informative ROIs. One of the challenges in these methods is therefore extracting the best ROIs. In addition, since most machine learning methods, and deep learning methods in particular, are not interpretable, their reliability decreases. To this end, post hoc explainable methods have recently been widely used to make machine learning models interpretable²³. For example, some papers use SHAP (Shapley additive explanations) as a post hoc interpretability method for Alzheimer’s prediction^24,25, and others use LIME to make their models explainable^14,26. Another interpretability method is Grad-CAM, which is used with image-based deep learning models for AD diagnosis^27,28. Grad-CAM is also used for automatic ROI extraction in the study²⁹. In that study, a model is trained on one task (AD vs CN) to automatically extract ROIs using Grad-CAM. The model is then trained on the pMCI vs sMCI task using the given images and the extracted ROIs. Customized transfer learning is also used in that work to improve the results.

Recent articles show that, because the number of medical images is usually small, transfer learning with pre-trained models such as ResNet, AlexNet, VGGNet, and MobileNet achieves good results in the diagnosis and prediction of Alzheimer’s disease^30,31,32. However, these models have been trained on two-dimensional natural images. For this reason, some studies use customized pre-trained models instead of off-the-shelf ones, and with these models researchers have been able to extract good features and achieve good results^33,34.

Diagnosing Alzheimer’s disease is a complex process, but its complexity also depends on the stage at which the disease is diagnosed. It is more difficult to diagnose people with Alzheimer’s in the early stages because most of the symptoms are not clear³⁵. Another problem in diagnosing Alzheimer’s disease before symptoms appear (when a person is in the CN category) is the lack of data to train a model for predicting Alzheimer’s disease. One of the ways to solve this problem is to generate data using generative methods. Techniques such as Generative Adversarial Networks (GANs), Deep Convolutional GANs (DCGANs), and diffusion models have demonstrated considerable success in the medical field^36,37,38. These methods have been effectively employed for various purposes, including data augmentation^39,40,41, addressing the issue of missing data^42,43, and converting data across different modalities in multimodal approaches⁴⁴. There are limitations in predicting Alzheimer’s disease from cognitively normal subjects. One of the limitations is data leakage. Training a model that can predict Alzheimer’s disease from CN subjects needs a large number of CN subjects who convert to AD. Another limitation is generalizability: training and testing a single model on the same dataset decreases the generalizability of the model. A further limitation is experts’ lack of confidence in the results of two-class classification. In other words, when a threshold is used to classify data, a sample whose probability lies close to the threshold is assigned to a class almost arbitrarily, so the binary label alone can be misleading.

In this study, we propose an integrated framework that leverages ensemble transfer learning and generative modeling to predict the progression of Alzheimer’s disease from cognitively normal subjects. To enhance generalizability, we utilize customized transfer learning methods, trained on various datasets and tasks, to improve predictive accuracy. We employ a combination of two pre-trained models to estimate the probability of a healthy individual converting to mild cognitive impairment (MCI). Given the substantial influence of brain age on Alzheimer’s disease, as indicated in the literature, one of the pre-trained models is the brain age estimation model proposed in⁴⁵. The other is a pre-trained sMCI and pMCI classifier, which extracts good features. To improve our work, we use the method introduced in our previous article⁴⁶, which uses interpretable methods to extract ROIs automatically. To overcome the lack of data for training CN to AD prediction, we use a generative method to generate an image of a healthy person’s brain after two years (healthy or MCI) and use this image to predict AD.

The two-year interval for image generation was chosen because there is not enough data to train the model for longer intervals. The model recognized the brain changes in the two-year follow-up well, and it accurately predicted Alzheimer’s from the two-year image even for test subjects who converted to MCI after more than two years. This suggests that the model generalizes well and that, given enough data to train the generative model over more than two years, it would be more accurate. Because the gap between a healthy state and Alzheimer’s is large, the reliability of a model that predicts whether a person converts to AD decreases; we therefore refrain from announcing the results definitively and emphasize the obtained probability.

Since there is not enough data to train a model to predict Alzheimer’s disease directly from healthy people, we propose a method that combines the CN to MCI conversion model with the MCI to AD conversion model to estimate the probability of CN to AD conversion. To do this, we multiply the probability of CN to MCI conversion by the probability that the synthetic image converts to AD. In the proposed method, an MRI image of the person two years after the baseline image is first generated, and the probability of Alzheimer’s disease is then estimated from this synthetic image. In other words, the goal is to see whether the person will have MCI after two years, and the synthetic image is then fed into the MCI to AD prediction model to obtain the probability of AD progression. Ultimately, the probability of MCI for the CN subject is multiplied by the probability of AD for the synthetic image. Well-calibrated probabilities are crucial for reliable predictions in medical diagnosis tasks. Calibration methods adjust the predicted probabilities to better reflect the true probabilities observed in the data^47,48. In this study, we use isotonic regression, a non-parametric method that ensures a monotonic relationship between predicted and true probabilities, to improve the calibration of our model’s predictions and correct biased probabilities. Also, to make the prediction model more robust, we add demographic features that the literature highlights for predicting AD. By combining advanced machine learning techniques with a focus on interpretability, our study aims to improve the accuracy of AD progression prediction and provide a valuable tool for early diagnosis and personalized intervention strategies. Our approach makes five main contributions:

$\bullet$ We propose an integrated approach to predict the development of AD in CN individuals up to 10 years prior to diagnosis.

$\bullet$ Our proposed method predicts the AD conversion probability using the multiplication of CN to MCI probability and MCI to AD probability.

$\bullet$ To increase the generalizability of our proposed method, we propose an accurate ensemble transfer deep learning method to predict MCI conversion from CN subjects, which combines a fine-tuned model for brain age estimation with a fine-tuned model for MCI to AD prediction.

$\bullet$ We use a generative model (ViT-GAN) to generate brain MRI images of a subject after two years which shows the brain changes in two years to predict AD from CN subject accurately.

$\bullet$ To ensure interpretability, we employ Gradient-weighted Class Activation Mapping (Grad-CAM) to identify the most critical regions of interest (ROIs) that influence the model’s decisions. These ROIs are further analyzed using a 3D CNN to compute the probability of developing Alzheimer’s disease.

Our integrated approach not only aims to improve the accuracy of AD progression prediction but also emphasizes interpretability, ensuring that the model’s decisions are transparent and clinically meaningful. Our results demonstrate strong performance in predicting Alzheimer’s progression from healthy subjects up to 10 years ahead, with an accuracy of 0.85 and an F1-score of 0.86. We also compare our proposed framework with a baseline model that classifies CN, MCI, and AD. The results show that the baseline model performs worse than our method, especially in AD detection.

In the remainder of this paper, we detail our dataset and the proposed integrated framework for predicting Alzheimer’s disease progression in the Methods section. The Results section presents the outcomes of our experiments, showcasing the performance of our models through various evaluation metrics. We also provide qualitative and quantitative analyses of our generative models. In the Discussion section, we discuss the proposed method and obtained results, and also interpret our findings. Finally, we conclude our work and suggest future works in the conclusion and future works section.

Method

Overview

As shown in Fig. 1, the images are pre-processed in the first step: B1 Correction, N3 [36], and Grad Warp [35] remove the artifacts created by the imaging machine, and the brain is then extracted from the skull and scalp using FreeSurfer. Next, the images and metadata (including Age, Gender, Marital Status, and Education) are fed into the ensemble transfer learning model, which combines the results of two fine-tuned models, as explained in the “Ensemble Transfer Learning” sub-section. In the next phase, the pre-processed images are fed into the generative model to synthesize the brain image two years later. The details of image generation are explained in the “Image Generation” sub-section. In the last phase, the generated images are fed into a 3D CNN model trained on real sMCI and pMCI data to obtain the probability of MCI to AD conversion. In this phase, the conversion from MCI to AD is estimated using ROIs. At the end of this phase, the probability of CN to MCI and the probability of MCI to AD are multiplied. The details of this phase are given in the last sub-section.

Data description

In this paper, T1-weighted (T1w) MRI data exclusively from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset⁴⁹ are used to train and test the proposed method. ADNI is a multicenter longitudinal study aimed at predicting and tracking Alzheimer’s disease, collecting subjects’ clinical, imaging, genetic, and biochemical biomarkers since 2004 across four phases: ADNI-1, ADNI-GO, ADNI-2, and ADNI-3. The parameters of the T1w MRI data include a field strength of 1.5 Tesla, a pulse sequence of T1-weighted MPRAGE (Magnetization Prepared RApid Gradient Echo), and a matrix size of $256 \times 256$. The subjects in our analysis are derived from the post-processed ADNI data, which underwent B1 correction, N3 bias field correction, gradient warp correction, and brain extraction using FreeSurfer. The data used in this article consists of three categories: cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer’s disease (AD). Within the CN and MCI groups, we identify subcategories: healthy individuals who develop Alzheimer’s (pCN), those who do not (sCN), MCI individuals who convert to Alzheimer’s (pMCI), and those who remain stable (sMCI). Fifteen CN subjects in the dataset converted to AD after a number of years. We set aside these subjects, together with 15 subjects who converted to neither MCI nor AD, from the whole dataset. The remaining data, comprising CN subjects who converted to MCI but not AD, CN subjects who converted to neither MCI nor AD, MCI subjects who converted to AD, MCI subjects who did not convert to AD, and AD subjects, are used in each part.

We also utilized a comprehensive dataset comprising MRI images and associated meta-data to predict the progression from cognitively normal (CN) to Alzheimer’s (AD). The meta-data includes key demographic features such as age, gender, marital status, and education level of the participants. This demographic information provides essential context for interpreting the MRI images and enhances the validity of our predictive models by accounting for potential confounding factors that may influence disease progression. Information about the utilized data is presented in Table 1.

To ensure the robustness of our deep learning models, we implemented a clear data preparation strategy. While our primary test data consisted of pCN and sCN images that were never used in the training phase, each deep learning component within the framework required its own validation data for performance evaluation. For this, we applied an 80-20 split to the training data, using 80% for training the models and 20% for validation.
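The 80-20 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_train_validation`, the fixed seed, and the example identifiers are hypothetical, and the paper does not state whether the split was made per image or per subject.

```python
import random


def split_train_validation(items, validation_fraction=0.2, seed=0):
    """Shuffle items and split them into (training, validation) lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical identifiers standing in for training scans.
scan_ids = [f"scan_{i:03d}" for i in range(10)]
train_ids, validation_ids = split_train_validation(scan_ids)
```

With longitudinal data, one might pass subject identifiers rather than scan identifiers so that scans of the same subject do not end up on both sides of the split; this is a design suggestion, not something the paper specifies.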

Table 1 Demographic information of the subjects in the dataset used.

Ensemble transfer learning

In this section, we explain how we estimate the probability of MCI conversion using the ensemble transfer learning illustrated in Fig. 2. According to Fig. 2, the results of two fine-tuned models are combined to estimate the probability more accurately. The fine-tuned models are the brain age estimation model introduced in⁴⁵ and the MCI to AD conversion model introduced in⁴⁶. Brain age estimation is used because, according to that article, the Brain Age Gap (BAG) is a good biomarker for Alzheimer’s detection and obtained good results on AD detection as the only biomarker. Since brain age is estimated from the MRI volume, to avoid redundancy the pre-trained model is used for feature extraction through transfer learning instead of using the brain age gap as a biomarker. Another reason is the generalizability of that model: the results of the article show that it performs well on datasets that are not in the training set. The other fine-tuned model performs well in the classification of sMCI and pMCI and has learned the features of MCI patients. As illustrated in Fig. 2, the output of the last feature extraction block (a 3D ResNet block in one model and a 3D CNN block in the other) is fed into a Flatten block, which consists of one flatten layer, a Leaky ReLU activation function ($\alpha = 0.1$), a BatchNormalization layer, a Dropout layer with a rate of 0.4, and one fully connected dense layer with 32 features. After that, one 1D array for each demographic variable (Age, Gender, Education, and Marital Status) is concatenated to the feature vector, giving 36 features for classification for each sample (32 learned features plus 4 demographic features). At the end of each network, the softmax activation function is used for classification. Each model is fine-tuned separately using the training set. Figure 2 also illustrates the model fusion process, in which the two fine-tuned models (the brain age estimation model and the sMCI/pMCI classifier) are combined using the Max(P) approach. Specifically, each model provides a probability of belonging to a class for the i-th test sample, and the maximum of the probabilities from both models is selected and returned as the final output.
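The Max(P) fusion can be illustrated with a short sketch. It assumes that each model outputs, for every test sample, the probability of the positive class (conversion to MCI); the function name `fuse_max_probability` and the example values are hypothetical and not taken from the paper.

```python
def fuse_max_probability(probs_model_a, probs_model_b):
    """Return, for each sample, the larger of the two models' probabilities."""
    if len(probs_model_a) != len(probs_model_b):
        raise ValueError("both models must score the same samples")
    return [max(a, b) for a, b in zip(probs_model_a, probs_model_b)]


# Illustrative probabilities for three test samples (made-up values).
brain_age_model_probs = [0.30, 0.72, 0.55]
mci_model_probs = [0.45, 0.60, 0.55]
fused = fuse_max_probability(brain_age_model_probs, mci_model_probs)
```

Taking the maximum means the ensemble reports a high conversion probability whenever either model does, which favors sensitivity over specificity.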

Image generation

In this study, we focus on predicting the progression from CN to AD using generated images from the baseline image. This section details the process of image generation. As mentioned in the Introduction, we simulate changes in the brain by generating images of healthy subjects after two years. Our approach employs the ViT-GAN model, which generates future MRI images based solely on baseline images. This approach allows us to effectively train the model to generate predictive images while maintaining a focus on using baseline data for our final predictions. While our predictive model does not require longitudinal data, the ViT-GAN is trained using subjects who possess both baseline and two-year follow-up images. This selection ensures that our training set is robust and addresses potential issues related to data completeness. The choice of a two-year interval is primarily due to data availability limitations; however, the method is generalizable. If more extensive longitudinal data were available, such as three years or more, the results would likely be more accurate.

We applied normalization techniques that scaled the MRI intensity values to a standardized range of 0 to 1. This step is essential to enhance the model’s performance and ensure that it learns from data that is uniformly represented. To facilitate the training of our ViT-GAN model, all images were resized to dimensions of $128 \times 128 \times 128$, which ensures that the input data maintains consistency in shape. For our generative model, we specifically utilized longitudinal data, where baseline images served as the input, and images taken two years later were generated as output. In the ADNI dataset, longitudinal processing creates a within-subject template and initializes each time point with the template to reduce individual variability. This design allows the network to effectively learn the changes that occur over time, thereby improving its predictive capability. By employing these pre-processing steps and standardization techniques, we aimed to mitigate inter-scanner variability, thereby enhancing the reliability and validity of our results.
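As a rough illustration of the intensity scaling step, the sketch below applies min-max normalization to a flat list of voxel intensities. The paper only states that intensities are scaled to the range 0 to 1; min-max scaling is an assumption here, as are the function name `min_max_normalize` and the example values. Resizing to $128 \times 128 \times 128$ is not covered.

```python
def min_max_normalize(intensities):
    """Scale a sequence of voxel intensities linearly to the range [0, 1]."""
    lowest, highest = min(intensities), max(intensities)
    if highest == lowest:
        # A constant image carries no contrast; map every voxel to 0.
        return [0.0 for _ in intensities]
    return [(value - lowest) / (highest - lowest) for value in intensities]


# Made-up intensities standing in for a flattened MRI volume.
voxels = [0, 50, 100, 200]
normalized = min_max_normalize(voxels)
```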

The generative model consists of a 3D Vision Transformer encoder and a 3D CNN decoder for the generator, and a 3D CNN for the discriminator. The details of the model are shown in Fig. 3. According to Fig. 3, the $128 \times 128 \times 128$ input image is divided into patches, and the patch embedding is a 3D CNN. The output is then flattened, transposed, and fed into the Transformer encoder, which includes LayerNorm, Multi-Head Attention, and a Multi-Layer Perceptron. The output of the ViT is fed into the 3D CNN decoder, which includes 3D transpose convolution, Instance Normalization, and the ReLU activation function. The discriminator is a 3D CNN-based classifier consisting of downsampling layers, a Leaky ReLU activation function for each hidden layer, and a softmax activation function for the classification layer. Based on the original GAN network, two loss functions, adversarial loss and reconstruction loss, are used in this network; they appear in Eq. 1 and Eq. 2. In these equations, x is the original 3D MRI volume from baseline, whereas z is the original (real) 3D MRI volume two years later. D is the discriminator function and G is the generator function.

$$\begin{aligned} L_D= & \sum _{z \in d_{\text {output}},\, x \in d_{\text {input}}} \left( 1 - \log {D(z)} \right) + \log {D(G(x))} \end{aligned}$$

(1)

$$\begin{aligned} L_G= & \sum _{z \in d_{\text {output}},\, x \in d_{\text {input}}} \left( 1 - \log {D(G(x))} \right) + \left\| G(x) - z \right\| \end{aligned}$$

(2)

Alzheimer’s disease prediction

Since the data showing conversion from CN to AD is not enough to train a classification model, we propose a method that combines CN to MCI and MCI to AD prediction to predict Alzheimer’s disease from cognitively normal subjects. In the proposed method, shown in Fig. 4, we estimate the probability of CN to MCI conversion using the baseline MRI image. After that, we generate the MRI volume of the subject two years later from the baseline MRI volume. At this phase, we estimate the probability of MCI to Alzheimer’s disease conversion from the generated images. The process of AD probability estimation is given in Algorithm 2. For this purpose, the generated MRI images are fed into the sMCI and pMCI classification model introduced in the “Ensemble Transfer Learning” part. We also use an ROI-based model for AD probability prediction. Since the ROIs differ between the stages of Alzheimer’s disease, automatic ROI extraction is proposed in⁴⁶. Based on this method, ROIs are extracted from the MRI images of each patient using an explainable-AI method (Grad-CAM), from the feature importance that the Grad-CAM algorithm provides. According to this method, we first feed the MRI images into the sMCI/pMCI classification model, which is a 3D CNN. The weight of each feature map, $W_f$, is then obtained from the gradient of the last layer of the network (before the softmax layer) with respect to that feature map of the last CNN layer, based on Eq. 3. After that, according to Eq. 4, the feature importance, i.e. the heatmap of the image, is obtained by applying global average pooling to the weights of the feature maps. The details of the 3D CNN model used are given in Table A1 in Appendix-1.

$$\begin{aligned} W_f= & \sum _i \sum _j \frac{\partial S_C}{\partial C_{ij}^f} \end{aligned}$$

(3)

$$\begin{aligned} H= & \text {ReLU} \left( \sum _f \left( \frac{1}{N} W_f \right) \times C_f \right) \end{aligned}$$

(4)

Based on⁴⁶, we calculate a mask for each image in which the most important parts of the image get a value of 1 and the pixel value of the other parts is 0. Finally, to obtain the ROI, the mask is multiplied by the MRI image.
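The Grad-CAM weighting, heatmap, and masking steps can be sketched on toy data. For brevity, each feature map and its gradient are given as flat lists of N values (a real 3D CNN would supply 3D arrays from automatic differentiation). The function names, the `keep_fraction` threshold rule, and all numbers are assumptions for illustration; the paper does not state how the "most important parts" are thresholded, and the heatmap is assumed to be already resampled to the image size.

```python
def grad_cam_heatmap(feature_maps, gradients):
    """Heatmap H = ReLU(sum_f (W_f / N) * C_f), with W_f the summed gradient."""
    n_positions = len(feature_maps[0])
    # W_f: sum of the class-score gradient over all positions of feature map f.
    weights = [sum(gradient) for gradient in gradients]
    heat = [0.0] * n_positions
    for weight, feature_map in zip(weights, feature_maps):
        for k, activation in enumerate(feature_map):
            heat[k] += (weight / n_positions) * activation
    # ReLU keeps only positions that support the class.
    return [max(0.0, value) for value in heat]


def roi_from_heatmap(image, heatmap, keep_fraction=0.5):
    """Binary mask of the hottest positions, and the masked image (the ROI)."""
    peak = max(heatmap)
    # Hypothetical rule: keep positions within keep_fraction of the peak.
    mask = [1 if peak > 0 and h >= keep_fraction * peak else 0 for h in heatmap]
    return mask, [m * v for m, v in zip(mask, image)]


feature_maps = [[1.0, 2.0, 0.0, 4.0], [0.0, 1.0, 3.0, 1.0]]
gradients = [[0.5, 0.5, 0.5, 0.5], [-1.0, -1.0, -1.0, -1.0]]
heatmap = grad_cam_heatmap(feature_maps, gradients)
mask, roi = roi_from_heatmap([10, 20, 30, 40], heatmap)
```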

To model the probability of transitioning from a cognitively normal (CN) state to Alzheimer’s Disease (AD) through the mild cognitive impairment (MCI) stage, we use conditional probabilities. We start with the law of total probability for the event of developing AD from a CN state (Eq. 5):

$$\begin{aligned} P(\text {AD} | \text {CN}) = P(\text {AD} \cap \text {MCI} | \text {CN}) + P(\text {AD} \cap \lnot \text {MCI} | \text {CN}) \end{aligned}$$

(5)

where, $P(\text {MCI} | \text {CN})$ is the probability of developing MCI given that a person is currently cognitively normal, $P(\text {AD} | \text {MCI})$ is the probability of developing AD given that a person is currently in the MCI stage, $P(\text {AD} | \text {CN})$ is the probability of developing AD given that a person is currently cognitively normal.

Given that the probability of a direct transition from CN to AD without passing through MCI is low⁵⁰, we assume:

$$\begin{aligned} P(\text {AD} \cap \lnot \text {MCI} | \text {CN}) \approx 0 \end{aligned}$$

(6)

Thus, the equation simplifies to:

$$\begin{aligned} P(\text {AD} | \text {CN}) \approx P(\text {AD} \cap \text {MCI} | \text {CN}) \end{aligned}$$

(7)

Using the definition of conditional probability:

$$\begin{aligned} P(\text {AD} \cap \text {MCI} | \text {CN}) = P(\text {AD} | \text {MCI} \cap \text {CN}) \cdot P(\text {MCI} | \text {CN}) \end{aligned}$$

(8)

Since a subject who has reached MCI is no longer CN, and MCI is a prerequisite stage before AD, we assume that the earlier CN state adds no information once the subject is known to be in the MCI stage:

$$\begin{aligned} P(\text {AD} | \text {MCI} \cap \text {CN}) = P(\text {AD} | \text {MCI}) \end{aligned}$$

(9)

Thus, the equation becomes:

$$\begin{aligned} P(\text {AD} \cap \text {MCI} | \text {CN}) = P(\text {AD} | \text {MCI}) \cdot P(\text {MCI} | \text {CN}) \end{aligned}$$

(10)

Combining the above, the final probability of transitioning from CN to AD is given by:

$$\begin{aligned} P(\text {AD} | \text {CN}) = P(\text {MCI} | \text {CN}) \cdot P(\text {AD} | \text {MCI}) \end{aligned}$$

(11)

As mentioned before, a generative model is used to generate the MRI image of the brain two years into the future based on the current MRI and other relevant features. Let $\hat{I}^{t+2}$ denote the generated MRI image of the brain two years from now. This image is used to predict the probability of AD: let $P(\text {AD} | \hat{I}^{t+2})$ be the probability of developing AD given the generated image. This probability can be obtained from the softmax output of the second neural network model. To find the overall probability of progressing from CN to AD, we need the conditional probability $P(\text {AD} | \text {CN})$. Using the law of total probability and the definition of conditional probability, we can express $P(\text {AD} | \text {CN})$ as:

$$\begin{aligned} & P(\text {AD} | \text {CN}) = P(\text {MCI} | \text {CN}) \times P(\text {AD} | \text {MCI}) \end{aligned}$$

(12)

$$\begin{aligned} & P(\text {AD} | \text {MCI})\leftarrow P(\text {AD} | \hat{I}^{t+2}) \end{aligned}$$

(13)

The substitution in Eq. 13 is made because the generated image represents the brain state of an MCI subject after two years.
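The final combination in Eq. 12 and Eq. 13 amounts to a single multiplication, sketched below. The function name `cn_to_ad_probability` and the example probabilities are hypothetical; in the paper the two inputs come from the ensemble transfer learning model and from the classifier applied to the generated image.

```python
def cn_to_ad_probability(p_mci_given_cn, p_ad_given_generated_image):
    """P(AD | CN) ~ P(MCI | CN) * P(AD | generated two-year image)."""
    for p in (p_mci_given_cn, p_ad_given_generated_image):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_mci_given_cn * p_ad_given_generated_image


# Made-up outputs of the two stages for one subject.
p_cn_to_ad = cn_to_ad_probability(0.8, 0.5)
```

Because both factors are at most 1, the combined probability can never exceed either stage's probability, which reflects the assumption that AD is reached only through MCI.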

Probability calibration using isotonic regression

To improve the calibration of the predicted probabilities, we applied isotonic regression. Isotonic regression is a non-parametric method that fits a piecewise constant, monotonic function to the predicted probabilities, adjusting them to better match the true probabilities observed in the data. This method is particularly useful for handling complex relationships not well-captured by parametric models like logistic regression. The isotonic regression model is trained on a calibration set separate from the training and test sets used for the main prediction tasks. Isotonic regression ensures that the fitted probabilities are monotonic, i.e., if $p_i$ and $p_j$ are predicted probabilities and $p_i \le p_j$, then the calibrated probabilities $q_i$ and $q_j$ will satisfy $q_i \le q_j$. The isotonic regression problem can be formulated as Eq. 14:

$$\begin{aligned} \min \sum _{i=1}^n (y_i - q_i)^2 \quad \text {subject to} \quad q_i \le q_j \text { for all } i < j \end{aligned}$$

(14)

where $y_i$ are the true labels and $q_i$ are the fitted probabilities, with the samples indexed in increasing order of their predicted probabilities $p_i$.
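The optimization problem above is commonly solved with the pool-adjacent-violators algorithm (PAVA). The sketch below is a minimal pure-Python version under that assumption; the paper does not say which solver or library was used, and the function names and example values are hypothetical. Tied predictions are not pooled specially here.

```python
def pool_adjacent_violators(values):
    """Least-squares non-decreasing fit to a sequence of values."""
    blocks = []  # each block is [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def calibrate_isotonic(predicted, labels):
    """Calibrated probability q_i for each sample, in the original order."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    fitted_sorted = pool_adjacent_violators([labels[i] for i in order])
    calibrated = [0.0] * len(predicted)
    for rank, index in enumerate(order):
        calibrated[index] = fitted_sorted[rank]
    return calibrated


# Made-up calibration set: uncalibrated scores and true labels.
calibrated = calibrate_isotonic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

In this example the samples scored 0.35 and 0.4 violate monotonicity (labels 1 then 0), so they are pooled to a shared calibrated value of 0.5.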

Results

CN to MCI progression

As mentioned in the Method section, the first phase of the proposed method is MCI progression prediction. In this phase, baseline MRI images and demographic features (including Age, Gender, Marital Status, and Education) as metadata are fed into the model to predict MCI. To train the model, we use data from cognitively normal (CN) subjects who do not convert to MCI as class 1, and CN subjects who convert to MCI (but not to AD) within three years, together with sMCI patients, as class 2. The number and information of subjects are displayed in Table 1. This study is longitudinal; two years of MRI images of subjects are used for training the model. 644 and 640 MRI volumes are used as training sets for class 1 and class 2, respectively. Also, demographic features such as Age, Gender, Education, and Marital Status from baseline are added as extra features.

Implementation details

The proposed CN to MCI prediction model comprises two fine-tuned models that are ensembled to obtain the final result. The first model consists of three Residual Blocks; each block includes two 3D convolutional layers with a kernel size of 3$\times$3$\times$3, an ELU activation function, Batch Normalization, a concatenation layer, and 2$\times$2$\times$2 Max-pooling. In addition to the Residual Blocks, there are two Attention Blocks that contain attention layers. The number of features in the Residual Blocks is 8, 32, and 128, and the number of features in the Attention Blocks is 16 and 64, respectively. The second fine-tuned model, the 3D CNN model, comprises three 3D CNN blocks; each block consists of 3D convolutional layers with a kernel size of 3$\times$3$\times$3, the LeakyReLU activation function, Max-pooling with a size of 2$\times$2$\times$2, and dropout with a rate of 0.3. The number of features in the 3D CNN blocks is 32, 64, and 64, respectively. In both models, the output of the last block is fed into a flatten block for training and fine-tuning, which consists of a flatten layer, Batch Normalization, LeakyReLU, a Dense layer with dimension 32, and a softmax activation function. The Adam optimization algorithm is employed with a learning rate of 0.001, in conjunction with the Cross Entropy loss function. Convergence is reached after 100 epochs. This part of the proposed method is implemented with the Keras library in Python 3.11, using a Tesla T4 GTX and an Intel(R) Xeon(R) CPU @ 2.20GHz.
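To make the size of the 3D CNN branch concrete, the sketch below traces the feature-map shape through its three blocks and reports the length of the vector entering the flatten block. Several details are assumptions not stated in the text: the input volume size (a hypothetical 64$\times$64$\times$64 is used in the usage note), "same" padding in the convolutions so that only pooling changes the spatial size, and a pooling stride equal to the pooling size. The helper name `trace_3d_cnn` is likewise hypothetical.

```python
def trace_3d_cnn(input_shape, filters=(32, 64, 64), pool=2):
    """Trace (depth, height, width, channels) after each 3D CNN block.

    Assumes 'same'-padded 3x3x3 convolutions (spatial size unchanged)
    followed by pool x pool x pool max-pooling with stride = pool.
    Returns the per-block shapes and the flattened feature length.
    """
    d, h, w = input_shape
    shapes = []
    for f in filters:
        # Convolution keeps d, h, w under the 'same' padding assumption;
        # max-pooling halves each spatial dimension (floor division).
        d, h, w = d // pool, h // pool, w // pool
        shapes.append((d, h, w, f))
    flat_length = d * h * w * filters[-1]
    return shapes, flat_length
```

With a hypothetical 64$\times$64$\times$64 input, `trace_3d_cnn((64, 64, 64))` gives block outputs of 32$\times$32$\times$32$\times$32, 16$\times$16$\times$16$\times$64, and 8$\times$8$\times$8$\times$64, so 32768 values enter the flatten block before the Dense layer of dimension 32.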

CN to MCI classification

In our classification task, we applied a threshold of 0.5 to determine the final class labels, as in Eq. 15, and computed the evaluation metrics Accuracy, Precision, Recall, and F1-score according to Eqs. 16 to 19. The decision rule is as follows:

$$\begin{aligned} \hat{y} = {\left\{ \begin{array}{ll} 1 & \text {if } P(y=1|X) \ge 0.5 \\ 0 & \text {if } P(y=1|X) < 0.5 \end{array}\right. } \end{aligned}$$

(15)

where $\hat{y}$ is the predicted class label, and $P(y=1|X)$ is the predicted probability of the positive class given the input features $X$. This threshold is chosen because it balances the trade-off between precision and recall, and is commonly used in binary classification tasks.

$$\begin{aligned} \text {Recall}= & \frac{TP}{TP + FN} \end{aligned}$$

(16)

$$\begin{aligned} \text {Precision}= & \frac{TP}{TP + FP} \end{aligned}$$

(17)

$$\begin{aligned} F1= & 2 \times \frac{\text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{aligned}$$

(18)

$$\begin{aligned} \text {Accuracy}= & \frac{TPR + TNR}{2} = \frac{TP + TN}{TP + FN + TN + FP} \end{aligned}$$

(19)
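The decision rule in Eq. 15 and the metrics in Eqs. 16 to 19 can be sketched in a few lines of Python. The function names `classify` and `binary_metrics` are hypothetical and are not taken from the authors' code. The sketch reports both forms that appear in Eq. 19: the mean of TPR and TNR (often called balanced accuracy) and the overall fraction of correct predictions. The two coincide exactly only when both classes are the same size, and differ only slightly on a nearly balanced test set such as the 68 sCN and 69 pCN subjects used here.

```python
def classify(probs, threshold=0.5):
    """Eq. 15: label 1 if P(y=1|X) >= threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in probs]


def binary_metrics(y_true, y_pred):
    """Recall, precision, F1, and accuracy (Eqs. 16-19) from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty denominators by returning 0.0 in that case.
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # Eq. 16 (TPR)
    precision = tp / (tp + fp) if (tp + fp) else 0.0       # Eq. 17
    f1 = (2 * precision * recall / (precision + recall)    # Eq. 18
          if (precision + recall) else 0.0)
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    total = tp + tn + fp + fn
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": (tp + tn) / total if total else 0.0,   # Eq. 19, right-hand form
        "balanced_accuracy": (recall + tnr) / 2,           # Eq. 19, left-hand form
    }
```

A typical use is `binary_metrics(y_true, classify(probs))`, where `probs` holds the calibrated probabilities of the positive class.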

To obtain the best performance, we compared several configurations for CN to MCI progression as an ablation study. In the first row of Table 2, demographic features are used as the only biomarker for classifying stable CN versus CN to MCI conversion. Several models (Logistic Regression, SVM, Decision Tree, Random Forest, and XGBoost) were trained on these data, and XGBoost obtained the best results, as shown in Table 2. The second row of Table 2 shows the results of the proposed model, a combination of the two fine-tuned models, using MRI volume as the only biomarker. The last row of Table 2 shows the results of the proposed method on the whole data, a combination of MRI volume and demographic features. The results show that multimodal classification performs best. In particular, the comparison between the second and third rows shows that adding demographic features to the MRI images increases classification performance.

Table 2 Comparison of results for using demographic features as the only biomarker, MRI images as the only biomarker, and a combination of both, on test data comprising 68 sCN and 69 pCN subjects.


Table 3 compares the results obtained from each fine-tuned model, 3D ResNet Attention and 3D CNN. The 3D CNN performs better than 3D ResNet Attention, and the combination of the two methods achieves the best result. As seen in Table 3, Ensemble Deep Learning, the combination of the two fine-tuned models, improves performance by about two percent.

Table 3 Comparison of the results obtained from each fine-tuned model (3D ResNet Attention and 3D CNN) and from Ensemble learning, on test data (MRI images and demographic features) comprising 68 sCN and 69 pCN subjects.
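As a sketch of how two fine-tuned models can be combined, the snippet below averages their softmax outputs (soft voting) and takes the most probable class. This section does not specify the combination rule, so soft voting, the equal default weight, and the names `soft_vote` and `predict_labels` are assumptions chosen for illustration and may differ from the ensemble used to produce Table 3.

```python
def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two models' class-probability vectors.

    probs_a, probs_b: lists of per-sample softmax outputs, e.g. [[0.7, 0.3], ...]
    weight_a: weight of the first model; the second receives 1 - weight_a.
    """
    combined = []
    for pa, pb in zip(probs_a, probs_b):
        combined.append([weight_a * x + (1.0 - weight_a) * y
                         for x, y in zip(pa, pb)])
    return combined


def predict_labels(probs):
    """Index of the highest-probability class for each sample."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in probs]
```

Here `probs_a` could hold the outputs of the 3D ResNet Attention model and `probs_b` those of the 3D CNN. Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one for a given sample.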


Interpretability of features

This section uses the XGBoost feature importance algorithm and the LIME (Local Interpretable Model-agnostic Explanations) model to rank the demographic features. The results of the XGBoost feature importance algorithm are shown in Fig. 5, and the results of the LIME model are displayed in Fig. 6. Unlike XGBoost feature importance, which ranks features over the whole test data, LIME computes feature importance separately for each sample. Fig. 6 shows the results for two random samples, one from class 1 and one from class 2. Age and Gender are the most important features according to both algorithms. The two methods differ in how they rank the other two features: XGBoost ranks Marital Status third, whereas LIME ranks Education above Marital Status.