Abstract
Alzheimer’s disease (AD), one of the most widespread neurodegenerative disorders, can be mitigated through early recognition and treatment. Recent research has shown that multimodal fusion is effective for the early-stage diagnosis of AD. However, most existing methods do not adequately account for the differences in data modality domains, their interconnections, and their relative relevance. In this paper, we introduce a robust Intra-scale Interaction and Cross-scale Fusion Network (ISI-CSFN) for AD progression detection. The proposed model employs a linearized convolutional attention module to enable interaction between global information captured by the Cascaded Transformer (CTransformer) and local features extracted by the Depthwise Separable Convolution Network (DSCN). This mechanism enhances the discriminative ability for AD progression detection by allowing each modality-specific branch to incorporate complementary contextual representations from the others while maintaining the integrity of its own features. Furthermore, the model integrates background (BG) information with multimodal temporal data to predict several cognitive score variables simultaneously. The proposed method achieves strong results for both regression and multi-class progression tasks. Specifically, the accuracy of our approach is 97.26% for NC vs. AD, 89.25% for NC vs. sMCI, and 84.74% for NC vs. pMCI classification. In addition, the model attains the highest correlation coefficient and the lowest root mean square error (RMSE) across several clinical score regression tasks.
Introduction
Alzheimer’s disease (AD) is the most common neurodegenerative disorder, affecting nearly 60% of individuals over the age of 601. Neurodegeneration in sporadic AD is characterized by gradual brain atrophy and anatomical abnormalities2. AD presents with both movement-related and systemic manifestations because of the progressive death of neurons involved in dopamine transmission in different areas of the brain3. Even though there is no known cure for AD, cognitive damage can be postponed in the early stages. To ensure a higher standard of living for patients and to improve prevention and treatment, it is very important to identify AD in its early stages and to identify its pathological biological markers4.
Mild cognitive impairment (MCI) is a transitional stage between the cognitively normal (CN) state and AD5. MCI is diagnosed based on memory impairment on standard tests when there is no discernible impairment in everyday activities or dementia. According to severity, MCI can be categorized as either progressive (pMCI) or stable (sMCI)6,7. A number of machine learning techniques are currently being used for diagnosing AD and identifying relevant disease biomarkers based on neuroimaging data from MRI, fMRI, PET, CT, EEG, SPECT, and MEG8,9,10,11,12. Positron emission tomography (PET) and magnetic resonance imaging (MRI) have been extensively used in the literature to identify AD and cognitive decline13,14. However, in the early stages, even when indicators are clearly visible on MRI and/or PET data, it is still difficult to detect progression from MCI to AD dementia in clinical practice15.
The number of studies on automated MCI conversion prediction methods has increased in recent years. Some focused on short-term prognosis, which, although it provided encouraging outcomes, was of little therapeutic importance because it was linked to subsequent treatments and could not repair the neurological damage. Some conventional machine learning (ML)-based techniques in long-term predictive studies required complex feature engineering, which led to the exclusion of significant pathological signals16,17. These drawbacks can be addressed by end-to-end deep neural network (DNN) techniques, which also have considerable potential for clinical decision support.
Deep learning-based models outperform classical models in AD detection tasks because of their remarkable nonlinear mapping capacity, which allows them to capture intricate texture features and fine image details18. For example, high-level abstract characteristics are extracted from multimodal or longitudinal neuroimaging data using convolutional neural networks (CNNs), stacked autoencoders (SAEs), and deep belief networks (DBNs)19,20,21. These models facilitate the identification of pathological changes by generating feature vectors or differential representations between time points or systems. The generated features are then fed into clustering or classification structures to diagnose Alzheimer’s disease. However, even when the aforementioned DNNs are fully utilized, models that rely only on single biomarkers are not sufficient to predict MCI conversion22.
Furthermore, single-task models cannot provide clinical professionals with relevant information about the patient’s potential cognitive behavior during progression. According to some research, MCI development and certain cognitive tests, such as the Mini-Mental State Examination (MMSE) and the Alzheimer’s Disease Assessment Scale (ADAS), are indicators of AD progression23. Recent research on multimodality data has gathered a lot of interest since it can effectively overcome the drawbacks of unimodal data, increasing the precision of prediction models used in disease detection24. Multi-level features extracted from MRI and PET images, as well as features that differ between the two modalities, are combined using an attention mechanism in multimodal data analysis. This is then utilized for disease-related feature reconstruction25,26. However, while attention mechanisms reweight related data across channel or spatial dimensions, it is still difficult to capture long-range relationships across modalities. Combining local and global features allows deep learning models to capture fine-grained, region-specific changes (local features) as well as overall brain-level patterns (global features), which is critical for accurately detecting early-stage Alzheimer’s disease and differentiating it from normal aging27.
Even though earlier methods have shown encouraging outcomes, diagnosing AD with multimodality data still presents several difficulties. One major issue is that these methods frequently ignore interactions across multimodal data scales in favour of concentrating on interactions between multimodal inputs at a single resolution. These arguments encourage us to suggest a new fusion network for the identification of AD. The major contributions of this paper are summarized as follows:
- To propose a new dual-backbone framework that combines Depthwise Separable Convolution Networks (DSCN) and Cascaded Transformers (CTransformer) for efficiently extracting complementary features from MRI and PET images.
- To suggest a novel Intra-Scale and Cross-Scale Fusion Mechanism that facilitates the efficient integration of multi-scale data and rich interactions between modality-specific features at the same scale, which could improve the discriminatory power of the fused features for detecting the progression of AD.
- To integrate the FreeSurfer-derived anatomical characteristics and baseline demographic data into a single end-to-end architecture for boosting performance across classification and regression tasks linked to AD progression.
The rest of the paper is structured as follows: Sect. 2 reviews the existing papers on AD progression detection. Section 3 details the proposed method. Section 4 validates the proposed approach using simulations. Section 5 provides the conclusion.
Literature survey
The effectiveness of AD prediction models with missing value imputation was influenced by manually or statistically calculated static values such as zeros and means. Some missing-value imputation approaches are Mean imputation with LSTM (LSTM-M), Forward imputation with LSTM (LSTM-F), Zero imputation with Peephole LSTM (PLSTM-Z), the Multi-directional RNN (MRNN), and Model filling with MinimalRNN (MinimalRNN)28,29,30. In Alzheimer’s data, these models frequently introduced bias or skewed temporal dynamics. Sequence modeling was improved by more complicated models like MRNN and MinimalRNN; however, they may have limited representational power, excessive complexity, or a risk of data leakage.
A new computational approach was presented by Jung et al.31 that can forecast cognitive scores, disease progression pathways, and MRI phenotypic biomarkers at several future time points. However, such forecasting usually encounters a large number of missing observations when using temporal data. These missing values can be identified by calculating temporal and multi-variable correlations in the time-series data. To address the problems of missing value imputation, phenotypic measurement forecasting, cognitive score progression modeling, and patient health status forecasting based on time-series imaging markers, a deep recurrent network (DRN) was specifically designed. In particular, a well-defined loss function was used to train the parameters of each model using the morphological features and cognitive scores as input.
Zuo et al.32 suggested a unique model called the cross-modal transformer generative adversarial network (CT-GAN) to effectively combine the structural and functional information found in diffusion tensor imaging (DTI) and functional MRI (fMRI). From multimodal imaging data, the CT-GAN efficiently learned topological characteristics and generated multimodal connectivity end to end. Additionally, a shifting bi-attention mechanism was designed to enhance the complementary features between modalities and gradually align shared features. The CT-GAN model detected AD-related brain connections by examining the produced connectivity features. However, data modality types, their linkages, and differences in their relative relevance were not taken into account.
Lu et al.33 have illustrated the importance of multimodal fusion in early AD identification by presenting a Hierarchical Attention-Based Multimodal Fusion framework (HAMF). It used clinical, genetic, and imaging data to detect AD early. Additionally, it made use of deep neural networks, stacked denoising autoencoders, and CNNs. Through hierarchical attention, the HAMF framework used attention strategies to acquire the proper weightage for each data source and comprehend how modalities interact. Sheng et al.34 suggested a multimodal ML architecture for improved AD characterisation by fusing cerebrospinal fluid (CSF), PET, and MRI. For concurrent feature selection and categorization, this model utilized a hybrid method that combined the Kernel Extreme Learning Machine (KELM) classifier with an improved Harris Hawks Optimization (HHO) algorithm known as ILHHO.
Abdelaziz et al.35 created a multimodal and multi-scale deep learning framework (MMSDL) that successfully exploits the relationship between the multi-source and multi-resolution features of neuroimaging data. Initially, they embedded every resolution level of the multimodal images using a CNN. Then both multi-head self-attention and multi-head cross-attention were used in a multimodal scale fusion process. It improved scale-aware feature extraction and the correlation of MRI and PET images by capturing global relations between the embedded features and weighing the contribution of each modality to another. After that, a cross-modality fusion module was presented for promoting global features from the earlier attention layers and merging MRI and PET images at different scales. In order to distinguish between the different stages of AD, all of the features from each scale were finally combined. When compared to other traditional deep learning models such as ResNet, ResNext, MobileNet, ShuffleNetv2, and EfficientNet, this MMSDL produced the best results.
Recent progress on Alzheimer’s disease (AD) diagnosis has explored several deep learning architectures integrating multimodal data to enhance efficiency and accuracy. The AD-Transformer36 utilizes a single transformer-based architecture that encodes sMRI, clinical, and genomic data through a Patch-CNN for image tokenization and a linear projection layer for non-image data to achieve state-of-the-art AUC scores for AD and MCI prediction. Alternatively, by integrating multimodal MRI and PET information through middle fusion with cross-attention mechanisms, lightweight cortical surface-based models37 provide computationally efficient substitutes for traditional volumetric methods. By integrating cross-sectional and temporal data across MRI, PET, and CT through attention-driven feature fusion, lightweight topologies, and GAN-based data augmentation, FusionNet38 significantly enhances the accuracy of early AD diagnosis. EffiSwin-MCI39 merges EfficientNet and Swin Transformer with sliding-window spatial and temporal attention for discovering progressive neurodegenerative changes from longitudinal sMRI scans, surpassing current CNN and Transformer models in dealing with temporal dynamics.
According to this analysis, current models for AD detection have achieved great advances in multimodal fusion (e.g., CT-GAN, HAMF, MMSDL) and handling of missing data (e.g., DRN, MRNN, MinimalRNN). However, they still have serious drawbacks, such as inadequate cross-scale and cross-modal interaction, a lack of specialized backbone architectures for modality-specific strengths, and insufficient end-to-end fusion. The majority of existing fusion frameworks (such as CT-GAN, HAMF, and MMSDL) do not explicitly represent intra- and inter-scale interactions between fine-grained and global data across MRI and PET modalities. Techniques such as DRNs and LSTM-based imputation models can analyze time-series data, but they are not able to fully utilize spatially detailed neuroimaging information, especially functional patterns from low-resolution PET and local anatomical structures from high-resolution MRI. In general, existing methods do not take advantage of modality-specific network architectures, lack strong cross-scale and intra-modality interactions, and frequently do not include domain information from pre-processing tools such as FreeSurfer. These drawbacks highlight the requirement for more context-aware, scalable, and specialized frameworks. The proposed ISI-CSFN tackles these issues through dedicated intra- and inter-scale feature modeling, modality-tailored backbones, and unified end-to-end learning.
Proposed method
In general, neurodegeneration is the progressive loss of a neuron’s structure or function. It occurs in a variety of illnesses, including Parkinson’s and Alzheimer’s. Among neurodegenerative diseases, AD is the most prevalent. Additional information about the brain is provided by multimodal neuroimaging data, such as PET and MRI. In this paper, a novel Intra-scale Interaction and Cross-scale Fusion Network (ISI-CSFN) is proposed for AD progression detection. First, the ADNI 1, ADNI GO, and ADNI 2 datasets are used to access the data. Next, FreeSurfer is utilized for extracting the neuroimaging features of the MRI and PET modalities. Additionally, two sub-networks, a Depthwise Separable Convolution Network (DSCN) and a Cascaded Transformer (CTransformer), are used to concurrently extract fine-grained local features and global features from the pre-processed MRI and PET, respectively, as shown in Fig. 1. DSCNs effectively capture local spatial patterns through convolutional filters, so they are suitable for MRI images. CTransformers model long-range dependencies over the entire brain without depending on local structure, so they are more appropriate for PET imaging.
Furthermore, an intra-scale cross-interaction module and an inter-scale feature fusion module are introduced for successfully fusing the multi-scale feature maps mined from the two backbones. They can fuse feature maps of various scales and enforce interaction among feature maps at the same scale. Finally, the feature maps are combined with the baseline background data and the features extracted by FreeSurfer. The classification task is then handled by a Softmax layer, and the regression task by a Sigmoid layer, as in Fig. 2.
Data pre-processing
Initially, the input MRI and PET images are pre-processed using FreeSurfer to extract neuroimaging features. FreeSurfer is a neuroimaging tool that divides the brain into anatomical regions (such as cortical and subcortical ROIs). Morphometric features such as cortical thickness, surface area, volume, and depth are retrieved for each cortical and subcortical ROI. The collected features and baseline data are then pre-processed to remove missing data and standardize the data.
Missing data handling: The features retrieved from FreeSurfer and the baseline data are numerical and unbounded. Hence, the missing values are processed appropriately. For the baseline static data, features with more than 35% missing elements were removed. Next, the k-nearest neighbors (KNN) algorithm is utilized for imputing missing values40, which are substituted with data from other subjects with the same diagnosis.
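As a sketch of this step (assuming the baseline features are held in a hypothetical pandas table `baseline_df`; the neighbor count is illustrative), the 35% threshold and the KNN imputation could be implemented as follows:

```python
import pandas as pd
from sklearn.impute import KNNImputer

def impute_baseline(baseline_df: pd.DataFrame, max_missing: float = 0.35, k: int = 5) -> pd.DataFrame:
    """Drop features with more than 35% missing values, then KNN-impute the rest."""
    # Keep only columns whose missing-value ratio is at or below the threshold.
    kept = baseline_df.loc[:, baseline_df.isna().mean() <= max_missing]
    # Fill the remaining gaps from the k most similar subjects in feature space.
    imputer = KNNImputer(n_neighbors=k)
    imputed = imputer.fit_transform(kept)
    return pd.DataFrame(imputed, columns=kept.columns, index=kept.index)
```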
Data standardization: The accessible participant data for the baseline and time series are of varied magnitudes. It is difficult to achieve convergence when training an ML model on such data. The z-score method is utilized to standardize the features so that they all carry comparable weight. The z-score approach reduces the influence of outliers by converting the data to zero mean and unit standard deviation.
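A corresponding sketch of the z-score standardization, with hypothetical `train_features` and `val_features` arrays and the scaler fit on the training split only, is:

```python
from sklearn.preprocessing import StandardScaler

# Fit on the training split only, then reuse the fitted statistics on validation data,
# so that every feature is mapped to zero mean and unit standard deviation.
scaler = StandardScaler()
train_features_std = scaler.fit_transform(train_features)
val_features_std = scaler.transform(val_features)
```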
Intra-scale interaction and cross-scale fusion network (ISI-CSFN)
This section introduces the ISI-CSFN framework, which is primarily composed of three steps as shown in Fig. 3. Initially, two sub-networks are used to extract highly detailed local features and label-level global features in parallel: the depthwise separable convolution network (DSCN) and the cascaded transformer (CTransformer). DSCN and CTransformer are used for MRI and PET feature extraction, respectively, because each modality has distinct characteristics. MRI data benefit from DSCN’s ability to capture local spatial patterns, while PET data, being more functional and temporal in nature, are better represented using the CTransformer, which can model long-range dependencies effectively. Using a single architecture for both modalities may not fully exploit these modality-specific properties, potentially limiting performance. Furthermore, intra-scale interaction and cross-scale fusion modules are suggested to fuse feature maps of various scales and to enforce interaction among feature maps at the same scale. These modules can successfully fuse the multi-scale feature maps mined from the two backbones.
Multi-scale local and global pattern mining
In this paper, two networks, namely convolutional networks and Transformers, are used simultaneously for obtaining multi-scale local and global feature maps, respectively. Specifically, the widely used DSCN41 is used to obtain the multi-scale feature maps in the CNN branch, and the enhanced CTransformer in the Transformer branch. As shown in Fig. 4, CNNs frequently use the depthwise separable convolution (DSC) to lower the computational load of a typical multi-channel convolution. The depthwise convolution kernel and the pointwise convolution kernel are the two distinct kernels that are created from a normal 2D convolution kernel. When compared to a conventional convolution, the DSC minimizes computation by requiring fewer operations, as in Fig. 3.
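A minimal Keras sketch of such a depthwise separable convolution block is given below; the input shape, kernel size, and filter count are illustrative assumptions rather than the exact DSCN configuration used in this work.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dsc_block(x, filters, kernel_size=3):
    """Depthwise separable convolution: per-channel spatial filtering
    followed by a 1x1 pointwise convolution that mixes channels."""
    x = layers.DepthwiseConv2D(kernel_size, padding="same")(x)   # depthwise kernel
    x = layers.Conv2D(filters, 1, padding="same")(x)             # pointwise kernel
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

# Example: a single-channel 96x96 slice mapped to 32 feature maps.
inputs = tf.keras.Input(shape=(96, 96, 1))
outputs = dsc_block(inputs, filters=32)
model = tf.keras.Model(inputs, outputs)
```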
The CTransformer uses patch-wise linear embedding to project images into token sequences. The network design of the CTransformer is based on the well-known Swin Transformer, although it includes numerous enhancements to compensate for the shortcomings of the conventional Swin Transformer. As shown in Fig. 5, the normalization layer, activation function, and spatial data fusion differ from the original Swin Transformer module. The concatenation layer and residual learning are replaced by the SK fusion and soft reconstruction layers. Inspired by SKNet42, the Selective Kernel Fusion (SKF) layer uses channel attention to fuse several branches.
The normalization layer is essential to deep learning models because it stabilizes network training. Nevertheless, LayerNorm permanently removes the feature map’s mean and standard deviation. Therefore, Rescale Layer Normalization (RescaleNorm) is introduced, which re-injects the mean and standard deviation at the residual block’s output stage. RescaleNorm can be expressed as follows:
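The published equation is not reproduced in this text; a reconstruction consistent with the description (to be read as an assumption rather than the authors’ exact formulation) is

\[\hat{y}=\delta\odot\frac{y-\mu}{\sigma}+\alpha,\qquad \text{output}=\left(\sigma\omega_{\delta}+\beta_{\delta}\right)f\!\left(\hat{y}\right)+\left(\mu\omega_{\alpha}+\beta_{\alpha}\right),\]

in which \(f(\cdot)\) denotes the residual block applied to the normalized features.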
where \(y\) represents the input feature map, µ and σ represent its mean and standard deviation, and \(\delta\) and \(\alpha\) are the trained scaling parameter and bias. Two linear layers with weights \(\omega_{\delta}\), \(\omega_{\alpha}\) and biases \(\beta_{\delta}\), \(\beta_{\alpha}\) are utilized for transforming σ and µ through \((\sigma\omega_{\delta}+\beta_{\delta})\) and \((\mu\omega_{\alpha}+\beta_{\alpha})\), respectively.
Although GELU, ReLU, and LeakyReLU have been derived and widely adopted, they can pose gradient problems in some new transformer-based image processing networks because they are not smooth. Hence, SoftReLU is suggested as a straightforward smooth approximation to the ReLU:
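The exact expression is not reproduced in this text; one standard smooth approximation of this kind (stated as an assumption) is

\[\mathrm{SoftReLU}(x)=\frac{x+\sqrt{x^{2}+\epsilon}}{2},\]

which reduces to \(\mathrm{ReLU}(x)\) as \(\epsilon\to 0\) while remaining differentiable everywhere.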
where \(\:\epsilon\:\) is a shape parameter. Swin Transformer implements effective batch computing for shifted window partitioning using cyclic shift with masked MHSA. This mask changes the window size along the image boundary to be less than the specified window size. This work suggests using reflection padding to accomplish effective batch computing for shifted window subdivision without creating unwarranted inter-patch interactions.
Intra-scale interaction
Even though the feature maps produced by the two pathways are inherently in tensor form, the cross-attention calculations need a sequence of vectorized inputs. Every feature map must be projected into three intermediate elements, i.e. query \(\bar{Q}\), key \(\bar{K}\), and value \(\bar{V}\), utilizing the spatially normalized features \(F_{D}\in\mathfrak{R}^{h\times w\times C_{1}}\) and \(F_{T}\in\mathfrak{R}^{h\times w\times C_{2}}\), and the three learned linear transformations \(P^{\bar{Q}}\), \(P^{\bar{K}}\), and \(P^{\bar{V}}\). In the proposed model, \(F_{D}\) and \(F_{T}\) are processed together through parallel operations. In particular, a 1 × 1 convolution is carried out to unify and reduce the channel size, followed by the traditional positional embedding process to provide positional information. The flattened tensors \(\bar{Q}_{x}\) from the DSCN and the CTransformer are alternately utilized to complete feature fusion in order to enforce data collaboration between \(F_{D}\) and \(F_{T}\). The same self-attention mechanism module is used for simplicity. It should be noted that the traditional process uses Eq. (3) to formulate a scaled dot-product operation.
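For reference, the conventional scaled dot-product operation referred to as Eq. (3) has the standard form

\[\mathrm{Att}\left(\bar{Q},\bar{K},\bar{V}\right)=\mathrm{softmax}\!\left(\frac{\bar{Q}\bar{K}^{T}}{\sqrt{m}}\right)\bar{V}.\]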
where \(T\) denotes the transpose operator, the input sequences \(\bar{Q},\bar{K},\bar{V}\in\mathfrak{R}^{M\times m}\), \(M=h\times w\), and \(m\) is the reduced channel size. The suggested model achieves collaborative interaction between the feature maps of the DSCN branch and the CTransformer branch at the same resolution level by factorizing the attention map and computing the second matrix multiplication (between key \(\bar{K}\) and value \(\bar{V}\)) first. Equation (4) provides the specific operation.
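A factorized (linearized) form consistent with this description, in which the key–value product is computed first so that the cost scales linearly with the number of tokens \(M\), is (a reconstruction, since the original Eq. (4) is not reproduced here)

\[\mathrm{LinAtt}\left(\bar{Q},\bar{K},\bar{V}\right)=\mathrm{softmax}_{\mathrm{ch}}\!\left(\bar{Q}\right)\left(\mathrm{softmax}_{\mathrm{tok}}\!\left(\bar{K}\right)^{T}\bar{V}\right),\]

where \(\mathrm{softmax}_{\mathrm{tok}}\) normalizes over the \(M\) tokens and \(\mathrm{softmax}_{\mathrm{ch}}\) over the \(m\) channels.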
Here softmax(·) is applied element-wise on all tokens of the array.
Additionally, the suggested model adds a Relative Position-Aware Convolution (RPC) to the Conv Attention module, which means that the query and the adjacent features influence a token’s output. In particular, as demonstrated by Eq. (5), the proposed model describes the position-centric interactions between \(\bar{Q}\) and \(\bar{V}\) using a special instance of relative position encoding implemented with depthwise convolution.
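A common instance of such depthwise-convolutional relative position encoding (again an assumption about the exact form of Eq. (5)) is

\[\mathrm{RPC}\left(\bar{Q},\bar{V}\right)=\bar{Q}\circ\mathrm{DWConv}\left(\bar{V}\right),\]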
where \(\circ\) denotes the Hadamard product. The full linear convolutional attention layer consists of two sub-modules, namely linear attention and RPC, in order to accomplish the collaborative interaction of the various branches and produce the multi-scale interactive feature maps \(F_{1}\) and \(F_{2}\). The whole structure is defined as follows:
Here a Conv Attention layer is employed for strengthening interaction effectiveness. The fusion unit aggregates the two outputs \(F_{1}\) and \(F_{2}\) after cross-interaction to further extract salient features and eliminate the interference of irrelevant features.
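Putting the pieces of this subsection together, the following Keras sketch illustrates a linearized convolutional cross-attention layer of this kind; the projection dimension, softmax axes, and kernel size are assumptions, not the authors’ exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

class LinearConvAttention(layers.Layer):
    """Illustrative linearized cross-attention with a depthwise-convolution
    relative-position term (RPC); a sketch, not the published implementation."""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.to_q = layers.Conv2D(dim, 1)   # 1x1 convs unify and reduce channel sizes
        self.to_k = layers.Conv2D(dim, 1)
        self.to_v = layers.Conv2D(dim, 1)
        self.rpc = layers.DepthwiseConv2D(3, padding="same")  # position-aware term

    def call(self, x_query, x_context):
        q = self.to_q(x_query)              # (B, h, w, m)
        k = self.to_k(x_context)
        v = self.to_v(x_context)
        b, h, w = tf.shape(q)[0], tf.shape(q)[1], tf.shape(q)[2]
        q_ = tf.reshape(q, (b, -1, self.dim))   # (B, M, m) with M = h*w
        k_ = tf.reshape(k, (b, -1, self.dim))
        v_ = tf.reshape(v, (b, -1, self.dim))
        # Factorized attention: the key-value product is computed first,
        # so the cost is linear in the number of tokens M.
        ctx = tf.einsum("bmc,bmd->bcd", tf.nn.softmax(k_, axis=1), v_)    # (B, m, m)
        out = tf.einsum("bmc,bcd->bmd", tf.nn.softmax(q_, axis=-1), ctx)  # (B, M, m)
        out = tf.reshape(out, (b, h, w, self.dim))
        # Hadamard product of the query with a depthwise convolution of the value.
        return out + q * self.rpc(v)
```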
Cross-scale fusion network
A masked aggregation unit is used by this cross-scale fusion network to draw attention to relevant properties of interest in the reconstructed features \(F_{1}\) and \(F_{2}\). Here, the feature maps are convolved and combined through the Hadamard product to generate the mask maps \(MK_{1}\) and \(MK_{2}\). This suppresses the vagueness introduced by unnecessary features while activating the task-specific ones.
where \(F_{L}\) and \(F_{H}\) represent the lower-scale and higher-scale feature maps, and \(Conv\), \(BNorm\), \(ReLU\), \(UpSamp\), and \(Concatenate\) stand for the convolution operation, batch normalization, activation function, upsampling, and aggregation. Furthermore, \(g_{1}\) is generated by applying \(MK_{1}\), obtained from \(F_{L}\), to \(F_{H}\), with an emphasis on gathering features such as boundaries and spatial organization. On the other hand, background noise may affect the learned features. Additionally, \(MK_{2}\) is derived from \(F_{H}\) to add rich semantic information to \(F_{L}\), helping to identify and reduce noise for \(g_{2}\). Consequently, the output \(g\) integrates a number of salient attributes into the multi-scale features.
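A Keras sketch of such a masked aggregation unit is given below; the upsampling factor, kernel sizes, the sigmoid mask generation, and the placement of the batch normalization are assumptions rather than the exact published design.

```python
from tensorflow.keras import layers

def cross_scale_fusion(f_low, f_high, channels):
    """Illustrative masked aggregation: each scale produces a mask that gates
    the other scale before the gated maps are concatenated and fused."""
    # Bring the higher-scale (coarser) map to the lower-scale resolution.
    f_high = layers.UpSampling2D(size=2, interpolation="bilinear")(f_high)
    f_high = layers.Conv2D(channels, 1)(f_high)
    f_low = layers.Conv2D(channels, 1)(f_low)

    def mask(x):
        # Conv -> BN -> sigmoid produces a mask map in [0, 1].
        x = layers.Conv2D(channels, 3, padding="same")(x)
        return layers.Activation("sigmoid")(layers.BatchNormalization()(x))

    g1 = layers.Multiply()([mask(f_low), f_high])   # boundary / spatial-structure cues
    g2 = layers.Multiply()([mask(f_high), f_low])   # semantic, noise-suppressing cues
    g = layers.Concatenate()([g1, g2])
    g = layers.Conv2D(channels, 3, padding="same")(g)
    return layers.ReLU()(layers.BatchNormalization()(g))
```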
Also, a spatial alignment unit is presented using the attention strategy to develop \(F_{3}\) through the fusion of \(F_{1}\), with local features, and \(F_{2}\), with global features. The total spatial attention distribution is modelled by merging the multi-scale features \(F_{L}\) and \(F_{H}\).
where \(\gamma\) stands for the attention coefficient and \(sig\) is the activation function. Target areas in \(\gamma\) tend to have values close to one, whereas background regions tend to have values close to zero. These regions are in charge of capturing task-related aspects and reducing the interference of unimportant information.
Multi task cost function for ISI-CSFN
The suggested ISI-CSFN employs the CTransformer for global features and the DSCN for fine-grained local features from the input PET and MRI images, respectively. For deeper feature learning, the learnt features from these blocks are then fed into two dense layers. Three dense layers then combine the outputs of all streams to create deeper and more discriminative features. Furthermore, baseline information is used to improve the precision and confidence of the training procedure. This baseline information includes the patient’s demographics and a few statistical characteristics taken from the patient’s longitudinal time-series data. A feed-forward neural network is utilized for extracting additional deep and distinctive features from the baseline data independently. To discover finer features for the classification and regression tasks, a pair of shared dense layers fuses the outputs of these two feature-extraction stages.
The suggested model simultaneously learns a number of related tasks, namely four regression problems and a multi-class classification problem. We anticipate that this kind of information is essential for doctors to trust the outcomes of DL frameworks. The proposed framework makes use of shared parameters of the DL model for multitask learning.
The suggested DL framework uses a set of modalities \(N\) to mutually learn a set of associated tasks \(S\). Let \(P=\{P^{(1)},\dots,P^{(N)}\}\) represent the \(N\) modalities of data and \(S=\{s^{(1)},\dots,s^{(L)}\}\) represent the multiple tasks to be learned, where the \(i\)th task is described as \(S^{(i)}=\{s_{1}^{(i)},\dots,s_{M}^{(i)}\},\ i\in\{1,\dots,L\}\). Every modality \(P^{(n)}\) is described as \(P^{(n)}=\{p_{1}^{(n)},\dots,p_{j}^{(n)},\dots,p_{M}^{(n)}\}\) for \(M\) patient samples, each of which is a multivariate time series \(p_{j}^{(n)}=\{p_{jt_{1}}^{(n)},\dots,p_{jt_{2}}^{(n)},\dots,p_{jt_{g}}^{(n)}\}\) over \(t=1,2,\dots,T\) time steps with \(g\) univariate time series.
Let \(\{p_{j}^{(1)},\dots,p_{j}^{(n)},\dots,p_{j}^{(N)},s_{j}^{1},s_{j}^{2},\dots,s_{j}^{L}\}\), \(j=1,\dots,M\), denote the multimodal inputs and task labels of the \(M\) patients, where \(s_{j}^{1}\in\{1,2,3,4\}\) is the classification label and \(s_{j}^{i}\), \(i\in\{2,\dots,L\}\), are the regression targets. Here, both task-specific (\(\varnothing^{t}\)) and shared (\(\varnothing^{s}\)) parameters need to be optimized. The task-specific loss functions are \(\mathbb{L}^{t}(\cdot,\cdot):s^{(t)}\to\mathfrak{R}^{+}\) and the parametric hypothesis for every task is \(g^{t}(p,\varnothing^{s},\varnothing^{t}):P\to s^{(t)}\). Gradient-based multiobjective optimization of the task-dependent losses yields the following general optimization problem.
where \(\widehat{\mathbb{L}}^{T}(\cdot,\cdot)\) denotes an application-driven loss function described as \(\widehat{\mathbb{L}}^{T}(\varnothing^{s},\varnothing^{T})=\frac{1}{M}\sum_{j}\mathbb{L}(g^{T}(p_{j};\varnothing^{s},\varnothing^{T}),s_{j}^{T})\). Both classification and regression tasks are performed by the suggested framework. The first part of the loss function is the class-weighted cross-entropy for the multi-class classification, and the second part is the mean squared loss for the four regression problems.
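Under these definitions, the combined objective can be written out (as a sketch; any task-weighting coefficients are omitted) as

\[\mathbb{L}=-\frac{1}{M}\sum_{j=1}^{M}\sum_{c=1}^{4}w_{c}\,\mathbf{1}\!\left[s_{j}^{1}=c\right]\log\hat{s}_{j,c}^{1}\;+\;\sum_{t=2}^{L}\frac{1}{M}\sum_{j=1}^{M}\left(\hat{s}_{j}^{t}-s_{j}^{t}\right)^{2},\]

where \(w_{c}\) is the weight of class \(c\), \(\hat{s}\) denotes the model predictions, and the tasks \(t=2,\dots,L\) are the four cognitive-score regressions.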
Dataset description
This study uses data collected from ADNI 1, ADNI GO, and ADNI 2. Here, 1539 participants are divided into four groups depending on their patient-specific clinical diagnoses at baseline and subsequent visits. The first category contains 420 participants who were diagnosed as CN at baseline. The second category contains 474 participants who were classified as sMCI across all time points during the trial. The third class contains 306 subjects who were pMCI at the baseline visit and converted to AD at some point during the study. Finally, the fourth class contains 339 subjects who were consistently diagnosed with AD at all visits43. Table 1 shows the subjects’ demographic and clinical data.
Results and discussion
The performance of the proposed model is validated by implementing it in the Python programming language. The Keras library with a TensorFlow backend is utilized for our tests on a computer system running Ubuntu with two Nvidia GTX Titan Xp GPUs and an Intel i7-6800K CPU. The network parameters are updated using stochastic gradient descent (SGD). The values for the batch size, learning rate, momentum, weight decay, and number of training epochs are set to 6, \(1.0e^{-2}\), 0.95, \(5.0e^{-2}\), and 300, respectively. The proposed model is tested utilizing the public ADNI dataset. Here, an 80–20 dataset split is considered, meaning that 20% of the data is utilized for validation and 80% for training. To guarantee that the proposed findings are not influenced by a specific split, we carried out this procedure five times using various random subsets. To deliver a more reliable indication of the generalizability of the model, the model’s ultimate performance is averaged over these five runs.
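For reproducibility, the reported configuration maps onto Keras roughly as in the sketch below; `build_isi_csfn()`, the output names, and the data arrays are hypothetical placeholders, and passing `weight_decay` to the optimizer assumes a recent Keras version.

```python
import tensorflow as tf

model = build_isi_csfn()  # hypothetical constructor of the proposed network
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.95, weight_decay=5e-2)
model.compile(
    optimizer=optimizer,
    loss={"diagnosis": "categorical_crossentropy",  # multi-class progression head
          "scores": "mse"},                         # four cognitive-score regressions
)
model.fit(train_inputs, train_targets,
          validation_data=(val_inputs, val_targets),
          batch_size=6, epochs=300)
```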
ROI importance analysis
The suggested model divides the brain into anatomical regions (such as cortical and subcortical ROIs) using the FreeSurfer tool. The top ten ROIs found for MRI and PET data over the three different classification tasks are shown in Fig. 6. The top 10 ROIs are identified by registering the obtained network weights with the automated anatomical labeling atlas and arranging them in descending order according to the mean values for every ROI. The hippocampus, cingulum, and occipital lobe are vital brain regions that play important roles in a variety of cognitive and affective processes. Their significance in AD research is further highlighted by their relationship with AD pathology.
Evaluation of classification task
The efficacy of the suggested classification task is shown by implementing various setups for three binary tasks (NC vs. AD, NC vs. MCI, and sMCI vs. pMCI). In Table 2, the effectiveness of the ISI-CSFN is compared with a number of deep learning methods, such as ResNet44, ResNext45, MobileNet46, ShuffleNetv247, EfficientNet48, and MMSDL35. With accuracy of up to 97.26% (95% CI: 97.01–97.51), sensitivity of up to 99.27% (95% CI: 97.61–100.93), and specificity of up to 94.72% (95% CI: 92.20–97.24), the proposed model outperformed all other models for NC vs. AD classification. MMSDL also performed competitively with 95.25% (95% CI: 94.62–95.88) accuracy and an AUC of 94.68% (95% CI: 93.93–95.43). Light model types like ShuffleNetv2 and MobileNet, however, had high sensitivity but relatively reduced accuracy. The suggested model performed exceptionally, demonstrating good discrimination between normal controls and mild cognitive impairment, with an accuracy of 88.26% (95% CI: 86.51–90.01) and an AUC of 90.54% (95% CI: 88.79–92.29) for NC vs. MCI classification.
MMSDL performed better than MobileNet with an accuracy of 85.37% (95% CI: 83.62–87.12), while MobileNet performed the worst at 79.65% (95% CI: 77.90–81.40). The discrimination between sMCI and pMCI is inherently more difficult due to the fine distinctions between progressive and stable MCI. The proposed model possessed the highest sensitivity (85.73%; 95% CI: 83.98–87.48), accuracy (81.25%; 95% CI: 79.50–83.00), and AUC (83.47%; 95% CI: 81.72–85.22). MMSDL again fell significantly behind, whereas MobileNet and ShuffleNetv2 exhibited relatively poorer discrimination performance. The suggested model outperformed all the baseline models for all the classification tasks consistently and proved to be robust in tackling both binary and hard progressive classification issues. Also, it succeeded in discriminating between the NC, MCI, and AD classes.
Performance of the proposed ISI-CSFN model is presented in Table 3. Its accuracy of 97.26 ± 0.29, 95% CI of [96.90–97.62], and AUC of 95.58 [95.00–96.16] for the NC vs. AD task indicate strong discrimination. The model obtains 88.26 ± 0.32 accuracy for the NC vs. MCI task and 81.25 ± 0.28 accuracy for the sMCI vs. pMCI task with tight confidence intervals. The ISI-CSFN’s consistently high sensitivity (99.27) and specificity (94.72) contribute to the confirmation of its stability in a broad range of categorization conditions.
The performance of the suggested ISI-CSFN model in the five-fold cross-validation is presented in Table 4. With fold-wise variations being within ± 0.5% and average accuracy of 97.26% and sensitivity of 99.27%, the model performed repeatedly well on all folds in the NC vs. AD test. In the same way, for NC vs. MCI, the model’s average accuracy was 92.41% and its sensitivity was 88.26%, showing consistent identification despite the tiny class differences. In the most difficult sMCI vs. pMCI challenge, ISI-CSFN had an average accuracy of 81.25% and an F1-score of 82.01% with consistent performance across folds. These consistent results for all partitions confirm the robustness, stability, and generalizability of the proposed framework.
Also, the ROC curves are used to evaluate the effectiveness of various configurations of our technique as in Fig. 7. The ROC curves visually represent the true positive vs. false positive rates for every model over the three tasks. These configurations include: ResNet50, ResNext, MobileNet, ShuffleNetv2, EfficientNet, MMSDL and the proposed model. The proposed model performs better at differentiating between classes since it obtains the best AUC across all classification tasks.
The findings of the proposed model are contrasted with those derived from individual designs of Multi-scale Local and Global Pattern Mining (MS-LGPM), MS-LGPM + ISI, and MS-LGPM + ISI + CSFN. Figure 8 compares the performance of the suggested ISI-CSFN framework and its individual design elements, MS-LGPM, MS-LGPM + ISI, and MS-LGPM + ISI + CSFN, in terms of Accuracy, F1 Score, Precision, and Recall over a range of time steps. As demonstrated, the entire model (MS-LGPM + ISI + CSFN) exhibits improved resilience and stability over longer time sequences, consistently outperforming the other configurations across the majority of criteria. The baseline MS-LGPM is greatly enhanced by the addition of intra-scale interaction (ISI), especially in Accuracy and F1 Score. However, the optimal trade-off between precision and recall is achieved by combining ISI and cross-scale fusion (CSFN), which keeps performance high even as temporal complexity increases. This demonstrates how well the suggested ISI-CSFN design captures contextual global relationships as well as fine-grained local dependencies for time-series classification tasks.
We performed several sets of classification trials to assess the impact of MRI and PET fusion quantitatively. First, the classification performance is analysed by considering only structural brain imaging (MRI) (indicated by the red hue in Fig. 9). Next, the classification performance is evaluated using only PET (indicated by the blue hue in Fig. 9). Finally, the combined MRI and PET images are considered (indicated by the green hue). According to the experimental results, classification accuracy can be increased by 8% to 10% by combining structural and functional brain imaging. Additionally, the results show that combining MRI and PET brain imaging to study AD can increase the ability to predict AD compared to using only one modality.
Cognitive scores prediction
The MAE comparison of the four clinical cognitive tests including FAQ, ADAS, CDR, and MMSE for the suggested MS-LGPM variants such as MS-LGPM, MS-LGPM + ISI, and MS-LGPM + ISI + CSFN is shown in Table 5. The findings show that the addition of intra-scale interaction (ISI) increases prediction accuracy over the baseline MS-LGPM for the majority of measures. Among the four tasks, the whole model, MS-LGPM + ISI + CSFN notably achieves the lowest MAE values: 0.085 ± 0.01 for MMSE, 0.076 ± 0.01 for ADAS, 0.075 ± 0.01 for CDR, and 0.107 ± 0.01 for FAQ. This suggests that the model is better at capturing intricate correlations in multi-scale temporal data. These improvements show how better incorporating intra- and cross-scale interactions across various cognitive measures can reduce prediction error.
Table 6 presents the error analysis of MMSE and ADAS cognitive score prediction using various baseline models versus the proposed ISI-CSFN. It is shown that traditional imputation-based recurrent models, i.e., LSTM-M and LSTM-F, produce very high RMSE values, reflecting their poor capability for detecting sophisticated temporal relationships in multimodal data. Despite producing fewer errors, more sophisticated models like MRNN, PLSTM-Z, and MinimalRNN still cannot represent cross-modal interactions optimally. The advantages of robust recurrent architectures for sequential prediction are again shown by the DRN method, which lowers the RMSE to 2.31 (95% CI: 2.20–2.42) for MMSE and 4.31 (95% CI: 4.16–4.46) for ADAS. The proposed ISI-CSFN performs best on average, with RMSEs of 1.28 (95% CI: 1.17–1.39) for MMSE and 3.18 (95% CI: 2.96–3.40) for ADAS.
As the values are much lower than those of all baseline models, they demonstrate how well the intra-scale and cross-scale fusion method takes advantage of complementary information across modalities. The narrow confidence intervals also corroborate the stability and robustness of our approach across multiple experiments. These findings show that ISI-CSFN gives more accurate and solid predictions of cognitive impairment compared to state-of-the-art recurrent and deep learning methods. Through the introduction of multi-scale and modality-specific information, the proposed method not only enhances forecasting performance but also has vast potential for clinical use in tracking the development of Alzheimer’s disease.
Ablation study
To examine the influences of various model blocks on the prediction results, we compared the performance of ISI-CSFN with its variants, namely the model with only MRI images (MRI-ISI-CSFN), the model with only PET (PET-ISI-CSFN), the model with only DSCN (DSCN-ISI-CSFN), the model with only the CTransformer (CTransformer-CSFN), and the combined MSLG-ISI-CSFN model. As illustrated in Fig. 10, t-SNE feature visualizations are conducted for various configurations of the ISI-CSFN technique across three classification tasks: NC vs. AD, NC vs. MCI, and sMCI vs. pMCI. The results demonstrate clear variations in the models’ capacity to distinguish between class clusters. CTransformer-CSFN and DSCN-ISI-CSFN exhibit a considerable degree of class overlap and limited separation, especially when it comes to differentiating sMCI from pMCI and NC from MCI.
The proposed MSLG-ISI-CSFN can capture both fine-grained textures (local) and broader contextual patterns (global) within brain images, making it more effective at identifying subtle differences between normal and diseased brain images. Combining the MRI, PET, DSCN, and CTransformer modules allows the model to take advantage of complementary data from multiple modalities and architectures, producing feature representations that are more reliable.
While the suggested ISI-CSFN performs well on the ADNI dataset, its reliance on this benchmark limits its direct generalizability to other datasets or clinical environments. Differences in imaging protocols, scanner platforms, demographic distributions, and clinical diagnostic criteria across cohorts can affect the model’s robustness when it is applied directly in other environments. Also, in contrast to the typically well-curated ADNI data, real-world clinical data tend to have greater heterogeneity, noise, and missing values.
Table 7 presents a comparative performance analysis of the proposed ISI-CSFN model against existing multimodal approaches. The proposed model achieved the highest accuracy of 97.26%, with sensitivity and specificity values of 99.27% and 94.72%, respectively. Compared to Multimodal RNN49 and LMDP-Net50, our method demonstrates a substantial improvement, indicating its robustness in predicting Alzheimer’s disease progression.
Limitations and future work
Although the proposed model is more accurate than the baseline models, it has more parameters and higher FLOPs per inference, which adds to its computational complexity. Besides deployment on efficient hardware, future work will investigate optimization methods such as pruning, quantization, and knowledge distillation to cut down the computational load while keeping performance intact. In addition, the supplementary information from the MRI and PET modalities might not be equally helpful for all patients or disease phases. Single-modality models with strong performance, or adaptive fusion methods, can ensure performance when cross-modal complementarity is weak. The comparatively limited dataset size is another drawback, which may limit generalizability and heighten the risk of overfitting. Robustness can be enhanced by validation on bigger, multi-center datasets with data augmentation, transfer learning, and cross-cohort testing. Inclusion of participants with mild preclinical abnormalities within the cognitively normal (CN) control group might bias comparisons against the MCI and AD groups. Subsequent research should consider stricter screening protocols and cross-validation within different cohorts to ensure that the CN group represents a representative population baseline.
Conclusion
Alzheimer’s disease (AD), a progressive neurodegenerative disorder, poses a major global health challenge because of its medical, societal, and economic impacts. This study used a disease progression model to define the AD progression prediction problem, which included clinical stage prediction tasks over multiple time periods and prediction of MRI biomarkers and cognitive test scores. A joint multimodal multitask learning framework using the combination of the DSCN and the CTransformer is specifically suggested for learning AD multi-class classification and four cognitive score regressions concurrently. Furthermore, the inter-scale feature fusion unit aims to combine low-level features that support structural sketching with high-level representations that aid localization, while the intra-scale cross-interaction module is introduced to focus on the cooperation between subtle local patterns and application-driven global features. Also, an exhaustive experiment is conducted to prove the model’s effectiveness by comparing it with other methodologies in the literature. The suggested framework outperformed competing methods in numerous quantitative metrics. Therefore, it provides new perspectives in identifying the diverse neural circuits linked to AD. The primary drawback of our model is its excessive resource usage, which is a result of the pre-trained DSCN and CTransformer backbones being introduced in two separate branches. Therefore, future research will focus on lightweight models in more detail.
Data availability
The datasets analysed during the current study are available in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) repository, https://adni.loni.usc.edu/.
References
Jiang, Q. et al. Antiageing strategy for neurodegenerative diseases: from mechanisms to clinical advances. Signal. Transduct. Target. Therapy. 10 (1), 76 (2025).
Pan, D. et al. Deep learning for brain MRI confirms patterned pathological progression in Alzheimer’s disease. Adv. Sci. 10 (6), 2204717 (2023).
Chatterjee, I. & Bansal, V. LRE-MMF: A novel multi-modal fusion algorithm for detecting neurodegeneration in parkinson’s disease among the geriatric population. Exp. Gerontol. 197, 112585 (2024).
Alorf, A. & Khan, M. U. G. Multi-label classification of Alzheimer’s disease stages from resting-state fMRI-based correlation connectivity data and deep learning. Comput. Biol. Med. 151, 106240 (2022).
Lu, X., Zhang, Y., Tang, Y., Bernick, C. & Shan, G. Conversion to Alzheimer’s disease dementia from normal cognition directly or with the intermediate mild cognitive impairment stage. Alzheimer’s Dement. 21 (1), e14393 (2025).
Olaimat, A., Martinez, M. J. & Saeed, F. PPAD: a deep learning architecture to predict progression of Alzheimer’s disease. Bioinformatics 39, i149–i157 (2023).
Pradeep, K., Rambabu, B., Surendran, R. & Balamurugan, K. S. Early prediction of Alzheimer’s dementia using random forest machine learning algorithm: A data-driven approach. In 2025 8th International Conference on Electronics, Materials Engineering & Nano-Technology (IEMENTech), pp. 1–6. IEEE (2025).
El-Assy, A. M., Amer, H. M., Ibrahim, H. M. & Mohamed, M. A. A novel CNN architecture for accurate early detection and classification of Alzheimer’s disease using MRI data. Sci. Rep. 14 (1), 3463 (2024).
Sadeghi, M. et al. Detecting alzheimer’s disease stages and frontotemporal dementia in time courses of resting-state fmri data using a machine learning approach. J. Imaging Inf. Med. 37(6), 2768–2783 (2024).
Choudhury, C., Goel, T. & Tanveer, M. A coupled-GAN architecture to fuse MRI and PET image features for multi-stage classification of alzheimer’s disease. Inform. Fusion. 109, 102415 (2024).
Ni, Y. C. et al. Classification prediction of Alzheimer’s disease and vascular dementia using physiological data and ECD SPECT images. Diagnostics 14(4), 365 (2024).
Liu, Y., Wang, L., Ning, X., Gao, Y. & Wang, D. Enhancing early alzheimer’s disease classification accuracy through the fusion of sMRI and RsMEG data: a deep learning approach. Front. NeuroSci. 18, 1480871 (2024).
Elazab, A. et al. Alzheimer’s disease diagnosis from single and multimodal data using machine and deep learning models: Achievements and future directions. Expert Syst. Appl. 255, 124780 (2024).
Odusami, M., Maskeliūnas, R., Damaševičius, R. & Misra, S. Explainable deep-learning-based diagnosis of alzheimer’s disease using multimodal input fusion of PET and MRI images. J. Med. Biol. Eng. 43 (3), 291–302 (2023).
Joypriyanka, M. & Surendran, R. Chess game to improve the mental ability of Alzheimer’s patients using A3C. In 2023 Fifth International Conference on Electrical, Computer and Communication Technologies (ICECCT), pp. 1–6. IEEE (2023).
Wang, Y. et al. Predicting long-term progression of alzheimer’s disease using a multimodal deep learning model incorporating interaction effects. J. Translational Med. 22 (1), 265 (2024).
Aqeel, A. et al. A long short-term memory biomarker-based prediction framework for Alzheimer’s disease. Sensors 22 (4), 1475 (2022).
Archana, R. & Eliahim Jeevaraj, P. S. Deep learning models for digital image processing: a review. Artif. Intell. Rev. 57 (1), 11 (2024).
Rahim, N., El-Sappagh, S., Rizk, H., El-serafy, O. A. & Abuhmed, T. Information fusion-based Bayesian optimized heterogeneous deep ensemble model based on longitudinal neuroimaging data. Appl. Soft Comput. 162, 111749 (2024).
Nanthini, K., Tamilarasi, A., Sivabalaselvamani, D. & Suresh, P. Automated classification of alzheimer’s disease based on deep belief neural networks. Neural Comput. Appl. 36 (13), 7405–7419 (2024).
Ferri, R. et al. Stacked autoencoders as new models for an accurate alzheimer’s disease classification support using resting-state EEG and MRI measurements. Clin. Neurophysiol. 132 (1), 232–245 (2021).
Khan, A. & Zubair, S. An improved multi-modal based machine learning approach for the prognosis of alzheimer’s disease. J. King Saud University-Computer Inform. Sci. 34 (6), 2688–2706 (2022).
Morar, U. et al. Prediction of cognitive test scores from variable length multimodal data in alzheimer’s disease. Cogn. Comput. 15 (6), 2062–2086 (2023).
Abdelaziz, M., Wang, T. & Elazab, A. Alzheimer’s disease diagnosis framework from incomplete multimodal data using convolutional neural networks. J. Biomed. Inform. 121, 103863 (2021).
Zhang, J. et al. Multi-modal cross-attention network for alzheimer’s disease diagnosis with multi-modality data. Comput. Biol. Med. 162, 107050 (2023).
Joypriyanka, M. & Surendran, R. Checkers game therapy to improve the mental ability of Alzheimer’s patients using an AI virtual assistant. In 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), pp. 96–102. IEEE (2023).
Venugopalan, J., Tong, L., Hassanzadeh, H. R. & Wang, M. D. Multimodal deep learning models for early detection of Alzheimer’s disease stage. Sci. Rep. 11 (1), 3254 (2021).
Jeong, S., Jung, W., Sohn, J. & Suk, H.-I. Deep geometric learning with monotonicity constraints for Alzheimer’s disease progression. IEEE Trans. Neural Netw. Learn. Syst. (2024).
Yoon, J., Zame, W. R. & van der Schaar, M. Estimating missing data in temporal data streams using multi-directional recurrent neural networks. IEEE Trans. Biomed. Eng. 66 (5), 1477–1490 (2018).
Zhang, C. et al. Cross-dataset Evaluation of Dementia Longitudinal Progression Prediction Models. medRxiv (2024).
Jung, W., Jun, E., Suk, H. I. & Alzheimer’s Disease Neuroimaging Initiative. Deep recurrent model for individualized prediction of Alzheimer’s disease progression. NeuroImage 237, 118143 (2021).
Zuo, Q. et al. Alzheimer’s disease prediction via brain structural-functional deep fusing network. IEEE Trans. Neural Syst. Rehabil. Eng. 31, 4601–4612 (2023).
Lu, P., Hu, L., Mitelpunkt, A., Bhatnagar, S., Lu, L. & Liang, H. A hierarchical attention-based multimodal fusion framework for predicting the progression of Alzheimer’s disease. Biomed. Signal Process. Control. 88, 105669 (2024).
Sheng, J. et al. A hybrid multimodal machine learning model for detecting alzheimer’s disease. Comput. Biol. Med. 170, 108035 (2024).
Abdelaziz, M., Wang, T. & Anwaar, W. Multi-scale multimodal deep learning framework for alzheimer’s disease diagnosis. Comput. Biol. Med. 184, 109438 (2025).
Yu, Q. et al. A transformer-based unified multimodal framework for Alzheimer’s disease assessment. Comput. Biol. Med. 180, 108979 (2024).
Duong, Q. A., Tran, S. D. & Gahm, J. K. Multimodal surface-based transformer model for early diagnosis of Alzheimer’s disease. Sci. Rep. 15 (1), 5787 (2025).
Muksimova, S., Umirzakova, S. & Baltayev, J. Multi-modal fusion and longitudinal analysis for Alzheimer’s disease classification using deep learning. Diagnostics 15 (1), 717 (2025).
Zheng, G., Lu, Y. & Chen, H. Deep learning based framework for predicting mild cognitive impairment progression in neurology using longitudinal MRI. IEEE Access (2025).
Keerin, P. & Boongoen, T. Improved Knn imputation for missing values in gene expression data. Computers Mater. Continua. 70 (2), 4009–4025 (2021).
Lu, G., Zhang, W. & Wang, Z. Optimizing depthwise separable Convolution operations on Gpus. IEEE Trans. Parallel Distrib. Syst. 33 (1), 70–87 (2021).
Li, X., Wang, W., Hu, X. & Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 510–519 (2019).
Im, C. et al. Investigating the effect of brain atrophy on transcranial direct current stimulation: A computational study using ADNI dataset. Comput. Methods Programs Biomed. 257, 108429 (2024).
Nithya, V. P., Mohanasundaram, N. & Santhosh, R. An early detection and classification of alzheimer’s disease framework based on ResNet-50. Curr. Med. Imaging. 20 (1), e250823220361 (2024).
Borhade, R. R., Barekar, S. S., Ohatkar, S. N., Mathurkar, P. K., Borhade, R. H. & Bangare, P. M. ResneXt-Lenet: A hybrid deep learning for epileptic seizure prediction. Intell. Decis. Technol. 18 (3), 1675–1693 (2024).
Sethi, M., Singh, S. & Arora, J. Classification of Alzheimer’s disease using transfer learning mobilenet convolutional neural network. In International Conference on Emergent Converging Technologies and Biomedical Systems, pp. 19–28. Springer Nature(2022).
Gullipalli, N., Misra, A., Pappu, S. R., Karreddula, M. R. & Maram, B. Residual ShuffleNet with optimization: A novel approach for brain tumor detection in MRI images. Int. J. Biomath. (2025).
Agarwal, D., Berbís, M. Á., Luna, A., Lipari, V., Brito Ballester, J. & de la Torre-Díez, I. Automated medical diagnosis of Alzheimer’s disease using an EfficientNet convolutional neural network. J. Med. Syst. 47 (1), 57 (2023).
Lee, G., Nho, K., Kang, B., Sohn, K. A. & Kim, D. Predicting Alzheimer’s disease progression using multi-modal deep learning approach. Sci. Rep. 9 (1), 1952 (2019).
Dao, D. P., Yang, H. J., Kim, J. & Ho, N. H. Longitudinal Alzheimer’s disease progression prediction with modality uncertainty and optimization of information flow. IEEE J. Biomed. Health Inform. 29 (1), 259–272 (2024).
Author information
Contributions
T.B, S.R and M. R. wrote the main manuscript text and A.P prepared all figures. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
All work was done in an ethical manner, following the appropriate guidelines and regulations set by ADNI Research. We confirm that we obtained approval to use the ADNI dataset via the ADNI Data Use Agreement and that all ADNI participants provided written informed consent at the time of data collection.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Babu, T., Mahendran, R., Ajitha, P. et al. Intra-scale interaction and cross-scale fusion network for detecting the progression of neurodegeneration in Alzheimer’s disease. Sci Rep 16, 1377 (2026). https://doi.org/10.1038/s41598-025-31179-8