Introduction

Alzheimer’s Disease (AD) is a neurodegenerative disorder characterized by progressive cognitive decline, synaptic dysfunction, and neuronal loss. It remains incurable and is one of the leading causes of dementia worldwide. More than 50 million individuals currently live with AD globally, a figure projected to rise to nearly 152 million by 2050 (refs. 1,2,3). AD typically progresses through three clinical stages (mild, moderate, and severe), as summarized in Table 1. Despite extensive research, pharmacological treatments approved by the U.S. Food and Drug Administration (FDA) primarily target symptomatic management rather than disease modification4,5. Therefore, early detection at the Mild Cognitive Impairment (MCI) or prodromal stage is essential for timely intervention before neurological damage occurs6,7,8.

Table 1 Progression of Alzheimer’s Disease: Neuropathological, Cognitive, and Functional Characteristics.

These progressive neuropathological changes underscore the urgent need for early and accurate diagnostic frameworks that can identify subtle cognitive and neural alterations well before severe cortical damage occurs.

To address this challenge, researchers have turned to electroencephalography (EEG), a non-invasive, cost-effective, and temporally precise technique capable of capturing subtle neural dynamics associated with cognitive impairment9,10,11. EEG signals reflect the brain’s oscillatory activity across multiple frequency bands, each corresponding to distinct cognitive and awareness states. Figure 1 summarizes these canonical frequency bands and their associated mental states12.

Fig. 1 EEG frequency bands and corresponding cognitive states.

Abnormalities such as elevated theta/delta activity and reduced alpha/beta power have been consistently observed in AD patients13,14. Consequently, EEG-based biomarkers hold immense promise for non-invasive and early-stage Alzheimer’s detection. Alzheimer’s disease is also accompanied by emotional and affective disturbances including apathy, anxiety, depression, irritability, and emotional lability15. These symptoms further degrade quality of life and complicate caregiving. Recent EEG studies have explored emotional-state recognition in both virtual and physical environments, demonstrating that EEG dynamics can reliably encode affective states16,17. Although the present work focuses on diagnostic classification, the integration of emotional correlates could, in future, enhance multimodal AD assessment frameworks.

Standard processing pipelines typically involve data preparation, filtering, segmentation, feature extraction, and classification18,19,20. However, methodological diversity in preprocessing, feature engineering, and model interpretability continues to hinder clinical reproducibility. Table 2 summarizes key prior works, highlighting their main approaches and limitations.

Table 2 Summary of Related Work on EEG-based Alzheimer’s Disease Detection.
Table 3 Description of OpenNeuro Datasets Used in This Study. Both ds004504 (Eyes-Closed) and ds006036 (Eyes-Open) share the same 88 participants; differences correspond to recording condition and duration.
Table 4 Description of the OSF EEG dataset used for external generalization evaluation. This independent dataset differs in demographics and acquisition protocols from the OpenNeuro datasets, enabling assessment of cross-site robustness of NeuroFusionNet.

As evident from Table 2, prior studies have often relied on isolated feature-learning schemes lacking interpretability, cross-subject robustness, or rigorous clinical validation. While handcrafted features (spectral, statistical, and connectivity) capture domain-specific patterns, few studies have combined them with deep-learned features to achieve better classification accuracy. This gap motivates the development of a unified, explainable, and generalizable framework capable of learning both interpretable and high-level representations from EEG data.

While the primary evaluation of NeuroFusionNet was conducted on the OpenNeuro eyes-closed dataset (ds004504), reliable clinical translation demands testing under varied demographic conditions. Hence, two additional datasets were used: the OpenNeuro ds006036 (eyes-open) dataset to assess condition robustness, and the OSF EEG dataset for external validation.

A broader survey of additional EEG datasets related to Alzheimer’s disease is summarized in Table 5. These datasets differ in demographics, sampling rates, and recording durations. Table 6 compares the top-performing methods for AD detection using EEG signals across different datasets.

Table 5 Overview of publicly available EEG datasets for Alzheimer’s disease research. Tables 3 and 4 provide detailed subject-level information for the three datasets used in this work, while this table summarizes additional datasets for completeness.
Table 6 Comparison of reported benchmark accuracies from prior EEG-based Alzheimer’s studies. NeuroFusionNet’s results are later contextualized against these models to illustrate generalization and clinical competitiveness.

Key contributions and novelty

To clearly delineate the novelty of our work and its distinction from prior EEG-based Alzheimer’s frameworks, we summarize the main contributions of NeuroFusionNet as follows:

  • Combined handcrafted (spectral, statistical, and connectivity) and CNN-derived deep features to achieve both interpretability and abstraction.

  • Employed Pearson correlation analysis, PSO-based selection, and bottleneck fusion to maximize complementarity and minimize redundancy, unlike prior naive concatenation approaches.

  • Integrated Grad-CAM and SHAP back-projection to link AI reasoning with neurophysiological regions and rhythms.

  • Comprehensive robustness validation: Demonstrates condition and site generalization via ds004504, ds006036, and OSF datasets.

  • Achieved \(\sim\)21% improvement in signal-to-noise ratio (8.12\(\rightarrow\)9.89 dB) and 21% spectral stability gain, ensuring reliable EEG interpretation.

  • Demographic fairness: Subgroup (gender, age) analyses confirm unbiased performance (\(p>0.05\)).

  • 0.94M parameters, 4.1MB footprint, and 6.5 ms/sample CPU latency support real-time, portable EEG use.

  • Outperformed classical (SVM, RF, KNN, XGBoost) and recent deep models, highlighting hybrid deep–shallow superiority.

The following sections present the methodological design, preprocessing pipeline, feature-level fusion, and interpretability analysis.

Proposed methodology

The proposed NeuroFusionNet framework follows a systematic six-stage pipeline comprising data acquisition and preparation, EEG preprocessing, feature extraction, feature fusion and model construction, training and evaluation, and interpretability. This modular design ensures transparency, reproducibility, and flexibility for deployment across multiple EEG datasets. Figure 2 provides a flow diagram of the AD detection method proposed in this research.

Fig. 2 Overall workflow of the proposed NeuroFusionNet framework. The architecture integrates hybrid feature fusion, robust preprocessing, and SHAP/Grad-CAM explainability modules for clinically interpretable Alzheimer’s classification.

Data acquisition and preparation

To ensure comprehensive evaluation and generalization, we utilized three publicly available EEG datasets representing distinct recording conditions and demographic distributions: the OpenNeuro ds004504 (eyes-closed), OpenNeuro ds006036 (eyes-open), and the OSF EEG dataset for external validation. This multi-dataset design enables condition and site-level robustness assessment under heterogeneous acquisition protocols.

OpenNeuro ds004504 (eyes-closed):

This primary dataset contains EEG recordings from 88 subjects (AD: 36, FTD: 23, CN: 29) publicly available at https://openneuro.org/datasets/ds004504/versions/1.0.8. Each participant underwent neuropsychological evaluation using the Mini-Mental State Examination (MMSE), with average scores of 30 (CN), 22 (FTD), and 18 (AD). Recordings were performed using 18 electrodes positioned according to the international 10–20 system with A1–A2 references and maintained impedance below 5 k\(\Omega\). Signals were captured at 500 Hz using a Nihon Kohden EEG-2100 system during eyes-closed resting conditions.

OpenNeuro ds006036 (eyes-open):

This dataset includes EEG recordings from the same 88 participants under eyes-open resting conditions. It enables direct comparison of physiological and cognitive state variations. The acquisition protocol matched ds004504 in montage and impedance, with a 500 Hz sampling rate to ensure consistent temporal resolution.

OSF EEG dataset (external validation):

For independent generalization testing, we included the OSF EEG dataset (https://osf.io/), containing data from 109 subjects (AD and Cognitively Normal groups). Recordings were performed using 21-channel systems sampled at 512 Hz under eyes-closed resting conditions. This dataset differs in demographics and instrumentation, offering an ideal test set for assessing cross-site transferability and model robustness. The structure of EEG data across all datasets can be represented as:

$$\begin{aligned} \textbf{X} = \left\{ \textbf{X}^{(i)} \mid i = 1, 2, \ldots , N \right\} , \quad \textbf{X}^{(i)} \in \mathbb {R}^{T \times C}, \end{aligned}$$
(1)

where \(\textbf{X}^{(i)}\) denotes the EEG recording of the \(i^\text {th}\) subject, \(N\) is the number of recordings, \(T\) the number of temporal samples, and \(C\) the number of EEG channels. All signals were preprocessed using notch filtering, ICA-based artifact removal, and z-score normalization. EEG datasets for dementia diagnosis are typically imbalanced, often containing fewer AD or FTD samples than cognitively normal controls. To mitigate this imbalance, we employed the Synthetic Minority Oversampling Technique (SMOTE)38, which generates synthetic feature vectors for minority classes via linear interpolation between nearest neighbors. Unlike random oversampling, SMOTE preserves local structure and class boundaries, reducing the risk of overfitting. Table 7 summarizes class distributions before and after applying SMOTE on the ds004504 dataset. SMOTE yielded approximately uniform representation across classes, stabilizing convergence and ensuring equitable gradient updates during training. To avoid overfitting, SMOTE was applied strictly within training folds during cross-validation; validation and test folds remained unchanged (see the sketch after Table 7). This approach aligns with best practices in biomedical machine learning by maintaining independent evaluation and unbiased model assessment.

Table 7 Sample distribution across classes before and after SMOTE balancing on the OpenNeuro ds004504 (eyes-closed) dataset.
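A minimal sketch of this fold-internal oversampling protocol is given below, assuming a segment-level feature matrix X (n_samples × n_features) and integer labels y as NumPy arrays; the build_model factory and the random seed are illustrative and not part of the released pipeline.

```python
# Sketch: SMOTE applied inside each training fold only, so validation/test
# folds keep their natural class distribution. `build_model` is a hypothetical
# factory returning any scikit-learn-compatible classifier.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from imblearn.over_sampling import SMOTE

def cross_validate_with_smote(X, y, build_model, n_splits=5, seed=42):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_tr, y_tr = SMOTE(random_state=seed).fit_resample(X[train_idx], y[train_idx])
        model = build_model()
        model.fit(X_tr, y_tr)                                 # balanced training fold
        scores.append(model.score(X[test_idx], y[test_idx]))  # untouched test fold
    return float(np.mean(scores)), float(np.std(scores))
```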

Preprocessing

EEG signals are susceptible to non-neural artifacts such as ocular blinks, muscle activity, and electrode drifts, which can distort frequency-domain characteristics relevant for dementia classification. To guarantee reproducibility and minimize observer bias, the NeuroFusionNet pipeline employs a fully standardized and semi-automated denoising strategy comprising three sequential stages: (1) 1–45 Hz band-pass filtering to suppress DC drift and high-frequency interference, (2) Artifact Subspace Reconstruction (ASR) for transient artifact attenuation, and (3) Independent Component Analysis (ICA) using the Infomax algorithm to isolate and remove ocular and myogenic components. ICA components with kurtosis > 5, skewness > 2, or variance exceeding ±3 SD from the mean were automatically flagged as non-neural. Components showing strong correlation with EOG reference channels or muscular topographies were excluded. To further enhance reproducibility, the Multiple Artifact Rejection Algorithm (MARA)39 was employed to validate and refine component classification. All thresholds were applied uniformly across subjects and datasets to ensure consistent denoising. The effectiveness of preprocessing was quantified using the Signal-to-Noise Ratio (SNR) and Residual Artifact Power (RAP) metrics summarized in Table 8. This standardized pipeline achieved reproducible noise suppression while preserving relevant neural information.
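The automated component-flagging rule described above can be expressed compactly as in the sketch below, which assumes ICA activations in an array of shape (n_components, n_times); the thresholds follow the text, while array and function names are illustrative.

```python
# Sketch of the statistical flagging criteria: excess kurtosis > 5,
# |skewness| > 2, or component variance beyond +/- 3 SD across components.
import numpy as np
from scipy.stats import kurtosis, skew

def flag_artifact_components(sources, kurt_thr=5.0, skew_thr=2.0, var_sd=3.0):
    k = kurtosis(sources, axis=1)            # excess kurtosis per component
    s = np.abs(skew(sources, axis=1))        # absolute skewness per component
    v = sources.var(axis=1)
    v_z = (v - v.mean()) / v.std()           # variance z-scored across components
    return np.where((k > kurt_thr) | (s > skew_thr) | (np.abs(v_z) > var_sd))[0]
```

Flagged indices would then be cross-checked against EOG correlations and MARA labels before removal, as described above.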

Table 8 Quantitative evaluation of artifact-removal performance across datasets.
Table 9 Quantitative evaluation of EEG preprocessing robustness on OpenNeuro ds004504 dataset.

Across all datasets, SNR increased by roughly 21% while RAP decreased by 36–38%, confirming that ICA–MARA effectively suppresses artifacts without distorting neural oscillations. To further quantify denoising reliability, we measured the Signal-to-Artifact Ratio (SAR), Power Spectral Stability (PSS), and Channel Retention Rate (CRR) on 88 subjects from ds004504 (Table 9). The results verify that spectral integrity and spatial consistency were preserved after preprocessing. Figure 4 illustrates the progression from raw EEG to fully cleaned signals, showing suppression of ocular and EMG spikes while retaining alpha–theta rhythms. Complementarily, robustness under synthetic noise perturbations (Gaussian \(\sigma =0.05\)–0.2) was tested; Figure 3 shows that model accuracy declined by less than 2.1% under high-noise conditions, confirming strong noise resilience.

Fig. 3 Model stability under simulated EEG noise. Boxplot shows accuracy distributions across five folds for Gaussian noise levels (\(\sigma =0.05\)–0.2).

Fig. 4 Illustration of preprocessing robustness: (a) raw EEG with ocular/EMG artifacts, (b) signal after ASR denoising, and (c) final cleaned EEG after ICA component removal. Physiological alpha–theta rhythms are preserved while transient artifacts are eliminated.

Quantitative and visual results confirm that the proposed denoising pipeline enhances SNR by \(\approx 21\%\), improves spectral stability, and maintains spatial fidelity, establishing a reliable foundation for subsequent feature extraction. Following validation of preprocessing quality, each transformation in the standardized workflow was formalized to ensure reproducibility. EEG recordings were further processed through the following steps, each contributing to the final denoised and normalized input used for feature extraction and classification. A 4th-order Butterworth filter (0.5–45 Hz) removed low-frequency drifts and high-frequency noise. Its magnitude response is defined as:

$$\begin{aligned} \left| H(j\omega ) \right| = \frac{1}{\sqrt{1 + \left( \frac{\omega }{\omega _c} \right) ^{2n}}}. \end{aligned}$$
(2)

where \(\omega\) is the angular frequency, \(\omega _c\) the cutoff frequency, and \(n=4\) the filter order.
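For concreteness, a minimal sketch of this filtering stage is given below, assuming SciPy and the 500 Hz sampling rate of the OpenNeuro recordings; the zero-phase application via filtfilt is a common choice and an assumption here.

```python
# Sketch: 4th-order Butterworth band-pass (Eq. 2) applied per channel.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_eeg(x, fs=500.0, low=0.5, high=45.0, order=4):
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, x, axis=0)  # x: (n_samples, n_channels), zero-phase
```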

Signals were re-referenced using the average of mastoid electrodes (A1 and A2) to reduce bias and enhance inter-channel comparability. ASR40 removed transient high-amplitude artifacts by projecting contaminated segments onto a clean covariance-derived subspace. EEG data were then decomposed via ICA as

$$\begin{aligned} \textbf{X}^{(i)} = \textbf{A}\textbf{S}, \end{aligned}$$
(3)

where \(\textbf{X}^{(i)}\) is the observed EEG matrix, \(\textbf{A}\) the mixing matrix, and \(\textbf{S}\) the independent sources. Non-neural components were identified via EEGLAB and removed using the criteria described above. Each cleaned EEG signal was standardized using z-score normalization:

$$\begin{aligned} \tilde{\textbf{X}}^{(i)} = \frac{\textbf{X}^{(i)} - \mu _{\textbf{X}^{(i)}}}{\sigma _{\textbf{X}^{(i)}}}, \end{aligned}$$
(4)

with mean and standard deviation computed as

$$\begin{aligned} \mu _{\textbf{X}^{(i)}}&= \frac{1}{T}\sum _{t=1}^{T}\textbf{X}_t^{(i)}, \end{aligned}$$
(5)
$$\begin{aligned} \sigma _{\textbf{X}^{(i)}}&= \sqrt{\frac{1}{T-1}\sum _{t=1}^{T}(\textbf{X}_t^{(i)}-\mu _{\textbf{X}^{(i)}})^2}. \end{aligned}$$
(6)

Normalized signals were divided into non-overlapping segments of 1000 samples:

$$\begin{aligned} \textbf{S}^{(i,j)} = \tilde{\textbf{X}}^{(i)}[jL : (j+1)L, :], \end{aligned}$$
(7)

where \(\textbf{S}^{(i,j)} \in \mathbb {R}^{L \times C}\), \(L=1000\), and \(j=0,1,\ldots ,\lfloor T/L\rfloor -1\).
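A compact sketch of Eqs. (4)–(7), z-scoring each recording and cutting it into non-overlapping windows, might look as follows; the array shape convention (T × C) matches Eq. (1), and names are illustrative.

```python
# Sketch: per-recording z-score normalization (Eqs. 4-6) followed by
# non-overlapping segmentation into windows of L samples (Eq. 7).
import numpy as np

def normalize_and_segment(x, seg_len=1000):
    x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # channel-wise z-score
    n_seg = x.shape[0] // seg_len                     # floor(T / L) segments
    return np.stack([x[j * seg_len:(j + 1) * seg_len] for j in range(n_seg)])
```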

Fig. 5 Comparison of raw and normalized EEG signals for channels Cz and Fz. Normalization stabilizes amplitude variance across segments.

The proposed preprocessing framework integrates domain-standard filtering, referencing, and statistical denoising, validated both quantitatively (Tables 8, 9) and visually (Figures 3, 4, 5).

Feature extraction

After robust preprocessing, both handcrafted and automatically learned features were extracted from each clean and segmented EEG window \(S^{(i,j)} \in \mathbb {R}^{L \times C}\). This dual-stream design captures complementary information: domain-specific descriptors from handcrafted analysis and nonlinear temporal abstractions from deep learning. The complete feature extraction process is summarized below.

Handcrafted feature extraction

Each EEG segment was passed through a multi-domain analytical pipeline involving statistical, spectral, wavelet, and entropy-based descriptors. Features were computed channel-wise and concatenated into a unified vector \(\textbf{f}_{\text {hand}}^{(i,j)} \in \mathbb {R}^{d_1}\). Table 10 lists the extracted feature domains. Power spectral density (PSD), \(P(f)\), was estimated using Welch’s method, and the logarithmic bandpower was computed for canonical EEG frequency bands (delta, theta, alpha, beta, gamma) as:

$$\begin{aligned} P_{\text {band}} = \log \left( \int _{f_1}^{f_2} P(f) \, df + \epsilon \right) , \end{aligned}$$
(8)

where \(P(f)\) denotes the spectral power, \(f_1\) and \(f_2\) represent the lower and upper frequency limits, and \(\epsilon\) is a stabilization constant. Time–frequency decomposition was conducted using Daubechies-4 (db4) wavelets up to level 5. From each coefficient set \(W_k\), we extracted the mean and standard deviation:

$$\begin{aligned} \mu _{W_k} = \frac{1}{n} \sum _{j=1}^{n} W_k[j], \quad \sigma _{W_k} = \sqrt{\frac{1}{n-1} \sum _{j=1}^{n} (W_k[j] - \mu _{W_k})^2}. \end{aligned}$$
(9)

Higher-order moments such as skewness and kurtosis were further estimated to capture waveform asymmetry and peakedness:

$$\begin{aligned} \gamma = \frac{1}{n} \sum _{j=1}^{n} \left( \frac{x_j - \mu }{\sigma } \right) ^3, \quad \kappa = \frac{1}{n} \sum _{j=1}^{n} \left( \frac{x_j - \mu }{\sigma } \right) ^4. \end{aligned}$$
(10)

Finally, permutation entropy quantified temporal complexity:

$$\begin{aligned} H_{\text {perm}} = - \sum _{k=1}^{m!} p_k \log p_k, \end{aligned}$$
(11)

where \(p_k\) is the probability of each ordinal pattern of length \(m\).
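To make the handcrafted stream concrete, the sketch below illustrates two of the descriptors defined above, the Welch log-bandpower of Eq. (8) and the permutation entropy of Eq. (11), for a single channel; band edges follow the canonical definitions, and all names are illustrative rather than taken from the released code.

```python
# Sketch: Welch log-bandpower (Eq. 8) and permutation entropy (Eq. 11).
import numpy as np
from itertools import permutations
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def log_bandpowers(x, fs=500.0, eps=1e-10):
    f, psd = welch(x, fs=fs, nperseg=min(len(x), int(2 * fs)))
    df = f[1] - f[0]                     # uniform frequency resolution
    return {name: np.log(psd[(f >= lo) & (f < hi)].sum() * df + eps)
            for name, (lo, hi) in BANDS.items()}

def permutation_entropy(x, m=3, delay=1):
    patterns = list(permutations(range(m)))
    counts = np.zeros(len(patterns))
    for i in range(len(x) - (m - 1) * delay):
        pat = tuple(np.argsort(x[i:i + m * delay:delay]))  # ordinal pattern
        counts[patterns.index(pat)] += 1
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))        # Eq. (11)
```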

Table 10 Types of Handcrafted Features Extracted Per Channel.

Automated feature extraction (1D-CNN):

We propose a lightweight 1D Convolutional Neural Network (1D-CNN) for automated feature extraction from multichannel EEG signals. It consists of three convolutional layers, each followed by max pooling, and a global average pooling layer for dimensionality reduction. Each convolutional block includes batch normalization and a ReLU activation:

$$\begin{aligned} h^{(l)} = \text {ReLU}\big (\text {BN}(W^{(l)} * h^{(l-1)} + \textbf{b}^{(l)})\big ), \end{aligned}$$
(12)

where \(h^{(l)}\) is the \(l^\text {th}\) feature map, \(W^{(l)}\) and \(\textbf{b}^{(l)}\) are learnable parameters, and \(\text {BN}\) denotes batch normalization. The 1D-CNN exploits the temporal structure of EEG data, avoiding the computational burden of 2D convolutions while preserving interpretability. The number of filters (32–128) and kernel sizes (3, 5, 7) were tuned through a structured grid search and validated via 5-fold cross-validation. The configuration with 64 filters and a kernel size of 5 achieved the best validation accuracy and generalization. Dropout rates (\(p=0.3\)–0.4) were concurrently optimized to balance regularization and representational capacity, as further confirmed in the ablation results (Table 19).
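A hedged PyTorch sketch of such a three-block extractor is shown below; the 18-channel input follows the ds004504 montage description, while pooling placement and the class name are assumptions.

```python
# Sketch: three Conv1d blocks (Eq. 12: Conv -> BatchNorm -> ReLU), each with
# max pooling, followed by global average pooling into a compact feature vector.
import torch
import torch.nn as nn

class EEGFeatureCNN(nn.Module):
    def __init__(self, n_channels=18, n_filters=64, kernel=5):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel, padding=kernel // 2),
                nn.BatchNorm1d(c_out), nn.ReLU(), nn.MaxPool1d(2))
        self.blocks = nn.Sequential(block(n_channels, n_filters),
                                    block(n_filters, n_filters),
                                    block(n_filters, n_filters))
        self.gap = nn.AdaptiveAvgPool1d(1)            # global average pooling

    def forward(self, x):                             # x: (batch, channels, samples)
        return self.gap(self.blocks(x)).squeeze(-1)   # (batch, n_filters)
```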

To avoid overfitting while retaining multi-scale temporal features, the final CNN depth was limited to three convolutional blocks. This configuration demonstrated strong reproducibility and generalization across all datasets. To justify the 1D-CNN design choice, a controlled comparison was performed against an equivalent 2D-CNN architecture under identical training protocols. As shown in Table 11 and Fig. 6, the 1D-CNN achieved higher accuracy, faster inference, and lower variance. While the 2D-CNN could model inter-channel dependencies, it incurred 62% higher computational cost and tended to overfit due to limited EEG samples. Thus, temporal dependencies were prioritized over spatial correlations for short-segment EEG analysis.

Table 11 Performance comparison between 1D-CNN and 2D-CNN architectures under identical training conditions on the ds004504 dataset.
Fig. 6 Comparison between 1D-CNN and 2D-CNN architectures. The 1D-CNN shows superior mean accuracy and reduced variance across five folds, confirming temporal modeling advantages under limited EEG conditions.

Hence, the 1D-CNN was retained within NeuroFusionNet for its temporal sensitivity, compactness, and generalization efficiency. To combine domain-level and learned representations, handcrafted and CNN-derived features were concatenated:

$$\begin{aligned} \textbf{f}^{(i,j)} = [\textbf{f}_{\text {hand}}^{(i,j)} \, || \, \textbf{f}_{\text {cnn}}^{(i,j)}] \in \mathbb {R}^{d_1 + d_2}, \end{aligned}$$
(13)

where \(\textbf{f}_{\text {hand}}^{(i,j)}\) and \(\textbf{f}_{\text {cnn}}^{(i,j)}\) denote handcrafted and CNN features respectively.

Class imbalance was mitigated via SMOTE, as introduced earlier:

$$\begin{aligned} \textbf{x}_{\text {syn}} = \textbf{x}_a + \lambda (\textbf{x}_b - \textbf{x}_a), \quad \lambda \sim \mathcal {U}(0,1), \end{aligned}$$
(14)

where \(\textbf{x}_a\) and \(\textbf{x}_b\) are minority samples and \(\lambda\) is a random interpolation coefficient. All features were standardized via z-score normalization, and dimensionality was reduced using Principal Component Analysis (PCA) retaining 99% of variance:

$$\begin{aligned} Z = XW, \quad \text {where} \quad W = \underset{W}{\text {argmax}} \, \text {Var}(XW), \end{aligned}$$
(15)

with \(X\) as the normalized feature matrix and \(Z\) the reduced embedding. The final representation is defined as:

$$\begin{aligned} F_{\text {final}} = \{ \textbf{f}_{\text {pca}}^{(i,j)} \}_{i,j} \in \mathbb {R}^{M \times d_{\text {final}}}, \end{aligned}$$
(16)

where \(M\) is the number of EEG segments and \(d_{\text {final}}\) the PCA-reduced dimension.
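A minimal sketch of this standardization-plus-PCA step (Eqs. 15–16) is given below; fitting the scaler and PCA on training data only is made explicit to match the leakage-free protocol described later, and the function name is an assumption.

```python
# Sketch: z-score standardization and PCA retaining 99% of variance,
# fitted on the training split only and applied to both splits.
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def compress_features(F_train, F_test, variance=0.99):
    scaler = StandardScaler().fit(F_train)
    pca = PCA(n_components=variance).fit(scaler.transform(F_train))
    Z_train = pca.transform(scaler.transform(F_train))
    Z_test = pca.transform(scaler.transform(F_test))
    return Z_train, Z_test, pca   # pca.components_ reused for SHAP back-projection
```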

Table 12 Estimated Number of Features at Key Stages.
Table 13 Summary of Feature Dimensions at Each Processing Stage.
Table 14 Training configuration for NeuroFusionNet. Training typically continued for 90–120 epochs until validation loss plateaued for 10 consecutive epochs, triggering early stopping.

This hybrid feature engineering strategy integrates interpretable handcrafted EEG descriptors with deep-learned temporal embeddings. By combining spectral, wavelet, and statistical markers with CNN-derived patterns, NeuroFusionNet leverages both neuroscientific priors and data-driven abstraction. The resulting PCA-compressed feature set provides a balanced, noise-robust representation that enhances both accuracy and interpretability, serving as the foundation for the model construction phase described next (Tables 12, 13).

Classification network and optimization strategy

The final feature vector, obtained by concatenating handcrafted and deep-learned features, is passed to the classification stage of NeuroFusionNet, which employs a deep neural network (DNN) designed to learn class-discriminative EEG representations in an interpretable and computationally efficient manner. The network incorporates regularization, normalization, and residual connections to stabilize gradient flow and mitigate overfitting during training. A detailed layer-wise architecture of the proposed network is summarized in Table 15.

Let \(\textbf{f}_{\text {final}}^{(i)} \in \mathbb {R}^{d_{\text {final}}}\) represent the PCA-reduced feature vector for the \(i^{\text {th}}\) EEG segment. It is input to the classifier as:

$$\begin{aligned} \textbf{x}^{(0)} = \textbf{f}_{\text {final}}^{(i)} \in \mathbb {R}^{d_{\text {final}}}. \end{aligned}$$
(17)

The DNN consists of five fully connected (dense) layers, each followed by batch normalization (BN), dropout, and a non-linear activation. A skip connection between the first and third dense layers facilitates residual learning and promotes smooth gradient propagation. The transformation at the \(l^{\text {th}}\) dense layer is expressed as:

$$\begin{aligned} \textbf{h}^{(l)} = f(\textbf{W}^{(l)} \textbf{h}^{(l-1)} + \textbf{b}^{(l)}), \end{aligned}$$
(18)

where \(f(\cdot )\) denotes the LeakyReLU activation, and \(\textbf{W}^{(l)}\), \(\textbf{b}^{(l)}\) are learnable parameters. Dropout layers (\(p_1\)–\(p_4\)) control overfitting, while the output layer applies the softmax activation to yield probabilistic class scores:

$$\begin{aligned} \hat{\textbf{y}} = \text {softmax}\left( \textbf{W}^{(L)} \textbf{h}^{(L-1)} + \textbf{b}^{(L)} \right) . \end{aligned}$$
(19)

The network is trained by minimizing the categorical cross-entropy loss with \(L_2\) weight regularization:

$$\begin{aligned} \mathcal {L} = -\sum _{k=1}^{K} y_k \log \hat{y}_k + \lambda \sum _{l=1}^{L} \left\| \textbf{W}^{(l)} \right\| _2^2, \end{aligned}$$
(20)

where \(y_k\) and \(\hat{y}_k\) represent the true and predicted class probabilities, and \(\lambda\) denotes the regularization coefficient. Optimization was performed using the Adam optimizer with an initial learning rate of \(3 \times 10^{-4}\). A dynamic learning rate adjustment was employed through the ReduceLROnPlateau scheduler (factor = 0.5, patience = 10), which automatically reduced the learning rate by half after 10 consecutive epochs without validation improvement. Typically, this mechanism lowered the learning rate to approximately \(7.5 \times 10^{-5}\) between epochs 20 and 40, enabling stable convergence and enhanced generalization. This schedule was consistently maintained across all datasets and cross-validation folds to ensure reproducibility.
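The sketch below shows, under stated assumptions, how the classifier head and optimization setup described above could be realized in PyTorch: five dense layers with the Dense-1 to Dense-3 skip connection (Eq. 18), batch normalization, LeakyReLU, dropout, Adam at \(3\times 10^{-4}\) with L2 weight decay, and ReduceLROnPlateau (factor 0.5, patience 10). Layer widths follow the ablation in Table 19; the linear projection on the skip path, added to match dimensions, is an assumption.

```python
# Sketch of the NeuroFusionNet classifier head (Eqs. 17-20).
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, d_in, n_classes=3, p_drop=0.3):
        super().__init__()
        def dense(c_in, c_out):   # Linear -> BatchNorm -> LeakyReLU -> Dropout
            return nn.Sequential(nn.Linear(c_in, c_out), nn.BatchNorm1d(c_out),
                                 nn.LeakyReLU(), nn.Dropout(p_drop))
        self.d1, self.d2, self.d3 = dense(d_in, 256), dense(256, 128), dense(128, 64)
        self.skip = nn.Linear(256, 64)            # residual path Dense-1 -> Dense-3
        self.d4 = dense(64, 32)
        self.out = nn.Linear(32, n_classes)       # softmax folded into the loss

    def forward(self, x):
        h1 = self.d1(x)
        h3 = self.d3(self.d2(h1)) + self.skip(h1) # skip connection (Eq. 18)
        return self.out(self.d4(h3))

model = FusionClassifier(d_in=128)                # d_in = PCA-reduced dimension
opt = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)  # L2 term
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=10)
# per epoch: train, compute val_loss, then call sched.step(val_loss)
```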

Table 15 Layer-wise architecture of the NeuroFusionNet classification module.

A structured grid search was conducted to identify optimal model parameters, covering convolutional filters \(\{32, 64, 128\}\), kernel sizes \(\{3, 5, 7\}\), and dropout rates \(\{0.2, 0.3, 0.4, 0.5\}\). Each configuration was evaluated under five-fold cross-validation on the OpenNeuro training set to measure validation accuracy and model stability. The configuration of 64 filters, kernel size = 5, and dropout = 0.3–0.4 achieved the best trade-off between generalization and computational efficiency. All tuned hyperparameters are summarized in Table 14, and the final layer-wise network design is described in Table 15. The architecture of NeuroFusionNet is specifically designed to model the high-dimensional, nonlinear dependencies within fused EEG features while maintaining interpretability and training stability: five dense layers interleaved with normalization, non-linear activation, and dropout, where the skip connection enhances feature propagation and facilitates residual learning, minimizing gradient degradation.

Fig. 7 Modular architecture of NeuroFusionNet with skip connection from Dense-1 to Dense-3.

The modular design of NeuroFusionNet is presented in Fig. 7. Each component block represents a distinct transformation stage, highlighting how low-level features from Dense-1 are reintroduced into Dense-3 via the skip pathway. This residual mapping supports hierarchical representation learning and enhances robustness to noisy EEG data. To prevent premature convergence or oscillatory loss behavior, a dynamic learning rate schedule was applied through ReduceLROnPlateau: the learning rate was halved after extended periods of validation-loss stagnation (patience = 10). In practice, this led to a controlled decay from \(3\times 10^{-4}\) to approximately \(7.5\times 10^{-5}\) around epochs 20–40 (ref. 41). This adaptive decay accelerates early-stage learning and fine-tunes convergence during later epochs, yielding more stable optimization and improved generalization across EEG datasets.

To improve generalization, the OpenNeuro (ds004504) dataset was partitioned at the subject level. A stratified assignment maintained class balance across five folds, with each fold comprising 70% training, 15% validation, and 15% testing subjects. This ensures that no EEG segment from a single participant appears in more than one subset and prevents temporal or spatial leakage. The model achieving the highest validation F1-score within each fold was used for testing. The overall subject distribution is summarized in Table 16, and a minimal sketch of the subject-level split follows below.
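The sketch below illustrates leakage-free partitioning using scikit-learn's StratifiedGroupKFold, which keeps all segments of a subject within one fold while approximately preserving class balance; the further 15%/15% validation/test subdivision is omitted for brevity, and variable names are assumptions.

```python
# Sketch: subject-level stratified folds; no subject spans train and test.
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

def subject_level_folds(labels, subject_ids, n_splits=5, seed=42):
    sgkf = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    dummy = np.zeros(len(labels))   # only the sample count is needed here
    for train_idx, test_idx in sgkf.split(dummy, labels, groups=subject_ids):
        # sanity check: disjoint subject sets guarantee no leakage
        assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])
        yield train_idx, test_idx
```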

Table 16 Subject-level distribution and data split for five-fold cross-validation on OpenNeuro (ds004504). Each fold preserves class balance across Alzheimer’s Disease (AD), Frontotemporal Dementia (FTD), and Control Normal (CN).

This rigorous data-partitioning framework ensures statistical reliability and generalization under limited-sample EEG conditions. Averaged metrics across five folds (mean ± standard deviation) are reported in subsequent sections as reliable indicators of model performance and reproducibility. In summary, the proposed classification module of NeuroFusionNet unifies efficient optimization, adaptive learning rate scheduling, and stratified cross-validation within a residual deep-learning framework.

Results and discussion

In clinical neuroscience applications, model interpretability is essential for building trust among practitioners. To improve interpretability, we incorporated explainable AI (XAI) techniques, including SHAP and Grad-CAM, to reveal model decision patterns. We conducted experiments on three datasets: OpenNeuro ds004504 (eyes-closed) as the primary dataset, OpenNeuro ds006036 (eyes-open) to assess condition robustness, and the independent OSF dataset for external validation.

Table 17 Results from the ablation study on the OpenNeuro ds004504 (eyes-closed) dataset, evaluating the impact of different preprocessing, feature extraction, and classification strategies. The proposed NeuroFusionNet consistently outperforms other configurations.
Fig. 8 Confusion matrix on the OpenNeuro ds004504 (eyes-closed) dataset.

Fig. 9 ROC curves with corresponding AUC scores on the OpenNeuro ds004504 (eyes-closed) dataset.

Fig. 10 (a) Accuracy and (b) loss during training and validation phases.

Table 18 Classification report of NeuroFusionNet on the OpenNeuro ds004504 (eyes-closed) dataset. Results are macro- and weighted-averaged across three classes.
Fig. 11 SHAP global feature importance plot.

Fig. 12 Feature scatter plot (ds004504 eyes-closed dataset).

Table 19 Ablation study of NeuroFusionNet on the OpenNeuro ds004504 (eyes-closed) dataset.
Table 20 Five-fold cross-validation results on ds004504 dataset.
Table 21 Computational comparison of NeuroFusionNet with baseline EEG models on the OpenNeuro ds004504 dataset.
Fig. 13 SHAP summary plot illustrating feature importance in physiologically meaningful terms.

An extensive ablation study was conducted to evaluate the effectiveness of the key components of the proposed AD detection pipeline. We first evaluated the impact of different feature construction strategies and preprocessing steps (SMOTE, PCA). Combining domain-specific handcrafted features with CNN-based deep embeddings led to a marked increase in classification accuracy, as shown in Table 17. SMOTE-based oversampling and PCA-based dimensionality reduction contribute to class balance and improve the model’s generalization performance. The proposed model achieved a test accuracy of 94.27%, indicating strong generalization across the three EEG classes. The macro-averaged precision, recall, and F1-score were all roughly 0.94, as shown in Table 18. Figure 12 presents a feature scatter plot illustrating the separability of the AD, FTD, and HC classes using three key extracted features: delta band power, mean amplitude, and standard deviation. Figure 10 illustrates the training dynamics, showing stable convergence and minimal overfitting. To empirically validate the selected hyperparameter configuration, an ablation analysis was conducted on the OpenNeuro ds004504 dataset in which each parameter (number of layers, dropout rate, learning rate, batch size) was varied independently while keeping all others fixed. The averaged results over three runs are summarized in Table 19. We observed that increasing the network depth to four hidden layers (256–128–64–32) consistently improved accuracy. A moderate dropout rate of 0.3 yielded the best generalization, striking a balance between regularization and learning stability. Similarly, a lower learning rate of 0.0005 facilitated more stable convergence, while a batch size of 32 outperformed smaller or larger batches by ensuring reliable gradient estimation and batch-level diversity. These insights confirm both the robustness and transparency of the proposed model.

As shown in Table 19, the grid-search procedure identified a configuration with four dense layers, dropout = 0.3, learning rate = \(5 \times 10^{-4}\), and batch size = 32 as the most stable and accurate setup across three validation runs. These hyperparameter values were therefore adopted as the default configuration for all subsequent experiments reported in Tables 14 and 15. We performed an extensive cross-validation and regularization analysis on the OpenNeuro (ds004504, ds006036) and OSF datasets. A stratified 5-fold cross-validation protocol was applied, ensuring proportional representation of the AD, FTD, and CN classes in each fold. The mean and standard deviation of key metrics across folds are summarized in Table 20. Low standard deviations indicate stable performance across folds, confirming that the model’s high accuracy is not an artifact of the data partitioning. To reduce overfitting, we employed multiple regularization techniques: (1) dropout layers (rate = 0.3) between dense blocks, (2) L2 weight decay (\(\lambda = 1\times 10^{-4}\)), and (3) early stopping with a patience of 10 epochs. The near-uniform fold-level performance (SD = 0.28%) and consistent F1/AUC scores demonstrate that NeuroFusionNet’s high accuracy is reproducible and not a consequence of overfitting or data leakage. Although NeuroFusionNet integrates both handcrafted and deep learning-based features, its architecture has been optimized for computational efficiency and scalability. The model comprises approximately 0.94 million trainable parameters and requires only 4.1 MB of storage. Inference time was empirically measured at 6.5 milliseconds per sample on a CPU (Intel Core i7, 2.8 GHz), supporting real-time implementation. Table 21 presents a comparative analysis of NeuroFusionNet against two representative baselines frequently employed in EEG-based Alzheimer’s disease classification: a standard 1D Convolutional Neural Network (1D-CNN) and a Transformer-based model46. The results clearly demonstrate that NeuroFusionNet achieves high classification accuracy at reduced computational cost. We further evaluated the deployment feasibility of NeuroFusionNet in real-world clinical settings. While high classification accuracy is critical, practical usability requires low-latency inference, minimal hardware resources, and integration into existing EEG acquisition pipelines. To demonstrate deployability in clinical and portable EEG environments, we summarize the deployment configuration and observed inference performance in Table 26. NeuroFusionNet achieves sub-10 ms latency on a mid-range CPU, enabling real-time decision support without the need for high-end GPUs. The lightweight parameter count and memory footprint make it suitable for bedside systems and embedded clinical workstations.

In terms of integration, the entire pipeline (preprocessing, feature extraction, NeuroFusionNet inference, and post-processing) was containerized using Docker to ensure reproducibility and straightforward installation across clinical workstations. Furthermore, the model was tested on streaming EEG input, where end-to-end latency remained below 20 ms, well within the clinical threshold for interactive diagnostic systems. This supports the feasibility of incorporating NeuroFusionNet into neurology clinics and research centers without specialized hardware. To ensure usability by non-technical clinicians, we also designed a prototype clinician-facing interface that provides tabular summaries and interpretable visualizations, such as SHAP and Grad-CAM heatmaps, at the patient level (see Fig. 16 and Table 29). This improves accessibility by allowing physicians to interpret predictions in terms of brain regions and EEG frequency bands rather than abstract machine learning scores.

These deployment-oriented evaluations confirm that NeuroFusionNet is not only accurate but also computationally efficient, clinically interpretable, and deployable in real-time EEG workflows. Figure 8 shows the confusion matrix, revealing high class-wise prediction accuracy with low misclassification across classes. Class-wise ROC curves are presented in Fig. 9 to evaluate the classifier’s discriminative capability; all classes achieved Area Under the Curve (AUC) values greater than 0.96. To ensure robustness and rule out information leakage, cosine similarity was computed between training and test feature vectors. The mean similarity score was only 0.0063, reflecting negligible overlap and strong generalization. To enhance the interpretability of the proposed deep learning model, Shapley Additive Explanations (SHAP)47 were utilized. The SHAP Kernel Explainer was used to determine how each feature contributed to the model’s predictions. Figure 11 presents the SHAP summary plot, highlighting the most influential features that drive the classification outcomes, thus providing transparency into the decision-making process.

To ensure interpretability beyond the PCA space, we performed SHAP back-projection to the original EEG feature domains (Table 24). For each principal component \(PC_k\) with loading vector \(\textbf{w}_k\), the contribution of an original feature \(f_j\) was reconstructed as:

$$\begin{aligned} \phi _j = \sum _{k} \text {SHAP}(PC_k) \times w_{jk}, \end{aligned}$$
(21)

where \(\text {SHAP}(PC_k)\) represents the attribution score of the \(k^{\text {th}}\) principal component, and \(w_{jk}\) is the weight of feature \(f_j\) in that component. This formulation identifies the handcrafted features and EEG channels that contribute most to each prediction, thereby linking model explanations directly to neurophysiological sources and improving clinical interpretability.
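Given a fitted scikit-learn PCA object, Eq. (21) reduces to a single matrix product, as the illustrative sketch below shows; shap_pc, a vector of mean absolute SHAP values per principal component, is an assumed input name.

```python
# Sketch of Eq. (21): back-project component-level SHAP attributions onto the
# original handcrafted/CNN feature space via the PCA loading matrix.
import numpy as np

def backproject_shap(shap_pc, pca):
    # shap_pc: (n_components,); pca.components_: (n_components, n_features)
    return shap_pc @ pca.components_          # phi_j = sum_k SHAP(PC_k) * w_jk

# e.g., rank original features by reconstructed attribution magnitude:
# top20 = np.argsort(-np.abs(backproject_shap(shap_pc, pca)))[:20]
```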

These findings not only validate the predictive performance and explainability of the proposed architecture, but also reveal the effectiveness of combining domain-relevant and data-driven representations for AD detection. Although hybrid feature strategies have been explored in past EEG classification works, our approach introduces a more principled integration mechanism. Unlike prior studies that rely on direct concatenation or early fusion of handcrafted and deep features, NeuroFusionNet applies a mid-level bottleneck fusion after feature selection. Handcrafted features are first selected through Pearson correlation analysis and PSO-based feature selection to remove highly correlated features. These optimized handcrafted features are then concatenated with CNN-based automated features. Furthermore, unlike prior works that treat hybrid features as a black box, our pipeline incorporates SHAP-based explainability to interpret feature contributions at both the global and instance levels. To the best of our knowledge, this is the first EEG-based Alzheimer’s disease detection framework that jointly optimizes fusion depth, dimensionality reduction, and interpretability within a clinically aligned workflow.

Beyond numerical attribution, the SHAP–PCA analysis revealed physiologically coherent patterns, and the interpretability results were organized for clinical readability. Elevated theta power over frontal and central electrodes (F3, F4, Cz) and reduced alpha power at parieto-occipital sites (Pz, O1, O2) emerged as dominant discriminators between cognitive states. Entropy and wavelet-derived metrics from temporo-parietal channels (T5, T6, P3) further reflect declining neural complexity and local desynchronization. These observations confirm that the model’s SHAP attributions correspond to clinically recognized neurophysiological abnormalities, strengthening the translational validity of NeuroFusionNet.

Table 22 Performance comparison between eyes-closed (ds004504) and eyes-open (ds006036) conditions from the OpenNeuro dataset.
Table 23 Classification performance of NeuroFusionNet on the independent OSF EEG dataset, demonstrating external generalization across different demographics and acquisition protocols.
Table 24 Top contributing EEG feature domains and channels reconstructed from SHAP–PCA back-projection.
Fig. 14 Cross-dataset validation results of NeuroFusionNet across ds004504 (eyes-closed), ds006036 (eyes-open), and OSF datasets with mean ± SD.

Table 25 Cross-dataset generalization performance of NeuroFusionNet (trained on ds004504, tested on OSF).
Table 26 Deployment Configuration and Inference Performance of NeuroFusionNet.
Fig. 15 Temporal Grad-CAM visualization across EEG channels and time for Alzheimer’s (AD), Frontotemporal Dementia (FTD), and Healthy Controls (HC) on ds004504. Brighter areas represent stronger model attention.

Fig. 16 Grad-CAM projection onto 2D EEG scalp topographies (10–20 layout) for AD, FTD, and HC groups. Attention maps align with known cortical regions affected in dementia.

Figure 13 further illustrates the SHAP beeswarm plot, showing the top 20 Principal Components (PCs) contributing to the classification. Each point represents a single test instance, colored by feature value and spread horizontally according to its SHAP value (impact on the output). Features with a wider spread vary more across samples, suggesting greater importance for differentiating between AD, FTD, and HC. To further evaluate the separability of the learned EEG feature representations, we applied t-distributed Stochastic Neighbor Embedding (t-SNE) to the final fused feature set.

To evaluate the robustness of NeuroFusionNet under different physiological conditions, we tested the model on the OpenNeuro eyes-open dataset (ds006036). This dataset includes the same 88 participants as the eyes-closed dataset, but with photic stimulation and eyes-open resting-state EEG. Accuracy decreased slightly relative to the eyes-closed condition due to increased noise and variability in alpha activity, yet the model maintained strong performance across classes. Table 22 shows that NeuroFusionNet remains stable across resting-state conditions, with only a minor performance drop in the eyes-open setting. This robustness against physiological variability is critical for reliable performance in real-world clinical deployments. We also evaluated NeuroFusionNet on the independent OSF EEG dataset, which differs in demographics, acquisition device, and sampling frequency (128 Hz). The proposed model achieved strong and balanced performance across the AD, MCI, and HC classes. As presented in Table 23, NeuroFusionNet maintained balanced precision, recall, and F1-scores across all classes. Although performance was slightly lower than on the OpenNeuro datasets, likely due to the shorter recording duration and lower sampling frequency, the results confirm that the framework generalizes effectively to independent datasets with differing acquisition protocols, strengthening its clinical relevance.

The results across the three evaluated datasets highlight the robustness and generalization capability of NeuroFusionNet. Performance remained consistently high in both eyes-closed and eyes-open conditions, indicating resilience to physiological variability. The model achieved balanced accuracy on the independent OSF dataset, despite differences in recording duration, demographics, and acquisition protocols. This demonstrates that NeuroFusionNet is not overfitted to a single dataset, but rather capable of adapting to heterogeneous EEG data distributions. Such cross-dataset stability is critical for clinical translation, where models must generalize beyond controlled research datasets to diverse real-world populations. Figure 14 provides a visual comparison of NeuroFusionNet’s performance across the three evaluated datasets. The model achieved the highest accuracy on the OpenNeuro eyes-closed dataset (94.27%), with only a marginal decrease under the eyes-open condition (92.15%). On the independent OSF dataset, performance remained strong (89.5% accuracy, 89.3% macro F1), despite differences in demographics, sampling rate, and recording duration. This consistency across datasets underscores the robustness of the proposed framework and its ability to generalize beyond a single recording protocol, reinforcing its potential for clinical deployment in heterogeneous populations.

To provide formal cross-dataset validation evidence, we conducted an external evaluation protocol where the model trained on the OpenNeuro eyes-closed dataset (ds004504) was directly tested on the OSF dataset without retraining. As shown in Table 25, the model achieved 88.4% accuracy and 88.0% macro F1 under this zero-shot transfer setting. This result confirms that NeuroFusionNet retains strong discriminative capability even when applied to unseen data distributions, supporting its potential for real-world clinical deployment.

To enhance the interpretability of the proposed model, we employed Grad-CAM-based visualization in two complementary representations. As shown in Fig. 15, the first approach maps class-specific attention across EEG channels over time for the AD, FTD, and CN groups. The model emphasizes frontal and temporal regions in AD samples, consistent with established neurophysiological degeneration patterns. In contrast, FTD exhibits broader frontal activity, while CN demonstrates a more uniform distribution of attention across channels. Figure 16 projects the Grad-CAM activations onto a 2D scalp topography based on the standard 10–20 electrode placement. This spatial projection facilitates a clearer mapping between model-derived features and anatomically relevant brain regions, such as F3, T3, and Pz, thereby improving clinical insight. For a quantitative summary of cortical region–band contributions derived from the Grad-CAM and SHAP analyses, see Table 28.

Collectively, these visualizations reinforce the model’s ability to focus on diagnostically meaningful EEG patterns, supporting both its classification performance and its alignment with known clinical biomarkers.

Figure 17 presents the SHAP waterfall plot for a single prediction instance. Positive feature contributions push the model output towards a higher probability for the predicted class, while negative contributions decrease the probability. PC_3 was observed to have the highest positive influence, whereas PC_1 slightly reduced the prediction probability. Such instance-level interpretability is critical for clinical trust, enabling clinicians to understand which EEG-derived latent features dominantly influenced a specific diagnostic decision. To facilitate clinician-facing summaries, Table 29 provides a standardized patient-level interpretability report template linking key EEG biomarkers with model confidence.

Fig. 17 SHAP waterfall plot on ds004504 dataset for a representative instance. Positive contributions increase the probability of the predicted class, while negative contributions reduce it.

Table 27 Demographic subgroup analysis of NeuroFusionNet on the OpenNeuro (ds004504) dataset.
Table 28 Region–band importance (normalized, mean ± SD) from SHAP backtracked to original EEG features.
Table 29 Clinician-facing per-patient report (generated at inference). The report combines calibrated risk, top explanatory cues, confidence, and a brief note to support actionable decisions.

To provide deeper clinical insight into the SHAP-based model explanations, we traced the most influential principal components (PCs), as identified by SHAP, back to the original EEG feature space. Since PCA constructs each PC as a linear combination of the original handcrafted and CNN features, we computed the contribution of each original feature to the top SHAP-ranked PCs using the PCA loading matrix. Table 30 summarizes the top EEG features associated with each principal component based on loading magnitude. This analysis revealed that PC1 is primarily influenced by frontal delta activity (e.g., F3_Delta), while PC2 emphasizes alpha-band power in both frontal and parietal regions. Additionally, PC3 and PC5 show high loadings for spectral entropy and power in the theta and alpha bands across F3, Fz, and P4.

To ensure that NeuroFusionNet’s performance was not biased by demographic composition, we conducted a subgroup analysis across gender and age distributions within the OpenNeuro ds004504 dataset. The dataset included 42 females and 46 males, spanning ages 55–85 years (mean ± SD: 69.3 ± 7.4 years). Classification performance was computed separately for each subgroup using 5-fold cross-validation. As shown in Table 27, no significant difference in accuracy or F1-score was observed across gender or age groups (two-tailed t-test, \(p>0.05\)), indicating that the model generalizes fairly across demographic strata.

To mitigate potential bias, all preprocessing and training steps were applied uniformly across subjects, and class imbalance was corrected using SMOTE augmentation during feature fusion. These steps ensure that subgroup distributions do not disproportionately influence decision boundaries. NeuroFusionNet thus demonstrates demographic robustness, with negligible sensitivity to gender or age variations, an essential attribute for equitable clinical AI systems.

Building upon the preceding interpretability visualizations, we present a clinician-oriented summary that translates model explanations into region–band effects and a concise per-patient report. Table 28 summarizes group-level region–band effects derived from SHAP-to-feature backtracking. Positive values indicate higher importance in the first-named class comparison (e.g., AD>CN), and negative values indicate relative protection or reduced relevance. Table 29 provides the concise per-patient report template that we generate at inference time to facilitate clinical interpretation and actionability.

Table 30 Top contributing EEG features for the first five principal components on ds004504, derived from the PCA loading matrix and traced back to original EEG features.

To contextualize the performance of NeuroFusionNet, we compared it against recently published approaches evaluated on the OpenNeuro ds004504 (eyes-closed) dataset. Table 32 summarizes this comparison. Prior works achieved accuracies ranging from 76.0% to 91.8%, with varying levels of sensitivity and specificity. Zheng et al.48 and Puri et al.49 reported accuracies of 91.8% and 91.0%, respectively, using CNN-based and ensemble approaches. NeuroFusionNet achieved an accuracy of 94.27%, with balanced precision, recall, and F1-scores of 0.94, as shown in Table 18. These improvements were obtained with a lightweight architecture (0.94M parameters) and efficient inference time (6.5 ms/sample), as reported in Table 21. Performance of the proposed method is not limited to the primary OpenNeuro dataset: consistent performance across ds006036 (eyes-open) and the independent OSF dataset (see Tables 22 and 23, Fig. 14) further validates that NeuroFusionNet generalizes beyond a single controlled dataset. In contrast, most baseline methods were only validated on isolated datasets, limiting their clinical applicability.

While this study did not directly involve human raters, the model’s diagnostic accuracy was contextualized against reported human performance on publicly available datasets. Clinical neurophysiologists typically achieve 78–85% accuracy when visually differentiating Alzheimer’s disease (AD) from normal EEGs50,51,52, whereas NeuroFusionNet attained 94.2% mean accuracy and an AUC of 0.96 on the OpenNeuro ds004504 dataset, exceeding the typical human range. Importantly, this does not replace clinical judgment; rather, it complements it by providing quantitative, reproducible support for early-stage detection. This comparison highlights that the proposed framework reaches or exceeds expert-level reliability while maintaining transparency through SHAP-based interpretability. To provide a more comprehensive benchmarking context, we extended our comparison to include traditional machine learning methods widely used in EEG-based Alzheimer’s disease detection. These approaches rely on handcrafted spectral and statistical features without deep representation learning. Including these baselines allows a clearer assessment of the advantages of the proposed hybrid NeuroFusionNet framework, which fuses handcrafted and CNN-based representations.

Table 31 Comparison of NeuroFusionNet with traditional non-deep learning baselines on the OpenNeuro (ds004504) dataset.
Table 32 Comparison of NeuroFusionNet with state-of-the-art methods on the OpenNeuro ds004504 (eyes-closed) dataset. NeuroFusionNet outperforms all baselines while maintaining computational efficiency.

As summarized in Table 31, these machine learning classifiers achieved lower accuracy than the hybrid NeuroFusionNet. The improvements of approximately 5–10% demonstrate that integrating handcrafted features with CNN-based automated features provides a richer feature set for accurate classification.

Table 31 compares the results obtained using the proposed method with those of traditional machine learning models. SVM, RF, and XGBoost provided reasonable performance; however, they lack the ability to model the nonlinear spatiotemporal dependencies present in EEG signals. NeuroFusionNet, by combining handcrafted and deep features within a unified framework, achieved the highest mean accuracy (94.27%) and F1-score (0.94). This consistent improvement across metrics illustrates the superiority of hybrid fusion architectures over shallow learning approaches for complex neurophysiological data. The results demonstrate that NeuroFusionNet not only establishes state-of-the-art performance on the benchmark OpenNeuro ds004504 dataset (Table 32), but also exhibits resilience across different physiological conditions (ds006036, Table 22) and external generalization to an independent dataset (OSF, Table 23). The consistency of results across all three datasets, further summarized in Fig. 14, confirms that the proposed framework is not overfitted to a single controlled dataset but is instead adaptable to heterogeneous EEG distributions. This cross-dataset stability, combined with interpretable visualizations and lightweight computational efficiency, underscores NeuroFusionNet’s potential for translation into real-world clinical practice.

Conclusion and future work

We propose NeuroFusionNet, a hybrid, interpretable, and computationally efficient framework for the early EEG-based classification of AD, FTD, and CN subjects. By integrating handcrafted spectral, statistical, wavelet, and entropy descriptors with CNN-based automated features, the model captures both domain-specific and data-driven neurophysiological signatures. The balanced fusion of these complementary features, reinforced through SMOTE augmentation and PCA-based dimensionality reduction, enables robust discrimination between AD, FTD, and CN controls. NeuroFusionNet achieved 94.27% accuracy and 0.94 macro-F1 on the benchmark OpenNeuro (ds004504) dataset, while maintaining stability across the eyes-open (ds006036) and external OSF datasets. Cross-validation analyses demonstrated minimal variance (SD \(<0.3\%\)) across folds, confirming that the high accuracy is not driven by data-partitioning bias. External generalization and zero-shot transfer evaluations further underscored the model’s adaptability to heterogeneous EEG sources, demographics, and acquisition conditions, an essential requirement for clinical scalability. The proposed framework emphasizes interpretability and transparency: SHAP-based attributions and Grad-CAM visualizations revealed physiologically coherent biomarkers, including posterior alpha attenuation and frontal theta enhancement, consistent with established Alzheimer’s EEG patterns. These interpretable outputs not only provide mechanistic insight into the model’s decision process but also enhance clinical trust and acceptance. Moreover, subgroup analyses demonstrated demographic fairness, with balanced accuracy across gender and age strata (difference \(<0.6\%\)), supporting the framework’s equitable applicability in real-world populations.

In future work, EEG signals could be combined with functional neuroimaging modalities such as MRI, PET, and fNIRS to capture complementary spatial–temporal correlates of neurodegeneration. NeuroFusionNet could also be extended to track disease progression and predict MCI conversion trajectories through temporal feature embeddings and survival modeling. Finally, real-time implementation on edge devices (e.g., NVIDIA Jetson, Raspberry Pi) and portable EEG headsets would enable continuous, point-of-care cognitive monitoring.