Introduction

Alzheimer’s Disease (AD) is a neurodegenerative disorder associated with the loss of cognitive faculties, especially in the elderly. It is irreversible, progressive, and the most common form of dementia. Alzheimer’s Disease International projects that the number of people affected will rise to 152 million by 2050, with an estimated annual cost of $2 trillion by 2030. Early diagnosis of AD can help delay its progression1. Neuroimaging techniques offer the opportunity to diagnose AD early by detecting changes in the brain in vivo2. Machine Learning (ML) techniques have been successfully applied for the early detection of AD and can aid a neuropathologist in making a cost-effective decision. These techniques can remove inter- and intra-rater differences among observers and provide a time-efficient, scalable means of supporting multiple diagnoses3.

Deep Learning (DL), a subtype of ML, allows feature extraction through non-linear transformations in an end-to-end fashion. It works by reducing the difference between the ideal and current output through a loss function using the backpropagation algorithm. The extracted features include shapes such as dots, lines, and object edges4. Data augmentation can reduce overfitting and improve generalization while extracting more information from a reduced set of samples. Such techniques can effectively diversify the datasets, increasing the performance of DL architectures on the underlying task.

Hadeer A. Helaly et al.1 proposed a deep learning model, based on a pre-trained VGG-19 model, named E2AD2C for multiclass classification between AD, early Mild Cognitive Impairment (EMCI), late MCI and Normal Control (NC) classes, achieving an accuracy of 97%. Carol Y. Cheung et al.5 used retinal images to train a binary classifier based on the EfficientNet-b2 network, achieving an accuracy of 83.6%. Janani Venugopalan et al.6 proposed an approach that uses stacked denoising auto-encoders and 3D Convolutional Neural Networks (CNNs) to extract features from genetic, clinical, and imaging data for multiclass classification into NC, MCI, and AD classes, achieving an accuracy of 79%. Shangran Qiu et al.7 reported a DL framework to identify NC, MCI, AD, and non-AD dementias utilising imaging and non-imaging datasets, achieving an accuracy of 55.8%. Sheng Liu et al.8 developed an approach based on 3D-CNNs, achieving an area under the curve (AUC) of 85.12% for NC identification, 62.45% for MCI identification, and 89.21% for AD identification. Marwa El-Geneedy et al.9 proposed a DL pipeline utilising MRI images for a multiclass classification task between NC, very mild dementia, mild dementia, and moderate dementia classes, achieving an accuracy of 99.68%. Andrea Loddo et al.10 proposed an ensemble approach for binary (AD/non-AD) and multiclass classification tasks, achieving accuracies of 98.51% and 98.67%, respectively, using MRI and functional MRI image features.

Suriya Murugan et al.11 proposed a DL architecture utilising 2D-CNN layers and MRI images for multiclass classification between very mild demented, mild demented, moderate demented, and non-demented subjects, achieving an accuracy of 95.23% on this task. Serkan Savaş12 compared the performances of 29 pre-trained models and found the accuracy of the EfficientNetB0 model to be the highest at 92.98%. F M Javed Mehedi Shamrat et al.13 proposed a fine-tuned CNN architecture. They compared the performances of the VGG16, MobileNetV2, AlexNet, ResNet50 and InceptionV3 architectures and found the performance of the InceptionV3 architecture to be the best. They further modified the InceptionV3 architecture, achieving an accuracy of 98.67% in identifying all five stages of AD and the NC class. Prasanalakshmi Balaji et al.14 proposed a hybrid DL approach combining CNN and Long Short-Term Memory (LSTM) architectures and utilising information from MRI and PET scans to achieve an accuracy of 98.5% in classifying cognitively normal controls from early MCI subjects. Pan et al.15 confirmed the recent finding that advanced deep visual representation models are able to reproduce complex stimuli, indicating that highly flexible AI architectures can capture faint and fine details in biomedical imaging. This type of representation learning could potentially be applied to enhance the accuracy of FDG-PET-based diagnosis of Alzheimer’s disease.

Zhu16 developed an AI classification model to identify memory impairment, showing that AI is a versatile tool for the identification of cognitive deficits. The same strategies could be useful in the development of PET-based diagnostic models for Alzheimer’s disease. Yin et al.17 presented an innovative feature fusion and temporal modelling framework for biomedical signal-based emotion recognition, demonstrating the effectiveness of hybrid deep learning architectures. Such fusion approaches could be extended to improve tracer imaging analysis for early Alzheimer’s detection using PET signals.

Data augmentation is a less explored area in the early diagnosis of AD using DL approaches, and augmentation techniques for small datasets require further investigation. Because AD datasets contain few samples owing to the high cost of data acquisition, data augmentation is of great interest for helping DL architectures learn robust features during training18.

In this article, we carried out experiments using PET scans. We chose six data augmentation methods: ellipsoidal averaging, Laplacian of Gaussian (LoG), local Laplacian, local contrast, Prewitt-edge emphasizing, and unsharp masking. We selected these techniques to improve the robustness of DL architectures, extract better features, reduce noise, focus on relevant structures, and enhance sensitivity to small variations in brain activity and structure.

This article is organised as follows: the Dataset description section includes patient inclusion criteria and demographic information; the Methods section outlines the data augmentation strategies and deep learning architectures used for the classification tasks; the Results section presents the experimental outcomes, which are interpreted in the Discussion section; and the Conclusion section summarises the main findings of the study.

Dataset description

Demographics of the subjects are summarized in Table 1. Values are expressed as mean (min-max), obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database website.

Table 1 Demographics of subjects considered in the study.

Methods

Data sources

The data used in this study were downloaded from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. ADNI is a long-term, multicenter study designed to develop and test methods for the early detection of Alzheimer’s disease. Established in 2004, the initiative compiles standardized, high-quality data from cognitively normal individuals, subjects with MCI, and patients diagnosed with Alzheimer’s disease at multiple research sites.

Data augmentation techniques

We chose six methods: ellipsoidal averaging, LoG, local Laplacian, local contrast, Prewitt-edge emphasizing, and unsharp masking, and studied their impact on the early diagnosis of AD using 3D PET scans. After filtering, we retained only the positive values. A description of these methods is provided next.

Ellipsoidal averaging

The 3D ellipsoidal averaging filter is a high-quality filter commonly utilized to smooth volumes. It relies on the insight that an affine mapping establishes a skewed 2D coordinate system around a source voxel. An ellipsoidal footprint is then computed around this source voxel and is used to filter the source volume with a Gaussian whose inverse covariance matrix is represented by this ellipsoid. We used the ‘fspecial3’ and ‘imfilter’ functions in MATLAB to implement the 3D ellipsoidal averaging filter.
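A minimal MATLAB sketch of this step is given below; the semi-axis lengths are illustrative assumptions, not the values used in the study.

```matlab
% Hypothetical example: ellipsoidal averaging of a 3D PET volume V.
h = fspecial3('ellipsoid', [5 5 3]);    % ellipsoidal averaging kernel (assumed semi-axes)
Vavg = imfilter(V, h, 'replicate');     % smooth the 3D volume
```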

Laplacian of Gaussian

The 3D LoG filter is commonly used to detect edges in volumes. It works by calculating the second derivative in the spatial domain. The LoG response can be zero, positive, or negative depending on the distance from the edge. The 2D LoG function centred on zero with Gaussian standard deviation σ has the following mathematical form:

$$LoG\left(x,y\right)=-\frac{1}{\pi\sigma^{4}}\left[1-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right]e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}}$$
(1)

We used ‘fspecial3’ and ‘imfilter’ functions in MATLAB to implement the 3D LoG filter.
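A minimal MATLAB sketch is shown below; the kernel size and σ are illustrative assumptions, and negative responses are discarded as described above.

```matlab
% Hypothetical example: 3D LoG filtering of a PET volume V.
h = fspecial3('log', [5 5 5], 0.6);     % 3D Laplacian of Gaussian kernel (assumed size and sigma)
Vlog = imfilter(V, h, 'replicate');     % filter the 3D volume
Vlog = max(Vlog, 0);                    % keep only the positive values
```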

Local Laplacian

The local Laplacian filter uses the amplitude of edges and the smoothing of details in a Laplacian pyramid to control the dynamic range of an image. It can be deployed to increase the local contrast of a colour image, to perform edge-aware noise reduction, as well as to smooth image details. We used the ‘locallapfilt’ function in MATLAB to implement the 3D local Laplacian filter. We set ‘sigma’ to 0.4 and ‘alpha’ to 0.5.
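Since ‘locallapfilt’ operates on 2D images, a slice-wise application is one plausible sketch (our assumption) of how the 3D volume can be processed:

```matlab
% Hypothetical slice-wise local Laplacian filtering of a 3D PET volume V.
Vllf = zeros(size(V), 'like', V);
for k = 1:size(V, 3)
    Vllf(:,:,k) = locallapfilt(single(V(:,:,k)), 0.4, 0.5);   % sigma = 0.4, alpha = 0.5
end
```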

Local contrast

Local contrast filtering can be used to increase or decrease the local contrast of an image. It works by controlling the amount of smoothing applied as well as the threshold that defines strong edges. We used the ‘localcontrast’ function in MATLAB to implement the 3D local contrast filter. We set ‘edgeThreshold’ to 0.4 and ‘amount’ to 0.5.
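As with the local Laplacian filter, ‘localcontrast’ works on 2D images, so a slice-wise sketch (our assumption) is shown below:

```matlab
% Hypothetical slice-wise local contrast filtering of a 3D PET volume V.
Vlc = zeros(size(V), 'like', V);
for k = 1:size(V, 3)
    Vlc(:,:,k) = localcontrast(single(V(:,:,k)), 0.4, 0.5);   % edgeThreshold = 0.4, amount = 0.5
end
```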

Prewitt-edge emphasizing

The Prewitt operator can be used to detect edges in both horizontal and vertical directions in an image using first-order derivatives. It is a separable filter because its kernels can be decomposed into averaging and differentiation operations. We used the ‘fspecial3’ and ‘imfilter’ functions in MATLAB to implement the 3D Prewitt-edge emphasizing filter.
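A minimal MATLAB sketch follows; the choice of gradient direction ('X') is an illustrative assumption, not necessarily the configuration used in the study.

```matlab
% Hypothetical example: 3D Prewitt-edge emphasising of a PET volume V.
h = fspecial3('prewitt', 'X');          % 3D Prewitt kernel along one direction
Vedge = imfilter(V, h, 'replicate');    % emphasise edges in the 3D volume
```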

Unsharp masking

Sharpness refers to the contrast along edges: an abrupt transition from black to white appears sharp, whereas a gradual change from black through grey to white appears hazy. An image is sharpened by subtracting a blurred (unsharp) version of itself from the original. This method is known as unsharp masking. We used the ‘imsharpen’ function in MATLAB to implement the 3D unsharp masking operation.
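Because ‘imsharpen’ operates on 2D images, a slice-wise application with default settings is one plausible sketch (our assumption):

```matlab
% Hypothetical slice-wise unsharp masking of a 3D PET volume V.
Vsharp = zeros(size(V), 'like', V);
for k = 1:size(V, 3)
    Vsharp(:,:,k) = imsharpen(V(:,:,k));   % subtract a blurred copy to sharpen each slice
end
```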

Deep learning architectures

Figure 1 shows the generic architecture that we used throughout the experiments. The input is a 3D volume of size 79 × 95 × 69, normalised using a zero-centre normalisation procedure that subtracts the mean computed for each channel.

Fig. 1 Architecture used in the experiments.

Fig. 2 Strided DL architecture for the AD-NC task.

In addition, we performed experiments using the strided convolution architecture shown in Fig. 2, without augmentation, for the AD-NC task; in this configuration we used 10 filters in the convolutional 3D layer.

A convolution layer is then used to extract features from the input volume. We used a small kernel of size 3 × 3 × 3 to combine local features effectively. To further optimize the network, we apply L2 regularization to the weights and biases in order to reduce overfitting. A batch normalization layer is then used to mitigate internal covariate shift by normalizing the observations across channels independently. After that, we used an Exponential Linear Unit (ELU) activation layer, which can be described by the following equation:

$$ELU\left(x\right)=\left\{\begin{array}{ll}x, & x\ge 0\\ \alpha\left(e^{x}-1\right), & x<0\end{array}\right.$$
(2)

After that, we used the max-pooling operation to reduce the size of the feature maps by selecting the maximum value in a neighbourhood of voxels. We ensured that pooling regions do not overlap by keeping the stride equal to the pool size of 2 in all dimensions.

We then applied fully connected (dense) layers to capture global patterns, followed by a softmax layer that applies the exponential function to each element of the input and normalizes these values according to the following equation:

$$softmax\left({z}_{i}\right)=\frac{e^{{z}_{i}}}{\sum_{j=1}^{K}e^{{z}_{j}}}$$
(3)

Finally, we applied a classification layer that computes the cross-entropy loss for classification, given by the following equation:

$${CE}_{loss}=-\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}{w}_{i}{t}_{ni}\ln{y}_{ni}$$
(4)

In Eq. 4, ‘N’ is the number of samples, ‘K’ is the number of classes, ‘w_i’ is the weight for class ‘i’, ‘t_ni’ indicates whether sample ‘n’ belongs to class ‘i’, and ‘y_ni’ is the network output for sample ‘n’ and class ‘i’.
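Equations 3 and 4 can be illustrated with a small numeric sketch in MATLAB; the scores, target, and class weights below are hypothetical values chosen only for illustration.

```matlab
% Hypothetical example of Eqs. 3-4 for a single sample with K = 3 classes (AD, MCI, NC).
z  = [2.0; 0.5; -1.0];            % raw scores from the last fully connected layer
y  = exp(z) ./ sum(exp(z));       % softmax (Eq. 3)
t  = [1; 0; 0];                   % one-hot target: sample belongs to class AD
w  = [1; 1; 1];                   % class weights
CE = -sum(w .* t .* log(y));      % cross-entropy for this sample (Eq. 4 with N = 1)
```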

We used 100 neurons in Fully-Connected (FC) layer 1, 30 neurons in FC layer 2, and two (binary classification) or three (multiclass classification) neurons in FC layer 3. We shuffled the training set after every epoch during training. We used ‘Adam’ as the optimiser, set the initial learning rate to 0.001, and trained the model for 50 epochs with a mini-batch size of 2. We also dropped the learning rate after every 10 epochs by multiplying it by a factor of 0.1.
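A minimal sketch of this setup in MATLAB’s Deep Learning Toolbox, assembled from the layers and hyperparameters described above, is given below; the number of filters depends on the task (10, 15, or 20), the exact number and arrangement of convolutional blocks follow Fig. 1, and any parameter not stated in the text is an assumption.

```matlab
% Minimal sketch of the generic architecture (Fig. 1) and training options.
numFilters = 10; numClasses = 2;                  % task-dependent (see the task sections below)
layers = [
    image3dInputLayer([79 95 69], 'Normalization', 'zerocenter')
    convolution3dLayer(3, numFilters, ...
        'WeightL2Factor', 1, 'BiasL2Factor', 1)   % L2 regularization on weights and biases
    batchNormalizationLayer
    eluLayer                                      % ELU activation (Eq. 2)
    maxPooling3dLayer(2, 'Stride', 2)             % non-overlapping pooling
    fullyConnectedLayer(100)                      % FC layer 1
    fullyConnectedLayer(30)                       % FC layer 2
    fullyConnectedLayer(numClasses)               % FC layer 3
    softmaxLayer                                  % Eq. 3
    classificationLayer];                         % cross-entropy loss (Eq. 4)
options = trainingOptions('adam', ...
    'InitialLearnRate', 1e-3, 'MaxEpochs', 50, 'MiniBatchSize', 2, ...
    'Shuffle', 'every-epoch', ...
    'LearnRateSchedule', 'piecewise', 'LearnRateDropPeriod', 10, ...
    'LearnRateDropFactor', 0.1);
% net = trainNetwork(volumes, labels, layers, options);
```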

AD-NC binary classification tasks

For the AD-NC classification task without augmentation, we employed 10 filters in the convolutional 3D layer. For tasks involving augmentation, namely unsharp masking, Prewitt-edge emphasising, local Laplacian, local contrast, LoG, and ellipsoidal averaging, we utilised 15 filters in the convolutional 3D layer. We also performed experiments combining the Prewitt-edge emphasizing and LoG augmentation schemes for this task, again using 15 filters in the convolutional 3D layer.

AD-MCI binary classification tasks

For the task without augmentation, we used 10 filters in the convolutional 3D layer, while for tasks involving unsharp masking, Prewitt-edge emphasizing, local Laplacian, local contrast, LoG, and ellipsoidal averaging augmentation, we utilised 15 filters in the convolutional 3D layer. We also performed experiments combining the Prewitt-edge emphasizing and LoG augmentation schemes for this task, again using 15 filters in the convolutional 3D layer.

MCI-NC binary classification tasks

We employed 10 filters in the convolutional 3D layer. For tasks involving augmentation, namely unsharp masking, Prewitt-edge emphasising, local Laplacian, local contrast, LoG, and ellipsoidal averaging, we utilised 15 filters in the convolutional 3D layer. We also performed experiments combining the Prewitt-edge emphasizing and LoG augmentation schemes for this task, using 15 filters in the convolutional 3D layer. In addition, we performed experiments combining the LoG, local Laplacian, and Prewitt-edge emphasizing augmentation techniques for this task, using 20 filters in the convolutional 3D layer.

AD-MCI-NC multiclass classification tasks

We employed 10 filters in the convolutional 3D layer. For tasks involving augmentation, namely unsharp masking, Prewitt-edge emphasising, local Laplacian, local contrast, LoG, and ellipsoidal averaging, we utilised 15 filters in the convolutional 3D layer. We also performed experiments combining the Prewitt-edge emphasizing and LoG augmentation schemes for this task, again using 15 filters in the convolutional 3D layer.

Results

We deployed a 5-fold Cross-Validation (CV) approach in our experiments. We used sensitivity (SEN), specificity (SPEC), F-measure, precision, and balanced accuracy as performance metrics for the methods in Tables 2, 3, 4 and 5. In Table 5, these metrics are computed for each class.

For binary classification tasks, their definitions are given as follows:

$$SEN=\frac{TP}{TP+FN}$$
(5)
$$SPEC=\frac{TN}{TN+FP}$$
(6)
$$F-measure=\frac{2TP}{2TP+FP+FN}$$
(7)
$$Precision=\frac{TP}{TP+FP}$$
(8)
$$Balanced\ accuracy=\frac{SEN+SPEC}{2}$$
(9)
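A minimal MATLAB sketch of Eqs. 5–9, assuming a 2 × 2 confusion matrix with true classes along the rows, predicted classes along the columns, and the positive class in the first row and column, is as follows:

```matlab
% Compute the binary metrics of Eqs. 5-9 from a 2x2 confusion matrix C.
function m = binaryMetrics(C)
    TP = C(1,1); FN = C(1,2); FP = C(2,1); TN = C(2,2);
    m.sen       = TP / (TP + FN);              % sensitivity (Eq. 5)
    m.spec      = TN / (TN + FP);              % specificity (Eq. 6)
    m.fmeasure  = 2*TP / (2*TP + FP + FN);     % F-measure (Eq. 7)
    m.precision = TP / (TP + FP);              % precision (Eq. 8)
    m.balacc    = (m.sen + m.spec) / 2;        % balanced accuracy (Eq. 9)
end
```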

For multiclass classification, true positive (TP), true negative (TN), false positive (FP) and false negative (FN) are defined with the help of the confusion matrix given in Fig. 3 as follows:

Fig. 3 Confusion matrix for the multiclass classification task.

$$TP (AD) = cell \:1$$
(10)
$$FN (AD) = cell \:2 + cell \:3$$
(11)
$$FP (AD) = cell \:4 + cell \:7$$
(12)
$$TN (AD) = cell \:5 + cell \:6 + cell \:8 + cell \:9$$
(13)
$$TP (MCI) = cell \:5$$
(14)
$$FN (MCI) = cell \:4 + cell \:6$$
(15)
$$FP (MCI) = cell \:2 + cell \:8$$
(16)
$$TN (MCI) = cell \:1 + cell \:3 + cell \:7 + cell \:9$$
(17)
$$TP (NC) = cell \:9$$
(18)
$$FN (NC) = cell \:7 + cell \:8$$
(19)
$$FP (NC) = cell \:3 + cell \:6$$
(20)
$$TN (NC) = cell \:1 + cell \:2 + cell \:4 + cell \:5$$
(21)
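A minimal MATLAB sketch of Eqs. 10–21 follows, assuming (our reading of Fig. 3) that the cells are numbered row by row, with true classes along the rows and predicted classes along the columns in the order AD, MCI, NC; the counts shown are hypothetical.

```matlab
% Per-class TP, FN, FP, TN from a 3x3 confusion matrix C (hypothetical counts).
C = [30 5 3; 6 20 9; 2 4 35];              % rows: true class, columns: predicted class
TP = zeros(1,3); FN = TP; FP = TP; TN = TP;
for k = 1:3                                 % k = 1: AD, 2: MCI, 3: NC
    TP(k) = C(k,k);                         % e.g. TP(AD) = cell 1 (Eq. 10)
    FN(k) = sum(C(k,:)) - C(k,k);           % remaining cells in row k (Eq. 11)
    FP(k) = sum(C(:,k)) - C(k,k);           % remaining cells in column k (Eq. 12)
    TN(k) = sum(C(:)) - TP(k) - FN(k) - FP(k);   % all other cells (Eq. 13)
end
```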

Tables 2, 3, 4 and 5 present the results for this study.

Table 2 Results for AD-NC binary classification task.

Table 3 shows the performance metrics of the different data augmentation techniques for the AD-MCI binary classification task.

Table 3 Results for AD-MCI binary classification task.

Table 4 presents the results for the MCI-NC binary classification task, evaluating various augmentation techniques.

Table 4 Results for MCI-NC binary classification task.

In Table 5, we present the results in the following format: AD, MCI, NC. For example, for no augmentation method, SEN for AD class is 0.7340, SEN for MCI class is 0.3711, while SEN for NC class is 0.6275.

Table 5 Results for AD-MCI-NC multiclass classification task.

Discussion

From Tables 2, 3, 4 and 5, it can be seen that augmentation may help in achieving better outcomes. The performance of the Prewitt-edge emphasising augmentation scheme for the AD-MCI and AD-MCI-NC classification tasks is quite strong. Similarly, LoG augmentation shows excellent performance for the AD-MCI and AD-NC classification tasks, whereas local Laplacian augmentation performs well for the MCI-NC classification task. Furthermore, it can be noted that combining augmentation techniques may not result in better performance. The modified architecture, which uses strided convolution instead of max-pooling layers, does not perform as well as the augmentation techniques without strided convolution. For the MCI-NC classification task, we found the performance of local Laplacian augmentation to be the best. For both the AD-MCI-NC and AD-MCI classification tasks, we found the performance of Prewitt-edge emphasizing augmentation to be the best. Finally, for the AD-NC classification task, we found the performance of LoG augmentation to be the best. The differences in performance can be tied to the specific nature of the changes in brain structures at different stages of AD. Local Laplacian augmentation enhances both the edges and the fine details in a PET scan, allowing the DL model to focus on regional variations in brain structures that are indicative of disease progression, especially those occurring in the MCI stage. LoG augmentation emphasises rapid intensity changes, such as boundaries between different brain regions, making it easier for DL models to detect early changes in AD, especially in critical areas such as the hippocampus; thus, it could be a suitable candidate for detecting NC-to-AD progression. Prewitt-edge emphasising augmentation highlights transitions between different brain structures, revealing early signs of atrophy. AD typically affects specific brain regions such as the hippocampus, and edge emphasizing ensures that these regions can be accurately detected and classified by DL models.

From these results, it is clear that the methods that utilise information present in the derivatives are the better-performing ones. The Prewitt operator is a discrete differentiation operator that gives the direction of the largest possible change in intensity; at regions of constant image intensity, it gives a zero vector. LoG can be used for blob detection. It uses a Gaussian kernel and returns positive responses for dark blobs and negative responses for bright ones. Its responses are covariant with affine transformations in the image domain. The Laplacian of an image utilises information in the second derivatives and crosses zero at edges. LoG combines Gaussian filtering with the Laplacian for edge detection. This approach has the advantage of isolating noise points and small structures, particularly those in the brain, which can be filtered out effectively, especially when the image contrast across the edge is combined with the slope of the zero crossing.

While traditional approaches for feature description may require the expertise of a designer, DL systems are end-to-end systems that extract features in the absence of such expertise. We deployed Convolutional Neural Networks in the present study because of their ability to partition the feature space using nonlinear class boundaries. Given a carefully selected training set, they can learn classification boundaries in their feature spaces despite their limitations. It is recognised that neuropsychiatric symptoms, such as agitation and aggression, have a strong link with the cognitive impairment associated with AD. Changes in the amygdala, frontal cortex, hippocampus, occipital cortex, and other brain areas trigger neuroinflammation and neuronal dysfunction during the early stages of AD and can be captured effectively by a PET scan19. DL models can represent these changes effectively by processing PET scans, which can contribute to the identification of vital factors and pave the way for personalised treatments.

Table 6 provides a comparison of the proposed method with existing techniques available in the literature for different Alzheimer’s disease classification tasks using PET data. Our results demonstrate that our method is consistently better than previous approaches across the different classification scenarios, obtaining the best accuracy for the AD-NC, AD-MCI, MCI-NC and AD-MCI-NC tasks.

Table 6 Comparison with other methods reported in the literature.

As seen in Table 6, our LoG-based augmentation approach produces better results for the AD-NC binary classification task than the box filtering approach. This could be because LoG identifies changes in image intensity effectively, preserving edges while reducing noise sensitivity, whereas box filtering blurs an image by averaging the neighbourhood pixels defined by the kernel.

For the AD-MCI binary classification task, our approach, which uses augmentation based on Prewitt-edge features, outperforms the architecture-based approach of Ahsan et al.21. The architecture in the present study does not employ dropout before the softmax layer, which rules out the possibility of discarding useful features; combined with the strong horizontal and vertical edge detection of the Prewitt-edge emphasising filter, this enables better performance in recognising the continuum defined by AD and MCI.

For the MCI-NC binary classification task, our approach based on local Laplacian augmentation outperformed other approaches, especially those based on median filtering augmentation. This could be because our approach uses the Laplacian pyramid effectively, decomposing an image into different frequency bands and locally enhancing details near each pixel. In contrast, median filtering has the potential to blur fine details and is less effective against Gaussian noise, reducing accuracy in this task. A similar argument applies to approaches based on Gaussian filtering: its blurring effect reduces high-frequency components and fine details, potentially hindering the performance of deep learning architectures in this context.

Recent AI-based healthcare research has offered methodologies that can be applied to improve the diagnosis of Alzheimer’s disease with PET. Pan et al.23 found that decision-level fusion across multiple data domains can greatly enhance the classification accuracy of cognitive state, which may provide new guidance for incorporating multimodal PET biomarkers. For the detection of early-stage disease, PET images have suboptimal spatial resolution, which can be improved with the recently emerging deep learning methods for super-resolution imaging by Zhu et al.24. Zhan et al.25 showed that the choice of algorithm design can strongly affect precision in brain analysis, suggesting the use of computational techniques optimised for PET data processing. Li et al.26 applied machine learning to physiological data for the diagnosis of age-related diseases, showing the promise of AI models for non-invasive assessment of neurodegenerative conditions. Wang et al.27 emphasised the convergence of novel therapies with validated clinical treatment strategies and their integration into multidisciplinary team approaches; advancing AI-enhanced PET reading around the time of diagnosis would greatly support early diagnostic efforts and patient guidance.

Conclusion

This study compared six data augmentation methods: ellipsoidal averaging, LoG, local Laplacian, local contrast, Prewitt-edge emphasizing, and unsharp masking. We used PET scans from the ADNI database for the early diagnosis of AD. We considered three binary classification problems, AD-NC, AD-MCI, and MCI-NC, as well as one multiclass classification problem, AD-MCI-NC. We also combined data augmentation methods and tried a modified strided convolution architecture for these tasks. We found the performances of the Prewitt-edge emphasizing, LoG, and local Laplacian augmentation methods to be the best. In the future, we plan to study the impact of other data augmentation methods, such as Sobel-edge emphasising, superpixel over-segmentation, numerical gradient, directional gradient, and GANs, on the early diagnosis of AD using novel DL architectures.