Abstract
Alzheimer’s disease (AD) is characterized by a progressive spread of neurofibrillary tangles (NFT), beginning in the medial perirhinal cortex (mPRC), advancing to the entorhinal cortex (ERC), and subsequently involving the hippocampus, lateral perirhinal cortex (lPRC), and the rest of the brain. Given the close relationship between NFT accumulation and neuronal loss, the mPRC is a promising structural marker for early diagnosis of AD. However, few tools that automatically measure the cortical thickness of the mPRC are currently available. Using the nnU-Net framework, we trained models on structural MRI of 126 adults, with manually segmented labels as ground truth. These models were then applied to an independent dataset of 103 adults (comprising patients with Alzheimer’s dementia, patients with amnestic mild cognitive impairment (aMCI), and healthy controls). High agreement was observed between manual and automated measurements of cortical thickness. Furthermore, we found significant atrophy in the Alzheimer’s dementia group in the mPRC, ERC, and lPRC compared to healthy controls. Comparison of the aMCI group and healthy controls revealed significant differences in the ERC only. The results underscore the utility of our automated segmentation tool in advancing Alzheimer’s research.
Introduction
Discovering the earliest signs of Alzheimer’s disease (AD) and thereby starting treatment as early in the disease progression as possible is of great interest1,2. Neuropathological brain changes associated with AD, namely β-amyloid plaques and neurofibrillary tangles (NFT), are thought to begin years before clinical symptoms become evident. In contrast to β-amyloid plaques, NFT are more strongly correlated with cognitive deficits3 and progress in a hierarchical manner throughout the brain in typical AD4. This continuous accumulation of NFT is strongly related to loss of neurons4,5 and has been shown to be causally associated with cerebral atrophy in affected regions6. In sporadic AD (representing approximately 95% of all cases), the first region typically affected by neurofibrillary tau pathology is the medial perirhinal cortex (mPRC), which approximately corresponds to the transentorhinal cortex and Brodmann area 35 (refs. 4,7,8). In later stages, this neurofibrillary tau pathology spreads to the medially located entorhinal cortex (ERC), then to the hippocampus, and eventually throughout the brain4,9,10. In clinical settings, the diagnosis of AD often relies on visual assessment of atrophy, such as the ERC atrophy score (ERICA11) along with the medial temporal lobe atrophy score that evaluates the hippocampus, the choroid fissure, and the lateral ventricle (MTA12). In this context, the mPRC is often overlooked, and if considered, it is usually in the form of the entire perirhinal cortex. Given its relatively small size, we believe that assessing mPRC integrity in clinical settings could be enhanced by using a computed measure of atrophy (e.g., cortical thickness). Recent studies are in line with this notion and point to the potential of mPRC integrity in clinical settings. For example, Sone et al.13 discovered that in early stages of AD, regional NFT accumulation is associated with cortical thinning in the perirhinal cortex and ERC.
As a result, brain structures that are initially impacted by NFT-induced atrophy (e.g., mPRC) likely serve as sensitive preclinical structural imaging biomarkers. Indeed, more recent studies focusing on mPRC atrophy found promising results suggesting mPRC integrity as a sensitive and specific marker for early AD (e.g.,14,15,16). In addition, these findings further indicate that the lateral part of the perirhinal cortex (lPRC, most likely encompassing Brodmann area 36) is only affected after the mPRC, which is in line with the proposed staging of NFT-related pathology4,17. Based on these results, we hypothesize that investigating the cortical thickness of mPRC and lPRC separately is more sensitive in early-stage AD than the cortical thickness of the entire perirhinal cortex.
Nonetheless, the perirhinal cortex, particularly the mPRC, remains underrepresented in AD research and diagnostics. One contributing factor is the ongoing debate regarding its precise anatomical boundaries in the human brain46. Foundational work in nonhuman primates47,48 has shaped our understanding of the medial and lateral subdivisions of the PRC. These areas are notably larger in humans and are critically involved in the progression of tau pathology in AD, highlighting their clinical relevance. Although manual segmentation protocols, such as those based on the cytoarchitectonic work of Insausti et al.49, enable differentiation between the mPRC and lPRC (e.g.,14,50), only a limited number of studies have adopted these methods. Anatomically, the transition from the mPRC to the lPRC is defined by the collateral sulcus of the medial temporal lobe, which exhibits considerable inter- and intraindividual variability (e.g., differences in length and form of the sulcus)8. This variability presents a major challenge for accurate segmentation of the mPRC and lPRC, as it profoundly influences the delimitation of their boundaries (for an insight into manual segmentation see18). In a recent study we demonstrated excellent inter-rater reliability between two raters using an existing manual segmentation protocol for the mPRC, lPRC, and ERC, which takes collateral sulcus variability into account18. Although manual segmentation is the gold standard for quantifying brain regions from MRI, it is time-consuming and therefore not feasible in clinical and research settings19. Earlier work addressing this challenge (e.g.,42,43) used atlas-based tools, which create standardized templates by spatially aligning anatomical structures across individuals.
These robust, atlas-based approaches use manually annotated atlases as templates that are co-registered to the brain image data; label fusion followed by an AdaBoost classifier then yields the final segmentation. In contrast, the proposed deep-learning-based method directly predicts the class label from the input data. The algorithm learns the anatomical variability of the collateral sulcus from the training data and has a smaller tendency toward the average. It is thus expected to better capture inter-individual differences, while possibly being less robust20.
Advances in algorithms and computational resources over time have significantly propelled the development of different segmentation techniques for neuroimaging, such as FreeSurfer21 or Statistical Parametric Mapping (SPM22). In parallel, machine learning approaches based on convolutional neural network (CNN) architectures (e.g., U-Net) are increasingly used20. Against this backdrop, we aimed to develop an automated segmentation tool based on U-Net, a generic deep-learning-based software package for cell detection and cell segmentation, which can be trained and applied to new data. In addition, it is customizable, which allows adaptation to specific challenges23,24. Using the automated segmentation tool, we aimed to replicate the results described by Krumm et al.14 to evaluate a potential clinical benefit. That study found significant atrophy in the mPRC and ERC when comparing both Alzheimer’s dementia patients (dAD) and amnestic mild cognitive impairment (aMCI) patients with healthy controls. Notably, atrophy in the lPRC was observed exclusively in the dAD group, aligning with the NFT distribution pattern in early AD4,14,17. While Krumm et al.14 used a manual segmentation protocol to extract cortical thickness, we replicate the study using an automated segmentation method in the identical sample. This allows us to compare the automated segmentation to the manual segmentation for the cortical thickness of brain regions first affected by atrophy in typical AD (e.g., mPRC, ERC). A reliable automated segmentation, especially of the mPRC, would facilitate its use in research and clinical settings to improve early detection of AD.
Materials and methods
Participants and MRI acquisition
Training data set
The training data set (N = 126, mean age = 69.8 ± 10.8 years) consisted of 101 patients and 25 healthy control participants (NC). Written informed consent was obtained from all individuals prior to participation and the study was approved by the local ethics committee (EKNZ: Ethics Committee of Northwestern and Central Switzerland). All methods were performed in accordance with the relevant guidelines and regulations. NCs were recruited from the “Registry of Healthy Individuals Interested to Participate in Research” of the Memory Clinic FELIX PLATTER Basel, Switzerland. They had undergone a thorough medical screening and neuropsychological testing to confirm their cognitive health. In particular, the exclusion criteria encompassed severe impairments in auditory, visual, or speech abilities; substantial sensory or motor deficits; severe systemic illnesses; persistent moderate to intense pain; conditions with significant or likely effects on the central nervous system (e.g., neurological disorders such as cerebrovascular disease, generalized atherosclerosis, and psychiatric disorders); and the use of potent psychoactive substances, except for mild tranquilizers. In addition, all individuals classified as NC obtained standard scores within the normal range on the Mini-Mental State Examination (MMSE)25, California Verbal Learning Test26, Clock Drawing Test (Critchley, 1953), and the short version of the Boston Naming Test27. Of the 101 patients, 29 participants were diagnosed with mild cognitive disorder (MCI) according to DSM-IV28. 26 participants were diagnosed with Major Depression (MD), including 14 participants recruited from the Memory Clinic FELIX PLATTER Basel, Switzerland, and 12 recruited from the University Psychiatric Clinics Basel, Switzerland. MDs had to score 10 or more points on the Beck Depression Inventory29, 13 or more on the Beck Depression Inventory-II30, or 6 or more points on the Geriatric Depression Scale31.
8 participants were diagnosed with aMCI32 according to DSM-IV28 and Winblad et al. (2004) criteria. 18 participants were diagnosed with dementia due to AD (dAD) according to DSM-IV criteria28, and NINCDS-ADRDA33. aMCI and dAD were combined to one AD group (N = 26) based on the assumption that the progression from aMCI to early dementia stage of AD is gradual and time of diagnosis can differ34. Further, 20 patients were diagnosed with dementia due to other etiologies than AD (non-AD; e.g., due to Lewy body disease) according to DSM-IV. For an overview see Table 1. All patients had been recruited either from the Memory Clinic FELIX PLATTER Basel, Switzerland, where they had received neuropsychological testing, and medical and neurological examinations including blood analyses, or in the case of the 12 MDs from the University Psychiatric Clinics Basel, Switzerland. All participants were native Swiss-German or German-speaking adults.
Participants received T1-weighted 3D magnetization-prepared rapid acquisition gradient echo (MPRAGE) structural MRI using the same 3-Tesla scanner (MAGNETOM Skyra fit, Siemens; inversion time = 900 ms, repetition time = 2300 ms, echo time = 2.92 ms, flip angle = 9°; acquisition matrix = 256 × 256 mm, voxel size = 1 mm isotropic, acquisition time = 5 min 12 s) at the University Hospital Basel, Switzerland.
Test data set
The test data set (N = 103, mean age = 76.4 ± 7.0 years) is identical to the one used for group comparison in Krumm et al.14 and contained 46 healthy control participants (NC), 34 participants diagnosed with early Alzheimer’s dementia (dAD) according to NINCDS-ADRDA and DSM-IV criteria28, and 23 patients with amnestic mild cognitive impairment (aMCI) according to DSM-IV and Winblad et al.35 criteria (see Table 2). For a comprehensive overview of the inclusion and exclusion criteria, see Krumm et al.14. All patients had been recruited from the Memory Clinic FELIX PLATTER Basel, Switzerland, where they had received neuropsychological testing, and medical and neurological examinations including blood analyses. All participants were native Swiss-German or German-speaking adults.
Participants received T1-weighted 3D MPRAGE structural MRI using the same 3-Tesla scanner (MAGNETOM Verio, Siemens; inversion time = 1000 ms, repetition time = 2000 ms, echo time = 3.75 ms, flip angle = 8°; acquisition matrix = 256 × 256 mm, voxel size = 1 mm isotropic, acquisition time = 7 min 30 s) at the University Hospital Basel, Switzerland.
Preprocessing of structural MRI and manual segmentation
MRI scans were preprocessed using FreeSurfer (Massachusetts General Hospital, Boston, MA, USA; http://surfer.nmr.mgh.harvard.edu; accessed on 7 January 2020; refs. 36,37). In a semi-automated processing stream, FreeSurfer segmented the T1-weighted 3D MPRAGE volumes into grey and white matter. Next, the surface of white matter, represented by the transition area from white to grey matter, and the pial surface were modeled36. Lastly, tissue classification was visually confirmed for all participants, and, if required, manual adjustments were performed. Regions of interest (ROIs; i.e., mPRC, lPRC, and ERC) for both hemispheres were manually drawn by a blinded rater on coronal slices, according to the protocol described in Krumm et al.14, which takes collateral sulcus variation into account (for visual examples of the anterior-posterior borders of manual segmentation, see18).
Training and application of automated segmentation
The semi-automatic labels were mapped to the gray matter obtained by FreeSurfer and transformed to the 3D voxel space to create regional masks for mPRC, lPRC and ERC. Using each of the masks, we trained a separate network to segment the respective region as a voxel mask (for examples see Supplementary material). The predicted voxel mask was then mapped back to the FreeSurfer space to compute morphological characteristics such as the average cortical thickness. We used the nnU-Net38 framework to train the networks. The nnU-Net23,24 is a toolbox to train 2D and 3D U-Nets, specifically optimized for user-friendly model training and selection with biomedical imaging data. The U-Net23,24 is a multi-stage neural network architecture for semantic segmentation. The input image, a T1-weighted MRI in this work, is processed on multiple resolution levels. The features from the analysis path (with increasing voxel size) are combined with the features from the synthesis path (with decreasing voxel size) at every resolution level except the lowest. This leads to an effective combination of high-level features with large spatial context and low-level features with small spatial context. The output is a pixel-wise semantic segmentation. At the border of regions, the class labels are ambiguous. For example, a pixel may contain 50% of each of two classes due to interpolation. To better account for this ambiguity, we substituted the default sparse cross-entropy loss with a dense cross-entropy loss that is capable of modeling a full probability distribution. The conversion from surface-based annotations to voxel labels and back was done with FreeSurfer. Finally, we trained a separate network for each of the ERC, mPRC, and lPRC for 150 epochs.
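The difference between the two loss variants can be illustrated with a minimal Python sketch. It is not the nnU-Net implementation: the function names `sparse_cross_entropy` and `dense_cross_entropy` and the example logits are hypothetical, chosen only to show that a dense target can express a border pixel shared equally between two classes, whereas a sparse target must commit to a single class index.

```python
import math

def softmax(logits):
    """Convert raw network outputs into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(logits, class_index):
    """Loss for a target given as one integer class label."""
    return -math.log(softmax(logits)[class_index])

def dense_cross_entropy(logits, target_probs):
    """Loss for a target given as a full probability distribution."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)

# Hypothetical two-class pixel (e.g., ROI versus background).
logits = [2.0, 0.5]
hard = sparse_cross_entropy(logits, 0)             # target: class 0
one_hot = dense_cross_entropy(logits, [1.0, 0.0])  # same target, dense form

# A border pixel that belongs 50% to each class after interpolation.
border = dense_cross_entropy([0.0, 0.0], [0.5, 0.5])
```

With a one-hot target the dense loss reduces to the sparse loss, so the dense form is a strict generalization; only at ambiguous border pixels do the two differ.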
Inference on the MRI data was performed without additional pre-processing. In two cases, the prediction of one of the masks failed and could not be projected to the FreeSurfer space to obtain cortical thickness values (ERC right hemisphere for one participant, lPRC right hemisphere for another participant). To ensure the accuracy of the automated segmentations, we performed a quality control assessment on a subset of 60 participants, with 20 randomly selected from each diagnostic group (healthy controls, aMCI, and AD). The process involved a detailed visual inspection of coronal slices in FreeSurfer, focusing on key anatomical landmarks such as the medial and lateral borders of all ROIs (ERC, mPRC, and lPRC). Each segmentation layer was inspected systematically from the anterior to the posterior border to detect any gross overextensions, under-segmentations, or incorrectly labeled pixels. A significant deviation would have included segmentation labels being entirely misplaced outside the medial temporal lobe, gross misplacement of the ROI, such as segmentation labels extending well beyond the expected anatomical boundaries, extensive gaps within the ROI where relevant pixels belonging to the cortical structure were consistently excluded, or a complete absence of labeled pixels for a given ROI. Additionally, a segmentation would have been flagged if it spanned fewer than 10 slices in the anterior-posterior direction, as this would indicate insufficient coverage of the expected anatomical region. In this sub-sample of 60 participants, the ROI masks performed as expected, with no significant deviations observed. Given these consistent results and the high ICC values between manual and automated segmentation, extending quality control to the full sample was deemed unnecessary.
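The slice-count criterion can be expressed as a simple automated check. The following Python sketch is an illustration only, since the quality control described here was performed visually. It assumes a hypothetical binary mask stored as nested lists whose first axis runs over coronal slices from anterior to posterior, and it interprets "spanned" as the extent from the first to the last labeled slice.

```python
def ap_span(mask):
    """Number of coronal slices between the first and last labeled slice.

    mask[s][r][c] is 1 for a labeled voxel and 0 otherwise; the first
    axis is assumed to run in the anterior-posterior direction.
    """
    labeled = [i for i, sl in enumerate(mask) if any(any(row) for row in sl)]
    if not labeled:
        return 0  # complete absence of labeled voxels
    return labeled[-1] - labeled[0] + 1

def flag_segmentation(mask, min_slices=10):
    """Flag a mask that spans fewer than min_slices slices."""
    return ap_span(mask) < min_slices

def make_mask(n_slices, first, last, size=4):
    """Build a toy mask with one labeled voxel per slice in [first, last]."""
    mask = [[[0] * size for _ in range(size)] for _ in range(n_slices)]
    for s in range(first, last + 1):
        mask[s][1][1] = 1
    return mask

good = make_mask(30, 5, 16)   # spans 12 slices
short = make_mask(30, 5, 9)   # spans 5 slices
empty = make_mask(30, 1, 0)   # no labeled voxel at all
```

A check of this kind only covers the coverage criterion and the empty-mask case; misplacement and overextension still require anatomical judgment.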
An example showing the progression of segmentation masks across consecutive coronal slices from the anterior to the posterior boundary, alongside the corresponding unsegmented T1-weighted images, is provided in Supplementary Fig. 2. In addition, all quality control criteria used for evaluating the segmentation masks are summarized in Supplementary Table 1. Based on the regions that were analyzed in the study by Krumm et al.14, we additionally trained a separate network for the parahippocampal cortex. However, since this region is not the focus of this work, it is not further discussed in this manuscript.
Statistical analyses
For each ROI, an aggregated bilateral cortical thickness value was used. Cortical thickness measurements were normalized for head size (as total intracranial volume [TIV]) as reported by Krumm et al.14 using the formula [(cortical thickness)/(TIV) × 100]. For reporting in Table 3, normalized values were retransformed to mm using the mean TIV of the two groups being compared (i.e., dAD versus NC mean TIV = 1453 cm³; aMCI versus NC mean TIV = 1480 cm³). Group differences were examined using univariate analysis of covariance (ANCOVA), incorporating age, sex, and education level as covariates. To address multiple comparisons, significance thresholds were adapted using the Bonferroni correction (i.e., p = 0.05/8 = 0.00625). In addition, to evaluate the agreement between the two methods (manual and automated segmentation), TIV-corrected bilateral cortical thickness values of all participants of the test data set were compared using intraclass correlation coefficient (ICC) estimates and their 95% confidence intervals based on a single-rating, consistency, 2-way mixed-effects model according to the guidelines of Koo and Li39. All analyses were executed in SPSS software; while Krumm et al.14 used SPSS 21.0, our replication used the subsequent version, SPSS 22.0 (IBM Corp. Released 2013. IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY, USA).
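The normalization, the retransformation, and the corrected significance threshold amount to simple arithmetic, sketched below in Python. The function names and the thickness and TIV values of the example participant are hypothetical; the mean TIV of 1453 cm³, the threshold 0.05/8, and the two uncorrected p-values are taken from the text.

```python
def normalize_thickness(thickness_mm, tiv_cm3):
    """TIV normalization: (cortical thickness) / (TIV) x 100."""
    return thickness_mm / tiv_cm3 * 100

def retransform_to_mm(normalized, mean_tiv_cm3):
    """Invert the normalization using the mean TIV of the compared groups."""
    return normalized * mean_tiv_cm3 / 100

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

# Hypothetical participant: 2.9 mm thickness, TIV of 1500 cm^3.
normalized = normalize_thickness(2.9, 1500.0)
# Expressed in mm at the dAD versus NC mean TIV reported in the text.
reported_mm = retransform_to_mm(normalized, 1453.0)

threshold = bonferroni_threshold(0.05, 8)
# p-values reported for the aMCI versus NC comparison.
survives = {"mPRC": 0.031 < threshold, "lPRC": 0.014 < threshold}
```

Because the retransformation uses a group mean TIV rather than the individual TIV, the reported value in mm differs from the raw measurement for any participant whose TIV deviates from that mean.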
Results
Two-tailed, univariate ANCOVAs with sex, age, and education as covariates were performed to determine whether each ROI (mPRC, lPRC, and ERC) was atrophied in the aMCI and dAD groups relative to their corresponding NC sample. Significance was tested with Bonferroni corrected p-values (i.e., p = 0.05/8 = 0.00625 according to Krumm et al., 201614).
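The F-test for a group effect in such an ANCOVA can be understood as a comparison of two linear models, one with and one without the group term. The following pure-Python sketch illustrates that comparison on invented toy data with a single covariate; it is not the SPSS procedure used in this study, and the function names `residual_ss` and `ancova_group_f` are hypothetical.

```python
def residual_ss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def ancova_group_f(y, group, covariates):
    """F statistic and error degrees of freedom for a two-level group factor."""
    n = len(y)
    full = [[1.0, g] + list(c) for g, c in zip(group, covariates)]
    reduced = [[1.0] + list(c) for c in covariates]
    rss_full = residual_ss(full, y)
    rss_reduced = residual_ss(reduced, y)
    df_error = n - len(full[0])
    f_value = (rss_reduced - rss_full) / (rss_full / df_error)
    return f_value, df_error

# Invented toy data: two groups of four, one covariate, small noise.
group = [0.0] * 4 + [1.0] * 4
covariates = [[0.0], [1.0], [2.0], [3.0]] * 2
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
y = [1.0 + 2.0 * g + c[0] + e for g, c, e in zip(group, covariates, noise)]

f_value, df_error = ancova_group_f(y, group, covariates)
```

The error degrees of freedom equal the sample size minus the number of estimated parameters (intercept, group, and each covariate), which is why the second value in F(1, df) depends on both the number of participants and the number of covariates.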
In comparison to the NC group, the dAD group showed significantly lower average cortical thickness of the ERC, mPRC, and lPRC [ERC: F(1,60) = 39.820, p < 0.00625; mPRC: F(1,60) = 32.270, p < 0.00625; lPRC: F(1,60) = 10.907, p < 0.00625]. In comparison to the NC group, only the ERC was significantly atrophied in the aMCI group [ERC: F(1,64) = 13.249, p < 0.00625], while the differences in the mPRC and lPRC did not survive the Bonferroni-corrected threshold of 0.00625 [mPRC: F(1,64) = 4.884, p = 0.031; lPRC: F(1,64) = 6.408, p = 0.014]. The ICC analyses, based on a single-rating, consistency, 2-way mixed-effects model, between manual and automated segmentation for TIV-corrected cortical thickness estimates are summarized in Table 4.
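The ICC form named here (single rating, consistency, two-way mixed effects, often written ICC(3,1)) can be computed from the mean squares of a two-way layout with participants as rows and methods as columns. The Python sketch below is an illustration on invented values, not the SPSS computation used in this study, and it omits the confidence interval; the function name `icc_consistency_single` is hypothetical.

```python
def icc_consistency_single(ratings):
    """ICC(3,1): (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error).

    ratings[i][j] is the measurement of participant i by method j.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Invented thickness values (manual, automated) for three participants.
offset_only = [[2.0, 2.5], [2.4, 2.9], [3.0, 3.5]]  # constant shift between methods
with_noise = [[2.0, 2.1], [2.4, 2.3], [3.0, 3.1]]   # small disagreements

icc_offset = icc_consistency_single(offset_only)
icc_noise = icc_consistency_single(with_noise)
```

Because the consistency definition removes the column (method) effect, a constant offset between manual and automated values does not lower the coefficient; only participant-specific disagreement does. An absolute-agreement ICC would penalize such an offset.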
Discussion
Our goal was to replicate the study of Krumm et al.14, who used a manual segmentation protocol to extract cortical thickness values of key regions within the parahippocampal gyrus (e.g., mPRC, lPRC, and ERC). Recognizing the labor-intensive nature of manual MRI segmentation, we trained a deep convolutional network based on the U-Net architecture for the automated segmentation of the same regions and subsequently applied it to the identical sample as used in the group analysis in Krumm et al.14. In line with the findings of Krumm et al.14, our study found significant atrophy in the ERC, mPRC, and lPRC within the early dAD group when compared to the NC group. However, when comparing the aMCI group to the NC group, the null hypothesis of no difference could only be rejected in the ERC, while the null hypothesis of no difference in cortical thickness in the mPRC could not be rejected after applying strict correction for multiple comparisons, unlike the findings of Krumm et al.14. Nonetheless, the results are highly promising for future research in the early detection of AD, which will be discussed below. In addition, we compared the manual with the automated segmentation. The ICC analyses showed high ICC values between the manually and automatically generated cortical thickness values for all ROIs (mPRC, lPRC, and ERC), suggesting that manual and automated segmentation generate comparable outcomes.
As highlighted, the early involvement of the mPRC in NFT pathology in typical AD positions a specific mPRC-integrity score (e.g., cortical thickness) as a promising early and sensitive imaging biomarker. Building upon our initial aim to replicate the critical findings of Krumm et al.14 regarding atrophy in the ERC and mPRC as early markers of Alzheimer’s disease, we have now bridged a significant gap in the field. The previously outlined very high reliability of a manual segmentation protocol, as detailed in the introduction, laid a solid foundation for accurate and detailed analysis of brain regions crucial for early detection of AD18. By integrating machine learning techniques, particularly through the training of a U-Net based deep convolutional network38, we have developed an automated segmentation tool that parallels the very high interrater reliability seen using the manual segmentation protocol. These findings establish a foundation for more efficient application in clinical and research settings, potentially improving the early diagnosis of AD.
A key strength of our approach lies in its ability to capture individual anatomical variability more accurately and with less bias than traditional atlas-based methods. On the other hand, the segmentation may be less robust20. Deep learning methods benefit strongly from larger data sets. Thus, combining both data sets of this study and adding further training data has the potential to improve future models. Earlier work (e.g.,42,43) used atlas-based tools that create standardized templates by averaging anatomical structures from multiple individuals. While these methods are robust, they tend to smooth out inter- and intra-individual differences due to the averaging process. In contrast, our deep-learning-based method predicts class labels directly from the input data, allowing the algorithm to learn and account for the anatomical variability of the collateral sulcus from the training data. This approach has a smaller tendency to average out these differences, making it better suited to capturing the unique anatomical features of each individual, which is crucial for personalized and precise measurements in clinical settings.
In evaluating the early dAD group against the NC group using the identical sample, we replicated the significant atrophy findings in the ERC, mPRC, and lPRC reported by Krumm et al.14. Yet, in the aMCI vs. NC comparison, a significant difference in cortical thickness was confined to the ERC. The results diverge from the findings of Krumm et al.14 regarding the mPRC, despite the same underlying sample. This discrepancy may be attributed to the inherent complexity of the sulcal pattern in this region. The collateral sulcus, where the mPRC is located, is known for its anatomical variability across individuals, which poses a challenge for automated segmentation algorithms8. While disease condition (e.g., AD) influences cortical thickness, it is not known to alter the anatomical borders of the mPRC (e.g., by changing the length of the collateral sulcus and thereby the anatomical borders)8. Hence, the variability observed in the automated compared to the manual segmentation is unlikely to be due to disease or disease progression but rather reflects the difficulty in consistently identifying the precise boundaries of the mPRC within a highly variable collateral sulcus.
Another factor to consider is the potential inclusion of pixels overlapping with the dura (see, for example, images in Table 3), which could impact atrophy measurements such as cortical thickness or volumes. However, it is crucial to note that cortical thickness was computed for both manual and automated segmentation methods using FreeSurfer. The computation of cortical thickness is performed after reconverting the voxel-based data back to the surface-based representation. While we believe the impact on cortical thickness, our key clinical measurement, is minimal, a potential influence cannot be entirely ruled out, particularly in volumetric analyses where surface area is also considered. Moreover, since the automated segmentation strives to replicate manual segmentation outcomes, its efficacy is inherently bounded by the precision of the manual segmentation technique. Although the manual segmentation served as the reference standard in our study, it still represents an estimation of the true cortical thickness, as all methodologies inherently possess limitations and potential biases. The results based on our sample suggest that the manual segmentation might be slightly more sensitive, but it lacks practicability in clinical or research settings. For example, manual segmentation of the mPRC requires about 20 min of labor per person, which is typically regarded as unfeasible in a clinical setting. The automated segmentation, on the other hand, runs unattended in the background in less than 5 min.
While the automated approach was not able to fully replicate the manual segmentation results for the mPRC in the aMCI vs. NC comparison, a clear trend was observed, even though it did not reach statistical significance as in14. Together with the high consistency between the automated and manual segmentation, the automated method emerges as a promising alternative, especially for larger datasets where manual segmentation would not be feasible. Nonetheless, the future application and validation of our new automated tool remains of utmost importance. Conducting a longitudinal study with initially healthy individuals would be particularly beneficial. This approach not only aims to collect essential normative data for the wider application of our tool in clinical settings but also enables the retrospective analysis of cortical changes in participants who later develop symptomatic AD. One important consideration for future studies is the need for larger sample sizes in group comparisons. Given the small size and high anatomical variability of the mPRC, larger sample sizes are crucial to reliably detect subtle differences in cortical thickness between groups. This is particularly relevant in early-stage conditions like aMCI, where atrophy may be less pronounced and more difficult to detect. Additionally, in the context of future imaging studies, it would be worth considering the use of less stringent statistical corrections for group comparisons, as they can lead to underestimation of meaningful effects, particularly in smaller regions where variability is high (e.g., the mPRC). By expanding the dataset and adjusting statistical thresholds, it may be possible to capture more subtle cortical changes in the mPRC in early stages of AD.
Furthermore, the mPRC should be evaluated alongside other well-established markers in AD research and clinical practice, such as the ERC11,44,45. Combining this established marker with a new mPRC atrophy score can potentially enhance the early detection and monitoring of AD. This dual approach could provide a more comprehensive understanding of cortical degeneration patterns and improve diagnostic accuracy in the preclinical stages of AD. In addition, incorporating longitudinal studies that not only employ conventional neuropsychological assessments but also integrate newer, more specific neuropsychological tests for assessing specific perirhinal cortex function, such as the novel object recognition task developed by Frei et al.40, will be instrumental. Our automated segmentation method, initially designed for assessing cortical thickness, also shows potential for functional imaging studies. This capability offers a valuable tool for exploring the functional dynamics of medial temporal lobe subregions in early AD progression, while mitigating the need for labor-intensive manual segmentation.
Finally, we did not investigate cortical thickness values of the left and right hemispheres separately. Brain atrophy in typical AD often presents asymmetrically, with the left hemisphere being particularly vulnerable41. This emphasizes the importance of separate evaluation of the hemispheres (e.g., cortical thickness of the left and right mPRC) in future research and clinical assessments. Although the mPRC emerges as a promising structural biomarker in the early phases of AD, adopting a multi-domain approach that potentially includes a range of biomarkers becomes increasingly important as we move through the mild cognitive impairment spectrum towards asymptomatic stages. Such investigations require large data sets, for which manual segmentation is not feasible, a shortfall we believe to have successfully addressed with our presented automated segmentation method. This strategy not only aims to improve individualized patient evaluations but also promises to refine diagnostic precision and foster early, customized interventions in the incipient stages of the disease.
To conclude, our study extends beyond state-of-the-art validation methods by connecting segmentation outputs to clinical data, ensuring both anatomical precision and clinical relevance. Currently, there is no alternative that provides automated segmentation adhering to the same rigorous protocol. This integration marks a pivotal advancement, addressing the need for automated segmentation methods that are anatomically convincing while directly supporting clinical applications. By demonstrating strong agreement with manual segmentation and validating our findings against established clinical patterns, our approach represents a meaningful step forward in the development of automated tools. This dual-validation strategy enhances reliability and establishes a robust reference point for future studies aiming to integrate automated segmentation with clinical practice.
Conclusion
We aimed to replicate a prior study by Krumm et al.14, confirming early AD-associated cortical thinning within key parahippocampal gyrus regions via automated MRI-segmentation. The results showed high consistency for cortical thickness between the manual and automated segmentation. Therefore, despite the potentially slightly higher sensitivity of manual segmentation in our sample, the automated method still emerges as a promising tool, especially due to its applicability to larger datasets. We underscore the importance of future longitudinal studies, which should not only include initially healthy individuals but also focus on measuring unilateral cortical thickness values and incorporate neuropsychological testing, particularly tasks specifically assessing the function of the perirhinal cortex. This strategy not only aims to enhance diagnostic precision but also to pave the way for early, targeted intervention strategies, ultimately contributing to the development of personalized treatment plans and advancing our collective understanding of AD pathology.
Data availability
Analysis code and research materials, as well as metadata and/or full data, are available upon request. Correspondence: Nicolas A. Henzen, University Department of Geriatric Medicine FELIX PLATTER, Burgfelderstrasse 101, Basel CH-4055, Switzerland. Email: Nicolas.Henzen@felixplatter.ch.
References
Morris, J. C. et al. Role of biomarkers in studies of presymptomatic Alzheimer’s disease. Alzheimer’s Dement. 1 (2), 145–151. https://doi.org/10.1016/j.jalz.2005.09.013 (2005).
Sperling, R. A. et al. Toward defining the preclinical stages of Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s Dement. 7 (3), 280–292. https://doi.org/10.1016/j.jalz.2011.03.003 (2011).
Giannakopoulos, P. et al. Tangle and neuron numbers, but not amyloid load, predict cognitive status in Alzheimer’s disease. Neurology 60 (9), 1495–1500. https://doi.org/10.1212/01.wnl.0000063311.58879.01 (2003).
Braak, H. & Braak, E. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathol. 82 (4), 239–259. https://doi.org/10.1007/BF00308809 (1991).
Bobinski, M. et al. Relationships between regional neuronal loss and neurofibrillary changes in the hippocampal formation and duration and severity of Alzheimer disease. J. Neuropathol. Exp. Neurol. 56 (4), 414–420. https://doi.org/10.1097/00005072-199704000-00010 (1997).
Gómez-Isla, T. et al. Neuronal loss correlates with but exceeds neurofibrillary tangles in Alzheimer’s disease. Ann. Neurol. 41 (1), 17–24. https://doi.org/10.1002/ana.410410106 (1997).
Brodmann, K. & Garey, L. Brodmann’s Localisation in the Cerebral Cortex: the Principles of Comparative Localisation in the Cerebral Cortex Based on the Cytoarchitectonics (Springer, 2006). https://doi.org/10.1007/b138298
Taylor, K. I. & Probst, A. Anatomic localization of the transentorhinal region of the perirhinal cortex. Neurobiol. Aging 29 (10), 1591–1596. https://doi.org/10.1016/j.neurobiolaging.2007.03.024 (2008).
Rullmann, M. et al. Multicenter 18F-PI-2620 PET for in vivo Braak staging of Tau pathology in Alzheimer’s disease. Biomolecules https://doi.org/10.3390/biom12030458 (2022).
Schwarz, A. et al. Regional profiles of the candidate Tau PET ligand 18F-AV-1451 recapitulate key features of Braak histopathological stages. Brain 139 Pt 5, 1539–1550. https://doi.org/10.1093/brain/aww023 (2016).
Enkirch, S. J. et al. The ERICA score: An MR imaging-based visual scoring system for the assessment of entorhinal cortex atrophy in Alzheimer disease. Radiology 288 (1), 226–333. https://doi.org/10.1148/radiol.2018171888 (2018).
Scheltens, P. & van de Pol, L. Atrophy of medial temporal lobes on MRI in probable Alzheimer’s disease and normal ageing: Diagnostic value and neuropsychological correlates. J. Neurol. Neurosurg. Psychiatry 83 (11), 1038–1040. https://doi.org/10.1136/jnnp-2012-302562 (2012).
Sone, D. et al. Regional Tau deposition and subregion atrophy of medial temporal structures in early Alzheimer’s disease: A combined positron emission tomography/magnetic resonance imaging study. Alzheimer’s Dement. 9, 35–40. https://doi.org/10.1016/j.dadm.2017.07.001 (2017).
Krumm, S. et al. Cortical thinning of parahippocampal subregions in very early Alzheimer’s disease. Neurobiol. Aging 38, 188–196. https://doi.org/10.1016/j.neurobiolaging.2015.11.001 (2016).
Kulason, S. et al. Cortical thickness atrophy in the transentorhinal cortex in mild cognitive impairment. NeuroImage Clin. 21, 101617. https://doi.org/10.1016/j.nicl.2018.101617 (2019).
Kulason, S. et al. Entorhinal and transentorhinal atrophy in preclinical Alzheimer’s disease. Front. Neurosci. https://doi.org/10.3389/fnins.2020.00804 (2020).
Braak, H. & Del Tredici, K. Staging of cortical neurofibrillary inclusions of the Alzheimer’s type. In Alzheimer: 100 Years and Beyond (eds Jucker, M. et al.) 97–106 (Springer, 2006). https://doi.org/10.1007/978-3-540-37652-1_8
Henzen, N. A., Reinhardt, J., Blatow, M., Kressig, R. W. & Krumm, S. Excellent interrater reliability for manual segmentation of the medial perirhinal cortex. Brain Sci. 13 (6). https://doi.org/10.3390/brainsci13060850 (2023).
Morey, R. A. et al. A comparison of automated segmentation and manual tracing for quantifying hippocampal and amygdala volumes. NeuroImage 45 (3), 855–866. https://doi.org/10.1016/j.neuroimage.2008.12.033 (2009).
Alzubaidi, L. et al. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data. 8 (1), 53. https://doi.org/10.1186/s40537-021-00444-8 (2021).
Reuter, M., Schmansky, N. J., Rosas, H. D. & Fischl, B. Within-subject template estimation for unbiased longitudinal image analysis. Neuroimage 61 (4), 1402–1418. https://doi.org/10.1016/j.neuroimage.2012.02.084 (2012).
Ashburner, J. & Friston, K. J. Unified segmentation. NeuroImage 26 (3), 839–851. https://doi.org/10.1016/j.neuroimage.2005.02.018 (2005).
Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T. & Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2016 (eds Ourselin, S. et al.) 424–432 (Springer International Publishing, 2016). https://doi.org/10.1007/978-3-319-46723-8_49
Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. arXiv:1505.04597 [Cs] http://arxiv.org/abs/1505.04597 (2015).
Folstein, M. F., Folstein, S. E. & McHugh, P. R. Mini-mental state: A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12 (3), 189–198. https://doi.org/10.1016/0022-3956(75)90026-6 (1975).
Delis, D. C., Freeland, J., Kramer, J. H. & Kaplan, E. Integrating clinical assessment with cognitive neuroscience: Construct validation of the California verbal learning test. J. Consult Clin. Psychol. 56 (1), 123–130. https://doi.org/10.1037/0022-006X.56.1.123 (1988).
Kaplan, E., Goodglass, H. & Weintraub, S. Boston Naming Test (Lea & Febiger, 1983).
Diagnostic and Statistical Manual of Mental Disorders. 4th edn (American Psychiatric Association, 1994).
Beck, A. T. An inventory for measuring depression. Arch. Gen. Psychiatry. 4 (6), 561. https://doi.org/10.1001/archpsyc.1961.01710120031004 (1961).
Beck, A. T., Steer, R. A. & Brown, G. Beck Depression Inventory-II [dataset]. https://doi.org/10.1037/t00742-000 (2011).
Molton, I. Geriatric Depression Scale. In Encyclopedia of Behavioral Medicine (eds Gellman, M. D. & Turner, J. R.) 857–858 (Springer, 2013). https://doi.org/10.1007/978-1-4419-1005-9_194
Albert, M. S. et al. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s Dement. 7 (3), 270–279. https://doi.org/10.1016/j.jalz.2011.03.008 (2011).
McKhann, G. M. et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s Dement. 7 (3), 263–269. https://doi.org/10.1016/j.jalz.2011.03.005 (2011).
Brooks, L. G. & Loewenstein, D. A. Assessing the progression of mild cognitive impairment to Alzheimer’s disease: Current trends and future directions. Alzheimer’s Res. Ther. 2 (5), 28. https://doi.org/10.1186/alzrt52 (2010).
Winblad, B. et al. Mild cognitive impairment–beyond controversies, towards a consensus: Report of the international working group on mild cognitive impairment. J. Intern. Med. 256 (3), 240–246. https://doi.org/10.1111/j.1365-2796.2004.01380.x (2004).
Dale, A. M., Fischl, B. & Sereno, M. I. Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage 9 (2), 179–194. https://doi.org/10.1006/nimg.1998.0395 (1999).
Fischl, B., Sereno, M. I. & Dale, A. M. Cortical surface-based analysis: II: Inflation, flattening, and a surface-based coordinate system. NeuroImage 9 (2), 195–207. https://doi.org/10.1006/nimg.1998.0396 (1999).
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods. 18 (2), 203–211. https://doi.org/10.1038/s41592-020-01008-z (2021).
Koo, T. K. & Li, M. Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropract. Med. 15 (2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012 (2016).
Frei, M. et al. Can you find it? Novel oddity detection task for the early detection of Alzheimer’s disease. Neuropsychology https://doi.org/10.1037/neu0000859 (2022).
Lubben, N., Ensink, E., Coetzee, G. A. & Labrie, V. The enigma and implications of brain hemispheric asymmetry in neurodegenerative diseases. Brain Commun. 3 (3), fcab211. https://doi.org/10.1093/braincomms/fcab211 (2021).
Augustinack, J. C. et al. Predicting the location of human perirhinal cortex, Brodmann’s area 35, from MRI. NeuroImage 64 (C), 32–42. https://doi.org/10.1016/j.neuroimage.2012.08.071 (2013).
Xie, L. et al. Automated segmentation of medial temporal lobe subregions on in vivo T1-weighted MRI in early stages of Alzheimer’s disease. Hum. Brain Mapp. 40 (12), 3431–3451. https://doi.org/10.1002/hbm.24607 (2019).
Devanand, D. P. et al. MRI hippocampal and entorhinal cortex mapping in predicting conversion to Alzheimer’s disease. NeuroImage 60 (3), 1622–1629. https://doi.org/10.1016/j.neuroimage.2012.01.075 (2012).
Dickerson, B. C. et al. The cortical signature of Alzheimer’s disease: Regionally specific cortical thinning relates to symptom severity in very mild to mild AD dementia and is detectable in asymptomatic amyloid-positive individuals. Cereb. Cortex 19 (3), 497–510. https://doi.org/10.1093/cercor/bhn113 (2009).
Suzuki, W. A. & Amaral, D. G. Perirhinal and parahippocampal cortices of the macaque monkey: Cytoarchitectonic and chemoarchitectonic organization. J. Comp. Neurol. 463 (1), 67–91. https://doi.org/10.1002/cne.10744 (2003).
Suzuki, W. A. & Amaral, D. G. Perirhinal and parahippocampal cortices of the macaque monkey: Cortical afferents. J. Comp. Neurol. 350, 497–533. https://doi.org/10.1002/cne.903500402 (1994a).
Suzuki, W. A. & Amaral, D. G. Topographic organization of the reciprocal connections between the monkey entorhinal cortex. J. Neurosci. 14, 1856–1877. https://doi.org/10.1523/JNEUROSCI.14-03-01856.1994 (1994b).
Insausti, R. et al. MR volumetric analysis of the human entorhinal, perirhinal, and temporopolar cortices. Am. J. Neuroradiol. 19 (4), 659–671 (1998).
Kivisaari, S. L., Probst, A. & Taylor, K. I. The perirhinal, entorhinal, and parahippocampal cortices and hippocampus: an overview of functional anatomy and protocol for their segmentation in MR images. In fMRI: Basics and Clinical Applications (eds Ulmer, S. & Jansen, O.) 239–267 (Springer, 2013). https://doi.org/10.1007/978-3-642-34342-1_19
Acknowledgements
The authors thank Dr. phil. Kirsten I. Taylor, principal investigator of the study whose data were used for the test data set, for allowing us to use the data to replicate the study of Krumm et al.14.
Author information
Contributions
Conceptualization, N.A.H., A.A., and S.K.; Data curation, J.R., M.B., and S.K.; Formal analysis, N.A.H.; Funding acquisition, S.K.; Investigation, S.K.; Methodology, N.A.H., A.A., and S.K.; Project administration, S.K.; Resources, A.A., J.R., M.B., R.W.K., and S.K.; Software, N.A.H., A.A., J.R., M.B., and S.K.; Supervision, A.A. and S.K.; Validation, A.A. and S.K.; Visualization, N.A.H.; Writing – original draft, N.A.H., A.A., and S.K.; Writing – review & editing, N.A.H., A.A., J.R., M.B., R.W.K., and S.K. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Henzen, N.A., Abdulkadir, A., Reinhardt, J. et al. Automated segmentation for cortical thickness of the medial perirhinal cortex. Sci Rep 15, 14903 (2025). https://doi.org/10.1038/s41598-025-98399-w
DOI: https://doi.org/10.1038/s41598-025-98399-w


