Characterizing replicability in the clustering structure of brain morphology in autism, attention-deficit/hyperactivity disorder, and obsessive compulsive disorder

Sadat-Nejad, Younes; Vandewouw, Marlee M.; Brian, Jessica; Crosbie, Jennifer; Schachar, Russell J.; Iaboni, Alana; Kelley, Elizabeth; Jones, Jessica; Taylor, Margot J.; Ayub, Muhammad; Nicolson, Robert; Syed, Bilal; Hammill, Christopher; Georgiades, Stelios; Arnold, Paul D.; Lerch, Jason P.; Anagnostou, Evdokia; Kushki, Azadeh

doi:10.1038/s41398-025-03540-y

Download PDF

Article
Open access
Published: 30 August 2025

Characterizing replicability in the clustering structure of brain morphology in autism, attention-deficit/hyperactivity disorder, and obsessive compulsive disorder

Translational Psychiatry volume 15, Article number: 333 (2025) Cite this article

2102 Accesses
1 Citations
11 Altmetric
Metrics details

Subjects

Abstract

In neurodevelopmental research, within-diagnosis heterogeneity and across-diagnosis overlap necessitate a shift from case-control designs to data-driven clustering approaches. However, our understanding of the replicability of these clustering structures across independent datasets remains limited. Our objective was to examine the replicability of clustering structure in measures of brain morphology in neurodiverse children across two independent datasets, namely the Province of Ontario Neurodevelopmental Disorder (POND) Network and the Healthy Brain Network (HBN). POND and HBN data were collected across various institutions in Ontario, Canada, and New York, United States, respectively. Participants were 5-19 years old and had diagnoses of autism, attention deficit/hyperactivity disorder (ADHD), obsessive compulsive disorder (OCD), or were neurotypical. We used measures of cortical volume, surface area, cortical thickness, and subgroup volume from structural MRI data. Principal component analysis (PCA) and clustering were used to examine the replicability of clustering structures across the datasets. Correlations among principle components, measures of clusterability, and alignment between the four brain measures as well as male/female subsets were examined. Brain-behaviour associations were examined using univariate and multivariate approaches. The POND dataset included 747 participants with (autism n = 312, ADHD n = 220, OCD n = 70, neurotypical n = 145). The HBN dataset included 582 participants (autism n = 60, ADHD n = 445, OCD n = 19, neurotypical n = 58). Our results showed significant between-dataset correlations in 82.1% of the principal components derived from brain measures. A two-cluster structure was replicated across datasets, brain measures, and the female/male subsets, however the participant composition of clusters were only aligned between cortical volume and surface area, and cortical thickness and subcortical volume. Regional effect sizes for between-cluster differences were highly correlated across datasets (beta = 0.92+/−0.01, p < 0.0001; adjusted R-squared=0.93). Data-driven clusters did not align with diagnostic labels across datasets. Brain-behaviour associations were only replicated for male subsets and subcortical volume using multivariate analysis. We found evidence of replicability of the clustering structure across two independent datasets; however, caution must be exercised in integrating multiple measures in clustering and interpretation of brain-behaviour associations.

Integration of brain and behavior measures for identification of data-driven groups cutting across children with ASD, ADHD, or OCD

Article 09 November 2020

Cortical and subcortical structural differences in psychostimulant-free ADHD youth with and without a family history of bipolar I disorder: a cross-sectional morphometric comparison

Article Open access 30 November 2023

Generalizability of 3D CNN models for age estimation in diverse youth populations using structural MRI

Article Open access 27 April 2023

Introduction

Autism spectrum disorder (ASD; autism), attention-deficit/hyperactivity disorder (ADHD), and obsessive-compulsive disorder (OCD) are behaviourally-defined neurodevelopmental conditions [1,2,3] with significant variability and overlap in their neurobiology and phenotypic presentation [4, 5]. To characterize the variability within and across these conditions, a growing body of research has focused on data-driven approaches, including clustering [6,7,8,9], to discover transdiagnostic groups of individuals who share similar neurobiological [7, 9,10,11] or phenotypic features [12]. These studies have consistently found a misalignment between data-driven subgroups and existing diagnostic labels [6, 12]; however, significant variability exists across these studies in the neurobiological features and analytical approaches used in clustering. For example, different measures of brain morphology [12, 13] and function [10] have been used along with a range of clustering approaches including hierarchical, spectral, multi-view, or regression clustering [12, 13].

In addition to differences in data modalities and analytical approaches, significant variability is also found in data acquisition methods (e.g., scanners, scanning parameters, motion), imaging pipelines [14,15,16] (quality control method, denoising/correction algorithms), and sample characteristics [17], including diagnoses and sociodemographic composition. Given this, it is not surprising that there is also significant variability in study findings. This includes differences in the suggested number of clusters (e.g., 2-8 cluster solutions [10, 12, 18]), neurobiological characteristics defining the subgroups (e.g. differences in cortical volume or subcortical [18], cortical thickness [12]), and the phenotypic presentation of the clusters (for example, cluster differences in social communication abilities [10, 18], language and attention [18], cognitive ability and hyperactivity [9]). The heterogeneity of findings in the existing literature has raised questions about the replicability of clustering results. In this context, replicability is defined as obtaining consistent findings across studies with the same research question [19]. Among the existing literature, only one study has investigated the replicability of data-driven subgroups across two independent datasets including children with diagnoses of neurodevelopmental conditions [9]. Using resting-state functional connectivity datasets from the Province of Ontario Neurodevelopmental Disorder (POND) Network and the Healthy Brain Network (HBN), this study found two clusters that differed in IQ, hyperactivity, and impulsivity, as well as patterns of segregation and integration within the brain’s networks. These results provide encouraging preliminary evidence that the results of clustering based on these measures of brain function may be replicable in spite of differences in datasets. A critical gap still remains, however, in understanding replicability in clustering results based on measures of brain morphology. The present study addresses the gap by examining the issue of replicability in measures of brain morphology namely cortical thickness, surface area, and cortical and subcortical volume. To this end, we will examine 1) replicability of clustering across these measures, and 2) replicability across two independent datasets. Given the known sex-differences in neurodevelopmental conditions, our analysis was disaggregated by sex, allowing us to also examine replicability across the male and female subsets of each dataset.

Methods

Participants

For this study, we used data from two independent datasets, namely, POND (export date May 22, 2023), and HBN (Release 10). Data from participants who were between 5–19 years of age, had a diagnosis of autism, ADHD, OCD, or who were neurotypical, and whose neuroimaging data passed quality control were selected for the current study. This resulted in data from 747 participants from POND (autism: n = 312, female=22.4%, median age=12.4 (5.53); ADHD: n = 220, 25.5% female, median age=11.3 (4.08); OCD: n = 70, 40.0% female, median age=11.8 (5.57); neurotypical: n = 145, 41.4% female, median age=11.9 (5.16)), and 582 participants from HBN (autism: n = 60, 8.33% female, age=12.31(5.86); ADHD: n = 445, 31.2% female, 10.02 (4.83); OCD: n = 19, 52.6% female, age=8.76 (5.09); neurotypical: n = 58, 41.4% female, age=10.1(5.24)). For POND, clinical diagnoses were supported by gold-standard assessments: the Autism Diagnostic Observation Schedule-2 (ADOS [20]) and the Autism Diagnostic Interview-Revised (ADI-R [20]) for autism, the Parent Interview for Children Symptoms for ADHD (PICS) for ADHD, and the Children’s Yale-Brown Obsessive Compulsive Scale for OCD [21] (CY-BOCS). Children in the neurotypical group did not have a history of neurodevelopmental, psychiatric, or neurological diagnoses, were born after 35 weeks gestation, and had no first-degree relative with a neurodevelopmental condition. For HBN, a computerized web-based version of the Schedule for Affective Disorders and Schizophrenia—Children’s version (KSADS [22]) was administered, which was reviewed alongside all study material by a clinical team to synthesize a consensus clinical diagnosis aligning with the DSM-5 [23]. Individuals with no diagnosis given were considered as neurotypical.

Both POND and HBN studies were approved by the respective institutions’ research ethics board. Written informed consent and/or verbal assent (if written is not available) were obtained from the primary caregiver and/or participants as appropriate. The present study on secondary analysis of POND and HBN data was approved by the Holland Bloorview Research Ethics Board.

Behavioural measures

Phenotypic measures available for both datasets included the Social Communication Questionnaire (SCQ) to quantify autism-like features [24], the Strength and Weakness for ADHD symptoms and Normal Behaviour Rating Scale (SWAN [25]) to measure inattention and hyperactivity symptoms,Toronto Obsessive Compulsive Scale (TOCS) to measure obsessive-compulsive traits, and full-scale intelligence quotient (FSIQ) measured by an age-appropriate IQ scale [26, 27]. The internalizing and externalizing measures of Child Behaviour Checklist (CBCL [28, 29]) were used to quantify internalizing and externalizing symptoms.

Sociodemographics measures

In addition to age and sex, racial and ethnic identifications in both datasets were collected through self-reported or parent-reported questionnaires. For POND, racial categories were aligned with the standards set by the Canadian Institute for Health Information. These categories encompassed Black, East Asian, Indigenous, Latino, Middle Eastern, South Asian, Southeast Asian, White, and other. Participants with mixed racial backgrounds were coded in multiple categories. For HBN, racial categories followed the US Census guidelines, including American Indian or Alaskan Native, Asian, Black, Hispanic, Native Hawaiian or other Pacific Islander, White, 2 or more races, and other. Given the sample size, we consolidated race into two categories of minoritized and white for both datasets. For both datasets, household income was categorized as low (<$74,999 CAD), medium ($75,000 CAD to $199,999 CAD), and high (≥$200,000 CAD). Education was defined based on the highest educational attainment of the primary caregiver, categorized as: Level 1 (non-completion of high school or high school diploma), Level 2 (associate degree or undergraduate degree), and Level 3 (graduate or professional degree).

Imaging data

For both datasets, measures of cortical surface area, cortical thickness, cortical volume and subcortical volume were obtained from structural MRI (sMRI). For POND, the sMRI images were collected on Siemens MAGNETOM 3 T Trio and Prisma MRI scanners across three sites, namely, the Hospital for Sick Children (Toronto, Ontario; Trio: n = 233; Prisma: n = 348), Queen’s University (Kingston, Ontario; Trio: n = 100; Prisma: n = 43), and Holland Bloorview Kids Rehabilitation Hospital (Toronto, Ontario; Prisma: n = 23). For HBN, the data were collected using Siemens 3 T Trio and Prisma scanners from three institutions in the New York City area, namely the CitiGroup Cornell Brain Imaging Center (Prisma: n = 345), Rutgers University (Trio: n = 202), the City University of New York Advanced Science Research Center (Prisma: n = 28), and a mobile site in Staten Island with a 1.5 Tesla Siemens Avanto (n = 7).

To extract surface area, cortical thickness, and cortical volume, the CIVET pipeline (version 2.1.0) [30] was used. These measures were extracted for 76 regions based on the automated anatomical labeling atlas (AAL) [17, 31]. Non-uniformity image correction and stereotaxic registration to the Montreal Neurologic Institute (MNI ICBM) [32] template (non-linear 6th generation target) was then used. Masking, extraction and classification were used to separate and obtain gray matter, white matter, and cerebrospinal fluid volume. A surface diffusion kernel was applied [33], and regions were registered to the AAL atlas [34]. Cortical thickness was calculated based on the distance between two smooth surfaces [14] and gray matter and white matter surfaces was generated by tissue classification, and then surfaces were registered to the automated anatomical labelling (AAL) atlas [35]. Lastly, segmentation by use of multiple automatically generated templates (MAGeT) [33] was used to calculate volume of 95 subcortical structures from multiple starting atlases, including 5-atlas subcortical, cerebellum, amygdala, hippocampus-subfields, and striatum and thalamus subdivisions. The CIVET and MAGeT quality control (QC) pipelines were used, and participants were only included if they passed both QC pipelines. Details of the data filtering is provided in the eTable 1 in Supplement 1. For each dataset, separately for males and females, the brain measures were corrected for scanner effects using ComBat Harmonization [14]. For age correction, the best model fit among linear, quadratic, and cubic effects was used for each brain region [9].

Analysis pipeline

Data and statistical analyses were performed using Python 3.8.0 and R 3.3.3. An overview of the analysis pipeline is depicted in eFigure 1 in Supplement 1. Given the sex-differences in neurobiology of neurodevelopmental conditions [36,37,38,39], analyses were conducted independently on male and female subsets of each dataset.

To examine between-dataset similarities in the structure of the data, we used Principal Component Analysis (PCA) [40]. PCA is a multivariate approach that transforms the set of measurement variables into a new set of uncorrelated variables (principal components; PCs) that capture the largest variation in the data. The coefficients of the original variables, referred to as loadings, represent the strength of their contribution to the PCs. For this study, PCA was applied independently on surface area, cortical thickness, and cortical, and subcortical volume data. To examine similarities in the principal components across POND and HBNs, Pearson’s correlation between POND and HBN loadings was computed. This was computed as the maximum correlation between a POND PC, and the corresponding PC on HBN, allowing for a window of 2 in cases where PC numbers were not aligned between the datasets.

To characterize the clustering structure of the datasets, we used the PCA-transformed data to compute participant similarity networks. These are matrices with entries corresponding to the similarity between pairs of participants (i.e., entry i, j, corresponds to the similarity between participants i and j). Pairwise similarities were computed using the Gaussian transform of the cosine distance between vectors encoding brain measure values across all regions of the atlas. The cosine distance was selected as it provides a robust method for capturing structural associations in high-dimensional datasets [41]. With this pipeline, we generated four distinct similarity networks (cortical area, cortical thickness, cortical, and subcortical volume) separately for male and females in each dataset. These matrices were then clustered using spectral clustering [42].

Statistical analyses

To examine the existence of clusters within each network, we employed three measures of clusterability: the gap statistic [43], silhouette coefficient [33], and Calinski-Harabasz [44]. Statistical significance of clustering patterns were determined using a permutation test comparing the three measures of clusterability between the datasets and 200 random networks. The random networks were generated using the same weight distribution as the original networks [45], preserving the degree and strength of the original networks [46]. Alignment among the constructed clusters and diagnostic labels, as well as clusters obtained from different brain measures was assessed using the adjusted rand score [47]. An adjusted rand score of one indicates full alignment between two sets, whereas a value of zero suggests no alignment.

Univariate and multivariate methods were employed to examine the associations among clusters and behavioural and brain measures. For univariate analysis, measures were compared among clusters using t-test for normally-distributed, continuous data, Mann-Whitney tests for non-normally-distributed, continuous data, and Chi-squared tests for categorical data. Family-wise correction was used for multiple comparisons and Cohen’s effect size [42] was reported for statistically significant results. For multivariate analysis, we predicted cluster labels from phenotypic measures using a random forest classifier [48]. These phenotypic predictors included scores on the SCQ, SWAN (inattention and hyperactivity), and CBCL (internalizing and externalizing), as well as full-scale IQ, age, race/ethnicity, and household education level.

Results

Participants

A total of 121 participants in POND and 923 participants in HBN failed either the CIVET and MAGet quality control (detailed in Supplementary Table 1). As the result, 747 participants from POND and 582 participants from HBN remained for the analyses. The demographic characteristics for the POND and HBN participants are shown in Table 1.

Table 1 Demographic characteristics for the POND datasets.

Full size table

PCA decomposition

The number of principal components needed to account for 75% of variance in the data across the measures and dataset ranged between 14 and 24 (eTable 2 in Supplement 1). The correlations between the loadings on the principal components of two datasets are shown in the eFigure 2 in Supplement 1. The loadings were significantly correlated between 82.1% of PCs. Of the statistically significant correlations, 40.9% exceeded a correlation coefficient of 0.3. HBN females had the lowest percentage of significant correlations (eTable 3 in Supplement 1).

Clustering composition of the data

Participant similarity matrices are visualized as network graphs in Fig. 1. As seen, two distinct groupings are evident across datasets, brain measures, and male and female subsets. To determine if clusters existed in the data, we used the gap statistic [43], comparing the within cluster dispersion of the data to that expected under the null distribution (no random permutation networks). The gap statistic was significantly larger for our data compared to the null distributions (random permutation networks) for surface area, cortical thickness, and cortical and subcortical volume for both males and females (p < 0.01, eTable 4 in Supplement 1). Silhouette and Calinksi-Harabsz scores (eFigure 3 in Supplement 1) suggest that the optimal number of clusters is two for all measures and datasets.

**Fig. 1: Replicability of clustering structure.**

Clustering results

Across brain measures and datasets, there was very low alignment between diagnostic labels and data-driven groupings (adjusted rand scores <0.02; eTable 5 in Supplemental 1). Alignment among clusters constructed using different brain measures is shown in the eFigure 4 in Supplement 1. Across all datasets, clustering solutions were highly aligned for cortical volume and surface area (adjusted rand score 0.63–0.81), and moderately aligned for cortical thickness and subcortical volume (adjusted rand score 0.22–0.44). This finding was replicated across datasets and female/male subsets.

Cluster differences in brain measures

For both datasets, we computed the effect size for the differences in brain measures across clusters using Cohen’s d (eFigure 5, eFigure 6 and eTable 6 in the Supplement 1). Figure 2 shows the association among these effect sizes between POND and HBN, as well as male and female subsets. Linear regression analysis revealed a significant association between cluster effect sizes for POND and HBN after controlling for measure and sex (intercept=0.09+/−0.02, p < 0.0001; beta=0.92+/−0.01, p < 0.0001; adjusted R-squared=0.93). Similarity, a significant association was found for cluster effect sizes between males and females (intercept = −0.04+/−0.02, p = 0.04; beta=0.97+/−0.01, p < 0.0001; adjusted R-squared=0.91). This suggests that brain signatures associated with the clusters are highly consistent between datasets and male/female subsets.

**Fig. 2: Association among between-cluster effect sizes computed.**

Cluster associations with phenotypic measures

Univariate testing did not reveal any significant between-cluster-differences in age, race/ethnicity, family income and education, ethnicity, FSIQ, SCQ, SWAN, or CBCL scores (detailed statistics in the eTable 3 in Supplemental 1) across datasets or measures (Fig. 3).

The accuracy for multivariable prediction of cluster labels is presented in the eTable 7 in Supplemental 1. One-sample t-tests revealed that cluster labels were predicted with greater than chance accuracy for subcortical volume for males in both POND (accuracy = 0.65 ± 0.09; p = 0.02) and HBN (accuracy = 0.61 ± 0.05; p = 0.01). For those prediction tasks, feature importance values (calculated based on mean decrease in impurity [48]) are reported in supplemental eFigure 7. The differentiating features were highly consistent between POND and HBN and female and male subsets, with the highest importance attributed to age and the phenotypic measures (IQ, CBCL internalizing and external, SCQ, SWAN scores). The contribution of sociodemographic factors to prediction was significantly smaller.

Discussion

Our study characterized the replicability of the participant similarity networks constructed using surface area, cortical thickness, and cortical and subcortical volume, across the POND and HBN datasets, as well as male and female subsamples.

Replicability across datasets

Despite significant differences in the POND and HBN datasets in demographic and phenotypic composition, our results revealed a high degree of consistency between the data structures for the two datasets. In particular, we found high between-datasets correlations among the principal components obtained using POND and HBN datasets, suggesting that data structures are similar in both datasets across the brain measures examined. The clustering structure was highly replicable across datasets, with our results revealing a 2-cluster composition across the four brain measures and the female/male subsets. This is at the lower end of previous literature findings where the number of reported clusters is highly variable, ranging from 2–10 [6, 9,10,11, 18, 49]. Larger number of clusters are likely to be found when multiple brain measures are combined, especially if these measures quantify potentially different biological mechanisms (for example, if two independent groups are found in each measure A and B, the combination of measures will result in four possible group combinations).

The brain signatures of the clusters were highly consistent across datasets with high correlations among regional effect sizes for between-cluster differences. Another finding that was replicated was that data-driven clusters were not aligned with diagnostic labels as indicated by the low Adjusted Rand Index scores (eTable 5 in Supplement), across datasets, brain measures, and female/male datasets. This finding is consistent with previous literature [6, 9, 10, 18, 49], further adding to the body of work highlighting the need for enhanced biologically-relevant precision in characterization of neurodevelopmental conditions, compared to our broad diagnostic categories.

In this study, we did not find statistically significant phenotypic differences between clusters through uni-variate analysis. However, multivariate analysis showed that cluster labels derived based on subcortical volume were predicted with greater than chance accuracy based on a combination of differences in age, IQ, internalizing, externalizing symptoms, autism features, and inattention and hyperactivity/impulsivity, across both POND and HBN. This finding suggests that neurobiological homogeneity may not align with single diagnostic domains of neurodevelopmental conditions, but instead, reflects differences in constellation of phenotypic features that are not specific to a single diagnosis category. The null finding of univariate phenotypic differences may also be due to statistical power as replicability in brain-behaviour associations may require very large sample sizes [50].

Replicability across brain measures

In addition to between-dataset differences, we examined the replicability of clustering structures across different brain measures within a dataset. The two-cluster solution was replicated across cortical area, cortical thickness, cortical volume, and subcortical volume; However, the participant membership to the clusters was only partially replicated between subcortical volume and cortical thickness, and surface area and cortical volume, but not at all among other pairs of measures. The misalignment between cortical thickness and surface area is not surprising given that these features are suggested to be genetically distinct determinants of cortical structure [51]. Further, the finding of replicability between cortical volume and surface area is consistent with the suggestion that interindividual variation in gray matter volume is largely driven by differences in surface area rather than the cortical thickness [52]. The dissociation between cortical thickness and surface area is particularly important to studies of subgroup structure in neurodevelopmental conditions that integrate multiple measures of cortical morphology. Given that these measures reflect different genetic mechanisms, clustering based on each individual measure may be advantageous to reveal subgroups that share differences in these mechanisms.

Replicability and sex differences

Given the known sex-differences in the neurobiology and phenotypic expression of neurodevelopmental conditions [53], we disaggregated our results by sex. There was high replicability between male and female datasets in the principal component decomposition of the brain measures, overall clustering structure, lack of alignment with the diagnostic labels, and brain signatures of clusters. In terms of brain-phenotype association, replicability was found in HBN, but not POND. It is important to note that overall, we observed higher variability in the female dataset. This may suggest larger variability in neurobiological characteristics or may be the result of our smaller sample size for the female subsets.

Strengths and limitations

This study has several strengths, including our large sample sizes across both datasets. At the same time, there was lower representation of females, matching the expected prevalence in autism and ADHD. This may have limited our ability to detect female-specific patterns. Additionally, our phenotypic measures were limited by what was available in both POND and HBN sets. It may be possible that brain-behaviour associations can be found in other measures of function or cognition (e.g., response inhibition, memory, affect recognition).

Conclusions

To our knowledge, this is the first study of clustering replicability in structural brain measures across neurodevelopmental conditions. We found evidence of replicability of the clustering structure across two independent datasets; however, when examining replicability across brain measures, only replicability across cortical thickness and subcortical volume, and surface area and cortical volume were strongly supported by our results.

Data availability

Participants were drawn from the Province of Ontario Neurodevelopmental Disorders (POND) network (exported April 2021; now available via a controlled data release through Ontario Brain Institute’s Brain-CODE: https://www.braincode.ca/) and the Healthy Brain Network (exported November 2020: http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/) datasets.

Code availability

The codes used in this study are available at https://github.com/KamranNiroomand/ClusteringAnalysisStudy.

References

Christensen DL, Bilder DA, Zahorodny W, Pettygrove S, Durkin MS, Fitzgeral RT, et al. Prevalence and characteristics of autism spectrum disorder among 4-year-old children in the autism and developmental disabilities monitoring network. J Dev Behav Pediatr. 2016;37:1–8. https://doi.org/10.1097/DBP.0000000000000235.
Article PubMed Google Scholar
Lever AG, Geurts HM. Psychiatric co-occurring symptoms and disorders in young, middle-aged, and older adults with autism spectrum disorder. J Autism Dev Disord. 2016;46:1916–30. https://doi.org/10.1007/s10803-016-2722-8.
Article PubMed PubMed Central Google Scholar
Thapar A, Cooper M, Rutter M. Neurodevelopmental disorders. Lancet Psychiatry. 2017;4:339–46. https://doi.org/10.1016/S2215-0366(16)30376-5.
Article PubMed Google Scholar
Boedhoe PSW, van Rooij D, Hoogman M, Twisk JWR, Schmaal L, Abe Y, et al. Subcortical brain volume, regional cortical thickness, and cortical surface area across disorders: findings from the ENIGMA ADHD, ASD, and OCD working groups. Am J Psychiatry. 2020;177:834–43. https://doi.org/10.1176/appi.ajp.2020.19030331.
Article PubMed PubMed Central Google Scholar
Cao M, Shu N, Cao Q, Wang Y, He Y. Imaging functional and structural brain connectomics in attention-deficit/hyperactivity disorder. Mol Neurobiol. 2014;50:1111–23. https://doi.org/10.1007/s12035-014-8685-x.
Article PubMed CAS Google Scholar
Kushki A, Anagnostou E, Hammill C, Duez P, Brian J, Iaboni A, et al. Examining overlap and homogeneity in ASD, ADHD, and OCD: a data-driven, diagnosis-agnostic approach. Transl Psychiatry. 2019;9:318. https://doi.org/10.1038/s41398-019-0631-2.
Article PubMed PubMed Central Google Scholar
Ellegood J, Anagnostou E, Babineau BA, Crawley JN, Lin L, Genestine M, et al. Clustering autism: using neuroanatomical differences in 26 mouse models to gain insight into the heterogeneity. Mol Psychiatry. 2015;20:118–25. https://doi.org/10.1038/mp.2014.98.
Article PubMed CAS Google Scholar
Zheng S, Hume KA, Able H, Bishop SL, Boyd BA. Exploring developmental and behavioral heterogeneity among preschoolers with ASD: a cluster analysis on principal components. Autism Res. 2020;13:796–809. https://doi.org/10.1002/aur.2263.
Article PubMed Google Scholar
Vandewouw MM, Brian J, Crosbie J, Schachar RJ, Iaboni A, Georgiades S. et al. Identifying replicable subgroups in neurodevelopmental conditions using resting-state functional magnetic resonance imaging data. JAMA Netw Open. 2023;6:e232066.
Article PubMed PubMed Central Google Scholar
Choi EJ, Vandewouw MM, Taylor MJ, Arnold PD, Brian J, Crosbie J, et al. Beyond diagnosis: Cross-diagnostic features in canonical resting-state networks in children with neurodevelopmental disorders. NeuroImage Clin. 2020;28:102476. https://doi.org/10.1016/j.nicl.2020.102476.
Article PubMed PubMed Central Google Scholar
Zabihi M, Floris DL, Kia SM, Wolfers T, Tillmann J, Arenas AL, et al. Fractionating autism based on neuroanatomical normative modeling. Transl Psychiatry. 2020;10:1–10. https://doi.org/10.1038/s41398-020-01057-0.
Article Google Scholar
Jacobs GR, Voineskos AN, Hawco C, Stefanik L, Forde NJ, Dickie EW, et al. Integration of brain and behavior measures for identification of data-driven groups cutting across children with ASD, ADHD, or OCD. Neuropsychopharmacology. 2021;46:643–53.
Article PubMed Google Scholar
Chien H, Gau SS, Hsu Y, Chen YJ, Lo YC, Shih YC, et al. Altered cortical thickness and tract integrity of the mirror neuron system and associated social communication in autism spectrum disorder. Autism Res. 2015;8:694–708.
Article PubMed Google Scholar
Fortin JP, Cullen N, Sheline YI, Taylor YD, Aselcioglu Y, Cook PA, et al. Harmonization of cortical thickness measurements across scanners and sites. NeuroImage. 2018;167:104–20. https://doi.org/10.1016/j.neuroimage.2017.11.024.
Article PubMed Google Scholar
Loth E, Charman T, Mason L, Tillmann J, Jones EJH, Wooldridge C, et al. The EU-AIMS Longitudinal European Autism Project (LEAP): design and methodologies to identify and validate stratification biomarkers for autism spectrum disorders. Mol Autism. 2017;8:24. https://doi.org/10.1186/s13229-017-0146-8.
Article PubMed PubMed Central Google Scholar
Poldrack RA, Baker CI, Durnez J, Gorgolewski KJ, Matthews PM, Munafò MR, et al. Scanning the horizon: towards transparent and reproducible neuroimaging research. Nat Rev Neurosci. 2017;18:115–26. https://doi.org/10.1038/nrn.2016.167.
Article PubMed PubMed Central CAS Google Scholar
Pipitone J, Park MTM, Winterburn J, Lett TA, Lurch JP, Pruessner JC, et al. Multi-atlas segmentation of the whole hippocampus and subfields using multiple automatically generated templates. NeuroImage. 2014;101:494–512. https://doi.org/10.1016/j.neuroimage.2014.04.054.
Article PubMed Google Scholar
Kushki A, Cardy RE, Panahandeh S, Malihi M, Hammill C, Brian J, et al. Cross-diagnosis structural correlates of autistic-like social communication differences. Cereb Cortex. 2021;31:5067–76. https://doi.org/10.1093/cercor/bhab142.
Article PubMed PubMed Central Google Scholar
Committee on Reproducibility and Replicability in Science, Board on Behavioral, Cognitive, and Sensory Sciences, Committee on National Statistics, et al. Reproducibility and Replicability in Science. National Academies Press; 2019:25303. https://doi.org/10.17226/25303.
Becker MM, Wagner MB, Bosa CA, Schmidt C, Longo D, Papaleo C, et al. Translation and validation of Autism Diagnostic Interview-Revised (ADI-R) for autism diagnosis in Brazil. Arq Neuropsiquiatr. 2012;70:185–90. https://doi.org/10.1590/S0004-282X2012000300006.
Article PubMed Google Scholar
Scahill L, Riddle MA, McSWIGGIN-HARDIN M, Ort SI, King RA, Goodman WK, et al. Children’s Yale-Brown obsessive compulsive scale: reliability and validity. J Am Acad Child Adolesc Psychiatry. 1997;36:844–52. https://doi.org/10.1097/00004583-199706000-00023.
Article PubMed CAS Google Scholar
Kaufman J, Birmaher B, Brent D, Rao U, Flynn C, Moreci P, et al. Schedule for affective disorders and schizophrenia for school-age children-present and lifetime version (K-SADS-PL): initial reliability and validity data. J Am Acad Child Adolesc Psychiatry. 1997;36:980–8. https://doi.org/10.1097/00004583-199707000-00021.
Article PubMed CAS Google Scholar
Guha M. Diagnostic and statistical manual of mental disorders: DSM-5. Ref Rev. 2014;28:36–37.
Google Scholar
Rutter M, Bailey A, Lord C. SCQ. Soc Commun Quest Torrance CA West Psychol Serv. Published online 2003.
Barkley RA. Behavioral inhibition, sustained attention, and executive functions: Constructing a unifying theory of ADHD. Psychol Bull. 1997;121:65–94. https://doi.org/10.1037/0033-2909.121.1.65.
Article PubMed Google Scholar
Wechsler D. Wechsler Abbreviated Scale of Intelligence. Published online November 12, 2012. https://doi.org/10.1037/t15170-000.
Powel J. Wechsler memory scale-revised David A. Wechsler. New York: The Psychological Corporation. Harcourt Brace Jovanovich, Inc, 1987. 150 pp. Arch Clin Neuropsychol. 1988;3:397–403. https://doi.org/10.1016/0887-6177(88)90053-4.
Article Google Scholar
Achenbach TM, Edelbrock C. Child Behavior Checklist. University Associates in Psychiatry; 2001.
Nelson EC, Hanna GL, Hudziak JJ, Botteron KN, Heath AC, Todd RD. Obsessive-compulsive scale of the child behavior checklist: specificity, sensitivity, and predictive power. Pediatrics. 2001;108:E14. https://doi.org/10.1542/PEDS.108.1.E14.
Article PubMed Google Scholar
Lepage C, Wagstyl K, Jung B, Seidlitz J, Sponheim C, Ungerleider M, et al. CIVET-Macaque: an automated pipeline for MRI-based cortical surface generation and cortical thickness in macaques. NeuroImage. 2021;227:117622. https://doi.org/10.1016/j.neuroimage.2020.117622.
Article PubMed Google Scholar
Park MTM, Pipitone J, Baer LH, Winterburn JL, Shah Y, Chavez S, et al. Derivation of high-resolution MRI atlases of the human cerebellum at 3T and segmentation using multiple automatically generated templates. NeuroImage. 2014;95:217–31. https://doi.org/10.1016/j.neuroimage.2014.03.037.
Article PubMed Google Scholar
Sled JG, Zijdenbos AP, Evans AC. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans Med Imaging. 1998;17:87–97. https://doi.org/10.1109/42.668698.
Article PubMed CAS Google Scholar
Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7.
Article Google Scholar
Lyttelton O, Boucher M, Robbins S, Evans A. An unbiased iterative group registration template for cortical surface analysis. NeuroImage. 2007;34:1535–44. https://doi.org/10.1016/j.neuroimage.2006.10.041.
Article PubMed Google Scholar
Kim JS, Singh V, Lee JK, Lerch J, Ad-Dab’bagh Y, MacDonald D, et al. Automated 3-D extraction and evaluation of the inner and outer cortical surfaces using a Laplacian map and partial volume effect classification. NeuroImage. 2005;27:210–21. https://doi.org/10.1016/j.neuroimage.2005.03.036.
Article PubMed Google Scholar
Cauvet É, van’t Westeinde A, Toro R, Kuja-Halkola R, Neufeld J, Mevel K, et al. Sex differences along the autism continuum: a twin study of brain structure. Cereb Cortex. 2019;29:1342–50. https://doi.org/10.1093/cercor/bhy303.
Article PubMed Google Scholar
Greven CU, Richards JS, Buitelaar JK Sex Differences in ADHD. Vol 1. Oxford University Press; 2018. https://doi.org/10.1093/med/9780198739258.003.0016.
Lawrence KE, Hernandez LM, Bowman HC, Padgaonkar NT, Fuster E, Jack A, et al. Sex Differences in functional connectivity of the salience, default mode, and central executive networks in youth with ASD. Cereb Cortex. 2020;30:5107–20. https://doi.org/10.1093/cercor/bhaa105.
Article PubMed PubMed Central Google Scholar
Schmithorst VJ, Holland SK. Sex differences in the development of neuroanatomical functional connectivity underlying intelligence found using Bayesian connectivity analysis. NeuroImage. 2007;35:406–19. https://doi.org/10.1016/j.neuroimage.2006.11.046.
Article PubMed Google Scholar
Tipping ME, Bishop CM. Probabilistic Principal Component Analysis. J R Stat Soc Ser B Stat Methodol. 1999;61:611–22. https://doi.org/10.1111/1467-9868.00196.
Article Google Scholar
Polychronopoulou A, Zhou F, Obradovic Z Cosine similarity for multiplex network summarization. In: ; 2021:56-63.
Cohen J Statistical Power Analysis for the Behavioral Sciences.; 1988.
Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B Stat Methodol. 2001;63:411–23.
Article Google Scholar
Calinski T, Harabasz J. A dendrite method for cluster analysis. Commun Stat - Theory Methods. 1974;3:1–27. https://doi.org/10.1080/03610927408827101.
Article Google Scholar
LaPlante RA, Douw L, Tang W, Stufflebeam SM. The connectome visualization utility: Software for visualization of human brain networks. PloS One. 2014;9:e113838.
Article PubMed PubMed Central Google Scholar
Rubinov M, Sporns O. Complex network measures of brain connectivity: uses and interpretations. Neuroimage. 2010;52:1059–69.
Article PubMed Google Scholar
Steinley D. Properties of the hubert-arable adjusted rand index. Psychol Methods. 2004;9:386–96. https://doi.org/10.1037/1082-989X.9.3.386.
Article PubMed Google Scholar
Breiman L. Random Forests. Mach Learn. 2001;45:5–32. https://doi.org/10.1023/A:1010933404324.
Article Google Scholar
Sadat-Nejad Y, Vandewouw MM, Cardy R, Lerch JP, Taylor MJ, Iaboni A, et al. Investigating heterogeneity across autism, ADHD, and typical development using measures of cortical thickness, surface area, cortical/subcortical volume, and structural covariance. Front Child Adolesc Psychiatry. 2023;2. Accessed February 8, 2024. https://www.frontiersin.org/articles/10.3389/frcha.2023.1171337.
Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022;603:654–60. https://doi.org/10.1038/s41586-022-04492-9.
Article PubMed PubMed Central CAS Google Scholar
Panizzon MS, Fennema-Notestine C, Eyler LT, Jernigan TL, Prom-Wormley E, Neale M, et al. Distinct genetic influences on cortical surface area and cortical thickness. Cereb Cortex. 2009;19:2728–35. https://doi.org/10.1093/cercor/bhp026.
Article PubMed PubMed Central Google Scholar
Im K, Lee JM, Lyttelton O, Kim SH, Evans AC, Kim SI. Brain size and cortical structure in the adult human brain. Cereb Cortex. 2008;18:2181–91. https://doi.org/10.1093/cercor/bhm244.
Article PubMed Google Scholar
Bölte S, Neufeld J, Marschik PB, Williams ZJ, Gallagher L, Lai MC. Sex and gender in neurodevelopmental conditions. Nat Rev Neurol. 2023;19:136–59. https://doi.org/10.1038/s41582-023-00774-6.
Article PubMed PubMed Central Google Scholar

Download references

Author information

Authors and Affiliations

Holland Bloorview Kids Rehabilitation Hospital, 150 Kilgour Rd, Toronto, ON, M4G1R8, Canada
Younes Sadat-Nejad, Marlee M. Vandewouw, Jessica Brian, Alana Iaboni, Evdokia Anagnostou & Azadeh Kushki
University of Toronto, Toronto, Ontario, Canada
Younes Sadat-Nejad, Jennifer Crosbie, Russell J. Schachar, Margot J. Taylor, Jason P. Lerch, Evdokia Anagnostou & Azadeh Kushki
Harvard Medical School, Boston, MA, USA
Marlee M. Vandewouw
The Hospital for Sick Children, Toronto, Ontario, Canada
Jennifer Crosbie, Russell J. Schachar, Margot J. Taylor, Bilal Syed & Christopher Hammill
Queen’s University, Kingston, Ontario, Canada
Elizabeth Kelley & Jessica Jones
University College London, London, UK
Muhammad Ayub
University of Western Ontario, London, Canada
Robert Nicolson
McMaster University, Hamilton, Canada
Stelios Georgiades
University of Calgary, Calgary, Canada
Paul D. Arnold
University of Oxford, Oxford, UK
Jason P. Lerch
Department of Psychiatry, Icahn School of Medicine, Mount Sinai, NY, USA
Evdokia Anagnostou

Authors

Younes Sadat-Nejad
View author publications
Search author on:PubMed Google Scholar
Marlee M. Vandewouw
View author publications
Search author on:PubMed Google Scholar
Jessica Brian
View author publications
Search author on:PubMed Google Scholar
Jennifer Crosbie
View author publications
Search author on:PubMed Google Scholar
Russell J. Schachar
View author publications
Search author on:PubMed Google Scholar
Alana Iaboni
View author publications
Search author on:PubMed Google Scholar
Elizabeth Kelley
View author publications
Search author on:PubMed Google Scholar
Jessica Jones
View author publications
Search author on:PubMed Google Scholar
Margot J. Taylor
View author publications
Search author on:PubMed Google Scholar
Muhammad Ayub
View author publications
Search author on:PubMed Google Scholar
Robert Nicolson
View author publications
Search author on:PubMed Google Scholar
Bilal Syed
View author publications
Search author on:PubMed Google Scholar
Christopher Hammill
View author publications
Search author on:PubMed Google Scholar
Stelios Georgiades
View author publications
Search author on:PubMed Google Scholar
Paul D. Arnold
View author publications
Search author on:PubMed Google Scholar
Jason P. Lerch
View author publications
Search author on:PubMed Google Scholar
Evdokia Anagnostou
View author publications
Search author on:PubMed Google Scholar
Azadeh Kushki
View author publications
Search author on:PubMed Google Scholar

Contributions

YSN conceptualized and designed the study, analysed study data, drafted the original manuscript, interpreted the results, and revised the manuscript. MM generated manuscript figures, and critically reviewed and revised the manuscript. EA contributed to the concept and design of the study, acquisition of study data, interpretation of results, and the critical review and revision of the manuscript. JL, MJT, AI, JB, EK, MA, JC, RJS, SG, RN, JJ, PDA, and JPL contributed to the acquisition of data and the critical review and revision of the manuscript. BS and CH contributed to processing of the neuroimaging data and generation of figures. AK conceptualized and designed the study and analytic plan, interpreted the results, and drafted the manuscript. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.

Corresponding author

Correspondence to Azadeh Kushki.

Ethics declarations

Competing interests

Dr Nicolson reported receiving grants from Brain Canada, Hoffman La Roche, Otsuka Pharmaceuticals, and Maplight Therapeutics outside the submitted work. Dr Anagnostou reported receiving grants from Roche and Anavex; receiving nonfinancial support from AMO Pharma and CRA-Simons Foundation; and receiving personal fees from Roche, Impel, Ono, and Quadrant outside the submitted work; in addition, Dr Anagnostou had a patent for Anxiety Meter issued 14/755/084 (United States) and a patent for Anxiety Meter pending 2,895,954 (Canada) as well as receiving royalties from APPI and Springer. Dr Kushki reported receiving grants from National Science and Engineering Research Council during the conduct of the study; in addition, Dr Kushki had a patent for Anxiety Meter with royalties paid from Awake Labs. No other disclosures were reported. Dr Kushki is the inventor of a software called holly (formerly “Anxiety Meter”). She is involved in commercialising the holly (patents US 9,844,332 B2) and will financially benefit from its sales. Dr Kushki served on the board of advisors for Shaftesbury, a media company developing virtual reality products for autistic children, from February 2020 to February 2021, and was compensated financially for this role. She has also received consulting fees from DNAStack and donations of hardware for her research program from Samsung Canada.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary materials

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Sadat-Nejad, Y., Vandewouw, M.M., Brian, J. et al. Characterizing replicability in the clustering structure of brain morphology in autism, attention-deficit/hyperactivity disorder, and obsessive compulsive disorder. Transl Psychiatry 15, 333 (2025). https://doi.org/10.1038/s41398-025-03540-y

Download citation

Received: 15 May 2024
Revised: 15 July 2025
Accepted: 12 August 2025
Published: 30 August 2025
DOI: https://doi.org/10.1038/s41398-025-03540-y

Subjects

Abstract

Similar content being viewed by others

Integration of brain and behavior measures for identification of data-driven groups cutting across children with ASD, ADHD, or OCD

Cortical and subcortical structural differences in psychostimulant-free ADHD youth with and without a family history of bipolar I disorder: a cross-sectional morphometric comparison

Generalizability of 3D CNN models for age estimation in diverse youth populations using structural MRI

Introduction

Methods

Participants

Behavioural measures

Sociodemographics measures

Imaging data

Analysis pipeline

Statistical analyses

Results

Participants

PCA decomposition

Clustering composition of the data

Clustering results

Cluster differences in brain measures

Cluster associations with phenotypic measures

Discussion

Replicability across datasets

Replicability across brain measures

Replicability and sex differences

Strengths and limitations

Conclusions

Data availability

Code availability

References

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary materials

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links