Introduction

Attention deficit hyperactivity disorder (ADHD) is a highly prevalent and debilitating condition that affects multiple aspects of life across the lifespan and often places a significant burden on individuals and families [1, 2]. As with other psychiatric disorders, the specific mechanisms underlying ADHD remain unclear. Currently, genetic susceptibility and the subsequent alterations in brain structure and function have been suggested as core biological substrates of ADHD [3,4,5]. Genetic factors are the most significant contributor to ADHD, with heritability estimates as high as 74% [3, 6, 7].

Advances in neuroimaging have enabled the analysis of intricate brain connectivity networks, highlighting white matter (WM) as a critical structural framework for information exchange between grey matter (GM) regions [8,9,10]. Diffusion tensor imaging (DTI) is the most commonly used approach for characterizing WM properties in vivo and has helped researchers identify altered WM microstructural integrity in psychiatric disorders. Studies on ADHD have indicated widespread WM compromise, including in the corpus callosum (CC), the cerebellum, and the projection tracts [11,12,13,14,15,16].

Previous evidence points to a robust genetic foundation for WM structure in the typically developed (TD) population, with heritability estimates for fractional anisotropy (FA) metrics reaching up to 90% [17,18,19]. In ADHD, concordant WM changes have been observed between affected individuals and their unaffected siblings, providing evidence for a strong genetic basis of WM development in this population [20, 21]. Subsequent studies further suggested that genetic risk might influence ADHD symptoms through WM alterations [22, 23]. In 1314 participants from eight European sites, Albaugh et al. reported significant associations between ADHD symptoms and the left inferior fronto-occipital, inferior longitudinal, and uncinate fasciculi [22]. In 544 individuals, Sudre et al. found that ADHD polygenic risk scores (ADHD-PRSs) were significantly associated with reduced FA in the anterior corona radiata, which mediated 29% of the genetic effects on hyperactive/impulsive symptoms, suggesting WM integrity as a critical endophenotype [23]. However, the scope of this work remains limited: genome-wide association studies (GWAS) have predominantly reported the genomic background of WM in the TD population, leaving open how ADHD diagnosis and genes interact in shaping WM microstructure.

Canonical correlation analysis (CCA) is a useful tool for identifying co-varying patterns in high-dimensional multimodal datasets [24]. It has been employed to elucidate interactions between neuroimaging and genetic data [25]. Using CCA, Lin et al. identified multivariate association patterns between genes and brain in schizophrenia (SCZ), linking glutamatergic and GABAergic genes with the frontal lobe and the thalamus [26]. In Alzheimer’s disease (AD), single nucleotide polymorphisms (SNPs) located within the AD-related gene APOE were found to be associated with both structural and functional brain endophenotypes [27,28,29,30,31].

Andrew et al. extended CCA by developing Deep CCA (DCCA), an unsupervised deep learning (DL) model that maximizes correlations between datasets through nonlinear transformations [32]. Unlike CCA, DCCA learns through gradient-based iterations, which can better capture real-world nonlinear relationships. Previous studies have demonstrated that DCCA can markedly improve the correlation coefficients between genetic and MRI latent features [30]. DCCA and its variants also perform well in prediction tasks [26, 33,34,35,36]. Li et al. reported an accuracy of up to 95.65% in diagnosing SCZ [26]. Hu et al. reported excellent performance in predicting cognitive abilities, with an accuracy of 96.64% [33].

Given the intricate involvement of genes and brain in ADHD, applying DCCA could provide crucial insights into the etiological mechanisms of the disorder. However, no such work has been undertaken in this context thus far.

Building on this background, the current study developed, for the first time, a set of adversarial DCCA (A-DCCA) models: 1) an ADHD-A-DCCA model (AA-DCCA), which aimed to learn association patterns between WM imaging measures and genetic variants that are unique to ADHD; and 2) a Shared-A-DCCA model (SA-DCCA), which integrated a gradient reversal layer (GRL) to disregard diagnostic distinctions and thereby identify patterns shared between ADHD and TD individuals. We hypothesized that the gene-WM patterns contributing to the SA-DCCA model constitute a background common to both ADHD and TD participants, while genes within the AA-DCCA pattern exert different effects on WM in the ADHD group.

Methods

The main workflow of the study is illustrated in Fig. 1 and consists of two key components: 1) training a set of adversarial DL models and 2) interpreting the models to extract ADHD-specific and non-specific “gene-brain” association patterns.

Fig. 1: Flow diagram of the current study.

Study participants

All individuals with ADHD met the diagnostic criteria for ADHD of the Diagnostic and Statistical Manual of Mental Disorders, fourth or fifth edition (DSM-IV or DSM-5, depending on the recruitment period). Diagnostic procedures involved an initial assessment by senior child psychiatrists, followed by confirmation through semi-structured interviews using the Chinese version of the Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version (K-SADS-PL) [37, 38]. For the controls, any evidence of current or past major psychiatric disorders in the K-SADS-PL interview and/or present or lifetime neurological disorders led to exclusion. The protocol was reviewed and approved by the Ethics Committee of Peking University Health Science Center. Written informed consent was obtained from the parents, and additionally from the participants themselves if they were over 8 years old.

MRI acquisition and DTI preprocessing

To ensure data harmonization across cohorts, we adopted ENIGMA’s DTI preprocessing pipeline (https://enigma.ini.usc.edu/protocols/dti-protocols/). Details of the DTI parameters for each cohort can be found in the Supplementary Methods section. The FSL software (http://www.fmrib.ox.ac.uk/fsl) was employed for preprocessing in the following steps: 1) motion correction, eddy current distortion correction, brain extraction, and computation of fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (RD), and axial diffusivity (AD) maps were performed for individual participants; 2) FA images were registered to the ENIGMA FA template in the MNI-ICBM-152 standard space; 3) the registered FA maps were masked with the ENIGMA brain skeleton mask; 4) the other metric maps were registered using the FA transformations and then masked by the skeletonized template. After preprocessing, the skeletonized images for the four metrics were resized to 64 × 64 × 64 voxels per participant (N × (64, 64, 64)) and split into 64 slices (N × 64 × (64 × 64 × 1)) for input into the 2D autoencoder (AE) in the DCCA models.
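The slicing step can be illustrated with a minimal sketch. It assumes a volume stored as nested lists indexed `volume[x][y][z]` and slices taken along the third axis; the text does not state which axis was used, and the function name `split_into_slices` is hypothetical.

```python
def split_into_slices(volume):
    """Split a 3D volume (nested lists, indexed [x][y][z]) into 2D slices.

    Assumption: slices are taken along the last (z) axis; the source
    does not specify the slicing axis.
    """
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [[volume[x][y][z] for y in range(ny)] for x in range(nx)]
        for z in range(nz)
    ]


# Toy 4 x 4 x 4 volume standing in for a resized 64 x 64 x 64 skeletonized map.
size = 4
toy = [[[x * 100 + y * 10 + z for z in range(size)] for y in range(size)]
       for x in range(size)]
slices = split_into_slices(toy)
print(len(slices), len(slices[0]), len(slices[0][0]))
```

With a 64 × 64 × 64 input the same function would return 64 slices of 64 × 64 values each, matching the N × 64 × (64 × 64 × 1) layout described above.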

Genotyping and feature selection

One hundred sixteen samples were genotyped using Affymetrix6.0 array and four hundred eighty-four samples were genotyped using the InfiniumPsychArray-24 array by CapitalBio Ltd. (Beijing).

Quality control (QC) procedures for genetic data comprised the following steps: 1) exclusion of individuals with per-individual autosomal heterozygosity more than 3 SD above the mean, missing age or sex information, or a per-individual call rate <98%; 2) exclusion of SNPs with low call rates (<95%), significant deviation from Hardy–Weinberg equilibrium (P < 1e-04), or a minor allele frequency (MAF) lower than 5%. QC procedures were performed for each genotyping dataset separately, followed by pre-phasing and imputation using SHAPEIT and IMPUTE2 [39, 40]. The merged genotyping data comprised 600 samples with 2,268,177 variants for subsequent analysis.
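The variant-level filters can be written as a short predicate. This is only a sketch of the three thresholds stated above, applied to per-SNP summary values that are assumed to have been computed already; the function names are hypothetical and the toy genotype counts are made up.

```python
def minor_allele_frequency(n_aa, n_Aa, n_AA):
    """Minor allele frequency from genotype counts (aa, Aa, AA)."""
    n_alleles = 2 * (n_aa + n_Aa + n_AA)
    freq_a = (2 * n_aa + n_Aa) / n_alleles
    return min(freq_a, 1.0 - freq_a)


def passes_snp_qc(call_rate, hwe_p, maf):
    """True if a SNP survives the call-rate, HWE, and MAF filters in the text."""
    return call_rate >= 0.95 and hwe_p >= 1e-4 and maf >= 0.05


# Toy SNP: 10 aa, 40 Aa, 50 AA genotypes, 99% call rate, HWE P = 0.3.
maf = minor_allele_frequency(10, 40, 50)
print(maf, passes_snp_qc(call_rate=0.99, hwe_p=0.3, maf=maf))
```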

For construction of the A-DCCA models, single nucleotide polymorphisms (SNPs) were selected with reference to white matter-based [17] and ADHD-based [3] GWAS results. SNPs with a GWAS P < 1E-05 were included, leaving 6874 SNPs for further analysis in this study. Each SNP was encoded according to the major alleles (A) it carried, yielding an (Nsamples × 4) one-hot representation for each SNP, in which the genotypes “aa,” “Aa,” “AA,” and “NA” were represented as (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1), respectively.
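The encoding scheme can be sketched as a lookup table. The mapping itself is taken from the text; treating any unrecognized genotype string as “NA” is an assumption made here for illustration, and `encode_sample` is a hypothetical helper name.

```python
# Genotype-to-vector mapping as described in the text.
ONE_HOT = {
    "aa": (1, 0, 0, 0),
    "Aa": (0, 1, 0, 0),
    "AA": (0, 0, 1, 0),
    "NA": (0, 0, 0, 1),
}


def encode_sample(genotypes):
    """One-hot encode one sample's genotypes, giving an (n_snps x 4) list.

    Assumption: unrecognized genotype strings are treated as missing ("NA").
    """
    return [ONE_HOT.get(g, ONE_HOT["NA"]) for g in genotypes]


encoded = encode_sample(["AA", "Aa", "aa", "NA"])
print(encoded)
```

For the 6874 selected SNPs, each sample would therefore be represented by a 6874 × 4 array.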

DL model development and training

Based on the DCCA model, the current study developed a set of data-fusion models (A-DCCAs): AA-DCCA and SA-DCCA. Both models employed two convolutional autoencoders (CAEs) to extract features from the DTI (64 × 64 × 1) and genetic (6874 × 4) inputs, respectively. Each CAE was composed of three sequential convolutional blocks, each consisting of a convolutional layer followed by batch normalization and max-pooling operations to progressively refine spatial feature hierarchies. The hierarchical representations were subsequently flattened into a 1D feature vector, enabling dimensionality reduction while preserving discriminative patterns. This transformation projected the input data into a 10-dimensional latent space (output dimension: N × 10), where N denotes the batch size.

To optimize cross-modal feature alignment, a canonical correlation analysis (CCA) layer was integrated into the pipeline. This layer dynamically quantified the statistical correlation between the latent embeddings of paired modalities (DTI and gene) through eigen-decomposition of their covariance structures [32]. The CCA formulation is:

$$(w_{1}^{\ast}, w_{2}^{\ast}) = \underset{w_{1}, w_{2}}{\operatorname{argmax}}\ \operatorname{corr}(w_{1}^{\prime} X_{1}, w_{2}^{\prime} X_{2})$$
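
For reference, the correlation in this objective can be written out in terms of covariance matrices. The symbols \(\Sigma_{11}\) and \(\Sigma_{22}\) (within-view covariances of \(X_1\) and \(X_2\)) and \(\Sigma_{12}\) (their cross-covariance) are introduced here for illustration and do not appear in the original text. This is the standard linear CCA form; in DCCA the same quantity is computed on the outputs of the two networks instead of the raw inputs.

$$\operatorname{corr}(w_{1}^{\prime} X_{1}, w_{2}^{\prime} X_{2}) = \frac{w_{1}^{\prime} \Sigma_{12} w_{2}}{\sqrt{w_{1}^{\prime} \Sigma_{11} w_{1} \; w_{2}^{\prime} \Sigma_{22} w_{2}}}$$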

To build the AA-DCCA model, a classifier module with “Sigmoid” activation was then appended to the final network layer, using the binary ADHD diagnosis as the supervised label. A joint optimization strategy was implemented that combined Binary Cross-Entropy (BCE) loss and Canonical Correlation Analysis loss (CCA Loss).
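A simplified sketch of such a joint loss follows. It makes several assumptions that are not in the text: the latent representation is reduced to one dimension per modality (the models above use 10), the CCA loss is taken as the negative Pearson correlation of the two latents, and the two terms are added with a hypothetical weight `alpha`. How the two losses were actually weighted is not stated.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def joint_loss(y_true, y_prob, z_dti, z_gene, alpha=1.0):
    """BCE plus alpha times a 1-D CCA loss (negative correlation of the latents)."""
    cca_loss = -pearson(z_dti, z_gene)
    return binary_cross_entropy(y_true, y_prob) + alpha * cca_loss


# Toy batch: uninformative classifier, perfectly correlated latents.
loss = joint_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5],
                  [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(loss)
```

In the toy batch the classification term equals ln 2 and the correlation term equals -1, so minimizing the sum rewards both better diagnosis prediction and stronger cross-modal correlation.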

In the SA-DCCA architecture, a Gradient Reversal Layer (GRL) was integrated before the classifier. During backpropagation, this module applied negative weighting to diagnosis-related gradients, establishing an adversarial training mechanism [41]. This design reduces the model’s sensitivity to diagnostic status.
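The behaviour of a gradient reversal layer can be shown in a few lines without any DL framework. This is a conceptual sketch only: in practice the layer would be implemented through the automatic differentiation machinery of the framework in use, and the scaling factor `lam` is a hypothetical parameter whose value is not given in the text.

```python
class GradientReversal:
    """Identity in the forward pass; flips and scales gradients in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical reversal strength

    def forward(self, features):
        # Features reach the classifier unchanged.
        return list(features)

    def backward(self, grad_output):
        # Gradients from the classifier are multiplied by -lam, so the
        # feature extractor is pushed to make diagnosis harder to predict.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=0.5)
features = grl.forward([0.2, -1.0])
grads = grl.backward([0.4, -0.6])
print(features, grads)
```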

The dataset was randomly divided into a training set (80%) and a validation set (20%). Model weights were initialized using the Xavier method [42]. Additional specifications included a batch size of 128, optimization with the Adam method [43], dropout layers with a dropout rate of 0.4, a learning rate of 5E-3, early stopping after 30 epochs of unchanged validation loss, and a kernel size of 3.
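The split and the stopping rule can be sketched as follows. Two assumptions are made here: “unchanged validation loss” is read as “no improvement over the best value so far”, and the random seed is a hypothetical choice. Both function names are illustrative.

```python
import random


def train_val_split(n_samples, train_fraction=0.8, seed=0):
    """Randomly split sample indices into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # hypothetical fixed seed
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


def early_stopping_epoch(val_losses, patience=30):
    """Return the 1-based epoch at which training stops, or None if never.

    Training stops once `patience` consecutive epochs pass without the
    validation loss improving on its best value so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


train_idx, val_idx = train_val_split(600)
stop = early_stopping_epoch([1.0, 0.8, 0.7] + [0.7] * 30)
print(len(train_idx), len(val_idx), stop)
```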

Interpretation of the DL model

The DeepExplain package [44] facilitated the model’s interpretation. The “Saliency” method was employed to compute the contributions of individual voxels and SNPs to the trained models. We extracted WM clusters with saliency scores within the top 0.1% and consisting of over 100 voxels. Manual verification was conducted on gene variants with the top 10 saliency scores, and subsequent mapping of the top 200 SNPs to potential genes was performed for enrichment analyses. Regression and interaction analyses between the top SNPs and the entire AA/SA clusters were performed to identify potential associations and interactive effects.
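The cluster extraction rule (top 0.1% of saliency scores, more than 100 voxels) can be sketched in plain Python. The sketch assumes saliency scores stored in a dictionary keyed by voxel coordinates and 6-connectivity (shared faces) between voxels; the connectivity criterion is not stated in the text, and the function names are hypothetical.

```python
from collections import deque


def top_fraction_voxels(saliency, fraction=0.001):
    """Return the voxels whose saliency lies within the top `fraction`."""
    n_keep = max(1, round(len(saliency) * fraction))
    ranked = sorted(saliency, key=saliency.get, reverse=True)
    return set(ranked[:n_keep])


def connected_clusters(voxels, min_size=101):
    """Group voxels into 6-connected clusters with at least `min_size` voxels."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    unvisited = set(voxels)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, queue = {seed}, deque([seed])
        while queue:  # breadth-first search over face neighbours
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                nb = (x + dx, y + dy, z + dz)
                if nb in unvisited:
                    unvisited.remove(nb)
                    cluster.add(nb)
                    queue.append(nb)
        if len(cluster) >= min_size:
            clusters.append(cluster)
    return clusters


# Toy 10 x 10 x 10 map: one line of six salient voxels plus four isolated ones.
saliency = {(x, y, z): 0.0 for x in range(10) for y in range(10) for z in range(10)}
for x in range(6):
    saliency[(x, 0, 0)] = 1.0
for voxel in [(9, 9, 9), (5, 5, 5), (7, 2, 3), (2, 7, 8)]:
    saliency[voxel] = 1.0
top = top_fraction_voxels(saliency, fraction=0.01)
clusters = connected_clusters(top, min_size=5)
print(len(top), [len(c) for c in clusters])
```

The toy example uses looser settings (top 1%, at least 5 voxels) so that it fits a small grid; the defaults correspond to the thresholds given in the text.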

Mapping risk genes and enrichment analyses

To link variants to potential genes, three methods for gene mapping were leveraged: 1) positional mapping, 2) eQTL mapping, and 3) chromatin interaction mapping. For positional mapping, we used a maximum distance of 10 kb [45]. The eQTL data and chromatin interaction data were utilized to identify genes through the implementation of FUMA [45] (details of the datasets can be found in the Supplementary materials).
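Positional mapping with a 10 kb window amounts to an interval check, as in the sketch below. FUMA performs this step internally; the code only illustrates the rule, and the gene names and coordinates in the example are invented for the demonstration.

```python
def positional_mapping(snp, genes, max_distance=10_000):
    """Return genes whose span lies within `max_distance` bp of the SNP.

    snp: (chromosome, position); genes: {name: (chromosome, start, end)}.
    """
    chrom, pos = snp
    hits = []
    for name, (g_chrom, start, end) in genes.items():
        if g_chrom != chrom:
            continue
        if start - max_distance <= pos <= end + max_distance:
            hits.append(name)
    return sorted(hits)


# Hypothetical genes and coordinates, for illustration only.
toy_genes = {
    "GENE_A": ("13", 100_000, 150_000),
    "GENE_B": ("13", 170_000, 200_000),
    "GENE_C": ("14", 100_000, 160_000),
}
mapped = positional_mapping(("13", 155_000), toy_genes)
print(mapped)
```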

Additionally, enrichment analyses were conducted through over-representation analysis in FUMA [45] and Metascape [46] to identify significant associations with gene sets or functional categories related to the identified variants.

Various databases covering both phenotypic and biological function information were used, including the GWAS Catalog, Gene Ontology (GO), Reactome, and WikiPathways [47,48,49,50].

Pathway-based polygenic risk scores (p-PRSs)

The polygenic risk score (PRS) estimates the combined effect of a set of SNPs in a target dataset based on the GWAS summary statistics of a discovery dataset. In the current study, we selected four Gene Ontology (GO) terms that are closely associated with neural system development and were suggested by the enrichment analyses (GO:0051402 neuron apoptotic process; GO:0016358 dendrite development; GO:0001764 neuron migration; GO:0045202 synapse), and calculated pathway-based PRSs (p-PRSs) using PRSice-2 [51] based on ADHD GWAS summary statistics. Before generating the scores, clumping was used to obtain SNPs in linkage equilibrium (r2 < 0.1, 250 bp window). The GWAS P-value threshold for SNP inclusion was varied from 0 to 0.5 in increments of 0.00005. Associations between the polygenic profile and the target phenotypes were examined for each DTI metric in linear regression models with age and sex as covariates, and P-values were adjusted using 10,000 label-swapping permutations. The GWAS P-value threshold with the largest Nagelkerke’s r2 was considered the best fit.
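The core of a thresholded PRS is a weighted sum of allele dosages, as sketched below. PRSice-2 handles clumping, scoring, and permutation itself, and none of that is reproduced here; the SNP identifiers, effect sizes, and P-values are made up, and a plain sum (no averaging or standardization) is an assumption of this example.

```python
def polygenic_score(dosages, effect_sizes, p_values, p_threshold):
    """Sum of effect size x allele dosage over SNPs with GWAS P <= threshold.

    dosages: {snp: 0, 1, or 2 copies of the effect allele} for one individual.
    effect_sizes, p_values: {snp: value} from the discovery GWAS.
    """
    return sum(
        effect_sizes[snp] * dosage
        for snp, dosage in dosages.items()
        if p_values[snp] <= p_threshold
    )


# Hypothetical SNPs within one pathway gene set.
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
effect_sizes = {"snp_a": 0.1, "snp_b": -0.2, "snp_c": 0.5}
p_values = {"snp_a": 1e-6, "snp_b": 0.01, "snp_c": 0.3}

strict = polygenic_score(dosages, effect_sizes, p_values, p_threshold=1e-5)
lenient = polygenic_score(dosages, effect_sizes, p_values, p_threshold=0.05)
print(strict, lenient)
```

Scanning many thresholds, as described above, repeats this calculation with a different `p_threshold` each time and keeps the one that best explains the phenotype.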

Results

Clinical characteristics of the recruited subjects

The current study included a dataset of 600 samples, consisting of 430 individuals diagnosed with ADHD and 170 TD individuals, all processed through the same rigorous quality control procedure. The mean age of the included samples was 121.49 (± 25.21) months and the male/female ratio was 4.7:1. Brain scans were performed at four different sites. There was no significant difference in age between the ADHD and TD groups. A notable difference in sex ratio was observed, although it was in line with the recognized epidemiological features of ADHD (further details are shown in the Supplementary Materials).

ADHD-specific and non-specific non-linear multivariate “Gene-DTI” interaction pattern

We used saliency scores to identify the WM regions and genetic variants that contributed most to the A-DCCA models. The AA-DCCA model highlighted a cluster extending from the right cerebral peduncle (CP) to the right posterior limb of the internal capsule (PLIC) (Fig. 2). Conversely, the top contributors to the SA-DCCA model encompassed six clusters across diverse WM tracts, including the right superior longitudinal fasciculus (SLF), the right posterior thalamic radiation (PTR), the left CP, and the left PLIC.

Fig. 2: White matter patterns identified by the A-DCCA models and associated grey matter regions.

Key white matter clusters for the AA-DCCA (A) and SA-DCCA (B) models, and the grey matter areas they project to (C, D); DTI metrics in the white matter clusters in the ADHD and TD groups (E, F). In the AA-DCCA cluster, AD values were significantly increased in the ADHD group (E); in the SA-DCCA regions, FA values were decreased in ADHD compared with TD (F).

Deterministic fiber tracking mapped the GM regions connected by these identified WM regions (Fig. 2). The AA-DCCA WM cluster connects the bilateral supplementary motor area, right dorsal lateral superior frontal gyrus, and bilateral cerebellum. The six SA-DCCA clusters displayed widespread connections across frontal, parietal, occipital, temporal lobes, and the cerebellum.

The top 10 SNPs and their mapped genes in each model are listed in Table 1. Five of the 10 SNPs contributing to the AA-DCCA model were located in genes involved in apoptotic processes (METTL15, MAP2K4, and CAMK1D), with CAMK1D additionally regulating dendrite development. In contrast, the top gene in the SA-DCCA model, FYN, plays critical roles in neural developmental processes, including neuron migration and projection.

Table 1 Top 10 SNPs contributed to each A-DCCA model.

Enrichment analyses based on phenotypes suggested that the AA genes were most significantly enriched in a mental disorder, major depressive disorder (MDD), while the SA genes were mostly enriched in brain morphology (Supplementary Figure 3). These findings might suggest that the AA genes have more disorder-specific roles. Results based on biological function terms highlighted a distinct role of the top genes in the SA-DCCA model in exocytosis, which is essential for synaptic function (Supplementary Figure 4).

Validating ADHD-specific and non-specific associations by post-hoc analyses

Comparative analyses between the ADHD and TD groups were conducted for the previously identified WM regions (AA and SA regions) and the top 25 SNPs from each model. AD values in the AA-model cluster were significantly higher in the ADHD group than in the TD group (P = 0.00427, Pbon = 0.017). Conversely, FA values within the SA regions were reduced in individuals with ADHD (P = 0.00195, Pbon = 0.0078). Moreover, the majority of SNPs from the AA-model contributors (16 out of 20, 80%) exhibited significant differences between diagnostic groups (Fig. 3A). Exploration of interaction effects revealed differential associations between the AA regions and specific SNPs. For instance, the variant rs7330238 (chr13:113703404), located within MCF2L, exhibited distinct associations with MD values in the ADHD and TD groups. Similarly, an interaction effect was observed between RD values and rs1051861 (chr14:58838701), located within ARID4A, an ADHD-associated gene. Conversely, no such interaction effects were evident in the SA associations.
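As a quick arithmetic check, the reported Bonferroni-corrected values are consistent with a correction over four tests, presumably the four DTI metrics (FA, MD, RD, AD). The factor of four is inferred from the reported numbers and is not stated in the text.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Assumption: four tests, one per DTI metric.
adjusted = [bonferroni(p, 4) for p in (0.00427, 0.00195)]
print(adjusted)
```

The two adjusted values come out at about 0.0171 and 0.0078, in line with the reported Pbon values of 0.017 and 0.0078.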

Fig. 3: Genetic association identified by the AA-DCCA (A) and SA-DCCA.

A SNP features that differed between ADHD and TD individuals (red denotes a contributor to the AA-DCCA model; blue denotes a contributor to the SA-DCCA model); B Association patterns between the top 5 SNPs and DTI metrics in the key white matter clusters (left: AA-DCCA; right: SA-DCCA); C Interaction effects of rs7330238 and rs1051861 with the AA-DCCA cluster.

Pathway-based polygenic risk scores (p-PRSs) showed significant associations between WM properties and distinct neurodevelopmental processes. Specifically, the AA cluster showed a significant link to dendrite development (Padj = 0.023), while processes related to neuron migration and synapse were associated with the SA clusters (P = 0.0023, 0.0062, 0.0067; Padj = 0.010, 0.027, 0.027) (Fig. 4).

Fig. 4: Polygenic profiles underlying A-DCCA identified white matter regions.

A, B Correlations between p-PRS and DTI metrics in the two models. A Dendrite development p-PRSs were associated with FA values in the AA-DCCA region; B In the SA regions, synapse p-PRSs were associated with RD and AD values (right and left), and neuron migration p-PRSs were related to MD values (middle). C The correlations between dendrite p-PRS and the AA-DCCA region differed between the ADHD and TD groups. D Best-fit R2 for each pathway in the two models (asterisk * denotes significant adjusted P-values).

Further examination of interaction effects between p-PRSs and WM regions revealed differential associations between the p-PRS of dendrite development and FA values in the AA region (P = 0.049). However, no such interactions were identified in the SA associations.

Discussion

By interpreting a set of A-DCCA models, we delineated shared and distinct gene-WM patterns in the ADHD and TD groups. Specifically, our findings indicated that gene variants associated with apoptosis and dendrite development had different effects on the right CP and the right PLIC in individuals with ADHD and TD individuals. Meanwhile, genes involved in processes related to neuron migration and synapse had effects on widespread WM regions, involving the left CP, the left PLIC, the right SLF, and the right PTR.

Discerning the intricate interactions between genes and the brain remains a significant challenge, especially regarding potential differences in these associations across psychiatric disorders. The complexities inherent in individual variations, alongside the high-dimensional and nonlinear nature of the data, make direct association assessment challenging. Our study addressed this issue by employing DL models to conduct non-hypothesis-driven analyses on multimodal data. This approach represents a crucial step toward unraveling the etiological mechanisms of ADHD.

Despite their informative nature, DL models are often not sufficiently interpreted. In psychiatry, where stable biomarkers are lacking, the value of DL models also lies in their ability to recognize latent mechanistic patterns underlying phenotypes. For instance, Gao et al. achieved high accuracy when predicting three ADHD subtypes and pointed out that the connections between the right inferior occipital gyrus (IOG) and the right superior temporal gyrus (STG), and between the left supplementary motor area (SMA) and the right Heschl gyrus (HES), were the discriminative features [52]. Based on functional connectivity, Feng et al. characterized ADHD biotypes and identified two subtypes that responded differently to pharmacological treatment [53]. As mentioned above, GWAS is an effective way to detect relevant signals genome-wide; however, it is limited in detecting real-world non-linear relationships. Liu’s work demonstrated that a convolutional neural network (CNN)-based model could capture ADHD-related SNPs that tend to be neglected by GWAS, and could thus serve as an important supplementary method [54]. The current study likewise used DL models to identify the underlying complex mechanisms.

Our adversarial models revealed two types of multivariate correlational patterns between genes and WM structure. The ADHD-specific pattern arose between the right CP, the right PLIC, and genes related to apoptosis and dendrite development. The pattern shared between ADHD and TD was more widespread, including left projection and right association tracts associated with genes involved in basic neural system development. Interestingly, while the left CP and PLIC exhibited shared transdiagnostic association patterns, their right homologs were uniquely mapped to ADHD-specific genetic loading profiles. This hemispheric asymmetry aligns with emerging evidence of disrupted structural lateralization in ADHD neurodevelopment. A recent meta-analysis identified increased asymmetry of grey matter volume (GMV) in frontal, subcortical, and cerebellar areas in individuals with ADHD; most of the GM regions exhibited “leftward” shifts, except the superior frontal lobe [55]. The same changes were observed in the surface area of the frontal lobe [56]. WM asymmetry alterations have been consistently reported across multiple diffusion metrics, showing leftward deviations in the frontal-striatal circuit and the corticospinal tract [57,58,59]. The current results point to a potential genetic basis for the “leftward” alterations in ADHD and provide further insight into this issue.

In this study, we found different association patterns between WM and genetic variants in ADHD cases and the general population. On the one hand, by interpreting the AA-DCCA model, we found an interaction pattern in which the associations between genes and WM differed between the ADHD and TD groups. The brain features were mostly contributed by a cluster located within the right CP and the right PLIC. This cluster lies mainly in the projection tract connecting the right cerebellum with the bilateral supplementary motor area and the right superior frontal gyrus, regions that have been related to ADHD in previous literature [60, 61]. The prefrontal regions in particular provide “top-down” regulation of attention, inhibition/cognitive control, and motivation through key neural networks, including dopaminergic circuits, reward processing networks, and the default-mode network (DMN) [62, 63]. Consistent with previous evidence, our findings also highlight the frontal-striatum-cerebellum circuit as susceptible to genetic changes.

The top genes tended to be involved in apoptotic processes (CAMK1D, METTL15, and MAP2K4). Enrichment analyses based on biological function terms also detected this tendency (R-HSA-140432: apoptosis induced DNA fragmentation; Supplementary Materials) and further suggested roles in neurogenesis and glial cell development. Moreover, the p-PRS results indicated that the genetic loadings of dendrite development were associated with the AA cluster in different ways in the ADHD and TD groups, which was also implied by the top contributing gene CAMK1D (Table 1). In brief, our results suggest that both apoptotic and dendrite-related processes may affect the white matter in the right CP and PLIC in ADHD-specific ways.

On the other hand, we identified patterns shared between individuals with and without ADHD using the SA-DCCA model. Unsurprisingly, a wide range of WM tracts were identified as contributors. We identified six clusters distributed across both hemispheres. Four clusters in the left hemisphere involved the left CP and extended to the left PLIC. The other two clusters were located on the right side, involving the right cerebellum, the right SLF, and the right PTR extending to the right SS. Tractography suggested that the clusters connected a wide range of grey matter areas, including the five lobes and the bilateral cerebellum. Genetic analyses highlighted genes involved in more basic neurodevelopmental processes and synaptic function. One example is FYN, which is responsible for several important neural system developmental processes and for synaptic plasticity [64, 65]. Three zinc-finger protein family members, PHF2, ZSCAN31, and CD82, were also suggested. Zinc-finger proteins serve as DNA-binding transcription factors and regulate the development of the neural system [66]. Based on previous evidence, PHF2 and ZSCAN31 both regulate prenatal development [67, 68], including the proliferation of progenitors; CD82 is involved in oligodendrocyte precursor differentiation [69, 70]. Enrichment analyses suggested the involvement of exocytosis, a synaptic process that regulates impulse conduction. The p-PRS results further suggested that genetic loadings in synapse-related pathways regulate the SA WM regions. Unlike in the AA regions, no interaction effects were found here. In summary, the interpretation of the SA-DCCA model revealed a pattern shared among individuals with and without ADHD: interactions between widespread WM tracts, involving both projection and association tracts, and genes associated with more general and basic neurodevelopmental processes, including prenatal brain development and synaptic plasticity.

Conclusion

The current study introduced a set of adversarial DCCA models, which identified ADHD-specific and non-specific non-linear multivariate gene-brain association patterns. Correlations between the right CP and PLIC and the genes responsible for apoptotic and dendrite development processes were found to differ between ADHD and TD individuals, whereas the interactions between widespread WM regions, involving bilateral projection and association tracts, and the genes involved in early neural development were shared between the two diagnostic groups. In conclusion, the current results shed light on the etiological pathways of white matter development in ADHD and identified ADHD-specific gene-brain associations.