Abstract
The corpus callosum (CC) is the largest set of white matter fibers connecting the two hemispheres of the brain. In humans, it is essential for coordinating sensorimotor responses and performing associative or executive functions. Identifying which genetic variants underpin CC morphometry can provide molecular insights into the CC’s role in mediating cognitive processes. We developed and used an artificial intelligence based tool to extract the midsagittal CC’s total and regional area and thickness in two large public datasets. We performed a genome-wide association study (GWAS) meta-analysis of European participants (combined N = 46,685) with generalization to the non-European participants (combined N = 7040). Post-GWAS analyses implicated prenatal intracellular organization and cell growth patterns, and high heritability in regions of open chromatin. Results suggest programmed cell death mediated by the immune system drives the thinning of the posterior body and isthmus. Genetic overlap, and causal genetic liability, between the CC, cerebral cortex features, and neuropsychiatric disorders such as attention-deficit/hyperactivity, bipolar disorders, and Parkinson’s disease were identified.
Introduction
The corpus callosum (CC) is the largest white matter tract in the human brain, facilitating higher order functions of the cerebral cortex by allowing the two hemispheres of the brain to communicate1,2. This connection is essential for coordinating sensorimotor responses, performing associative and executive functions, and representing information in multiple dimensions3,4. Most CC fibers connect corresponding left and right cortical regions of the brain, with the organization, development of axonal elongation, and myelination of callosal fibers being correlated with the rostro-caudal (front-to-back) distribution of functional areas5,6. Regional alterations in CC shape are easily assessed with neuroimaging studies, which have found local callosal abnormalities in complex neurodevelopmental and neuropsychiatric disorders6,7,8,9,10,11, such as, on average, lower anterior volumes in people with autism spectrum disorder12 and lower posterior thickness in individuals with bipolar disorder13. Twin studies show up to 66% heritability for CC area14,15, and previous single-cohort studies of genetic influences on CC volume and its relationship to neuropsychiatric disorders have found heritability estimates between 22% and 39%16,17. Yet, the interplay between genetic variants influencing CC morphometry, the cerebral cortex, and associated neuropsychiatric disorders is not well understood.
Three-dimensional (3D) magnetic resonance imaging (MRI) provides a non-invasive approach to quantify individual variations in brain regions and their connections6, including the morphometry of the CC, and how they are associated with brain-based traits and diseases. The midsagittal section of an anatomical brain MRI scan is able to capture the entire rostro-caudal formation of the CC, which is almost always in the field of view of 2D clinical and 3D research MRI scans alike. This 2D midsagittal representation can be segmented to offer a lower dimensional projection of the anatomical intricacies of the CC, allowing for structural measures of CC area and thickness to be computed18,19,20. We developed and validated a fully automated artificial intelligence based CC feature extraction tool, Segment, Measure, and AutoQC the midsagittal CC (SMACC), which we make publicly available at smacc20.
Using data from the UK Biobank21 (UKB) and Adolescent Brain Cognitive Development22 (ABCD) studies, here we present results from a genome-wide association study (GWAS) meta-analysis of total area and mean thickness of the CC derived using SMACC. We also present the results for five differentiated areas based on distinguishable projections to (1) prefrontal, premotor, and supplementary motor, (2) motor, (3) somatosensory, (4) posterior parietal and superior temporal, and (5) inferior temporal and occipital cortical brain regions23,24. These regions are believed to represent structural-functional coherence6. We performed a GWAS meta-analysis using two population-based cohorts, one of adolescents and another of older adults, to examine distinct genetic influences on CC area and thickness25,26. The principal analyses were in individuals of European ancestry and the same analyses were then repeated using the data from non-European participants to assess consistency in the magnitude and direction of effect sizes. Downstream post-GWAS analyses investigated the enrichment of genetic association signals in tissue types, cell types, brain regions, and biological pathways. We examined the genetic overlap at the global and local level, using LD Score regression (LDSC)27 and Local Analysis of Variant Association (LAVA)28, respectively, and the causal genetic relationships between CC phenotypes, cortical morphometry, and related neuropsychiatric conditions.
Results
Characterization of corpus callosum shape associated loci
We conducted a GWAS of area and mean thickness of the whole CC, and five regions of the Witelson parcellation scheme (Fig. 1)23,24, using data from participants of European ancestry from the UKB (N = 41,979) and ABCD cohorts (N = 4706). A meta-analysis of GWAS summary statistics of all CC derived metrics in UKB and ABCD was performed using METAL and the random-metal extension29,30, based on the DerSimonian-Laird random-effects model (Methods). To examine the generalizability of single-nucleotide polymorphism (SNP) effects across ancestries, these same analyses were run using data from non-European participants (total N = 7040).
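For readers unfamiliar with the DerSimonian-Laird random-effects model named above, the following is a minimal sketch of how per-cohort effect estimates for a single SNP could be combined. It is not the METAL or random-metal implementation; the function name `dersimonian_laird` and the input values are hypothetical and chosen only for illustration.

```python
import math

def dersimonian_laird(betas, ses):
    """Combine per-cohort SNP effects with a DerSimonian-Laird random-effects model."""
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / se**2 for se in ses]
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    # Cochran's Q measures heterogeneity between cohorts
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))
    k = len(betas)
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    # Method-of-moments estimate of between-cohort variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each cohort's sampling variance
    w_re = [1.0 / (se**2 + tau2) for se in ses]
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    z = beta_re / se_re
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from a normal Z
    return beta_re, se_re, tau2, p

# Hypothetical effects for one SNP in two cohorts (not values from this study)
beta, se, tau2, p = dersimonian_laird([4.0, 5.0], [0.5, 1.5])
```

When the cohorts agree closely, the between-cohort variance is estimated as zero and the result reduces to the fixed-effect inverse-variance estimate; larger heterogeneity widens the pooled standard error.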
An ideogram representing loci that influence total CC area, its mean thickness, and area and thickness of individual parcellations determined by the Witelson parcellation scheme in a rostral-caudal gradient (1–5). Results shown are from an inverse-weighted random-effects meta-analysis (DerSimonian-Laird method). Reported p-values are two-sided. All loci are significant at the Bonferroni corrected, experiment-wide threshold of p < 6.13 × 10−9. Created in part by using Biorender.com (agreement number UX28RS3P2L).
The GWAS meta-analysis identified 48 independent significant SNPs for total area and 18 independent SNPs for total mean thickness. Independent significant SNPs were determined in FUMA using the default threshold of r2 = 0.6, and genomic loci were determined at r2 = 0.1. This identified 28 genomic loci for total cross-sectional area, and 11 genomic loci for total mean thickness. All significant loci for total area and mean thickness showed concordance in the direction of effect between the two cohorts. There were 5 loci, all in intronic regions, each positionally mapped to genes31 that overlapped between area and mean thickness. These included IQCJ-SCHIP1 (multimolecular complexes of initial axon segments and nodes of Ranvier, and calcium mediated responses)32, FIP1L1 (RNA binding and protein kinase activity)33, HBEGF (growth factor activity and epidermal growth factor receptor binding)34, CDKN2B-AS1 (involved in the NF-κB signaling pathway with diverse roles in the nervous system)35,36, and FAM107B (cytoskeletal reorganization in neural cells and cell migration/expansion)37. The genomic locus mapped to IQCJ-SCHIP1 had a positive effect for total area (rs11717303, effect allele: C, effect allele frequency (EAF): 0.689, β = 4.28, s.e. = 0.51, p = 4.54 × 10−17). The same locus showed a negative effect for a different SNP on total thickness (rs12632564, effect allele: T, EAF: 0.305, β = −0.042, s.e. = 0.006, p = 2.59 × 10−12). The strongest locus for total area (rs7561572, effect allele: A, EAF: 0.532, β = −4.13, s.e. = 0.46, p = 1.98 × 10−18) was positionally mapped to the STRN gene. The strongest locus for mean thickness (rs4150211, effect allele: A, EAF: 0.265, β = −0.05, s.e. = 0.006, p = 8.20 × 10−18) was mapped to the HBEGF gene.
Loci for area overlapped between parcellations in a rostral-caudal gradient (1–5), such that: rs1122688 on the SHTN1 (or KIAA1598) gene (involved in positive regulation of neuron migration) overlapped between the genu (1) and anterior body (2); rs1268163 near the FOXO3 gene (involved in IL-9 signaling and FOXO-mediated transcription) overlapped between the posterior body (3) and isthmus (4); and rs11717303 on the IQCJ-SCHIP1 gene overlapped between the isthmus (4) and splenium (5). This gradient pattern was not observed for mean thickness. The strongest regional association was observed with splenium area (rs10901814, effect allele: C, EAF: 0.584, β = −1.69, s.e. = 0.16, p = 2.02 × 10−24) and thickness (rs11245344, effect allele: T, EAF: 0.570, β = −0.11, s.e. = 0.11, p = 6.28 × 10−22), both on the FAM53B gene. FAM53B is involved in the positive regulation of the canonical Wnt signaling pathway. The direction and magnitude of effects in the main European analyses were largely consistent with the results from the non-European participants. Of the 152 significant loci identified across CC phenotypes in this study, 124 (82%) had effect sizes in European participants falling within the 95% confidence interval of those seen in the non-European participants. Furthermore, 78 loci demonstrated a consistent direction of effect across all four cohorts (two European, from UKB and ABCD, and two non-European, from UKB and ABCD). Detailed annotations and regional association plots of all genomic loci, independent significant SNPs and genes are in Supplementary Data 1–4 and Supplementary Data 42.
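The two generalization checks described above (whether a European effect size falls inside the non-European 95% confidence interval, and whether the sign of the effect agrees across cohorts) can be sketched as follows. This is an illustrative sketch only: the function names and all effect sizes are hypothetical, and a normal-approximation interval (β ± 1.96 × s.e.) is assumed.

```python
def within_ci(beta_eur, beta_non_eur, se_non_eur, z=1.96):
    """True if the European effect lies inside the non-European 95% CI."""
    lower = beta_non_eur - z * se_non_eur
    upper = beta_non_eur + z * se_non_eur
    return lower <= beta_eur <= upper

def same_direction(betas):
    """True if every cohort-level effect has the same (non-zero) sign."""
    return all(b > 0 for b in betas) or all(b < 0 for b in betas)

# Hypothetical loci, for illustration only (not values from this study)
loci = [
    {"eur": -4.1, "non_eur": -3.0, "se_non_eur": 1.2},
    {"eur": 2.0, "non_eur": -1.0, "se_non_eur": 0.9},
]
n_consistent = sum(
    within_ci(locus["eur"], locus["non_eur"], locus["se_non_eur"]) for locus in loci
)
```

Because non-European samples are smaller, their confidence intervals are wider, so this check is more permissive than a direct test of effect-size equality.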
To test for the influence of total intracranial volume (ICV) on the GWAS, another set of GWASes was run controlling for ICV (Methods, Supplementary Data 37). The overlap coefficients and genetic correlations between the two sets of analyses were, respectively, 0.64 and 0.75 (s.e. = 0.03) for total area and 1 and 1.02 (s.e. = 0.008) for total thickness, indicating a high degree of overlap for both traits. Near perfect genetic correlations were observed across all CC traits, except splenium area (rg = 0.08, s.e. = 0.05), which may be driven by collider bias38,39 (Supplementary Data 38). Overall, the strongest enrichment signals were observed among genes that were significant in both the ICV-adjusted and non-ICV GWAS, as well as genes uniquely significant in the non-ICV GWAS, compared to genes significant only in the GWAS that included ICV as a covariate. Genes mapped to significant loci common to both GWAS sets were consistently mapped to canonical signaling pathways, including PI3K/AKT, PDGFR, and estrogen signaling, suggesting ICV-insensitive mechanisms involved in CC development and maintenance. Genes unique to the no ICV GWAS showed enrichment for mitochondrial respiration, oxidative stress response, and integrin signaling. Genes specific to the ICV-controlled GWAS exhibited much lower enrichment. Regionally specific enrichments were observed for area traits, including WDR5-mediated epigenetic regulation in the genu and apoptosis, as implicated by previous analyses, in the isthmus (Supplementary Data 39).
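The text does not define the overlap coefficient used here. Assuming it is the standard Szymkiewicz-Simpson overlap coefficient computed on the sets of significant loci or genes from the two GWAS sets, a minimal sketch looks like this. The function name and the placeholder identifiers are hypothetical.

```python
def overlap_coefficient(set_a, set_b):
    """Szymkiewicz-Simpson overlap: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Placeholder identifiers, for illustration only
without_icv = {"g1", "g2", "g3", "g4"}
with_icv = {"g2", "g3"}
coef = overlap_coefficient(without_icv, with_icv)  # smaller set fully contained
```

Under this definition a value of 1 means that the smaller set is entirely contained in the larger one, not that the two sets are identical.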
SNP heritability and genetic correlation between cohorts
Moderate to high genetic correlations were seen across CC phenotypes between cohorts using LDSC, with rg ranging from 0.54 (s.e. = 0.27) to 0.92 (s.e. = 0.63) for area metrics, and from 0.30 (s.e. = 0.16) to 0.99 (s.e. = 0.69) for thickness metrics. To complement the LDSC approach with an approach using individual-level data, we used bivariate GREML in GCTA40. Moderate genetic correlations between cohorts were seen using bivariate GCTA, with rg ranging between 0.40 (s.e. = 0.04) and 0.49 (s.e. = 0.03) across all traits. Age-related variability in white matter likely contributes to some of the lower correlation agreements between cohorts, as white matter volume tends to increase through childhood and adolescence, peak in early adulthood, and then gradually decline from middle age onward41. For instance, certain genetic variants might exert a stronger influence on CC structure during periods of white matter growth (as in the younger ABCD cohort) compared to periods of white matter decline (as in the older UKB cohort). The smaller sample size of the ABCD cohort may limit LDSC’s ability to detect polygenic effects, capturing primarily the strongest genetic signals42. However, strong cross-cohort correlations for total area and isthmus thickness phenotypes suggest that genetic variants affecting these traits are likely consistent across developmental stages. SNP heritability (h2SNP) and genetic correlations between cohorts were estimated43,44. Within the UKB, heritability values ranged for different CC phenotypes from 0.42 to 0.71, with similar results seen in the ABCD cohort (Supplementary Data 5–8). Total area (UKB h2SNP = 0.72, s.e. = 0.01; ABCD h2SNP = 0.74, s.e. = 0.03) and mean thickness (UKB h2SNP = 0.61, s.e. = 0.02; ABCD h2SNP = 0.78, s.e. = 0.02) showed the highest h2SNP across both cohorts. LDSC27 h2SNP estimates from the meta-analysis ranged between 0.10 (s.e. = 0.01) and 0.18 (s.e. = 0.05) for area, and between 0.12 (s.e. = 0.01) and 0.16 (s.e. = 0.02) for thickness, with the area of the genu showing the highest, and area of the splenium showing the lowest h2SNP estimates. As shown in Supplementary Data 5–8, all LDSC and GCTA rG estimates between meta-analyzed CC phenotypes were significant.
Gene-mapping and gene-set enrichment analyses
Gene-based association analysis in MAGMA45 identified 30 genes for the total area, and 34 genes for total mean thickness of the CC, with 5 genes overlapping between area and thickness (IQCJ-SCHIP1, IQCJ, BPTF, PADI2, CHIC2). The strongest association seen with area was AC007382.1 and the strongest association with mean thickness was HBEGF (Fig. 2a). There were between 15 and 31 genes for area, and between 7 and 25 genes for thickness identified within regions of the CC. Notably, IQCJ, IQCJ-SCHIP1, and STRN overlapped for all parcellations of CC area. AC007382.1 overlapped for four out of five parcellations, and STRN and PARP10 overlapped for three out of five parcellations of CC thickness (Fig. 2b, Supplementary Data 1–4). Enrichment of SNP heritability in 53 functional categories for each trait was determined via LDSC46. The majority of enrichment and the strongest effects across parcellations of the CC were observed in categories related to gene regulation/transcription in chromatin (Fig. 3a, b).
a Miami plot for SNPs (top) and genes (bottom) based on MAGMA gene analysis for total area and total mean thickness. b Miami plot for SNPs (top) and genes (bottom) based on MAGMA gene analysis for area and thickness of the CC split by the Witelson parcellation scheme23. Results shown on the upper panels of (a) and (b) are from an inverse-weighted random-effects meta-analysis (DerSimonian-Laird method). Reported -log10(p-values) are two-sided. All loci are significant at the Bonferroni corrected, experiment-wide threshold of p < 6.13 × 10−9. Results shown on the lower panels of (a) and (b) are from the MAGMA gene-based analysis. Reported -log10(p-values) are two-sided from the Z-statistic. All significant genes are shown at the Bonferroni corrected threshold of p < 2.74 × 10−6. Significant SNPs and genes are color-coded by CC traits. Created in part by using Biorender.com (agreement number PT28RS3SIJ).
a Significant enrichment of SNP heritability across 53 functional categories computed by LD Score regression for area (left) and mean thickness (right). Analyses were completed using the meta-analyzed GWAS summary statistics (N = 46,685). Data are presented as mean values +/− s.e. b Proportion of GWAS SNPs in each functional category from ANNOVAR across each CC phenotype. c Significant gene-sets across CC phenotypes computed via MAGMA gene-set analysis using the equivalent of a one-sided two-sample t-test at the Bonferroni corrected threshold of 3.23 × 10−6. GOBP Gene-Ontology Biological Processes, GOCC Gene-Ontology Cellular Components.
Gene-set enrichment analyses were also completed in MAGMA (Fig. 3c). The strongest effects of significant gene sets included those involved in postsynaptic specialization for total CC area, including GO:009901 (postsynaptic specialization, intracellular component) and GO:009902 (postsynaptic density, intracellular component). A theme of signal transduction-related pathways was observed for the splenium area, including R-HSA-6785631 (ERBB2 regulates cell motility) and R-HSA-8857538 (PTK6 promotes HIF1A stabilization). Enrichment of the “CARM1 and regulation of the estrogen receptor” gene set, which is implicated in transcriptional regulation via histone modifications, was found for the posterior body thickness. Enrichment of GO:1904714 (regulation of chaperone-mediated autophagy), which is implicated in lysosomal-mediated protein degradation, was found for the isthmus area. All significant results across all CC phenotypes are in Supplementary Data 18.
Tissue-specific and cell-type-specific expression of corpus callosum associated genes
Gene-property enrichment analyses were completed in MAGMA with 54 tissue types from GTEx v8 and BrainSpan47,48, which includes 29 samples from individuals representing 29 different ages, as well as 11 general developmental stages. Genes associated with isthmus thickness showed enriched expression in the cerebellum (p(Bon) = 0.017). Genes associated with area and thickness across parcellations of the CC showed enriched expression in the brain from early prenatal to late mid-prenatal developmental stages. Enriched expression of genes associated with area and thickness of the anterior body of the CC was observed in brain tissue prenatally, 9–24 weeks post conception. Enriched expression of genes associated with area of the genu, and of genes associated with the total mean thickness of the CC, was observed in brain tissue 19 weeks post conception. All results are shown in Supplementary Data 19–21. These results, along with the gene-sets involved in histone modifications, were supported by LDSC-SEG analyses using chromatin-based annotations from narrow peaks49, which showed a significant enrichment of heritability for total CC thickness in variants located in DNase peaks in the female fetal brain (p(Bon) = 0.0105). Chromatin annotations showed a consistent and significant enrichment of splenium area and thickness-associated variants in histone marks of the fetal brain and neurospheres (Supplementary Data 25).
Using microarray data from 292 immune cell types, the area of the posterior body showed a significant enrichment in the heritability by variants located in genes specifically expressed in multiple types of myeloid cells (p(Bon) < 0.05), and the area of the isthmus showed enrichment in innate lymphocytes (p(Bon) = 0.047). This further validates the aforementioned significant locus on gene FOXO3, which overlapped between the posterior body and isthmus (Supplementary Data 26).
Cell-type-specific analyses were performed in FUMA using data from 13 single-cell RNA sequencing datasets from the human brain. This tests the relationship between cell-specific gene expression profiles and phenotype-gene associations50. Of the 12 phenotypes tested, only total CC thickness showed significant results after going through the 3-step process using conditional analyses to avoid bias from batch effects from multiple scRNA-seq datasets. The most significant association was seen with oligodendrocytes located in the middle temporal gyrus (MTG, p(Bon) = 0.001) from the Allen Human Brain Atlas (AHBA). Oligodendrocytes (p(Bon) = 0.03) and non-neuronal cells (p(Bon) = 0.03) located in the lateral geniculate nucleus (LGN) from the AHBA also showed significant associations but were collinear (Supplementary Data 22).
LAVA-TWAS analyses28,51 (Fig. 4) of expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) of protein-coding genes in 16 different brain, cell type, and whole blood tissues revealed the strongest eQTL associations of area and thickness with CROCC expression in whole blood for the isthmus (ρ = −0.53, p = 1.29 × 10−10). Other notable eQTL findings (Supplementary Data 29) included total CC area and isthmus area and thickness being positively associated with ATP13A2 expression in fibroblasts (ρ = 0.48, p = 1.58 × 10−7). The strongest sQTL association was a positive association observed with KANSL1 (cluster 11710) in fibroblasts for genu area (ρ = 0.83, p = 1.46 × 10−14); fibroblasts were the tissue type where most observed associations occurred across CC phenotypes (Supplementary Data 30). Moreover, a negative association was observed for KANSL1 (cluster 11707) in fibroblasts for the genu area (ρ = 0.82, p = 3.11 × 10−7). An sQTL in MFSD13A (cluster 7894) in the anterior cingulate showed very strong yet opposite associations for total CC thickness (ρ = 0.42, p = 1.12 × 10−13) and total CC area (ρ = −0.44, p = 2.98 × 10−11). Other notable findings across tissue types included CRHR1 in the cortex, nucleus accumbens, and putamen, as well as UGP2 in fibroblasts, whole blood, and the putamen. No significant results from LAVA-TWAS gene-set enrichment analyses were observed after Bonferroni correction (Supplementary Data 31–32).
Results of local genetic correlations between CC traits and eQTLs and sQTLs from GTEx v8 using the LAVA-TWAS framework. All significant points are colored by tissue type and labeled by CC trait. Significance was tested as a two-sided t-test statistic. Significance thresholds for eQTLs (p < 2.01 × 10−6) and sQTLs (p < 5.45 × 10−7) were determined by Bonferroni correction. Associations between (a) CC area and eQTLs, (b) CC thickness and eQTLs, (c) CC area and sQTLs, and (d) CC thickness and sQTLs are shown via -log10p values scaled by the direction of association (y-axis) and chromosomal location (x-axis).
Genetic overlap of corpus callosum and cerebral cortex architecture
Broadly, we observed negative genetic correlations of CC area and thickness with cortical thickness across regions of the cingulate cortex, but positive genetic correlations with cortical thickness of regions across the neocortex (Fig. 5a). Specifically, we observed a significant negative genetic correlation of total area with cortical thickness of the rostral anterior cingulate (rg = −0.35, s.e. = 0.06) and posterior cingulate (rg = −0.28, s.e. = 0.06). Mean thickness had a negative genetic correlation with cortical thickness of the rostral anterior cingulate (rg = −0.29, s.e. = 0.06) and posterior cingulate (rg = −0.23, s.e. = 0.05). Positive genetic correlations were observed with cortical thickness of the lingual gyrus (rg = 0.26, s.e. = 0.05) and cuneus (rg = 0.27, s.e. = 0.06). When parcellating by the Witelson scheme23, negative genetic correlations were observed for area and mean thickness with cortical thickness of regions across the cortex and the cingulate, but positive genetic correlations with regions in the occipital lobe. We also observed a significant negative genetic correlation of total area of the CC with surface area of the precuneus (rg = −0.20, s.e. = 0.04) (Supplementary Data 9–10).
a Global genetic correlations (LDSC - rG) between CC phenotypes and cerebral cortex phenotypes. Significance was based on a two-sided Z-statistic. The Bonferroni significance threshold was set at p = 6.1 × 10−5. Surface area and cortical thickness of significant cortical regions with each CC phenotype are displayed on brain plots. b Of the significant global genetic correlations, significant Mendelian randomization (GSMR) results are displayed, representing the effect of CC phenotypes on cortical phenotypes free of non-genetic confounders. Significance was determined using the two-sided t-statistic calculated within GSMR. The number of SNPs used in GSMR were N = 26, N = 18, and N = 10 for the precuneus, rostral anterior cingulate, and posterior cingulate respectively. Data are presented as beta values +/- s.e. c Chord plot displaying the number of significant bivariate local genetic correlations (LAVA) between CC and cortical phenotypes. Underlined numbers represent the total number of genes shared with that phenotype. d Volcano plots showing degree (-log10 p-values) and direction (rG) of local genetic correlations (LAVA) between cortical and CC phenotypes. Significance was tested as a two-sided t-test statistic. Colors represent cortical regions labeled on the chord plot in section C. Significant genes (Bonferroni significance threshold was set at p = 2.18 × 10−6) across all phenotypes are labeled.
Genetic correlations can reflect direct causation, pleiotropy, or genetic mediation. To explore potential causal relationships between CC phenotypes and morphometry of the cerebral cortex, we ran Generalized Summary-data-based Mendelian Randomization (GSMR) analyses52. These showed a directional effect of CC phenotypes on morphometry of the cerebral cortex, but not vice versa (Fig. 5b, Supplementary Data 14). There was a strong negative unidirectional effect of total CC area on the precuneus surface area (bxy = −0.50, s.e. = 0.13, p = 0.0002), implying that a greater total area and thickness of the CC results in a lower surface area of the precuneus. There was also a negative unidirectional effect of total CC mean thickness on cortical thickness of the posterior cingulate (bxy = −0.02, s.e. = 0.008, p = 0.02), but not vice versa. When using the Witelson parcellation scheme, there was a strong negative unidirectional effect of the area of the genu on the cortical thickness of the rostral anterior cingulate (bxy = −0.001, s.e. = 0.0003, p = 0.003).
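To make the bxy quantity more concrete, the sketch below computes a simple inverse-variance-weighted (IVW) Mendelian randomization estimate from per-SNP summary statistics. This is a deliberately simpler estimator than GSMR, which additionally accounts for linkage disequilibrium between instruments and removes pleiotropic outliers; the function name `ivw_mr` and all input values are hypothetical.

```python
import math

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted MR estimate of the exposure -> outcome effect."""
    # Weighted regression of outcome effects on exposure effects through the origin,
    # with weights 1 / se_outcome^2
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    b_xy = num / den
    se_xy = math.sqrt(1.0 / den)
    return b_xy, se_xy

# Hypothetical instrument effects on the exposure (e.g. a CC trait) and the outcome
bx = [0.10, 0.20, 0.30]
by = [-0.05, -0.10, -0.15]
se_by = [0.01, 0.02, 0.01]
b_xy, se_xy = ivw_mr(bx, by, se_by)
```

A unidirectional effect of the kind reported above corresponds to a significant estimate in the forward direction and a non-significant estimate when the roles of exposure and outcome are swapped, using instruments selected for the other trait.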
Local genetic correlations of area phenotypes of the CC and surface area of the cerebral cortex with LAVA28 showed many significant negative correlations in genes between the total area and posterior body and the precuneus SA along the 2p22.2 cytogenetic band (QPCT, PRKD3, SULT6B1, NDUFAF7, EIF2AK2, HEATR5B, GPATCH11, CEBPZ, CEBPZOS, CDC42EP3, STRN, VIT) (Fig. 5c, d). Negative genetic correlations between total CC area and caudal middle frontal gyrus SA in 5 genes along the 17q24.2 cytogenetic band (HELZ, PSMD12, PITPNC1, ARSG, BPTF) were also observed. Positive local genetic correlations along the 2p22.2 cytogenetic band were observed with the anterior body area and the surface area of the posterior cingulate (CDC42EP3, PRKD3), as well as the total area of the CC and precentral gyrus surface area (HEATR5B).
Many negative local genetic correlations were observed with mean thickness of the splenium and cortical thickness of the superior parietal gyrus (TEX36, EDRF1, UROS, BCCIP, DHX32) and the parahippocampal gyrus (ZNF879) along the 10q26.13–10q26.2 cytogenetic bands, while positive genetic correlations were observed with isthmus cingulate cortical thickness along the 10q26.13–10q26.2 cytogenetic bands (EDRF1, TEX36, UROS, BCCIP, DHX32, CTBP2, CPXM2, GPR26, ZRANB1, FAM53B).
The area of the posterior body showed a negative local genetic correlation with the cortical thickness of the pericalcarine gyrus (GPATCH11). The area of the isthmus showed positive local genetic correlations with the cortical thickness of the superior parietal gyrus (LRRC73), caudal middle frontal gyrus (GPATCH2L), and isthmus cingulate (PLPPR3, CFD, R3HDM4, PTBP1, ELANE, MED16, PALM) along the 19p13.3 cytogenetic band.
The mean thickness of the posterior body showed negative local genetic correlations with the surface area of the lingual gyrus (STC2, NKX2-5, 5q35.2) and pericalcarine gyrus (NKX2-5). Mean thickness of the isthmus showed negative local genetic correlations with the precuneus (EIF2AK2, GPATCH11, 2p22.2) and superior frontal gyrus (TBX19) surface area. Total mean thickness of the CC showed a positive genetic correlation with the surface area of the insula (PDZRN3). The mean thickness of the anterior body showed positive local genetic correlations with surface area of the superior parietal gyrus (RETN, FCER2). Splenium mean thickness showed positive genetic correlations with inferior temporal gyrus surface area (ZNF318, CRIP3, SLC22A7) along the 6p21.1 cytogenetic band.
Genetic overlap of corpus callosum and associated neuropsychiatric phenotypes
We observed a significant genetic correlation (Fig. 6a, Supplementary Data 11) between total CC area and ADHD (rg = −0.11, s.e. = 0.03), bipolar disorder (BD, rg = −0.10, s.e. = 0.03), and bipolar I disorder (BD-I, rg = −0.10, s.e. = 0.03). Total mean thickness was genetically correlated with BD (rg = −0.10, s.e. = 0.03) and BD-I (rg = −0.10, s.e. = 0.03). When analyzing the regional Witelson parcellations23, the area of the genu was genetically correlated with ADHD risk (rg = −0.13, s.e. = 0.03), and the mean thickness of the splenium was genetically correlated with risk for BD (rg = −0.13, s.e. = 0.03) and BD-I (rg = −0.12, s.e. = 0.03).
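As a worked check of how a reported genetic correlation and its standard error relate to the two-sided Z-statistic and the Bonferroni threshold given in the Fig. 6a caption (p = 0.0015), the sketch below uses the total CC area and ADHD estimate from the text. The function name is hypothetical, and a normal approximation for the test statistic is assumed.

```python
import math

def rg_two_sided_p(rg, se):
    """Z-statistic and two-sided normal p-value for a genetic correlation."""
    z = rg / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Total CC area vs ADHD, values taken from the text above
z, p = rg_two_sided_p(-0.11, 0.03)
significant = p < 0.0015  # Bonferroni threshold stated in the Fig. 6a caption
```

Because the reported estimates are rounded to two decimal places, p-values recomputed this way are approximate.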
a Global genetic correlations between CC traits and neuropsychiatric phenotypes. Significance was based on a two-sided Z-statistic. Significant results are designated by the * at the Bonferroni significance threshold of p = 0.0015. Significant negative genetic correlations are observed between total and splenium thickness, and bipolar disorder (I). Significant negative genetic correlations are also observed with CC area phenotypes and ADHD. Of the significant global genetic correlations, significant Mendelian randomization (GSMR) results are displayed, representing the effect of CC phenotypes on neuropsychiatric phenotypes free of non-genetic confounders. The number of SNPs used in the GSMR analysis were N = 29 for BD on total mean thickness, N = 26 for BD-I on total mean thickness, N = 11 for total mean thickness on BD, N = 25 for BD-I on splenium mean thickness, and N = 11 for total mean thickness on BD-I. Data are presented as beta values +/− s.e. b Volcano plots showing degree (-log10 p-values) and direction (rG) of local genetic correlations (LAVA) between neuropsychiatric and CC phenotypes. Significant local negative genetic correlations on the STH, KANSL1, SPPL2C, CRHR1, and MAPT genes are observed between the genu area and neuroticism. Significant local positive genetic correlations on the MAPT, SPPL2C, STH, CRHR1, ARHGAP27, KANSL1, PLEKHM1, MAP3K14, and DCAKD genes are observed between the genu area and Parkinson’s disease. A significant positive local genetic correlation is observed on the CEP170 gene between total mean thickness and Parkinson’s disease. Significance was tested as a two-sided t-test statistic. Phenotypes with significant associations are colored (IQ and bipolar II disorder). Significant genes (Bonferroni significance threshold was set at p = 2.79 × 10−6) across all neuropsychiatric phenotypes are shown.
AD Alzheimer’s disease, ADHD attention deficit hyperactivity disorder, ASD autism spectrum disorder, BD bipolar disorder, BD-I bipolar I disorder, BD-II bipolar II disorder, COPC chronic overlapping pain conditions, IQ intelligence quotient, OCD obsessive-compulsive disorder, PD Parkinson’s disease, PTSD post-traumatic stress disorder, SCZ schizophrenia.
GSMR analyses showed causal bidirectionality of genetic liability of BD (bxy = −0.06, s.e. = 0.02, p = 0.006) and BD-I (bxy = −0.05, s.e. = 0.02, p = 0.003) on total mean thickness of the CC, and mean thickness of the CC on BD (bxy = −0.19, s.e. = 0.08, p = 0.01) and BD-I (bxy = −0.23, s.e. = 0.09, p = 0.02). When using the Witelson parcellation23, GSMR analyses showed causal directionality of genetic liability of BD-I on mean thickness of the splenium (bxy = −0.09, s.e. = 0.04, p = 0.01), but not vice versa (Fig. 6a, Supplementary Data 15).
Local genetic correlations with LAVA28 (Fig. 6b, Supplementary Data 17) showed 5 negative local genetic correlations between area of the genu and neuroticism on the 17q21.31 cytogenetic band (STH, KANSL1, SPPL2C, CRHR1, and MAPT), and 9 positive local genetic correlations between anterior body area and Parkinson’s disease (PD) on the 17q21.31 cytogenetic band (MAPT, SPPL2C, STH, CRHR1, ARHGAP27, KANSL1, PLEKHM1, MAP3K14 and DCAKD). One positive local genetic correlation was also observed between total mean thickness of the CC and PD (CEP170).
Discussion
We conducted a GWAS meta-analysis of CC area and thickness across two cohorts with vastly different age ranges, leveraging our artificial intelligence-based tool, SMACC, to extract detailed CC phenotypes from 46,685 individuals across the UKB and ABCD studies. While prior research into the genetic basis of CC structure and development has primarily relied on candidate gene approaches in animal models and post-mortem human studies, our work addresses the notable differences between the human CC and its counterparts in animal models6. This study offers genome-wide insights into the genetic architecture of the human CC in vivo, significantly advancing our understanding of its development and variation. Previous GWAS efforts focused on total and parcellated CC volume using FreeSurfer-derived measures53 solely in the UKB cohort16,17. Here, we expand on the well-replicated finding that area and thickness measures of neuroimaging phenotypes have distinct genetic influences25. Our findings strongly support this distinction, as our meta-analysis revealed zero overlapping significant loci between the area and thickness phenotypes of the CC. This underscores the value of separating these metrics to identify unique genetic contributions to CC morphometry. Furthermore, while previous studies have reported genetic correlations between CC FA and volume with neuropsychiatric traits such as bipolar disorder and ADHD16,17, our investigation extends these findings by exploring the specific genetic influences on CC area and thickness. This approach enables a deeper understanding of the mechanistic underpinnings behind these associations. Notably, we identified localized, distinct genetic relationships between CC morphometry and traits like neuroticism and PD - associations that had not been previously reported.
To better understand the stability of genetic influences on CC morphometry across development, we estimated the genetic correlation of each trait between the UKB (adult) and ABCD (adolescent) cohorts. While initial LDSC-based estimates yielded low/imprecise genetic correlations - likely due to low sample size in ABCD - we complemented this with bivariate GREML using individual-level data via GCTA40. GCTA provides more precise estimates by modeling genetic covariance between traits directly and is more robust to smaller sample sizes54. We observed consistent and significant genetic correlations across all CC traits (rg ≈ 0.40–0.49, Supplementary Data 6), suggesting a partially shared genetic architecture of the CC across the lifespan. While some genetic effects are stable from childhood into adulthood, others may be population specific. The differences could reflect changes in gene expression, shifts in variant effect sizes, or age-dependent neurobiological processes (e.g., neural migration in early life vs. myelination in adulthood)55. Environmental and cohort-specific influences are likely to modulate these effects. Future GWASes using a larger sample size of individuals across the lifespan will be crucial to dissect developmental effects and genetic pathways of CC morphometry.
We show that the genetic architecture of the CC is highly polygenic, and specific genetic variants influence CC subregions along a rostral-caudal gradient. Five loci that were positionally mapped to genes were identified to influence both total area and mean thickness of the CC (IQCJ-SCHIP1, FIP1L1, HBEGF, CDKN2B-AS1, and FAM107B). IQCJ-SCHIP1 had the strongest effect across total area and mean thickness, implicating mechanisms such as conduction of action potentials in myelinated cells via organizing molecular complexes at the nodes of Ranvier and axon initial segments, calcium-mediated responses, as well as axon outgrowth and guidance56. The strongest locus for total area was mapped to the STRN gene. STRN has been heavily implicated in the Wnt signaling pathway, which controls the expression of genes that are essential for cell proliferation, survival, differentiation, and migration via transcription factors57,58,59. The HBEGF gene was the strongest locus for total mean thickness, implicating mechanisms in early development. HBEGF expression is localized in the ventricular zone and cortical layers during development60, and has been implicated in regulating cell migration via chemoattractive mechanisms60. Significant enrichment of heritability of total mean thickness in various histone marks from chromatin data (ATAC-seq) of the fetal brain and cortex derived primary cultured neurospheres, significant tissue expression in the brain 19-weeks post conception, as well as enrichment of gene sets involving regulation of histone modification, suggests genetic variants in regions of open chromatin and transcriptional activity regulation in early development are key mechanisms underlying CC morphometry. When histones are acetylated, the positive charge on their lysine residues is neutralized, which weakens their electrostatic attraction to the negatively charged DNA.
This loosening of the DNA-histone complex makes it easier for transcription factors to access the DNA and initiate transcription61.
Parcellation of the CC into the five regions defined by the Witelson scheme allowed for further refinement and genetic understanding of its morphometry in a rostral-caudal gradient. Our results provide insight as to which molecular mechanisms influence this functionally defined gradient (i.e., prefrontal, premotor/supplementary motor, primary motor, primary sensory, and parietal/temporal/occipital)24. An overlap of genetic loci along the most anterior (genu and anterior body, SHTN1) and most posterior (isthmus and splenium, IQCJ-SCHIP1) regions of the CC, along with splenium heritability enrichment of histone chromatin marks of the fetal brain and dorsolateral prefrontal cortex, implicates regulation of neuron migration and action potential conduction. But the overlap of the FOXO3 along the area of the posterior body and isthmus implicates IL-9 signaling and FOXO-mediated transcription responsible for triggering apoptosis62. Only the posterior body and isthmus showed heritability enrichment in immune cells, including myeloid cells and innate lymphocytes. The thinning of the CC (along the posterior body and isthmus) occurs in a functional gradient connecting the somatosensory and parietal association areas of the brain6,63,64. Such thinning of the posterior body and isthmus aligns with activity dependent pruning by functional area6, where somatosensory circuits are pruned in early development in an experience dependent context65. As immune cells are increasingly being recognized as key players in brain maturation and neurodevelopment66, our results suggest IL-9 mediating a neuroprotective effect in the CC during the cell dieback phase66,67, and may play a significant role in posterior CC morphometry. LAVA-TWAS results showed another potential mechanism of isthmus pruning via expression of ATP13A2 in fibroblasts, and splicing of genes involved in NF-κB signaling68. 
ATP13A2 is involved in lysosomal-mediated apoptosis69, suggesting such regulation of fibroblast mediated growth of callosal projections70. This hypothesis is also supported by the current discovery of enrichment of genes related to isthmus area in the “regulation of chaperone mediated autophagy pathway”, which may influence isthmus morphometry.
The topographic organization of the CC correlates with the homotopic bilateral regions of the cortex it is known to connect5. A variety of genetically regulated principal mechanisms influence CC neuronal and glial proliferation, neuronal migration and specification, midline patterning, axonal growth and guidance, and post-guidance refinement to homotopic analogs in the cortex71,72. Our results suggest potential genetic mechanisms contributing to callosal-cortical organization. We show an overall negative global genetic correlation of CC phenotypes with the cortical thickness of the cingulate and surface area of the posterior parietal cortices, including a unidirectional negative effect of genu area on rostral anterior cingulate thickness, and total area on precuneus surface area free of any non-genetic confounders. Positive global genetic correlations of total CC area and splenium thickness with cortical thickness in the occipital cortex were also observed. Local genetic correlations of the CC were observed throughout the cerebral cortex, most pronounced with total CC area and splenium thickness. Notable findings included numerous genes in the chr2p22 cytogenetic band showing negative correlations between total CC and posterior body area with precuneus surface area, including the significant STRN gene observed across all CC phenotypes, further implicating the Wnt signaling pathway and dendritic calcium signaling in the context of neurodevelopment73,74. Within this cytogenetic band, HEATR5B was also positively genetically associated with precentral gyrus surface area. Opposing genetic effects were observed between splenium thickness with isthmus cingulate thickness (i.e. positive) vs. superior parietal cortex thickness (i.e. negative) in genes in the chr10q26.13 cytogenetic band. 
Previous clinical neuropsychiatric conditions associated with copy number variations of chr10q26.13 include abnormal cranium development, global developmental delay and learning difficulties, and neurodevelopmental manifestations including ADHD or autistic behaviors75,76,77. This finding provides a novel testable hypothesis for functional follow up studies, as alterations in the isthmus cingulate and superior parietal cortex have been observed in large-scale studies of various neurodevelopmental disorders78. Positive genetic associations in the chr19p13.3 cytogenetic band were observed between the isthmus area and isthmus cingulate cortical thickness, which has been implicated with microcephaly, ventriculomegaly and developmental delay79,80.
Our results demonstrate opposing genetic relationships between CC phenotypes (area and thickness of the entire CC and its subregions) and thickness of the cingulate cortex (negative) vs the neocortex (positive), which suggests a strong genetic component underlying the development of the CC via pioneer axons and chemotaxis. Coupled with the observed negative phenotypic correlations (Supplementary Data 13), this suggests that the relationship between the CC and the thickness of the cingulate cortex (but not surface area) is influenced by distinct genetic mechanisms that govern their development. Developmentally, pioneer axons emerge in the cingulate and project their axons across the midline using guidance cues. A large portion of these callosal projections are pruned and myelinated in an activity-dependent manner, such that axonal remodeling is highly dependent on correlated neural activity in the cortex6,81,82,83. The strongest local genetic correlation supporting this finding was observed between the total mean thickness of the CC and the rostral anterior cingulate thickness on TGIF1. As TGIF1 is implicated in holoprosencephaly (i.e. where the brain fails to develop two hemispheres), forebrain development via alterations in the Sonic Hedgehog (SHH) pathway, and disruption of axonal guidance via chemoattractive mechanisms84,85, these results provide a potential genetic localization for functional follow-up. The isthmus cingulate, in relation to the isthmus and splenium, was the only cingulate region showing positive local genetic correlations, providing further evidence of distinct molecular mechanisms (e.g. immune-mediated apoptosis and regulation of callosal projections) compared to the rest of the CC underlying its structure and development.
Abnormalities of the CC have also been associated with various neurological/neuropsychiatric disorders6. This study demonstrates a significant negative genetic relationship between the CC and ADHD (utilizing the latest ADHD GWAS findings)86, and also replicates previously observed negative genetic associations with bipolar disorder16,17. It is important to note that prior studies focused on brain volume phenotypes, whereas the current study examines area and thickness, which are known to be influenced by different genetic factors25. The negative global genetic correlations observed in the CC area with ADHD and CC thickness with bipolar disorder indicate that the allelic differences resulting in a smaller CC area and thickness are partly shared with those resulting in a greater risk for ADHD and bipolar disorder, respectively. Further evidence of the negative genetic relationship between ADHD and CC area is provided by studies that show the CC is smaller in individuals with ADHD across various ages87,88,89, suggesting that impaired inter-hemispheric communication between sensorimotor and attentional systems may contribute to symptoms of hyperactivity, impulsivity, and inattention. Our results also lend credence to future studies investigating the genetic relationship between the CC and bipolar disorder, as differences in the CC in bipolar disorder have been well established13,90,91. Negative local genetic correlations on the 17q21.31 cytogenetic band between genu area and neuroticism implicated the closely located CRHR1 and KANSL1 genes, which were also highly significant genes observed with genu area and splicing QTLs (sQTLs) in various cortical and subcortical tissue types in the TWAS analysis.
Neuroticism, a construct historically describing a cluster of negative emotions, thoughts, and behaviors under the umbrella of “negative affect,” is increasingly recognized as a significant predictor of susceptibility to stress-related psychiatric disorders, including anxiety and depression92. CRHR1 encodes the corticotropin-releasing hormone receptor 1, a receptor widely expressed in the cortex and central nervous system that mediates the effects of corticotropin-releasing factor (CRF). The CRF system plays a central role in orchestrating the body’s stress response through the hypothalamic-pituitary-adrenal (HPA) axis and autonomic nervous system. Dysregulation of this system has been extensively linked to the pathophysiology of stress-related anxiety and mood disorders, with neuroticism often serving as a measurable intermediate phenotype92. CRHR1 polymorphisms have been associated with differential responses to stress and heightened susceptibility to psychiatric disorders92, potentially mediated via altered connectivity between the prefrontal cortices via the genu and altered RNA splicing of the CRHR1 gene inside cortical tissue.
Moreover, all of the local genetic correlations on the 17q21.31 cytogenetic band observed between the genu area and neuroticism were also observed with PD, but in the positive direction, suggesting heightened risk. The strongest association was with the microtubule-associated protein tau (MAPT) gene. The same locus was also a highly significant sQTL with genu area in the caudate nucleus in the TWAS analysis. This sQTL falls within exon 7 of MAPT, part of the proline-rich domain of the tau protein93. This region is known to regulate microtubule binding and aggregation, with the 6p and 6d MAPT isoforms resulting in altered tau assembly and reduced aggregation93,94. Multiple independent studies have linked MAPT to PD risk95,96,97,98, and our findings suggest that these splicing events may influence the structural morphology of the genu of the corpus callosum via effects on the caudate nucleus, a region heavily implicated in PD pathology99,100. The genu, which facilitates interhemispheric communication between the prefrontal cortices6, shows longitudinal progressive structural decline in PD, with its degeneration correlating with worsening akinetic-rigid motor symptoms101. Lower genu volume has also been reported in PD patients, particularly those with cognitive impairment, compared to cognitively intact individuals102. The loss of dopaminergic neurons in the substantia nigra and the broader nigrostriatal pathway, including within the caudate nucleus, is a hallmark of PD and is more pronounced in patients with mild cognitive impairment103. This finding suggests that altered splicing of MAPT in the caudate nucleus may contribute to increased risk for Parkinson’s disease with cognitive impairment through shared genetic effects on the morphometry of the genu, a region critical for facilitating higher-order cognitive functioning.
Several factors can contribute to the lack of global and local genetic correlations of the CC with other tested traits. Our primary explanation is that the CC does not serve a core biological mechanism underlying the pathology of many of the tested traits. Methodological limitations in LDSC and LAVA may also account for a lack of significant results. LDSC requires GWAS summary statistics with a large sample size or a high chi-squared test-statistic. Discordant effects across the genome can also result in null findings. The LAVA workflow uses common SNPs among all the traits’ GWAS summary statistics tested, which can reduce the number of variants used in the analysis. This reduction can decrease the power to detect heritability at each locus, and not meet the stringent multiple comparisons threshold in our analysis (p < 0.05/18380 = 2.72 × 10−6) for a bivariate correlation at each locus to be tested. Moreover, the strict Bonferroni threshold correcting for every test across every trait (0.05/17909 = 2.79 × 10−6) can reduce the likelihood of finding significant bivariate associations. Phenotypic variability and underlying genetic heterogeneity within the tested traits may also contribute to the absence of significant findings. For instance, complex conditions such as Alzheimer’s disease104,105 and chronic pain106 are increasingly recognized as comprising multiple subtypes with distinct biological and clinical profiles. Consequently, stratified GWAS analyses, tailored to these subtypes, may be necessary to uncover genetic correlations with our specific neuroanatomical traits.
Functional enrichment analysis across CC area, thickness, and volume (from Chen et al.17 and Campbell et al.16 papers) revealed both shared and trait-specific biological signatures (Supplementary Data 40–41). Consistent across all modalities and subregions was strong enrichment of growth factor signaling, particularly PDGFRA/B-driven, PI3K/AKT-related pathways, and RAS-MAP kinase cascade pathways107. These pathways were significant using the results with and without ICV as a covariate. Growth factor receptor signaling pathways are important intracellular pathways for cellular survival, growth, and proliferation through various mechanisms, which require further study in the current context108,109,110. Area measures showed the greatest number of significant associations and the strongest enrichment of these pathways. However, thickness measures, especially in the isthmus, showed the strongest enrichment of these pathways across all morphometry measures and CC subregions. The thickness of isthmus was also the only measure which showed enrichment of the axonemal basal plate, which is crucial for the formation of cilia and flagella, ensuring proper motility of cells111. Volume measures, especially from the Campbell et al.16 results, highlighted neuronal morphogenesis and guidance pathways - especially in the splenium. These findings suggest that while a core set of signaling pathways influence CC morphology broadly, distinct modalities may reflect different aspects of structural development and cellular regulation.
In summary, this work identifies genome-wide significant loci of morphometry of the overall CC and its sectors, convergence on biological functions with a particular importance of apoptosis and pruning during development, tissues and cell types, as well as the genetic overlap with the cerebral cortex and neuropsychiatric conditions.
Methods
Artificial intelligence corpus callosum extraction and segmentation with SMACC
We developed a UNet-based automated segmentation tool that segments the mid CC in multiple modalities (T1w, T2w, and FLAIR), assesses the quality of each segmentation by applying machine learning methods to metrics extracted from that segmentation, and is generalizable to data from various scanners and sites. To our knowledge, there have been no published integrated pipelines for mid CC extraction with quality control in multiple MR modalities. Existing deep learning methods, like DeepnCCA, have been trained to segment the mid CC but only work on T2w images and have been trained on data from one scanner only, so they might not be generalizable to data from other scanners. Other existing methods like FreeSurfer (which was used in previous CC GWAS studies)16,17, FastSurfer and a few UNet based methods112 segment the CC but do not assess the quality of the segmentations.
Data preprocessing
All UKB participants completed a 31-min neuroimaging protocol using a Siemens Skyra 3 Tesla scanner and a 32-channel head coil in one of three MRI scanning locations. All 3D structural T1-weighted brain scans were acquired using the following parameters: 3D MPRAGE, sagittal orientation, in-plane acceleration factor = 2, TI/TR = 880/2000 ms, voxel resolution = 1 × 1 × 1 mm, acquisition matrix = 208 × 256 × 256. All scans were pre-scan normalized using an on-scanner bias correction filter. More details of the imaging protocols may be found in the following reference papers113,114.
All ABCD participants completed a neuroimaging protocol in one of three scanner types at 21 different sites115. The Siemens Prisma had the following parameters for the T1-weighted scans: TI/TR = 1060/2500 ms, TE = 2.88 ms, voxel resolution = 1 × 1 × 1 mm, acquisition matrix = 176 × 256 × 256, flip angle = 8 degrees. The Philips Achieva Ingenia had a TI/TR = 1060/6.31 ms, voxel resolution = 1 × 1 × 1 mm, acquisition matrix = 225 × 256 × 256, and a flip angle = 8 degrees. The GE MR750 had a TI/TR = 1060/2500 ms, TE = 2 ms, voxel resolution = 1 × 1 × 1 mm, acquisition matrix = 208 × 256 × 256, and a flip angle = 8 degrees.
All T1w MRIs were registered to MNI152116,117,118 1 mm space with 6 degrees of freedom using FSL’s flirt119 command.
SMACC development and UNet training
Mid-sagittal T1w, T2w, and FLAIR images from UK Biobank21, PING120, HCP121, and ADNI1122 were used for training the UNet model for CC segmentation. Individual study scanner parameters can be found in their respective references. The demographic information for the datasets used to create the UNet model is shown in Supplementary Data 31. Augmentation of image data is a common procedure in deep learning to prevent model overfitting and improve model accuracy123. All the images were downsampled by a factor of 2, 3, 4 and 5 along the sagittal axis and then upsampled back to the original size using MRtrix’s mrgrid command to include low resolution images in the training124. To include lower resolution T1w images resembling older or clinical data in training, all the images were harmonized using a fully unsupervised deep-learning framework based on a generative adversarial network (GAN)125 to a subject from the ICBM dataset117. The original images in the training set already had 5–10 degrees rotation variation, so we rotated images in increments of 15 degrees to include more variety of head orientations. Black boxes were randomly added to the images to imitate partial agenesis cases. Supplementary Fig. 1 shows some T1w augmented images that were the input training images for the UNet model.
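As an illustration of two of the augmentation steps described above, the following minimal Python sketch degrades image resolution by an integer factor and adds a random black box. It is not the SMACC implementation: the function names are hypothetical, nearest-neighbour resampling stands in for MRtrix’s mrgrid, resampling along image columns is an assumption about which axis is meant, and the rotation and GAN harmonization steps are omitted.

```python
import random


def degrade_resolution(image, factor):
    """Downsample columns by `factor`, then upsample back to the original width.

    Nearest-neighbour resampling is an assumption made for this sketch;
    the study used MRtrix's mrgrid for this step.
    """
    width = len(image[0])
    # Each output column copies the nearest retained (every factor-th) column.
    return [[row[(c // factor) * factor] for c in range(width)] for row in image]


def add_black_box(image, box_height, box_width, rng):
    """Zero out a randomly placed rectangle to imitate a missing CC segment."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - box_height + 1)
    left = rng.randrange(width - box_width + 1)
    out = [row[:] for row in image]  # copy so the input image is untouched
    for r in range(top, top + box_height):
        for c in range(left, left + box_width):
            out[r][c] = 0
    return out


if __name__ == "__main__":
    toy = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
    # Downsampling factors of 2 to 5, as listed in the text.
    for factor in (2, 3, 4, 5):
        print(factor, degrade_resolution(toy, factor))
    print(add_black_box(toy, 1, 2, random.Random(0)))
```

In practice the same transformation would be applied to the image and, where relevant, to its ground-truth mask so that the pair stays aligned.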
UNet implementation
A Tensorflow implementation of UNet126 was trained on 80% of the images for 250 epochs until the difference between the intersection over union (IOU) after consecutive iterations was less than 1 × 10−4. The U-Net architecture is structured with a contracting pathway and an expansive pathway. The contracting pathway repeatedly performs two 3 × 3 convolutions (without padding), with each convolution followed by a rectified linear unit (ReLU) and a 2 × 2 max pooling operation. At each stage in the expansive pathway, the feature map is upsampled, followed by a 2 × 2 convolution, which reduces the feature channels by half. Then, the corresponding cropped feature map from the contracting pathway is concatenated, and two 3 × 3 convolutions are applied, with each one followed by a ReLU. We used the following training parameters: 1 × 10−4 learning rate and an Adam optimizer127. The rest of the data was used for validation. The midsagittal CC (midCC) was initially segmented using image processing techniques128 on subjects from ADNI1 (N = 1032, 54–91 years), PING (N = 1178, 3–21 years), HCP (N = 963, 22–37 years) and UKB (N = 190, 45–81 years). These masks were then visually verified and manually edited by neuroanatomical experts which served as the ground truth. To evaluate the model, the area of overlap between the predicted segmentation and the ground truth was calculated.
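The IOU-based stopping rule and the overlap evaluation described above can be sketched as follows. This is a minimal illustration on binary masks stored as nested lists; the function names are hypothetical and the sketch is independent of the TensorFlow training code used in the study.

```python
def intersection_over_union(pred, truth):
    """IOU between two binary masks given as nested lists of 0/1 values."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks are treated as a perfect match (a convention of this sketch).
    return inter / union if union else 1.0


def has_converged(iou_history, tol=1e-4):
    """True once the IOU changes by less than `tol` between consecutive evaluations."""
    return len(iou_history) >= 2 and abs(iou_history[-1] - iou_history[-2]) < tol


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(intersection_over_union(pred, truth))
    print(has_converged([0.90, 0.95]), has_converged([0.95, 0.95005]))
```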
CC shape metrics extracted with SMACC
SMACC provides outputs of global and regional shape metrics extracted from the CC segmentation, including area, thickness, length, perimeter and curvature. The regional shape metrics were based on a five-compartment version of the Witelson atlas23,24. The Witelson atlas is composed of the (1) genu, (2) anterior midbody, (3) posterior midbody, (4) isthmus, and (5) splenium. The metrics used for the GWAS analysis were area and mean thickness of the total CC and all of the parcellations of the Witelson atlas. The thickness is defined as the distance in the inferior-superior direction between the top and bottom of the contour at every point along the length of the segment, then averaged across the region of interest. The total area is the summation of the number of voxels with intensity value greater than 0.5 in the segmentation.
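The area and mean thickness definitions above can be illustrated with a short Python sketch. It assumes, for illustration only, that rows of the segmentation run in the inferior-superior direction, that columns run along the length of the CC, and that voxels are 1 mm isotropic; the function names are hypothetical and this is a simplification of the contour-based measurement in SMACC.

```python
def cc_area(prob_map, threshold=0.5, voxel_area_mm2=1.0):
    """Area as the count of voxels above `threshold`, scaled by voxel area."""
    count = sum(1 for row in prob_map for v in row if v > threshold)
    return count * voxel_area_mm2


def mean_thickness(prob_map, col_start, col_stop, threshold=0.5, voxel_size_mm=1.0):
    """Mean inferior-superior extent over columns col_start..col_stop-1.

    Columns that contain no segmented voxel are skipped.
    """
    thicknesses = []
    for c in range(col_start, col_stop):
        rows = [r for r, row in enumerate(prob_map) if row[c] > threshold]
        if rows:
            # Distance between the top and bottom of the contour in this column.
            thicknesses.append((max(rows) - min(rows) + 1) * voxel_size_mm)
    return sum(thicknesses) / len(thicknesses) if thicknesses else 0.0


if __name__ == "__main__":
    seg = [
        [0.0, 0.9, 0.9, 0.0],
        [0.8, 0.9, 0.9, 0.7],
        [0.2, 0.6, 0.1, 0.0],
    ]
    print(cc_area(seg))               # total area in mm^2
    print(mean_thickness(seg, 0, 4))  # mean thickness over the whole toy CC
    print(mean_thickness(seg, 1, 3))  # mean thickness over a sub-region
```

A regional metric is obtained by restricting the column range to the corresponding Witelson compartment.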
Corpus callosum segmentation quality control (QC) with SMACC
To ensure that segmentations were of appropriate quality without having to manually assess all output images, which eventually may scale to hundreds of thousands of scans, we included an automated quality control (QC) assessment into SMACC. The regional and global metrics were used as inputs to the machine learning models detailed below for automatic binary classification of segmentations as Pass or Fail. CC segmentations from SMACC were manually assessed across multiple datasets by neuroanatomical experts. This included data from UKB (N = 12,902, aged 45–81 years), ADNI1 (N = 724, aged 54–91 years), PING (N = 857, 3–21 years) and HCP (N = 615, 22–37 years), all of which served as the ground truth for QC model building. All data was split 80/20 for training/testing.
Figure 7 gives an overview of the SMACC workflow. Several architectures were tested to classify the segmentations from the UNet as pass or fail: a 3-layer sequential neural network with 42 neurons in the first layer, 22 in the second layer, and 11 in the third layer; a wide & deep neural network with 80 neurons in the first 3 layers and 40 in the last 3 layers; an XGBoost classifier; and an ensemble model. The ensemble model consisted of XGBoost, k-nearest neighbors (KNN), support vector classifier (SVC), logistic regression, and a random forest classifier. The results from all the classifiers in the ensemble model were combined using a majority voting classifier. All the models were compared using metrics including precision, recall, F1 score, and Area Under the Curve (AUC). Supplementary Data 34 shows the performance of different models based on the shape metrics extracted from the CC segmentations.
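The majority-voting step and three of the comparison metrics can be sketched in plain Python as follows. The sketch assumes hard (label-level) voting across the five ensemble members and treats "Pass" as the positive class; both are assumptions, the function names are hypothetical, and the toy labels are not study data.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by most classifiers for one segmentation."""
    return Counter(labels).most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive="Pass"):
    """Precision, recall, and F1 score for the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # One inner list per segmentation; one vote per ensemble member.
    votes = [
        ["Pass", "Pass", "Fail", "Pass", "Fail"],
        ["Fail", "Fail", "Fail", "Pass", "Fail"],
        ["Pass", "Fail", "Fail", "Pass", "Fail"],
    ]
    y_pred = [majority_vote(v) for v in votes]
    y_true = ["Pass", "Fail", "Pass"]
    print(y_pred, precision_recall_f1(y_true, y_pred))
```

With an odd number of binary classifiers, as in the five-member ensemble, a tie cannot occur.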
The midsagittal slice from a participant registered to MNI space with 6 degrees of freedom serves as an input to the UNet architecture used for the midsagittal CC segmentation. The Witelson atlas was used for segmenting the CC into five different regions. Global and subregion metrics (thickness and area-shown in green) were extracted from the segmentation. The thickness (black arrow) is defined as the distance in the inferior-superior direction between the top and bottom of the contour, after reorientation to standard space, at every point along the length of the segment, then averaged across the region of interest. These metrics serve as input for the ensemble machine learning model used for labeling CC segmentations as having passed or failed quality control (QC). MNI Montreal Neurological Institute, CC corpus callosum, ML Machine Learning, KNN K Nearest Neighbors, SVC Support Vector Classifier.
SMACC vs FreeSurfer
Comparing SMACC and FreeSurfer via Dice scores with respect to manual masks
To assess the accuracy of SMACC relative to the ground truth and to the commonly used tool FreeSurfer129, we ran the SMACC pipeline on 30 subjects from the Hangzhou Normal University (HNU) test-retest dataset130,131. Each subject in this dataset was scanned with a full brain T1w MRI 10 times within a period of 40 days, for a total of 300 scans. All 300 scans had also been manually segmented by a neuroanatomical expert to serve as the ground truth. Segmentations from SMACC and FreeSurfer v7.1 were compared to manual segmentations using the Dice overlap coefficient. The average Dice coefficient between automated CC masks from SMACC and ground truth segmentations was 0.94 across all scans. The average Dice score between FreeSurfer CC segmentations and manual masks was 0.82. The Dice score was consistently higher for all the subjects using SMACC. Supplementary Fig. 2 and Supplementary Data 35 show a few midCC segmentations obtained from SMACC compared to FreeSurfer.
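For reference, the Dice overlap coefficient between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). A minimal Python sketch follows; the function name and the toy masks are illustrative and are not study data.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as nested lists of 0/1 values."""
    inter = 0
    size_a = 0
    size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a:
                size_a += 1
            if b:
                size_b += 1
            if a and b:
                inter += 1
    total = size_a + size_b
    # Two empty masks are treated as a perfect match (a convention of this sketch).
    return 2 * inter / total if total else 1.0


if __name__ == "__main__":
    automated = [[1, 1, 0], [0, 1, 0]]
    manual = [[1, 0, 0], [0, 1, 1]]
    print(dice_coefficient(automated, manual))
```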
ICC for SMACC
To assess the test-retest reliability of SMACC, intraclass correlation (ICC) scores were calculated. Average ICC values for thickness and area of the Witelson parcellations and the total CC were greater than 0.9 and are shown in Supplementary Fig. 3.
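The ICC variant is not specified here, so the following Python sketch uses the one-way random-effects form, ICC(1,1) = (MSB − MSW) / (MSB + (k − 1) MSW), purely as an illustrative assumption. MSB and MSW are the between-subject and within-subject mean squares and k is the number of repeated scans per subject; the function name and toy values are hypothetical.

```python
def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    `scores` holds one list per subject, each with the same number of
    repeated measurements (for example, CC area from each repeat scan).
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # repeated measurements per subject
    subject_means = [sum(row) / k for row in scores]
    grand_mean = sum(subject_means) / n
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Toy data: three subjects, two repeat measurements each.
    print(icc_oneway([[9, 11], [19, 21], [29, 31]]))
```

Values close to 1 indicate that most of the variance lies between subjects rather than between repeat scans of the same subject.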
Study cohorts
U.K. Biobank
The UK Biobank (UKB) is a large population-level cohort study conducting longitudinal deep phenotyping of around 500,000 participants in the United Kingdom (UK) aged 40–69 at recruitment. All participants provided informed consent to participate. The North West Centre for Research Ethics Committee (11/NW/0382) granted ethics approval for the UK Biobank study21. We used genotype data from UKB released in May 2018. The data were collected from 489,212 individuals, and 488,377 of those individuals passed quality control checks by UKB. The genotypes were then imputed using two reference panels: the Haplotype Reference Consortium (HRC) reference panel and a combined reference panel of the UK10K and 1000 Genomes projects Phase 3 (1000G) panels21. There were 8,422,770 SNPs following quality control (QC) of the data, which included requiring a genotyping call rate (SNPs missing in individuals) greater than 95%, removing variants with a minor allele frequency less than 0.01 (1%), removing variants with Hardy-Weinberg equilibrium p-values less than 1 × 10−6, and removing individuals more than three standard deviations away from the mean heterozygosity rate. To conservatively define European ancestry in UKB, the ENIGMA MDS protocol (https://enigma.ini.usc.edu/protocols/genetics-protocols/) was completed using 10 components. The mean and standard deviations of the first and second genetic components of individuals who were classified as Utah residents with Northern and Western European ancestry from the CEPH collection (CEU) from the HapMap 3 release were then calculated. Individuals in UKB who were within a distance of 0.0101 on components 1 and 2 were classified as of European ancestry (Mean age: 64.73 ± 7.86 years, range 45.17–82.75; Sex: 21,717 females; N = 41,979). The MDS plot of individuals included in the analysis overlaid over the HapMap 3 population is available in Supplementary Fig. 4.
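The QC thresholds and the ancestry rule described above can be expressed as simple filters, sketched below in Python. The function names are hypothetical, and two details are assumptions made for illustration: the distance of 0.0101 is taken as a Euclidean distance from the CEU mean on components 1 and 2, and the boundary value is treated as inclusive. This is not the ENIGMA MDS protocol itself.

```python
import math
import statistics


def passes_snp_qc(call_rate, maf, hwe_p):
    """Variant-level filters: call rate > 95%, MAF >= 0.01, HWE p >= 1e-6."""
    return call_rate > 0.95 and maf >= 0.01 and hwe_p >= 1e-6


def heterozygosity_outliers(rates, n_sd=3.0):
    """Indices of individuals more than `n_sd` SDs from the mean heterozygosity rate."""
    mean = statistics.mean(rates)
    sd = statistics.stdev(rates)
    return [i for i, r in enumerate(rates) if abs(r - mean) > n_sd * sd]


def is_european(pc1, pc2, ceu_pc1_mean, ceu_pc2_mean, max_dist=0.0101):
    """Assumed rule: Euclidean distance to the CEU mean on components 1 and 2."""
    return math.hypot(pc1 - ceu_pc1_mean, pc2 - ceu_pc2_mean) <= max_dist


if __name__ == "__main__":
    print(passes_snp_qc(call_rate=0.99, maf=0.05, hwe_p=0.2))
    print(heterozygosity_outliers([0.30] * 20 + [0.60]))
    print(is_european(0.005, 0.005, 0.0, 0.0))
```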
ABCD
The Adolescent Brain Cognitive Development (ABCD) study is the largest study in the United States (USA) following adolescent children starting from 9 years of age through adolescence with deep phenotyping including neuroimaging and genotyping using the Smokescreen™ Genotyping array consisting of over 300,000 SNPs115,132,133. Only baseline neuroimaging data were used. Following imputation using the ENIGMA protocol134 with the European 1000 Genomes Phase 3 Version 5 reference panel, phased using Eagle version 2.3135, and the QC process as described in the UKB cohort, a total of 5,683,360 SNPs were included from individuals of European ancestry (Mean age: 9.5 ± 0.5 years, range 8.0–11.0; Sex: 2086 females; N = 4706). To determine European ancestry in ABCD, the methods described for the UKB were completed. The MDS plot of individuals included in the analysis overlaid over the HapMap3 population is available in Supplementary Fig. 5.
We also analyzed non-European ancestry individuals to examine the generalization of the observed effects across ancestries. In order to accurately estimate non-European individuals using the HapMap3 reference panel, the KING software package136, which uses a well-validated MDS and support vector machine approach, was used to estimate ancestry composition in all individuals not used in the principal GWAS (Supplementary Data 36). While making sure no individuals classified as CEU, TSI, or a combination of both, were not included, we included 636 individuals from the UKB (Mean age: 62.1 ± 8.3 years, range 44.6–80.2; Sex: 286 females) and 4129 individuals from ABCD (Mean age: 9.5 ± 0.5 years, range 8.0–11.0; Sex: 2009 females).
GWAS meta-analysis of corpus callosum morphometry
Genome-wide association analyses (GWAS) of all CC phenotypes were completed separately for UKB and ABCD via a linear whole-genome ridge regression model using REGENIE, allowing for the control of genetic relatedness137. Covariates included age, sex, age*sex interaction, and the first 10 genetic principal components. A two-step REGENIE analysis was completed with the following parameters. For step 1, the entire dataset was used with a block size of 1000 and leave-one-chromosome-out validation137. Step 2 was completed with a threshold for minor allele count of 5, a block size of 1000, and otherwise default parameters.
A meta-analysis of GWAS summary statistics of all CC-derived metrics in UKB and ABCD was conducted using METAL software and the random-metal extension29,30, based on the random-effects model. A random-effects model was chosen since the effect sizes of SNPs on the CC have the potential to differ between the UKB and ABCD cohorts due to age. White matter volume is known to increase through childhood and start decreasing in middle adulthood41, which may result in different genetic effect sizes being observed. We opted to conduct a meta-analysis instead of using a two-stage discovery-replication approach because Skol et al. have shown that this method is more powerful, despite using more stringent significance levels for multiple correction138, and is common practice in the literature86,98,139. Percent variance (R2) explained by each significant SNP was calculated using the approach described in Rietveld et al.140. The R2 of each variant j was calculated via:
$$R_{j}^{2}=\frac{2{p}_{j}{q}_{j}{\hat{\beta }}^{2}}{{\hat{\sigma }}^{2}}$$
where pj and qj are the minor and major allele frequencies, \(\hat{\beta }\) is the estimated effect of the variant within the meta-analysis and \({\hat{\sigma }}^{2}\) is the estimated variance of the trait (for which we used the pooled variance of the trait across UKB and ABCD). To determine the number of independent traits, matrix spectral decomposition was computed using matSpD in R on the phenotypic correlations between CC traits using the method proposed by Li and Ji141,142. This resulted in 8.16 effective independent variables, and a significance threshold of p = 5 × 10−8/8.16 = 6.13 × 10−9. Meta-analyses were also completed for non-European individuals. To determine whether the global measure of brain intracranial volume (ICV) would have an impact on the analysis, all GWAS were completed again using ICV as an additional covariate. All results comparing the analyses without and with ICV as a covariate are shown in Supplementary Data 38.
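As an illustration of the two calculations described above, the following Python sketch computes the variance explained by a single variant and the multiple-testing threshold. It assumes the standard form R2 = 2pqβ2/σ2 for the variance explained, and the Li and Ji rule in which each eigenvalue of the trait correlation matrix contributes 1 if it is at least 1, plus its fractional part. The function names are hypothetical, and the eigenvalues are taken as input rather than computed, so this is a sketch of the arithmetic and not a reimplementation of matSpD.

```python
import math


def variance_explained(maf, beta, trait_variance):
    """R^2 for one variant, assuming R^2 = 2 * p * q * beta^2 / trait variance."""
    p, q = maf, 1.0 - maf
    return 2.0 * p * q * beta ** 2 / trait_variance


def li_ji_effective_tests(eigenvalues):
    """Effective number of independent tests from correlation-matrix eigenvalues.

    Each eigenvalue contributes 1 if it is >= 1, plus its fractional part.
    """
    total = 0.0
    for lam in eigenvalues:
        lam = abs(lam)
        total += (1.0 if lam >= 1.0 else 0.0) + (lam - math.floor(lam))
    return total


def corrected_threshold(m_eff, genome_wide=5e-8):
    """Genome-wide threshold divided by the effective number of tests."""
    return genome_wide / m_eff


if __name__ == "__main__":
    # Toy variant: MAF 0.25, effect 0.1 on a trait with unit variance.
    print(variance_explained(0.25, 0.1, 1.0))
    # Two traits correlated at r = 0.5 have eigenvalues 1 + r and 1 - r.
    print(li_ji_effective_tests([1.5, 0.5]))
    # Threshold for the 8.16 effective traits reported in the text.
    print(corrected_threshold(8.16))
```

With the reported 8.16 effective traits, the last call reproduces the threshold of approximately 6.13 × 10−9 used throughout the analyses.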
To determine the degree of overlap of genes in the ICV vs no ICV analysis, the overlap coefficient, or Szymkiewicz–Simpson coefficient, was calculated143. In order to determine differences in enrichment in (1) gene ontology categories, (2) biological pathways, and (3) transcription factors, genes mapped to significant loci that were specific to the analysis with or without ICV as a covariate, or common to both, were separately entered into g:Profiler for a multi-gene list analysis144. All analyses were completed using the g:SCS threshold145, all gene ontology categories, all biological pathway categories, and the TRANSFAC database.
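The overlap (Szymkiewicz–Simpson) coefficient is the size of the intersection of two sets divided by the size of the smaller set. A minimal Python sketch follows; the function name and the placeholder gene labels are hypothetical and are not genes from this study.

```python
def overlap_coefficient(genes_a, genes_b):
    """Szymkiewicz-Simpson coefficient: |A & B| / min(|A|, |B|)."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("both gene lists must be non-empty")
    return len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    with_icv = ["GENE1", "GENE2", "GENE3", "GENE4"]  # placeholder labels
    without_icv = ["GENE3", "GENE4", "GENE5"]        # placeholder labels
    print(overlap_coefficient(with_icv, without_icv))
```

A value of 1 indicates that the smaller gene list is fully contained in the larger one, which makes the coefficient less sensitive to differences in list size than the Jaccard index.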
Heritability and genetic correlations within and between cohorts
To determine the SNP heritability (h2SNP) tagged by the SNPs used in the analysis, we used the GREML approach implemented in GCTA43,44, while adjusting for the same covariates as in the GWAS. The SNP heritability (h2SNP) from LDSC27, which estimates the heritability causally explained by common reference SNPs, was also computed. Genetic correlations between the UKB and ABCD cohorts for the area and thickness of each parcellation of the CC defined by the Witelson scheme, and of the total CC, were computed using LDSC27. Between-cohort heterogeneity of h2SNP should not be considered unusual, for two reasons. First, the genetic influence observed on the CC has the potential to differ between the UKB and ABCD cohorts due to age, as white matter volume is known to increase through childhood and start decreasing in middle adulthood41. Second, the smaller sample size in ABCD makes it harder for LDSC to detect polygenic effects42. To complement these estimates and leverage the availability of individual-level data and greater statistical power, we also employed the bivariate GREML approach in GCTA (--reml-bivar), using the AI-REML algorithm while controlling for age, sex and age*sex, and testing for significance of the genetic correlation against the hypothesis that the genetic correlation is 0 (--reml-bivar-lrt-rg 0)40.
Gene-mapping and gene enrichment analyses
Genetic variants (SNPs) were mapped to genes using information about genomic position, expression quantitative trait loci (eQTL) information, and 3D chromatin interaction mapping as implemented in FUMA v1.5.2 with the experiment-wide significance threshold (p = 6.13 × 10−9)146. Pathway enrichment analyses were performed using the results from the full meta-analyses, with no pre-selection of genes, via MAGMA v1.0845 gene-set analysis in FUMA. Genes located in the MHC region were excluded (hg19: chromosome 6: 26 Mb–34 Mb). There were 19,021 gene sets from MSigDB v7.0147 (Curated gene sets: 5500, GO terms: 9996), and 9 other data resources including KEGG, Reactome, and Biocarta (https://www.gsea-msigdb.org/gsea/msigdb/collection_details.jsp#C2). MAGMA uses gene-based P-values to identify genes that are more strongly associated with a phenotype than would be expected by chance. MAGMA then applies a competitive test to compare the association of genes in a gene set to the association of genes outside of the gene set. This allows MAGMA to identify gene sets that are enriched for association signals. MAGMA adjusts for potential confounders, such as gene length and gene-set size, so that enrichment results are not driven by these factors. A gene-based association analysis (GWGAS) in MAGMA was completed using the full summary statistics for each trait from METAL. Corrections for multiple comparisons were completed using the Bonferroni approach.
To determine whether genes associated with CC morphometry cluster into biological functions, tissue types, or specific cell types, we used the full results of the meta-analyzed genome-wide association studies (GWAS) rather than prioritizing genes. Pathway analysis, as described above, was completed.
We performed gene-property and gene-set analyses using the MAGMA software on 54 tissue types from the GTEx v8 database and on BrainSpan47,48 data, which include 29 samples from individuals representing 29 different ages, as well as 11 general developmental stages.
Single cell RNA-sequencing data sets used in the cell-type specific analyses included the human developmental and adult brain samples from the PsychENCODE consortium148, human brain samples of the middle temporal gyrus and lateral geniculate nucleus from the Allen Brain Atlas149, human brain samples using DroNc-seq150, two datasets of human prefrontal cortex brain samples across developmental stages which show per cell type average across different ages, and per cell type per age average expression151, two datasets of human brain samples with and without fetal tissue152, human brain samples from the temporal cortex153, and human samples from the ventral midbrain from 6–11 week old embryos154. A 3-step workflow is implemented in FUMA to determine the association between cell-type-specific expression and CC morphometry-gene association supported by multiple independent datasets, which has been extensively described50. All tests were corrected using the Bonferroni approach.
Partitioned heritability of meta-analysis results by cell and tissue type with LDSC
Partitioned heritability analysis was completed to estimate the amount of heritability explained by annotated regions of the genome46,49. We tested for enrichment of CC h2 of variants located in multiple tissues and cell types using the LDSC-SEG approach, with all analyses being corrected for the FDR49. Annotations indicating specific gene expression in multiple tissues/cell types from the Genotype-Tissue Expression (GTEx) project and Franke lab were downloaded from https://alkesgroup.broadinstitute.org/LDSCORE/LDSC_SEG_ldscores/. We also downloaded 489 tissue-specific chromatin-based annotations from narrow peaks for six epigenetic marks from the Roadmap Epigenomics and ENCODE projects155,156. These annotations were downloaded from the URL mentioned above. This allowed us to either verify or extend the findings from the gene expression analysis using an independent source and a different type of data. Finding new patterns of chromatin enrichment can help us to understand how genes are regulated. For example, if a particular epigenetic mark is enriched in a region of the genome that is associated with a specific gene in a specific tissue type, this could suggest that the gene is regulated by that epigenetic mark in that tissue type. Gene expression data from the Immunological Genome (ImmGen) project157, which contains microarray data on 292 immune cell types from mice, were used to test immune cell-type-specific enrichments. These data were downloaded from the aforementioned link.
LAVA - TWAS
We used the LAVA-TWAS framework to investigate the relationship between CC traits and gene expression in brain tissues, fibroblasts, lymphocytes, and whole blood from the GTEx consortium (v8)158 in all protein-coding genes. Unlike other commonly used TWAS approaches, which have been shown to be prone to high type-I error rates (false positives), LAVA-TWAS can model the uncertainty of eQTL effects and provides a directly interpretable effect size in the rG estimate51. Analyses were performed on all protein-coding genes (N = 18,380) between all CC phenotypes and eQTLs/sQTLs for each tissue. Genotype data from the European sample of the 1000 Genomes (phase 3) project159 was used to estimate SNP LD for LAVA. For each eQTL/sQTL that had a significant genetic signal for both the CC phenotype and cortical phenotype (univariate p-values less than 1 × 10−4), the local bivariate genetic correlation between the two was estimated and tested. All LAVA-TWAS results were corrected using the Bonferroni approach. Following TWAS, a trait-specific enrichment analysis of the top 1% of genes was conducted via Fisher’s exact test to evaluate overrepresentation in 7246 MSigDB v6.2160 gene sets and gain insight into biological pathways. Gene sets were restricted to those containing at least one of the top 1% of genes, to avoid testing gene sets with no significantly associated genes. All enrichment testing for eQTLs and sQTLs was performed with Bonferroni correction for every test conducted across all CC phenotypes.
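The overrepresentation test described above can be sketched in Python as a one-sided Fisher’s exact test, which is equivalent to the upper tail of the hypergeometric distribution. The sketch assumes a one-sided (overrepresentation) alternative, which the text does not state explicitly; the function name and arguments are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_set, n_top, n_overlap):
    """One-sided Fisher's exact p-value for overrepresentation.

    Probability of seeing at least n_overlap gene-set members among n_top
    genes drawn without replacement from n_universe genes, of which
    n_in_set belong to the gene set (hypergeometric upper tail).
    """
    max_overlap = min(n_in_set, n_top)
    denom = comb(n_universe, n_top)
    numer = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_top - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return numer / denom


if __name__ == "__main__":
    # Toy example: 10 genes tested, 4 in the gene set, 3 top genes,
    # all 3 of which fall in the gene set.
    print(enrichment_p_value(10, 4, 3, 3))
```

Under a Bonferroni correction, each resulting p-value would then be compared against the chosen alpha divided by the total number of enrichment tests performed.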
Global and local genetic correlations with cortical morphometry and mendelian randomization
The CC develops in such a manner that callosal projections are over-produced then refined during development. The majority of cortical projections are refined during postnatal stages and are under the influence of guidance cues6. As many genes are responsible for callosal axon guidance, we sought to investigate the genetic relationship between our derived CC traits and the genetic architecture of the human cerebral cortex6. We used LDSC to determine the global genetic correlation between area and thickness of the total and parcellated regions of the CC, and the GWAS summary statistics of each globally corrected region-of-interest of the cerebral cortex from the ENIGMA-3 GWAS161. We performed bi-directional Mendelian Randomization analyses to investigate if significant genetic correlations observed could be driven by genetic causal relationships between an exposure (e.g., area and thickness of different regions of the CC) and outcome (e.g., regional surface area & cortical thickness). Analyses were performed with summary statistics using GSMR52. GSMR includes an integrated HEIDI-outlier feature to detect and remove pleiotropic SNPs. Even if the small-effect pleiotropic SNPs persist, the estimate remains unaffected by pleiotropy52. Additionally, GSMR has been extensively shown to be robust to sample overlap between exposure and outcome variables162. All analyses were corrected using the Bonferroni approach. To capture potential local shared genetic effects across the genome, we ran LAVA28 for all protein-coding genes (N = 18,380) between all CC phenotypes and surface area and cortical thickness of regions in the ENIGMA3 GWAS. Genotype data from the European sample of the 1000 Genomes (phase 3) project159 was used to estimate SNP LD for LAVA. Sample overlap was estimated using the intercepts from bivariate LDSC and integrated into the analysis28,163. 
For each gene that had a significant genetic signal for both the CC phenotype and cortical phenotype (univariate p-values less than 1 × 10−4), the local bivariate genetic correlation between the two was estimated and tested. All results were corrected using the Bonferroni approach.
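The two-stage logic described above (a univariate filter followed by a corrected bivariate test) can be sketched as follows. The univariate cutoff of 1 × 10−4 is from the text. The alpha of 0.05 and the choice of the number of bivariate tests as the Bonferroni denominator are assumptions made for illustration, as are all function names and inputs.

```python
UNIVARIATE_P = 1e-4  # univariate cutoff stated in the text


def genes_to_test(cc_p, cortex_p, cutoff=UNIVARIATE_P):
    """Genes whose univariate p-value is below the cutoff for both traits."""
    shared = cc_p.keys() & cortex_p.keys()
    return sorted(g for g in shared if cc_p[g] < cutoff and cortex_p[g] < cutoff)


def bonferroni_significant(bivariate_p, alpha=0.05):
    """Bivariate tests passing alpha / (number of tests); alpha is assumed."""
    if not bivariate_p:
        return {}
    threshold = alpha / len(bivariate_p)
    return {g: p for g, p in bivariate_p.items() if p < threshold}


if __name__ == "__main__":
    cc = {"A": 1e-6, "B": 1e-3, "C": 1e-5}                # placeholder genes
    cortex = {"A": 1e-5, "B": 1e-6, "C": 1e-7, "D": 1e-9}
    print(genes_to_test(cc, cortex))
    print(bonferroni_significant({"A": 0.01, "C": 0.04}))
```

Filtering on the univariate signal first restricts the bivariate tests to loci where a local genetic correlation can be meaningfully estimated, which also reduces the multiple-testing burden.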
Global and local genetic correlations with neuropsychiatric conditions and mendelian randomization
Abnormalities of the corpus callosum (CC) have been widely implicated in various neurological and neuropsychiatric conditions. To investigate shared genetic architecture, we selected 22 traits for genetic correlation analyses based on their relevance to CC-related pathology and the availability of well-powered GWAS summary statistics, all of which have active ENIGMA working groups78. These traits include Alzheimer’s disease (AD)164, attention-deficit/hyperactivity disorder (ADHD)86, autism spectrum disorder (ASD)165, anorexia166, anxiety167, bipolar disorder168, chronic overlapping pain conditions169, depression170, epilepsy171, intelligence172, insomnia173, neuroticism139, obsessive-compulsive disorder (OCD)174, Parkinson’s disease (PD)175, post-traumatic stress disorder (PTSD)176, panic disorder177, schizophrenia178, substance abuse179, suicide attempt180, and Tourette’s syndrome181. We used linkage disequilibrium score regression (LDSC) to assess global genetic correlations between these traits and both the area and thickness of the total and parcellated CC. Mendelian randomization and local genetic correlation analyses were then conducted following the same analytic framework used for cortical brain phenotypes.
Cross study comparisons
There have been two recent GWASes looking at the volume of the corpus callosum and its subregions16,17. Building on the notion that area and thickness of brain phenotypes have distinct genetic influences25, we aimed to compare the genomic loci discovered in the present study with the previous volume GWASes. Although the previous studies used PLINK182 in the UKB, we used REGENIE137, which implements a mixed-model approach to account for potential kinship, as 59.3% of individuals in the UKB are at least 5th degree relatives183,184. To determine distinct genetic loci associated with area vs thickness vs volume, and differing enriched biological pathways, significant genomic loci were obtained from Chen et al.17 and Campbell et al.16. SNPs indicating significant genomic loci from both studies were entered into the FUMA platform while using the Bonferroni-corrected p-value of 5 × 10−8/11 = 4.55 × 10−9 for Chen et al., and the reported 9.6 × 10−9 from Campbell et al. The mapped genes from FUMA for each study were then entered into g:Profiler for a multi-gene-list analysis to determine common and distinct gene ontology categories, biological pathways, and transcription factors144. All analyses were completed using the g:SCS threshold145, all gene ontology categories, all biological pathway categories, and the TRANSFAC database.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
This work is a meta-analysis. The full meta-analytic summary statistics are available on the GWAS Catalog (https://www.ebi.ac.uk/gwas/) using the GCP ID GCP001419, and study accession numbers GCST90672009-GCST90672020.
Code availability
The code and model used to extract the CC and its metrics are available at https://github.com/USC-LoBeS/smacc/.
References
Fame, R. M., MacDonald, J. L. & Macklis, J. D. Development, specification, and diversity of callosal projection neurons. Trends Neurosci. 34, 41–50 (2011).
Fenlon, L. R. & Richards, L. J. Contralateral targeting of the corpus callosum in normal and pathological brain function. Trends Neurosci. 38, 264–272 (2015).
Paul, L. K. Developmental malformation of the corpus callosum: a review of typical callosal development and examples of developmental disorders with callosal involvement. J. Neurodev. Disord. 3, 3–27 (2011).
Brown, W. S., Jeeves, M. A., Dietrich, R. & Burnison, D. S. Bilateral field advantage and evoked potential interhemispheric transmission in commissurotomy and callosal agenesis. Neuropsychologia 37, 1165–1180 (1999).
Caminiti, R. et al. Diameter, length, speed, and conduction delay of callosal axons in macaque monkeys and humans: comparing data from histology and magnetic resonance imaging diffusion tractography. J. Neurosci. 33, 14501–14511 (2013).
De León Reyes, N. S., Bragg-Gonzalo, L. & Nieto, M. Development and plasticity of the corpus callosum. Development 147, dev189738 (2020).
Piras, F. et al. Corpus callosum morphology in major mental disorders: a magnetic resonance imaging study. Brain Commun. 3, fcab100 (2021).
Vermeulen, C. L., du Toit, P. J., Venter, G. & Human-Baron, R. A morphological study of the shape of the corpus callosum in normal, schizophrenic and bipolar patients. J. Anat. 242, 153–163 (2023).
Unterberger, I., Bauer, R., Walser, G. & Bauer, G. Corpus callosum and epilepsies. Seizure 37, 55–60 (2016).
Zhao, G. et al. A Comparative Multimodal Meta-analysis of Anisotropy and Volume Abnormalities in White Matter in People Suffering From Bipolar Disorder or Schizophrenia. Schizophr. Bull. 48, 69–79 (2022).
Zhou, L. et al. Alterations in white matter microarchitecture in adolescents and young adults with major depressive disorder: A voxel-based meta-analysis of diffusion tensor imaging. Psychiatry Res Neuroimaging 323, 111482 (2022).
Valenti, M. et al. Abnormal Structural and Functional Connectivity of the Corpus Callosum in Autism Spectrum Disorders: a Review. Rev. J. Autism Developmental Disord. 7, 46–62 (2020).
Videtta, G. et al. White matter modifications of corpus callosum in bipolar disorder: A DTI tractography review. J. Affect. Disord. 338, 220–227 (2023).
Scamvougeras, A., Kigar, D. L., Jones, D., Weinberger, D. R. & Witelson, S. F. Size of the human corpus callosum is genetically determined: an MRI study in mono and dizygotic twins. Neurosci. Lett. 338, 91–94 (2003).
Woldehawariat, G. et al. Corpus callosum size is highly heritable in humans, and may reflect distinct genetic influences on ventral and rostral regions. PLoS One 9, e99980 (2014).
Campbell, M. L. et al. Distributed genetic effects of the corpus callosum subregions suggest links to neuropsychiatric disorders and related traits. Acta Neuropsychiatr. 1–8 (2023).
Chen, S.-J. et al. The genetic architecture of the corpus callosum and its genetic overlap with common neuropsychiatric diseases. J. Affect. Disord. 335, 418–430 (2023).
Joshi, S. H. et al. Statistical shape analysis of the corpus callosum in Schizophrenia. Neuroimage 64, 547–559 (2013).
Luders, E., Thompson, P. M. & Toga, A. W. The development of the corpus callosum in the healthy human brain. J. Neurosci. 30, 10985–10990 (2010).
Gadewar, S. P. et al. A Comprehensive Corpus Callosum Segmentation Tool for Detecting Callosal Abnormalities and Genetic Associations from Multi Contrast MRIs. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2023, 1–4 (2023).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Volkow, N. D. et al. The conception of the ABCD study: From substance use to a broad NIH collaboration. Dev. Cogn. Neurosci. 32, 4–7 (2018).
Witelson, S. F. Hand and sex differences in the isthmus and genu of the human corpus callosum. A postmortem morphological study. Brain 112, 799–835 (1989).
Hofer, S. & Frahm, J. Topography of the human corpus callosum revisited—Comprehensive fiber tractography using diffusion tensor magnetic resonance imaging. Neuroimage 32, 989–994 (2006).
Panizzon, M. S. et al. Distinct Genetic Influences on Cortical Surface Area and Cortical Thickness. Cereb. Cortex 19, 2728–2735 (2009).
Winkler, A. M. et al. Cortical thickness or grey matter volume? The importance of selecting the phenotype for imaging genetics studies. Neuroimage 53, 1135–1146 (2010).
Bulik-Sullivan, B. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Werme, J., van der Sluis, S., Posthuma, D. & de Leeuw, C. A. An integrated framework for local genetic correlation analysis. Nat. Genet. 54, 274–282 (2022).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Hemani, G. Explodecomputer/random-Metal: Adding Random Effects Model. (Zenodo, 2022). https://doi.org/10.5281/ZENODO.6974695
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38, e164 (2010).
Martin, P.-M. et al. Schwannomin-interacting protein-1 isoform IQCJ-SCHIP-1 is a late component of nodes of Ranvier and axon initial segments. J. Neurosci. 28, 6111–6117 (2008).
Kaufmann, I., Martin, G., Friedlein, A., Langen, H. & Keller, W. Human Fip1 is a subunit of CPSF that binds to U-rich RNA elements and stimulates poly(A) polymerase. EMBO J. 23, 616–626 (2004).
Oyagi, A. & Hara, H. Essential roles of heparin-binding epidermal growth factor-like growth factor in the brain. CNS Neurosci. Ther. 18, 803–810 (2012).
Song, C., Qi, Y., Zhang, J., Guo, C. & Yuan, C. CDKN2B-AS1: An indispensable long non-coding RNA in multiple diseases. Curr. Pharm. Des. 26, 5335–5346 (2020).
Kaltschmidt, B. & Kaltschmidt, C. NF-κB in the nervous system. Cold Spring Harb. Perspect. Biol. 1, a001271 (2009).
Nakajima, H. & Koizumi, K. Family with sequence similarity 107: A family of stress responsive small proteins with diverse functions in cancer and the nervous system (Review). Biomed. Rep. 2, 321–325 (2014).
Aschard, H., Vilhjálmsson, B. J., Joshi, A. D., Price, A. L. & Kraft, P. Adjusting for heritable covariates can bias effect estimates in genome-wide association studies. Am. J. Hum. Genet. 96, 329–339 (2015).
Day, F. R., Loh, P.-R., Scott, R. A., Ong, K. K. & Perry, J. R. B. A robust example of collider bias in a genetic association study. Am. J. Hum. Genet. 98, 392–393 (2016).
Lee, S. H., Yang, J., Goddard, M. E., Visscher, P. M. & Wray, N. R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics 28, 2540–2542 (2012).
Bethlehem, R. A. I. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).
Rkwalters & Palmer, D. Nealelab/UKBB_ldsc: v2.0.0 (Round 2 GWAS Update). https://doi.org/10.5281/zenodo.7186871 (2022).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Miller, J. A. et al. Transcriptional landscape of the prenatal human brain. Nature 508, 199–206 (2014).
GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Watanabe, K., Umićević Mirkov, M., de Leeuw, C. A., van den Heuvel, M. P. & Posthuma, D. Genetic mapping of cell type specificity for complex traits. Nat. Commun. 10, 3222 (2019).
de Leeuw, C., Werme, J., Savage, J. E., Peyrot, W. J. & Posthuma, D. On the interpretation of transcriptome-wide association studies. PLoS Genet 19, e1010921 (2023).
Zhu, Z. et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat. Commun. 9, 224 (2018).
Jenkinson, M., Beckmann, C. F., Behrens, T. E. J., Woolrich, M. W. & Smith, S. M. FSL. NeuroImage 62, 782–790 (2012).
Ni, G. & Moser, G. Schizophrenia Working Group of the Psychiatric Genomics Consortium, Wray, N. R. & Lee, S. H. Estimation of genetic correlation via linkage disequilibrium score regression and genomic restricted maximum likelihood. Am. J. Hum. Genet. 102, 1185–1194 (2018).
Alex, A. M. et al. Genetic influences on the developing young brain and risk for neuropsychiatric disorders. Biol. Psychiatry 93, 905–920 (2023).
Papandréou, M.-J. et al. CK2-regulated schwannomin-interacting protein IQCJ-SCHIP-1 association with AnkG contributes to the maintenance of the axon initial segment. J. Neurochem. 134, 527–537 (2015).
Liu, J. et al. Wnt/β-catenin signalling: function, biological mechanisms, and therapeutic opportunities. Signal Transduct. Target Ther. 7, 3 (2022).
Munji, R. N., Choe, Y., Li, G., Siegenthaler, J. A. & Pleasure, S. J. Wnt signaling regulates neuronal differentiation of cortical intermediate progenitors. J. Neurosci. 31, 1676–1687 (2011).
Chenn, A. & Walsh, C. A. Regulation of cerebral cortical size by control of cell cycle exit in neural precursors. Science 297, 365–369 (2002).
Caric, D. et al. EGFRs mediate chemotactic migration in the developing telencephalon. Development 128, 4203–4216 (2001).
Zentner, G. E. & Henikoff, S. Regulation of nucleosome dynamics by histone modifications. Nat. Struct. Mol. Biol. 20, 259–266 (2013).
Huang, H. & Tindall, D. J. Dynamic FoxO transcription factors. J. Cell Sci. 120, 2479–2487 (2007).
Aboitiz, F., Scheibel, A. B., Fisher, R. S. & Zaidel, E. Fiber composition of the human corpus callosum. Brain Res 598, 143–153 (1992).
Aboitiz, F. & Montiel, J. One hundred million years of interhemispheric communication: the history of the corpus callosum. Braz. J. Med. Biol. Res. 36, 409–420 (2003).
Faust, T. E., Gunner, G. & Schafer, D. P. Mechanisms governing activity-dependent synaptic pruning in the developing mammalian CNS. Nat. Rev. Neurosci. 22, 657–673 (2021).
Zengeler, K. E. & Lukens, J. R. Innate immunity at the crossroads of healthy brain maturation and neurodevelopmental disorders. Nat. Rev. Immunol. 21, 454–468 (2021).
Renault, V. M. et al. FoxO3 regulates neural stem cell homeostasis. Cell Stem Cell 5, 527–539 (2009).
Liu, T., Zhang, L., Joo, D. & Sun, S.-C. NF-κB signaling in inflammation. Signal Transduct. Target. Ther. 2, 17023 (2017).
van Veen, S. et al. ATP13A2 deficiency disrupts lysosomal polyamine export. Nature 578, 419–424 (2020).
Smith, K. M. et al. Midline radial glia translocation and corpus callosum formation require FGF signaling. Nat. Neurosci. 9, 787–797 (2006).
Pânzaru, M.-C. et al. Genetic heterogeneity in corpus callosum agenesis. Front. Genet. 13, 958570 (2022).
Paul, L. K. et al. Agenesis of the corpus callosum: genetic, developmental and functional aspects of connectivity. Nat. Rev. Neurosci. 8, 287–299 (2007).
Castets, F. et al. A novel calmodulin-binding protein, belonging to the WD-repeat family, is localized in dendrites of a subset of CNS neurons. J. Cell Biol. 134, 1051–1062 (1996).
Bartoli, M., Monneron, A. & Ladant, D. Interaction of calmodulin with striatin, a WD-repeat protein present in neuronal dendritic spines. J. Biol. Chem. 273, 22248–22253 (1998).
Yatsenko, S. A. et al. Identification of critical regions for clinical features of distal 10q deletion syndrome. Clin. Genet. 76, 54–62 (2009).
Vera-Carbonell, A. et al. Clinical comparison of 10q26 overlapping deletions: delineating the critical region for urogenital anomalies. Am. J. Med. Genet. A 167A, 786–790 (2015).
Lin, S. et al. Chromosome 10q26 deletion syndrome: Two new cases and a review of the literature. Mol. Med. Rep. 14, 5134–5140 (2016).
Thompson, P. M. et al. ENIGMA and global neuroscience: A decade of large-scale studies of the brain in health and disease across more than 40 countries. Transl. Psychiatry 10, 100 (2020).
Palumbo, P. et al. Clinical and molecular characterization of a de novo 19p13.3 microdeletion. Mol. Cytogenet. 9, 40 (2016).
Swan, L. & Coman, D. Ocular Manifestations of a Novel Proximal 19p13.3 Microdeletion. Case Rep. Genet. 2018, 2492437 (2018).
Innocenti, G. M. & Price, D. J. Exuberance in the development of cortical networks. Nat. Rev. Neurosci. 6, 955–965 (2005).
Gavrish, M. et al. Molecular mechanisms of corpus callosum development: a four-step journey. Front. Neuroanat. 17, 1276325 (2023).
Edwards, T. J., Sherr, E. H., Barkovich, A. J. & Richards, L. J. Clinical, genetic and imaging findings identify new causes for corpus callosum development syndromes. Brain 137, 1579–1613 (2014).
Taniguchi, K. et al. Genetic and Molecular Analyses indicate independent effects of TGIFs on Nodal and Gli3 in neural tube patterning. Eur. J. Hum. Genet. 25, 208–215 (2017).
Okada, A. et al. Boc is a receptor for sonic hedgehog in the guidance of commissural axons. Nature 444, 369–373 (2006).
Demontis, D. et al. Genome-wide analyses of ADHD identify 27 risk loci, refine the genetic architecture and implicate several cognitive domains. Nat. Genet. 55, 198–208 (2023).
Luders, E. et al. The inattentive and hyperactive brain: Significant links between corpus callosum features and ADHD symptoms in adulthood. Eur. Psychiatry 33, S197–S198 (2016).
Luders, E. et al. Associations between corpus callosum size and ADHD symptoms in older adults: The PATH through life study. Psychiatry Res Neuroimaging 256, 8–14 (2016).
Hutchinson, A. D., Mathias, J. L. & Banich, M. T. Corpus callosum morphology in children and adolescents with attention deficit hyperactivity disorder: a meta-analytic review. Neuropsychology 22, 341–349 (2008).
Sarrazin, S. et al. Corpus callosum area in patients with bipolar disorder with and without psychotic features: an international multicentre study. J. Psychiatry Neurosci. 40, 352–359 (2015).
Wang, F. et al. Abnormal corpus callosum integrity in bipolar disorder: a diffusion tensor imaging study. Biol. Psychiatry 64, 730–733 (2008).
Binder, E. B. & Nemeroff, C. B. The CRF system, stress, depression and anxiety-insights from human genetic studies. Mol. Psychiatry 15, 574–588 (2010).
Ruiz-Gabarre, D., Carnero-Espejo, A., Ávila, J. & García-Escudero, V. What’s in a gene? The outstanding diversity of MAPT. Cells 11, 840 (2022).
Strang, K. H., Golde, T. E. & Giasson, B. I. MAPT mutations, tauopathy, and mechanisms of neurodegeneration. Lab. Invest. 99, 912–928 (2019).
Desikan, R. S. et al. Genetic overlap between Alzheimer’s disease and Parkinson’s disease at the MAPT locus. Mol. Psychiatry 20, 1588–1595 (2015).
Rittman, T. et al. Regional expression of the MAPT gene is associated with loss of hubs in brain networks and cognitive impairment in Parkinson disease and progressive supranuclear palsy. Neurobiol. Aging 48, 153–160 (2016).
Pascale, E. et al. Genetic architecture of MAPT gene region in Parkinson disease subtypes. Front. Cell. Neurosci. 10, 96 (2016).
Kim, J. J. et al. Multi-ancestry genome-wide association meta-analysis of Parkinson’s disease. Nat. Genet. 56, 27–36 (2024).
Poewe, W. et al. Parkinson disease. Nat. Rev. Dis. Prim. 3, 17013 (2017).
Aarsland, D. et al. Parkinson disease-associated cognitive impairment. Nat. Rev. Dis. Prim. 7, 47 (2021).
Amandola, M., Sinha, A., Amandola, M. J. & Leung, H.-C. Longitudinal corpus callosum microstructural decline in early-stage Parkinson’s disease in association with akinetic-rigid symptom severity. NPJ Parkinsons Dis. 8, 108 (2022).
Goldman, J. G. et al. Corpus callosal atrophy and associations with cognitive impairment in Parkinson disease. Neurology 88, 1265–1272 (2017).
Sasikumar, S. & Strafella, A. P. Imaging mild cognitive impairment and dementia in Parkinson’s disease. Front. Neurol. 11, 47 (2020).
Tijms, B. M. et al. Cerebrospinal fluid proteomics in patients with Alzheimer’s disease reveals five molecular subtypes with distinct genetic risk profiles. Nat. Aging 4, 33–47 (2024).
Vogel, J. W. et al. Four distinct trajectories of tau deposition identified in Alzheimer’s disease. Nat. Med. 27, 871–881 (2021).
Kaplan, C. M. et al. Deciphering nociplastic pain: clinical features, risk factors and potential mechanisms. Nat. Rev. Neurol. https://doi.org/10.1038/s41582-024-00966-8 (2024).
Roskoski, R. Jr The role of small molecule platelet-derived growth factor receptor (PDGFR) inhibitors in the treatment of neoplastic disorders. Pharmacol. Res. 129, 65–83 (2018).
Fresno Vara, J. A. et al. PI3K/Akt signalling pathway and cancer. Cancer Treat. Rev. 30, 193–204 (2004).
Duan, Y., Haybaeck, J. & Yang, Z. Therapeutic potential of PI3K/AKT/mTOR pathway in gastrointestinal stromal tumors: Rationale and progress. Cancers (Basel) 12, 2972 (2020).
Heldin, C.-H. Targeting the PDGF signaling pathway in tumor treatment. Cell Commun. Signal. 11, 97 (2013).
Wheeler, R. J., Gluenz, E. & Gull, K. Basal body multipotency and axonemal remodelling are two pathways to a 9+0 flagellum. Nat. Commun. 6, 8964 (2015).
Brusini, I. et al. Automatic deep learning multicontrast corpus callosum segmentation in multiple sclerosis. J. Neuroimaging 32, 459–470 (2022).
Miller, K. L. et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci. 19, 1523–1536 (2016).
Alfaro-Almagro, F. et al. Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 166, 400–424 (2018).
Hagler, D. J. et al. Image processing and analysis methods for the Adolescent Brain Cognitive Development Study. Neuroimage 202, 116091 (2019).
Mazziotta, J. C., Toga, A. W., Evans, A., Fox, P. & Lancaster, J. A probabilistic atlas of the human brain: Theory and rationale for its development. Neuroimage 2, 89–101 (1995).
Mazziotta, J. et al. A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM). Philos. Trans. R. Soc. Lond. B Biol. Sci. 356, 1293–1322 (2001).
Mazziotta, J. et al. A four-dimensional probabilistic atlas of the human brain. J. Am. Med. Inform. Assoc. 8, 401–430 (2001).
Jenkinson, M., Bannister, P., Brady, M. & Smith, S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage 17, 825–841 (2002).
Jernigan, T. L. et al. The Pediatric Imaging, Neurocognition, and Genetics (PING) Data Repository. Neuroimage 124, 1149–1154 (2016).
Van Essen, D. C. et al. The WU-Minn Human Connectome Project: An overview. Neuroimage 80, 62–79 (2013).
Petersen, R. C. et al. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology 74, 201–209 (2010).
Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data 6, 60 (2019).
Tournier, J.-D. et al. MRtrix3: A fast, flexible and open software framework for medical image processing and visualisation. Neuroimage 202, 116137 (2019).
Liu, M. et al. Style transfer generative adversarial networks to harmonize multi-site MRI to a single reference image to avoid over-correction. bioRxiv 2022.09.12.506445 https://doi.org/10.1101/2022.09.12.506445 (2022).
Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional Networks for Biomedical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. 234–241 (Springer, Cham, 2015).
Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. arXiv: 1412.6980 https://doi.org/10.48550/arXiv.1412.6980 (2017).
Zhu, A. H. et al. Robust automatic corpus callosum analysis toolkit: mapping callosal development across heterogeneous multisite data. In Proc. SPIE 10975, 14th International Symposium on Medical Information Processing and Analysis 109750M (SPIE, 2018).
Fischl, B. FreeSurfer. Neuroimage 62, 774–781 (2012).
Gorgolewski, K. J. et al. A high resolution 7-Tesla resting-state fMRI test-retest dataset with cognitive and physiological measures. Sci. Data 2, 140054 (2015).
Zuo, X.-N. et al. An open science resource for establishing reliability and reproducibility in functional connectomics. Sci. Data 1, 140049 (2014).
Baurley, J. W., Edlund, C. K., Pardamean, C. I., Conti, D. V. & Bergen, A. W. Smokescreen: a targeted genotyping array for addiction research. BMC Genomics 17, 145 (2016).
Uban, K. A. et al. Biospecimens and the ABCD study: Rationale, methods of collection, measurement and early data. Dev. Cogn. Neurosci. 32, 97–106 (2018).
Stein, J. L. et al. Identification of common variants associated with human hippocampal and intracranial volumes. Nat. Genet. 44, 552–561 (2012).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).
Skol, A. D., Scott, L. J., Abecasis, G. R. & Boehnke, M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat. Genet. 38, 209–213 (2006).
Nagel, M. et al. Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways. Nat. Genet. 50, 920–927 (2018).
Rietveld, C. A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).
Li, J. & Ji, L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 95, 221–227 (2005).
Nyholt, D. R. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am. J. Hum. Genet. 74, 765–769 (2004).
McGill, M. An evaluation of factors affecting document ranking by information retrieval systems (Report No. ED188587). School of Information Studies, Syracuse University. National Science Foundation. (Syracuse, NY, 1979).
Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191–W198 (2019).
Reimand, J., Kull, M., Peterson, H., Hansen, J. & Vilo, J. g:Profiler-a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res. 35, W193–W200 (2007).
Watanabe, K., Taskesen, E., Van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1–10 (2017).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Wang, D. et al. Comprehensive functional genomic resource and integrative model for the human brain. Science 362, https://doi.org/10.1126/science.aat8464 (2018).
Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019).
Habib, N. et al. Massively parallel single-nucleus RNA-seq with DroNc-seq. Nat. Methods 14, 955–958 (2017).
Zhong, S. et al. A single-cell RNA-seq survey of the developmental landscape of the human prefrontal cortex. Nature 555, 524–528 (2018).
Darmanis, S. et al. A survey of human brain transcriptome diversity at the single cell level. Proc. Natl Acad. Sci. USA 112, 7285–7290 (2015).
Hochgerner, H. et al. STRT-seq-2i: dual-index 5’ single cell and nucleus RNA-seq on an addressable microwell array. Sci. Rep. 7, 16327 (2017).
La Manno, G. et al. Molecular Diversity of Midbrain Development in Mouse, Human, and Stem Cells. Cell 167, 566–580.e19 (2016).
ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Heng, T. S. P., Painter, M. W. & Immunological Genome Project Consortium The Immunological Genome Project: networks of gene expression in immune cells. Nat. Immunol. 9, 1091–1094 (2008).
GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Grasby, K. L. et al. The genetic architecture of the human cerebral cortex. Science 367, https://doi.org/10.1126/science.aay6690 (2020).
Revez, J. A. et al. Genome-wide association study identifies 143 loci associated with 25 hydroxyvitamin D concentration. Nat. Commun. 11, 1647 (2020).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Bellenguez, C. et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat. Genet. 54, 412–436 (2022).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
Watson, H. J. et al. Genome-wide association study identifies eight risk loci and implicates metabo-psychiatric origins for anorexia nervosa. Nat. Genet. 51, 1207–1214 (2019).
Friligkou, E. et al. Gene discovery and biological insights into anxiety disorders from a large-scale multi-ancestry genome-wide association study. Nat. Genet. https://doi.org/10.1038/s41588-024-01908-2 (2024).
Mullins, N. et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat. Genet. 53, 817–829 (2021).
Khoury, S. et al. Genome-wide analysis identifies impaired axonogenesis in chronic overlapping pain conditions. Brain 145, 1111–1123 (2022).
Als, T. D. et al. Depression pathophysiology, risk prediction of recurrence and comorbid psychiatric disorders using genome-wide analyses. Nat. Med. 29, 1832–1844 (2023).
International League Against Epilepsy Consortium on Complex Epilepsies GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture. Nat. Genet. 55, 1471–1482 (2023).
Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat. Genet. 50, 912–919 (2018).
Watanabe, K. et al. Genome-wide meta-analysis of insomnia prioritizes genes associated with metabolic and psychiatric pathways. Nat. Genet. 54, 1125–1132 (2022).
International Obsessive Compulsive Disorder Foundation Genetics Collaborative (IOCDF-GC) and OCD Collaborative Genetics Association Studies (OCGAS) Revealing the complex genetic architecture of obsessive-compulsive disorder using meta-analysis. Mol. Psychiatry 23, 1181–1188 (2018).
Leonard, H. L. & Global Parkinson’s Genetics Program (GP2). Novel Parkinson’s disease genetic risk factors within and across European populations. medRxiv https://doi.org/10.1101/2025.03.14.24319455 (2025).
Nievergelt, C. M. et al. Genome-wide association analyses identify 95 risk loci and provide insights into the neurobiology of post-traumatic stress disorder. Nat. Genet. https://doi.org/10.1038/s41588-024-01707-9 (2024).
Forstner, A. J. et al. Genome-wide association study of panic disorder reveals genetic overlap with neuroticism and depression. Mol. Psychiatry 26, 4179–4190 (2021).
Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).
Hatoum, A. S. et al. Multivariate genome-wide association meta-analysis of over 1 million subjects identifies loci underlying multiple substance use disorders. Nat. Ment. Health 1, 210–223 (2023).
Mullins, N. et al. Dissecting the shared genetic architecture of suicide attempt, psychiatric disorders, and known risk factors. Biol. Psychiatry 91, 313–327 (2022).
Yu, D. et al. Interrogating the genetic determinants of Tourette’s syndrome and other tic disorders through genome-wide association studies. Am. J. Psychiatry 176, 217–227 (2019).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Zhang, Q.-X. et al. Precise estimation of in-depth relatedness in biobank-scale datasets using deepKin. Cell Rep. Methods 5, 101053 (2025).
Shatokhina, N. et al. ENIGMA-Vis: A Web Portal to Browse, Navigate & Visualize Brain Genome-Wide Association Studies (GWAS). Biol. Psychiatry 89, S136 (2021).
Acknowledgements
This work was supported by the National Institutes of Health (Grant Nos. R01MH134004 and R01AG059874 [NJ], RF1NS136995 [PMT and NJ], and R01AG087513 [NJ]), the National Science Foundation Graduate Research Fellowship Program (Grant No. 2020290241 [RRB]), grants R01MH126213 and R01NS105746, the Adolescent Brain Cognitive Development (ABCD) Study (https://abcdstudy.org), and UK Biobank (Resource Application No. 11559). SEM was supported by NHMRC grants APP1172917 and APP1158127. Research reported in this publication was supported by the Office of the Director, National Institutes of Health, under Award Number S10OD032285. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Contributions
Conceptualization and design: R.R.B., S.P.G., and N.J.; Genetics Methodology: R.R.B., S.P.G., A.S., C.D.L., S.E.M., P.M.T., and N.J.; Imaging Methodology: S.P.G., I.B.G., S.J., A.R., E.N., A.H.Z., R.R.B., and N.J.; Data analysis: R.R.B., S.P.G., A.S., S.J., I.B.G., C.D.L., S.E.M., and N.J.; Visualization: R.R.B., S.P.G., A.S., E.H., and N.J.; Drafting of manuscript: R.R.B., S.P.G., and N.J. All authors contributed to the critical revision of the manuscript. All figures are original, and there are no copyright restrictions or attribution requirements.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Rene Human-Baron and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bhatt, R.R., Gadewar, S.P., Shetty, A. et al. The Genetic Architecture of the Human Corpus Callosum and its Subregions. Nat Commun 16, 9708 (2025). https://doi.org/10.1038/s41467-025-64791-3
DOI: https://doi.org/10.1038/s41467-025-64791-3









