A pattern-learning algorithm associates copy number variations with brain structure and behavioural variables in an adolescent population cohort

Kopal, Jakub; Huguet, Guillaume; Marotta, Justin; Aggarwal, Shambhavi; Osayande, Nicole; Kumar, Kuldeep; Saci, Zohra; Jean-Louis, Martineau; Chai, Xiaoqian J.; Ge, Tian; Yeo, B. T. Thomas; Thompson, Paul M.; Bearden, Carrie E.; Andreassen, Ole A.; Jacquemont, Sébastien; Bzdok, Danilo

doi:10.1038/s41551-025-01454-0

Article
Open access
Published: 18 July 2025

Nature Biomedical Engineering volume 10, pages 354–369 (2026)

Abstract

Our genetic makeup, together with environmental and social influences, shapes our brain’s development. Yet, the imaging-genetics field has struggled to integrate all these modalities to investigate the interplay between genetic blueprint, brain architecture, environment, human health and daily living skills. Here we interrogate the Adolescent Brain Cognitive Development (ABCD) cohort to outline the effects of rare high-effect genetic variants on brain architecture and their corresponding implications for cognitive, behavioural, psychosocial and socioeconomic traits. We design a holistic pattern-learning framework that quantitatively dissects the impacts of copy number variations (CNVs) on brain structure and 938 behavioural variables spanning 20 categories in 7,338 adolescents. Our results reveal associations between genetic alterations, higher-order brain networks and specific parameters of family wellbeing, including increased parental and child stress, anxiety and depression, or neighbourhood dynamics such as decreased safety. We thus find effects extending beyond the previously reported impairment of cognitive ability or language capacity. Our investigation spotlights the interplay between genetic variation and subjective life quality in adolescents and their families.

Main

Genetic architecture contributes directly and indirectly to the wiring of brain circuits and provides the foundation of behaviour repertoire manifestation^1,2. By understanding genetic underpinnings, it is possible to unravel the origins of individual differences in cognitive processes and behaviours, offering insights into both adaptive capacities and developmental vulnerabilities³. Identifying biological determinants behind brain organization and behavioural differentiation necessitates an integrative approach that cuts across an array of disciplines. Nevertheless, neuroimaging genetics, psychiatric genetics and environmental factor studies have been conducted in isolated silos.

Genetic underpinnings of phenotypes or disease have been traditionally studied through genome-wide association studies (GWAS). However, GWAS have been restricted to common variants which mainly reside in non-coding regions and exert only small effects on many phenotypes, including those studied in neuroscience⁴. Compared with incumbent single nucleotide polymorphism (SNP) analyses in GWAS, protein-coding copy number variations (CNVs) represent rare and consequential genome-wide perturbations leading to a large decrease or increase in gene expression. This class of genetic variation is defined as either a deletion or duplication of sequences of nucleotides more than 1,000 base pairs long^5,6. Notably, CNVs have been associated with neurodevelopmental disorders (for example, autism spectrum disorder⁷ or attention-deficit/hyperactivity disorder⁸) and psychological/psychiatric disorders (for example, schizophrenia^9,10, bipolar disorder¹¹ or major depressive disorder¹²).

Many protein-coding CNVs are now being understood to exert body-wide implications^13,14 and cortical alterations¹⁵. Research indicates that CNVs contribute to cortical changes in the brain, affecting both its structure and function^16,17. The observed patterns of robust cortical alterations were largely specific to individual CNVs^14,18. The different brain alterations can lead to ramifications beyond the impairment of cognitive ability or language capacity, dominantly reported in the CNV literature^19,20. Systemic associations outside the central nervous system, including the cardiovascular system, might contribute to decreased longevity of CNV carriers in the general population^13,14. Since protein-coding CNVs are cumulatively frequent in the population and have the potential for substantial effects on a given phenotype, they represent an emerging potent imaging-genetics tool.

During the period of adolescence, brain circuits and behavioural tendencies undergo dynamic changes shaped by genetic factors, environmental influences and their interactions^21,22. Adolescence is also a life stage during which symptoms of numerous psychiatric disorders become apparent²³. Recent findings underscore the necessity of adopting a multidimensional and interdisciplinary approach that cuts across sociology, psychology and biology, conventionally studied in isolation. Such a holistic perspective is essential for a more nuanced understanding of the intricate interplay of genetic, socioeconomic and environmental factors influencing healthy children’s development²⁴. By integrating information from cognitive assessments, genetic information and socio-environmental measures, it is possible to identify potential risk factors as well as unveil protective elements contributing to resilience in individuals navigating the complexities of adolescence^25,26. There is an ongoing debate on whether CNVs exhibit specific associations with particular disorders, or rather influence neurodevelopment as a whole. As a result, we should carry out analyses and studies that are open to the possibility that CNVs will impact behaviour in various ways throughout adolescence²⁷. The analysis of CNVs in adolescents is positioned to carve out important interactions between our genetic heritage, the environmental milieu, and the intricacies of cognitive and social development.

In the present study, we leveraged understudied rare genetic alterations (genome-wide CNVs) with strong downstream effects. We interrogated the Adolescent Brain Cognitive Development℠ Study (ABCD Study)²⁸, which represents one of the largest collections of brain images and genetic profiles from over 10,000 children aged 9–11 years at baseline. These adolescents are prospectively deeply phenotyped by means of an extensive battery of cognitive, behavioural, clinical, psychosocial and socioeconomic measures. Benefiting from this comprehensive multimodal data, we investigated the effects of a genomic deletion and duplication on patterns of individual participants’ brain architecture linked with cognitive, behavioural, psychosocial and socioeconomic measures in a single unified multivariate analysis. Specifically, we first probed curated data from 7,338 children for the presence of CNVs. We then deployed multivariate pattern-learning algorithms in children without any CNV to estimate modes of population covariation between brain architecture, represented by 148 regional atlas volumes, and ~1,000 behavioural variables spanning 20 rich categories. Finally, we quantified the effects of deletions and duplications on the revealed canonical modes. The robustness of derived modes and CNV-induced differences was substantiated by cross-validation and permutation testing^24,29. This multidimensional and doubly multivariate framework revealed the multifaceted relationships between genes, brain architecture and behaviour, which paves the way for innovation in neuroscience, genetics and personalized medicine.

Results

Genome-wide mutations alter patterns of brain and behaviour

We used a pattern-learning approach to analyse the impact of genome-wide CNVs in the ABCD cohort by means of its uniquely deep phenotype profiling. To this end, in the group of 7,338 children that passed genetic and MRI quality control, we first identified 486 children carrying at least one genomic deletion fully encompassing at least one gene and 1,406 children carrying at least one duplication that fully encompassed at least one gene. In addition, we identified 132 children who carried both a deletion and a duplication, and these individuals were included in both the deletion and duplication groups for subsequent analysis. The remaining 72% (5,314) of the children did not carry any protein-coding CNV larger than 50 kb across the genome (Fig. 1a). These participant groups (deletions, duplications, controls) showed similar proportions of sex (percentage of females: 44–48%) and distributions of age (Fig. 1a).

**Fig. 1: Linking genome-wide CNVs and brain–behaviour population covariation.**

Next, we zoomed in on the CNVs that we localized in the children’s genetic profiles (Fig. 1b). Almost 60% of deletions encompassed a single complete gene. Duplications generally encompassed more affected genes than deletions, although a single-gene duplication was the most common (~30% of cases). Besides the genetic profiling, the ABCD resource provides brain and behaviour measurements for each participant: brain measurements were represented by 148 regional brain volumes defined according to the Desikan–Killiany standard atlas. Behavioural measures drew across 938 different phenotypes spanning 20 categories for in-depth follow-up analyses.

To investigate how genetic mutations impact brain and behaviour, we first established the link between measurements of brain architecture and behaviour using a single holistic multivariate model. Specifically, we brought to bear a partial least squares (PLS) model that maximizes the covariation between the weighted set (linear combination) of sociodemographics, family wellbeing, physical characteristics, or behavioural measures and a weighted set (linear combination) of brain structure measures (Fig. 1c). The PLS model parameters were initially estimated in participants without any CNV, as a reference group, to reveal the modes of covariation that reflect the general population. The participant-wise expressions of each brain–behaviour covariation mode are hereafter called ‘scores’. In other words, these ‘scores’ are calculated as a linear combination (weighted sum) of the original variables with PLS weights. Each identified PLS mode can thus be characterized by a set of brain and behaviour scores for all participants. Using a robust protocol for cross-validation and empirical permutation testing²⁴, we identified three significant PLS modes (Supplementary Fig. 1). These major sources of population covariation in adolescents captured the ways in which brain features are intertwined with early life events, mental wellbeing or environment.
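To make the idea of PLS weights and participant-wise scores concrete, the following is a minimal sketch of how the leading mode could be estimated. It is not the authors' pipeline: it assumes z-scored inputs, extracts only the first mode by power iteration on the brain–behaviour cross-covariance matrix, and uses plain Python to stay dependency-free. All function names and the toy data are hypothetical.

```python
import math
import random

def standardize(columns):
    """Z-score each variable (one inner list per variable)."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
        out.append([(v - mean) / sd for v in col])
    return out

def first_pls_mode(brain_cols, behav_cols, n_iter=200):
    """Return (brain_weights, behaviour_weights) of the leading PLS mode.

    Power iteration on the cross-covariance matrix C = X^T Y / (n - 1)
    converges to its leading pair of singular vectors, that is, the weights
    whose linear combinations have maximal covariance.
    """
    X, Y = standardize(brain_cols), standardize(behav_cols)
    n = len(X[0])
    C = [[sum(a * b for a, b in zip(x, y)) / (n - 1) for y in Y] for x in X]
    v = [1.0 / math.sqrt(len(Y))] * len(Y)
    for _ in range(n_iter):
        u = [sum(C[i][j] * v[j] for j in range(len(Y))) for i in range(len(X))]
        norm_u = math.sqrt(sum(a * a for a in u))
        u = [a / norm_u for a in u]
        v = [sum(C[i][j] * u[i] for i in range(len(X))) for j in range(len(Y))]
        norm_v = math.sqrt(sum(a * a for a in v))
        v = [a / norm_v for a in v]
    return u, v

def scores(columns, weights):
    """Participant-wise scores: weighted sum of the standardized variables."""
    Z = standardize(columns)
    return [sum(w * col[i] for w, col in zip(weights, Z)) for i in range(len(Z[0]))]

# Toy data (hypothetical): 200 participants sharing one latent factor.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(200)]
brain = [[l + random.gauss(0, 0.5) for l in latent] for _ in range(3)]
behav = [[l + random.gauss(0, 0.5) for l in latent] for _ in range(4)]

u, v = first_pls_mode(brain, behav)
brain_scores = scores(brain, u)
behav_scores = scores(behav, v)
```

Later modes would be obtained from the remaining singular vectors; a numerical library with a singular value decomposition would normally replace the hand-written loop.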

In the next step, we wished to evaluate whether carrying a coding CNV led to statistically significant shifts in the observed brain and behaviour patterns. To this end, we devised a cross-validation scheme that compares PLS scores between controls and CNV carriers, all derived from a single PLS model (Supplementary Fig. 2). Specifically, we first estimated a single PLS model using the control group data, which captured population-level brain–behaviour covariation. We then fed the brain and behaviour data of CNV carriers through this same model, yielding analogous estimates in the CNV group. This approach ensures that the same PLS modes represent the same brain–behaviour associations across both groups. In the subsequent CNV–control comparisons, deletion and duplication carriers were pitted against control participants not used to derive PLS parameters to prevent overfitting (cf. Methods for details). The comparison was based on separately testing the difference in the average behaviour scores and the average brain scores. We were thus able to assess mode expression differences separately for deletions and duplications (Fig. 1c). We found significant differences between deletions and controls in behaviour scores for all 3 identified modes (P_mode1 = 0.003, P_mode2 = 0.003, P_mode3 = 0.010 after false discovery rate (FDR) correction). By contrast, duplications showed significant differences for the first 2 behavioural modes (P_mode1 = 0.003, P_mode2 = 0.016, P_mode3 = 0.051). Furthermore, there was a significant shift in brain scores for all 3 modes in duplications (P_mode1 = 0.003, P_mode2 = 0.001, P_mode3 = 0.037) and the second mode in deletions (P_mode1 = 0.349, P_mode2 = 0.018, P_mode3 = 0.161). A sensitivity analysis demonstrated that the obtained differences were not driven by the presence of recurrent CNVs, such as 16p11.2 or 22q11.2 (Supplementary Fig. 3).
To further test the robustness of our findings, we conducted an additional analysis where all but one sibling per family were excluded from the main dataset. This analysis also produced results consistent with those of our primary study, reinforcing the reliability of our findings (Supplementary Fig. 4). Collectively, our results revealed that carrying a CNV significantly impacts the expression of patterns linking brain architecture and diverse aspects of cognitive, psychosocial and socioeconomic measures in our ABCD sample. In other words, genetic factors contribute to individual differences in brain–behaviour correspondences in adolescents.
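The group comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: it assumes a label-permutation test on the difference in mean scores between carriers and held-out controls, and it assumes the Benjamini-Hochberg procedure for the FDR correction (the text does not name the FDR method). Function names and toy scores are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(carrier_scores, control_scores, n_perm=2000, seed=0):
    """Two-sided permutation P value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(carrier_scores) - mean(control_scores))
    pooled = list(carrier_scores) + list(control_scores)
    k = len(carrier_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break the link between scores and group labels
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy scores (hypothetical): carriers shifted by one unit relative to
# controls that were held out from PLS fitting.
rng = random.Random(1)
held_out_controls = [rng.gauss(0.0, 1.0) for _ in range(200)]
toy_carriers = [rng.gauss(1.0, 1.0) for _ in range(50)]

p_shifted = permutation_p_value(toy_carriers, held_out_controls)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

Comparing carriers only against controls that did not contribute to the PLS fit keeps the reference scores from being optimistically biased by the fitting step.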

Tri-modal population modes link brain, behaviour and environment

After identifying robust deviations of brain–behaviour patterns in CNV carriers at population scale, we examined each revealed mode in more detail. The dominant (that is, the first) mode portrayed the ties between large-scale brain networks with sociodemographics and cognition. Specifically, we first re-expressed the difference in PLS scores between controls and CNV carriers using Cohen’s d measure to provide a standardized measure of CNV-carriership effect size. The dominant PLS mode was characterized by significantly altered behaviour scores, with the shift being more prominent in CNV deletions (CNV–controls Cohen’s d_DEL = 0.17, d_DUP = 0.12) (Fig. 2a). To find which phenotypes play a prominent role in the first mode, we calculated brain and behaviour loadings. Our version of these loadings here was obtained by Pearson’s correlation between a respective PLS score and the original measurement (Supplementary Table 1). As an example, each brain loading indexes the linear association strength between brain region measurements and brain scores in our reference group. Among the strongest brain effects, we observed the medial orbital sulcus (average of left and right hemisphere Pearson’s r = 0.30), a part of the frontal lobe which may be involved in various cognitive functions, including decision making, emotional processing and social cognition³⁰. Since duplication carriers displayed higher brain scores compared with controls and since the medial orbital sulcus was associated with positive loading (higher volume = higher score), this result pointed to increased volume in this region for duplication carriers. Other strong loadings included the middle occipital sulcus, subcallosal area, superior occipital gyrus, or right lingual gyrus (Fig. 2b). We subsequently mapped obtained loadings onto a brain surface. Notably, the temporal lobe, parietal cortex and parts of the frontal cortex played a crucial role in the dominant mode.
We then computed the average absolute loading effects in each of the seven large-scale networks according to Schaefer–Yeo definitions (Fig. 1c). Finally, we submitted brain loadings to a formal bootstrap test to determine whether they were significantly different from zero (cf. Methods). This test was based on 1,000 PLS model instances built on a randomly perturbed version of our ABCD participants created by sampling a participant cohort of the same sample size (with replacement). We observed that at least 64% of the loadings were significant, highlighting the robustness of this first mode.
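The three quantities used in this paragraph (Cohen's d, correlation-based loadings and a bootstrap check) can be illustrated with a short sketch. Two simplifications are assumed and should be read as such: Cohen's d uses the pooled standard deviation, and the bootstrap here resamples participants and recomputes a single loading against fixed scores, whereas the study refits the full PLS model on each of the 1,000 resamples. All names are hypothetical.

```python
import math
import random

def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled SD."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sx * sy)

def loadings(columns, mode_scores):
    """One loading per variable: its correlation with the mode's scores."""
    return [pearson_r(col, mode_scores) for col in columns]

def loading_is_stable(variable, mode_scores, n_boot=1000, seed=0):
    """True if the 95% bootstrap percentile interval excludes zero."""
    rng = random.Random(seed)
    n = len(variable)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        draws.append(pearson_r([variable[i] for i in idx],
                               [mode_scores[i] for i in idx]))
    draws.sort()
    lower = draws[int(0.025 * n_boot)]
    upper = draws[int(0.975 * n_boot) - 1]
    return lower > 0 or upper < 0

# Toy values (hypothetical).
d_toy = cohens_d([2, 4, 6], [1, 3, 5])
toy_loadings = loadings([[1, 2, 3], [3, 2, 1]], [1, 2, 3])
toy_variable = [2 * i + (i % 7) for i in range(100)]
stable = loading_is_stable(toy_variable, list(range(100)))
```

Because a loading is a correlation with the score, its sign tells which direction of the original measurement goes with a higher score, which is how the text reads a positive loading as "higher volume = higher score".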

**Fig. 2: The leading population mode tracks decreased cognitive functioning in CNV carriers.**

Furthermore, we inspected a broad portfolio of behaviour characteristics interlocked with the above-described brain-level effects. To this end, we calculated behaviour loadings similarly to brain loadings. The strongest loadings included family income (Pearson’s r = −0.68), poverty index (Pearson’s r = 0.66), parental education (Pearson’s r = −0.58), measures of cognitive performance (Pearson’s r = −0.56) and also screen time or sleep duration (Pearson’s r = 0.46) (Fig. 2c). To obtain a synopsis of the dominant behavioural profile, we averaged absolute behaviour loadings in each of the 20 categories. Demographics, cognitive and socioeconomic categories had the strongest average loadings (average absolute Pearson’s r > 0.22). Since CNV carriers displayed higher expression compared with controls for this mode characterized by negative loading for measures of cognitive performance (lower performance = higher score), these results thus point to decreased cognitive abilities and real-life functioning, especially in deletion carriers. Collectively, the dominant canonical mode highlighted the crosslinks between (i) frontoparietal and temporal regions and assessments of (ii) cognition and (iii) demographics.
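The category-level synopsis mentioned above amounts to averaging absolute loadings within each category. A minimal sketch follows; the category labels and loading values are invented for illustration and do not come from the study.

```python
from collections import defaultdict

def mean_abs_loading_by_category(loading_values, categories):
    """Average |loading| per category, ignoring the sign of each loading."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, category in zip(loading_values, categories):
        sums[category] += abs(value)
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}

# Hypothetical loadings and category labels.
toy_values = [-0.6, 0.4, 0.2, -0.2]
toy_categories = ["socioeconomic", "socioeconomic", "sleep", "sleep"]
summary = mean_abs_loading_by_category(toy_values, toy_categories)
```

Taking absolute values before averaging prevents loadings of opposite sign within one category (for example, income and poverty index) from cancelling out.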

The second PLS mode spotlighted opposing gene dosage effects on the brain structure that we identified to tie into family history of mental health. Specifically, we observed significant opposing average brain expressions for deletions and duplications (CNV–controls Cohen’s d_DEL = 0.04, d_DUP = −0.06) (Fig. 3a), which might reflect the mirroring effect on brain architecture previously reported for CNVs at specific genomic loci³¹. Similarly to the dominant behavioural mode, we also observed significantly different behavioural scores with stronger effects for deletions (CNV–controls Cohen’s d_DEL = −0.07, d_DUP = −0.05). According to the calculated brain loadings (Supplementary Table 1), the mirroring brain scores were mainly driven by the precentral gyrus (across-hemisphere average Pearson’s r = −0.37), followed by supramarginal, postcentral, or lingual gyri (Fig. 3b). Despite being part of distinct brain networks, these regions were previously associated with neural mechanisms supporting complex cognitive tasks, especially those involving semantic processing or executive functions^32,33. Following the conducted bootstrap significance test, 43% of the brain loadings were significantly different from zero. Collectively, regions with the strongest loadings belonged to somatomotor, dorsal attention and frontoparietal networks. The interactions and coordinated activity of these networks are known to be essential for the efficient integration and execution of complex cognitive and motor tasks³⁴.

**Fig. 3: The second population mode spotlights a shift in brain scores associated with mental wellbeing.**

The prominent deviations in behaviour scores in CNV carriers can be explained by elevated assessments of mental wellbeing as revealed by behaviour loadings. Specifically, phenotypes from the Child Behaviour Checklist (CBCL) and the Adult Self Report (ASR) dominated the set of relevant behaviour loadings (Fig. 3c). Particularly, the total scores of CBCL (Pearson’s r = 0.70) and ASR (Pearson’s r = 0.74) emerged as the two strongest loadings. They were followed by measures of both parental and child anxiety, stress and depression, as well as child sleep disorders. Indeed, when averaged across categories, the sleep category joined child behaviour and parental questionnaires as the most prominent (average Pearson’s r = 0.22). The combination of flagged phenotypes from both children and adult assessments suggests that the second mode captures a comprehensive view of the wellbeing intricately tied to the family system. In addition, it points towards potential dynastic effects, that is, the impacts of (inherited) genetic variants on family environments. Collectively, the second canonical mode proposed decreased familial mental wellbeing as a prominent marker of deletion carriers.

In the third and last canonical mode, we observed the relationship of the default mode and frontoparietal networks with environmental measures. Despite the mirrored effects on brain structure, the only significant shift was found for brain scores in duplication carriers and behaviour scores in deletion carriers (Fig. 4a). The third mode was characterized by a strong contribution of the insula (average Pearson’s r = −0.35) as well as middle temporal and lateral superior temporal gyri (Fig. 4b). The bootstrap test points to a lower stability in this mode, where 10% of brain loadings show significance. The strongest brain loadings were part of the default mode network (average absolute Pearson’s r = 0.13). This network belongs to the multimodal end of the unimodal-to-multimodal characterization of large-scale brain networks. Previous research suggests that the relevance of this network and underlying regions could imply their crucial roles in several key cognitive processes, including self-reference, social cognition, episodic and autobiographical memory, language or semantic memory³⁵.

**Fig. 4: The third population mode links higher-order networks to environment measures.**

Examining the behaviour profile in the third canonical mode highlighted variables associated with the environment (Fig. 4c). Concretely, phenotypes related to the neighbourhood, such as crime reports (Pearson’s r = 0.71), drug possession, violent crimes, adult offence and feelings of safety emerged as strongly associated. These phenotypes reflect social and community dynamics, which might affect the overall quality of life for individuals within that context. In summary, the third significant mode revealed an alteration in how environmental differences link to higher-order networks in adolescent deletion carriers.

Exploring brain–behaviour relationship across genes, population, time

After describing the interconnections between genetic mutations and the expressions of behaviour patterns, we explored how characteristics of genes encompassed in CNVs shaped behaviour scores. In other words, for each individual with a deletion or duplication, we scored the genes inside a CNV using a total of seven complementary descriptions, including the average temporal expression, number of genes preferentially expressed in the brain, number of genes associated with autism spectrum disorder, schizophrenia, or a broader portfolio of disorders, and a functional intolerance score: the inverse of loss-of-function observed/expected upper bound fraction (1/LOEUF). The 1/LOEUF score reflects the degree of negative selection pressure against loss-of-function mutations in a gene, with higher values indicating stronger evolutionary constraint and greater intolerance to functional disruption in the general population. We then performed an exploratory analysis using Pearson’s correlation between behaviour scores and the quantitative descriptions of CNVs occurring in the genome (Fig. 5a). For deletions, the strongest observed association was with the sum of 1/LOEUF (Pearson’s r_mode3 = −0.09, P_FDR = 0.10). For duplications, the strongest observed association was with temporal gene expression (Pearson’s r_mode1 = 0.10, P_FDR = 0.001). This result suggested a deteriorating impact of altering dosage in genes expressed later during human development. Another strong association was between the dominant mode and the sum of 1/LOEUF scores, where the positive correlation with PLS scores suggested decreased cognitive performance among CNV carriers. Due to the limited number of CNV carriers, only the association with genetic temporal profile reached significance after applying FDR correction to control for multiple comparisons. Nevertheless, the reported associations can serve as valuable pointers for further research.
Additional analysis using probability of loss intolerance (pLI) as another measure of functional intolerance of CNVs is in Supplementary Fig. 5. In summary, our findings underscore the intricate relationship between genetic characteristics and behavioural outcomes, highlighting the importance of considering both genetic and temporal dimensions in understanding the aetiology of behavioural patterns and susceptibility to disorders.
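As an illustration of the gene-level scoring described above, the sketch below computes a carrier's summed 1/LOEUF and correlates it with behaviour scores. The LOEUF values and scores are invented for the example, and the function names are hypothetical; real LOEUF values would come from a constraint resource that this section does not specify.

```python
import math

def cnv_constraint_score(loeuf_values):
    """Sum of 1/LOEUF over the genes encompassed by one carrier's CNVs."""
    return sum(1.0 / value for value in loeuf_values)

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sx * sy)

# Hypothetical carriers: each inner list holds the LOEUF values of the
# genes fully covered by that carrier's CNVs.
toy_loeuf = [[0.5, 0.25], [1.0], [0.2, 0.5, 1.0]]
constraint = [cnv_constraint_score(genes) for genes in toy_loeuf]

# Hypothetical behaviour scores of the same three carriers for one mode.
toy_scores = [0.3, -0.1, 0.6]
r_toy = pearson_r(constraint, toy_scores)
```

Summing 1/LOEUF means that a CNV covering several strongly constrained genes (low LOEUF) receives a high score, so the measure grows with both the number and the intolerance of the affected genes.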

**Fig. 5: Population modes are driven by temporal gene characteristics rather than sociodemographic factors.**

In the next step, we explored whether modes of population stratification, that is, specific sociodemographically defined groups, also influence the derived patterns. In other words, we quantified whether ethnicity, sex, age, or genetic background are linked with the shifts in brain and behaviour scores. As a concrete example of this sensitivity analysis, we stratified participant-wise scores for the first mode by reported ethnicity as defined by the ABCD team³⁶ (Fig. 5b). Using one-way analysis of variance (ANOVA), we assessed whether there were significant differences in scores as a function of these diverse ethnic categories. Notably, the findings revealed that the scores did not exhibit a statistically significant difference among ethnicities (F-statistic = 0.89, P = 0.56). We then extended this post hoc analysis to other modes of population covariation and other metrics of population stratification. Namely, we quantified the difference in scores between males and females using a two-sample t-test and as a function of the 21 recruitment sites using one-way ANOVA. Moreover, we probed the linear association of scores with age and the ancestry structure of the cohort measured using the first ten principal components of genotyping data. We collected all P values and applied FDR correction to control for multiple comparisons across the totality of 52 performed tests. None of the performed tests revealed significant association (Fig. 5c). This comprehensive examination provided valuable insights into the potential universality of the observed scores among modes of population stratification, underscoring the importance of considering the generalizability aspect in the broader context of the study’s implications³⁷.
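The one-way ANOVA used in this stratification analysis compares between-group to within-group variance of the scores. A minimal sketch of the F-statistic follows, with toy groups standing in for the ethnicity or site strata; the function name and data are hypothetical.

```python
def one_way_anova_f(groups):
    """F-statistic of a one-way ANOVA over a list of groups of scores."""
    all_values = [value for group in groups for value in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(group) / len(group) for group in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((value - m) ** 2
                    for group, m in zip(groups, group_means)
                    for value in group)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy strata (hypothetical scores).
f_toy = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

The P value is then read from an F distribution with (k − 1, n − k) degrees of freedom, which requires a statistics library and is omitted here.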

As the final step, we extended our analyses by examining longitudinal changes in brain structure between controls and CNV carriers at the 2-years-after-imaging timepoint (Supplementary Fig. 6). We focused on brain structure for this analysis step because the majority of behavioural phenotypes were not available for the second timepoint. Investigating the trajectory of brain development over time can provide insights into whether individuals with CNVs exhibit distinct patterns of structural change. We benefited from the availability of 3,715 brain scans measured 2 years after the first visit (51% of participants passing quality control). We observed a high correlation between regional volumes acquired at these two timepoints (Supplementary Fig. 7). We used the PLS model derived using baseline measurements to re-express brain measurements from the 2-year follow-up (cf. Methods). Put differently, we applied the baseline PLS model to the follow-up brain measurements to assess how the brain measurements at follow-up align with the brain patterns established at baseline. In doing so, the original PLS model provided a holistic summary of brain maturation by calculating a brain score for each participant at each visit, for each of the three modes. We then calculated Pearson’s correlation between the brain scores from the baseline and follow-up measurements. Similar to brain structure measurements, we observed a strong link between PLS scores in the dominant mode between the baseline and follow-up measurements in controls (Pearson’s r = 0.89) (Fig. 5d). Furthermore, we used a linear mixed-effects model (cf. Methods) to examine whether the rate of change in brain scores differed between controls and CNV carriers (Fig. 5e). We found a significant main effect of measurement timepoint, indicating that brain scores changed significantly over time across all three groups (Supplementary Table 2).
The interactions between time and group (that is, control or CNV carrier) were not significant after FDR corrections, suggesting that the rate of change in brain scores over time did not differ significantly between controls, duplication carriers and deletion carriers (Supplementary Table 2). Therefore, both CNV groups displayed similar brain maturation patterns compared to controls. Nevertheless, the second mode remains a promising target for further exploration in studies with greater statistical power (coefficient for the interaction between CNV and time, uncorrected P_DEL = 0.045, P_DUP = 0.076). Given the observed similarity in brain structure developmental patterns of CNV carriers and controls, further exploration of earlier stages of life may provide further valuable insights into distinctions in neurodevelopmental processes.
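The exact specification of the linear mixed-effects model is given in the Methods, which are not reproduced in this section. As an assumption for orientation only, a common form for such a model, with a random intercept per participant, can be written as:

```latex
\text{score}_{ij} = \beta_0
  + \beta_1\,\text{time}_{ij}
  + \beta_2\,\text{group}_{i}
  + \beta_3\,(\text{time}_{ij} \times \text{group}_{i})
  + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here score_ij is the brain score of participant i at visit j, β₁ corresponds to the main effect of time, and β₃ is the time-by-group interaction whose test indicates whether the rate of change differs between controls and CNV carriers. The symbols and the random-intercept structure are illustrative and may differ from the model actually fitted.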

Discussion

In this quantitative population neuroscience study, we carefully examined the ramifications of carrying an exonic CNV on brain organization and behaviour. To this end, we designed an analytic protocol based on a holistic tri-modal pattern-learning framework that can cleanly dissect the impact of genetic mutations on multimodal measurements that cut across disciplines to untangle the complex genes–brain–behaviour interplay. This multivariate model uncovered three significant modes of covariation between brain volume and behaviour. The first mode connected robust volumetric differences in frontoparietal and temporal regions with measures of cognition and demographics. The second mode linked dorsal attention, somatomotor and frontoparietal networks with mental health measures. Finally, the third mode highlighted associations between the higher-order networks and environmental factors. We then drew a detailed picture of how carrying a genomic deletion or duplication influences the expression of these comprehensive brain and behaviour patterns. Specifically, deletions and duplications were linked with negative effects on family wellbeing, as seen in the adverse effects on cognitive functioning, mental health and socioeconomic outcomes. Our collective results also highlight the similar ramifications for cognition and behaviour associated with deletions and duplications despite their distinct effects on brain anatomy, corroborating some of our earlier CNV-imaging studies on the UK Biobank cohort^14,38.

The analyses of genetic influences have long been dominated by univariate frameworks^39,40. These standard regression approaches model one input variable at a time and thus focus on individual variables independently while neglecting the complex relationships and synergies that exist among genes, brain and behaviour. In other words, univariate approaches struggle to provide a natural approach to analysing high-dimensional data and coping with the ‘curse of dimensionality’, making it challenging to capture the joint influence of multiple variables⁴¹. The growing availability of variable-rich and multimodal datasets with deep phenotypic profiling prompts a change in our traditional analytic toolkit⁴². Doubly multivariate techniques, for example, PLS, address several limitations of mass univariate approaches, providing a more nuanced and integrated perspective on the relationships between thousands of measures of brain architecture and behaviour in the general population^24,29. Previous research showed that particular genes are an important contributor to the interindividual variability of thus uncovered latent patterns⁴³. Building on the heritability of the latent patterns, we showcased that their expression is further shaped by the presence of genome-wide protein-coding mutations. Our findings address the need for a deeper examination of the relationship between CNVs, brain structure and behaviour, as recently proposed as an important research direction going forward¹⁵, revealing their effects on social, familial and environmental factors.

The consequences for various aspects of human health and wellbeing often go unnoticed because analyses of genomic deletions and duplications most commonly focus on intellectual disability and developmental delay^19,44. Developmental delay phenotypes, especially language and motor disorders, are the earliest symptoms for which children are clinically referred for assessments and genetic testing⁴⁵. Recent results showcased potential lifelong implications represented by diminished academic qualifications, occupation or household income for a small set of schizophrenia-associated CNVs⁴⁶. As an important contribution of our present investigation, our results demonstrated that the genome-wide presence of any coding CNV might be linked to impaired real-life functioning, represented here by cognitive performance, income, education, screen time, or sleep duration. These characteristics played a driving role in our dominant mode of population covariation, which is tightly linked to frontoparietal and temporal networks—regions frequently reported to be altered by previous single-CNV studies, such as 22q11.2 (ref. ¹⁵). A similar dominant mode characterized by cognitive measures as well as screen time was identified in the Human Connectome Project population resources²⁹. The stronger influence of deletions on the dominant mode compared with duplications is concordant with the more pronounced effect of deletions on cognitive ability observed in clinical studies⁴⁷.

We also reveal additional consequences beyond just the dominant population mode, which are at risk of staying hidden in classical analyses⁴⁸. Concretely, our second mode highlighted impoverished familial mental wellbeing as a prominent marker of CNV carriers. Notably, the presence of phenotypes from both child and parental questionnaires demonstrates how wellbeing is closely tied to the family system. It has been estimated that over 99% of CNVs are inherited⁴⁹. Therefore, in addition to influencing offspring phenotype through genetic inheritance, the parental genotype can indirectly influence offspring phenotype through its expression in the parental phenotype⁵⁰. Where this occurs, offspring may be subject to both phenotype-associated CNV and phenotype-associated environments from parents. In conclusion, the CNVs we studied in adolescents have likely been passed down from either parent, which points toward influences on the overall family system. The multigenerational impact where genetic and also environmental legacies contribute to the behavioural outcomes highlights the complex interplay between inherited genetic variations and the environments shaped by parental phenotypes.

Finally, our across-CNV analyses also revealed new associations between genomic deletions, environmental factors and higher-order brain networks, represented here mainly by regions in the temporal lobes. Interestingly, the higher association cortex, especially the default mode network, was suggested to be more ‘life wired’, resulting from differences in the circumstances and contexts in which people grow up and everyday life experiences⁵¹. The deeper layers of the neural processing hierarchy, such as the default mode network, allow for greater environmental influence and plasticity, as demonstrated by prolonged maturation and slower myelination compared with sensory/motor circuits in human primates^52,53. Our finding adds more evidence for the adaptive and dynamic nature of the recently evolved parts of the human brain, emphasizing the prominent role of genetic and environmental interplay in shaping neural development and function. Importantly, the environmental milieu, here represented by measures of crimes, drug possession, or violence, is related to health through psychological, physiological and behavioural pathways⁵⁴. Previous research documented chronic health conditions to be more prevalent in low-income neighbourhoods, including those affecting infants (low birth weight), children (asthma) and adults (cardiovascular health)⁵⁵. Specifically, living in low socioeconomic status neighbourhoods⁵⁶ and neighbourhoods perceived as unsafe⁵⁷ displayed elevated physiological risk, which includes indicators of inflammation and neuroendocrine and cardiovascular functioning.

Here we provided a detailed depiction of how environmental and behavioural factors are reflected in brain structure. Moreover, we documented the alterations in these brain-wide patterns in the presence of genetic mutations. In concordance, previous single-CNV studies identified brain-wide patterns of regional alterations that robustly differentiate controls from carriers of clinical CNVs (sensitivity 94.2% and specificity 93.3% in classifying 22q11.2 cases from healthy controls)^14,58. Here we broaden the incumbent analysis scope of a few selected CNVs towards any coding CNV present in the genome. The brain pattern corresponding to our second across-CNV population mode highlighted opposing effects of deletions and duplications, recapitulating the mirroring effects observed in clinical studies³¹. As a primary example, the lingual gyrus here played a dominant role in two altered brain patterns. The effects of CNVs on this region have been documented for carriers of 16p11.2 CNVs⁵⁹. Similarly, as prominent examples from the frontal lobe, we observed significant contributions of the middle and superior frontal gyrus, which have been shown to be impacted by 1q21.1 (ref. ¹⁸) and 15q11.2 (ref. ⁶⁰) alteration, respectively. Impairments of lingual and frontal gyri have been associated with anxiety–depression severity⁶¹ or attention control deficit⁶²—phenotypes often present in CNV carriers⁶³. While these regions recapitulate certain findings from commonly conducted single-CNV studies, our analysis puts forward the concept of shifted multivariate patterns, capturing more complex interactions and revealing how CNVs influence broader brain–behaviour relationships. While the identified whole-brain patterns represent a general trend in each type of genetic mutation, the specific alterations pertinent to a specific CNV (for example, 22q11.2 or 16p11.2) are further moulded by the attributes of genes that are affected by a given CNV. 
According to our analyses, the final brain and behaviour profile can be shaped by various attributes of deleted/duplicated genes, including their tolerance to being mutated, or the temporal expression profile of affected genes. This may be part of the reason why previous research found brain patterns associated with deletions at 22q11.2 loci to strongly resemble deletions at 15q11.2 loci while being different from 16p11.2 deletions¹⁴. Our findings reveal overarching brain patterns shared across various CNVs, yet the precise alterations associated with individual CNVs vary and are determined by the characteristics of the affected genes, such as their temporal and spatial expression profiles as well as environmental impacts^19,64,65.

In conclusion, we developed a multilevel pattern-learning framework to investigate the effects of genome-wide protein-coding mutations on brain organization and behaviour. This approach offers a comprehensive view of the multifaceted impact of rare genetic variations, surpassing limitations of many traditional univariate frameworks. We revealed that both genomic deletions and duplications may contribute to challenges in family wellbeing through associations with increased parental and child stress, anxiety and depression, as well as neighbourhood violence. These behavioural and emotional challenges are mirrored by shifts in brain organization, with alterations predominantly seen in higher-order networks, underscoring the profound cross-associations between genetic mutations, behavioural outcomes and changes in brain structure. Future research building on such approaches, transcending levels of description usually studied in isolation, can better appreciate the complexity of the relationship between genetic determinants and human health.

Methods

ABCD population data source

Brain imaging, behavioural, clinical and genetic data in this study were obtained from the Adolescent Brain Cognitive Development Study (ABCD), representing the most extensive biomedical child development study of its kind. The ABCD Study acquired data from 11,877 children aged 9–10 years (mean age = 9.49 years) from 21 sites across the United States (48% girls; 57% Caucasian, 15% African American, 20% Hispanic, 8% other)⁶⁶. We leveraged baseline measurements from ABCD Annual curated release 4.0, which contains baseline data on the entire participant cohort as well as early longitudinal data, including 2-year follow-up neuroimaging data (second brain-imaging timepoint). All protocols for ABCD were approved by either a central or site-specific institutional review board committee⁶⁷. Caregivers provided written, informed consent and children provided verbal assent to all research protocols^68,69.

Genetic annotation and CNV calling

Our study is built on the identification of exonic CNVs in the ABCD study sample. The genotyping protocol for the ABCD sample (n = 11,088) has been described previously⁷⁰. In addition to the quality control (QC) provided by ABCD, we performed several additional steps to ensure high quality of the genetic data. Using PLINK (v.1.9)⁷¹, we removed SNP variants with a missing rate >5% as well as SNPs with a Hardy–Weinberg equilibrium exact test P < 0.0001. We only considered arrays with call rate ≥99%, log R ratio s.d. < 0.35, B allele frequency s.d. < 0.08, absolute value of wave factor <0.05 and count of all unfiltered CNV per sample ≤10 (n = 7,896). These thresholds align with the quality control guidelines established by the Psychiatric Genomics Consortium CNV calling pipeline^10,19,72.
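The array-level thresholds listed above can be written as a single predicate. The following is a minimal sketch rather than the pipeline used in the study; the `ArrayQC` container and its field names are hypothetical and would need to be mapped onto the actual output of the genotyping QC tools.

```python
from dataclasses import dataclass


@dataclass
class ArrayQC:
    """Per-array QC metrics (hypothetical field names)."""
    call_rate: float    # fraction of SNPs called, e.g. 0.995
    lrr_sd: float       # s.d. of the log R ratio
    baf_sd: float       # s.d. of the B allele frequency
    wave_factor: float  # signed wave factor
    n_cnv: int          # count of unfiltered CNV calls for the sample


def passes_array_qc(a: ArrayQC) -> bool:
    """Apply the thresholds quoted in the text to one array."""
    return (
        a.call_rate >= 0.99
        and a.lrr_sd < 0.35
        and a.baf_sd < 0.08
        and abs(a.wave_factor) < 0.05
        and a.n_cnv <= 10
    )


# Toy examples: the second array fails on the log R ratio s.d.
arrays = [
    ArrayQC(0.995, 0.20, 0.05, -0.01, 3),
    ArrayQC(0.995, 0.40, 0.05, -0.01, 3),
]
kept = [a for a in arrays if passes_array_qc(a)]
```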

Concurrently, on the basis of genotyping, we identified 73 participants with >5% missing data and 419 participants with a high degree of identity-by-descent (PI_HAT >0.8), indicative of duplicated data or monozygotic twins. For these cases, we retained the array with the highest call rate. On the basis of these criteria, we excluded a total of 238 individuals from the CNV dataset, resulting in a final sample of n = 7,658. No individuals had discordant phenotypic and genetic sex information. Finally, we excluded 51 individuals associated with plate 461 (based on ABCD instructions, n = 7,607).

The identification of CNVs using SNP array (GRCh37/hg19) data followed previously published methods^19,20. CNVs were called using the pipeline described at https://github.com/labjacquemont/MIND-GENESPARALLELCNV. In short, we computed PFB-files (Human Genome Build NCBI37/hg19) on the basis of the 500 best arrays in ABCD, and we used GC (content)-model files (https://kentinformatics.com and https://github.com/ucscGenomeBrowser/kent.git). Autosomal CNV detections from either PennCNV⁷³ or QuantiSNP⁷⁴, or both, were combined using CNVision⁷⁵. All identified CNVs met stringent quality control criteria: confidence score ≥30 (for at least one of the two detection algorithms), size ≥50 kb, unambiguous type (deletion or duplication), and <50% overlap with segmental duplications, HLA regions or centromeric regions. In addition, we employed our in-house machine-learning algorithm, DigCNV (https://github.com/labjacquemont/DigCNV), which leverages nine CNV characteristics (including array metrics, localization metrics and CNV-specific metrics) to detect additional artefact CNVs. Finally, all carriers (1 participant) of a structural variant ≥10 Mb, a mosaic CNV or a chromosome anomaly (aneuploidy or sex chromosome anomaly) were removed (n = 7,606). For the final set of participants, we calculated the first 10 genetic principal components (PCs) using the --pca function in PLINK (v.2.3)⁷⁶. After these quality control steps, we identified 668 sibling pairs using the KING protocol (‘king -b file.bed --related’, https://www.kingrelatedness.com/manual.shtml).

All identified CNVs were annotated using Gencode V19 (hg19) with ENSEMBL (https://grch37.ensembl.org/index.html). In this study, we only used exonic CNVs that fully encompassed at least one gene. In addition to the number of encompassed genes, each CNV was further annotated with seven other previously used scores. Specifically, we used an annotation quantifying the tolerance to protein-loss-of-function of each gene: 1/LOEUF⁷⁷. Each CNV was then characterized by the sum of 1/LOEUF of encompassed genes. Higher scores thus indicated greater intolerance to loss-of-function mutations. Furthermore, CNVs were described using average temporal expression⁷⁸ and average peak epoch. Gene-wise temporal expression was calculated as the developmental trajectory that the gene follows based on trajectory analysis (gene-specific trajectory coding: ‘Rising’ = 1, ‘Non-transitional’ = 0, ‘Falling’ = −1). The peak epoch corresponds to an epoch of highest expression, where epochs correspond to the developmental period defined previously⁷⁹. Each CNV was also characterized by the number of genes, for which expressions in the brain were labelled as ‘High’ or ‘Elevated’ according to the GTEx resource (https://www.gtexportal.org). Finally, we quantified how many genes in each CNV were previously associated with autism spectrum disorder⁸⁰, schizophrenia⁸¹ and any disorder by either rare or common variation⁷⁸. Additional analysis using probability of loss intolerance (pLI) as another measure of functional intolerance of CNVs is provided in Supplementary Fig. 5. The similarity of the seven annotations is summarized in Supplementary Fig. 8.
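As an illustration of the intolerance score, the sum of 1/LOEUF over the genes fully encompassed by a CNV can be computed as below. This is a sketch under stated assumptions: the gene names and LOEUF values are invented placeholders, and skipping genes without a LOEUF estimate is our choice, not something the text specifies.

```python
def cnv_intolerance(genes, loeuf):
    """Sum of 1/LOEUF over the genes encompassed by one CNV.

    Genes without a LOEUF value are skipped (an assumption of this sketch).
    """
    return sum(1.0 / loeuf[g] for g in genes if g in loeuf)


# Hypothetical LOEUF values; lower LOEUF means less tolerance to loss of function.
loeuf = {"GENE_A": 0.25, "GENE_B": 0.5, "GENE_C": 2.0}

score_intolerant = cnv_intolerance(["GENE_A", "GENE_B"], loeuf)  # 4 + 2
score_tolerant = cnv_intolerance(["GENE_C"], loeuf)              # 0.5
```

A CNV covering constrained genes therefore receives a higher score than one covering a single tolerant gene, in line with the interpretation given above.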

As part of our sensitivity analyses, we compiled a list of 85 CNVs previously proposed to be pathogenic^10,44,82,83,84 (sum of 1/LOEUF for each gene encompassed in CNV ≥6 or inclusion in ClinGen resource; https://clinicalgenome.org). Regional coordinates are available elsewhere¹⁹. A CNV was defined as recurrent if it overlapped by ≥50% with one of the 85 CNVs and/or included the key genes of the corresponding region (see details for each recurrent CNV in Supplementary Table 3). This 50% threshold has been shown to provide excellent sensitivity and specificity to detect recurrent CNVs¹⁹.
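The overlap criterion can be sketched as follows. Two points are assumptions of this example: the overlap is expressed as the fraction of the reference region covered by the called CNV (the text does not state which interval serves as the denominator), and the key-gene criterion is omitted. The coordinates are placeholders, not real recurrent regions.

```python
def overlap_fraction(cnv_start, cnv_end, ref_start, ref_end):
    """Fraction of the reference region covered by the CNV (0 if disjoint)."""
    shared = min(cnv_end, ref_end) - max(cnv_start, ref_start)
    return max(shared, 0) / (ref_end - ref_start)


def is_recurrent(chrom, start, end, regions, min_overlap=0.5):
    """True if the CNV covers at least min_overlap of any region on the same chromosome."""
    return any(
        chrom == r_chrom and overlap_fraction(start, end, r_start, r_end) >= min_overlap
        for r_chrom, r_start, r_end in regions
    )


# Hypothetical reference region: chromosome "1", 1.0-2.0 Mb.
regions = [("1", 1_000_000, 2_000_000)]

covers_most = is_recurrent("1", 1_200_000, 2_500_000, regions)    # covers 80%
covers_little = is_recurrent("1", 1_900_000, 3_000_000, regions)  # covers 10%
```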

Detailed profiling of behavioural and cognitive data

We analysed a rich battery of 1,319 cognitive, sociodemographic and environmental measures from 11,879 participants partially reported in previous research²⁴. In line with previous research²⁴, we used robust z-scores for the preprocessing of each phenotype. The robust z-scores were derived by dividing each value’s deviation from the phenotype’s median by the median absolute deviation (MAD)⁸⁵. In other words, the resulting score indicates how many standard deviations each value deviates from the median, with robustness to outliers. Subsequently, we removed values with a z-score >4. We then excluded phenotypes with <90% retained values before excluding participants with <90% retained values across the retained phenotypes. The remaining participants (n = 11,618) were considered for further analysis. The complete list of 938 phenotypes from 20 predefined categories included in the analysis is available in Supplementary Table 4. These categories were defined by the NIH and can be found online (https://nda.nih.gov/general-query.html?). To avoid potential confusion, we adjusted names of three categories as follows: Questionnaires to Parental Questionnaire, Summary to Mental Health Summary, Diagnosis to Diagnosis (K-SADS), where K-SADS stands for the Kiddie Schedule for Affective Disorders and Schizophrenia interview. As the last step, for the purpose of data analysis, missing values were imputed using the KNNImputer function (n_neighbors = 5, weights = ‘uniform’) in the scikit-learn package. All derived phenotypic measures were then adjusted for variation that can be explained by age and sex.
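The cleaning steps described in this paragraph can be sketched as follows on synthetic data. Several details are assumptions of this example rather than statements from the text: the MAD is scaled by 1.4826 so that the score is comparable to a standard deviation under normality, the threshold is applied to the absolute z-score, and phenotypes with a MAD of zero (for example, sparse binary items) would need separate handling that is not shown.

```python
import numpy as np
from sklearn.impute import KNNImputer


def robust_z(x):
    """Column-wise robust z-score: deviation from the median in scaled-MAD units."""
    med = np.nanmedian(x, axis=0)
    mad = np.nanmedian(np.abs(x - med), axis=0)
    return (x - med) / (1.4826 * mad)  # 1.4826 scaling is an assumption


def clean_phenotypes(x, z_max=4.0, min_retained=0.9):
    """Mask outliers, drop sparse phenotypes, then sparse participants, then impute."""
    x = np.array(x, dtype=float)
    x[np.abs(robust_z(x)) > z_max] = np.nan
    keep_cols = np.mean(~np.isnan(x), axis=0) >= min_retained
    x = x[:, keep_cols]
    keep_rows = np.mean(~np.isnan(x), axis=1) >= min_retained
    x = x[keep_rows]
    x = KNNImputer(n_neighbors=5, weights="uniform").fit_transform(x)
    return x, keep_rows, keep_cols


# Synthetic stand-in for the phenotype matrix, with one planted outlier.
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 20))
raw[0, 0] = 50.0
cleaned, keep_rows, keep_cols = clean_phenotypes(raw)
```

The planted value is masked and then replaced by the mean of that phenotype across the five nearest participants, which is what `KNNImputer` does with uniform weights.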

MRI imaging-derived phenotypes

Our data sample included expert-curated brain-imaging phenotypes of grey matter morphology. The images were acquired across 21 sites in the United States with harmonized imaging protocols for GE, Philips and Siemens scanners⁸⁶. We used baseline structural T1-weighted tabulated MRI data from ABCD curated release 4.0. We only included participants who passed quality assurance using the recommended QC parameters (n = 11,723) described in the ABCD 4.0 Imaging Instruments Release Notes. ABCD preprocessing and QC steps are described in detail in the methodological reference for the ABCD study⁸⁶.

The downloaded tabulated brain-imaging phenotypes were guided by the topographical brain region definitions based on the Destrieux parcellation atlas⁸⁷. This feature-generation step provides neurobiologically interpretable measures of grey matter volume in 148 regions. For each included regional volume, we calculated the MAD for each brain region and removed values with MAD >4 (ref. ²⁴). Participants with <90% of regional volume retained in any region were excluded from the analysis. The remaining participants (n = 11,681) were included for further analysis. Finally, interindividual variations in the volumes that could be explained by age, sex, total brain volume and scanning site were regressed out.
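Regressing out nuisance variables amounts to keeping the residuals of a linear model. The sketch below uses ordinary least squares with a dummy-coded site variable on synthetic data; the exact encoding of site and the fitting routine used in the study are not given in the text, so both are assumptions here.

```python
import numpy as np


def residualize(values, confounds):
    """Return the part of `values` not linearly explained by `confounds`."""
    design = np.column_stack([np.ones(len(values)), confounds])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ beta


# Synthetic confounds (all values invented for illustration).
rng = np.random.default_rng(0)
n = 400
age = rng.uniform(9, 11, n)
sex = rng.integers(0, 2, n)
total_volume = rng.normal(1200, 100, n)
site = rng.integers(0, 3, n)
site_dummies = np.eye(3)[site][:, 1:]  # drop one level to avoid collinearity
confounds = np.column_stack([age, sex, total_volume, site_dummies])

regional_volume = 0.01 * total_volume + 0.5 * age + rng.normal(size=n)
adjusted = residualize(regional_volume, confounds)
```

The same function accepts a matrix with one column per region, because `np.linalg.lstsq` fits all columns at once.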

To analyse temporal changes in brain structure, we also acquired structural T1-weighted tabulated MRI data during the follow-up 2 years after the first MRI recording. Brain-imaging data from this second timepoint underwent the same cleaning steps as the baseline brain-imaging data. In total, follow-up brain measurements were available for 3,715 participants (2,608 controls, 317 deletion carriers, 790 duplication carriers).

Multivariate pattern analysis protocol

After rigorous quality control of brain, behaviour and genetic data sources, we analysed a total of 7,338 participants that met all established criteria, ensuring robust data integrity across each measurement domain. As a first data preparation step, each brain and behaviour measurement was normalized (z-scored) to ensure comparability across different scales. The normalized measurements were then submitted to principal component analysis (PCA). PCA is known to be robust to noisy, sparse and mixed-valued data, which makes it particularly effective in enhancing the stability of subsequent analyses by addressing rank deficiency and minimizing noise. Furthermore, PCA is ideally suited for mixed data types, as it seamlessly converts continuous, binary and otherwise categorical variables into continuous components, allowing for efficient dimensionality reduction while preserving the essential variability in the dataset⁸⁸. On the basis of a thorough examination, we extracted the first 50 PCA components for both the brain and behaviour measurements (Supplementary Fig. 9). Notably, even though our primary analysis utilized regional volumes as indicators of brain structure, we found that the behavioural loadings were similar when using regional thickness and area. Furthermore, the brain loadings from our first and third modes were similar to those obtained using brain area, while the brain loadings from our second mode closely matched those derived from thickness (Supplementary Fig. 10).

After initial data cleaning, we focused on participants without CNVs to identify modes of covariation representative of the general population. The first cleaned input dataset included regional brain volumes (5,314 × 50 matrix), and the second comprised behavioural measures (5,314 × 50 matrix). To uncover multivariate relationships between these high-dimensional datasets, we applied canonical PLS analysis—a method well suited for identifying latent structures that maximize covariance between two variable sets⁸⁹. In other words, PLS identifies canonical modes by finding linear combinations of brain and behavioural variables that co-vary most strongly, solving the generalized eigenvalue problem of their cross-covariance matrix. The resulting participant-level projections are referred to as brain scores and behaviour scores throughout the paper. These scores represent the expression of each latent mode in each individual. The model produces k orthogonal modes, ordered by the amount of brain–behaviour covariance they explain. The first (that is, the dominant) and strongest mode explained the largest fraction of covariance between brain and behaviour measurements, while subsequent modes account for residual covariation unexplained by the preceding ones. PLS draws similarities with canonical correlation analysis (CCA). However, CCA can be prone to instability⁹⁰. Nevertheless, our obtained PLS solutions strongly resemble those obtained with CCA (Supplementary Fig. 11).

Contribution of original phenotypes to latent variables

To quantify the contribution of each regional volume and behavioural measure to the construction of the latent population mode, we computed PLS loadings as Pearson’s correlation between a respective PLS score and the original measurement across participants. The thus obtained loadings indicate the strength and direction of the relationship between the original phenotype and the identified PLS score. Stronger loading values signify greater importance in contributing to the latent structures, offering insights into which variables drive the covariation patterns. All brain and behaviour loadings are available in Supplementary Table 1.
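Computing loadings as Pearson correlations between scores and original measurements reduces to one matrix product of standardized variables. The sketch below uses synthetic inputs; the function name is ours.

```python
import numpy as np


def pls_loadings(scores, measurements):
    """Pearson correlation of every original measurement with every mode's score.

    Returns an array of shape (n_measurements, n_modes).
    """
    s = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    m = (measurements - measurements.mean(axis=0)) / measurements.std(axis=0)
    return m.T @ s / scores.shape[0]


# Synthetic example: 200 participants, 6 measurements, 2 modes.
rng = np.random.default_rng(0)
measurements = rng.normal(size=(200, 6))
scores = measurements @ rng.normal(size=(6, 2))
loadings = pls_loadings(scores, measurements)
```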

To assess the significance of phenotype contributions beyond chance, we employed a bootstrapping strategy. In each iteration, a perturbed version of the dataset was generated by resampling control participants with replacement, maintaining the original sample size. This procedure was repeated 1,000 times, ensuring consistent ordering and orientation of the PLS modes across iterations. Each bootstrap iteration yielded a new realization of the full analysis pipeline, resulting in 1,000 trained PLS models and corresponding sets of PLS coefficients. A coefficient was considered statistically robust if its two-sided 95% confidence interval—based on the bootstrap distribution (2.5th to 97.5th percentile)—did not include zero, indicating a consistent and significant contribution across resamples.
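The percentile-based decision rule can be sketched as follows. For brevity the statistic here is a plain correlation recomputed on each resample; in the procedure described above, each iteration would instead refit the full pipeline and align the order and sign of the modes, which this example does not attempt.

```python
import numpy as np


def bootstrap_ci(stat_fn, n_participants, n_boot=1000, seed=0):
    """Percentile 95% interval of stat_fn over resamples drawn with replacement.

    stat_fn receives an array of row indices and returns a vector of statistics.
    """
    rng = np.random.default_rng(seed)
    draws = np.array([
        stat_fn(rng.integers(0, n_participants, size=n_participants))
        for _ in range(n_boot)
    ])
    lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
    robust = (lower > 0) | (upper < 0)  # interval excludes zero
    return lower, upper, robust


# Synthetic score and two measurements: one related to the score, one pure noise.
rng = np.random.default_rng(1)
score = rng.normal(size=500)
features = np.column_stack([
    score + 0.5 * rng.normal(size=500),
    rng.normal(size=500),
])


def loadings_on(rows):
    return np.array([
        np.corrcoef(score[rows], features[rows, j])[0, 1]
        for j in range(features.shape[1])
    ])


lower, upper, robust = bootstrap_ci(loadings_on, n_participants=500)
```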

We used the BrainStat toolbox⁹¹ to contextualize obtained patterns with respect to large-scale brain networks based on the Schaefer–Yeo definition⁹². Specifically, we mapped the brain loadings from the 148 regions to ‘fsaverage5’ vertices. We then computed the average absolute loading in each of the seven resting-state brain networks.

Optimal number of PLS dimensions in the general population

Each identified PLS mode was submitted to statistical significance tests of robustness consistent with an established combination of cross-validation and permutation testing²⁴ (Supplementary Fig. 1). Initially, controls were split into 10 folds, where 9 folds of participants were used as a train set, and 1 fold was used as a test set. The splitting into train and test set was stratified on the basis of genetic relatedness, ensuring that siblings were kept together to avoid scenarios where one sibling is used in training while the other sibling is in the test set. To identify siblings, we used the KING protocol (Kinship-based INference for Gwas) to generate kinship coefficients, which allowed us to define two groups: non-siblings (<0.177 kinship coefficients) and siblings (>0.177 kinship coefficients). The controls in the training set were used to estimate the parameters of all subsequent tools. In the first step, each brain or behaviour measurement was z-scored column-wise across all controls in the training set. PCA then separately reduced the dimension of brain and behaviour measurements to 50 features. In the next step, the preprocessed behaviour and brain measures were used as input variables to estimate a single canonical PLS model, where the output of the model is a set of scores (latent variables). This PLS model can also be characterized by weights (projection matrices used to transform input variables). These training-PLS weights were back projected using the PCA model to obtain brain and behaviour weights in the original non-reduced ambient space.
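Keeping siblings in the same fold can be sketched by first merging sibling pairs into families and then assigning whole families to folds. This is an illustrative implementation, not the study's code: the union-find grouping and the round-robin assignment of shuffled families are our assumptions, and the sibling pairs below are invented.

```python
import numpy as np


def family_labels(n_participants, sibling_pairs):
    """Merge sibling pairs into families with a small union-find."""
    parent = list(range(n_participants))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for a, b in sibling_pairs:
        parent[find(a)] = find(b)
    return np.array([find(i) for i in range(n_participants)])


def grouped_folds(families, n_folds=10, seed=0):
    """Assign every family, as a unit, to one of n_folds folds at random."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(families))
    fold_of_family = {fam: k % n_folds for k, fam in enumerate(shuffled)}
    return np.array([fold_of_family[f] for f in families])


# Hypothetical pairs, e.g. those with a kinship coefficient above 0.177.
families = family_labels(20, [(0, 1), (1, 2), (5, 6)])
folds = grouped_folds(families, n_folds=5, seed=0)
test_rows = np.flatnonzero(folds == 0)   # one fold held out
train_rows = np.flatnonzero(folds != 0)  # remaining folds used for fitting
```

Changing `seed` gives a new random split, which is how repeated cross-validation runs could be generated under this scheme.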

In the next step, brain and behaviour scores were computed for controls from the test set. Specifically, z-scoring followed by PCA dimensionality reduction was applied with parameters learned using the training set. The resulting preprocessed measurements were multiplied by the original training-PLS weights to obtain PLS scores for test-sample controls. Finally, the covariance between brain and behaviour scores was calculated for each canonical mode. We took the average of these canonical covariances across the 10 folds. This procedure was repeated 100 times with a random fold split of controls to obtain a distribution of out-of-sample covariances for each PLS mode.

To assess the statistical significance of the resulting PLS modes, we ran 1,000 iterations of the same 10-fold cross-validation procedure described above, where the order of participants of the brain measurements was randomly permuted in each iteration. In contrast to the unpermuted dataset, we collected covariances for the training rather than the testing participants to account for overfitting by the PLS. In other words, using covariance from the permuted train set, and not the test set, represents a more stringent criterion. Finally, P values for each of the PLS modes were calculated as a percentage of cases when permuted covariance was greater than the mean cross-validated covariance.
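The final step, turning a null distribution into a P value, can be written in a few lines. The sketch below simplifies the procedure described above in one important way: it only shuffles fixed scores against each other, whereas the study permuted the brain measurements and reran the cross-validated pipeline, collecting training-set covariances. The data are synthetic.

```python
import numpy as np


def permutation_p_value(null_covariances, observed_covariance):
    """Fraction of permuted covariances that exceed the observed covariance."""
    null_covariances = np.asarray(null_covariances, dtype=float)
    return float(np.mean(null_covariances > observed_covariance))


def shuffled_covariance(brain_score, behaviour_score, rng):
    """Covariance after breaking the participant correspondence."""
    return np.cov(rng.permutation(brain_score), behaviour_score)[0, 1]


rng = np.random.default_rng(0)
behaviour_score = rng.normal(size=300)
brain_score = behaviour_score + rng.normal(size=300)

observed = np.cov(brain_score, behaviour_score)[0, 1]
null = [shuffled_covariance(brain_score, behaviour_score, rng) for _ in range(1000)]
p_value = permutation_p_value(null, observed)
```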

Group differences in brain–behaviour pattern expression

We developed a pipeline quantifying the differences in brain and behaviour scores between controls and CNV carriers for the identified PLS modes (Supplementary Fig. 2). Initially, participants without any CNV were split into a training set (90%) and a test set (10%) with stratification based on genetic relatedness to ensure proper grouping. As outlined above, brain and behaviour scores for controls in the test set were calculated using parameters learned from the training set. Importantly, the same PLS model was applied to CNV carriers (both deletion and duplication), allowing us to compute their brain and behaviour scores without re-estimating the PLS model. In other words, the PLS model trained on the training control group was used to analogously analyse the brain and behaviour data of CNV carriers, ensuring consistency in the interpretation of brain–behaviour associations across groups. More specifically, z-scoring followed by PCA dimensionality reduction was applied with parameters learned using the training set of controls. The resulting preprocessed measurements were multiplied by the precalculated PLS weights to obtain PLS scores for CNV carriers. Finally, we calculated the average across the 10 folds in differences between out-of-sample controls and CNV carriers for both brain and behaviour scores in each canonical mode. This procedure was repeated 1,000 times with a random 90:10 split of controls to obtain a distribution of PLS score differences. Finally, P values for each of the PLS modes were calculated based on the percentage of cases when the difference between mean scores of CNV carriers and mean scores of out-of-sample controls was greater than zero (respectively lower for modes with negative mean expression). Resulting P values were adjusted across all modes using FDR correction to control for multiple comparisons.
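The text does not name the FDR procedure; assuming the commonly used Benjamini–Hochberg step-up method, the adjustment across modes could look like this. The P values in the example are invented.

```python
import numpy as np


def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest P value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical raw P values for three modes.
adjusted = fdr_bh([0.01, 0.04, 0.03])
```

For these inputs the adjusted values are approximately 0.03, 0.04 and 0.04, returned in the original order of the modes.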

Temporal shift in brain pattern expressions

A total of 3,715 participants passed the quality control of genetic data and had brain recordings measured at the baseline and 2-year follow-up. The majority of behavioural phenotypes were not available for the second timepoint. Therefore, rather than training a new PLS model, which would not be directly comparable to the original, we focused on assessing longitudinal changes in brain structure using the original model⁹³. Specifically, we applied the PLS model trained on baseline brain and behavioural data to the follow-up brain measurements. This allowed us to compute brain scores at follow-up for each participant and for each PLS mode by projecting the follow-up brain data onto the baseline-derived brain loadings. In essence, we evaluated how each participant’s follow-up brain measurements aligned with the brain–behaviour patterns identified at baseline.

The aim of this analysis was to explore whether the rate of change in brain scores differs between controls, duplication carriers and deletion carriers. We employed a linear mixed-effects model for each PLS mode and for each CNV group separately. This approach accounted for repeated measurements at two timepoints. For each PLS mode, we constructed a mixed-effects model with the brain scores as the dependent variable. The fixed effects included time T (coded as 0 for baseline and 1 for follow-up), CNV group (controls as 0, duplication/deletion carriers as 1, with the control group serving as the reference category), and the interaction between time and CNV group. The interaction term was included to assess whether the rate of change in brain scores over time differed between the control group and the CNV carrier groups. Random intercept and slope terms were included to capture variability in baseline levels and time effects across individuals. For each PLS mode and each CNV group, the model was formulated as follows:

$${{\rm{brain}}\; {\rm{score}}}_{{ij}}={\beta }_{0}+{\beta }_{1}{T}_{{ij}}+{\beta }_{2}{\rm{CNV}}+{\beta }_{3}\left({T}_{{ij}}\times{\rm{CNV}}\right)+{u}_{0i}+{u}_{1i}{T}_{{ij}}+{\varepsilon }_{{ij}},$$

(1)

where ${\beta }_{0}$ is the intercept, ${\beta }_{1}$ captures the fixed effect of time, ${\beta }_{2}$ represents the fixed group effect, ${\beta }_{3}$ is the interaction effect between time and CNV carrier status, ${u}_{0i}$ is the random intercept for participant i, ${u}_{1i}$ is the random slope for time for participant i, and ${\varepsilon }_{{ij}}$ is the residual error term for participant i at timepoint j.

A significant interaction term ${\beta }_{3}$ would suggest that the rate of change in brain scores differed between the control group and one or both CNV carrier groups in a given PLS mode.
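To make the meaning of the interaction term concrete, the sketch below fits only the fixed-effects part of equation (1) by ordinary least squares on synthetic data. It deliberately omits the random intercept and slope, so it illustrates the point estimate of the interaction but gives no valid standard errors; a full fit would require a mixed-model routine such as the one provided by statsmodels. With two timepoints and a binary group, the least-squares interaction coefficient equals the difference between the mean change in carriers and the mean change in controls.

```python
import numpy as np


def fixed_effects_ols(score, time, cnv):
    """Least-squares estimates of (b0, b1, b2, b3) for score ~ time * cnv."""
    design = np.column_stack([np.ones(len(score)), time, cnv, time * cnv])
    beta, *_ = np.linalg.lstsq(design.astype(float), score, rcond=None)
    return beta


# Synthetic long-format data: 6 controls and 4 carriers, two timepoints each.
rng = np.random.default_rng(0)
cnv = np.repeat([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], 2)  # group per observation
time = np.tile([0, 1], 10)                           # baseline, follow-up
score = 1.0 + 0.5 * time + 0.2 * cnv + 0.3 * time * cnv + 0.1 * rng.normal(size=20)

beta = fixed_effects_ols(score, time, cnv)


def mean_change(group):
    """Mean follow-up score minus mean baseline score within one group."""
    in_group = cnv == group
    return score[in_group & (time == 1)].mean() - score[in_group & (time == 0)].mean()


difference_in_change = mean_change(1) - mean_change(0)
```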

Effect size of CNV carriership

We used Cohen’s d to quantify the effect size of the CNVs on revealed PLS modes. For a given mode and separately for brain and behaviour, Cohen’s d was calculated as:

$$d=\frac{{\bar{x}}_{1}-{\bar{x}}_{2}}{\sqrt{\frac{{s}_{1}^{2}+{s}_{2}^{2}}{2}}},$$

(2)

where ${\bar{x}}_{1}$ corresponds to the mean PLS score across CNV carriers, and ${\bar{x}}_{2}$ corresponds to the mean PLS score across controls. Similarly, ${s}_{1}$ and ${s}_{2}$ correspond to standard deviations of PLS scores of CNV carriers and controls.
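Equation (2) translates directly into code. In this sketch the sample standard deviation (`ddof=1`) is an assumption, as the text does not state which estimator was used; the example scores are invented.

```python
import numpy as np


def cohens_d(carrier_scores, control_scores):
    """Cohen's d with the root-mean-square of the two group standard deviations."""
    s1 = np.std(carrier_scores, ddof=1)  # ddof=1 is an assumption
    s2 = np.std(control_scores, ddof=1)
    difference = np.mean(carrier_scores) - np.mean(control_scores)
    return difference / np.sqrt((s1**2 + s2**2) / 2)


# Toy scores: both groups have unit variance, means differ by one.
d = cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Here the carriers' mean lies one pooled standard deviation below the controls' mean, so `d` equals -1.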

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data supporting the findings of this study are available from the Adolescent Brain Cognitive Development Study (ABCD Study) dataset (https://abcdstudy.org) release 4.0. The ABCD Study is a publicly available resource accessible through the National Institute of Mental Health Data Archive (NDA). All relevant instructions to obtain the data can be found online (https://nda.nih.gov/abcd/request-access).

All protocols for the ABCD Study are approved by a central Institutional Review Board (cIRB) at the University of California, San Diego, for the ethical review and approval of the research protocol, with a few sites obtaining local IRB approval (https://www.sciencedirect.com/science/article/pii/S1878929317302268#sec0040). Caregivers have provided written, informed consent and children provided verbal assent to all research protocols in accordance with US Department of Health and Human Services (HHS) regulations. Source data are provided with this paper.

Code availability

The processing scripts and custom analysis software used in this work are available in a publicly accessible GitHub repository archived on Zenodo⁹⁴, along with examples of key visualizations in the paper (https://github.com/jakubkopal/CNV-covariance). The whole pipeline was written in Python (v.3.12.0), with the following external packages: scikit-learn (v.1.3.1), numpy (v.1.26.4), scipy (v.1.13.1), pandas (v.2.1.1), nilearn (v.0.10.2), Python (v.3.8.12), statsmodels (v.0.14.0), BrainStat (v.0.4.2). All plots were created with matplotlib (v.3.8.0) and seaborn (v.0.13.0). Figures were post processed with Inkscape (v.1.4).

References

Thompson, P. M. et al. Genetic influences on brain structure. Nat. Neurosci. 4, 1253–1258 (2001).
Timpson, N. J., Greenwood, C. M. T., Soranzo, N., Lawson, D. J. & Richards, J. B. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat. Rev. Genet. 19, 110–124 (2018).
Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
Conrad, D. F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).
Freeman, J. L. et al. Copy number variation: new insights in genome diversity. Genome Res. 16, 949–961 (2006).
Sanders, S. J. et al. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron 87, 1215–1233 (2015).
Gudmundsson, O. O. et al. Attention-deficit hyperactivity disorder shares copy number variant risk with schizophrenia and autism spectrum disorder. Transl. Psychiatry 9, 258 (2019).
Chawner, S. J. R. A. et al. Genotype–phenotype associations in children with copy number variants associated with high neuropsychiatric risk in the UK (IMAGINE-ID): a case-control cohort study. Lancet Psychiatry 6, 493–505 (2019).
Marshall, C. R. et al. Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321 subjects. Nat. Genet. 49, 27–35 (2017).
Green, E. K. et al. Copy number variation in bipolar disorder. Mol. Psychiatry 21, 89–93 (2016).
Kendall, K. M. et al. Association of rare copy number variants with risk of depression. JAMA Psychiatry 76, 818–825 (2019).
Auwerx, C. et al. The individual and global impact of copy-number variants on complex human traits. Am. J. Hum. Genet. https://doi.org/10.1016/j.ajhg.2022.02.010 (2022).
Kopal, J. et al. Rare CNVs and phenome-wide profiling highlight brain structural divergence and phenotypical convergence. Nat. Hum. Behav. 7, 1001–1017 (2023).
Sønderby, I. E. et al. Effects of copy number variations on brain structure and risk for psychiatric illness: large-scale studies from the ENIGMA working groups on CNVs. Hum. Brain Mapp. 43, 300–328 (2022).
Moreau, C. A., Ching, C. R., Kumar, K., Jacquemont, S. & Bearden, C. E. Structural and functional brain alterations revealed by neuroimaging in CNV carriers. Curr. Opin. Genet. Dev. 68, 88–98 (2021).
Raznahan, A., Won, H., Glahn, D. C. & Jacquemont, S. Convergence and divergence of rare genetic disorders on brain phenotypes. JAMA Psychiatry 79, 818–828 (2022).
Modenato, C. et al. Lessons learned from neuroimaging studies of copy number variants: a systematic review. Biol. Psychiatry 90, 596–610 (2021).
Huguet, G. et al. Measuring and estimating the effect sizes of copy number variants on general intelligence in community-based samples. JAMA Psychiatry 75, 447–457 (2018).
Huguet, G. et al. Genome-wide analysis of gene dosage in 24,092 individuals estimates that 10,000 genes modulate cognitive ability. Mol. Psychiatry 26, 2663–2676 (2021).
Fuhrmann, D., Knoll, L. J. & Blakemore, S.-J. Adolescence as a sensitive period of brain development. Trends Cogn. Sci. 19, 558–566 (2015).
Tost, H., Champagne, F. A. & Meyer-Lindenberg, A. Environmental influence in the brain, human welfare and mental health. Nat. Neurosci. 18, 1421–1431 (2015).
Paus, T., Keshavan, M. & Giedd, J. N. Why do many psychiatric disorders emerge during adolescence? Nat. Rev. Neurosci. 9, 947–957 (2008).
Alnæs, D., Kaufmann, T., Marquand, A. F., Smith, S. M. & Westlye, L. T. Patterns of sociocognitive stratification and perinatal risk in the child brain. Proc. Natl Acad. Sci. USA 117, 12419–12427 (2020).
Dahl, R. E., Allen, N. B., Wilbrecht, L. & Suleiman, A. B. Importance of investing in adolescence from a developmental science perspective. Nature 554, 441–450 (2018).
Millan, M. J. et al. Altering the course of schizophrenia: progress and perspectives. Nat. Rev. Drug Discov. 15, 485–515 (2016).
Molloy, C. J., Quigley, C., McNicholas, Á., Lisanti, L. & Gallagher, L. A review of the cognitive impact of neurodevelopmental and neuropsychiatric associated copy number variants. Transl. Psychiatry 13, 116 (2023).
Casey, B. J. et al. The Adolescent Brain Cognitive Development (ABCD) study: imaging acquisition across 21 sites. Dev. Cogn. Neurosci. 32, 43–54 (2018).
Smith, S. M. et al. A positive-negative mode of population covariation links brain connectivity, demographics and behavior. Nat. Neurosci. 18, 1565–1567 (2015).
Stalnaker, T. A., Cooch, N. K. & Schoenbaum, G. What the orbitofrontal cortex does not do. Nat. Neurosci. 18, 620–627 (2015).
Modenato, C. et al. Effects of eight neuropsychiatric copy number variants on human brain structure. Transl. Psychiatry 11, 399 (2021).
Hartwigsen, G. et al. Phonological decisions require both the left and right supramarginal gyri. Proc. Natl Acad. Sci. USA 107, 16494–16499 (2010).
Mechelli, A., Humphreys, G. W., Mayall, K., Olson, A. & Price, C. J. Differential effects of word length and visual contrast in the fusiform and lingual gyri during reading. Proc. Biol. Sci. 267, 1909–1913 (2000).
Vossel, S., Geng, J. J. & Fink, G. R. Dorsal and ventral attention systems. Neuroscientist 20, 150–159 (2014).
Menon, V. 20 years of the default mode network: a review and synthesis. Neuron 111, 2469–2487 (2023).
ABCD Study. ABCD General Data https://wiki.abcdstudy.org/release-notes/non-imaging/general.html (2025).
Kopal, J., Uddin, L. Q. & Bzdok, D. The end game: respecting major sources of population diversity. Nat. Methods 20, 1122–1128 (2023).
Kopal, J. et al. Using rare genetic mutations to revisit structural brain asymmetry. Nat. Commun. 15, 2639 (2024).
Bzdok, D. Classical statistics and statistical learning in imaging neuroscience. Front. Neurosci. 11, 543 (2017).
Smith, S. M. & Nichols, T. E. Statistical challenges in ‘big data’ human neuroimaging. Neuron 97, 263–268 (2018).
Bzdok, D., Nichols, T. E. & Smith, S. M. Towards algorithmic analytics for large-scale datasets. Nat. Mach. Intell. 1, 296–306 (2019).
Bzdok, D. & Ioannidis, J. P. A. Exploration, inference, and prediction in neuroscience and biomedicine. Trends Neurosci. 42, 251–262 (2019).
Nicolaisen-Sobesky, E. et al. A cross-cohort replicable and heritable latent dimension linking behaviour to multi-featured brain structure. Commun. Biol. 5, 1297 (2022).
Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nat. Genet. 43, 838–846 (2011).
Kim, S. H. et al. Language characterization in 16p11.2 deletion and duplication syndromes. Am. J. Med. Genet. B 183, 380–391 (2020).
Kendall, K. M. et al. Cognitive performance and functional outcomes of carriers of pathogenic copy number variants: analysis of the UK Biobank. Br. J. Psychiatry 214, 297–304 (2019).
Mollon, J., Almasy, L., Jacquemont, S. & Glahn, D. C. The contribution of copy number variants to psychiatric symptoms and cognitive ability. Mol. Psychiatry https://doi.org/10.1038/s41380-023-01978-4 (2023).
Wang, H.-T. et al. Finding the needle in a high-dimensional haystack: canonical correlation analysis for neuroscientists. NeuroImage 216, 116745 (2020).
McCarroll, S. A. et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat. Genet. 40, 1166–1174 (2008).
Morris, T. T., Davies, N. M., Hemani, G. & Smith, G. D. Population phenomena inflate genetic associations of complex social traits. Sci. Adv. 6, eaay0328 (2020).
Benkarim, O. et al. Population heterogeneity in clinical cohorts affects the predictive accuracy of brain imaging. PLoS Biol. 20, e3001627 (2022).
Glasser, M. F. & Van Essen, D. C. Mapping human cortical areas in vivo based on myelin content as revealed by T1- and T2-weighted MRI. J. Neurosci. 31, 11597–11616 (2011).
Gogtay, N. et al. Dynamic mapping of human cortical development during childhood through early adulthood. Proc. Natl Acad. Sci. USA 101, 8174–8179 (2004).
Schulz, A. & Northridge, M. E. Social determinants of health: implications for environmental health promotion. Health Educ. Behav. 31, 455–471 (2004).
Pickett, K. E. & Pearl, M. Multilevel analyses of neighbourhood socioeconomic context and health outcomes: a critical review. J. Epidemiol. Community Health 55, 111–122 (2001).
Robinette, J. W., Charles, S. T. & Gruenewald, T. L. Neighborhood socioeconomic status and health: a longitudinal analysis. J. Community Health 42, 865–871 (2017).
Robinette, J. W., Piazza, J. R. & Stawski, R. S. Neighborhood safety concerns and daily well-being: a national diary study. Wellbeing Space Soc. 2, 100047 (2021).
Sun, D. et al. Large-scale mapping of cortical alterations in 22q11.2 deletion syndrome: convergence with idiopathic psychosis and effects of deletion size. Mol. Psychiatry 25, 1822–1834 (2020).
Martin-Brevet, S. et al. Quantifying the effects of 16p11.2 copy number variants on brain structure: a multisite genetic-first study. Biol. Psychiatry 84, 253–264 (2018).
Ulfarsson, M. O. et al. 15q11.2 CNV affects cognitive, structural and functional correlates of dyslexia and dyscalculia. Transl. Psychiatry 7, e1109 (2017).
Couvy-Duchesne, B. et al. Lingual gyrus surface area is associated with anxiety-depression severity in young adults: a genetic clustering approach. eNeuro 5, ENEURO.0153-17.2017 (2018).
Japee, S., Holiday, K., Satyshur, M. D., Mukai, I. & Ungerleider, L. G. A role of right middle frontal gyrus in reorienting of attention: a case study. Front. Syst. Neurosci. 9, 23 (2015).
Alexander-Bloch, A. et al. Copy number variant risk scores associated with cognition, psychopathology, and brain structure in youths in the Philadelphia Neurodevelopmental Cohort. JAMA Psychiatry 79, 699–709 (2022).
Cleynen, I. et al. Genetic contributors to risk of schizophrenia in the presence of a 22q11.2 deletion. Mol. Psychiatry 26, 4496–4510 (2021).
Kirov, G. et al. The penetrance of copy number variations for schizophrenia and developmental delay. Biol. Psychiatry 75, 378–385 (2014).
Volkow, N. D. et al. The conception of the ABCD study: from substance use to a broad NIH collaboration. Dev. Cogn. Neurosci. 32, 4–7 (2018).
Auchter, A. M. et al. A description of the ABCD organizational structure and communication framework. Dev. Cogn. Neurosci. 32, 8–15 (2018).
Paul, S. E. et al. Associations between prenatal cannabis exposure and childhood outcomes: results from the ABCD study. JAMA Psychiatry 78, 64–76 (2021).
Garavan, H. et al. Recruiting the ABCD sample: design considerations and procedures. Dev. Cogn. Neurosci. 32, 16–22 (2018).
Fan, C. C., Loughnan, R., Wilson, S., Hewitt, J. K. & ABCD Genetic Working Group. Genotype data and derived genetic instruments of Adolescent Brain Cognitive Development Study® for better understanding of human brain development. Behav. Genet. 53, 159–168 (2023).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Douard, E. et al. Effect sizes of deletions and duplications on autism risk across the genome. Am. J. Psychiatry 178, 87–98 (2021).
Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007).
Colella, S. et al. QuantiSNP: an objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35, 2013–2025 (2007).
Sanders, S. J. et al. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70, 863–885 (2011).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Werling, D. M. et al. Whole-genome and RNA sequencing reveal variation and transcriptomic coordination in the developing human prefrontal cortex. Cell Rep. 31, 107489 (2020).
Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584.e23 (2020).
Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Coe, B. P. et al. Refining analyses of copy number variation identifies specific genes associated with developmental delay. Nat. Genet. 46, 1063–1071 (2014).
Stefansson, H. et al. CNVs conferring risk of autism or schizophrenia affect cognition in controls. Nature 505, 361–366 (2014).
Moreno-De-Luca, D. et al. Using large clinical data sets to infer pathogenicity for rare copy number variants in autism cohorts. Mol. Psychiatry 18, 1090–1095 (2013).
Leys, C., Ley, C., Klein, O., Bernard, P. & Licata, L. Detecting outliers: do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 49, 764–766 (2013).
Hagler, D. J. et al. Image processing and analysis methods for the Adolescent Brain Cognitive Development Study. NeuroImage 202, 116091 (2019).
Destrieux, C., Fischl, B., Dale, A. & Halgren, E. Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage 53, 1–15 (2010).
Agarwal, A., Shah, D., Shen, D. & Song, D. in Advances in Neural Information Processing Systems (eds Wallach, H. et al.) 9893–9903 (Curran Associates, 2019).
Wegelin, J. A Survey of Partial Least Squares (PLS) Methods, with Emphasis on the Two-Block Case Technical Report 371 (Univ. Washington Statistics Department, 2000).
Mihalik, A. et al. Canonical correlation analysis and partial least squares for identifying brain–behavior associations: a tutorial and a comparative study. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 7, 1055–1067 (2022).
Larivière, S. et al. BrainStat: a toolbox for brain-wide statistics and multimodal feature associations. NeuroImage 266, 119807 (2023).
Schaefer, A. et al. Local–global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb. Cortex 28, 3095–3114 (2018).
Kopal, J. & Bzdok, D. Endorsing complexity through diversity: computational psychiatry meets big data analytics. Biol. Psychiatry https://doi.org/10.1016/j.biopsych.2022.07.023 (2022).
Kopal, J. CNV-covariance. Source code. Zenodo https://doi.org/10.5281/zenodo.15229008 (2025).


Acknowledgements

D.B. was supported by the Brain Canada Foundation, through the Canada Brain Research Fund, with the financial support of Health Canada, National Institutes of Health (NIH R01 AG068563A, NIH R01 DA053301-01A1, NIH R01MH129858-01A1), the Canadian Institutes of Health Research (CIHR 438531, CIHR 470425), the Healthy Brains Healthy Lives initiative (Canada First Research Excellence fund), the IVADO R3AI initiative (Canada First Research Excellence fund), and by the CIFAR Artificial Intelligence Chairs programme (Canadian Institute for Advanced Research). This research was also supported by Calcul Quebec (http://www.calculquebec.ca) and Compute Canada (http://www.computecanada.ca), the Brain Canada Multi-Investigator initiative, the Canadian Institutes of Health Research, CIHR_400528, the Institute of Data Valorization (IVADO) through the Canada First Research Excellence Fund, and Healthy Brains for Healthy Lives through the Canada First Research Excellence Fund. S.J. is a recipient of a Canada Research Chair in neurodevelopmental disorders and a chair from the Jeanne et Jean Louis Levesque Foundation. K.K. was supported by the Institute of Data Valorization (IVADO) Postdoctoral Fellowship programme through the Canada First Research Excellence Fund. B.T.T.Y. was supported by the NUS Yong Loo Lin School of Medicine (NUHSRO/2020/124/TMR/LOA), the Singapore National Medical Research Council (NMRC) LCG (OFLCG19May-0035), NMRC CTG-IIT (CTGIIT23jan-0001), NMRC OF-IRG (OFIRG24jan-0006), NMRC STaR (STaR20nov-0003), Singapore Ministry of Health (MOH) Centre Grant (CG21APR1009), and the United States National Institutes of Health (R01MH120080, R01MH133334). C.E.B. was supported by the National Institutes of Health (R37MH085953, R01MH129858) and the Simons Foundation (SFI-AN-AR-HUMAN-00004264-4). X.J.C. was supported by the Canada Research Chairs Program, the Canadian Institutes of Health Research (CIHR-PAU-185726), and the Healthy Brains Healthy Lives initiative (Canada First Research Excellence fund). O.A.A. was supported by Nordforsk (#164218), the Research Council of Norway (#324499), an EU H2020 RIA grant (#964874 REALMENT), and NIH (R01MH129858). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. Data used in the preparation of this article were obtained from the Adolescent Brain Cognitive Development℠ (ABCD) Study (https://abcdstudy.org), held in the NIMH Data Archive (NDA). The ABCD Study is supported by the National Institutes of Health and additional federal partners under award numbers: U01DA041048, U01DA050989, U01DA051016, U01DA041022, U01DA051018, U01DA051037, U01DA050987, U01DA041174, U01DA041106, U01DA041117, U01DA041028, U01DA041134, U01DA050988, U01DA051039, U01DA041156, U01DA041025, U01DA041120, U01DA051038, U01DA041148, U01DA041093, U01DA041089, U24DA041123, U24DA041147. A full list of supporters is available at https://abcdstudy.org/federal-partners.html. A listing of participating sites and a complete listing of the study investigators can be found at https://abcdstudy.org/consortium_members/. ABCD consortium investigators designed and implemented the study and/or provided data but did not necessarily participate in the analysis or writing of this report. This paper reflects the views of the authors and may not reflect the opinions or views of the NIH or ABCD consortium investigators.

Author information

Authors and Affiliations

Department of Biomedical Engineering, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
Jakub Kopal, Justin Marotta, Shambhavi Aggarwal, Nicole Osayande & Danilo Bzdok
Mila - Quebec Artificial Intelligence Institute, Montreal, Quebec, Canada
Jakub Kopal, Justin Marotta, Shambhavi Aggarwal, Nicole Osayande & Danilo Bzdok
Centre for Precision Psychiatry, Division of Mental Health and Addiction, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
Jakub Kopal & Ole A. Andreassen
Centre de recherche CHU Sainte-Justine, Montreal, Quebec, Canada
Guillaume Huguet, Kuldeep Kumar, Zohra Saci, Martineau Jean-Louis & Sébastien Jacquemont
Department of Neurology and Neurosurgery, McGill University, Montreal, Quebec, Canada
Xiaoqian J. Chai
TheNeuro - Montreal Neurological Institute (MNI), McConnell Brain Imaging Centre, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
Xiaoqian J. Chai & Danilo Bzdok
Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Tian Ge
Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
B. T. Thomas Yeo
Centre for Sleep and Cognition, and Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, Singapore, Singapore
B. T. Thomas Yeo
N.1 Institute for Health and Institute for Digital Medicine, National University of Singapore, Singapore, Singapore
B. T. Thomas Yeo
Imaging Genetics Center, Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, Los Angeles, CA, USA
Paul M. Thompson
Semel Institute for Neuroscience and Human Behavior, Departments of Psychiatry and Biobehavioral Sciences and Psychology, UCLA, Los Angeles, CA, USA
Carrie E. Bearden
Section for Precision Psychiatry, Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway
Ole A. Andreassen
Department of Pediatrics, University of Montreal, Montreal, Quebec, Canada
Sébastien Jacquemont

Authors

Jakub Kopal
Guillaume Huguet
Justin Marotta
Shambhavi Aggarwal
Nicole Osayande
Kuldeep Kumar
Zohra Saci
Martineau Jean-Louis
Xiaoqian J. Chai
Tian Ge
B. T. Thomas Yeo
Paul M. Thompson
Carrie E. Bearden
Ole A. Andreassen
Sébastien Jacquemont
Danilo Bzdok

Contributions

D.B. and J.K. designed the study, analysed brain and behavioural data, and drafted the paper. S.J., G.H., Z.S. and M.J.-L. called CNVs. D.B. and S.J. contributed to the interpretation of the results and the editing of the paper. All authors provided feedback on the paper. D.B. led data analytics.

Corresponding author

Correspondence to Danilo Bzdok.

Ethics declarations

Competing interests

D.B. is a shareholder and advisory board member at MindState Design Labs, USA. O.A.A. is a consultant to Cortechs.ai and Precision Health, and received speaker’s honoraria from Lundbeck, Janssen, Lilly and BMS. P.M.T. obtained grant support from Biogen, Inc., for research unrelated to this paper. B.T.T.Y. is a shareholder and advisor to B1neuro. The content in this paper is unrelated to B1neuro’s activities. The other authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–11 and Tables 1–4.

Reporting Summary

Source data

Source Data Fig. 1

Dataset description, canonical PLS illustration.

Source Data Fig. 2

First PLS mode illustration.

Source Data Fig. 3

Second PLS mode illustration.

Source Data Fig. 4

Third PLS mode illustration.

Source Data Fig. 5

Association of PLS modes with gene characteristics and sociodemographic factors.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Kopal, J., Huguet, G., Marotta, J. et al. A pattern-learning algorithm associates copy number variations with brain structure and behavioural variables in an adolescent population cohort. Nat. Biomed. Eng 10, 354–369 (2026). https://doi.org/10.1038/s41551-025-01454-0


Received: 15 May 2024
Accepted: 09 June 2025
Published: 18 July 2025
Version of record: 18 July 2025
Issue date: February 2026
DOI: https://doi.org/10.1038/s41551-025-01454-0