Deleterious coding variation associated with autism is shared across ancestries

Natividad Avila, Marina; Jung, Seulgi; Satterstrom, F. Kyle; Fu, Jack M.; Levy, Tess; Sloofman, Laura G.; Klei, Lambertus; Pichardo, Thariana; Marquez, Dalia; Stevens, Christine R.; Cusick, Caroline M.; Ames, Jennifer L.; Campos, Gabriele S.; Cerros, Hilda; Chaskel, Roberto; Costa, Claudia I. S.; Cuccaro, Michael L.; Lopez, Andrea del Pilar; Fernandez, Magdalena; Ferro, Eugenio; Galeano, Liliana; Girardi, Ana Cristina D. E. S.; Griswold, Anthony J.; Hernandez, Luis C.; Lourenço, Naila; Ludena, Yunin; Núñez-Ríos, Diana; Oyama, Rosa; Peña, Katherine P.; Pessah, Isaac; Schmidt, Rebecca; Sweeney, Holly M.; Tolentino, Lizbeth; Wang, Jaqueline Y. T.; Albores-Gallo, Lilia; Croen, Lisa A.; Cruz-Fuentes, Carlos S.; Hertz-Picciotto, Irva; Kolevzon, Alexander; Lattig, Maria Claudia; Mayo, Liliana; Passos-Bueno, Maria Rita; Pericak-Vance, Margaret A.; Siper, Paige M.; Tassone, Flora; Trelles, M. Pilar; Talkowski, Michael E.; Daly, Mark J.; Mahjani, Behrang; De Rubeis, Silvia; Cook, Edwin H.; Roeder, Kathryn; Betancur, Catalina; Devlin, Bernie; Buxbaum, Joseph D.

doi:10.1038/s41591-026-04228-6

Article
Open access
Published: 30 March 2026

Marina Natividad Avila^1,2,3,4,5,6,
Seulgi Jung^1,2,3,4,5,6,
F. Kyle Satterstrom^7,8,9,
Jack M. Fu^7,10,11,
Tess Levy^1,2,4,6,
Laura G. Sloofman ORCID: orcid.org/0000-0001-7628-4378^1,2,3,4,5,6,
Lambertus Klei¹²,
Thariana Pichardo^1,2,4,6,
Dalia Marquez^1,2,
Christine R. Stevens^7,8,9,
Caroline M. Cusick⁸,
Jennifer L. Ames¹³,
Gabriele S. Campos¹⁴,
Hilda Cerros¹³,
Roberto Chaskel^15,16,
Claudia I. S. Costa¹⁴,
Michael L. Cuccaro^17,18,
Andrea del Pilar Lopez ORCID: orcid.org/0000-0003-1278-7894¹⁵,
Magdalena Fernandez¹⁶,
Eugenio Ferro¹⁶,
Liliana Galeano¹⁹,
Ana Cristina D. E. S. Girardi ORCID: orcid.org/0000-0001-6531-501X¹⁴,
Anthony J. Griswold ORCID: orcid.org/0000-0003-1925-7810^17,18,
Luis C. Hernandez¹⁹,
Naila Lourenço¹⁴,
Yunin Ludena²⁰,
Diana Núñez-Ríos^21,22,
Rosa Oyama²³,
Katherine P. Peña¹⁹,
Isaac Pessah²⁰,
Rebecca Schmidt ORCID: orcid.org/0000-0003-1582-2747²⁰,
Holly M. Sweeney²⁴,
Lizbeth Tolentino²³,
Jaqueline Y. T. Wang¹⁴,
Lilia Albores-Gallo ORCID: orcid.org/0000-0001-5862-4404^25,26,
Lisa A. Croen ORCID: orcid.org/0000-0001-7849-9428^13,27,
Carlos S. Cruz-Fuentes²⁸,
Irva Hertz-Picciotto²⁰,
Alexander Kolevzon^1,2,29,
Maria Claudia Lattig¹⁹,
Liliana Mayo²³,
Maria Rita Passos-Bueno¹⁴,
Margaret A. Pericak-Vance ORCID: orcid.org/0000-0001-7283-8804^17,18,
Paige M. Siper^1,2,6,
Flora Tassone^20,30,
M. Pilar Trelles³¹,
GALA Consortium,
The Autism Sequencing Consortium (ASC),
Michael E. Talkowski^7,8,10,11,32,
Mark J. Daly ORCID: orcid.org/0000-0002-0949-8752^7,8,9,10,33,34,
Behrang Mahjani ORCID: orcid.org/0000-0001-6087-9537^1,2,3,6,35,36,
Silvia De Rubeis ORCID: orcid.org/0000-0001-9383-6883^1,2,4,6,37,
Edwin H. Cook ORCID: orcid.org/0000-0002-5848-5114³⁸,
Kathryn Roeder^39,40,
Catalina Betancur ORCID: orcid.org/0000-0002-3327-4804⁴¹,
Bernie Devlin¹² &
Joseph D. Buxbaum ORCID: orcid.org/0000-0001-8898-8313^1,2,3,4,5,6

Nature Medicine (2026)

Abstract

The past decade has seen remarkable progress in identifying genes that, when impacted by deleterious coding variation, confer high likelihood for autism spectrum disorder (ASD), intellectual disability and other associated developmental disorders. However, most underlying gene discovery efforts have focused on individuals of European ancestry, limiting insights into genetic liability across diverse populations. To help address this, the Genomics of Autism in Latin American Ancestries (GALA) Consortium was formed, presenting here the largest sequencing study of autism in Latin American individuals (n > 15,000, including 4,717 participants with an ASD diagnosis). We identified 35 genome-wide significant (false discovery rate < 0.05) autism-associated genes, with substantial overlap with findings from European cohorts, and highly constrained genes showing consistent signal across populations. The results provide support for emerging (for example, MARK2, YWHAG, PACS1, RERE, SPEN, GSE1, GLS, TNPO3 and ANKRD17) and established autism genes and for the utility of genetic testing approaches for deleterious variants in individuals from diverse backgrounds; the results also demonstrate the ongoing need for more inclusive genetic research and testing. We conclude that the biology of autism is consistent across populations, with no detectable influence of ancestry.

Main

ASD is characterized by deficits in social communication and the presence of restricted interests and/or repetitive behaviors¹. Although the majority of the genetic liability for autism is attributed to common genetic variation, rare variants, often arising de novo, play a substantial role in individual liability^2,3. Multiple large-scale studies of rare and common variation associated with autism likelihood are ongoing, and dozens of genes strongly associated with autism have emerged^4,5, primarily coding for proteins involved in gene expression regulation, neuronal communication or the cytoskeleton⁶. These findings have contributed to improved interpretation of genetic tests and represent initial steps in the development of personalized interventions and targeted therapies. Although translation to broad clinical care remains limited, gene-targeted therapeutic strategies for rare genetic disorders associated with autism and other neurodevelopmental disorders (NDDs) have emerged as a very dynamic area of study in both academia and industry⁷. The overwhelming majority of participants in gene discovery studies are of European (EUR) ancestry, even though they comprise only 16% of the global population⁸. This limited window into genetic architecture across ancestries could exacerbate preexisting disparities in diagnostics and service use for autism⁹. Indeed, recent studies have reported high rates of inconclusive results after genetic testing in non-EUR individuals, likely because of uncertainty in interpreting genomic variants^10,11,12.

We established the GALA Consortium to investigate the impact of genetic and environmental factors on autism across Latin Americans, including participants from all of the Americas, corresponding to the Admixed American (AMR) superpopulation in the 1000 Genomes Project¹³. These AMR individuals comprise the largest recently admixed population in the world and the largest minority in the United States. It is as yet unknown whether the genetic architecture of autism differs across ancestral populations, and the genetic diversity of the AMR group^14,15 makes this question especially relevant.

We present, to our knowledge, the largest sequencing study to date of autism in AMR individuals and compare our results to findings from non-AMR cohorts. We show that a common measure of evolutionary impact on gene-level variation—that is, genomic constraint scores—differs by ancestry. However, this is not the case for the most constrained genes, which exhibit less population-level variation than expected based on their sequence composition. This is important because most identified autism-associated genes are evolutionarily constrained^4,16, and this applies over diverse populations. Using Bayesian models, we identify 35 genome-wide significant genes associated with autism in Latin American individuals and observe a great degree of overlap with findings in largely EUR cohorts. These results indicate that autism and other NDD genes are shared across ancestries and that existing genetic testing pipelines are effective for the most deleterious variation, especially if information on allele frequency across ancestries is incorporated. We conclude that the biology of autism is consistent across populations and not impacted to any detectable degree by ancestry.

Results

Rare variant landscape in Latin Americans diagnosed with ASD

GALA currently encompasses 10 cohorts across the Americas, with data from eight included in this study (Fig. 1 and Methods, ‘Description of GALA sites’ section). Some GALA samples were contributed to other large-scale whole-exome sequencing (WES) and whole-genome sequencing (WGS) efforts^4,17; analyses of 1,613 samples (including 707 ASD probands) are reported here for the first time. The GALA analyses reported here include all sequenced samples from GALA cohorts as well as additional genetically inferred AMR samples from the Autism Sequencing Consortium (ASC)¹⁸ and Simons Powering Autism Research (SPARK)¹⁹.

**Fig. 1: Overview of GALA cohort sites and pedigree structure.**

A substantial source of individual autism liability resides in rare deleterious variation in conserved genes^4,6, often de novo or very recent. Hence, to maximize power for discovery, we focus on data collected from trios—that is, an affected proband and both unaffected parents and their typically developing sibling(s), when available. When parental DNA samples could not be collected, we incorporated probands using a case−control framework. After extensive quality control (Extended Data Fig. 1), our analysis included 6,977 individuals: 4,717 ASD cases and the remainder consisting of controls and typically developing siblings (Fig. 1b and Supplementary Table 1). In total, 15,427 individuals were sequenced, including parents from trio-based collections who contributed to de novo variant detection but were not themselves analyzed for variant burden. Specifically, 14,359 individuals were sequenced, with WES (n = 14,152) or WGS (n = 207), as part of family-based analysis: 4,450 AMR ASD individuals, 1,459 siblings and 8,450 parents (Supplementary Table 2). For case−control analysis, 267 ASD AMR samples were matched to 801 non-psychiatric AMR controls from the Mount Sinai BioMe biobank^20,21.
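The sample counts in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative and not taken from the study's analysis code.

```python
# Counts quoted in the paragraph above; variable names are illustrative only.
family_asd, family_siblings, family_parents = 4_450, 1_459, 8_450
wes, wgs = 14_152, 207
cc_cases, cc_controls = 267, 801

# Family-based collection: probands, siblings and parents.
family_total = family_asd + family_siblings + family_parents

# All ASD cases: family-based probands plus case-control cases.
asd_cases = family_asd + cc_cases

# Individuals analyzed for variant burden (parents are excluded).
analyzed = asd_cases + family_siblings + cc_controls

# Everyone sequenced, including parents used for de novo detection.
sequenced = family_total + cc_cases + cc_controls

print(family_total, asd_cases, analyzed, sequenced)
```

The four totals agree with the 14,359, 4,717, 6,977 and 15,427 individuals reported in the text, and the family-based total also equals the sum of the WES and WGS counts.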

We identified 6,555 rare (that is, allele frequency < 0.1% in our dataset and in the population-specific non-neuro subsets of gnomAD versions 2.1.1 and 3.1.2 (refs. ^22,23)) and unique de novo coding sequence variants (5,062 in ASD probands and 1,493 in siblings) (Supplementary Table 3). We identified 36 de novo variants that occurred twice in individuals with ASD: 18 were found in affected siblings, consistent with germline mosaicism, and 18 occurred in unrelated individuals. Additionally, we observed 211 and 15 rare autosomal de novo small genic copy number variants (CNVs)²⁴ in 2,191 probands and 707 siblings, respectively (Supplementary Tables 4 and 5).

In previous studies, highly constrained genes showed an aggregated signal of variants contributing to autism liability¹⁶, and integrating genomic constraint scores has proven powerful for gene discovery⁴. However, constraint scores are derived from cohorts largely of EUR ancestry. Therefore, we first sought to evaluate the utility of these scores on samples of diverse ancestries.

First, we examined the distribution of de novo variants as a function of a well-established metric of tolerance to loss-of-function variants, the loss-of-function observed/expected upper bound fraction (LOEUF)²², derived from gnomAD version 2.1.1. Genes with low LOEUF scores are depleted for loss-of-function variation compared to expectation as a result of negative natural selection²². Our results demonstrate that rates of de novo variants for both protein truncating (PTV) and deleterious missense (MisB, with a ‘missense badness, PolyPhen-2 and constraint’ (MPC) score²⁵ ≥2) variants are elevated in probands compared to typically developing siblings in genes with low LOEUF scores (Fig. 2). Comparing our findings with previously published results⁴, we observed that the overall rates of de novo variation in AMR individuals are consistent with those observed in other ancestry groups (Extended Data Fig. 2). Notably, we found a statistically significant enrichment of PTVs in constrained genes among AMR probands. We also observed a trend toward enrichment of missense variants with MPC score ≥ 2 (P = 0.077).

**Fig. 2: Comparison of rare de novo variant counts per sample between ASD probands and unaffected siblings, normalized to synonymous variant rates.**

Second, we examined whether LOEUF is well calibrated across ancestral populations. Effective population size differs across Native American, EUR and African populations²⁶, but current estimates of gene constraint are derived from cohorts that are largely of EUR ancestry. Existing LOEUF scores are modestly over-conservative when applied to AMR samples (Fig. 3, Extended Data Fig. 3 and Karczewski et al.²²), but, when focusing on the most constrained (lower) deciles, they correlate well with the observed number of PTVs normalized by sample size and gene length (Fig. 3). Because the association signal is concentrated in these lower deciles (Fig. 2), these observations justify the use of existing LOEUF scores for our study and generally for studies focusing on highly constrained genes in other ancestries, including admixed African ancestries (Extended Data Fig. 3).

**Fig. 3: Genic burden of PTVs in EUR versus AMR ancestries as a function of gene constraint.**

Autism gene discovery in Latin Americans

For gene discovery, we used TADA (transmission and de novo association), an algorithm that integrates de novo, inherited and case−control variants as well as LOEUF scores and small genic CNVs^4,6,27,28. Sixteen genes were associated with autism at a false discovery rate (FDR) < 0.01; 35 genes met genome-wide significant association (FDR < 0.05); and 61 genes were associated at FDR < 0.1 (Fig. 4, Table 1 and Supplementary Table 6). To examine the overlap of these findings with those in largely EUR ASD cohorts, we first identified and removed all AMR samples in Fu et al.⁴, yielding a non-AMR complementary set (Fu_COMP) with no overlap with our analyses. Eighteen of the 35 GALA genes with FDR < 0.05 showed significant signal in Fu_COMP. We next compared the observed numbers of variants in the GALA cohort with the expected number of variants derived from TADA analysis in Fu_COMP. To do this, we compared results for concordant genes, defined as genes that show FDR < 0.05 in GALA and in Fu_COMP, and we observed that, overall, the findings are consistent with expectation (Extended Data Table 1). We also compared our gene findings from the GALA cohort with those in a large cohort ascertained for severe developmental disorders²⁹: six of the 16 genes that had an FDR < 0.05 in GALA and an FDR > 0.1 in Fu_COMP showed an FDR < 0.05 in the developmental disorder cohort (Table 1).
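To illustrate how tiered gene counts at several FDR thresholds can arise from a Bayesian analysis, the sketch below applies one common Bayesian FDR rule. Genes are ranked by their posterior probability of association, and the q-value at each rank is the mean posterior probability of the null among the genes ranked at or above it. This is a simplified illustration under that assumption, not the TADA implementation; the function names, gene labels and posterior values are hypothetical.

```python
def bayesian_fdr(posteriors):
    """Map each gene to a q-value given posterior probabilities of association.

    Genes are ranked by decreasing posterior; the q-value at rank k is the
    mean of (1 - posterior) over the top k genes.
    """
    order = sorted(posteriors, key=posteriors.get, reverse=True)
    qvalues = {}
    running_null = 0.0
    for rank, gene in enumerate(order, start=1):
        running_null += 1.0 - posteriors[gene]
        qvalues[gene] = running_null / rank
    return qvalues


def count_by_threshold(qvalues, thresholds=(0.01, 0.05, 0.1)):
    """Count genes whose q-value falls below each threshold."""
    return {t: sum(q < t for q in qvalues.values()) for t in thresholds}


# Hypothetical posteriors for five made-up genes (not study data).
toy_posteriors = {
    "geneA": 0.999,
    "geneB": 0.98,
    "geneC": 0.90,
    "geneD": 0.60,
    "geneE": 0.20,
}
toy_qvalues = bayesian_fdr(toy_posteriors)
toy_tiers = count_by_threshold(toy_qvalues)
print(toy_tiers)
```

Because the thresholds are nested, every gene counted at FDR < 0.01 is also counted at FDR < 0.05 and at FDR < 0.1, which is why the reported counts (16, 35 and 61) increase with the threshold.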

**Fig. 4: Manhattan plot of autism genes identified in Latin American participants.**

**Table 1: Genome-wide and clinical findings for the top 35 genes.**

As in previous studies, de novo variation provided a major source of signal for top genes (Extended Data Fig. 4). Similarly, PTVs are a major source of signal, and it was interesting to note that missense variants were also an important source of rare variation association signal (Extended Data Fig. 5). For several of the top genes, the association signal is fully or almost fully derived from missense variants in the GALA cohort, which, for MTOR, YWHAG, GRIN1, PACS1 and CACNA1D, is consistent with previous findings and may suggest a dominant negative or gain-of-function mechanism (Table 1 and Extended Data Table 2). Gene Ontology and Mammalian Phenotype enrichment analyses (Supplementary Tables 7 and 8) highlighted biological processes and phenotypes related to synaptic function, neuronal development and social and repetitive behaviors.

Implications for clinical genetics

With compelling evidence for overlapping autism gene findings in AMR samples, we next asked about the fraction of findings that are identified as pathogenic or likely pathogenic (P/LP) as per American College of Medical Genetics (ACMG) guidelines³⁰. We used VarSome³¹—minimizing the use of proprietary databases and approaches used by commercial testing laboratories—to evaluate (1) genome-wide de novo variation and (2) inherited variation in X-linked genes associated with autism (Supplementary Table 9). This analysis included all de novo variants observed across the genome, not just those meeting the TADA inclusion criteria. Specifically, we included all protein-truncating, missense and synonymous variants, including those in genes lacking mutation rate or LOEUF estimates. For inherited variants, we focused on rare variants in known X-linked genes associated with autism and/or NDDs. We analyzed all GALA and Fu_COMP samples, focusing on genes for which there was a reported association with an autism and/or a broader NDD phenotype.

Among the 20,571 de novo variants in our analysis, 926 (4.5%) were classified by VarSome as P/LP when we focused on genes that included autism among the associated phenotypes (Supplementary Table 10). In the AMR cohort, 195 variants (3.8%, 95% confidence interval (CI): 3.27−4.32%) were identified as P/LP (Supplementary Table 11) compared to 731 out of 15,386 (4.75%, 95% CI: 4.42−5.10%) in non-AMR samples. In terms of participants with findings, 4.31% (95% CI: 3.75−4.96%) of AMR and 5.53% (95% CI: 5.15−5.94%) of non-AMR probands had at least one P/LP variant identified. Comparisons between EUR and non-EUR participants revealed that EUR individuals had a higher rate of de novo P/LP variants. Specifically, EUR participants had 634 (4.83%, 95% CI: 4.47−5.21%) P/LP variants identified compared to 292 (3.92%, 95% CI: 3.50−4.40%) in non-EUR participants. Overall, EUR participants had a higher rate of P/LP variants identified than non-EUR participants (5.61%, 95% CI: 5.20−6.06% versus 4.54%, 95% CI: 4.05−5.09%).
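The proportions and confidence intervals in this paragraph can be approximated from the quoted counts. The sketch below uses the Wilson score interval, which is an assumption on our part: this section does not state which interval method was used, so the last digit may differ slightly from the reported values. The AMR denominator (5,185) is likewise inferred as 20,571 minus 15,386 and is not stated explicitly in the text.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


total_variants = 20_571
non_amr_variants = 15_386
# Assumption: the AMR denominator is the remainder of the total.
amr_variants = total_variants - non_amr_variants

for label, k, n in [("AMR", 195, amr_variants), ("non-AMR", 731, non_amr_variants)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {100 * k / n:.2f}% (95% CI {100 * lo:.2f}-{100 * hi:.2f}%)")
```

Under these assumptions the point estimates match the reported 3.8% and 4.75%, and the two intervals do not overlap, consistent with the lower P/LP classification rate described for AMR samples.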

When broadening our criteria to include other NDD phenotypes, 1,339 de novo variants were deemed to be P/LP (Supplementary Table 12). In AMR, 276 variants were classified as P/LP (5.32%, 95% CI: 4.74−5.98%) versus 1,063 in non-AMR individuals (6.91%, 95% CI: 6.52−7.32%). In terms of participants with de novo findings, 6.07% (95% CI: 5.39−6.82%) of AMR participants and 7.99% (95% CI: 7.53−8.47%) of non-AMR participants had at least one P/LP finding. EUR participants had a notably higher rate of findings (8.22%, 95% CI: 7.72−8.75%) compared to non-EUR participants (6.24%, 95% CI: 5.66−6.87%).

Extending our analysis to include X-linked inherited findings, we observed a further increase in P/LP detection rates. Specifically, 201 de novo or X-linked variants (2.80%, 95% CI: 2.43−3.21%) in AMR samples and 758 variants (3.58%, 95% CI: 3.33−3.84%) in non-AMR samples were classified as P/LP for ASD. When we broadened the terms to include other NDD-related genes, the proportion of P/LP variants rose to 4.10% (95% CI: 3.65−4.58%) in AMR participants and to 5.26% (95% CI: 4.96−5.57%) in non-AMR participants. The rate of participants with at least one P/LP variant increased to 6.47% (95% CI: 5.78−7.24%) in AMR samples and to 8.38% (95% CI: 7.91−8.87%) in non-AMR samples (Supplementary Table 11). EUR participants showed a higher yield of P/LP findings (8.62%, 95% CI: 8.11−9.16%) compared to non-EUR participants (6.63%, 95% CI: 6.04−7.28%).

Qualitatively similar results were obtained when using Neptune³², which uses databases of previously identified variants to call P/LP variants in a set of 73 ACMG-recommended genes with actionable findings³³ (Extended Data Fig. 6 and Supplementary Table 11). Although greater numbers of rare variants were identified in individuals from diverse ancestries, the proportion of these that could be classified as P/LP was lower. This combination of higher variant detection but reduced classification rate of P/LP variants contributes to a somewhat lower overall yield of P/LP findings per individual in AMR or non-EUR individuals when compared to non-AMR or EUR ancestries, respectively. Considering the VarSome and Neptune results together, the findings provide support for the translatability of rare genetic findings in autism across ancestries in a clinical setting, albeit with opportunities for improvement.

Discussion

The past decade has seen major advances in deciphering the genetic architecture of autism, but these have come largely from EUR cohorts. It is not yet known whether the genetic architecture of autism differs across ancestral populations, including in admixed populations. Latin American individuals comprise the largest recently admixed population in the world and the largest minority in the United States. Diverse sites with large AMR representation have joined to form GALA, and here we report a first, large-scale multinational analysis of rare variant liability in Latin Americans with ASD, identifying autism-associated genes in this cohort and comparing genetic architecture with that observed in non-AMR ASD.

As in previous studies, we found that signal for genes strongly associated with autism was concentrated in highly conserved genes and largely driven by very rare de novo variation. For the discovery of autism-associated genes impacted by very rare de novo or case−control variation, it is critical to have reliable estimates of expected genic mutation rates, which can be derived from both cross-species comparisons and empirical data from massive, aggregated sequencing resources, such as gnomAD. Although representation of diverse populations is improving, much of the existing sequence data are skewed toward EUR samples. Thus, there is much more to be done regarding genetic variability within underrepresented populations. Our analyses confirm that metrics of gene-level constraint are overly conservative, due to the overreliance on EUR samples that have a lower effective population size. However, we also demonstrate that the key metric LOEUF, when applied to the most conserved genes, is well calibrated across diverse ancestral populations.

Because deleterious variation in highly conserved genes is subject to strong purifying selection, such variation is both very rare and frequently de novo. Allele frequency filtering based on gnomAD or similar datasets is, hence, an important means to infer very rare variation. However, we observe that relying on overall allele frequency allows for the introduction of more common variation into the analyses, hence reducing power and increasing the false-positive rate. We began our analyses using established best practices for filtering by global allele frequency in the analysis of potentially de novo variants^4,6. However, we noticed that some variants initially classified as rare in gnomAD (allele frequency < 0.1%) turned out to be more common in particular populations. To address this heterogeneity, we recommend annotating variants with allele frequencies across all subpopulations in the non-neuro releases of gnomAD, as we have done here. Building upon this strategy, we extended the same annotation to our analysis of inherited variation, adopting a more stringent allele frequency threshold of <0.01%, to ensure even more precision in our findings^34,35.
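The filtering strategy described above can be sketched as follows: a variant is treated as rare only if its frequency is below the threshold in the study dataset and in every reference subpopulation, rather than in the global aggregate alone. This is a minimal illustration, not the study's pipeline; the function names, dictionary keys and frequency values are hypothetical.

```python
def max_population_af(af_by_population):
    """Highest allele frequency observed in any reference subpopulation."""
    return max(af_by_population.values(), default=0.0)


def is_rare(af_by_population, cohort_af, threshold=0.001):
    """True if the variant is below the threshold in the cohort and in
    every reference subpopulation (0.001 corresponds to 0.1%)."""
    return cohort_af < threshold and max_population_af(af_by_population) < threshold


# Hypothetical variant: rare globally but more common in one subpopulation.
global_af = 0.0004
subpopulation_afs = {"pop_A": 0.0001, "pop_B": 0.0030, "pop_C": 0.0002}

rare_by_global = global_af < 0.001
rare_by_all_populations = is_rare(subpopulation_afs, cohort_af=0.0005)
print(rare_by_global, rare_by_all_populations)

# The stricter threshold used for inherited variation (0.01%).
rare_inherited = is_rare({"pop_A": 0.00005}, cohort_af=0.00005, threshold=0.0001)
```

In this made-up example the variant passes a global filter but fails the per-population filter, which is the situation the paragraph describes for variants that are more common in particular populations.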

We next used TADA to identify 35 genes associated with autism at an FDR threshold <0.05 in the GALA dataset, 16 with FDR < 0.01 and eight with FDR < 0.001 (Fig. 4 and Table 1). Consistent with previous studies in largely EUR cohorts, gene expression regulation, neuronal communication and cytoskeletal genes are well represented among the autism-associated genes identified in GALA (Table 1 and Supplementary Tables 7 and 8). FDR is well calibrated in TADA⁶, and genes identified with TADA in smaller cohorts are consistently replicated at expected levels in larger samples. However, it is still important to evaluate the level of confidence in the genes identified. First, as noted above, we compared results for top genes across GALA and a recent large-scale study (FDR < 0.05 in both AMR samples and non-AMR/Fu_COMP studies), and we observed that findings are consistent with expectation (Extended Data Table 1). (Note that, although individual gene-level counts may differ, this variation is expected given the rarity of events; by contrast, when we aggregated data from the top genes, the number of observed variants across genes and variant classes in GALA closely matches the expected total derived from Fu_COMP.) However, there are multiple genes with evidence in GALA but not in Fu_COMP. This can be for one of several reasons, including (1) sparseness of de novo events and, hence, overrepresentation/underrepresentation of de novo events in subsamples; (2) differences in ascertainment; and (3) the possibility that some findings are false-positive findings. Although all three could make some contribution, (1) was extensively evaluated previously^4,6, and the analyses suggested that it is likely to be the major contributor to discordance. To further evaluate whether discordant genes may still represent true positives, we first compared GALA findings to results from Fu_COMP, a non-AMR cohort. Although many top (FDR < 0.05) GALA genes were also supported in Fu_COMP, a subset of 17 genes showed an FDR > 0.05 in Fu_COMP, suggesting weaker support. We, therefore, examined their support in a large cohort of individuals with severe developmental disorders²⁹, and seven of these 17 genes show clear support. Finally, among the 35 autism-associated genes with an FDR < 0.05, most have a dominant neurodevelopmental morbid association in OMIM, ClinGen and/or Gene2Phenotype (Table 1). The concordance of findings between genome-wide studies (GALA, Fu_COMP and developmental disorders) and curated clinical databases indicates that our approach is valid for autism gene discovery in AMR samples and that the FDRs are likely well calibrated.

We next examined emerging and known genes found in the GALA analyses, including contrasting results with those seen in non-AMR samples (Fu_COMP) and curated databases (Table 1, Fig. 5, Extended Data Table 1 and Extended Data Fig. 7). These genes provide further support for MTOR signaling (for example, MARK2, MTOR, TSC2, YWHAG and GLS), synaptic and cytoskeletal function (for example, DYNC1H1, PAK2, DLG4, GRIN1 and SYNGAP1) and transcriptional regulation (for example, SPEN, RERE and GSE1) in autism. Notably, these pathways are also strongly implicated across intellectual disability and NDDs, underscoring the tremendous overlap in genetic discovery that transcends traditional diagnostic boundaries. Many of these genes are constrained for PTVs and/or missense variants and show support from independent datasets, including de novo events in severe developmental disorder cohorts. A description of top and interesting genes is found in Extended Data Table 2.

**Fig. 5: Lollipop diagrams illustrating variants identified in emerging autism-associated genes.**

Altogether, the results are consistent with the assumption that the same set of highly constrained genes identified in ongoing genome-wide studies is associated with autism, regardless of ancestry. This perspective also receives support from common variant studies in complex traits, where causal effects appear to be highly similar across ancestries^36,37: Hou et al.³⁶ analyzed 53,001 African-European admixed individuals and observed that causal effects of common variants (allele frequency > 0.5%) for 38 complex traits are largely similar across local ancestries, in agreement with other studies, including a recent analysis showing that cis-genetic effects on gene expression are highly similar between EUR and African individuals³⁷.

We considered whether the observed similarity in deleterious variant burden between AMR-assigned and EUR-assigned individuals could reflect the influence of EUR admixture within AMR genomes. In principle, local ancestry inference (LAI) would allow mapping of individual variants to ancestral tracts, enabling a more granular test of whether such variants preferentially arise on EUR versus non-EUR backgrounds. However, current LAI methods require dense haplotypic data across the genome, typically from WGS. The sparse and uneven coverage of exome data poses considerable challenges for LAI, and performance has been shown to decline substantially in this context^38,39. Moreover, because much of our gene discovery relies on de novo rather than inherited variants, the signal is unlikely to be biased by local ancestry tracts, and we also confirmed that variant and gene discovery is clearly driven by the large proportion of individuals with modest overall EUR ancestry (Extended Data Fig. 8). Still, we acknowledge that this is a potential limitation of the study and a valuable direction for future work in cohorts with whole-genome data where LAI can be reliably determined.

Using clinical genetics software platforms, we confirm the overall translatability of clinical genetic approaches when focusing on rare deleterious variation; however, we also reveal differences in the rate of P/LP variants between AMR and non-AMR individuals and between EUR and non-EUR individuals. The causes driving differences in rates of P/LP need to be better understood, as this is a limitation that complicates the interpretation of our analyses. A recent study focusing on pediatric patients with serious neurologic, cardiac or immunologic conditions reported similar diagnostic yield for genome sequencing in European Americans and Latin Americans (19.8% versus 17.2%); however, yields were lower (11.5%) and inconclusive results were higher in African Americans¹¹. In that study, genome sequencing was carried out by commercial diagnostic laboratories, making use of a proprietary pipeline that incorporates variant databases; the degree to which proprietary algorithms and reliance on previously observed variation influenced the higher rate of inconclusive results cannot be determined.

Analysis of pathogenic variation in the All of Us Research Program, which integrates data from a diverse cohort to identify genetic differences across ancestries, further highlights the disparities in variant classification across populations. The study examined P/LP variants in a modest number of genes with actionable findings, showing differences as a function of ancestry, with 42% fewer pathogenic variants identified in Latin American versus EUR individuals (1.32% versus 2.26%)¹⁰. All of Us analyses used Neptune, a system developed for clinical genetic reporting³². Neptune relies heavily on variants identified in prior curated data, which will bias the findings in diverse populations. Consistent with this, analyses of the GALA cohort using Neptune show lower rates of findings compared to non-AMR samples. Our results suggest that with a focus on deleterious de novo variation, use of prior results is less necessary, and others have shown that even highly curated variant databases include false-positive findings that can lead to incorrect information being given to subsequent families^40,41,42,43. Where possible, we recommend minimizing reliance on previously reported pathogenic variants. In addition, to further improve genetic testing results across diverse populations, our results show that it is of key importance to use allele frequency from all relevant populations, as we have done here.
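The "42% fewer" figure quoted above follows from the two reported rates as a relative difference. A minimal check, using only the two percentages given in the text (variable names are illustrative):

```python
# Rates of individuals with P/LP findings, in percent, as quoted above.
latin_american_rate = 1.32
eur_rate = 2.26

# Relative reduction of the Latin American rate with respect to the EUR rate.
relative_reduction = (eur_rate - latin_american_rate) / eur_rate
print(f"{relative_reduction:.1%}")
```

The result rounds to the 42% stated in the text. Note that this is a relative difference; the absolute difference between the two rates is 0.94 percentage points.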

We should, however, recognize the limitations inherent in our study and in any study that focuses on ancestries beyond EUR and a few other commonly characterized populations. For instance, we focused on de novo variants and their interpretation in AMR populations. Variants called de novo in our sample, and within subjects, are likely a mixture of true and false positives. For populations not deeply characterized for genetic variation, it is reasonable to expect elevation in the false-positive rate, simply because we do not know the frequencies of variants therein and which variants are relatively more common. For this reason, more of the variation called de novo is likely to be inherited variation.

At the same time, it is possible that unknown genomic complexity, such as common structural variants^44,45,46, elevates false negatives within these populations, including genomic variation important for phenotypes like autism, which is another limitation of our study. The combination of these three quantities (true positives, false positives and false negatives) determines the total variation that we observe. Based on our results, which show similar patterns to those observed in EUR studies, we infer that the vast majority of our findings arise from true positives. Nonetheless, we should not conclude that populations are all the same when it comes to calling de novo variation. Indeed, we can be confident that they are not, given what we know about increased genetic diversity in African populations^47,48,49,50 and the impact that cryptic structural variation and singleton events have on the reliability of calling ultra-rare variation. Only through deeper genetic studies can we expect completely comparable results to those of EUR population samples, ameliorating the above issues.

In conclusion, our observations are consistent with the neurobiology of autism being shared across ancestries and provide support for the translatability of autism clinical genetic approaches across ancestries.

Methods

Cohort description

GALA comprises multiple sites from North, Central and South America recruiting AMR participants for studies on the genetic architecture of autism. Study procedures were approved by the institutional review board (IRB) of the Program for the Protection of Human Subjects at Mount Sinai (no. 16-01262). Informed consent was obtained from the parents or legal guardians of all study participants.

Study procedures for participant enrollment were approved by the Program for the Protection of Human Subjects at Mount Sinai (no. 16-01262 for the Seaver Center at Mount Sinai, São Paulo, Brazil, and Bogotá, Colombia; and no. 21-00039 for Peru), the University of California, Davis IRB (no. 226028-22) and the University of Miami IRB (no. 20070193). Two cohorts were collected previously: study procedures for participant enrollment in Costa Rica were approved under the guidelines of the Ministry of Health of Costa Rica, the Ethical Committee of the National Children’s Hospital in San Jose and the IRB at Mount Sinai, as described previously^53,54; and The Autism Simplex Collection (TASC), which included an estimated 12% of individuals of Latin American ancestry, was recruited across 13 sites in North America and Europe, as described previously⁵⁵, with local IRB oversight and all consents reviewed before depositing biospecimens and data to the National Institutes of Health repository.

For clarity, we use ‘ASD’ to refer to individuals who received a clinical diagnosis according to the procedure outlined below and ‘autism’ elsewhere. ASD diagnoses are based on expert clinical evaluations using Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) criteria, incorporating all available data, including standardized assessments. Participants can be any age. Individuals with a known genetic condition (for example, fragile X syndrome) are excluded from analyses. Once a diagnosis of ASD is confirmed, the individual and their parents contribute a sample (blood or saliva) for genetic analyses. If both parents are not available, collection of other biological family members is encouraged (siblings, grandparents, etc.). Participating sites generally also collect additional clinical and family history information.

Description of GALA sites

New York, USA

The Seaver Autism Center for Research and Treatment at the Icahn School of Medicine at Mount Sinai, located in New York City, is the main coordinating site within the GALA Consortium. AMR individuals make up almost 30% of the population of New York City. Affected individuals undergo a full diagnostic ASD workup and receive additional assessments, including a cognitive test, adaptive behavior measure, medical checklist and behavioral checklists. Participating families receive $100 USD in compensation.

São Paulo, Brazil

The Human Genome and Stem Cell Research Center (HUG-CELL) at the Universidade de São Paulo in Brazil has over 20 years of experience in clinical and molecular research in autism, with more than 2,000 families seen. Brazil has a multiethnic admixed population, including African and Amerindian ancestry⁵⁶. The HUG-CELL conducts research in human and medical genetics of rare diseases, providing genetic counseling services and genetic tests for the population. A team of psychiatrists, psychologists and neurologists completes a formal ASD diagnostic workup prior to obtaining samples for genetic testing from the individual and their family members. Financial compensation for participation is not permitted at this site; however, individuals who meet clinical criteria are offered free fragile X testing.

Bogotá, Colombia

The Centro de Investigaciones Genéticas en Enfermedades Humanas (CIGEn) at the Universidad de los Andes in Bogotá, Colombia, in close collaboration with the Instituto Colombiano del Sistema Nervioso, Clínica Montserrat, focuses on unraveling the prevalence and characteristics of autism within the Colombian population. Through ASD referrals, the impact of CIGEn extends beyond Bogotá, reaching out to other cities throughout Colombia (Medellín, Cali, Armenia, Pereira, Bucaramanga, Cartagena, Barranquilla and Santa Marta), with the aim of including families from diverse backgrounds. Financial compensation is not offered for participation.

Mexico City, Mexico

The Children’s Psychiatric Hospital ‘Juan N. Navarro’ (HPIJNN), which is part of the Psychiatric Care Services of the Mexican Government’s Ministry of Health, provides professional care for minors with mental health, psychiatric and behavioral problems. As the largest teaching center in child and adolescent psychiatry in Mexico, it performs diverse biomedical and clinical research activities. One of the main lines of research focuses on autism, in collaboration with the Genetics Department at the National Institute of Psychiatry Ramón de la Fuente Muñíz (INPRFM). The samples from Mexico are being sequenced and were not included in the current analyses. Financial compensation is not provided at this site, in accordance with ethics committee requirements.

Lima, Peru

The Centro Ann Sullivan del Perú is a non-profit center in Lima, Peru, that serves individuals with varying abilities and their families. The center specializes in helping individuals with ASD. GALA investigators from the Seaver Autism Center (M.P.T. and A.K.) traveled to Lima to perform 40 psychiatric evaluations, aid in ASD diagnostics and collect blood samples from individuals with ASD and their families. Behavioral surveys were carried out for all participants, and ASD and attention-deficit/hyperactivity disorder diagnoses were made using DSM-5 criteria. Financial compensation was not offered; instead, participating individuals received their clinical evaluation results.

California, USA (CHARGE)

The Childhood Autism Risks from Genetics and the Environment (CHARGE) cohort is a population-based case−control study collected in California at the University of California, Davis, Center for Children’s Environmental Health laboratories with the intent of addressing the impact of environmental exposures on risk⁵⁷.

Florida, USA

The John P. Hussman Institute for Human Genomics at the University of Miami, located in Miami, Florida, recruits families through clinical referrals and lay organizations, providing services to families with ASD. Upwards of 70% of the Miami population identifies as AMR. The diagnostic workup included the Autism Diagnostic Interview-Revised (ADI-R) and assessment of adaptive behavior. Discrepancies between ADI-R and clinical findings were resolved using additional clinical measures, including the Autism Diagnostic Observation Schedule (ADOS).

Central Valley, Costa Rica

The founder population of the Central Valley of Costa Rica (CVCR) originated at the end of the 16th century from the intermarriage of 86 Spanish families and Indigenous Americans. The population was geographically isolated until the late 19th century, and the current inhabitants are estimated to descend from fewer than 1,000 founders⁵⁸. A genetic study on autism in the CVCR was initiated in 2003, and affected individuals were ascertained using the translated Spanish versions of the ADI-R and the ADOS as well as assessment of intellectual abilities and adaptive behavior⁵³.

USA and Europe (TASC)

TASC was a collaboration among 13 sites in North America and Western Europe funded by the National Alliance for Autism Research, now Autism Speaks, and the National Institute of Mental Health. As detailed previously⁵⁵, more than 1,700 individuals with ASD confirmed with extensive prospective assessment, as well as additional family members including parents, completed this study. Individuals within this study were sequenced, and those who were of AMR ancestry were included in these analyses.

California, USA (Kaiser Permanente)

The Autism Research Program (ARP) at the Kaiser Permanente Northern California (KPNC) Division of Research was established in 2002 by Senior Research Scientist Lisa Croen. The program focuses on research identifying genetic and environmental factors associated with autism and understanding patterns of detection, diagnosis and utilization of health services for individuals with ASD across the lifespan. The ARP created the Autism Family Biobank, a repository including genetic, medical and environmental information from more than 1,000 individuals with ASD and their two biological parents, who donated blood or saliva between 2015 and 2017. This collection is representative of the diverse population served by KPNC, an integrated healthcare system. The samples from Kaiser Permanente are being sequenced and were not included in the current analyses. Participants receive $15 USD per biospecimen, and families receive an additional $15 USD upon completion of the parent surveys.

Ancestry determination and sample-level quality control

Latin American samples analyzed in the current freeze include (1) GALA participants (some published in Fu et al.⁴); (2) non-overlapping AMR samples in the ASC and SPARK¹⁹ reported in Fu et al.⁴; and (3) additional AMR samples from the new release of SPARK (iWESv2). The current freeze includes trio data from 14,359 AMR samples, including 4,450 affected individuals (609 from GALA and 3,841 from ASC and the SPARK releases) and 1,459 typically developing siblings and case−control data from 267 cases and 801 controls.

To assign ancestry to each case, we followed an approach modeled after the pipeline used by gnomAD (https://gnomad.broadinstitute.org/news/2021-09-using-the-gnomad-ancestry-principal-components-analysis-loadings-and-random-forest-classifier-on-your-dataset/). Specifically, each of three jointly called datasets, derived from unpublished GALA sequencing, Fu et al. and SPARK (iWESv2), was merged with the Human Genome Diversity Project (HGDP) + 1000 Genomes Project (1KG) subset of gnomAD²², and principal component analysis (PCA) was performed on each merged dataset after it had been restricted to 5,000 ancestry-informative single-nucleotide polymorphisms⁵⁹. A random forest classifier was trained on the HGDP + 1KG reference samples using the first 10 principal components and used to assign superpopulation/continental ancestry to individuals in our dataset. AMR ancestry classification was based on the predicted ancestry label assigned by the random forest model. Non-AMR cases included any individuals with ASD in ASC or SPARK releases who did not meet our criteria for genetically inferred AMR ancestry (28,818 parents, 13,030 probands and 4,749 typically developing siblings).

Hail 0.2 was used to process the SPARK (iWESv2) and unpublished GALA joint-genotyped variant call files (VCFs). Multiallelic sites were split; variants were annotated using the Variant Effect Predictor (VEP)⁶⁰; and low-complexity regions (https://github.com/lh3/varcmp/blob/master/scripts/LCR-hs38.bed.gz) were removed. Hail’s pc_relate() function was used to confirm reported pedigrees and identify duplicate samples within and between datasets, which were removed. Sex was imputed using the impute_sex() function, and genotype filters were applied as described in previous methodology⁶ to generate working datasets (Extended Data Fig. 1).

De novo variants

Previously published de novo calls were extracted from Supplementary Table 20 from Fu et al.⁴. For the unpublished GALA and SPARK (iWESv2) datasets, de novo variants were called using the my_de_novo_v16() function (https://discuss.hail.is/t/de-novo-calls-on-hemizygous-x-variants/2357/19) with variant frequencies from the non-neuro subset of gnomAD exomes version 2.1.1 as priors. Potential de novo variants were dropped if they were present at a frequency greater than 0.1% within the non-neuro subset of gnomAD version 2.1.1, gnomAD version 3.1.2, in any subpopulation of these gnomAD datasets or the dataset in which they were called. Variants were further excluded if they had ‘ExcessHet’ in the Filters field, exhibited a proband allele balance < 0.3 or demonstrated a depth ratio < 0.3. Only ‘HIGH’-confidence or ‘MEDIUM’-confidence variants were kept, with the MEDIUM-confidence calls limited to a maximum allele count in the dataset of 1. A single variant per person per gene was chosen, giving preference to variants with more damaging consequences. Samples were finally excluded if the count of coding de novo variants was significantly greater than expected.
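The filtering rules described above can be read as a single pass/fail predicate per candidate call. The following is a minimal Python sketch under stated assumptions, not the pipeline used in the study: the `DeNovoCall` record and its field names are hypothetical, and only the thresholds (0.1% frequency, allele balance and depth ratio of 0.3, confidence tiers) are taken from the text.

```python
from dataclasses import dataclass

MAX_POP_FREQ = 0.001        # 0.1% frequency cutoff stated in the text
MIN_ALLELE_BALANCE = 0.3    # proband allele balance cutoff
MIN_DEPTH_RATIO = 0.3       # depth ratio cutoff


@dataclass
class DeNovoCall:
    # Hypothetical record layout; field names are illustrative only.
    max_pop_freq: float         # highest frequency across gnomAD sets, subpopulations and the calling dataset
    filters: frozenset          # entries of the VCF FILTER field
    allele_balance: float       # proband allele balance
    depth_ratio: float          # depth ratio as reported by the caller
    confidence: str             # "HIGH", "MEDIUM" or "LOW"
    dataset_allele_count: int   # allele count of the variant in the dataset


def passes_de_novo_filters(call: DeNovoCall) -> bool:
    """Return True if a candidate de novo call survives the filters in the text."""
    if call.max_pop_freq > MAX_POP_FREQ:
        return False
    if "ExcessHet" in call.filters:
        return False
    if call.allele_balance < MIN_ALLELE_BALANCE or call.depth_ratio < MIN_DEPTH_RATIO:
        return False
    if call.confidence == "HIGH":
        return True
    if call.confidence == "MEDIUM":
        # MEDIUM-confidence calls are kept only at a maximum allele count of 1
        return call.dataset_allele_count <= 1
    return False
```

The per-gene selection of the most damaging variant and the sample-level exclusion for excess de novo counts are separate steps and are not covered by this sketch.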

Inherited variants

Starting with the same working datasets as for de novo calling, counts of transmitted and non-transmitted alleles were generated using Hail’s transmission_disequilibrium_test() function. Variants were filtered out if they were marked ‘ExcessHet’ by GATK4 or had allele frequencies greater than 0.01% within their own dataset, within the non-neuro subset of gnomAD version 2.1.1, gnomAD version 3.1.2 or within any subpopulation of these gnomAD datasets. Variants with an allele count > 6 in the total parents of the dataset were excluded as well. Hard filtering was applied according to GATK recommendations (https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants). Final counts of transmitted and non-transmitted alleles were produced for PTV, MisB, MisA (1 ≤ MPC < 2) and synonymous variants.
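To illustrate what transmitted and non-transmitted allele counts mean for a rare variant, the sketch below counts transmissions in the simplest informative configuration (one heterozygous parent, one homozygous-reference parent). It is a simplified Python illustration, not a reimplementation of Hail's transmission_disequilibrium_test(), which handles more genotype configurations; the function names and input layout are assumptions made for the example.

```python
MAX_ALLELE_FREQ = 0.0001      # 0.01% cutoff stated in the text
MAX_PARENT_ALLELE_COUNT = 6   # variants with allele count > 6 in parents are excluded


def is_rare_enough(allele_freq, parent_allele_count):
    """Frequency and allele-count filters for inherited variants, as stated in the text."""
    return allele_freq <= MAX_ALLELE_FREQ and parent_allele_count <= MAX_PARENT_ALLELE_COUNT


def count_transmissions(trios):
    """Count transmitted and non-transmitted alternate alleles at one autosomal site.

    Each trio is (father_alt, mother_alt, child_alt), the number of alternate
    alleles (0, 1 or 2) carried by each family member. Only trios with exactly
    one heterozygous parent and one homozygous-reference parent are counted.
    """
    transmitted = 0
    untransmitted = 0
    for father, mother, child in trios:
        if sorted((father, mother)) != [0, 1]:
            continue  # configuration not handled in this sketch
        if child == 1:
            transmitted += 1
        elif child == 0:
            untransmitted += 1
        # child == 2 would be a Mendelian inconsistency here and is ignored
    return transmitted, untransmitted
```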

Case−control variants

Probands within incomplete trios were identified from the ASC and GALA cohorts and matched using the top 10 principal components (‘Ancestry determination and sample-level quality control’) with non-psychiatric, unrelated controls from BioMe at a ratio of three controls to one case (3:1). Incomplete trios from SPARK (iWESv2) were removed. To ensure genome build standardization between these two cohorts, CRAM files from ASD cases were unmapped using GATK4 (ref. ⁶¹) and then remapped to a different version of the hg38 reference genome (https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=838) using GATK3.5. Single-nucleotide variants (SNVs) and insertions/deletions (indels) were joint-genotyped across cases using the HaplotypeCaller of GATK4. As for the trio dataset processing, Hail 0.2 (https://hail.is) was used to process the joint-genotyped VCF file. The identity_by_descent() function of Hail was used to test for relatedness, which resulted in the removal of 13 cases. Sex was imputed for every sample using the impute_sex() function of Hail and cross-checked with metadata provided by all sites to ensure sample concordance.

As was done for the previous datasets⁴, multiallelic sites were split, variants were annotated using the VEP and low-complexity regions were removed. Variants were removed if they had an allele count ≥ 2 in the entire case−control dataset as well as an allele count ≥ 5 in the non-psychiatric subset of gnomAD version 2.1.1. Genotype calls were filtered to genotype quality > 25 and allele balance > 0.3. For case−control coverage harmonization, variants in high coverage, defined as a call rate ≥ 90%, were kept. To perform case−control matching, we excluded one case that was an outlier in the distribution of the number of synonymous variants. Finally, 267 cases were matched to 801 controls by sex and the first 10 principal components using the match_on function of the R package optmatch⁶².
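To make the 3:1 matching step concrete, the sketch below pairs each case with its three nearest same-sex controls in principal-component space. This is a deliberately simpler greedy nearest-neighbour procedure written in Python, not the optimal matching performed by optmatch in R; the function name, the input layout and the use of Euclidean distance are assumptions made for illustration.

```python
import math


def greedy_match(cases, controls, ratio=3):
    """Greedy nearest-neighbour matching of controls to cases within sex.

    cases, controls: dict mapping sample ID -> (sex, tuple of principal components).
    Returns a dict mapping each case ID to a list of `ratio` control IDs.
    Each control is used at most once.
    """
    available = dict(controls)  # copy so the caller's dict is not modified
    matched = {}
    for case_id, (case_sex, case_pcs) in cases.items():
        same_sex = [cid for cid, (sex, _) in available.items() if sex == case_sex]
        same_sex.sort(key=lambda cid: math.dist(case_pcs, available[cid][1]))
        if len(same_sex) < ratio:
            raise ValueError(f"not enough controls left for case {case_id}")
        matched[case_id] = same_sex[:ratio]
        for cid in matched[case_id]:
            del available[cid]
    return matched
```

A greedy pass depends on the order in which cases are processed, which is one reason an optimal matching method is generally preferred for the real analysis.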

CNV analysis

De novo CNVs called in Fu et al.⁴ coming from AMR samples were extracted (1,861 probands and 680 unaffected siblings). Trio and case−control datasets were analyzed separately, and GATK-gCNV²⁴ was used to detect CNVs. First, raw CRAM files were compressed into read counts that covered the annotated exons to serve as input data. Then, a PCA-based approach that combines density and distance-based clustering was employed on the observed read counts to organize batches of samples for parallel processing. GATK-gCNV was run in cohort mode on 200 samples within the cluster identified through PCA, and the remaining samples were subjected to GATK-gCNV analysis using the case mode, with models specific to the cohort (368 probands and 29 typically developing siblings). For quality control, CNV calls were processed according to Fu et al.⁴ methodology; CNVs were retained if they had an allele frequency < 1% and spanned more than two captured exons. For homozygous deletions, the quality score threshold was set to the lesser of 400 or 10 times the number of intervals. For heterozygous deletions, the quality score threshold was set to the lesser of 100 or 10 times the number of intervals. For duplications, the quality score threshold was set to the lesser of 50 or four times the number of intervals. For sample-level quality control, samples were retained if the number of raw, autosomal CNV calls detected by GATK-gCNV did not exceed 200 and if the number of calls with quality score ≥ 20 did not exceed 35. After quality control, 291 probands, 25 typically developing siblings, 209 cases and 735 controls remained.
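The quality score thresholds and the sample-level criteria described above reduce to a few lines of arithmetic. The Python sketch below encodes them as stated; the function names and the CNV type labels are illustrative assumptions, and how a call's quality score is compared against the threshold is left out because the text does not specify it.

```python
# (cap, multiplier per interval) for each CNV type, as stated in the text
QS_RULES = {
    "hom_del": (400, 10),
    "het_del": (100, 10),
    "dup": (50, 4),
}


def qs_threshold(cnv_type, n_intervals):
    """Quality score threshold: the lesser of the cap or multiplier x intervals."""
    cap, per_interval = QS_RULES[cnv_type]
    return min(cap, per_interval * n_intervals)


def sample_passes_cnv_qc(n_raw_autosomal_calls, n_calls_qs_at_least_20):
    """Sample-level criteria: at most 200 raw autosomal calls and at most 35 calls with QS >= 20."""
    return n_raw_autosomal_calls <= 200 and n_calls_qs_at_least_20 <= 35
```

For example, a heterozygous deletion spanning three intervals has a threshold of 30, whereas one spanning 50 intervals is capped at 100.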

A gene was considered impacted by a deletion if at least 10% of its non-redundant exons were overlapped by the deletion. For a duplication, a gene was considered impacted if at least 75% of its non-redundant exons were overlapped. Additionally, CNVs were annotated against a list of 79 curated genomic disorder loci (see Supplementary Table 10 in Fu et al.⁴), and a CNV call was classified as a genomic disorder CNV if it shared at least 50% reciprocal overlap with an annotated genomic disorder.
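The gene-impact and genomic disorder rules above can be sketched in Python as follows. This is an illustration under assumptions, not the annotation code used in the study: intervals are half-open (start, end) pairs on one chromosome, and an exon is treated as overlapped if it shares at least one base with the CNV, a detail the text does not specify.

```python
def overlap_bp(a, b):
    """Number of bases shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def gene_impacted(cnv_type, cnv, exons):
    """Apply the 10% (deletion) or 75% (duplication) exon-overlap rule.

    cnv_type: "DEL" or "DUP"; cnv: (start, end); exons: list of non-redundant
    exon intervals of one gene.
    """
    if not exons:
        return False
    min_pct = {"DEL": 10, "DUP": 75}[cnv_type]
    n_hit = sum(1 for exon in exons if overlap_bp(cnv, exon) > 0)
    # integer comparison avoids floating-point edge cases at the exact cutoff
    return 100 * n_hit >= min_pct * len(exons)


def is_genomic_disorder(cnv, locus, min_frac=0.5):
    """Reciprocal overlap: the shared span covers >= min_frac of both intervals."""
    shared = overlap_bp(cnv, locus)
    return (shared >= min_frac * (cnv[1] - cnv[0])
            and shared >= min_frac * (locus[1] - locus[0]))
```

The reciprocal requirement matters: a small CNV lying entirely inside a large genomic disorder locus covers 100% of itself but far less than 50% of the locus, so it is not classified as that genomic disorder.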

Genetic association analyses

TADA^4,27 was performed for three types of inheritance classes: de novo (PTV, MisB, MisA, deletion (DEL) and duplication (DUP)), inherited (PTV, MisB and MisA) and case−control (PTV, MisB, MisA, DEL and DUP) variation. CNVs resulting from non-allelic homologous recombination (NAHR) were excluded, and only CNVs impacting fewer than nine constrained genes were retained (LOEUF < 0.6) (Supplementary Tables 13–19).

Bayes factors were constructed separately for each variant class (PTV, MisA, MisB, DEL and DUP) as described, accounting for sample size and directly using relative risk priors from Fu et al. (see Supplementary Table 8 in Fu et al.⁴). Previously published mutation rates were adjusted to align with the observed variant counts in unaffected siblings for each variant type in the dataset⁴.
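As an illustration of how sample size, mutation rate and a relative risk prior combine into a Bayes factor, the Python sketch below uses the Poisson-Gamma form of the de novo model from the original TADA formulation: counts are Poisson with mean 2Nμ under the null and, under the alternative, the relative risk follows a Gamma distribution with mean `gamma_mean` and rate `beta`, which gives a negative binomial marginal. Whether the analysis here used exactly this form for every variant class is not stated in this section, and the parameter values in the usage note are arbitrary.

```python
import math


def log_poisson_pmf(x, lam):
    """Log of the Poisson probability of observing x events with mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def log_negbin_pmf(x, size, prob):
    """Log negative binomial probability with real-valued size parameter."""
    return (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
            + size * math.log(prob) + x * math.log(1 - prob))


def de_novo_bayes_factor(x, n_trios, mu, gamma_mean, beta):
    """Bayes factor for x de novo variants in a gene across n_trios trios.

    mu: per-gene, per-chromosome mutation rate for the variant class.
    gamma_mean: prior mean relative risk; beta: Gamma rate hyperparameter.
    """
    lam = 2 * n_trios * mu  # expected count under the null
    log_h1 = log_negbin_pmf(x, gamma_mean * beta, beta / (beta + lam))
    log_h0 = log_poisson_pmf(x, lam)
    return math.exp(log_h1 - log_h0)
```

With `n_trios=5000`, `mu=1e-4`, `gamma_mean=2` and `beta=1`, the Bayes factor is below 1 when no variants are observed and rises with each additional observed variant, which is the behavior expected of evidence for association.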

Expected versus observed mutations in GALA

As noted in the main text, for top genes in GALA that were also significant in Fu_COMP, we compared the observed numbers of variants in the GALA cohort with the expected number of variants derived from TADA analysis in Fu_COMP. Although observed and expected counts may vary at the individual gene level, as expected for ultra-rare events, the overall observed and expected totals across all genes are well matched, supporting the consistency of signal with expectation (Extended Data Table 1).

Clinical genetics analyses

In addition to VarSome described in the main text, we also ran Neptune³², which uses databases of previously identified variants to call P/LP variants in a set of target genes; we took a similar approach to a recent study¹⁰ carried out in the All of Us Research Program by focusing on 73 actionable ACMG genes⁶³. Of the 12,162 variants in these genes among the 4,450 family-based AMR cases, Neptune provided a classification for 8,501 (69.9%); this compares to 28,262 variants among the 13,030 non-AMR family-based cases, of which 20,750 (73.4%) were classified by Neptune. In AMR participants, 136 variants were classified as P/LP, representing 1.12% (136/12,162) of all variants in these genes and 1.60% (136/8,501) of all classified variants. In non-AMR participants, 344 variants were classified as P/LP, representing 1.22% (344/28,262) of all variants and 1.66% (344/20,750) of all classified variants. Examining the results from the perspective of the participants, in AMR we observed 2.73 variants in these genes per individual, of which 1.91 per individual could be classified by Neptune, and 0.031 per individual were classified as P/LP. The corresponding numbers were 2.17 variants, 1.59 Neptune classified variants and 0.026 P/LP variants per non-AMR individual. The results show that, on the variant level, the differences in AMR versus non-AMR participants trace in part to a reduced ability of Neptune to classify AMR variants (Extended Data Fig. 6 and Supplementary Table 11). However, as also noted above, there are more variants per AMR participant (both total and Neptune classified), leading to an apparent lessening of impact in terms of P/LP variants per individual.
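The proportions and per-individual rates quoted above follow directly from the stated counts. The short Python check below recomputes them; the dictionary keys and the function name are illustrative and do not come from the study's code.

```python
# Counts as stated in the text
amr = {"cases": 4450, "variants": 12162, "classified": 8501, "plp": 136}
non_amr = {"cases": 13030, "variants": 28262, "classified": 20750, "plp": 344}


def summarize(group):
    """Variant-level percentages and per-individual rates for one ancestry group."""
    return {
        "pct_classified": 100 * group["classified"] / group["variants"],
        "pct_plp_of_all": 100 * group["plp"] / group["variants"],
        "pct_plp_of_classified": 100 * group["plp"] / group["classified"],
        "variants_per_individual": group["variants"] / group["cases"],
        "classified_per_individual": group["classified"] / group["cases"],
        "plp_per_individual": group["plp"] / group["cases"],
    }


amr_summary = summarize(amr)
non_amr_summary = summarize(non_amr)
```

Comparing the two summaries shows the pattern described in the text: the classified fraction is lower in AMR (69.9% versus 73.4%), while the number of variants per individual is higher (2.73 versus 2.17).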

ACMG interpretation of variants

As noted above, for genetic association analyses, the TADA framework was limited to autosomal genes with available mutation rates and LOEUF scores (n = 18,128 genes) and considered only missense variants with an MPC score ≥ 1. By contrast, the clinical interpretation of variants included all autosomal or X-linked protein-truncating, synonymous and missense variants, regardless of gene annotation. In addition to applying the allele frequency cutoff of 0.1% (‘De novo variants’), X-linked variants were subjected to an allele frequency cutoff of 0.1% in the male non-psychiatric subsets of gnomAD versions 2.1.1 and 3.1.2 and their subpopulations. This resulted in 20,571 de novo variants being included for clinical genetics annotation. Inherited variant analysis was restricted to a list of well-established X-linked genes implicated in autism and/or intellectual disability (Supplementary Table 9) and subjected to the same allele frequency cutoff.

The commercially available VarSome package³¹ was used to evaluate the clinical impact of both de novo variants and X-linked inherited variation in the selected genes. Given the large number of variants, a batch environment was used, which limited the parameters that could be optimized for each gene. Additionally, as ACMG guidelines³⁰ consider patient phenotype, the focus was placed on genes for which there was a reported relationship with an autism phenotype (Autism Spectrum Disorder, Autism and Autistic Behavior) and/or with a broader NDD phenotype (including the three autism terms as well as Intellectual Disability, Global Developmental Delay, Seizure, Epileptic Encephalopathy and Complex Neurodevelopmental Disorder), without knowing the full spectrum of non-autism phenotypes in the participants. Hence, the results presented here (Supplementary Tables 10–12), although based on a more transparent algorithm, should not be considered fully compliant with ACMG classification guidelines.

The api.batch_lookup function in VarSome was used to obtain germline variant-level information related to ACMG classification, nucleotide substitution and amino acid substitution, along with pathogenicity predictions. When possible, transcripts with the most severe coding impact were selected. Otherwise, the MANE Select transcript, longest canonical transcript, MANE Plus transcript, longest transcript or RefSeq transcript was chosen in that order by default.

For de novo variation, variant lists containing unique sets of variants found in each sex and zygosity were annotated. Inheritance in VarSome was set to ‘Confirmed De Novo’. Output from each list was returned in separate JSON files, which were then read into R for downstream processing into tab-separated tables. Inherited variation was examined in a similar manner; however, inheritance was set to the parent of origin of the variant.

To extend these analyses further, we used Neptune³², examining 73 ACMG actionable genes analyzed in All of Us¹⁰. The VIP database used for annotation in Neptune was downloaded from https://gitlab.com/bcm-hgsc/neptune in VCF format, and all variants were lifted over⁶³ from GRCh37 to GRCh38. Clinical significance annotations were parsed from the INFO field, and variants classified as Pathogenic/Likely Pathogenic, Uncertain significance and Benign/Likely Benign were noted. All rare variants in probands, regardless of mode of inheritance, were used in these analyses. Of the 73 genes, Venner et al.¹⁰ annotated only biallelic variants as P/LP in three recessive genes (MUTYH, ATP7B and KCNQ1) and only a specific variant as P/LP in HFE; we did not observe P/LP variants in these four genes, so no additional corrections were made.

Inclusion and ethics statement

This study was conducted in accordance with Nature Portfolio’s guidelines on inclusion and ethics in global research. The research was designed to include participants of diverse ancestries, with the goal of improving representation in autism genetics research. Study protocols were approved by the IRBs at all participating sites, including the Program for the Protection of Human Subjects at the Icahn School of Medicine at Mount Sinai (GCO no. 14-1082(0001)) as well as the local IRBs in Brazil, Colombia, Peru, Mexico, Kaiser Permanente and the CHARGE study (see ‘Cohort description’). Written informed consent was obtained from all participants or from parents or legal guardians where necessary. Data collection adhered to relevant ethical and cultural standards, and compensation for participation varied by site as described above. Collaborations between institutions in the United States and Latin America were established to ensure equitable contributions across sites. Local investigators in Brazil, Colombia, Mexico and Peru were involved in data collection and authorship.

Sex was recorded based on self-report at enrollment and confirmed with genetic information. Both male and female participants were included; however, sex-stratified analyses were not conducted, as the primary focus of this study was on de novo and rare variant burden across ancestry groups rather than sex differences. Participant ages varied by cohort, with probands typically enrolled during childhood or adolescence and parents as adults.

Statistics and reproducibility

All statistical analyses were performed using R (version 4.3.3), Hail (version 0.2) and Python (version 3.8). Statistical methods are described in detail in the relevant sections of Methods. Two-sided tests were used throughout unless otherwise specified. Multiple hypothesis testing was corrected using the Benjamini–Hochberg FDR procedure or Bonferroni correction as appropriate. Sample sizes were determined by the number of available participants meeting inclusion criteria in the ASC, GALA and SPARK cohorts; no statistical method was used to predetermine sample size. All available samples passing relatedness and quality control thresholds were included in the analyses. No data were otherwise excluded from the analyses.
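For readers unfamiliar with the Benjamini–Hochberg procedure mentioned above, the following is a minimal Python sketch of the standard adjustment. It is a generic textbook implementation and is not taken from the study's code, which was written in R; the function name is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), and a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then declared significant at a chosen FDR level (for example 0.05) when its adjusted value is at or below that level.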

Because this study involved secondary analysis of existing human genomic data, randomization and blinding were not applicable. The investigators were not blinded to sample status during analyses. Scripts for computational analyses performed were deposited in a GitHub repository (https://github.com/buxbaum-lab/GALA) to ensure reproducibility. Key results were independently replicated using validation datasets as described.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Sequencing data for the ASC and GALA samples are available through controlled access via the Database of Genotypes and Phenotypes (accession number phs002502) and on the National Human Genome Research Institute Genomic Data Science Analysis, Visualization and Informatics Lab-space (AnVIL) under accession number phs002502.v1.p1 (https://anvilproject.org/data). SPARK phenotype and sequencing data are available to authorized users through the SFARI Base (https://www.sfari.org/resource/sfari-base/).

Individual-level data from the ASC and GALA cohorts are not publicly available due to participant privacy restrictions. Researchers may request access by contacting J.D.B. (joseph.buxbaum@mssm.edu). All requests will be reviewed by the Mount Sinai Institutional Data Access Committee to ensure compliance with participant consent and IRB protocols. Reasonable requests will receive a response within 2−4 weeks. Summary variant counts, gene-level burden statistics and figure source data are available in the accompanying Supplementary Tables and at https://github.com/buxbaum-lab/GALA.

Code availability

All software used in this study is publicly available at the cited references. The R code used to generate the TADA analysis and figures is available under the MIT license at https://github.com/buxbaum-lab/GALA.

References

Lord, C. et al. Autism spectrum disorder. Nat. Rev. Dis. Primers 6, 5 (2020).
Klei, L. et al. Common genetic variants, acting additively, are a major source of risk for autism. Mol. Autism 3, 9 (2012).
Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).
Fu, J. M. et al. Rare coding variation provides insight into the genetic architecture and phenotypic context of autism. Nat. Genet. 54, 1320–1331 (2022).
Zhou, X. et al. Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat. Genet. 54, 1305–1319 (2022).
Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584 (2020).
Davidson, B. L. et al. Gene-based therapeutics for rare genetic neurodevelopmental psychiatric disorders. Mol. Ther. 30, 2416–2428 (2022).
Fatumo, S. et al. A roadmap to increase diversity in genomic studies. Nat. Med. 28, 243–250 (2022).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Venner, E. et al. The frequency of pathogenic variation in the All of Us cohort reveals ancestry-driven disparities. Commun. Biol. 7, 174 (2024).
Abul-Husn, N. S. et al. Molecular diagnostic yield of genome sequencing versus targeted gene panel testing in racially and ethnically diverse pediatric patients. Genet. Med. 25, 100880 (2023).
Wright, C. F. et al. Genomic diagnosis of rare pediatric disease in the United Kingdom and Ireland. N. Engl. J. Med. 388, 1559–1571 (2023).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Moreno-Estrada, A. et al. Human genetics. The genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science 344, 1280–1285 (2014).
Ongaro, L. et al. The genomic impact of European colonization of the Americas. Curr. Biol. 29, 3974–3986 (2019).
Kosmicki, J. A. et al. Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat. Genet. 49, 504–510 (2017).
DeFelice, M. et al. Blended genome exome (BGE) as a cost efficient alternative to deep whole genomes or arrays. Preprint at bioRxiv https://doi.org/10.1101/2024.04.03.587209 (2024).
Buxbaum, J. D. et al. The autism sequencing consortium: large-scale, high-throughput sequencing in autism spectrum disorders. Neuron 76, 1052–1056 (2012).
SPARK Consortium. SPARK: a US cohort of 50,000 families to accelerate autism research. Neuron 97, 488–493 (2018).
Abul-Husn, N. S. et al. Implementing genomic screening in diverse populations. Genome Med. 13, 17 (2021).
Belbin, G. M. et al. Toward a fine-scale population health monitoring system. Cell 184, 2068–2083 (2021).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2023).
Babadi, M. et al. GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data. Nat. Genet. 55, 1589–1597 (2023).
Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at bioRxiv https://doi.org/10.1101/148353 (2017).
Browning, S. R. et al. Ancestry-specific recent effective population size in the Americas. PLoS Genet. 14, e1007385 (2018).
He, X. et al. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet. 9, e1003671 (2013).
De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014).
Kaplanis, J. et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature 586, 757–762 (2020).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Kopanos, C. et al. VarSome: the human genomic variant search engine. Bioinformatics 35, 1978–1980 (2018).
Venner, E. et al. Neptune: an environment for the delivery of genomic medicine. Genet. Med. 23, 1838–1846 (2021).
Miller, D. T. et al. ACMG SF v3.0 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 23, 1381–1390 (2021).
Arriaga-MacKenzie, I. S. et al. Summix: a method for detecting and adjusting for population structure in genetic summary data. Am. J. Hum. Genet. 108, 1270–1282 (2021).
Gudmundsson, S. et al. Variant interpretation using population databases: lessons from gnomAD. Hum. Mutat. 43, 1012–1030 (2022).
Hou, K. et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 55, 549–558 (2023).
Saitou, M., Dahl, A., Wang, Q. & Liu, X. Allele frequency impacts the cross-ancestry portability of gene expression prediction in lymphoblastoid cell lines. Am. J. Hum. Genet. 111, 2814–2825 (2024).
Maples, B. K., Gravel, S., Kenny, E. E. & Bustamante, C. D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288 (2013).
Honorato-Mauer, J. et al. Characterizing features affecting local ancestry inference performance in admixed populations. Am. J. Hum. Genet. 112, 224–234 (2025).
Manrai, A. K. et al. Genetic misdiagnoses and the potential for health disparities. N. Engl. J. Med. 375, 655–665 (2016).
Ciesielski, T. H., Sirugo, G., Iyengar, S. K. & Williams, S. M. Characterizing the pathogenicity of genetic variants: the consequences of context. npj Genom. Med. 9, 3 (2024).
Sharo, A. G., Zou, Y., Adhikari, A. N. & Brenner, S. E. ClinVar and HGMD genomic variant classification accuracy has improved over time, as measured by implied disease burden. Genome Med. 15, 51 (2023).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Jun, G. et al. Structural variation across 138,134 samples in the TOPMed consortium. Preprint at Research Square https://doi.org/10.21203/rs.3.rs-2515453/v1 (2023).
Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Yilmaz, F. et al. Genome-wide copy number variations in a large cohort of bantu African children. BMC Med. Genom. 14, 129 (2021).
Pereira, L., Mutesa, L., Tindana, P. & Ramsay, M. African genetic diversity and adaptation inform a precision medicine agenda. Nat. Rev. Genet. 22, 284–306 (2021).
Gomez, F., Hirbo, J. & Tishkoff, S. A. Genetic variation and adaptation in Africa: implications for human evolution and disease. Cold Spring Harb. Perspect. Biol. 6, a008524 (2014).
Yu, N. et al. Larger genetic differences within Africans than between Africans and Eurasians. Genetics 161, 269–274 (2002).
Schaaf, C. P. et al. A framework for an evidence-based gene list relevant to autism spectrum disorder. Nat. Rev. Genet. 21, 367–376 (2020).
Jay, J. J. & Brouwer, C. Lollipops in the clinic: information dense mutation plots for precision medicine. PLoS ONE 11, e0160519 (2016).
McInnes, L. A. et al. A genetic study of autism in Costa Rica: multiple variables affecting IQ scores observed in a preliminary sample of autistic cases. BMC Psychiatry 5, 15 (2005).
McInnes, L. A. et al. The NRG1 exon 11 missense variant is not associated with autism in the Central Valley of Costa Rica. BMC Psychiatry 7, 21 (2007).
Buxbaum, J. et al. The Autism Simplex Collection: an international, expertly phenotyped autism sample for genetic and phenotypic analyses. Mol. Autism 5, 34 (2014).
Naslavsky, M. S. et al. Exomic variants of an elderly cohort of Brazilians in the ABraOM database. Hum. Mutat. 38, 751–763 (2017).
Hertz-Picciotto, I. et al. The CHARGE study: an epidemiologic investigation of genetic and environmental factors contributing to autism. Environ. Health Perspect. 114, 1119–1125 (2006).
Mathews, C. A. et al. Genetic studies of neuropsychiatric disorders in Costa Rica: a model for the use of isolated populations. Psychiatr. Genet. 14, 13–23 (2004).
Purcell, S. M. et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506, 185–190 (2014).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
Hansen, B. B. & Olsen Klopfer, S. Optimal full matching and related designs via network flows. J. Comput. Graph. Stat. 15, 609–627 (2006).
Miller, D. T. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2021 update: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 23, 1391–1398 (2021).

Acknowledgements

GALA is currently supported by the National Institutes of Health (grant MH128813, J.D.B.), the Seaver Autism Center for Research and Treatment and the SWT and Seaver Foundations. GALA originated with sites from, and with support of, the ASC (MH129724, J.D.B.; MH129722, M.D.; MH129725, K.R.; MH129751, S.S.; and prior ASC funding, for example, MH100233 and MH111661). ASC sites continue to support analyses of GALA studies, with additional analyses supported by MH128813. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by Clinical and Translational Science Awards grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this paper was also supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This study makes use of data generated by the DECIPHER community. A full list of centers that contributed to the generation of the data is available from https://deciphergenomics.org/about/stats and via email from contact@deciphergenomics.org. DECIPHER is hosted by the EMBL-EBI, and funding for the DECIPHER project was provided by the Wellcome Trust (grant no. WT223718/Z/21/Z).

Author information

Authors and Affiliations

Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Marina Natividad Avila, Seulgi Jung, Tess Levy, Laura G. Sloofman, Thariana Pichardo, Dalia Marquez, Alexander Kolevzon, Paige M. Siper, Silvia De Rubeis, Jennifer Foss-Feig, Erina Hara, Danielle Halpern, Yi Li, Catherine Sancimino, Renee Soufer, Jessica Zweifach, Brett Collins, Abraham Reichenberg, Sven Sandin, Behrang Mahjani & Joseph D. Buxbaum
Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Marina Natividad Avila, Seulgi Jung, Tess Levy, Laura G. Sloofman, Thariana Pichardo, Dalia Marquez, Alexander Kolevzon, Paige M. Siper, Silvia De Rubeis, Jennifer Foss-Feig, Erina Hara, Danielle Halpern, Yi Li, Catherine Sancimino, Renee Soufer, Jessica Zweifach, Brett Collins, Abraham Reichenberg, Sven Sandin, Behrang Mahjani & Joseph D. Buxbaum
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Marina Natividad Avila, Seulgi Jung, Laura G. Sloofman, Mafalda Barbosa, Behrang Mahjani & Joseph D. Buxbaum
Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Marina Natividad Avila, Seulgi Jung, Tess Levy, Laura G. Sloofman, Thariana Pichardo, Silvia De Rubeis & Joseph D. Buxbaum
Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Marina Natividad Avila, Seulgi Jung, Laura G. Sloofman & Joseph D. Buxbaum
The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Marina Natividad Avila, Seulgi Jung, Tess Levy, Laura G. Sloofman, Thariana Pichardo, Paige M. Siper, Silvia De Rubeis, Jennifer Foss-Feig, Mafalda Barbosa, Brett Collins, Abraham Reichenberg, Behrang Mahjani & Joseph D. Buxbaum
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
F. Kyle Satterstrom, Jack M. Fu, Christine R. Stevens, Mykyta Artomov, Harrison Brand, Ryan L. Collins, Sherif Gerges, Aarno Palotie, Michael E. Talkowski & Mark J. Daly
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
F. Kyle Satterstrom, Christine R. Stevens, Caroline M. Cusick, Mykyta Artomov, Sherif Gerges, Michael E. Talkowski & Mark J. Daly
Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
F. Kyle Satterstrom, Christine R. Stevens, Aarno Palotie & Mark J. Daly
Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
Jack M. Fu, Michael E. Talkowski & Mark J. Daly
Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
Jack M. Fu, Mykyta Artomov, Harrison Brand, Ryan L. Collins, Sherif Gerges & Michael E. Talkowski
Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
Lambertus Klei, Nancy Minshew & Bernie Devlin
Division of Research, Kaiser Permanente Northern California, Pleasanton, CA, USA
Jennifer L. Ames, Hilda Cerros & Lisa A. Croen
Centro de Estudos do Genoma Humano e Células-Tronco, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brasil
Gabriele S. Campos, Claudia I. S. Costa, Ana Cristina D. E. S. Girardi, Naila Lourenço, Jaqueline Y. T. Wang & Maria Rita Passos-Bueno
Facultad de Medicina, Universidad de los Andes, Bogotá, Colombia
Roberto Chaskel & Andrea del Pilar Lopez
Instituto Colombiano del Sistema Nervioso, Clínica Montserrat, Bogotá, Colombia
Roberto Chaskel, Magdalena Fernandez & Eugenio Ferro
John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
Michael L. Cuccaro, Anthony J. Griswold & Margaret A. Pericak-Vance
The Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
Michael L. Cuccaro, Anthony J. Griswold & Margaret A. Pericak-Vance
Facultad de Ciencias, Universidad de los Andes, Bogotá, Colombia
Liliana Galeano, Luis C. Hernandez, Katherine P. Peña & Maria Claudia Lattig
MIND (Medical Investigation of Neurodevelopmental Disorders) Institute, University of California, Davis, Davis, CA, USA
Yunin Ludena, Isaac Pessah, Rebecca Schmidt, Irva Hertz-Picciotto & Flora Tassone
Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
Diana Núñez-Ríos
National Center of Posttraumatic Stress Disorders, VA CT Healthcare Center, West Haven, CT, USA
Diana Núñez-Ríos
Centro Ann Sullivan del Peru, Lima, Peru
Rosa Oyama, Lizbeth Tolentino & Liliana Mayo
Center Ann Sullivan International, Lawrence, KS, USA
Holly M. Sweeney
Hospital Psiquiátrico Infantil Dr. Juan N. Navarro, Mexico City, Mexico
Lilia Albores-Gallo
Universidad Nacional Autónoma de México, Mexico City, Mexico
Lilia Albores-Gallo
Kaiser Permanente School of Medicine, Pasadena, CA, USA
Lisa A. Croen
Departamento de Genética, Subdirección de Investigaciones Clínicas, Instituto Nacional de Psiquiatría Ramón de la Fuente Muñiz México, Ciudad de México, Mexico
Carlos S. Cruz-Fuentes
Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Alexander Kolevzon
Department of Biochemistry and Molecular Medicine, University of California, Davis, School of Medicine, Davis, CA, USA
Flora Tassone
Psychiatry and Behavioral Sciences, Boston Children’s Hospital, Boston, MA, USA
M. Pilar Trelles
Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA
Ryan L. Collins & Michael E. Talkowski
Department of Medicine, Harvard Medical School, Boston, MA, USA
Mark J. Daly
Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
Aarno Palotie & Mark J. Daly
Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Behrang Mahjani
Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
Behrang Mahjani
The Alper Center for Neural Development and Regeneration, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Silvia De Rubeis
Department of Psychiatry, University of Illinois Chicago, Chicago, IL, USA
Edwin H. Cook
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA
Kathryn Roeder
Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
Kathryn Roeder
Sorbonne Université, INSERM, CNRS, Institut de Biologie Paris Seine, Center for Neuroscience at Sorbonne Université, Paris, France
Catalina Betancur
Department of Psychiatry, Graduate School of Medicine, Nagoya University, Nagoya, Japan
Branko Aleksic, Andreas G. Chiocchetti, Christine M. Freitag, Sabine Schlitt, Katja Schneider-Momm & Karoline Teufel
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Mykyta Artomov, Harrison Brand, Ryan L. Collins & Sherif Gerges
Med Biotech Hub and Competence Center, Department of Medical Biotechnologies, University of Siena, Siena, Italy
Elisa Benetti, Chiara Fallerini, Caterina Lo Rizzo & Alessandra Renieri
Medical Genetics, University of Siena, Siena, Italy
Elisa Benetti, Chiara Fallerini, Caterina Lo Rizzo, Marianna Manara & Alessandra Renieri
Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Goethe University Frankfurt, Frankfurt, Germany
Monica Biscaldi-Schafer & David M. Hougaard
The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
Anders D. Børglum, Itaru Kushima & Norio Ozaki
Department of Biomedicine—Human Genetics, Aarhus University, Aarhus, Denmark
Anders D. Børglum
Center for Genomics and Personalized Medicine, Aarhus, Denmark
Anders D. Børglum
Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
Anders D. Børglum
Pediatric Surgical Research Laboratories, Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
Harrison Brand
Department of Medical Sciences, University of Torino, Turin, Italy
Alfredo Brusco, Elisa Giorgio, Lisa Pavinato & Slavica Trajkova
Medical Genetics Unit, ‘Città della Salute e della Scienza’ University Hospital, Turin, Italy
Alfredo Brusco
Department of Public Health and Pediatrics, University of Torino, Turin, Italy
Simona Cardaropoli, Diana Carli & Giovanni Battista Ferrero
Grupo de Medicina Xenómica, Centro de Investigación en Red de Enfermedades Raras (CIBERER), CIMUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Angel Carracedo & Montserrat Fernández-Prieto
Fundación Pública Galega de Medicina Xenómica, Servicio Galego de Saúde (SERGAS), Santiago de Compostela, Spain
Angel Carracedo
Department of Pediatrics and Adolescent Medicine, Duchess of Kent Children’s Hospital, The University of Hong Kong, Hong Kong Special Administrative Region, Hong Kong, China
Marcus C. Y. Chan, Brian H. Y. Chung, So Lun Lee & Mullin H. C. Yu
Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
Hilary Coon
Department of Psychiatry, Huntsman Mental Health Institute, University of Utah, Salt Lake City, UT, USA
Hilary Coon
Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, USA
David J. Cutler
Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
Ryan N. Doan
Department of Cellular, Computational and Integrative Biology, University of Trento, Trento, Italy
Enrico Domenici
Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
Shan Dong, Lindsay Liang, Alicia Ljungdahl & Lauren A. Weiss
Neurogenetics group, Instituto de Investigación Sanitaria de Santiago (IDIS-SERGAS), Santiago de Compostela, Spain
Montserrat Fernández-Prieto
Center for Autism Research and Translation, University of California, Irvine, Irvine, CA, USA
J. Jay Gargus, Rachel Nguyen & Moyra Smith
Institute for Juvenile Research, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA
Stephen Guter, Suma Jacob, Nell Maltman & Lauren Schmitt
Department of Diagnostic and Biomedical Sciences, The University of Texas Health Science Center at Houston, School of Dentistry, Houston, TX, USA
Emily Hansen-Kiss
The Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA
Gail E. Herman
Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
David M. Hougaard
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Christina M. Hultman & Sven Sandin
Department of Child Psychiatry, Tampere University and Tampere University Hospital, Tampere, Finland
Miia Kaartinen & Kaija Puura
Medical Genomics Center, Nagoya University Hospital, Nagoya, Japan
Itaru Kushima
Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
Terho Lehtimäki
Service for Neurodevelopmental Disorders, University Campus Bio-medico of Rome, Rome, Italy
Carla Lintas
Life and Health Sciences Research Institute, School of Medicine, University of Minho, Braga, Portugal
Patricia Maciel
Genetica Medica, Azienda Ospedaliera Universitaria Senese, Siena, Italy
Marianna Manara & Alessandra Renieri
Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
Dara S. Manoach
The Azrieli National Center for Autism and Neurodevelopment Research, Ben-Gurion University of the Negev, Be’er-Sheva, Israel
Gal Meiri
Pre-School Psychiatry Unit, Soroka University Medical Center, Be’er-Sheva, Israel
Gal Meiri
Department of Public Health, Ben-Gurion University of the Negev, Be’er-Sheva, Israel
Idan Menashe
National Autism Research Center of Israel, Ben-Gurion University of the Negev, Be’er-Sheva, Israel
Idan Menashe
Children’s Center for Autism Research and Training, University of Kansas, Lawrence, KS, USA
Judith Miller
Department of Psychiatry, University of Utah, Salt Lake City, UT, USA
Judith Miller
Life Span Institute and Kansas Center for Autism Research and Training, University of Kansas, Lawrence, KS, USA
Matthew Mosconi
Institute for Glyco-core Research (iGCORE), Nagoya University, Nagoya, Japan
Norio Ozaki
Psychiatric & Neurodevelopmental Genetics Unit, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
Aarno Palotie
Department of Child and Adolescent Psychiatry, Hospital General Universitario Gregorio Marañón, IiSGM, CIBERSAM, School of Medicine Complutense University, Madrid, Spain
Mara Parellada
Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
Minshi Peng
Interdepartmental Program ‘Autism 0-90’, ‘Gaetano Martino’ University Hospital, University of Messina, Messina, Italy
Antonio M. Persico
Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Abraham Reichenberg
Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, UK
Stephan J. Sanders
Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
Stephan J. Sanders
New York Genome Center, New York, NY, USA
Stephan J. Sanders
Program in Genetics and Genome Biology, The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
Stephen W. Scherer & Ryan Yuen
Department of Molecular Genetics and McLaughlin Centre, University of Toronto, Toronto, Ontario, Canada
Stephen W. Scherer
Norwegian Institute of Public Health, Oslo, Norway
Pål Suren
Department of Molecular Physiology & Biophysics and Psychiatry, Vanderbilt University School of Medicine, Nashville, TN, USA
James S. Sutcliffe
Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN, USA
James S. Sutcliffe
Department of Psychiatry, University of Cincinnati, Cincinnati, OH, USA
John A. Sweeney
Department of Neurosciences, Biomedicine and Movement Sciences, Section of Biology and Genetics, University of Verona, Verona, Italy
Elisabetta Trabetti
Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
Brie Wamsley

Authors

Marina Natividad Avila, Seulgi Jung, F. Kyle Satterstrom, Jack M. Fu, Tess Levy, Laura G. Sloofman, Lambertus Klei, Thariana Pichardo, Dalia Marquez, Christine R. Stevens, Caroline M. Cusick, Jennifer L. Ames, Gabriele S. Campos, Hilda Cerros, Roberto Chaskel, Claudia I. S. Costa, Michael L. Cuccaro, Andrea del Pilar Lopez, Magdalena Fernandez, Eugenio Ferro, Liliana Galeano, Ana Cristina D. E. S. Girardi, Anthony J. Griswold, Luis C. Hernandez, Naila Lourenço, Yunin Ludena, Diana Núñez-Ríos, Rosa Oyama, Katherine P. Peña, Isaac Pessah, Rebecca Schmidt, Holly M. Sweeney, Lizbeth Tolentino, Jaqueline Y. T. Wang, Lilia Albores-Gallo, Lisa A. Croen, Carlos S. Cruz-Fuentes, Irva Hertz-Picciotto, Alexander Kolevzon, Maria Claudia Lattig, Liliana Mayo, Maria Rita Passos-Bueno, Margaret A. Pericak-Vance, Paige M. Siper, Flora Tassone, M. Pilar Trelles, Michael E. Talkowski, Mark J. Daly, Behrang Mahjani, Silvia De Rubeis, Edwin H. Cook, Kathryn Roeder, Catalina Betancur, Bernie Devlin & Joseph D. Buxbaum

Consortia

GALA Consortium

Lilia Albores-Gallo, Jennifer L. Ames, Catalina Betancur, Joseph D. Buxbaum, Gabriele S. Campos, Hilda Cerros, Roberto Chaskel, Edwin H. Cook, Claudia I. S. Costa, Lisa A. Croen, Carlos S. Cruz-Fuentes, Michael L. Cuccaro, Silvia De Rubeis, Bernie Devlin, Magdalena Fernandez, Eugenio Ferro, Jennifer Foss-Feig, Liliana Galeano, Ana Cristina D. E. S. Girardi, Anthony J. Griswold, Erina Hara, Danielle Halpern, Luis C. Hernandez, Irva Hertz-Picciotto, Seulgi Jung, Lambertus Klei, Alexander Kolevzon, Maria Claudia Lattig, Tess Levy, Yi Li, Andrea del Pilar Lopez, Naila Lourenço, Yunin Ludena, Behrang Mahjani, Dalia Marquez, Liliana Mayo, Marina Natividad Avila, Diana Núñez-Ríos, Rosa Oyama, Maria Rita Passos-Bueno, Katherine P. Peña, Margaret A. Pericak-Vance, Isaac Pessah, Thariana Pichardo, Kathryn Roeder, Catherine Sancimino, Rebecca Schmidt, Paige M. Siper, Laura G. Sloofman, Renee Soufer, Holly M. Sweeney, Flora Tassone, Lizbeth Tolentino, M. Pilar Trelles, Jaqueline Y. T. Wang & Jessica Zweifach

The Autism Sequencing Consortium (ASC)

Branko Aleksic, Mykyta Artomov, Mafalda Barbosa, Elisa Benetti, Catalina Betancur, Monica Biscaldi-Schafer, Anders D. Børglum, Harrison Brand, Alfredo Brusco, Joseph D. Buxbaum, Gabriele S. Campos, Simona Cardaropoli, Diana Carli, Angel Carracedo, Marcus C. Y. Chan, Andreas G. Chiocchetti, Brian H. Y. Chung, Brett Collins, Ryan L. Collins, Edwin H. Cook, Hilary Coon, Claudia I. S. Costa, Michael L. Cuccaro, David J. Cutler, Mark J. Daly, Silvia De Rubeis, Bernie Devlin, Ryan N. Doan, Enrico Domenici, Shan Dong, Chiara Fallerini, Magdalena Fernandez, Montserrat Fernández-Prieto, Giovanni Battista Ferrero, Eugenio Ferro, Jennifer Foss-Feig, Christine M. Freitag, Jack M. Fu, Liliana Galeano, J. Jay Gargus, Sherif Gerges, Elisa Giorgio, Ana Cristina D. E. S. Girardi, Stephen Guter, Emily Hansen-Kiss, Erina Hara, Gail E. Herman, Luis C. Hernandez, Irva Hertz-Picciotto, David M. Hougaard, Christina M. Hultman, Suma Jacob, Miia Kaartinen, Lambertus Klei, Alexander Kolevzon, Itaru Kushima, Maria Claudia Lattig, So Lun Lee, Terho Lehtimäki, Lindsay Liang, Carla Lintas, Alicia Ljungdahl, Andrea del Pilar Lopez, Caterina Lo Rizzo, Yunin Ludena, Patricia Maciel, Behrang Mahjani, Nell Maltman, Marianna Manara, Dara S. Manoach, Gal Meiri, Idan Menashe, Judith Miller, Nancy Minshew, Matthew Mosconi, Marina Natividad Avila, Rachel Nguyen, Norio Ozaki, Aarno Palotie, Mara Parellada, Maria Rita Passos-Bueno, Lisa Pavinato, Katherine P. Peña, Minshi Peng, Margaret Pericak-Vance, Antonio M. Persico, Isaac N. Pessah, Thariana Pichardo, Kaija Puura, Abraham Reichenberg, Alessandra Renieri, Kathryn Roeder, Catherine Sancimino, Stephan J. Sanders, Sven Sandin, F. Kyle Satterstrom, Stephen W. Scherer, Sabine Schlitt, Rebecca J. Schmidt, Lauren Schmitt, Katja Schneider-Momm, Paige M. Siper, Laura Sloofman, Moyra Smith, Renee Soufer, Christine R. Stevens, Pål Suren, James S. Sutcliffe, John A. Sweeney, Michael E. Talkowski, Flora Tassone, Karoline Teufel, Elisabetta Trabetti, Slavica Trajkova, M. Pilar Trelles, Brie Wamsley, Jaqueline Y. T. Wang, Lauren A. Weiss, Mullin H. C. Yu & Ryan Yuen

Contributions

K.R., B.D., C.B. and J.D.B. conceived and designed the study. T.L., T.P., C.R.S., C.M.C., J.L.A., G.S.C., H.C., R.C., C.I.S.C., M.L.C., A.D.P.L., M.F., E.F., L.G., A.C.D.E.S.G., A.J.G., L.C.H., N.L., Y.L., D.N.-R., R.O., K.P.P., I.P., R.S., H.M.S., L.T., J.Y.T.W., L.A.-G., L.A.C., C.S.C.-F., I.H.-P., A.K., M.C.L., L.M., M.R.P.-B., M.A.P.-V., P.S., F.T., M.P.T., M.E.T., M.J.D. and J.D.B. contributed samples and generated data. J.D.B., C.B., B.D., B.M., S.D.R., L.K., L.S., J.M.F., F.K.S., S.J. and M.N.A. developed methodology and performed data analyses. J.D.B., B.D., C.B., E.H.C., T.L., L.S. and M.N.A. drafted and revised the paper. All authors reviewed and approved the final version of the paper. J.D.B. supervised the study.

Corresponding author

Correspondence to Joseph D. Buxbaum.

Ethics declarations

Competing interests

L.A.-G. is the main author of the CRIDI-ASD interview; she teaches the training course for the aforementioned instrument and receives payment for the training. The other authors declare no conflicts of interest.

Peer review

Peer review information

Nature Medicine thanks Andres Moreno-Estrada and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Anna Ranzoni, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Data processing for samples from three different data sources.

The figure describes the variant, genotype, and sample quality control steps that were implemented to process the raw, joint-genotyped VCFs and generate the de novo and inherited calls used for downstream analyses. Sample counts are tabulated before downstream ancestry filtering.

Extended Data Fig. 2 Comparison of rare de novo variant counts per sample between ASD probands and unaffected siblings across different ancestries, normalized to synonymous variant rates.

The average number of rare de novo variants per sample, normalized by the synonymous de novo variant rate, is compared between ASD probands and their unaffected siblings for all ancestries (ALL: 17,480 probands and 6,208 siblings), Admixed American (AMR: 4,450 probands and 1,459 siblings), and non-Admixed American (Fu_COMP: 13,030 probands and 4,749 siblings). The analysis considers: (a) protein-truncating variants (PTVs) in highly constrained genes (LOEUF deciles 1–3, 5,363 genes) and less constrained genes (LOEUF deciles 4–10, 12,765 genes); (b) missense variants categorized by predicted functional severity (MPC ≥ 2 for high severity, 1 ≤ MPC < 2 for moderate severity); and (c) missense variants with MPC < 1 (low severity) and synonymous variants. Data are presented as mean values ± 95% confidence intervals. Statistical significance was assessed using two-sided z-tests comparing normalized de novo mutation rates between probands and siblings. P values were adjusted for multiple comparisons using the Benjamini–Hochberg false discovery rate (FDR) method, and exact adjusted P values are shown above the bars.
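The testing procedure named in this legend (two-sided z-tests followed by Benjamini–Hochberg adjustment) can be sketched as below. This is a minimal illustration and not the authors' code: it assumes that each group's normalized rate already comes with a standard error (how those were derived is not stated in the legend), and the function names are hypothetical.

```python
import math


def two_sided_z_pvalue(rate_a, se_a, rate_b, se_b):
    """Two-sided z-test for the difference between two independent estimates.

    rate_* are (assumed) normalized rates; se_* are their standard errors.
    """
    z = (rate_a - rate_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one P value would be computed per variant class and ancestry group, and the full list passed to `benjamini_hochberg` once so that all comparisons share the same correction.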

Extended Data Fig. 3 Genic burden of PTVs across different ancestries in gnomAD v2.1.1 as a function of gene constraint.

The sum of observed PTVs per ancestry is plotted, scaled to each population’s size and total gene coding sequence length within gnomAD LOEUF deciles. The plot includes African/African American (AFR, n = 8,128), Admixed American (AMR, n = 17,296), East Asian (EAS, n = 9,197), Non-Finnish European (NFE, n = 56,885), and South Asian (SAS, n = 15,308) ancestries. LOEUF deciles represent levels of gene constraint, with lower deciles indicating more constrained genes.
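One plausible reading of the scaling described in this legend is a PTV count divided by both the number of individuals and the total coding length of the genes in a decile. The sketch below assumes exactly that form; the legend does not give the formula, so the function name, the per-base-pair unit and the division itself are illustrative assumptions. Only the sample sizes are taken from the legend.

```python
# Sample sizes per ancestry as listed in the legend (gnomAD v2.1.1).
GNOMAD_SAMPLE_SIZES = {
    "AFR": 8128,
    "AMR": 17296,
    "EAS": 9197,
    "NFE": 56885,
    "SAS": 15308,
}


def scaled_ptv_burden(ptv_count, ancestry, cds_length_bp):
    """PTVs per individual per coding base pair (assumed scaling).

    ptv_count: summed observed PTVs for one ancestry in one LOEUF decile.
    cds_length_bp: total coding sequence length of the genes in that decile.
    """
    if cds_length_bp <= 0:
        raise ValueError("cds_length_bp must be positive")
    n_individuals = GNOMAD_SAMPLE_SIZES[ancestry]
    return ptv_count / (n_individuals * cds_length_bp)
```

Dividing by both quantities makes deciles and ancestries comparable even though the populations differ roughly sevenfold in size.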

Extended Data Fig. 4 Relative contribution to TADA signal by mode of inheritance.

The proportional impact of each inheritance mode on the ASD-associated genes is shown at three false discovery rate (FDR) thresholds: ≤0.1 (a, d), ≤0.05 (b, e), and ≤0.01 (c, f). Panels (a–c) display results for the GALA cohort, while panels (d–f) show results for the Fu_COMP subset from Fu et al.⁴. BF, Bayes Factor.

Extended Data Fig. 5 Relative contribution to TADA signal by variant type.

The proportional impact of each variant type on the ASD-associated genes is shown at three false discovery rate (FDR) thresholds: ≤0.1 (a, d), ≤0.05 (b, e), and ≤0.01 (c, f). Panels (a–c) display results for the GALA cohort, while panels (d–f) show results for the Fu_COMP subset from Fu et al.⁴. BF, Bayes Factor.

Extended Data Fig. 6 Classification rates and proportions of P/LP variants across AMR and non-AMR populations using Neptune.

The figure compares the classification rates and proportions of P/LP variants in the indicated subsamples. Left: The ratio of (upper) classified variants (by Neptune) to total variants, (middle) P/LP variants to total variants, and (lower) P/LP variants to Neptune classified variants is shown for AMR, non-AMR, non-European (non-EUR) and EUR ancestries. Right: Comparisons include (upper) the total number of variants, (middle) the number of classified variants, and (lower) the number of P/LP variants, all expressed per proband. AMR participants have more variants per individual (both total and Neptune-classified) compared to non-AMR participants, but a reduced ability of Neptune to classify variants in AMR contributes to a slightly lower proportion of P/LP variants per individual. Similar results are seen for non-EUR versus EUR. Data are presented as mean values ± 95% confidence intervals (error bars show the plotted CI bounds). Statistical analysis: pairwise two-sided z-tests were used to compare groups within each panel; P values were adjusted for multiple comparisons using the Benjamini–Hochberg FDR procedure. Asterisks indicate adjusted P < 0.05.

Extended Data Fig. 7 Lollipop diagrams illustrating variants identified in emerging autism-associated genes.

Variants observed in GALA analyses of AMR individuals are marked with pink circles, those found in Fu_COMP individuals are marked with green, and variants found in DECIPHER are in purple. Figures were generated using the lollipop software package⁵².

Extended Data Fig. 8 Evaluation of ancestry composition and variant burden among GALA probands.

(a) Ancestry proportions for all GALA individuals (left) and among carriers of damaging rare variants (right) inferred using a Random Forest classifier trained on 1000 Genomes + Human Genome Diversity Project (HGDP) reference populations. Most individuals display majority Admixed American (AMR) ancestry. (b) Ternary plots showing the distribution of ancestry proportions among all GALA individuals (left) and among carriers of damaging rare variants (right).

Extended Data Table 1 Observed and expected values for 17 genes identified in both GALA and Fu_COMP

Extended Data Table 2 Summary of emerging and notable gene-level findings in GALA

Supplementary information

Reporting Summary

Supplementary Tables

Supplementary Tables workbook.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Natividad Avila, M., Jung, S., Satterstrom, F.K. et al. Deleterious coding variation associated with autism is shared across ancestries. Nat Med (2026). https://doi.org/10.1038/s41591-026-04228-6

Received: 20 December 2024
Accepted: 14 January 2026
Published: 30 March 2026
Version of record: 30 March 2026
DOI: https://doi.org/10.1038/s41591-026-04228-6
