Abstract
To better understand large-effect pathogenic variation associated with autism, we generated long-read sequencing (LRS) data to construct phased and near-complete genome assemblies (average contig N50 = 43 Mbp, QV = 56) for 189 individuals from 51 families with unsolved cases. We applied read- and assembly-based strategies to facilitate comprehensive characterization of de novo mutations, structural variants (SVs), and DNA methylation. Using LRS pangenome controls, we efficiently filtered >97% of common SVs exclusive to 87 offspring. We find no evidence of increased autosomal SV burden for probands when compared to unaffected siblings yet observe a suggestive trend toward an increased SV burden on the X chromosome among affected females. We establish a workflow to prioritize potential pathogenic variants by integrating autism risk genes and putative noncoding regulatory elements defined from ATAC-seq and CUT&Tag data from the developing cortex. In total, we identified three pathogenic variants in TBL1XR1, MECP2, and SYNGAP1, as well as nine candidate de novo and biallelic inherited homozygous SVs, most of which were missed by short-read sequencing. Our work highlights the potential of phased genomes to discover more complex pathogenic mutations and the power of the pangenome to restrict the focus to an increasingly smaller number of SVs for clinical evaluation.
Data availability
The underlying sequencing data, as well as the processed assembly and alignment files used for analysis in this study for the SSC samples (n = 168) and the complete sample set (n = 189), are available to approved researchers through SFARI Base under Dataset ID DS0000104 and through the National Institute of Mental Health Data Archive (NDA) under Collection ID 3780. Source data are provided with this paper.
Code availability
The code used to perform the analyses and generate results in this study is publicly available and has been deposited in GitHub at https://github.com/EichlerLab/asap, under the MIT license. The specific version of the code associated with this publication is archived in Zenodo and is accessible via https://doi.org/10.5281/zenodo.18149644. A full list of all software used, along with references, is provided in the Supplementary Notes and Supplementary References. Any additional information required to reanalyze the data reported in this work is available from the lead contact upon request.
References
Zeidan, J. et al. Global prevalence of autism: A systematic review update. Autism Res. 15, 778–790 (2022).
Wilfert, A. B. et al. Recent ultra-rare inherited variants implicate new autism candidate risk genes. Nat. Genet. 53, 1125–1134 (2021).
Zhou, X. et al. Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat. Genet. 54, 1305–1319 (2022).
Coe, B. P. et al. Neurodevelopmental disease genes implicated by de novo mutation and copy number variation morbidity. Nat. Genet. 51, 106–116 (2019).
Wang, T. et al. Integrated gene analyses of de novo variants from 46,612 trios with autism and developmental disorders. Proc. Natl. Acad. Sci. 119, e2203491119 (2022).
Fu, J. M. et al. Rare coding variation provides insight into the genetic architecture and phenotypic context of autism. Nat. Genet. 54, 1320–1331 (2022).
Satterstrom, F. K. et al. Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell 180, 568–584.e23 (2020).
Yoon, S. et al. Rates of contributory de novo mutation in high and low-risk autism families. Commun. Biol. 4, 1026 (2021).
Sebat, J. et al. Strong Association of De Novo Copy Number Mutations with Autism. Science 316, 445–449 (2007).
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
Scott, A. J., Chiang, C. & Hall, I. M. Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res. 31, 2249–2257 (2021).
Olivucci, G. et al. Long read sequencing on its way to the routine diagnostics of genetic diseases. Front. Genet. 15, (2024).
Carvalho, C. M. B. & Lupski, J. R. Mechanisms underlying structural variant formation in genomic disorders. Nat. Rev. Genet. 17, 224–238 (2016).
Collins, R. L. & Talkowski, M. E. Diversity and consequences of structural variation in the human genome. Nat. Rev. Genet. 1–20 https://doi.org/10.1038/s41576-024-00808-9 (2025).
Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).
Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 597–614 (2020).
Noyes, M. D. et al. Familial long-read sequencing increases yield of de novo mutations. Am. J. Hum. Genet. 109, 631–646 (2022).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
Zhao, X. et al. Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies. Am. J. Hum. Genet. 108, 919–928 (2021).
Wojcik, M. H. et al. Beyond the exome: What’s next in diagnostic testing for Mendelian conditions. Am. J. Hum. Genet. 110, 1229–1248 (2023).
Mastrorosa, F. K., Miller, D. E. & Eichler, E. E. Applications of long-read sequencing to Mendelian genetics. Genome Med. 15, 42 (2023).
Hiatt, S. M. et al. Long-read genome sequencing for the molecular diagnosis of neurodevelopmental disorders. Hum. Genet. Genomics Adv. 2, 100023 (2021).
Hiatt, S. M. et al. Long-read genome sequencing and variant reanalysis increase diagnostic yield in neurodevelopmental disorders. Genome Res. 34, 1747–1762 (2024).
Sanchis-Juan, A. et al. Genome sequencing and comprehensive rare-variant analysis of 465 families with neurodevelopmental disorders. Am. J. Hum. Genet. 110, 1343–1355 (2023).
Pauper, M. et al. Long-read trio sequencing of individuals with unsolved intellectual disability. Eur. J. Hum. Genet. 29, 637–648 (2021).
Sinha, S. et al. Long read sequencing enhances pathogenic and novel variation discovery in patients with rare diseases. Nat. Commun. 16, 2500 (2025).
Steyaert, W. et al. Unraveling undiagnosed rare disease cases by HiFi long-read genome sequencing. Genome Res. 35, 755–768 (2025).
Paschal, C. R. et al. Concordance of Whole-Genome Long-Read Sequencing with Standard Clinical Testing for Prader-Willi and Angelman Syndromes. J. Mol. Diagnostics 27, 166–176 (2025).
Logsdon, G. A. et al. The variation and evolution of complete human centromeres. Nature 629, 136–145 (2024).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
Porubsky, D. et al. Human de novo mutation rates from a four-generation pedigree reference. Nature 1–10 https://doi.org/10.1038/s41586-025-08922-2 (2025).
Sanders, S. J. et al. Multiple Recurrent De Novo CNVs, Including Duplications of the 7q11.23 Williams Syndrome Region, Are Strongly Associated with Autism. Neuron 70, 863–885 (2011).
Turner, T. N. et al. Genomic Patterns of De Novo Mutation in Simplex Autism. Cell 171, 710–722.e12 (2017).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. 201178 Preprint at https://doi.org/10.1101/201178 (2018).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
Noyes, M. D. et al. Long-read sequencing of trios reveals increased germline and postzygotic mutation rates in repetitive DNA. 2025.07.18.665621 Preprint at https://doi.org/10.1101/2025.07.18.665621 (2025).
Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. 42, 1571–1580 (2024).
English, A. C., Menon, V. K., Gibbs, R. A., Metcalf, G. A. & Sedlazeck, F. J. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol. 23, 271 (2022).
Gustafson, J. A. et al. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. 2024.03.05.24303792 Preprint at https://doi.org/10.1101/2024.03.05.24303792 (2024).
Schloissnig, S. et al. Structural variation in 1,019 diverse humans based on long-read sequencing. Nature 644, 442–452 (2025).
Hennick, K. et al. Sex differences in the developing human cortex intersect with genetic risk of neurodevelopmental disorders. 2025.09.04.674293 Preprint at https://doi.org/10.1101/2025.09.04.674293 (2025).
Banerjee-Basu, S. & Packer, A. SFARI Gene: an evolving database for the autism research community. Dis. Models Mechanisms 3, 133–135 (2010).
Birtele, M. et al. Non-synaptic function of the autism spectrum disorder-associated gene SYNGAP1 in cortical neurogenesis. Nat. Neurosci. 26, 2090–2103 (2023).
Hardwick, S. A. et al. Delineation of large deletions of the MECP2 gene in Rett syndrome patients, including a familial case with a male proband. Eur. J. Hum. Genet. 15, 1218–1229 (2007).
Poleg, T. et al. Unraveling MECP2 structural variants in previously elusive Rett syndrome cases through IGV interpretation. npj Genom. Med. 10, 23 (2025).
Zaghlula, M. et al. Current clinical evidence does not support a link between TBL1XR1 and Rett syndrome: Description of one patient with Rett features and a novel mutation in TBL1XR1, and a review of TBL1XR1 phenotypes. Am. J. Med. Genet. Part A 176, 1683–1687 (2018).
Tillotson, R. & Bird, A. The molecular basis of MeCP2 function in the brain. J. Mol. Biol. 432, 1602–1623 (2020).
Stessman, H. A. F. et al. Disruption of POGZ is associated with intellectual disability and autism spectrum disorders. Am. J. Hum. Genet. 98, 541–552 (2016).
Rosa, E., Silva, I., Smetana, J. H. C. & de Oliveira, J. F. A comprehensive review on DDX3X liquid phase condensation in health and neurodevelopmental disorders. Int. J. Biol. Macromolecules 259, 129330 (2024).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Abyzov, A., Urban, A. E., Snyder, M. & Gerstein, M. CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21, 974–984 (2011).
Roller, E., Ivakhno, S., Lee, S., Royce, T. & Tanner, S. Canvas: versatile and scalable detection of copy number variants. Bioinformatics 32, 2375–2377 (2016).
Chen, S. et al. Paragraph: a graph-based structural variant genotyper for short-read sequence data. Genome Biol. 20, 291 (2019).
Sierra, A. Y. et al. CPT1c Is Localized in endoplasmic reticulum of neurons and has carnitine palmitoyltransferase activity. J. Biol. Chem. 283, 6878–6885 (2008).
Kępka, A. et al. Potential Role of L-Carnitine in Autism Spectrum Disorder. J. Clin. Med. 10, 1202 (2021).
Reid, K. M. & Brown, G. C. LRPAP1 is released from activated microglia and inhibits microglial phagocytosis and amyloid beta aggregation. Front. Immunol. 14, 1286474 (2023).
Prjibelski, A. D. et al. Accurate isoform discovery with IsoQuant using long reads. Nat. Biotechnol. 41, 915–918 (2023).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
Miller, D. E. et al. Targeted long-read sequencing identifies missing disease-causing variation. Am. J. Hum. Genet. 108, 1436–1449 (2021).
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
Guo, H. et al. Genome sequencing identifies multiple deleterious variants in autism patients with more severe phenotypes. Genet. Med. 21, 1611–1620 (2019).
Trost, B. et al. Genome-wide detection of tandem DNA repeats that are expanded in autism. Nature 586, 80–86 (2020).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Pedersen, B. S. et al. Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches. Genome Med. 12, 62 (2020).
Dolzhenko, E. et al. Characterization and visualization of tandem repeats at genome scale. Nat. Biotechnol. 42, 1606–1614 (2024).
Sui, Y. et al. Using the linear references from the pangenome to discover missing autism variants. GitHub/Zenodo repository. https://doi.org/10.5281/zenodo.18149644 (2026).
Rhie, A. et al. The complete sequence of a human Y chromosome. Nature 621, 344–354 (2023).
Acknowledgements
We thank all of the individuals who participated in this research. We also thank all contributing investigators to the consortia datasets used here from SSC, SAGE, and Baylor College of Medicine, and the families who participated in this research, without whose contributions, genetic studies would be impossible. We thank Tonia Brown for assistance in editing this manuscript. We thank Tom Mokveld (PacBio) for helpful discussions. This work was supported, in part, by the US National Institutes of Health (NIH R01MH101221 to E.E.E.; R01NS057819 to H.Y.Z.; 1F32HD116501-01 to R.M-S.; and DP5OD033357 to D.E.M.) and the Simons Foundation (SFARI #810018 to E.E.E., H.Y.Z., T.J.N., and A.C.). E.E.E. and H.Y.Z. are investigators of the Howard Hughes Medical Institute. This work was also supported, in part, by the National Natural Science Foundation of China (82201314 and 82471194) to T.W. We would like to acknowledge the National Human Genome Research Institute (NHGRI) for funding the following grants supporting the creation of the human pangenome reference: U41HG010972, U01HG010971, U01HG013760, U01HG013755, U01HG013748, U01HG013744, R01HG011274, and the Human Pangenome Reference Consortium (BioProject ID: PRJNA730823). This article is subject to HHMI’s Open Access to Publications policy. HHMI lab heads have previously granted a nonexclusive CC BY 4.0 license to the public and a sublicensable license to HHMI in their research articles. Pursuant to those licenses, the author-accepted manuscript of this article can be made freely available under a CC BY 4.0 license immediately upon publication.
Author information
Contributions
Y.S. and E.E.E. conceptualized the study. K. Ho., R.A.P.P., D.P., and B.S. collected samples. K.M.M., K. Ho., G.H.G., and J.K. generated the data. Y.S., Y.K., I.W., and N.K. performed data quality control. Y.S., J.L., M.D.N., I.W., and N.K. conducted the formal analyses. Y.S. and I.W. created the visualizations. Y.S., J.L., W.T.H., and M.W. developed the methodology. K. He., D.K., J.A.G., D.E.M., and the HPRC provided resources. Y.K., I.W., N.K., and J.W. developed software. Y.S., J.L., I.W., T.W., R.A.P.P., R.M.-S., and F.C. performed validation. Y.S. wrote the original draft. Y.S., E.E.E., H.Y.Z., J.L., M.D.N., Y.K., I.W., N.K., W.T.H., M.W., J.W., K. Ho., K.M.M., G.H.G., J.K., T.W., K. He., D.K., R.A.P.P., R.M.-S., F.C., D.P., B.S., J.A.G., D.E.M., T.J.N., A.C., H.B.-R. and the HPRC reviewed and edited the manuscript. E.E.E., H.Y.Z., A.C., and T.J.N. supervised the study.
Ethics declarations
Competing interests
E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. D.E.M. is on SABs at Oxford Nanopore Technologies (ONT) and Basis Genetics, is engaged in research agreements with ONT and PacBio, has received research and travel support from ONT and PacBio, holds stock options in MyOme and Basis Genetics, and is a consultant for MyOme. J.A.G. has received travel support from ONT. H.Y.Z. is a member of the Regeneron Board of Directors and an advisory board member to The Column Group, Cajal Therapeutics (also co-founder), and Lyterian. D.P. provides consulting service to Ionis Pharmaceuticals, M2DS Therapeutics and Acadia Pharmaceuticals. All other authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Olaf Riess and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sui, Y., Lin, J., Noyes, M.D. et al. Using the linear references from the pangenome to discover missing autism variants. Nat Commun (2026). https://doi.org/10.1038/s41467-026-68378-4
DOI: https://doi.org/10.1038/s41467-026-68378-4


