Using the linear references from the pangenome to discover missing autism variants

Sui, Yang; Lin, Jiadong; Noyes, Michelle D.; Kwon, Youngjun; Wong, Isaac; Koundinya, Nidhi; Harvey, William T.; Wu, Mei; Hoekzema, Kendra; Munson, Katherine M.; Garcia, Gage H.; Knuth, Jordan; Wertz, Julie; Wang, Tianyun; Hennick, Kelsey; Karunakaran, Druha; Polo Prieto, Rafael A.; Meyer-Schuman, Rebecca; Cherry, Fisher; Pehlivan, Davut; Suter, Bernhard; Gustafson, Jonas A.; Miller, Danny E.; Berk-Rauch, Hanna; Nowakowski, Tomasz J.; Chakravarti, Aravinda; Zoghbi, Huda Y.; Eichler, Evan E.

doi:10.1038/s41467-026-68378-4

Article
Open access
Published: 23 January 2026

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

To better understand large-effect pathogenic variation associated with autism, we generated long-read sequencing (LRS) data to construct phased and near-complete genome assemblies (average contig N50 = 43 Mbp, QV = 56) for 189 individuals from 51 families with unsolved cases. We applied read- and assembly-based strategies to facilitate comprehensive characterization of de novo mutations, structural variants (SVs), and DNA methylation. Using LRS pangenome controls, we efficiently filtered >97% of common SVs exclusive to 87 offspring. We find no evidence of increased autosomal SV burden for probands when compared to unaffected siblings yet observe a suggestive trend toward an increased SV burden on the X chromosome among affected females. We establish a workflow to prioritize potential pathogenic variants by integrating autism risk genes and putative noncoding regulatory elements defined from ATAC-seq and CUT&Tag data from the developing cortex. In total, we identified three pathogenic variants in TBL1XR1, MECP2, and SYNGAP1, as well as nine candidate de novo and biallelic inherited homozygous SVs, most of which were missed by short-read sequencing. Our work highlights the potential of phased genomes to discover more complex pathogenic mutations and the power of the pangenome to restrict the focus to an increasingly smaller number of SVs for clinical evaluation.
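The control-based filtering step summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `SV` record, the function names, and the 50% reciprocal-overlap threshold are assumptions chosen for illustration, and the comparison only suits interval-type calls such as deletions and duplications. Dedicated tools such as Truvari (cited in the reference list) compare SVs far more carefully, including insertion sequence and size similarity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (illustrative, not a format from the paper)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    svtype: str  # e.g. "DEL" or "DUP"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if there is no overlap."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def matches(a: SV, b: SV, min_ro: float = 0.5) -> bool:
    """Two calls match if they share a type and overlap reciprocally by at least min_ro.

    The 0.5 default is an assumed threshold, not a value taken from the paper.
    """
    return a.svtype == b.svtype and reciprocal_overlap(a, b) >= min_ro


def filter_against_controls(calls, controls, min_ro=0.5):
    """Keep only calls that match no SV observed in the control set."""
    return [sv for sv in calls if not any(matches(sv, c, min_ro) for c in controls)]


# Toy data: three offspring calls and two control calls.
calls = [
    SV("chr1", 100, 200, "DEL"),    # overlaps a control deletion, so it is filtered
    SV("chr1", 1000, 2000, "DEL"),  # no control nearby, so it is kept
    SV("chrX", 500, 900, "DUP"),    # same interval as a control, different type, so it is kept
]
controls = [
    SV("chr1", 110, 210, "DEL"),
    SV("chrX", 500, 900, "DEL"),
]
kept = filter_against_controls(calls, controls)
print(kept)
```

The nested loop is quadratic in the number of calls, which is acceptable for a toy example; a genome-scale comparison would normally index the control set by chromosome and position first.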

Data availability

The underlying sequencing data, as well as the processed assembly and alignment files used for analysis in this study for the SSC samples (n = 168) and the complete sample set (n = 189), are available to approved researchers through SFARI Base under Dataset ID DS0000104 and through the National Institute of Mental Health Data Archive (NDA) under Collection ID 3780. Source data are provided with this paper.

Code availability

The code used to perform the analyses and generate results in this study is publicly available and has been deposited in GitHub at https://github.com/EichlerLab/asap, under the MIT license. The specific version of the code associated with this publication is archived in Zenodo and is accessible via https://doi.org/10.5281/zenodo.18149644 (ref. 70). A full list of all software used, along with references, is provided in the Supplementary Notes and Supplementary References. Any additional information required to reanalyze the data reported in this work is available from the lead contact upon request.

References

Zeidan, J. et al. Global prevalence of autism: A systematic review update. Autism Res. 15, 778–790 (2022).
Wilfert, A. B. et al. Recent ultra-rare inherited variants implicate new autism candidate risk genes. Nat. Genet 53, 1125–1134 (2021).
Zhou, X. et al. Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat. Genet 54, 1305–1319 (2022).
Coe, B. P. et al. Neurodevelopmental disease genes implicated by de novo mutation and copy number variation morbidity. Nat. Genet 51, 106–116 (2019).
Wang, T. et al. Integrated gene analyses of de novo variants from 46,612 trios with autism and developmental disorders. Proc. Natl. Acad. Sci. 119, e2203491119 (2022).
Fu, J. M. et al. Rare coding variation provides insight into the genetic architecture and phenotypic context of autism. Nat. Genet 54, 1320–1331 (2022).
Satterstrom, F. K. et al. Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell 180, 568–584.e23 (2020).
Yoon, S. et al. Rates of contributory de novo mutation in high and low-risk autism families. Commun. Biol. 4, 1026 (2021).
Sebat, J. et al. Strong Association of De Novo Copy Number Mutations with Autism. Science 316, 445–449 (2007).
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
Scott, A. J., Chiang, C. & Hall, I. M. Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res 31, 2249–2257 (2021).
Olivucci, G. et al. Long read sequencing on its way to the routine diagnostics of genetic diseases. Front. Genet. 15, (2024).
Carvalho, C. M. B. & Lupski, J. R. Mechanisms underlying structural variant formation in genomic disorders. Nat. Rev. Genet 17, 224–238 (2016).
Collins, R. L. & Talkowski, M. E. Diversity and consequences of structural variation in the human genome. Nat. Rev. Genet 1–20 https://doi.org/10.1038/s41576-024-00808-9 (2025).
Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).
Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nat. Rev. Genet 21, 597–614 (2020).
Noyes, M. D. et al. Familial long-read sequencing increases yield of de novo mutations. Am. J. Hum. Genet. 109, 631–646 (2022).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
Zhao, X. et al. Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies. Am. J. Hum. Genet. 108, 919–928 (2021).
Wojcik, M. H. et al. Beyond the exome: What’s next in diagnostic testing for Mendelian conditions. Am. J. Hum. Genet. 110, 1229–1248 (2023).
Mastrorosa, F. K., Miller, D. E. & Eichler, E. E. Applications of long-read sequencing to Mendelian genetics. Genome Med. 15, 42 (2023).
Hiatt, S. M. et al. Long-read genome sequencing for the molecular diagnosis of neurodevelopmental disorders. Hum. Genet. Genomics Adv. 2, 100023 (2021).
Hiatt, S. M. et al. Long-read genome sequencing and variant reanalysis increase diagnostic yield in neurodevelopmental disorders. Genome Res 34, 1747–1762 (2024).
Sanchis-Juan, A. et al. Genome sequencing and comprehensive rare-variant analysis of 465 families with neurodevelopmental disorders. Am. J. Hum. Genet. 110, 1343–1355 (2023).
Pauper, M. et al. Long-read trio sequencing of individuals with unsolved intellectual disability. Eur. J. Hum. Genet 29, 637–648 (2021).
Sinha, S. et al. Long read sequencing enhances pathogenic and novel variation discovery in patients with rare diseases. Nat. Commun. 16, 2500 (2025).
Steyaert, W. et al. Unraveling undiagnosed rare disease cases by HiFi long-read genome sequencing. Genome Res 35, 755–768 (2025).
Paschal, C. R. et al. Concordance of Whole-Genome Long-Read Sequencing with Standard Clinical Testing for Prader-Willi and Angelman Syndromes. J. Mol. Diagnostics 27, 166–176 (2025).
Logsdon, G. A. et al. The variation and evolution of complete human centromeres. Nature 629, 136–145 (2024).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
Porubsky, D. et al. Human de novo mutation rates from a four-generation pedigree reference. Nature 1–10 https://doi.org/10.1038/s41586-025-08922-2 (2025).
Sanders, S. J. et al. Multiple Recurrent De Novo CNVs, Including Duplications of the 7q11.23 Williams Syndrome Region, Are Strongly Associated with Autism. Neuron 70, 863–885 (2011).
Turner, T. N. et al. Genomic Patterns of De Novo Mutation in Simplex Autism. Cell 171, 710–722.e12 (2017).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. 201178 Preprint at https://doi.org/10.1101/201178 (2018).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
Noyes, M. D. et al. Long-read sequencing of trios reveals increased germline and postzygotic mutation rates in repetitive DNA. 2025.07.18.665621 Preprint at https://doi.org/10.1101/2025.07.18.665621 (2025).
Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. 42, 1571–1580 (2024).
English, A. C., Menon, V. K., Gibbs, R. A., Metcalf, G. A. & Sedlazeck, F. J. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol. 23, 271 (2022).
Gustafson, J. A. et al. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. 2024.03.05.24303792 Preprint at https://doi.org/10.1101/2024.03.05.24303792 (2024).
Schloissnig, S. et al. Structural variation in 1,019 diverse humans based on long-read sequencing. Nature 644, 442–452 (2025).
Hennick, K. et al. Sex differences in the developing human cortex intersect with genetic risk of neurodevelopmental disorders. 2025.09.04.674293 Preprint at https://doi.org/10.1101/2025.09.04.674293 (2025).
Banerjee-Basu, S. & Packer, A. SFARI Gene: an evolving database for the autism research community. Dis. Models Mechanisms 3, 133–135 (2010).
Birtele, M. et al. Non-synaptic function of the autism spectrum disorder-associated gene SYNGAP1 in cortical neurogenesis. Nat. Neurosci. 26, 2090–2103 (2023).
Hardwick, S. A. et al. Delineation of large deletions of the MECP2 gene in Rett syndrome patients, including a familial case with a male proband. Eur. J. Hum. Genet 15, 1218–1229 (2007).
Poleg, T. et al. Unraveling MECP2 structural variants in previously elusive Rett syndrome cases through IGV interpretation. npj Genom. Med. 10, 23 (2025).
Zaghlula, M. et al. Current clinical evidence does not support a link between TBL1XR1 and Rett syndrome: Description of one patient with Rett features and a novel mutation in TBL1XR1, and a review of TBL1XR1 phenotypes. Am. J. Med. Genet. Part A 176, 1683–1687 (2018).
Tillotson, R. & Bird, A. The molecular basis of MeCP2 function in the brain. J. Mol. Biol. 432, 1602–1623 (2020).
Stessman, H. A. F. et al. Disruption of POGZ is associated with intellectual disability and autism spectrum disorders. Am. J. Hum. Genet. 98, 541–552 (2016).
Rosa, E., Silva, I., Smetana, J. H. C. & de Oliveira, J. F. A comprehensive review on DDX3X liquid phase condensation in health and neurodevelopmental disorders. Int. J. Biol. Macromolecules 259, 129330 (2024).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Abyzov, A., Urban, A. E., Snyder, M. & Gerstein, M. CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 21, 974–984 (2011).
Roller, E., Ivakhno, S., Lee, S., Royce, T. & Tanner, S. Canvas: versatile and scalable detection of copy number variants. Bioinformatics 32, 2375–2377 (2016).
Chen, S. et al. Paragraph: a graph-based structural variant genotyper for short-read sequence data. Genome Biol. 20, 291 (2019).
Sierra, A. Y. et al. CPT1c Is Localized in endoplasmic reticulum of neurons and has carnitine palmitoyltransferase activity. J. Biol. Chem. 283, 6878–6885 (2008).
Kępka, A. et al. Potential Role of L-Carnitine in Autism Spectrum Disorder. J. Clin. Med. 10, 1202 (2021).
Reid, K. M. & Brown, G. C. LRPAP1 is released from activated microglia and inhibits microglial phagocytosis and amyloid beta aggregation. Front. Immunol. 14, 1286474 (2023).
Prjibelski, A. D. et al. Accurate isoform discovery with IsoQuant using long reads. Nat. Biotechnol. 41, 915–918 (2023).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
Miller, D. E. et al. Targeted long-read sequencing identifies missing disease-causing variation. Am. J. Hum. Genet. 108, 1436–1449 (2021).
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
Guo, H. et al. Genome sequencing identifies multiple deleterious variants in autism patients with more severe phenotypes. Genet. Med. 21, 1611–1620 (2019).
Trost, B. et al. Genome-wide detection of tandem DNA repeats that are expanded in autism. Nature 586, 80–86 (2020).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Pedersen, B. S. et al. Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches. Genome Med. 12, 62 (2020).
Dolzhenko, E. et al. Characterization and visualization of tandem repeats at genome scale. Nat. Biotechnol. 42, 1606–1614 (2024).
Sui, Y. et al. Using the linear references from the pangenome to discover missing autism variants. GitHub/Zenodo repository. https://doi.org/10.5281/zenodo.18149644 (2026).
Rhie, A. et al. The complete sequence of a human Y chromosome. Nature 621, 344–354 (2023).

Acknowledgements

We thank all of the individuals who participated in this research. We also thank all contributing investigators to the consortia datasets used here from SSC, SAGE, and Baylor College of Medicine, and the families who participated in this research, without whose contributions, genetic studies would be impossible. We thank Tonia Brown for assistance in editing this manuscript. We thank Tom Mokveld (PacBio) for helpful discussions. This work was supported, in part, by the US National Institutes of Health (NIH R01MH101221 to E.E.E.; R01NS057819 to H.Y.Z.; 1F32HD116501-01 to R.M-S.; and DP5OD033357 to D.E.M.) and the Simons Foundation (SFARI #810018 to E.E.E., H.Y.Z., T.J.N., and A.C.). E.E.E. and H.Y.Z. are investigators of the Howard Hughes Medical Institute. This work was also supported, in part, by the National Natural Science Foundation of China (82201314 and 82471194) to T.W. We would like to acknowledge the National Human Genome Research Institute (NHGRI) for funding the following grants supporting the creation of the human pangenome reference: U41HG010972, U01HG010971, U01HG013760, U01HG013755, U01HG013748, U01HG013744, R01HG011274, and the Human Pangenome Reference Consortium (BioProject ID: PRJNA730823). This article is subject to HHMI’s Open Access to Publications policy. HHMI lab heads have previously granted a nonexclusive CC BY 4.0 license to the public and a sublicensable license to HHMI in their research articles. Pursuant to those licenses, the author-accepted manuscript of this article can be made freely available under a CC BY 4.0 license immediately upon publication.

Author information

Authors and Affiliations

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Yang Sui, Jiadong Lin, Michelle D. Noyes, Youngjun Kwon, Isaac Wong, Nidhi Koundinya, William T. Harvey, Mei Wu, Kendra Hoekzema, Katherine M. Munson, Gage H. Garcia, Jordan Knuth, Julie Wertz, Marcelo Ayllon, Lingbin Ni, David Porubsky, Luyao Ren, Andrew B. Stergachis, DongAhn Yoo & Evan E. Eichler
Department of Medical Genetics, Center for Medical Genetics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
Tianyun Wang
Neuroscience Research Institute, Peking University; Key Laboratory for Neuroscience, Ministry of Education of China & National Health Commission of China, Beijing, China
Tianyun Wang
Autism Research Center, Peking University Health Science Center, Beijing, China
Tianyun Wang
Department of Neurological Surgery, University of California, San Francisco, CA, USA
Kelsey Hennick & Tomasz J. Nowakowski
Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
Kelsey Hennick & Tomasz J. Nowakowski
Center for Human Genetics and Genomics, New York University Grossman School of Medicine, New York, NY, USA
Druha Karunakaran, Hanna Berk-Rauch & Aravinda Chakravarti
Department of Human and Molecular Genetics, Baylor College of Medicine, Houston, TX, USA
Rafael A. Polo Prieto, Rebecca Meyer-Schuman, Fisher Cherry, Davut Pehlivan & Huda Y. Zoghbi
Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, TX, USA
Rafael A. Polo Prieto, Rebecca Meyer-Schuman, Fisher Cherry, Davut Pehlivan & Huda Y. Zoghbi
Texas Children’s Hospital, Houston, TX, USA
Davut Pehlivan & Bernhard Suter
Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
Davut Pehlivan & Bernhard Suter
Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
Jonas A. Gustafson, Danny E. Miller & Tina Graves-Lindsay
Molecular and Cellular Biology Program, University of Washington, Seattle, Washington, USA
Jonas A. Gustafson
Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
Danny E. Miller
Department of Neuroscience and Physiology, New York University Grossman School of Medicine, New York, NY, USA
Aravinda Chakravarti
Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
Huda Y. Zoghbi
Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA
Huda Y. Zoghbi
Department of Neurology, Baylor College of Medicine, Houston, TX, USA
Huda Y. Zoghbi
Howard Hughes Medical Institute, Baylor College of Medicine, Houston, TX, USA
Huda Y. Zoghbi
Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
Luyao Ren & Evan E. Eichler
McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, 63108, USA
Derek Albracht, Lucinda Antonacci-Fulton, Sarah Cody, Robert S. Fulton, John E. Garza, Edward A. Belter Jr, Milinn Kremitzki, Juan F. Macias-Velasco, Christopher Markovic, Chad Tomlinson & Ting Wang
Department of Human Molecular Genetics and Biochemistry, Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
Ivan A. Alexandrov
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Jamie Allen, Jitender Cheema, Adam Frankish, Mallory A. Freeberg, Leanne Haggerty, S. Nakib Hossain, Sarah E. Hunt, Toby Hunt, Jane E. Loveland, Fergal J. Martin, Jonathan M. Mudge, Swati Sinha, Marie-Marthe Suner, Jack A. S. Tierney & Francesca Floriana Tricomi
Center for Applied and Translational Genomics (CATG), Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai Health, Dubai, UAE
Alawi A. Alsheikh-Ali, Mohammad Amiruddin Hashmi, Nasna Nassir & Mohammed Uddin
Department of Genetics, Stanford University, Palo Alto, CA, 94304, USA
Nicolas Altemose, Danilo Dubocanin & Alexander G. Ioannidis
Department of Genetics, Washington University School of Medicine, St. Louis, MO, 63110, USA
Casey Andrews, Zheng Dong, Qichen Fu, Juan Jiang, Milinn Kremitzki, Heather A. Lawson, Daofeng Li, Tianjie Liu, Juan F. Macias-Velasco, Ting Wang, Zilan Xin, Zheng Xu, Wenjin Zhang & Xiaoyu Zhuo
Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
Dmitry Antipov, Nancy F. Hansen, Juhyun Kim, Sergey Koren, Adam M. Phillippy, Brandon D. Pickett, Arang Rhie, Steven J. Solar & Brian P. Walenz
UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, 95060, USA
Mobin Asri, Halle D. Bender, Andrew P. Blair, Ann M. Mc Cartney, Monika Cechova, Xian Chang, Hiram Clawson, Mark Diekhans, Jordan M. Eizenga, Parsa Eskandar, Joshua M. V. Gardner, Maximilian Haeussler, David Haussler, Prajna Hebbar, Glenn Hickey, Todd L. Hillaker, Alexander G. Ioannidis, Nafiseh Jafarzadeh, Ryan Lorig-Roach, Hailey Loucks, Julian K. Lucas, Mira Mastoras, Brandy McNulty, Julian M. Menendez, Karen H. Miga, Shloka Negi, Adam M. Novak, Benedict Paten, Anandi Radhakrishnan, Brian J. Raney, Jouni Sirén & Ivo Violich
The Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, 10065, USA
Jennifer R. Balacco, Giulio Formenti, Nivesh Jain, Erich D. Jarvis, Bonhwang Koo, Jack A. Medico, Sadye Paez, Marco Sollitto & Conor V. Whelan
Bioinnovation and Genome Sciences, The Translational Genomics Research Institute (TGen), Phoenix, AZ, 85004, USA
Floris P. Barthel, Andrea Guarracino, Yue Hao, Maryam Jehangir & T. Rhyker Ranallo-Benavidez
Human Technopole, Milan, Italy
Davide Bolognini, Clelia Peano, Alessandro Raveane, Nicole Soranzo & Giulia Zunino
Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
Katherine E. Bonini, Eimear E. Kenny, Ruhollah Shemirani & Lisa E. Wang
Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, 32611, USA
Christina Boucher, Eddie Ferro & Rahul Varki
Canadian Center for Computational Genomics, McGill University, Montréal, QC, H3A 0G1, Canada
Guillaume Bourque
Department of Human Genetics, McGill University, Montréal, QC, H3A 0G1, Canada
Guillaume Bourque
Victor Phillip Dahdaleh Institute of Genomic Medicine, Montréal, QC, H3A 0G1, Canada
Guillaume Bourque
Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163, USA
Silvia Buonaiuto, Shuo Cao, Vincenza Colonna, Erik Garrison, Andrea Guarracino, Franco L. Marsico, Laura Pignata, Pjotr Prins, Farnaz Salehi & Flavia Villani
Google LLC, Mountain View, CA, 94043, USA
Andrew Carroll, Pi-Chuan Chang & Kishwar Shafin
Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
Haoyu Cheng
Department of Biology, University of Florence, Sesto Fiorentino, FI, 50019, Italy
Claudio Ciofi, Maria Angela Diroma, Chiara Natali & Marco Sollitto
Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, 95060, USA
Holland C. Conwell, Sarah M. Ford, Richard E. Green, Samuel Sacco, William E. Seligmann & Matteo Tommaso Ungaro
Arizona State University, Consortium for Science, Policy & Outcomes, Washington, DC, 20006, USA
Robert Cook-Deegan
Center for Digital Medicine, Heinrich Heine University Düsseldorf NRW, Düsseldorf, DE, Germany
Daniel Doerr, Jana Ebler, Peter Heringer, Tobias Marschall & Arda Söylev
Department for Endocrinology and Diabetology at the Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf NRW, Düsseldorf, DE, Germany
Daniel Doerr & Peter Heringer
Paul-Langerhans-Group Computational Diabetology, German Diabetes Center (DDZ) and Leibniz Institute for Diabetes Research NRW, Düsseldorf, DE, Germany
Daniel Doerr & Peter Heringer
Wellcome Sanger Institute, Genome Campus, Hinxton, CB10 1RQ, UK
Richard Durbin & Nicole Soranzo
Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
Richard Durbin
Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University NRW, Düsseldorf, DE, Germany
Jana Ebler, Tobias Marschall & Arda Söylev
Howard Hughes Medical Institute, Chevy Chase, MD, 20815, USA
Evan E. Eichler & Erich D. Jarvis
ISEM, Univ Montpellier, CNRS, IRD, Montpellier, France
Anna-Sophie Fiston-Lavier, Capucine Mayoud & Shadi Shahatit
Institut Universitaire de France, Paris, France
Anna-Sophie Fiston-Lavier
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, 92093, USA
Willard W. Ford & Aarushi Sehgal
Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, 98195, USA
Stephanie M. Fullerton
Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
Yan Gao, Neng Huang, Heng Li, Maximillian G. Marin & Ying Zhou
Department of Anthropology, University of Kansas, Lawrence, KS, 66045, USA
Obed A. Garcia
School of Health Sciences, University of Manchester, Manchester, M13 9PL, UK
Shilpa Garg
Traditional, ancestral and unceded territory of the Gabrielino/Tongva peoples, Institute for Society & Genetics, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Nanibaa’ A. Garrison
Traditional, ancestral and unceded territory of the Gabrielino/Tongva peoples, Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Nanibaa’ A. Garrison
Traditional, ancestral and unceded territory of the Gabrielino/Tongva peoples, Division of General Internal Medicine & Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
Nanibaa’ A. Garrison
Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, 94720, USA
Margarita Geleta
Medical and Population Genomics Lab, Sidra Medicine, Doha, Qatar
Mohammadmersad Ghorbani, Younes Mokrab & Shabir Moosa
Montreal Heart Institute, Montréal, QC, Canada
Cristian Groza
Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
Melissa Gymrek
Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
Ira M. Hall, Wen-Wei Liao & Shuangjia Lu
Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
Ira M. Hall, Wen-Wei Liao & Shuangjia Lu
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Neng Huang, Heng Li & Maximillian G. Marin
The Center for Bio- and Medical Technologies, Moscow, Russia
Jonathan LoTempio Jr & Fedor Ryabov
Department of Evolution and Ecology and the Center for Population Biology, University of California, One Shields, Davis, CA, 95616, USA
Charles H. Langley
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Ben Langmead, Michael C. Schatz & Vikram S. Shivakumar
Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Glennis A. Logsdon
Sun Yat-sen University, Guangzhou, China
Jianguo Lu
Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, 63110, USA
Juan F. Macias-Velasco & Ting Wang
Department of Biology and Center for Medical Genomics, Penn State University, University Park, PA, 16802, USA
Kateryna D. Makova
Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, 98195, USA
Anna Minkina, Andrew B. Stergachis & Mitchell R. Vollger
Coriell Institute for Medical Research, Camden, NJ, 08103, USA
Matthew W. Mitchell & Laura B. Scheinfeldt
Department of Biology, Penn State University, University Park, PA, 16802, USA
Saswat K. Mohanty & Linnéa Smeds
Department of Biomedical Science, College of Health Sciences, Qatar University, Doha, Qatar
Younes Mokrab
Department of Genetic Medicine, Weill Cornell Medicine-Qatar, Doha, Qatar
Younes Mokrab
IRSD - Digestive Health Research Institute, University of Toulouse, INSERM, INRAE, ENVT, UPS, Toulouse, France
Jean Monlong
MATCH biosystems, S.L, Elche, Spain
Avelina Moreno-Ochando
Universidad Miguel Hernández de Elche, Elche, Spain
Avelina Moreno-Ochando
Department of Computational Biology and Medical Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8561, Japan
Shinichi Morishita, Chie Owa & Yoshihiko Suzuki
Department of Computer Science, University of Pisa, Pisa, Italy
Njagi Mwaniki & Nadia Pisanti
Law School, University of Wisconsin-Madison, Madison, WI, 53706, USA
Pilar N. Ossorio
Institute of Genetics and Biomedical Research, UoS of Milan, National Research Council, Milan, Italy
Clelia Peano
Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, DE, Germany
David Porubsky
Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
Mikko Rautiainen
Centre for Biomedical Research and Technology, HSE University, Moscow, Russia
Fedor Ryabov
Department of Biology, Johns Hopkins University, Baltimore, MD, USA
Michael C. Schatz
University of Amsterdam, Amsterdam, Netherlands
Mahsa Shabani
School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
Nicole Soranzo
Center for Genomic Discovery, Mohammed Bin Rashid University, Dubai Health, UAE
Ahmad Abou Tayoun
Dubai Health Genomic Medicine Center, Dubai Health, UAE
Ahmad Abou Tayoun
GenomeArc Inc, Mississauga, ON, Canada
Mohammed Uddin
Department of Biology and Biotechnologies “Charles Darwin”, University of Rome “La Sapienza”, Piazzale Aldo Moro, 00185 RM, Italy
Matteo Tommaso Ungaro
Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, 92350, USA
Charles Wang
PacBio, Menlo Park, CA, 94025, USA
Aaron M. Wenger
The first affiliated hospital of Xi’an Jiaotong University, Xi’an Jiaotong University, Xi’an, Shaanxi, 710049, China
Kai Ye

Authors

Yang Sui
View author publications
Search author on:PubMed Google Scholar
Jiadong Lin
View author publications
Search author on:PubMed Google Scholar
Michelle D. Noyes
View author publications
Search author on:PubMed Google Scholar
Youngjun Kwon
View author publications
Search author on:PubMed Google Scholar
Isaac Wong
View author publications
Search author on:PubMed Google Scholar
Nidhi Koundinya
View author publications
Search author on:PubMed Google Scholar
William T. Harvey
View author publications
Search author on:PubMed Google Scholar
Mei Wu
Kendra Hoekzema
Katherine M. Munson
Gage H. Garcia
Jordan Knuth
Julie Wertz
Tianyun Wang
Kelsey Hennick
Druha Karunakaran
Rafael A. Polo Prieto
Rebecca Meyer-Schuman
Fisher Cherry
Davut Pehlivan
Bernhard Suter
Jonas A. Gustafson
Danny E. Miller
Hanna Berk-Rauch
Tomasz J. Nowakowski
Aravinda Chakravarti
Huda Y. Zoghbi
Evan E. Eichler

Consortia

Human Pangenome Reference Consortium (HPRC)

Derek Albracht, Ivan A. Alexandrov, Jamie Allen, Alawi A. Alsheikh-Ali, Nicolas Altemose, Casey Andrews, Dmitry Antipov, Lucinda Antonacci-Fulton, Mobin Asri, Marcelo Ayllon, Jennifer R. Balacco, Floris P. Barthel, Halle D. Bender, Andrew P. Blair, Davide Bolognini, Katherine E. Bonini, Christina Boucher, Guillaume Bourque, Silvia Buonaiuto, Shuo Cao, Andrew Carroll, Ann M. Mc Cartney, Monika Cechova, Pi-Chuan Chang, Xian Chang, Jitender Cheema, Haoyu Cheng, Claudio Ciofi, Hiram Clawson, Sarah Cody, Vincenza Colonna, Holland C. Conwell, Robert Cook-Deegan, Mark Diekhans, Maria Angela Diroma, Daniel Doerr, Zheng Dong, Danilo Dubocanin, Richard Durbin, Jana Ebler, Evan E. Eichler, Jordan M. Eizenga, Parsa Eskandar, Eddie Ferro, Anna-Sophie Fiston-Lavier, Sarah M. Ford, Willard W. Ford, Giulio Formenti, Adam Frankish, Mallory A. Freeberg, Qichen Fu, Stephanie M. Fullerton, Robert S. Fulton, Yan Gao, Gage H. Garcia, Obed A. Garcia, Joshua M. V. Gardner, Shilpa Garg, Erik Garrison, Nanibaa’ A. Garrison, John E. Garza, Margarita Geleta, Mohammadmersad Ghorbani, Tina Graves-Lindsay, Richard E. Green, Cristian Groza, Andrea Guarracino, Melissa Gymrek, Maximilian Haeussler, Leanne Haggerty, Ira M. Hall, Nancy F. Hansen, Yue Hao, Mohammad Amiruddin Hashmi, David Haussler, Prajna Hebbar, Peter Heringer, Glenn Hickey, Todd L. Hillaker, S. Nakib Hossain, Neng Huang, Sarah E. Hunt, Toby Hunt, Alexander G. Ioannidis, Nafiseh Jafarzadeh, Nivesh Jain, Erich D. Jarvis, Maryam Jehangir, Juan Jiang, Edward A. Belter Jr, Jonathan LoTempio Jr, Eimear E. Kenny, Juhyun Kim, Bonhwang Koo, Sergey Koren, Milinn Kremitzki, Charles H. Langley, Ben Langmead, Heather A. Lawson, Daofeng Li, Heng Li, Wen-Wei Liao, Jiadong Lin, Tianjie Liu, Glennis A. Logsdon, Ryan Lorig-Roach, Hailey Loucks, Jane E. Loveland, Jianguo Lu, Shuangjia Lu, Julian K. Lucas, Juan F. Macias-Velasco, Kateryna D. Makova, Maximillian G. Marin, Christopher Markovic, Tobias Marschall, Franco L. Marsico, Fergal J. Martin, Mira Mastoras, Capucine Mayoud, Brandy McNulty, Jack A. Medico, Julian M. Menendez, Karen H. Miga, Anna Minkina, Matthew W. Mitchell, Saswat K. Mohanty, Younes Mokrab, Jean Monlong, Shabir Moosa, Avelina Moreno-Ochando, Shinichi Morishita, Jonathan M. Mudge, Katherine M. Munson, Njagi Mwaniki, Nasna Nassir, Chiara Natali, Shloka Negi, Lingbin Ni, Adam M. Novak, Pilar N. Ossorio, Chie Owa, Sadye Paez, Benedict Paten, Clelia Peano, Adam M. Phillippy, Brandon D. Pickett, Laura Pignata, Nadia Pisanti, David Porubsky, Pjotr Prins, Anandi Radhakrishnan, T. Rhyker Ranallo-Benavidez, Brian J. Raney, Mikko Rautiainen, Alessandro Raveane, Luyao Ren, Arang Rhie, Fedor Ryabov, Samuel Sacco, Farnaz Salehi, Michael C. Schatz, Laura B. Scheinfeldt, Aarushi Sehgal, William E. Seligmann, Mahsa Shabani, Kishwar Shafin, Shadi Shahatit, Ruhollah Shemirani, Vikram S. Shivakumar, Swati Sinha, Jouni Sirén, Linnéa Smeds, Steven J. Solar, Marco Sollitto, Nicole Soranzo, Andrew B. Stergachis, Marie-Marthe Suner, Yoshihiko Suzuki, Arda Söylev, Ahmad Abou Tayoun, Jack A. S. Tierney, Chad Tomlinson, Francesca Floriana Tricomi, Mohammed Uddin, Matteo Tommaso Ungaro, Rahul Varki, Flavia Villani, Ivo Violich, Mitchell R. Vollger, Brian P. Walenz, Charles Wang, Lisa E. Wang, Ting Wang, Aaron M. Wenger, Conor V. Whelan, Zilan Xin, Zheng Xu, Kai Ye, DongAhn Yoo, Wenjin Zhang, Ying Zhou, Xiaoyu Zhuo & Giulia Zunino

Contributions

Y.S. and E.E.E. conceptualized the study. K. Ho., R.A.P.P., D.P., and B.S. collected samples. K.M.M., K. Ho., G.H.G., and J.K. generated the data. Y.S., Y.K., I.W., and N.K. performed data quality control. Y.S., J.L., M.D.N., I.W., and N.K. conducted the formal analyses. Y.S. and I.W. created the visualizations. Y.S., J.L., W.T.H., and M.W. developed the methodology. K. He., D.K., J.A.G., D.E.M., and the HPRC provided resources. Y.K., I.W., N.K., and J.W. developed software. Y.S., J.L., I.W., T.W., R.A.P.P., R.M.-S., and F.C. performed validation. Y.S. wrote the original draft. Y.S., E.E.E., H.Y.Z., J.L., M.D.N., Y.K., I.W., N.K., W.T.H., M.W., J.W., K. Ho., K.M.M., G.H.G., J.K., T.W., K. He., D.K., R.A.P.P., R.M.-S., F.C., D.P., B.S., J.A.G., D.E.M., T.J.N., A.C., H.B.-R., and the HPRC reviewed and edited the manuscript. E.E.E., H.Y.Z., A.C., and T.J.N. supervised the study.

Corresponding author

Correspondence to Evan E. Eichler.

Ethics declarations

Competing interests

E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. D.E.M. is on SABs at Oxford Nanopore Technologies (ONT) and Basis Genetics, is engaged in research agreements with ONT and PacBio, has received research and travel support from ONT and PacBio, holds stock options in MyOme and Basis Genetics, and is a consultant for MyOme. J.A.G. has received travel support from ONT. H.Y.Z. is a member of the Regeneron Board of Directors and an advisory board member to The Column Group, Cajal Therapeutics (also co-founder), and Lyterian. D.P. provides consulting services to Ionis Pharmaceuticals, M2DS Therapeutics, and Acadia Pharmaceuticals. All other authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Olaf Riess and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sui, Y., Lin, J., Noyes, M.D. et al. Using the linear references from the pangenome to discover missing autism variants. Nat Commun (2026). https://doi.org/10.1038/s41467-026-68378-4

Received: 11 August 2025
Accepted: 06 January 2026
Published: 23 January 2026
DOI: https://doi.org/10.1038/s41467-026-68378-4