Abstract
An inherited, expanded CAG repeat in HTT undergoes further somatic expansion to cause Huntington’s disease (HD). To gain insights into this molecular mechanism, we compared genome-wide association studies of somatic expansion in blood and somatic expansion-driven HD clinical phenotypes. Here, we show that somatic expansion is driven by a mismatch repair-related process whose genetic modification and consequences show unexpected complexity, including cell-type specificity. The HD clinical trajectory is further modified by non-DNA repair genes that differentially influence measures of cognitive and motor dysfunction. In addition to shared (DNA repair genes MSH3, PMS2 and FAN1) and distinct trans-modifiers, a synonymous CAG-adjacent variant in HTT dramatically hastens motor onset without increasing somatic expansion, while a cis-acting 5′-untranslated region variant promotes blood repeat expansion without influencing clinical HD. Our findings are directly relevant to the therapeutic suppression of expansion in DNA repeat disorders and provide additional clues to HD pathogenic mechanisms beyond somatic expansion.
Data availability
Summary GWAS results are available at the GWAS Catalog (https://www.ebi.ac.uk/gwas) with accession numbers GCST90565346–GCST90565359. Two separate somatic expansion GWAS based on MiSeq (GCST90565346) and fragment sizing of the CAG repeat (GCST90565347) and multiple clinical GWAS files are provided. These include age at TFC-6 (GCST90565351), age at TFC-6–European ancestry only (GCST90565350), age at motor onset (GCST90565349) and age at the quantile landmarks for Total Motor Score (GCST90565359), Symbol Digit Modalities Test (GCST90565357), Stroop Word (GCST90565358), Bradykinesia (GCST90565352) and Oculomotor Function (GCST90565355), which were included in the data presentation. We also provide summary results from GWAS of age at DCL-4 (which is highly correlated with age at TFC-6; GCST90565348) and age at quantile landmarks for three additional subscores of the Total Motor Score—Chorea (GCST90565353), Rigidity (GCST90565356) and Dystonia (GCST90565354)—which proved less informative and were not discussed here. Each of 14 GWAS summary files contains chromosome, base_pair_location, effect_allele, other_allele, beta, standard_error, effect_allele_frequency and p_value. Data underlying Fig. 2c–f are provided here as Source Data. The individual-level genetic and phenotypic data from this study cannot be deposited into a public or controlled repository because of confidentiality restrictions imposed by the General Data Protection Regulation (GDPR) of the European Union and the CHDI Foundation, sponsor of the Enroll-HD Platform. The individual-level genotype data, phenotype data and MiSeq sequence data are freely available to qualified investigators given their institutional assurance of subject confidentiality and compliance with GDPR requirements regarding personal data. Investigators requesting individual-level data should email info@chdifoundation.org with the words ‘GWAS123456 data’ in the subject line. Source data are provided with this paper.
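As an illustration of how the summary files described above might be read, the following is a minimal sketch in Python. It assumes the files are tab-delimited text with a header row that carries the eight column names listed above; the delimiter, the function name `significant_snvs` and the example rows are assumptions made for illustration and are not taken from the deposited files. The 5.0E-08 threshold is the genome-wide significance level quoted in the figure legends.

```python
import csv
import io

GENOME_WIDE_P = 5.0e-8  # genome-wide significance level quoted in the legends


def significant_snvs(handle, threshold=GENOME_WIDE_P, delimiter="\t"):
    """Yield parsed rows whose p_value is below the threshold.

    Assumes a header row with the column names listed in the Data
    availability statement; the tab delimiter is an assumption.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row in reader:
        try:
            p = float(row["p_value"])
        except ValueError:
            continue  # skip rows with a non-numeric p value such as "NA"
        if p < threshold:
            yield {
                "chromosome": row["chromosome"],
                "base_pair_location": int(row["base_pair_location"]),
                "effect_allele": row["effect_allele"],
                "other_allele": row["other_allele"],
                "beta": float(row["beta"]),
                "standard_error": float(row["standard_error"]),
                "effect_allele_frequency": float(row["effect_allele_frequency"]),
                "p_value": p,
            }


# Made-up rows for illustration only; these are not real results.
EXAMPLE = (
    "chromosome\tbase_pair_location\teffect_allele\tother_allele\t"
    "beta\tstandard_error\teffect_allele_frequency\tp_value\n"
    "5\t1000\tA\tG\t0.50\t0.05\t0.30\t1.0e-12\n"
    "15\t2000\tT\tC\t0.01\t0.05\t0.10\t0.8\n"
    "4\t3000\tC\tT\t0.02\t0.05\t0.20\tNA\n"
)

hits = list(significant_snvs(io.StringIO(EXAMPLE)))
```

For a downloaded file, the same function could be called on an opened file handle instead of the in-memory example.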
Code availability
Code used in this project is available at Zenodo (https://doi.org/10.5281/zenodo.14885069 (ref. 41), https://doi.org/10.5281/zenodo.14865581 (ref. 43), https://doi.org/10.5281/zenodo.14860920 (ref. 47) and https://doi.org/10.5281/zenodo.10825847 (ref. 48)) and in the Supplementary Information.
References
Huntington’s Disease Collaborative Research Group. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72, 971–983 (1993).
Genetic Modifiers of Huntington’s Disease Consortium. CAG repeat not polyglutamine length determines timing of Huntington’s disease onset. Cell 178, 887–900.e14 (2019).
Wright, G. E. B. et al. Length of uninterrupted CAG, independent of polyglutamine size, results in increased somatic instability, hastening onset of Huntington disease. Am. J. Hum. Genet. 104, 1116–1126 (2019).
Ciosi, M. et al. A genetic association study of glutamine-encoding DNA sequence structures, somatic CAG expansion, and DNA repair gene variants, with Huntington disease clinical outcomes. EBioMedicine 48, 568–580 (2019).
Genetic Modifiers of Huntington’s Disease Consortium. Identification of genetic factors that modify clinical onset of Huntington’s disease. Cell 162, 516–526 (2015).
Lee, J. M., MacDonald, M. E. & Gusella, J. F. Inherited HTT CAG repeat length does not have a major impact on Huntington disease duration. Am. J. Hum. Genet. 109, 1338–1340 (2022).
Kennedy, L. et al. Dramatic tissue-specific mutation length increases are an early molecular event in Huntington disease pathogenesis. Hum. Mol. Genet. 12, 3359–3367 (2003).
Matlik, K. et al. Cell-type-specific CAG repeat expansions and toxicity of mutant Huntingtin in human striatum and cerebellum. Nat. Genet. 56, 383–394 (2024).
Pressl, C. et al. Selective vulnerability of layer 5a corticostriatal neurons in Huntington’s disease. Neuron 112, 924–941.e10 (2024).
Swami, M. et al. Somatic expansion of the Huntington’s disease CAG repeat in the brain is associated with an earlier age of disease onset. Hum. Mol. Genet. 18, 3039–3047 (2009).
Hong, E. P. et al. Huntington’s disease pathogenesis: two sequential components. J. Huntingtons Dis. 10, 35–51 (2021).
McAllister, B. et al. Exome sequencing of individuals with Huntington’s disease implicates FAN1 nuclease activity in slowing CAG expansion and disease onset. Nat. Neurosci. 25, 446–457 (2022).
McLean, Z. L. et al. Splice modulators target PMS1 to reduce somatic expansion of the Huntington’s disease-associated CAG repeat. Nat. Commun. 15, 3182 (2024).
Kristmundsdottir, S. et al. Sequence variants affecting the genome-wide rate of germline microsatellite mutations. Nat. Commun. 14, 3855 (2023).
Huntington Study Group. Unified Huntington’s Disease Rating Scale: reliability and consistency. Mov. Disord. 11, 136–142 (1996).
Shoulson, I. & Fahn, S. Huntington disease: clinical care and evaluation. Neurology 29, 1–3 (1979).
Marder, K. et al. Rate of functional decline in Huntington’s disease. Huntington Study Group. Neurology 54, 452–458 (2000).
Lee, J. M. et al. Genetic modifiers of Huntington disease differentially influence motor and cognitive domains. Am. J. Hum. Genet. 109, 885–899 (2022).
Nicolas, E., Golemis, E. A. & Arora, S. POLD1: central mediator of DNA replication and repair, and implication in cancer and other pathologies. Gene 590, 128–141 (2016).
Nakatsubo, T. et al. Human mediator subunit MED15 promotes transcriptional activation. Drug Discov. Ther. 8, 212–217 (2014).
Yang, F. et al. An ARC/Mediator subunit required for SREBP control of cholesterol and lipid homeostasis. Nature 442, 700–704 (2006).
Sandhu, H. K., Hollenbeck, N., Wassink, T. H. & Philibert, R. A. An association study of PCQAP polymorphisms and schizophrenia. Psychiatr. Genet. 14, 169–172 (2004).
Lobanov, S. V. et al. Huntington’s disease age at motor onset is modified by the tandem hexamer repeat in TCERG1. NPJ Genom. Med. 7, 53 (2022).
Yuan, Z. Q. et al. Polymorphisms and HNPCC: PMS2–MLH1 protein interactions diminished by single nucleotide polymorphisms. Hum. Mutat. 19, 108–113 (2002).
Kim, K. H. et al. Genetic and functional analyses point to FAN1 as the source of multiple Huntington disease modifier effects. Am. J. Hum. Genet. 107, 96–110 (2020).
Kim, K. H. et al. Posttranscriptional regulation of FAN1 by miR-124-3p at rs3512 underlies onset-delaying genetic modification in Huntington’s disease. Proc. Natl Acad. Sci. USA 121, e2322924121 (2024).
Wheeler, V. C. & Dion, V. Modifiers of CAG/CTG repeat instability: insights from mammalian models. J. Huntingtons Dis. 10, 123–148 (2021).
Mouro Pinto, R. et al. In vivo CRISPR–Cas9 genome editing in mice identifies genetic modifiers of somatic CAG repeat instability in Huntington’s disease. Nat. Genet. 57, 314–322 (2025).
Lee, J. et al. An upstream open reading frame impedes translation of the huntingtin gene. Nucleic Acids Res. 30, 5110–5119 (2002).
Handsaker, R. E. et al. Long somatic DNA-repeat expansion drives neurodegeneration in Huntington disease. Cell 188, 623–639.e19 (2025).
Dawson, J. et al. A probable cis-acting genetic modifier of Huntington disease frequent in individuals with African ancestry. HGG Adv. 3, 100130 (2022).
Gipson, T. A., Neueder, A., Wexler, N. S., Bates, G. P. & Housman, D. Aberrantly spliced HTT, a new player in Huntington’s disease pathogenesis. RNA Biol. 10, 1647–1652 (2013).
Hoschek, F. et al. Huntingtin HTT1a is generated in a CAG repeat-length-dependent manner in human tissues. Mol. Med. 30, 36 (2024).
Bruneau, B. G. & Nora, E. P. Chromatin domains go on repeat in disease. Cell 175, 38–40 (2018).
Rudich, P., Watkins, S. & Lamitina, T. PolyQ-independent toxicity associated with novel translational products from CAG repeat expansions. PLoS ONE 15, e0227464 (2020).
Schwartz, J. L., Jones, K. L. & Yeo, G. W. Repeat RNA expansion disorders of the nervous system: post-transcriptional mechanisms and therapeutic strategies. Crit. Rev. Biochem. Mol. Biol. 56, 31–53 (2021).
Depienne, C. & Mandel, J. L. 30 years of repeat expansion disorders: What have we learned and what are the remaining challenges? Am. J. Hum. Genet. 108, 764–785 (2021).
Sathe, S. et al. Enroll-HD: an integrated clinical research platform and worldwide observational study for Huntington’s disease. Front. Neurol. 12, 667420 (2021).
Langbehn, D. R., Sathe, S. S., Loy, C., Sampaio, C. & McCusker, E. A. A phenotypic atlas for Huntington disease based on data from the Enroll-HD Cohort Study. Neurol. Genet. 9, e200111 (2023).
Warner, J. P., Barron, L. H. & Brock, D. J. A new polymerase chain reaction (PCR) assay for the trinucleotide repeat that is unstable and expanded on Huntington’s disease chromosomes. Mol. Cell. Probes 7, 235–239 (1993).
Correia, K. HTT-Characterization: htt_trinucleotide_characterize. Zenodo https://doi.org/10.5281/zenodo.14885069 (2025).
Ciosi, M. et al. Library preparation and MiSeq sequencing for the genotyping-by-sequencing of the Huntington disease HTT exon one trinucleotide repeat and the quantification of somatic mosaicism. Protoc. Exch. https://doi.org/10.21203/rs.2.1581/v2 (2018).
Loay, H. et al. RGT: V1.0. Zenodo https://doi.org/10.5281/zenodo.14865581 (2025).
Schobel, S. A. et al. Motor, cognitive, and functional declines contribute to a single progressive factor in early HD. Neurology 89, 2495–2502 (2017).
Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
Turner, S. qqman: an R package for visualizing GWAS results using Q–Q and Manhattan plots. J. Open Source Softw. 3, 731 (2018).
Ciosi, M. GWA6_BloodSomaticExpansionRatioPhenotype. Zenodo https://doi.org/10.5281/zenodo.14860920 (2025).
McLean, Z. H. instability: Natcomms. Zenodo https://doi.org/10.5281/zenodo.10825847 (2024).
Bellenguez, C. et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat. Genet. 54, 412–436 (2022).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Network Pathway Analysis Subgroup of Psychiatric Genomics Consortium. Psychiatric genome-wide association study analyses implicate neuronal, immune and histone pathways. Nat. Neurosci. 18, 199–209 (2015).
Lee, J. M. et al. A modifier of Huntington’s disease onset at the MLH1 locus. Hum. Mol. Genet. 26, 3859–3867 (2017).
Acknowledgements
This work was supported by the CHDI Foundation (J.F.G., M.E.M., D.G.M. and P.H.), National Institutes of Health (NS082079 (J.F.G.), NS091161 (J.F.G.), NS016367 (J.F.G.), NS049206 (V.C.W.), NS105709 (J.-M.L.), NS119471 (J.-M.L.), NS114065 (I.S.S.), NS127866 (V.C.W., I.S.S.) and NS126420 (R.M.P.)), the Hereditary Disease Foundation (Z.L.M., R.M.P.) and the Medical Research Council Centre for Neuropsychiatric Genetics and Genomics (MR/L010305/1 (P.H.)). T.H.M. was supported by a Clinician Scientist Fellowship from the Medical Research Council, UK (MR/X018253/1). The funders had no role in data collection and analysis or the decision to publish. Two scientific collaborators who are advisors to the CHDI Foundation were involved in the conceptualization of the study or editing the final manuscript. We thank the Harvard Tissue Bank at McLean’s Hospital, the National Neurological Research Bank at the Veterans Administration in Los Angeles, the New York Brain Bank at Columbia University and the Massachusetts Alzheimer’s Disease Resource Center Bank at the Massachusetts General Hospital for postmortem brain tissue from HD patients. We also thank the Simons Foundation Powering Autism Research study for nuclear family sequencing data. Biosamples and data used in this work were also generously provided by the participants in the Enroll-HD study and made available by the CHDI Foundation. Enroll-HD is a clinical research platform and longitudinal observational study for families of individuals with HD that is intended to accelerate progress towards therapeutics; it is sponsored by the CHDI Foundation, a nonprofit biomedical research organization exclusively dedicated to collaboratively developing therapeutics for HD. 
Enroll-HD and the previous contributing HD studies of the Huntington Study Group, the European Huntington Disease Network and the Massachusetts HD Center Without Walls would not be possible without the vital contribution of the research participants and their families. We also thank those individuals who contributed to the collection of the Enroll-HD data, listed at https://enroll-hd.org/enrollhd_documents/ENROLL-HD_AcknowledgementsListPDS6_v1.0_20230119.pdf, and to previous HD studies, listed in their supplementary material sections (refs. 5, 53).
Author information
Contributions
J.F.G., M.E.M., J.-M.L., V.C.W., D.G.M., L.J., P.H. and S.K. conceived the project. J.-M.L., Z.L.M., K.C., J.D.L., T.G., V.C.W., A.G., M.C., V.L. and P.H. developed the methodology. D.L., M.O., G.B.L., J.S.P., E.R.D. and R.H.M. acquired the resources. J.-M.L., V.C.W., R.M.P., Z.L.M., K.C., J.W.S., S.L., J.-H.J., Y.L., K.-H.K., D.E.C., J.D.L., I.S.S., R.M.P., J.V.G., J.S.M., J.S., E.E., J.R., T.G., A.G., M.C., V.L. and C.W. performed the investigations. J.-M.L., Z.L.M., K.C., V.C.W., R.M.P., A.G., M.C. and V.L. handled data curation. J.-M.L., Z.L.M., K.C., J.W.S., S.L., J.-H.J., Y.L., A.G., M.C., V.L. and C.W. performed the formal analysis. J.F.G., M.E.M., D.G.M., L.J. and P.H. were responsible for project administration. J.F.G. and M.E.M. wrote the original draft of the manuscript; J.F.G., M.E.M., J.-M.L., Z.L.M., J.D.L., V.C.W., D.G.M., A.G., M.C., V.L., C.W., T.H.M., P.H., S.K., C.S., M.O., G.B.L., J.S.P., E.R.D. and R.H.M. reviewed and edited the final version. J.-M.L., Z.L.M., V.C.W., A.G., M.C. and V.L. visualized the project. H.L., M.C., Z.L.M. and J.-M.L. acquired the software. J.F.G., M.E.M., J.-M.L., V.C.W., D.G.M., L.J. and P.H. supervised the work. J.F.G., M.E.M., J.-M.L., V.C.W., D.G.M., L.J. and P.H. acquired funding.
Ethics declarations
Competing interests
J.F.G. and V.C.W. were founding scientific advisory board members with a financial interest in Triplet Therapeutics. Their financial interests were reviewed and are managed by Massachusetts General Hospital (MGH) and MGB in accordance with their conflict of interest policies. J.F.G. consults for Transine Therapeutics (dba Harness Therapeutics), has previously provided paid consulting services to Wave Therapeutics USA, Biogen and Pfizer, and receives research funding from Pfizer. V.C.W. is a scientific advisory board member of LoQus23 Therapeutics and has provided paid consulting services to Acadia Pharmaceuticals, Alnylam, Biogen, Passage Bio and Rgenta Therapeutics. V.C.W. and R.M.P. have received research support from Pfizer. J.-M.L. consults for GenKOre and serves on the scientific advisory board of GenEdit. Within the last 36 months, D.G.M. has been a scientific consultant and/or received honoraria or grants from AMO Pharma, Dyne, F. Hoffmann–La Roche, LoQus23, MOMA Therapeutics, Novartis, Ono Pharmaceuticals, Pfizer Pharmaceuticals, Rgenta Therapeutics, Sanofi, Sarepta Therapeutics, Script Biosciences, Triplet Therapeutics and Vertex Pharmaceuticals. D.G.M. also had research contracts with AMO Pharma and Vertex Pharmaceuticals. J.D.L. is a paid Advisory Board member for F. Hoffmann–La Roche and uniQure biopharma and is a paid consultant for Vaccinex, Wave Life Sciences USA, Genentech, Triplet and PTC Therapeutics. T.H.M. is an associate member of the scientific advisory board of LoQus23 Therapeutics and has consulted for Transine Therapeutics (dba Harness Therapeutics). P.H. is a member of the Enroll-HD Scientific Review Committee. S.K. and C.S. are employed by CHDI Management as advisors to the CHDI Foundation. E.R.D. has provided consulting services to Abbott, Abbvie, Acadia, Acorda Therapeutics, Biogen, Biohaven Pharmaceuticals, BioSensics, Boehringer Ingelheim, Caraway Therapeutics, Cerevance, CuraSen, DConsult2, Denali Therapeutics, Eli Lilly, Genentech, Health & Wellness Partners, HMP Education, Karger, KOL groups, Life Sciences Consultant, Mediflix, Medrhythms, Merck, Mitsubishi Tanabe Pharma America, MJH Holdings, NACCME, Novartis, Otsuka, Praxis Medicine, Sanofi, Seelos Therapeutics, Spark Therapeutics, Springer Healthcare, Theravance Biopharmaceuticals and WebMD. He has received grant support from Averitas Pharma, Biogen, Burroughs Wellcome Fund, Pfizer, Photopharmics and Roche and has an ownership interest in Included Health, Mediflix, SemCap and Synapticure. G.B.L. has provided consulting services, scientific advice, scientific advisory board functions, independent data monitoring committee services and/or lectures for Acadia Pharmaceuticals, Affiris, Allergan, Alnylam, Amarin, AOP Orphan Pharmaceuticals, Bayer Pharma, Boehringer Ingelheim, CHDI Foundation, Deutsche Huntington-Hilfe, Desitin, Genentech, Genzyme, GlaxoSmithKline, F. Hoffmann–La Roche, Ipsen, ISIS Pharma (IONIS), Lilly, Lundbeck, Medesis, Medivation, Medtronic, NeuraMetrix, Neurosearch, Novartis, Pfizer, Prana Biotechnology, Prilenia, PTC Therapeutics, Raptor, Remix Therapeutics, Rhône-Poulenc Rorer, Roche Pharma AG Deutschland, Sage Therapeutics, Sanofi-Aventis, Sangamo/Shire, Siena Biotech, Takeda, Temmler Pharma GmbH, Teva, Triplet Therapeutics, Trophos, UniQure and Wave Life Sciences. R.H.M. is a paid advisory board member for Rgenta Therapeutics and is on the scientific advisory board with financial interests in Gatehouse Bio. All other authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Matt Danzi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Infrequent SNVs capture an extended haplotype tagging some CAA/CCA-loss alleles.
a. For all HD participants with HTT alleles defined by sequence analysis (n = 7,553), age at motor onset association results are plotted for all SNVs > 0.1% MAF in the terminal 5 Mb of chromosome 4p16.3, containing HTT. Each triangle represents a test SNV with significance shown as two-sided p-values from linear mixed effect models with 5.0E-08 considered genome-wide significant (dashed line). The triangle orientation reflects the direction of effect (upward pointing = delaying; downward pointing = hastening). For each of two significant modifier effects, the top SNV and all other SNVs showing r2 > 0.8 with the top SNV are shown as filled triangles: red = motor onset-hastening effect; blue = motor onset-delaying effect. Because the age at TFC-6 (and all other algorithmically predicted landmarks) dataset was based on CAG genotyping in the clinical studies and not on sequencing, the algorithmically predicted phenotypes could not be analyzed in this region without bias due to differences with sequence-determined CAG lengths for non-canonical alleles. b. Association results for age at motor onset are plotted as in panel A, after conditioning on all non-canonical sequences, which removed all significant signals. c. Association results after conditioning on all non-canonical sequences except the CAACAG-dup allele, revealing surviving signal for SNVs shown as blue triangles that tag HD chromosomes with the CAACAG-dup sequence. d. Association results after conditioning on all non-canonical sequences except the CAA/CCA-loss allele, revealing surviving signal for SNVs shown as an extended haplotype of red triangles that tag some HD chromosomes with the CAA/CCA-loss sequence. Conditioning on all non-canonical alleles except CAA-loss (e) or CCA-loss (f) removed all significant SNV signals indicating no shared ancestral haplotype.
Extended Data Fig. 2 GWAS of somatic CAG expansion based on the Peak Proportional Sum (PPS) method from ABI fragment sizing traces.
In parallel to the MiSeq-based somatic expansion GWAS, we carried out a GWAS for this molecular phenotype using 5,342 ABI DNA sizing traces collected from standard PCR fragment-length genotyping of study participants. In an approach analogous to the SER, we assessed CAG expansion using the Peak Proportional Sum (PPS) method, which takes the ratio of summed peak heights longer than the modal CAG length divided by the height of the modal repeat signal. Correcting for CAG length, age at collection, and their interaction term, we performed a PPS GWAS. a. GWAS of somatic CAG expansion measured by the PPS method applied to ABI blood CAG sizing traces. Each point represents a test SNV at >1% MAF with significance based on two-sided p-values from linear mixed effect models with 5.0E-08 considered genome-wide significant (dashed black line) after correcting for the approximate number of multiple independent tests in a standard GWAS. b. An example of ABI traces from a CAACAG-dup allele (top) and a canonical allele (bottom), both carrying uninterrupted repeat lengths of 44 CAGs. The top panel reveals a mispriming artefact that scores as CAG expansion in the PPS method, leading to spurious signal at HTT in the GWAS of panel A. c. Association analysis of the PPS CAG expansion phenotype conditioned on rs183415333, which tags the non-canonical CAACAG-dup allele on HD chromosomes (Extended Data Fig. 1), is shown for SNVs with MAF > 1% in the HTT region with each triangle representing a test SNV at >1% MAF with significance based on two-sided p-values from linear mixed effect models with 5.0E-08 considered genome-wide significant (dashed red line). Triangle size reflects the relative MAF and orientation reflects the direction of effect (upward pointing=increased expansion; downward pointing=decreased expansion). 
Conditioning removes the spurious signal and leaves rs146151652 (red triangle) as the top SNV, the same HTT 5’-UTR SNV detected in the MiSeq somatic CAG expansion GWAS (Fig. 3b, Extended Data Table 1).
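The PPS calculation described in this legend can be sketched in a few lines of Python. This is an illustrative sketch only, not the consortium's code: it assumes a trace has already been reduced to a mapping from CAG repeat length to peak height for the expanded allele alone, and the function name and example peak heights are hypothetical.

```python
def peak_proportional_sum(peak_heights):
    """Return summed heights of peaks longer than the mode / modal peak height.

    peak_heights: dict mapping CAG repeat length (int) to peak height (float),
    assumed to cover only the expanded allele of one individual.
    """
    if not peak_heights:
        raise ValueError("no peaks supplied")
    # The modal repeat length is the one with the tallest peak.
    modal_length = max(peak_heights, key=peak_heights.get)
    modal_height = peak_heights[modal_length]
    if modal_height <= 0:
        raise ValueError("modal peak height must be positive")
    # Sum only the peaks that are longer than the modal repeat length.
    expanded = sum(h for length, h in peak_heights.items() if length > modal_length)
    return expanded / modal_height


# Hypothetical trace with a modal length of 44 CAGs.
trace = {42: 100.0, 43: 400.0, 44: 1000.0, 45: 300.0, 46: 100.0, 47: 50.0}
pps = peak_proportional_sum(trace)  # (300 + 100 + 50) / 1000
```

Because the ratio counts any signal to the right of the modal peak, an artefact peak of the kind shown in panel b would inflate the value, which is consistent with the spurious HTT signal that the legend describes.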
Extended Data Fig. 3 Comparison of GWAS results across ancestry and low MAF.
For comparison with each other and with Fig. 4a, which presented GWAS across all ancestries at MAF > 1%, results are plotted similarly with each point representing a test SNV and significance shown as -log10(p-value) based on two-sided p-values from linear mixed effect models with 5.0E-08 considered genome-wide significant (dashed line) after correcting for the approximate number of multiple independent tests in a standard GWAS. a. GWAS of age at TFC-6 only in individuals of European ancestry (n = 11,185) at SNV MAF > 1%. b. GWAS of age at TFC-6 only in individuals of European ancestry (n = 11,185) at SNV MAF > 0.1%. c. GWAS of age at TFC-6 in all individuals regardless of ancestry (n = 11,698) at SNV MAF > 0.1%. The overall pattern of results did not differ substantially across these analyses except for infrequent SNVs (MAF < 1%) in Europeans. Although a few infrequent SNVs have previously detected strong, reproducible, significant effects at established loci with multiple modifier haplotypes such as MSH3, FAN1, and LIG1 and at HTT, most isolated genome-wide significant modifier signals have proved inconsistent across previous studies. We proposed that infrequent SNVs can produce spurious signals when present in a few phenotypic outliers. In the current GWAS, infrequent SNVs that newly achieved significance in the European-only analysis (panel B) had a higher reported MAF in African-ancestry individuals. These SNVs were no longer significant after the inclusion of the non-Europeans, suggesting that they were due to phenotypic outliers among the Europeans. Beyond the infrequent SNVs reported previously at MSH3, FAN1 and LIG1 and at HTT (Extended Data Table 2 and Extended Data Fig. 1), no infrequent SNVs achieved genome-wide significance in the all-population age at TFC-6 GWAS. 
Because the age at TFC-6 (and all other algorithmically predicted landmarks) dataset was based on CAG genotyping in the clinical studies and not on sequencing, the algorithmically predicted phenotypes could not be analyzed without bias in the HTT region due to differences with sequence-determined CAG lengths for non-canonical alleles.
Extended Data Fig. 4 Loci with multiple clinical modifier effects.
Regional association results are shown for the age at TFC-6 GWAS (Fig. 4a) in the regions of PMS1 (a), MSH3 (b), PMS2 (c), FAN1 (d), and LIG1 (e), with genes in each region shown below the plot. Each triangle represents a test SNV at MAF > 1% with significance shown as two-sided p-values from linear mixed effect models with 5.0E-08 considered genome-wide significant (dashed line). For each distinguishable modifier effect detected by conditional analyses, the top SNV and all other SNVs showing r2 > 0.8 with the top SNV are shown as filled triangles while all other SNVs are represented by unfilled triangles. The size of each triangle reflects the SNV MAF, which can be judged by comparison with the triangles representing top tag SNVs for distinguishable modifier effects in Extended Data Table 2. The triangle orientation reflects the direction of effect (upward pointing = delaying; downward pointing = hastening).
Extended Data Fig. 5 GWAS of other clinical landmarks.
GWAS results for HD clinical landmarks are plotted with each point representing a test SNV at > 1% MAF and significance shown as two-sided p-values from linear mixed effect models with 5.0E-08 considered genome-wide significant (dashed line) after correcting for the approximate number of multiple independent tests in a standard GWAS. Clinical landmarks (defined as described in Methods) for Bradykinesia (11 items; n = 11,704) and Oculomotor Function (6 items; n = 11,308), subscores of the Total Motor Score, and the Stroop Word score (n = 11,373) exhibited relative differences in detection of modifier loci. Inflation factors were 0.997, 0.993 and 1.01, respectively. We also performed GWAS for three other Total Motor Score subscores: Chorea (7 items), Rigidity (2 items), and Dystonia (5 items), but these proved to be relatively uninformative, detecting only the FAN1 locus as genome-wide significant in the first two instances. For comparison with these phenotypes and those in Fig. 4a, we also performed GWAS for age at motor onset (n = 12,892) and algorithmically predicted age at Diagnostic Confidence Level 4 (DCL-4; n = 11,408) corresponding to >99% certainty of clinical manifestations being those of HD. These plots emphasize the increased power provided by algorithmic prediction of landmarks from longitudinal data compared with expert rater assessments. Summary statistics for all phenotypes are available in the publicly available dataset from this paper.
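The legend does not say how the inflation factors were computed. A common definition, shown here as a sketch under that assumption, is the genomic inflation factor: the median of the 1-degree-of-freedom chi-square statistics implied by the two-sided p-values, divided by the expected median under the null (about 0.455). The function name and the synthetic p-values are illustrative.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(p_values):
    """Genomic inflation factor from two-sided p-values in the interval (0, 1]."""
    # A two-sided p-value p corresponds to z with Phi(z) = p / 2 (lower tail);
    # the squared z-score is the 1-df chi-square statistic.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic, evenly spaced p-values stand in for a well-calibrated GWAS.
uniform_p = [(i + 0.5) / 999 for i in range(999)]
lam = inflation_factor(uniform_p)
```

Under this definition, values close to 1, such as those reported above, indicate little systematic inflation of the test statistics.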
Supplementary information
Supplementary Information
Four items of code/settings.
Supplementary Table 1
(Table 1) Pathway analyses for clinical and somatic expansion GWAS. Supplementary Data Table 1 shows enrichment of gene-wide P values (18,919 genes) for all gene sets (10,043 sets) for age at DCL-4 (0kb-DCL-4), age at TFC-6 (0kb-TFC-6), age at onset (0kb-onset), blood somatic expansion ratio (0kb-SomExp), along with ages at the late clinical landmarks for Bradykinesia (brady_0kb), Total Motor Score (motscore_0kb), Oculomotor Function (oculo_0kb), Symbol Digit Modalities Test (sdmt_0kb), Stroop Word (stroopwo_0kb), Chorea (chorea_0kb), Dystonia (dystonia_0kb) and Rigidity (rigidity_0kb). P values achieving nominal significance of P < 0.05 are denoted by red font and those achieving P < 0.05 after Bonferroni correction for 10,043 pathways are highlighted in yellow. Columns with ‘_nomismatch’ indicate the same analysis after removing the DNA repair genes identified as modifiers in Extended Data Table 2. Columns with ‘3510’ indicate that a window of 35 kb upstream/10 kb downstream was used to assign SNPs to genes. Gene sets involving DNA mismatch repair were the only Bonferroni significant results (based on correction for 10,043 pathways tested), consistent with the timing of these phenotypes being driven by the somatic expansion of the HTT CAG repeat. When the significant DNA maintenance genes from Extended Data Table 2 (FAN1, MSH3, PMS1, PMS2, MLH1, LIG1 and POLD1) were excluded from the analysis to focus on pathways that could reflect aspects of the subsequent cellular toxicity mechanism, the top results, GO:0032371 “regulation of sterol transport” and GO:0032374 “regulation of cholesterol transport” (both P = 7.7 × 10⁻⁶ with age at TFC-6) and GO:0010644 “cell communication by electrical coupling” (P = 2.1 × 10⁻⁵ with age at motor onset), did not survive Bonferroni correction. (Table 2) MiSeqEAoligos. A total of 40 different MiSeq-compatible PCR primers containing full-length Illumina adapters indexed with Nextera XT Index Kit v2 indexes. 
In the oligonucleotide sequence, the full-length Illumina indexed adapter is italicized with Nextera XT Index in bold, the heterogeneity spacer is in unformatted text in smaller letters and the HTT locus-specific primer sequence is underlined.
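The Bonferroni criterion used for Supplementary Table 1 can be checked with simple arithmetic. The sketch below, in Python, takes the 10,043 gene sets, the 0.05 significance level and the three reported P values from the description of Table 1; the function and variable names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


N_GENE_SETS = 10_043  # number of gene sets tested, from the Table 1 description
threshold = bonferroni_threshold(0.05, N_GENE_SETS)  # roughly 4.98e-6

# Top gene sets after excluding the DNA maintenance genes, with reported P values.
reported = {
    "GO:0032371 regulation of sterol transport": 7.7e-6,
    "GO:0032374 regulation of cholesterol transport": 7.7e-6,
    "GO:0010644 cell communication by electrical coupling": 2.1e-5,
}

# A gene set survives correction only if its P value is below the threshold.
survivors = [name for name, p in reported.items() if p < threshold]
```

All three reported P values exceed the corrected threshold, so `survivors` is empty, in agreement with the statement that these gene sets did not survive Bonferroni correction.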
Source data
Source Data Fig. 2
Source data for Fig. 2c and Fig. 2d–f.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Genetic Modifiers of Huntington’s Disease (GeM-HD) Consortium. Genetic modifiers of somatic expansion and clinical phenotypes in Huntington’s disease highlight shared and tissue-specific effects. Nat Genet 57, 1426–1436 (2025). https://doi.org/10.1038/s41588-025-02191-5
DOI: https://doi.org/10.1038/s41588-025-02191-5