Somatic CAG repeat expansion in blood associates with biomarkers of neurodegeneration in Huntington’s disease decades before clinical motor diagnosis

Scahill, Rachael I.; Farag, Mena; Murphy, Michael J.; Hobbs, Nicola Z.; Leocadi, Michela; Langley, Christelle; Knights, Harry; Ciosi, Marc; Fayer, Kate; Nakajima, Mitsuko; Thackeray, Olivia; Gobom, Johan; Rönnholm, John; Weiner, Sophia; Hassan, Yara R.; Ponraj, Nehaa K. P.; Estevez-Fraga, Carlos; Parker, Christopher S.; Malone, Ian B.; Hyare, Harpreet; Long, Jeffrey D.; Heslegrave, Amanda; Sampaio, Cristina; Zhang, Hui; Robbins, Trevor W.; Zetterberg, Henrik; Wild, Edward J.; Rees, Geraint; Rowe, James B.; Sahakian, Barbara J.; Monckton, Darren G.; Langbehn, Douglas R.; Tabrizi, Sarah J.

doi:10.1038/s41591-024-03424-6

Article
Open access
Published: 17 January 2025

Nature Medicine volume 31, pages 807–818 (2025)

Abstract

Huntington’s disease (HD) is an autosomal dominant neurodegenerative disease with the age at which characteristic symptoms manifest strongly influenced by inherited HTT CAG length. Somatic CAG expansion occurs throughout life and understanding the impact of somatic expansion on neurodegeneration is key to developing therapeutic targets. In 57 HD gene expanded (HDGE) individuals, ~23 years before their predicted clinical motor diagnosis, no significant decline in clinical, cognitive or neuropsychiatric function was observed over 4.5 years compared with 46 controls (false discovery rate (FDR) > 0.3). However, cerebrospinal fluid (CSF) markers showed very early signs of neurodegeneration in HDGE with elevated neurofilament light (NfL) protein, an indicator of neuroaxonal damage (FDR = 3.2 × 10⁻¹²), and reduced proenkephalin (PENK), a surrogate marker for the state of striatal medium spiny neurons (FDR = 2.6 × 10⁻³), accompanied by brain atrophy, predominantly in the caudate (FDR = 5.5 × 10⁻¹⁰) and putamen (FDR = 1.2 × 10⁻⁹). Longitudinal increase in somatic CAG repeat expansion ratio (SER) in blood was a significant predictor of subsequent caudate (FDR = 0.072) and putamen (FDR = 0.148) atrophy. Atypical loss of interruption HTT repeat structures, known to predict earlier age at clinical motor diagnosis, was associated with substantially faster caudate and putamen atrophy. We provide evidence in living humans that the influence of CAG length on HD neuropathology is mediated by somatic CAG repeat expansion. These critical mechanistic insights into the earliest neurodegenerative changes will inform the design of preventative clinical trials aimed at modulating somatic expansion. ClinicalTrials.gov registration: NCT06391619.

Main

Huntington’s disease (HD) is a devastating condition characterized by loss of striatal medium spiny neurons (MSNs) and striatal neurodegeneration¹ leading to impaired motor, cognitive and neuropsychiatric function which typically manifests in middle age, with clinical diagnosis defined by the appearance of unequivocal HD-related motor signs. There are currently no disease-modifying treatments².

HD is an autosomal dominant disorder and is caused by an expanded CAG repeat of ≥40 in the huntingtin gene (HTT) coding for polyglutamine in the mutant huntingtin protein (mHTT), which is the presumed toxic entity leading to neuronal dysfunction and death. It is well established that inherited CAG repeat length has a strong influence on age at clinical motor diagnosis³. Notably, the HTT repeat is somatically unstable⁴ and expansions of tens or even hundreds of repeats are observed in the most vulnerable striatal neurons^5,6,7,8; greater somatic expansion occurs with longer initial CAG length. Evidence indicating that faster individual-specific rates of somatic expansion in brain are associated with earlier clinical motor diagnosis and faster disease progression⁹ strongly suggests that somatic expansion is a key mechanism explaining the CAG effect on disease progression. Indeed, it has been suggested that somatic expansion is required to generate pathology, and that HD involves two thresholds as follows: first, the inherited CAG length that leads to further somatic expansion, and second, the intracellular pathogenic threshold above which neuronal dysfunction and death occur^10,11,12,13. Consistent with this, a recent postmortem study suggests that neurons may experience decades of ‘biologically quiet’ somatic CAG repeat expansion with neuronal damage triggered by a cascade of repeat-length dependent transcriptional dysregulation events only when the CAG reaches a threshold of ~150 repeats⁸. Further understanding the dynamics of somatic expansion directly in the brain is hampered by the nonavailability of brain biopsy material from young living participants. Although somatic CAG expansion is clearly cell-type dependent^6,7,8, faster individual-specific rates of somatic expansion in blood DNA are also associated with earlier clinical motor diagnosis¹⁴, suggesting that individual-specific somatic expansion rates in blood DNA are at least partially predictive of individual-specific somatic expansion rates in the brain. This hypothesis is supported by genetic modifier studies that reveal a panoply of DNA repair gene variants as modifiers of both HTT somatic expansion and HD clinical phenotypes^{13,14,15,16,17}.

The polyglutamine-encoding CAG repeat tract in HTT is followed just downstream by a polymorphic polyproline-encoding CCG repeat. Typically, the intervening sequence between the CAG and CCG repeat tracts comprises a glutamine-encoding CAACAG cassette and a proline-encoding CCGCCA cassette. However, a number of atypical HTT repeat structures have been identified with loss of either or both of the intervening CAACAG or CCGCCA cassettes associated with an earlier age at clinical motor diagnosis; conversely, duplication of the CAACAG cassette delays this milestone^{13,14,17,18,19}. These data reveal that both HD age at clinical motor diagnosis and the somatic expansion potential of the repeat are best predicted by pure CAG repeat length, rather than encoded polyglutamine length, providing additional support for a key role for somatic expansion in driving disease onset^13,14,18.

The monogenic nature of HD and the existence of diagnostic and predictive testing for at-risk family members make it a tractable disease, and much progress has been made towards developing disease-modifying treatments². The first phase 1/2 trial of an antisense oligonucleotide (ASO), tominersen, showed dose-dependent lowering of mutant huntingtin levels²⁰. Although the subsequent phase 3 trial was halted early due to safety concerns²¹, a phase 2 study to better establish safety and tolerability earlier in disease progression is ongoing (ClinicalTrials.gov registration: NCT05686551). Alternative approaches such as allele-specific huntingtin-lowering, protein splicing modulation, and gene therapy are also currently being trialed (reviewed in ref. ²²). Additionally, somatic expansion and the proteins that modify it, such as MSH3 and FAN1, are now being actively pursued as therapeutic targets in HD. A key question in using such therapies will be determining the optimal timing for treatment. The appearance of HD motor signs is already accompanied by substantial striatal neurodegeneration, and earlier treatment seems likely to produce greater clinical benefit. However, all the studies to date have relied on postmortem brain analyses to model the link between CAG repeat expansion and the earliest pathological progression of the disease. Understanding the triggers of the neurodegenerative process is vital in the search for future therapies and in identifying the best time to intervene.

The greatest opportunity to influence disease progression lies in early treatment, with the goal of delaying or preventing clinical motor diagnosis. Numerous large observational studies show that brain changes occur decades from predicted clinical motor diagnosis^23,24,25 and that subtle cognitive and motor signs emerge as HD gene expanded (HDGE) individuals approach clinical motor diagnosis. The recent introduction of the HD Integrated Staging System (HD-ISS) provides a new empirical framework for classifying people with HD throughout life²⁶, with stage 0 being the HDGE group with striatal volumes within the general population range, stage 1 being the presence of a biomarker of pathogenesis (caudate and/or putamen volume change), stage 2 being the presence of motor and/or cognitive signs and stage 3 being marked by the onset of functional impairment²⁶. Cohorts in the earliest stages will likely gain the most benefit from preventative therapies.
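The staging logic described above can be sketched as a simple decision rule. This is a hypothetical simplification for illustration only: the function name and its boolean inputs are assumptions, the real HD-ISS relies on quantitative landmark cutoffs that are not given in this text, and the sketch simply returns the most advanced landmark reached.

```python
def hd_iss_stage(striatal_biomarker_abnormal: bool,
                 motor_or_cognitive_signs: bool,
                 functional_impairment: bool) -> int:
    """Return an HD-ISS-like stage (0-3) for an HDGE individual.

    Hypothetical simplification: each landmark is a pre-computed boolean,
    and the stage is the most advanced landmark that has been reached.
    """
    if functional_impairment:
        return 3  # onset of functional impairment
    if motor_or_cognitive_signs:
        return 2  # motor and/or cognitive signs present
    if striatal_biomarker_abnormal:
        return 1  # caudate and/or putamen volume change
    return 0      # striatal volumes within the general population range


# Illustrative individuals (made-up flags, not study data)
examples = {
    "volumes in general-population range": hd_iss_stage(False, False, False),
    "striatal volume change only": hd_iss_stage(True, False, False),
    "motor or cognitive signs": hd_iss_stage(True, True, False),
    "functional impairment": hd_iss_stage(True, True, True),
}
```

Under these assumptions, the four illustrative individuals map to stages 0, 1, 2 and 3, respectively.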

A key challenge in delivering preventative treatments is to identify and validate robust measures in HD-ISS stages 0 and 1, where the absence of outward signs of impairment renders established motor and cognitive testing batteries insensitive. HD Young Adult Study (HD-YAS) is a unique cohort, ~23 years from predicted clinical motor diagnosis at baseline with deep phenotyping including biofluid, imaging, clinical, cognitive and motor assessments. Our cross-sectional baseline data demonstrated subtle elevations in biofluid biomarkers, such as cerebrospinal fluid (CSF) neurofilament light (NfL), accompanied by slightly smaller putamen volumes in the HDGE group compared to unaffected controls²⁵. Despite this, there was no difference in functional performance between the groups. This cohort, therefore, spans an optimum window for investigating the potential of interventions to delay or prevent symptoms.

Here we present 4.5-year follow-up data from HD-YAS, a deep-phenotyped longitudinal study of young stages 0 and 1 HDGE adults, ~19 years before clinical motor diagnosis. We hypothesized that the effects of somatic expansion in the brain might be detected long before clinical motor onset and tested this hypothesis through detailed longitudinal analysis of preclinical HD phenotypes, biomarkers of neurodegeneration and somatic expansion in blood DNA. We examined change over time in a range of assessments with the aim of identifying ongoing neuropathology and associations with somatic CAG expansion in blood DNA and HTT repeat structures, decades before predicted clinical motor diagnosis, and biomarkers of disease progression, which may have utility in future prevention trials.

Results

Participant characteristics

A total of 131 (64 HDGE and 67 controls) participants attended at baseline and 103 (57 HDGE and 46 controls) returned for follow-up ~4.5 years later (see Extended Data Fig. 1 for reasons for dropout). To account for those not returning, we recruited 23 new participants (9 HDGE and 14 controls) giving a total of 154 participants (73 HDGE and 81 controls). At baseline, 44 (81%) participants of the cohort were in HD-ISS stage 0, 9 (17%) in stage 1 and 1 (2%) in stage 2 (Fig. 1a). Over 4.5 years, 10 (~23%) participants moved from stage 0 to stage 1; there was no progression to stage 2. The transition in staging within the HD-YAS cohort is depicted by overlaying the probability matrix for each HD-ISS stage across different ages for individuals with a mean CAG repeat length of 42, comparable to the mean CAG repeat length of our cohort (Fig. 1b). Here we describe further longitudinal results from the participants; cross-sectional results, updated from the original baseline study, are provided in Supplementary Results and Discussion.

**Fig. 1: Longitudinal change in clinical, cognitive and neuropsychiatric measures.**

There were no significant differences (false discovery rate (FDR) > 0.15) between the HDGE and control groups in age, sex, interval between visits, education score or National Adult Reading Test score (a measure of premorbid intelligence; Extended Data Table 1).

Cognitive and neuropsychiatric assessments

There was no significant longitudinal disease-related decline in any of the comprehensive cognitive (FDR > 0.8; Fig. 1c) or neuropsychiatric (FDR > 0.3; Fig. 1d) assessments, demonstrating that change in the HDGE group was no different from matched controls. Cross-sectional results are shown in Supplementary Fig. 1 and summary statistics are provided in Supplementary Tables 1 and 2 for longitudinal and Supplementary Tables 3 and 4 for cross-sectional results.

Neuroimaging

After quality control, longitudinal data were available for 88 (54 HDGE and 34 controls) participants for volumetric imaging, 83 (50 HDGE and 33 controls) for diffusion-weighted imaging (DWI) and 75 (43 HDGE and 32 controls) for multiparametric mapping (MPM). As left-handed participants were excluded, 70 (43 HDGE and 27 controls) participants were available for the structural connectivity analysis. See Supplementary Table 5 and Supplementary Methods for further details.

The HDGE group showed significantly greater rates of atrophy in putamen (P = 4.0 × 10⁻¹⁰, FDR = 1.2 × 10⁻⁹) and caudate (P = 1.1 × 10⁻¹⁰, FDR = 5.5 × 10⁻¹⁰). There were also significant group differences for gray matter (P = 7.5 × 10⁻³, FDR = 9.4 × 10⁻³), white matter (P = 1.4 × 10⁻², FDR = 1.4 × 10⁻²) and whole brain (P = 7.1 × 10⁻⁴, FDR = 1.2 × 10⁻³) with associated ventricular expansion (P = 3.9 × 10⁻⁵, FDR = 9.8 × 10⁻⁵; Fig. 2). Caudate, putamen and white matter loss were significantly predicted by age and CAG (P = 2.1 × 10⁻⁷, FDR = 1.0 × 10⁻⁶; P = 1.5 × 10⁻⁸, FDR = 8.9 × 10⁻⁸; P = 0.01, FDR = 0.012, respectively).
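Each result above is reported as a raw P value paired with an FDR-adjusted value. As a minimal sketch of how such an adjustment can work, the block below implements the Benjamini-Hochberg procedure. This is an assumption: the text here does not state which FDR method was used or how tests were grouped into families, and the raw P values in the example are made up. The 0.15 cutoff mirrors the threshold used throughout the Results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR), in input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for one family of five tests (not study data)
raw = [0.001, 0.008, 0.039, 0.041, 0.27]
fdr = benjamini_hochberg(raw)

# Flag results at the FDR < 0.15 threshold used in the text
significant = [q < 0.15 for q in fdr]
```

Because the adjustment depends on how many tests are grouped together, the same raw P value can map to different FDR values in different analyses.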

**Fig. 2: Annualized changes in volumetric measures longitudinally.**

DWI demonstrated elevated rates of longitudinal change in all diffusion and neurite orientation dispersion and density imaging metrics across multiple regions of interest in the HDGE group compared to controls (FDR < 0.15). The splenium of the corpus callosum, the anterior capsule and the external capsule showed associations with age and CAG (FDR < 0.15). There were no significant between-group differences in the rate of change for any of the structural connectivity (all FDR > 0.4) or MPM measures (all FDR > 0.3), nor any evidence of an influence of age and CAG (all FDR > 0.15).

Neuroimaging results suggest that across HD-ISS stages 0 and 1, there are already elevated rates of brain atrophy accompanied by subtle microstructural white matter changes. See Extended Data Tables 2 and 3 for summary statistics for longitudinal volumetric and diffusion results, respectively. Summary statistics for remaining longitudinal metrics are provided in Supplementary Tables 6 and 7 and cross-sectional data in Supplementary Tables 8–11.

Biofluids

A total of 216 biofluid samples were collected across baseline and follow-up visits over the 4.5-year interval. Paired fasting CSF and plasma samples were acquired in 86 (53 HDGE and 33 controls) of the 103 (83.5%) longitudinal participants.

From a significantly increased baseline, CSF NfL (Fig. 3a) and CSF YKL-40 (also known as chitinase-3-like protein 1 (CHI3L1)) (Fig. 3c) rose more rapidly in HDGE compared to controls (P = 3.2 × 10⁻¹³, FDR = 3.2 × 10⁻¹² and P = 0.01, FDR = 0.056, respectively). New to this timepoint, proenkephalin (PENK), a surrogate marker for striatal MSN state, measured in CSF, showed a significant longitudinal reduction in HDGE individuals compared to controls (P = 4.4 × 10⁻⁴, FDR = 2.6 × 10⁻³; Fig. 3b). An increase in plasma NfL was nonsignificant (P = 0.336, FDR = 0.669; Extended Data Fig. 2).

**Fig. 3: Annualized changes in biofluid markers longitudinally.**

Cross-sectionally, log concentrations of both CSF NfL (P = 5.4 × 10⁻³⁰, FDR = 6.5 × 10⁻²⁹) and PENK (P = 1.7 × 10⁻⁷, FDR = 1.0 × 10⁻⁶) were highly associated with age, CAG length and their interaction. There was also weak evidence for an influence of age and CAG on longitudinal change in CSF NfL (P = 0.027, FDR = 0.322) and PENK (P = 0.0547, FDR = 0.328), which did not reach the FDR threshold for significance. Plasma NfL had a similar cross-sectional association (P = 7.2 × 10⁻⁷, FDR = 2.9 × 10⁻⁶) but no significant longitudinal association with age and CAG. Regression coefficients are reported in Supplementary Tables 12–14.

Slightly higher annualized rates of change in NfL in CSF and plasma were observed in the HDGE group at stage 0 compared to stage 1 on follow-up but did not reach the threshold of significance (FDR > 0.15). Mean CSF NfL levels (across both visits) were higher in HD-ISS progressors (stage 0 to 1: mean = 6.89 pg ml⁻¹, log scale) compared to nonprogressors (stage 0 to 0: mean = 6.11 pg ml⁻¹, log scale; stage 1 to 1: mean = 6.37 pg ml⁻¹, log scale; Supplementary Table 15). After adjusting for age, sex and their interaction, the difference between stage 0 to 1 progressors and stage 0 nonprogressors was statistically significant (P = 0.0004). Similarly, the difference between stage 0 to 1 progressors and stage 1 nonprogressors was significant (P = 0.045) when controlling for age and sex, but nonsignificant without these adjustments. No significant differences were observed for plasma NfL levels (Supplementary Table 16).

CSF mHTT levels were notably lower than in later disease stages²⁷, with only 38.3% (n = 41/107) of samples exceeding the lower limit of quantification and demonstrating an acceptable coefficient of variation below 30% (Supplementary Fig. 2).

The rate of change in other biofluid markers, including plasma NfL, CSF and plasma tau, CSF and plasma glial fibrillary acidic protein (GFAP), CSF and plasma ubiquitin carboxyl-terminal hydrolase L1 (UCH-L1), and CSF interleukin-6 (IL-6) and IL-8, showed no significant differences between groups (Extended Data Fig. 2). Additionally, none of these fluid biomarkers, including plasma NfL, showed a longitudinal association with age, CAG or age-by-CAG interaction (FDR > 0.15). See Supplementary Table 17 for longitudinal and Supplementary Table 18 for cross-sectional summary statistics.

Somatic expansion ratios in blood

Significant longitudinal increases in the somatic expansion ratio (SER) were detected in blood DNA in the HDGE group over 4.5 years (P = 2.0 × 10⁻⁸), with SER clearly increasing as early as HD-ISS stage 0 (Fig. 4a). SER rates of change were strongly influenced by an accelerating effect of CAG repeat length (P = 3.0 × 10⁻⁵).

**Fig. 4: Effects of somatic expansion.**

HTT allele structures

The majority of the HDGE group exhibited the typical HTT repeat structure on their expanded allele (n = 66, 91.6%), while a small subset (n = 6) showed atypical allelic variations (Fig. 5a). Specifically, the CAACAG duplication was observed in 1 (1.4%) participant, the CAACAGCCGCCA double loss was found in 4 (5.6%) and 1 (1.4%) had the CCGCCA loss.

**Fig. 5: Effects of CAG architecture and allelic variants.**

Predictors of progression

Baseline NfL, both plasma and CSF, and CSF PENK were predictors of atrophy over time in all brain regions (all FDR < 0.04), even after controlling for the effect of age and CAG (all FDR < 0.12; Extended Data Table 4). Rate of change in caudate and putamen was most strongly associated with change in CSF NfL (P = 3.0 × 10⁻⁴, FDR = 0.003 and P = 2.2 × 10⁻⁴, FDR = 0.003, respectively) and plasma NfL (P = 0.002, FDR = 0.01 and P = 0.03, FDR = 0.06, respectively) and the association remained after controlling for age and CAG effects (all FDR < 0.09). Rates of change in caudate and putamen were also associated with longitudinal change in CSF PENK before (P = 2.0 × 10⁻⁴, FDR = 0.003 and P = 1.0 × 10⁻⁴, FDR = 0.001, respectively) and after (P = 0.002, FDR = 0.021 and P = 9.0 × 10⁻⁴, FDR = 0.011, respectively) age-by-CAG correction.

Longitudinal increase in SER was a significant predictor of the rate of subsequent caudate volume change before (P = 0.01, FDR = 0.04) and after age-by-CAG correction (P = 0.03, FDR = 0.07; Fig. 4b). Longitudinal increase in SER was also a significant predictor of the rate of subsequent putamen volume change before (P = 0.02, FDR = 0.07) and after (P = 0.049, FDR = 0.148) age-by-CAG correction (Fig. 4c). Baseline SER was strongly associated with cross-sectional levels of CSF NfL (P = 2.5 × 10⁻¹², FDR = 2.9 × 10⁻¹¹; Fig. 4d) and CSF PENK (P = 8.4 × 10⁻⁵, FDR = 3.4 × 10⁻⁴; Fig. 4e) before age-by-CAG correction. However, these associations did not remain significant after the correction (CSF NfL—P = 0.827, FDR = 0.956; CSF PENK—P = 0.908, FDR = 0.956).

After controlling for CAG, age, age-by-CAG, sex and SER effects, compared to typical allele structure, the loss of CAACAG CCGCCA atypical allele had significant effects on rates of caudate (P = 1.90 × 10⁻⁵; Fig. 5b.i) and putamen (P = 0.007; Fig. 5b.ii) atrophy as well as cross-sectional CSF NfL (P = 0.002; Fig. 5b.iii) and CSF PENK (P = 0.001; Fig. 5b.iv) levels, with the loss of the intervening CAACAG CCGCCA associated with an accelerated neurodegenerative course (Extended Data Fig. 3). Notably, after correction for pure CAG length, there was no detectable association between atypical allele structure and SER (FDR > 0.15).

Sample size calculations

Extended Data Table 5 shows hypothetical sample size calculations for those variables with significant longitudinal effects in the HDGE group. For a 50% treatment effect over 2 years in stages 0 and 1, total sample sizes would be 232, 282 and 326 for rates of change in CSF NfL levels, caudate and putamen volume, respectively. For a 3-year trial, these numbers would be reduced to 104, 126 and 146, respectively.
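To illustrate how totals of this kind can arise, the sketch below uses the standard two-arm normal-approximation formula, n per arm = 2(z₁₋α/₂ + z_power)²σ²/δ², where δ is the treatment effect expressed as a fraction of the HDGE-versus-control difference in rate of change. The function name, the 5% two-sided alpha, the 80% power and the example inputs are all assumptions; the actual method and parameters are those of Extended Data Table 5, which is not reproduced here. The second part only does arithmetic on the totals quoted above.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(mean_diff, sd, effect=0.5, alpha=0.05, power=0.8):
    """Total N (two equal arms) for a two-sample normal-approximation test.

    mean_diff: HDGE-minus-control difference in rate of change (hypothetical)
    sd:        between-subject SD of the rate of change (hypothetical)
    effect:    fraction of mean_diff removed by treatment (0.5 = 50%)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_power = z.inv_cdf(power)
    delta = effect * mean_diff          # absolute difference to detect
    n_per_arm = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return 2 * ceil(n_per_arm)


# Hypothetical inputs: SD equal to the group difference
example_total = total_sample_size(mean_diff=1.0, sd=1.0)

# Totals quoted in the text for CSF NfL, caudate and putamen
reported_2yr = [232, 282, 326]
reported_3yr = [104, 126, 146]
ratios = [a / b for a, b in zip(reported_2yr, reported_3yr)]
```

The reported totals shrink by a factor of roughly 2.2 when the trial is extended from 2 to 3 years, which shows how strongly trial duration affects the precision of an estimated rate of change.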

Discussion

We have used state-of-the-art multimodal measures of cognition, neuroimaging, genetics and biofluid markers in a new assessment battery to study a unique cohort of young adult HDGE who were at baseline, on average, approximately 23 years before predicted clinical motor diagnosis, comparing them to matched controls in an unprecedented level of detail. Our baseline cross-sectional data identified early signs of neurodegeneration despite the maintenance of intact brain function²⁵ and here we present 4.5-year follow-up data with important new mechanistic insights into what drives neurodegeneration in humans carrying the HD mutation (Fig. 6).

Our data highlight the role of inherited CAG repeat length and somatic expansion on neurodegeneration, decades before clinical motor diagnosis. We identify brain atrophy, elevated levels of CSF NfL, a marker of neuronal damage, and reduced levels of CSF PENK, a marker of striatal MSN state, in the earliest adult HD cohort studied to date. Despite evidence for the start of the neurodegenerative process, there is an absence of any decline in cognitive, motor or neuropsychiatric function at HD-ISS stages 0 and 1. Notably, we show that somatic CAG repeat expansion measured longitudinally in blood, a validated measure of somatic expansion in living patients^14,17, is a predictor of the effect of CAG repeat length on striatal markers of very early neurodegeneration.

Consistent with the elevated levels of CSF NfL we reported at baseline²⁵, we now show substantially greater rates of increase in CSF NfL in HDGE compared to controls, indicating accelerating neuroaxonal injury from the earliest stages. Most notably, the rate of change in CSF NfL in HD-ISS stage 0 was at least as fast as in stage 1, suggesting rapid neuroaxonal injury increases even before reaching the threshold of caudate or putamen volumetric loss cutoff for stage 1. Interestingly, mean CSF NfL levels were higher in HD-ISS stage 0 to 1 progressors compared to nonprogressors in both stage 0 and stage 1. The annualized rates of increase in CSF NfL across the whole HDGE group (mean = 63.38 pg ml⁻¹ yr⁻¹) are slightly lower than those reported in the previous HD-CSF cohort (mean = 79.16 pg ml⁻¹ yr⁻¹)²⁷, which is consistent with the HD-YAS cohort being towards the beginning of the neurodegenerative process.

Axonal damage and injury lead to leakage of NfL into the CSF^28,29,30, and NfL levels are elevated in active inflammation³¹. NfL is a nonspecific marker of neuronal injury, and elevated levels have been reported in other neurodegenerative conditions^{32,33,34,35,36,37,38,39}. Increases in CSF NfL are not necessarily attributable to neuronal death and could result from other degenerative processes such as leaky axons. Nevertheless, it is a clear marker of neuroaxonal pathology and therefore understanding CSF NfL temporal dynamics and kinetics can provide valuable insights into mechanisms in neurodegenerative diseases²⁹.

Previously, cross-sectional studies have revealed lower levels of CSF PENK in manifest HD compared to other neurodegenerative conditions⁴⁰, as well as compared to HDGE before clinical motor diagnosis and controls^41,42. Our longitudinal findings in a larger cohort, and our demonstration of a significant association between PENK levels and striatal imaging measures, serve to substantially strengthen the rationale for using PENK as a surrogate marker for striatal MSN state.

Astrocytes are implicated in disease processes through both cell-autonomous and non-cell-autonomous mechanisms^43,44, with one key study identifying a core signature of astrocyte genes with expression altered by mHTT in both humans and mouse models⁴⁴. A recent study provided the first evidence of mHTT-induced alterations in basal pro-inflammatory cytokine production in microglia without immune stimulation, along with a reduction in endocytic and phagocytic activity in mHTT-bearing microglia under basal conditions, suggesting a possible role for microglial cell-autonomous inflammation and activity in the early stages of HD⁴⁵. Consistent with our previous findings of elevated astrocytic marker CSF YKL-40 levels at baseline²⁵, we now show greater rates of increase in CSF YKL-40 longitudinally in the HDGE group compared to controls. However, we do not observe significant longitudinal changes in pro-inflammatory cytokine markers IL-6 and IL-8, which are components of the innate immune system, nor in GFAP, an intermediate filament protein of astrocytes associated with astroglial activation⁴⁶. It is known that mHTT is expressed in microglia⁴⁷ and that microglial activation correlates with severity later in the disease⁴⁸, where mHTT-induced dysfunction of central nervous system (CNS) immune cells is closely linked to pathogenesis⁴⁹. We postulate that the isolated elevation of YKL-40 may be due to both cell-autonomous and non-cell-autonomous mechanisms at play with activation driven by mHTT dysregulation of astrocytes, rather than general gliosis, which would be additionally indicated by a concomitant rise in GFAP. Our findings suggest that astrocytic dysfunction is more prominent than any abnormal innate immune response at this stage of the disease, as IL-6 and IL-8 levels, which are upregulated in HD and correlate with disease progression^49,50, remained unchanged longitudinally, reinforcing the importance of treating early at this stage, before widespread neuroinflammation occurs.

The presence of neuronal damage within HD-ISS stage 0 is further supported by the evidence of substantially elevated rates of brain atrophy and a corresponding reduction in CSF PENK levels. Stage 0 to 1 progressors also had substantially higher elevations in CSF NfL than stage 0 nonprogressors. The substantially higher rates of caudate and putamen atrophy and global brain measures and their association with disease burden suggest that neurodegenerative processes are already occurring across our cohort and at the earliest ages observed in this study. This atrophy was measurable in those with basal ganglia volumes distributed throughout the volume range observed in unaffected controls, implying the beginning of detectable neurodegeneration. In addition to these changes seen at the macrostructural level, diffusion imaging provides evidence that there is ongoing very early microstructural white matter damage. The strong predictive power of baseline NfL (in both plasma and CSF) for subsequent atrophy in all brain regions further supports the suggestion that there is early neuroaxonal damage which leads to macroscopic effects such as brain atrophy.

Despite the evidence of ongoing pathological changes in our stages 0 and 1 cohort, neurodegeneration is not yet impacting measurable function as we saw no significant disease-related decline in any of the cognitive, neuropsychiatric or functional measures. Previous work has shown that such changes only become evident from HD-ISS stage 2 (ref. ⁵¹).

We demonstrate the accumulation of somatic expansion of the HTT CAG repeat in blood DNA over time in HD-ISS stages 0 and 1 and, critically, show that it is associated with both brain atrophy and CSF NfL, a marker of neuronal–axonal injury, and CSF PENK, a surrogate marker of striatal MSN state. A higher inherited CAG length was associated with a faster increase in SER over time. SER was associated with caudate and putamen atrophy, both cross-sectionally and longitudinally, even after controlling for age-by-CAG interactions. Baseline SER was strongly associated with cross-sectional levels of CSF NfL and CSF PENK before age-by-CAG correction; however, these associations did not remain significant after the correction. We postulate that bioassay measurements demonstrate higher variability and noise compared to striatal volume measurements. Therefore, the lack of significance in associations with CSF NfL and PENK does not undermine the significant association between the longitudinal increase in SER and volumetric changes in the caudate and putamen. Additionally, the statistical strength of the influence of CAG length on atrophy was weakened in models also controlling for blood SER. Assuming that, via the common baseline CAG length effects and shared genetic modifiers, SER measured in blood is an indirect quantifiable indicator of the greater somatic expansion occurring in neurons, these results may be seen as providing in vivo evidence for the key role of somatic CAG repeat expansion in very early HD pathology in humans (Extended Data Fig. 4), reinforcing the putative pathological role of somatic expansion as a critical factor in disease progression^5,6,7,8,9.

If the recent suggestion from HD postmortem brains that asynchronous somatic expansion leads to asynchronous stochastic crossing of the transcriptional dysregulation threshold and asynchronous neuronal death⁸ is correct, then our data would support the hypothesis that somatic expansion is already an active process in the brain and that some neurons have already crossed a critical repeat length threshold ~20 years before clinical motor diagnosis. Indeed, this phenomenon is both predicted by the stochastic models and consistent with autopsy observations of early neuronal loss⁸. This would suggest that suppressing CAG repeat somatic expansion from this point in the disease process could prevent additional neurons from passing the neuronal toxicity threshold and reduce neurodegeneration before functional deficits are manifest. Therapeutic agents targeting DNA repair proteins that modify somatic expansion show great potential, with MSH3 as a particularly attractive target for HD and other repeat expansion disorders⁵², and various MSH3-targeting therapeutics are currently under development^53,54. To this end, somatic expansion of CAG repeats in blood DNA could be a useful biomarker to demonstrate target engagement of somatic expansion-suppressing therapies with peripheral exposure.

Within our cohort, a small number of individuals carried atypical CCGCCA or CAACAG CCGCCA loss of intervening sequence HTT alleles. These atypical structures have a high potential to cause mis-estimation of the CAG repeat length^{13,14,17,18,19}, and using the MiSeq-derived CAG lengths changed the mean baseline years to predicted clinical motor diagnosis in HD-YAS from 24 to 23 years. After correcting for pure CAG length, these structures have previously been associated with earlier clinical motor diagnosis^{13,14,17,18,19}. Consistent with this, we find those participants with the loss of intervening sequence structures exhibit higher rates of caudate and putamen atrophy, and have some of the greatest elevations in CSF NfL and reductions in CSF PENK, which together suggest an acceleration of the degenerative process (Fig. 5b). Detecting these effects in such small numbers so early in the course of disease suggests these synonymous DNA structural differences are exerting a substantial influence on the rate of neuropathological change. Interestingly, after correcting for pure inherited CAG there was no residual association between these allele structure variants and SER. This is consistent with previous work in other cohorts in blood, postmortem brains and cell lines^14,17,19 showing that the loss of the intervening CAACAG CCGCCA does not increase the rate of CAG expansion over and above the effects of pure CAG length. Relevant available brain data are limited, so it is still possible that the CAACAG CCGCCA loss increases CAG expansions in brain but not blood. An alternative hypothesis is that, after correcting for pure CAG, the residual disease-modifying mechanism of the CAACAG CCGCCA loss is independent of somatic expansion of the HTT repeat via effects on RNA transcription, RNA stability, or canonical or repeat-associated non-ATG translation (Extended Data Fig. 3). Regardless, these variants clearly have a profound impact on the disease course.

This work not only provides evidence to support the potential of therapies targeting somatic expansion but also identifies robust markers of disease progression, which may have utility as likely surrogates for future preventative clinical trials. CSF NfL, PENK and brain atrophy measures have the potential to monitor disease progression in HD-ISS stages 0 and 1, where clinical endpoints are not applicable. Change in CSF NfL level has previously been used as an outcome measure for a trial of the ASO nusinersen⁵⁵ in children with spinal muscular atrophy. Earlier treatment initiation was also associated with a larger decrease in CSF NfL levels, underscoring the importance of early intervention to preserve neuronal health.

At this stage of the disease, CSF mHTT levels are very low, with only 38.3% of samples in the HDGE group exceeding the detection level. These findings underscore the limitations of available CSF mHTT assays and confirm there is an urgent need for a reliable assay capable of detecting very low concentrations of mHTT in HDGE, ideally at attomolar levels, if HTT-lowering therapies are to be pursued in stage 0 and 1 HDGE cohorts.

Our extensive phenotypic characterization of HD-ISS stages 0 and 1 may allow us to enrich recruitment for future preventative trials. For example, we demonstrate that baseline NfL and PENK levels predict subsequent brain atrophy, and the potential to establish cutoffs for enriching HD-ISS stage 0 based on these biofluids holds significant promise. Harmonization of HD-YAS with existing cohorts across the disease spectrum such as HD-CSF and HDClarity (ClinicalTrials.gov: NCT02855476) will help to establish reliable cutoffs for inclusion. Another important consideration in clinical trial design is that atypical repeat structures, although infrequent, substantially affect disease progression and may additionally impact therapeutic efficacy. Identification of these rare cases through MiSeq will be important to control for these effects and more accurately assess treatment efficacy.

If these biomarkers can serve as likely surrogate outcomes, sample size calculations suggest feasible numbers for clinical trials in an HD-ISS stage 0/1 cohort given sufficiently large treatment effects. For example, in a clinical trial over 3 years with a 50% treatment effect, 104 participants would be required with CSF NfL as an outcome measure, with 126 for caudate and 146 for putamen atrophy. Notably, the caudate boundary-shift integral measure of change we use here is already well-validated and has previously been used in the laquinimod trial in HDGE with a clinical motor diagnosis⁵⁶.
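To make the reasoning behind such estimates concrete, the following is a minimal sketch of a textbook two-arm sample size calculation under a normal approximation. It is not the method used to derive the numbers above, and it does not reproduce them. The function name, the parameter names, and the default significance level and power are assumptions introduced here for illustration. The sketch treats a "50% treatment effect" as a halving of the excess rate of change seen in untreated HDGE relative to controls, which is also an assumption.

```python
from math import ceil
from statistics import NormalDist


def participants_per_arm(natural_excess_rate, sd_of_rate,
                         treatment_effect=0.5, alpha=0.05, power=0.80):
    """Approximate participants needed in each arm of a two-arm trial.

    All names and defaults are hypothetical and chosen for illustration.
    natural_excess_rate: mean excess rate of change in untreated HDGE
        relative to controls (any units).
    sd_of_rate: between-participant standard deviation of that rate
        (same units).
    treatment_effect: fraction of the excess rate removed by treatment.
    """
    # Difference in mean rate between the treated and placebo arms.
    delta = treatment_effect * natural_excess_rate
    # Standard normal quantiles for a two-sided test and the target power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Standard formula: n = 2 * ((z_alpha + z_beta) * sd / delta)^2 per arm.
    n = 2 * ((z_alpha + z_beta) * sd_of_rate / delta) ** 2
    return ceil(n)
```

Because the required number scales with the inverse square of the treatment effect, halving the assumed effect roughly quadruples the sample size, which is why the estimates above are conditional on sufficiently large treatment effects.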

In summary, the results presented strongly support the hypothesis that individual-specific somatic expansion in blood DNA predicts individual-specific somatic expansion in the brain. We show in living participants, decades before clinical motor diagnosis, that somatic expansion of the CAG repeat appears to be an important driver of the earliest pathological disease processes, as evidenced by its association with striatal atrophy rates and CSF NfL and PENK levels. Somatic expansion of repeats underlying disease pathogenesis is likely relevant to many repeat expansion diseases, where similar DNA repair mechanisms may play a role. With new therapies in development to target the DNA repair proteins that are known to influence somatic expansion, our results are timely in demonstrating its association with measurable disease markers. By intervening with therapies targeting somatic CAG repeat expansion at the start of the neurodegenerative process, that is, HD-ISS stages 0 and 1 decades before clinical motor diagnosis, while function remains intact, there is the very real possibility that treatments can delay or even prevent the appearance of clinical signs. To this end, we have identified robust measures of early pathology with the potential to act as biomarker surrogates of disease progression, and identified the ideal cohort for intervention to delay or prevent clinical motor diagnosis.