Fig. 3: Rare deleterious FAN1 variants are associated with altered HD onset and cluster in functional protein domains.

a, Rare, non-synonymous FAN1 variants identified through exome sequencing in the dichotomous HD cohort (n = 637), divided between early/more severe and late/less severe phenotype groups. All variants had a minor allele frequency (MAF) < 1%. A total of 65 such variants (28 distinct) were identified across 62 individuals; three people carried two variants. The CADD score is a measure of the predicted deleteriousness of a coding variant. A CADD score ≥ 20 places a variant among the 1% of substitutions in the human genome predicted to be most damaging. A total of 43 individuals carried at least one such predicted damaging variant, with two people carrying two (although these could not be phased). b, FAN1 variants identified in individuals with HD, plotted by CADD score over a cartoon of the FAN1 protein. Variants associated mostly with the early/more severe phenotype (orange triangles), the late/less severe phenotype (green triangles) or neither phenotype group (gray squares, ‘neutral’) are shown. Variants above the CADD = 20 line are predicted to be in the top 1% most damaging variants in the human genome; those with CADD > 10 are predicted to be in the top 10%. Two likely damaging singleton variants lack CADD scores, so they are plotted at CADD = 0 and highlighted: the loss-of-function (frameshift) variant ST186SX (*) and the in-frame insertion variant V963W964insL (†). FAN1 domain coordinates are as published (refs. 51,52). c, Damaging FAN1 variants are enriched in individuals with earlier-onset HD after accounting for CAG length. Age at motor onset is plotted against CAG length for the continuous phenotype group (n = 558), with the population-predicted age at onset for each repeat length shown as horizontal lines (ref. 26). No median onset is shown for CAG lengths of 38 and 39 because these repeat lengths are incompletely penetrant. Individuals with a damaging FAN1 variant (CADD ≥ 20 or loss of function) are shown as black dots; those without one are shown as open circles. d, Three-dimensional model highlighting FAN1 variants selected for downstream study.
Note that D960A (*) is a synthetic, nuclease-inactive variant that was not found in our patient population. NA, not applicable.