Extended Data Fig. 1: Quality control and annotation pipelines for HD exome sequencing data.

a, Quality control pipeline showing where and why sequencing samples were removed from the dataset. Of 785 exomes initially sequenced, including some samples re-sequenced because of low initial quality, 683 passed all quality control steps (465 from REGISTRY-HD, 218 from PREDICT-HD). Subgroups of this population were used in downstream analyses: a continuous group (N = 558) containing all individuals with a known age at motor onset, and a dichotomous group (N = 637) containing all individuals with an extreme phenotype, defined as either early or late onset of symptoms (actual or predicted), or more or less severe motor or cognitive symptom scores. See also Fig. 1 and Supplementary Fig. 1. b, Annotation pipeline showing the pathway, databases (gnomAD and dbNSFP) and tools used to annotate individual variants across exomes. Key: VEP, Variant Effect Predictor. See also Supplementary Fig. 2.