Fig. 1: Phenotypic definition of the cohort, diagnostic yield, identification of predictive factors, and gene-burden analysis.
From: Loss-of-function variants in the CAPN1 activator CD99L2 cause X-linked spastic ataxia

a Phenotypic similarities between patients, encoded by their HPO terms, were visualised using uniform manifold approximation and projection (UMAP). The clinical phenotype space was first defined by all OMIM diseases using their HPO annotations (grey dots). Each patient analysed by exome sequencing (ES) or genome sequencing (GS) is shown as a coloured dot, with colours indicating movement disorder subgroups. Triangles denote patients carrying CD99L2 loss-of-function variants. b The proportion of individuals with firm diagnoses (likely pathogenic or pathogenic variants) was slightly higher in the first-line GS (GS first) cohort (25.2%, n = 214) than in the first-line ES (ES first) cohort (21.96%, n = 1557). The lowest diagnostic yield was seen in the second-line GS cohort after prior ES and optional fragment analysis (FA) (10.5%, n = 200). c Distribution of variant types among clinically relevant likely pathogenic and pathogenic variants in the ES first cohort (n = 342), GS first cohort (n = 54), and GS with prior ES ± FA (n = 21). d Visualisation of least absolute shrinkage and selection operator (LASSO) coefficients across different lambda values. e LASSO coefficients at minimum lambda. f Manhattan plot of P values from the gene-burden analysis assessing associations between high-impact variants and movement disorders (MD). The x-axis indicates chromosomal positions, and the y-axis shows the −log₁₀(P) values for the two-tailed association test per gene (log scale). For each gene, the lowest P value among three tested models (dominant, recessive, and recessive plus CNVs) is shown. The red dotted line marks the significance threshold (P = 4.9 × 10⁻⁶, Bonferroni correction for 10,165 genes affected by high-impact variation in the MD cohort); the dark red line corresponds to P = 0.0025. Labels in light red indicate established disease genes associated with neurological phenotypes with P < 4.9 × 10⁻⁶; dark red labels indicate genes with P < 0.0025. The blue label marks the newly prioritised disease gene CD99L2. Source data are provided as a Source Data file.
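
The numbers quoted in panels b, c and f can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the authors' analysis code: it assumes a family-wise error rate of 0.05 (the caption does not state it), and the names `ALPHA`, `N_GENES`, `bonferroni_threshold` and `cohorts` are hypothetical. Under that assumption the Bonferroni threshold for 10,165 genes rounds to the quoted 4.9 × 10⁻⁶, and multiplying each cohort's yield by its size gives values that round to the panel c totals.

```python
import math

# Assumption: a family-wise error rate of 0.05 (not stated in the caption).
ALPHA = 0.05
# From the caption: genes affected by high-impact variation in the MD cohort.
N_GENES = 10_165


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_GENES)
print(f"Bonferroni threshold: {threshold:.2e}")

# Heights of the two horizontal lines on the -log10(P) axis of panel f.
print(f"-log10(threshold) = {-math.log10(threshold):.2f}")
print(f"-log10(0.0025)    = {-math.log10(0.0025):.2f}")

# Panel b: diagnostic yield (%) and cohort size, as quoted in the caption.
cohorts = {
    "ES first": (21.96, 1557),
    "GS first": (25.2, 214),
    "GS after ES": (10.5, 200),
}

# Yield multiplied by cohort size, rounded to the nearest whole number.
expected_counts = {
    name: round(pct / 100 * n) for name, (pct, n) in cohorts.items()
}
for name, count in expected_counts.items():
    print(f"{name}: about {count}")
```

The rounded values line up with the n values listed for panel c (342, 54 and 21). This is only a consistency check on the quoted figures, since panel c describes variants while panel b describes individuals.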