Fig. 5: Clinical variables descriptive of RNA subgroups and used for classification of new patients.

a Comparison of the clinical variables deemed most important by our univariate feature selection model across subgroups I (n = 129), II (n = 112) and V (n = 89). Vertical centre line represents the median, top and bottom bounds of the box represent the first and third quartiles, and the tips of the whiskers represent the minimum and maximum values. b Clinical variables selected by ensemble feature selection from models predictive of each subgroup. Coefficients shown for each variable are from the most predictive support vector machine classifiers. c Selected clinical features are used to classify 197 IPAH patients from an independent validation cohort. d Kaplan–Meier survival curves per predicted subgroup in the validation cohort, confirming the difference in survival outcomes between subgroups, along with log-rank test p values. e Gene and clinical variable correlation network. Diamond nodes represent clinical variables drawn from the clinical signatures. Round nodes represent genes drawn from the gene signature generated by our LASSO model. Edges denote Spearman rank correlations and have been thresholded at a correlation of 0.25 and a two-tailed test p value < 1.11 × 10^−5. Specifically, corr_BMI-ALAS2 = 1.27 × 10^−11, corr_BMI-PI3 = 3.17 × 10^−6, corr_BMI-IGHG2 = 4.13 × 10^−6, corr_BMI-RP11.678G14.3 = 8.22 × 10^−6, corr_BMI-IGKV1.27 = 9.32 × 10^−6, corr_BMI-IGKV2.24 = 3.09 × 10^−6, corr_BMI-IGKV4.1 = 9.55 × 10^−7, corr_6MWD-IGKV4.1 = 2.83 × 10^−6, corr_6MWD-IGKJ4 = 2.08 × 10^−6, corr_6MWD-ALAS2 = 7.52 × 10^−10, corr_AoD-IGHV2.5 = 3.72 × 10^−10, corr_AoD-IGLV2.8 = 1.06 × 10^−9, corr_AoD-IGHM = 6.2 × 10^−8, corr_AoD-NOG = 3.18 × 10^−17, corr_AoD-IGHV3.48 = 7.7 × 10^−7, corr_AoD-IGLV7.43 = 1.04 × 10^−6, corr_AoD-IGKV4.1 = 6.35 × 10^−10, corr_AoD-IGKV2.24 = 4.19 × 10^−6, corr_AoD-IGKV1.27 = 3.93 × 10^−7, corr_OxygenSat-NOG = 1.11 × 10^−6.
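
To illustrate how edges in a network like the one in panel e could be selected, the following is a minimal sketch and not the authors' pipeline. It computes Spearman rank correlations between clinical variables and genes using only the Python standard library and keeps pairs at or above a 0.25 threshold. The function names (`average_ranks`, `spearman_rho`, `correlation_edges`), the toy values, and the choice to threshold on the absolute value of the correlation are all assumptions made for illustration. The additional p value filter described in the caption is omitted here.

```python
from itertools import product
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def correlation_edges(clinical, genes, rho_threshold=0.25):
    """Return (clinical, gene, rho) for pairs with |rho| >= rho_threshold.

    Thresholding on the absolute value is an assumption of this sketch.
    """
    edges = []
    for (c_name, c_vals), (g_name, g_vals) in product(clinical.items(), genes.items()):
        rho = spearman_rho(c_vals, g_vals)
        if abs(rho) >= rho_threshold:
            edges.append((c_name, g_name, rho))
    return edges


# Made-up toy values, for illustration only (not data from the study).
clinical = {"BMI": [21.0, 24.5, 27.1, 30.2, 33.8, 36.0]}
genes = {
    "gene_A": [5.1, 5.9, 6.4, 7.0, 7.7, 8.3],  # rises monotonically with BMI
    "gene_B": [3.0, 1.0, 2.0, 1.0, 3.0, 2.0],  # no monotonic trend
}
edges = correlation_edges(clinical, genes)
```

In this toy input only the BMI and gene_A pair survives the threshold. In practice a statistics library would normally supply both the correlation and its two-tailed p value, so that the second filter from the caption could be applied as well.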