Fig. 5: Clustering analysis identified subgroups with distinct comorbidity patterns in colorectal cancer patients (n = 3312) from the UW cohort.

a t-SNE plot of patient vectors colored by cluster group, with clusters defined using a Gaussian mixture model (GMM) selected by optimal Bayesian information criterion (BIC). b Box plot showing the distribution of age of onset for each colorectal cancer (CRC) cluster group. c Comorbidity pattern enrichment plot grouped by disease class (on the x-axis) within each cluster group (represented by color). The y-axis indicates the log odds ratio of the comorbidity enrichment. Only results that remain statistically significant after Bonferroni correction (p < 2e−5) are shown. Colored text highlights the top results within each cluster group. d Kaplan–Meier curves showing differences in 10-year overall survival across cluster groups.
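
The enrichment measure in panel c can be illustrated with a minimal sketch. It assumes a 2×2 table of patient counts (comorbidity present or absent, inside or outside a cluster), a natural logarithm, and a 0.5 continuity correction to avoid division by zero; none of these choices is stated in the caption. The function names, the counts, and the number of tests are hypothetical and are not taken from the study.

```python
import math


def log_odds_ratio(in_with, in_without, out_with, out_without, pseudo=0.5):
    """Log odds ratio of a comorbidity inside vs. outside a cluster.

    Adds `pseudo` to every cell (an assumed continuity correction)
    so that empty cells do not cause a division by zero.
    """
    a = in_with + pseudo      # in cluster, has comorbidity
    b = in_without + pseudo   # in cluster, no comorbidity
    c = out_with + pseudo     # outside cluster, has comorbidity
    d = out_without + pseudo  # outside cluster, no comorbidity
    return math.log((a * d) / (b * c))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: positive values indicate enrichment in the cluster.
lor = log_odds_ratio(in_with=120, in_without=280, out_with=300, out_without=2612)
# Hypothetical family-wise alpha and number of tests.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1000)
print(f"log odds ratio = {lor:.3f}, per-test threshold = {threshold:.1e}")
```

A positive log odds ratio means the comorbidity is more common inside the cluster than outside it, and a negative value means it is less common. The p-value itself would come from a separate test, such as Fisher's exact test, which is not shown here.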