Fig. 3: Clustering analysis identified subgroups with distinct comorbidity patterns in colorectal cancer patients (n = 2837) from the eMERGE cohort.

a t-SNE plot of patient vectors, colored by cluster group. Clusters were defined using a Gaussian mixture model (GMM) selected by the optimal Bayesian information criterion (BIC). b Box plot showing the distribution of age of onset for each CRC cluster group. c Comorbidity pattern enrichment plot grouped by disease class (x-axis) within each cluster group (represented by color). The y-axis indicates the log odds ratio of the comorbidity enrichment. Only results that remained statistically significant after Bonferroni correction (p < 2e−5) are shown. Colored text highlights the top results within each cluster group.
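
To make the y-axis quantity in panel c concrete, the following is a minimal sketch of how a log odds ratio and a Bonferroni-adjusted threshold could be computed from a 2x2 contingency table (cluster membership versus presence of a comorbidity). It is not the authors' pipeline. The function names, the natural-log base, the optional continuity correction, and the example counts are all assumptions made for illustration; the caption does not state the number of tests or the family-wise alpha behind the 2e−5 cutoff.

```python
import math


def log_odds_ratio(in_with, in_without, out_with, out_without, correction=0.0):
    """Natural-log odds ratio for a 2x2 table (hypothetical helper).

    in_with / in_without:   patients inside the cluster with / without the comorbidity
    out_with / out_without: patients outside the cluster with / without the comorbidity
    correction: value added to every cell (e.g. 0.5) to avoid division by zero;
                this is an illustrative choice, not taken from the caption.
    """
    a = in_with + correction
    b = in_without + correction
    c = out_with + correction
    d = out_without + correction
    return math.log((a * d) / (b * c))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Illustrative counts only (not data from the study).
enriched = log_odds_ratio(30, 70, 10, 90)   # comorbidity more common inside the cluster
balanced = log_odds_ratio(10, 10, 10, 10)   # no enrichment, so the log odds ratio is 0

# Assumed example: alpha = 0.05 spread over 2500 tests gives a 2e-5 cutoff.
cutoff = bonferroni_threshold(0.05, 2500)
```

A positive value indicates that the comorbidity is over-represented in the cluster relative to the remaining patients, a negative value indicates under-representation, and zero indicates no difference.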