Fig. 3: TockyKmeansRF: Combinatorial analysis using clustering and random forest.
From: Machine learning-assisted decoding of temporal transcriptional dynamics via fluorescent timer

a Schematic overview of the TockyKmeansRF framework combining k-means clustering and Random Forest (RF) classification to model Timer fluorescence in flow cytometry data. b Training and test datasets generated from lymph node samples of WT and CNS2 KO Foxp3-Tocky mice. c Area Under the Curve (AUC) analysis of model performance across varying numbers of clusters (top) and RF trees (bottom). d Visualisation of CNS2-dependent feature clusters in the test dataset. Timer Angle–Intensity and original Timer fluorescence spaces are shown. Top: feature importance by Mean Decrease Gini (MDG); bottom: feature cells defined as top 60th percentile by MDG. e Density-based clustering of feature cells. f Violin plots showing kernel density estimates of the percentage of cells in each cluster per sample (n = 22 KO, 27 WT). Each point represents a biological replicate. The box indicates the interquartile range (IQR; 25th–75th percentile), the centre line denotes the median, and whiskers extend to the most extreme values within 1.5× IQR; points beyond this range are plotted as outliers. Samples lacking cells in Cluster 3 (KO) were excluded from that cluster’s plot. g Violin plots showing mean fluorescence intensity (MFI) of CD25, CD44, PD-1, and CD69 for two identified clusters and remaining Timer+ cells (“others”) in WT samples (n = 27). Statistical analysis used the Kruskal–Wallis test followed by Dunn’s test with Bonferroni correction. Timer-negative cells were included only as a baseline reference and excluded from statistical testing. The box shows the IQR, the centre line indicates the median, and whiskers extend to the most extreme values within 1.5× IQR. Exact p-values are provided in Supplementary Data 1. h Computational performance of TockyKmeansRF, showing runtime and memory usage as the amount of CNS2 KO training data is progressively increased.
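The box-and-whisker convention used in panels f and g (box = IQR, centre line = median, whiskers to the most extreme values within 1.5× IQR) can be illustrated with a minimal sketch. It is not taken from the TockyKmeansRF code: the function name `tukey_box_stats`, the example values, and the quartile interpolation rule (`method="inclusive"`) are assumptions made for illustration, since the caption does not state how quartiles were computed.

```python
from statistics import median, quantiles


def tukey_box_stats(values):
    """Summarise values using the box-plot convention described in the caption.

    Hypothetical helper for illustration only. Quartiles use the
    'inclusive' interpolation rule, which is an assumption.
    """
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up percentages of cells in one cluster, one value per sample.
example = [2.0, 4.0, 5.0, 6.0, 7.0, 9.0, 30.0]
stats = tukey_box_stats(example)
print(stats)
```

For these made-up values the box spans 4.5 to 8.0, the whiskers reach 2.0 and 9.0, and 30.0 falls outside the upper fence (13.25), so it would be drawn as an outlier.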