Supplementary Figure 3: Logistic regression classifier-guided gating of 49f CB cells that are highly proliferative in LTC.

(a) Receiver operating characteristic (ROC) curve for a logistic regression classifier trained to detect 49f CB cells with high proliferative potential (created using the R library ‘ROCR’). Classifier variables were selected using a step-wise procedure to minimize information loss while limiting the number of markers (using the R function ‘step’). (b) Threshold selection based on the F-measure (harmonic mean of precision and recall). A red point indicates the selected threshold. (c) Coordinates of all analyzed 49f CB cells plotted for all combinations of the 3 markers (CD34, CD90, and CD10) included in the final model. Marker values represent the mean-scaled fluorescence values (channel Z-score). Colour coding indicates proliferative capacity as in Figure 2 (n=538). Blue circles indicate cells selected based on the logistic regression model. (d) Mean-scaled surface marker intensities for each input cell from the barcode tracking experiments (n=61), plotted as a function of the total number of cells that cell produced at the time of sacrifice of the primary mice. Colour coding of clone lineage composition is indicated in the key in Figure 4. (e) PCA of the mean-scaled surface marker intensities, with point sizes proportional to the corresponding clone sizes. (f) Mapping of the barcoded clones to the t-SNE mass cytometric distribution for the 49f CB compartment. As in Figure 2c, nearest neighbours for all members of a given initial cell type were pooled and used to generate a probability density, indicated by colour intensity. The lowest level contains 95% of the total probability density, with levels at each 10% mark thereafter. The point estimate for each input cell represents the median t-SNE value of its nearest neighbours in each t-SNE dimension. Point size is proportional to the size of the corresponding clone, assessed at the 30+ week post-transplant time of sacrifice of the primary mice. A black contour marks the boundary containing 50% of the probability density of the highly proliferative input cells detected in the LTC assays (Figure 2c).
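
The F-measure threshold selection described in panel (b) can be illustrated with a minimal sketch. This is not the authors' code (the original analysis used R with ‘ROCR’ and ‘step’); it is a Python illustration under stated assumptions. The function name `f_measure_threshold`, the toy scores, and the toy labels are all hypothetical, and the sketch assumes that a cell is called positive when its predicted probability is at or above the threshold.

```python
import numpy as np


def f_measure_threshold(scores, labels):
    """Return (threshold, F-measure) for the cutoff that maximizes the F-measure.

    Hypothetical helper: `scores` are classifier probabilities and `labels`
    are 1 for highly proliferative cells, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_f = None, -1.0
    # Every distinct score is a candidate threshold.
    for t in np.unique(scores):
        pred = scores >= t  # assumption: positive if score >= threshold
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        if tp == 0:
            continue  # precision and recall are both zero; skip
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        # F-measure: harmonic mean of precision and recall.
        f = 2 * precision * recall / (precision + recall)
        if f > best_f:
            best_t, best_f = float(t), float(f)
    return best_t, best_f


# Toy data, invented for illustration only.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1, 1]
threshold, f1 = f_measure_threshold(scores, labels)
print(threshold, round(f1, 3))
```

On these toy values the selected cutoff is 0.35, where precision is 0.8 and recall is 1.0. Ties are resolved in favour of the lowest threshold, which is an arbitrary choice made for this sketch.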