Fig. 3: Post-machine learning validation using the proportions and expressed molecules of 6 cell populations. | Nature Communications

From: Reading the immune clock: a machine learning model predicts mouse immune age from cellular patterns

a Summary of the learning process using ML algorithms. Created using icons sourced from Flaticon (www.flaticon.com), distributed under Flaticon’s Free License (https://www.flaticon.com/legal). b Using 3D PCA, we evaluated whether molecular expression levels in each cell showed age-related heterogeneity. Each “X” represents one mouse, and the colored surfaces indicate clustering by age group. Red arrows represent the molecular loadings driving PC separation. c, d Validation of ML results using cell pattern changes. Red dots represent test samples not used during training. Inset plots for cDC1, cDC2, and macrophages display zoomed-in views due to their lower abundance and narrower expression ranges, allowing clearer visualization of predicted trends and confidence intervals. e Results of 5-fold cross-validation used to evaluate the performance of the multivariate support vector regression (SVR) ML model. The x- and y-axes of the graph represent mean squared error and the types of immune-related target molecules, respectively (n = 5 per cell type). c, f, h, j, l, n, p Confidence intervals of predicted values, estimated by bootstrapping the established multivariate SVR ML model, overlaid with observations from the test set. The x- and y-axes of the graph represent age and the composition ratio of identifiable immune cell groups within the population, respectively. To allow the overall compositional proportions within a population to be compared directly, the y-axes of all graphs of cell composition proportions were fixed to the same scale. d, g, i, k, m, o, q Change patterns of predicted values from the established multivariate SVR ML model, overlaid with observations from all data sets. The x- and y-axes of the graph represent age and the composition ratio of identifiable immune cell groups within the population, respectively.
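As a rough illustration of how a bootstrapped confidence band around model predictions can be obtained, the sketch below resamples the training pairs with replacement, refits a model on each resample, and takes the 2.5th and 97.5th percentiles of the predictions at each age. It is not the authors' pipeline: an ordinary least-squares line stands in for the multivariate SVR so that the example needs only the Python standard library, and the function names, data values, age units, and number of resamples are all hypothetical.

```python
import random
from statistics import mean, quantiles


def fit_line(xs, ys):
    """Ordinary least-squares line; an assumed stand-in for the SVR model."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bootstrap_band(ages, fractions, grid, n_boot=500, seed=0):
    """Percentile 95% band for the predictions at each age in `grid`."""
    rng = random.Random(seed)
    n = len(ages)
    preds = [[] for _ in grid]
    for _ in range(n_boot):
        # Resample (age, fraction) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [ages[i] for i in idx]
        if len(set(xs)) < 2:  # degenerate resample: no line can be fitted
            continue
        ys = [fractions[i] for i in idx]
        slope, intercept = fit_line(xs, ys)
        for k, age in enumerate(grid):
            preds[k].append(intercept + slope * age)
    band = []
    for p in preds:
        cuts = quantiles(p, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
        band.append((cuts[0], cuts[-1]))
    return band


# Hypothetical data: age (arbitrary units) and cell fraction (percent).
ages = [4, 8, 12, 20, 30, 40, 52, 60]
fractions = [12.1, 11.4, 10.9, 9.8, 9.1, 7.9, 7.2, 6.1]
band = bootstrap_band(ages, fractions, grid=[10, 30, 50])
for age, (low, high) in zip([10, 30, 50], band):
    print(f"age {age}: 95% band [{low:.2f}, {high:.2f}]")
```

Held-out test observations can then be plotted over such a band to see whether they fall inside it, which is the kind of comparison the confidence-interval panels describe.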
The red solid lines in the graph show the mean predicted patterns of the trained model. Cell frequencies were expressed as percentages based on 50,000 CD45⁺ cells. In box plots, the centre line represents the median, box limits represent the first and third quartiles, and whiskers extend to data points within 1.5× the interquartile range.
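The box-plot convention and the percentage scaling described above can be made concrete with a small sketch. It assumes inclusive (linear-interpolation) quartiles, which is one of several common conventions and may differ from the plotting software actually used; the function names and the cell counts are hypothetical.

```python
from statistics import quantiles


def percent_of_cd45(count, total=50_000):
    """Express a cell count as a percentage of 50,000 CD45+ cells."""
    return 100.0 * count / total


def box_plot_stats(values):
    """Median, quartiles, whisker ends, and outliers for one box."""
    data = sorted(values)
    # Inclusive quartiles (an assumption; other definitions exist).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical counts of one cell population in six mice.
counts = [1000, 1250, 1500, 1750, 2000, 6000]
stats = box_plot_stats([percent_of_cd45(c) for c in counts])
print(stats)
```

Note that the whiskers stop at actual data points rather than at the fences themselves, so any value beyond 1.5× the interquartile range is reported separately as an outlier.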
