Fig. 2: Classification of myeloid blasts using machine learning.

a Overview of nested cross-validation for model optimization and comparison. b Model performance in BLAST110 training set (n = 110 individuals, n = 403 measurements). c Concordance between manual and predicted blast counts in BLAST110 training set (n = 110 individuals, n = 403 measurements). d GMMclf performance in LAIP29 test set (n = 29 individuals, n = 82 measurements). e Concordance between manual and predicted blast counts in LAIP29 test set (n = 29 individuals, n = 82 measurements). f Concordance of leukemic blast counts in the manual and GMMclf-predicted blast compartments in LAIP29 test set (n = 29 individuals, n = 82 measurements). For annotated points (<50% LAIP+ cells conserved), LAIPs were expressed on blast phenotypes not accounted for during model training. g Positions of GMMclf components for blasts (K = 2, red) and non-blasts (K = 3, blue). h Predicted blasts (red) and non-blasts (blue) for a single sample. BLAST110 training cohort, LAIP29 test cohort, LR logistic regression, SVM support vector machine, RF random forest, LightGBM light gradient-boosting machine, FlowSOMclf FlowSOM classifier, GMMclf Gaussian mixture model classifier, LAIP leukemia-associated immunophenotype.