Extended Data Fig. 3: Comparison of new and prior leukemia stem and progenitor cell annotations for discerning biological phenotypes. | Nature Medicine

From: A cellular hierarchy framework for understanding heterogeneity and predicting drug response in acute myeloid leukemia

A) Workflow to compare prior (HSC-like and Prog-like) annotations and new (Quiescent LSPC, Primed LSPC, and Cycling LSPC) annotations for their utility in predicting important biological phenotypes in AML. Utility was measured by the performance of logistic regression and random forest models trained on the relative abundance of these populations. Models were trained using nested cross-validation: samples were subject to a 5-fold split, in which 80% of samples (white boxes) were used to train a model and 20% of samples (orange boxes) were used to evaluate it. Within each training set, hyperparameter optimization was performed by grid search with 5-fold internal cross-validation. The model AUC from each outer cross-validation split was averaged to estimate overall classifier performance. This nested cross-validation process was repeated over 1000 iterations, with samples shuffled between iterations, to generate a distribution of AUC metrics.

B-F) Model performance for predicting key biological and clinical phenotypes from either HSC-like and Prog-like abundance or Quiescent, Primed, and Cycling LSPC abundance. Performance metrics are paired by iteration: sample order and cross-validation splits were identical for each model. Box plots indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5 × the interquartile range. Statistical significance was evaluated with a two-sided Wilcoxon signed-rank test.

B) Prediction of functional LSC activity, measured by leukemic engraftment, from 72 LSC+ fractions and 38 LSC- fractions.

C) Prediction of overall survival in the TCGA and BEAT-AML cohorts, evaluated through the likelihood ratio statistic from stratified Cox regression (combined n = 454). In this case, LASSO and ridge regression were used, and these models were trained on splits of the TCGA and BEAT-AML cohorts, stratified by cohort. The repeated nested cross-validation approach remained the same.

D) Prediction of diagnosis vs relapse status from 44 relapsed and 44 diagnostic samples.

E) Prediction of Adverse cytogenetic status in TCGA from 37 patients with Adverse cytogenetics and 131 patients with Intermediate or Favorable cytogenetics.

F) Prediction of Adverse cytogenetic status in BEAT-AML from 53 patients with Adverse cytogenetics and 175 patients with Intermediate or Favorable cytogenetics.
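
The repeated nested cross-validation described for panel A, and the pairing by iteration described for panels B-F, can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' code. The data are synthetic random numbers with no biological meaning, a hand-written L2-penalized logistic regression stands in for the logistic regression and random forest models, and the stratified fold assignment, the penalty grid, and all function names are assumptions made for the example.

```python
import math
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_logreg(X, y, l2, epochs=100, lr=0.5):
    """L2-penalized logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the (unpenalized) intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        for j in range(len(w) - 1):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
        w[-1] -= lr * grad[-1] / n
    return w

def predict(w, X):
    """Linear scores; their ranking is all that AUC needs."""
    return [sum(wj * xj for wj, xj in zip(w, xi)) + w[-1] for xi in X]

def stratified_folds(y, k, rng):
    """Shuffle indices within each class, then deal them round-robin into k folds.
    Stratification is an assumption here; it keeps both classes in every fold."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def cv_auc(X, y, l2, k, rng):
    """Mean held-out AUC of one penalty value across k folds (inner loop)."""
    aucs = []
    for test in stratified_folds(y, k, rng):
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        w = fit_logreg([X[i] for i in train], [y[i] for i in train], l2)
        aucs.append(auc(predict(w, [X[i] for i in test]), [y[i] for i in test]))
    return sum(aucs) / k

def nested_cv_auc(X, y, grid, seed, k=5):
    """One iteration: 5-fold outer split, grid search with 5-fold inner CV,
    then the outer-fold AUCs are averaged."""
    outer = stratified_folds(y, k, random.Random(seed))
    aucs = []
    for test in outer:
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        # Every grid value is scored on the same inner splits.
        best = max(grid, key=lambda l2: cv_auc(Xtr, ytr, l2, k, random.Random(seed)))
        w = fit_logreg(Xtr, ytr, best)
        aucs.append(auc(predict(w, [X[i] for i in test]), [y[i] for i in test]))
    return sum(aucs) / k

# Synthetic stand-ins: 40 samples, a two-feature "prior" table and a
# three-feature "new" table of made-up relative abundances.
data_rng = random.Random(0)
y = [i % 2 for i in range(40)]
prior = [[data_rng.random() + 0.2 * yi, data_rng.random()] for yi in y]
new = [[data_rng.random() + 0.5 * yi, data_rng.random(), data_rng.random()] for yi in y]
GRID = [0.0, 0.1]  # hypothetical penalty grid

# The legend reports 1000 iterations; 3 keeps the sketch fast.
paired = [(nested_cv_auc(prior, y, GRID, seed), nested_cv_auc(new, y, GRID, seed))
          for seed in range(3)]
for seed, (auc_prior, auc_new) in enumerate(paired):
    print(f"iteration {seed}: prior AUC={auc_prior:.3f}, new AUC={auc_new:.3f}")
```

Because fold assignment in this sketch depends only on the labels and the seed, both feature sets are scored on identical splits within each iteration. That pairing is what makes a paired test such as the two-sided Wilcoxon signed-rank test applicable to the resulting AUC distributions.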
