Extended Data Fig. 2: Benchmarking gene expression deconvolution approaches for AML.

A) Pearson correlation between the observed relative abundance of 14 cell types from scRNA-seq and the predicted abundance of each cell type from deconvolution of matched bulk RNA-seq data, analyzed by patient. Gene expression deconvolution approaches, namely CIBERSORTx (S-mode or No Batch Correction), DWLS, Bisque, and MuSIC (direct or recursive), were benchmarked across these samples. B) scRNA-seq of 1389 ProMono-like cells across 10 patients, demonstrating separation between AML556 and other patients. C) Pearson correlation depicting deconvolution performance of CIBERSORTx for AML556’s bulk RNA-seq profile using patient-specific reference signatures derived from scRNA-seq data from AML556. D-E) Correlation between deconvolution and clinical flow cytometry for 7 AML patients from the Toronto PMH cohort. Deconvolution using scRNA-seq reference profiles was performed on RNA-seq data and matched with clinical flow cytometry data, both obtained from peripheral blood. D) Pearson correlation between total mature myeloid abundance (ProMono-like + Mono-like + cDC-like) from deconvolution and the pan-myeloid surface marker CD64. E) Pearson correlation between Mono-like abundance from deconvolution and the monocyte-specific surface marker CD14. F) Dendrogram depicting associations between leukemic cell types across scRNA-seq samples from 12 diagnostic AML patients. G) Observed associations between leukemic cell types from deconvolution analysis of 173 patients within the TCGA cohort, depicted for each deconvolution tool. MuSIC Direct was excluded because multiple cell types had a detection rate of zero in bulk RNA-seq. H-I) Correlation between observed transcriptomic profiles and synthetic transcriptomic profiles reconstructed from the cell-type abundances predicted by CIBERSORTx. Higher correlation suggests greater deconvolution confidence. Box plots indicate the range of the central 50% of the data, with the central line marking the median. Whiskers extend from each box to 1.5 × the interquartile range. Comparisons were performed with two-sided Wilcoxon signed-rank tests. These correlations are depicted for H) deconvolution of 864 patients across three AML cohorts using reference signatures from leukemic populations compared with deconvolution using reference signatures from matched healthy populations, and I) RNA-seq compared with microarray data from 158 matched TCGA patient samples. Prior to deconvolution, microarray data were normalized using either chip-based (RMA) or single-sample (SCAN) normalization. J) Pearson correlation of estimated LSPC abundances between RNA-seq deconvolution and microarray deconvolution, normalized with either RMA or SCAN, among 158 matched patient samples from the TCGA cohort.
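
The reconstruction-based confidence measure in panels H-I can be illustrated with a minimal sketch. It assumes that a synthetic profile is the abundance-weighted sum of the reference signatures and that confidence is the Pearson correlation between this synthetic profile and the observed one. The function names, matrix layout, and all numeric values below are hypothetical and chosen only for illustration; they are not taken from the study or from CIBERSORTx itself.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def reconstruct_profile(signature, fractions):
    """Build a synthetic bulk profile as an abundance-weighted sum of signatures.

    signature: one row per gene, one column per cell type (assumed layout).
    fractions: predicted relative abundance of each cell type.
    """
    return [sum(s * f for s, f in zip(row, fractions)) for row in signature]


# Toy values invented for illustration: 4 genes x 3 cell types.
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 2.0],
    [3.0, 3.0, 3.0],
    [1.0, 0.0, 9.0],
]
predicted_fractions = [0.5, 0.3, 0.2]
observed_bulk = [5.5, 2.6, 3.1, 2.2]

synthetic_bulk = reconstruct_profile(signature, predicted_fractions)
confidence = pearson(observed_bulk, synthetic_bulk)
```

Under these assumptions, a sample whose observed profile is poorly explained by the reference signatures yields a lower correlation, which is how the comparison of leukemic and healthy reference signatures in panel H can be read.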