TO THE EDITOR:

European LeukemiaNet (ELN) guidelines recommend flow cytometry for measurable residual disease (MRD) assessment in acute myeloid leukemia (AML) because it is applicable to ~90% of patients [1]. This immunophenotypic MRD is assessed by manual gating (mgMRD), which identifies leukemic cells based on manual inspection of two-dimensional plots. However, this process is time-consuming (>30 min per sample) and requires extensive standardization for reproducibility, particularly outside specialized laboratories [2, 3]. The growing number of markers that can be evaluated simultaneously on spectral cytometry platforms increases the analytical complexity, rendering mgMRD increasingly impractical [3].

To address these issues, we previously developed a fully automated (~3 s) computational MRD (cMRD) pipeline that can identify leukemic blasts in flow cytometry data using interpretable machine learning [4]. This algorithm first automatically detects healthy and leukemic blasts, after which a statistical model detects and enumerates the cells with aberrant marker expression (Fig. 1a). To assess the prognostic relevance of cMRD, we retrospectively analyzed the HOVON-SAKK-132 trial [5], which prospectively evaluated MRD-guided consolidation therapy using a standardized four-tube eight-color assay [6]. We included all flow MRD measurements on bone marrow (BM) from AML patients in remission after two cycles of induction chemotherapy, with measurements carried out at the central laboratory at Amsterdam UMC according to ELN guidelines [7]. If multiple MRD measurements were available from the same patient, the first measurement after completion of induction therapy was used.

Fig. 1: Prognostic value of computational MRD assessment (cMRD) in AML compared with manual gating MRD (mgMRD).

(a) Schematic overview of the cMRD pipeline. Each cell is sequentially processed by two Gaussian mixture models (GMM1, GMM2) that subset blasts and leukemic cells, respectively. (b) Association between MRD% evaluated by mgMRD vs. cMRD. Kaplan-Meier curves for OS (c) and RFS (d), and the CIR curve (e). Hazard ratios were calculated by univariate Cox proportional hazards (OS, RFS) or by univariate proportional subdistribution hazards (CIR). P-values were derived from Wald tests based on regression coefficients. Confidence intervals are based on 2.5% and 97.5% percentiles. MRD measurable residual disease, AML acute myeloid leukemia, HR hazard ratio, OS overall survival, RFS relapse-free survival, CIR cumulative incidence of relapse.
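The two-stage scheme in Fig. 1a can be illustrated with a minimal sketch. It is not the published pipeline [4]: the marker layout, all model parameters, the `min_log_density` threshold, and the rule that a blast is aberrant when its density under a reference mixture is low are assumptions made purely for illustration.

```python
from math import exp, log, pi

def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (log(2 * pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def mixture_log_density(x, components):
    """Log-density of a mixture of (weight, mean, var) components."""
    logs = [log(w) + log_gauss(x, m, v) for w, m, v in components]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + log(sum(exp(value - top) for value in logs))

def most_likely_component(x, components):
    """Index of the component with the highest posterior for x."""
    scores = [log(w) + log_gauss(x, m, v) for w, m, v in components]
    return scores.index(max(scores))

def cmrd_percent(cells, gmm1, blast_index, gmm2, min_log_density):
    """Aberrant blasts as a percentage of all cells.

    Each cell is (gating_1, gating_2, phenotype): the first two values
    feed the blast-selection model (stage 1), the rest feed the
    reference model for non-leukemic blasts (stage 2).
    """
    blasts = [c for c in cells
              if most_likely_component(c[:2], gmm1) == blast_index]
    aberrant = [c for c in blasts
                if mixture_log_density(c[2:], gmm2) < min_log_density]
    return 100.0 * len(aberrant) / len(cells)

# Hypothetical parameters on arbitrary transformed intensity scales.
GMM1 = [(0.9, (3.0, 0.5), (0.25, 0.25)),   # component 0: non-blast cells
        (0.1, (1.5, 2.5), (0.25, 0.25))]   # component 1: blasts
GMM2 = [(1.0, (0.3,), (0.04,))]            # reference (non-leukemic) blasts

cells = [(3.0, 0.5, 0.2)] * 8 + [(1.5, 2.5, 0.3), (1.5, 2.5, 2.0)]
burden = cmrd_percent(cells, GMM1, blast_index=1, gmm2=GMM2,
                      min_log_density=-3.0)
```

In this toy sample, one of ten cells is a blast whose phenotype marker lies far from the reference distribution, so it is the only cell counted toward the burden.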

We included 399 patients (Table S1) and determined the cMRD% for each patient using the pipeline. Leukemic burden (defined as % of white blood cells, WBCs) assessed by cMRD and mgMRD was significantly correlated (rs = 0.55, p < 0.001) (Fig. 1b). To investigate the independent prognostic value of cMRD% as a continuous variable (i.e., without applying a cut-off), we performed a multivariable analysis adjusted for AML type, WBC count at diagnosis, ELN risk group (2017 and 2022), age, and a time-dependent covariable for consolidation treatment (Tables S2–S5). This analysis confirmed cMRD as an independent prognostic factor for overall survival (OS) and relapse-free survival (RFS).
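Assuming rs denotes Spearman's rank correlation coefficient, the sketch below shows how such a coefficient can be computed from paired MRD percentages. The function names and the example values are hypothetical. Because the coefficient depends only on ranks, it measures monotonic association and is unaffected by one method reporting systematically higher percentages than the other.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical paired MRD% values: the second method reads higher
# throughout, yet the ordering of patients is identical.
mg_mrd = [0.01, 0.10, 0.50, 2.00]
c_mrd = [0.05, 0.30, 0.90, 4.00]
rho = spearman(mg_mrd, c_mrd)
```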

To define cMRD-positivity, the maximally selected rank statistic was used to select a cut-off based on patient outcomes (RFS). We consistently identified two robust prognostic cMRD cut-offs (0.1% and 0.56%) in 1,000 different permutations of the cohort (Fig. S1, Table S6). Although the conventional 0.1% mgMRD cut-off was prognostic for cMRD (Fig. S2), the higher number of predicted leukemic cells in cMRD compared to mgMRD (Fig. 1b) led to a high number of cMRD+ patients (63.7%, 254/399) and lower concordance with mgMRD status (52.4%, 209/399). Elevated cMRD levels (Fig. 1b) may result from false-positive misclassifications at the cell level. However, they can also reflect a technical difference: while the cMRD pipeline estimates the total leukemic burden, manual gating generally reports the percentage of the dominant leukemic population using two-dimensional analysis, which may underestimate the overall leukemic burden. Based on these clinical and technical considerations, we decided to classify patients as cMRD+ using the 0.56% cut-off. With this cut-off, 12.3% (49/399) of patients were cMRD+, compared to 17.5% (70/399) mgMRD+ patients using the 0.1% cut-off. The mgMRD and cMRD statuses were concordant for 85.2% (340/399) of patients. OS, RFS, and cumulative incidence of relapse (CIR) were comparable between the mgMRD-negative and cMRD-negative groups (Fig. 1c–e). cMRD-positivity was associated with shorter OS (HR (95% CI): 1.97 (1.16–2.53), p < 0.01; Fig. 1c) and RFS (HR (95% CI): 2.14 (1.29–3.01), p < 0.001; Fig. 1d) compared to cMRD-negativity. CIR was significantly higher for cMRD+ patients compared to cMRD- patients (sHR (95% CI): 1.85 (1.20–2.86), p < 0.01; Fig. 1e), whereas this effect was absent for mgMRD (sHR (95% CI): 1.22 (0.80–1.85), p = 0.35; Fig. 1e). Prognostic differences for cMRD in RFS were found in both intermediate and adverse ELN2017 (Fig. S3) and ELN2022 (Fig. S4) risk groups.
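The idea behind outcome-based cut-off selection can be sketched as a scan over candidate cut-offs that keeps the one giving the largest two-group log-rank statistic. This is a simplified stand-in, not the analysis used here: a proper maximally selected rank statistic also standardizes the scores and corrects the p-value for the repeated testing, and the repeated re-evaluation on 1,000 permutations of the cohort is not reproduced. The function names, the `min_group` parameter, and the toy data are hypothetical.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_high = sum(1 for i in at_risk
                     if times[i] == t and events[i] and in_high[i])
        # Observed minus expected events in the "high" group.
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:  # hypergeometric variance; zero when one subject remains
            variance += (d * (n_high / n) * (1 - n_high / n)
                         * (n - d) / (n - 1))
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0

def select_cutoff(values, times, events, min_group=2):
    """Cut-off (value > cut-off means 'high') maximizing the statistic."""
    best_cutoff, best_stat = None, -1.0
    for cutoff in sorted(set(values))[:-1]:
        in_high = [v > cutoff for v in values]
        n_high = sum(in_high)
        if n_high < min_group or len(values) - n_high < min_group:
            continue  # skip splits that leave a group too small
        stat = logrank_chi2(times, events, in_high)
        if stat > best_stat:
            best_cutoff, best_stat = cutoff, stat
    return best_cutoff, best_stat

# Toy data: patients with a marker value above 0.4 relapse early.
values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
times = [10, 11, 12, 13, 1, 2, 3, 4]
events = [1, 1, 1, 1, 1, 1, 1, 1]
cutoff, stat = select_cutoff(values, times, events)
```

Because the selected cut-off is by construction the one that separates outcomes best in the data at hand, its apparent significance is optimistic unless it is corrected or validated, which is why stability across resampled cohorts matters.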

To understand the differences between manual and computational MRD assessment, we investigated the cases with discordant MRD status. All discordant cases were independently re-examined by two manual gating experts in two rounds. In the first round, leukemic cells were gated according to current standards and checked by the second operator. Both operators were blinded to previous analyses and MRD status. In the second round, cMRD output was added, allowing for a side-by-side comparison of manual and computational cell classifications.
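The sizes of the four mgMRD/cMRD status groups follow arithmetically from the counts already reported: 399 patients, 70 mgMRD+, 49 cMRD+ at the 0.56% cut-off, and 340 concordant. The short sketch below solves the resulting 2×2 table; the function and key names are illustrative only.

```python
def two_by_two(n_total, n_mg_pos, n_c_pos, n_concordant):
    """Recover the 2x2 status table from marginal totals and concordance."""
    discordant = n_total - n_concordant
    # (mg+/c-) - (mg-/c+) equals the difference in marginal positives.
    diff = n_mg_pos - n_c_pos
    if (discordant + diff) % 2 != 0:
        raise ValueError("counts are mutually inconsistent")
    mg_pos_c_neg = (discordant + diff) // 2
    mg_neg_c_pos = discordant - mg_pos_c_neg
    both_pos = n_mg_pos - mg_pos_c_neg
    both_neg = n_concordant - both_pos
    if min(mg_pos_c_neg, mg_neg_c_pos, both_pos, both_neg) < 0:
        raise ValueError("counts are mutually inconsistent")
    return {"mg+/c+": both_pos, "mg+/c-": mg_pos_c_neg,
            "mg-/c+": mg_neg_c_pos, "mg-/c-": both_neg}

table = two_by_two(n_total=399, n_mg_pos=70, n_c_pos=49, n_concordant=340)
```

The result contains 59 discordant cases, split into 40 mgMRD+/cMRD- and 19 mgMRD-/cMRD+ patients, which matches the group sizes given for the re-analysis.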

The largest discordant group (mgMRD+/cMRD-, n = 40) had a 5-year RFS of 54%, which was not significantly different from that of mgMRD-/cMRD- patients (5-year RFS: 56%; log-rank: p = 0.6; Fig. 2a–c). However, their RFS was significantly longer than that of mgMRD+/cMRD+ patients (5-year RFS: 27%; log-rank: p < 0.05), suggesting potential false-positive mgMRD results. This was confirmed by manual re-analysis, in which 10 out of 40 samples initially classified as mgMRD+ were now considered mgMRD- with current gating procedures. This difference originated from specific expression patterns (e.g., CD15, CD22) now recognized as transient phenotypes associated with BM regeneration. As the computational pipeline identified and modeled clusters with these cells in our regenerating BM reference cohort [4], the cMRD algorithm did not classify these cells as aberrant (Fig. 2d, e). The other discrepant cases resulting in cMRD-negativity either originated from patients with mature phenotypes (CD34-CD117-), whose cells were removed by the initial selection of blasts in the cMRD pipeline (Fig. 1a), or corresponded with manual gating but did not exceed the 0.56% cMRD cut-off.

Fig. 2: Side-by-side comparison of computational MRD (cMRD) and manual gating MRD (mgMRD) analysis in AML.

Kaplan-Meier curves for OS (a) and RFS (b), and the CIR curve (c). Example of CD13+CD22+ (d) and CD13+CD15+ (e) populations associated with regenerating bone marrow that are ignored by cMRD. (f–h) Leukemic populations identified by both mgMRD and cMRD. (i) Example of aberrant blasts identified by cMRD with debris scatter characteristics. MRD measurable residual disease, AML acute myeloid leukemia, HR hazard ratio, OS overall survival, RFS relapse-free survival, CIR cumulative incidence of relapse.
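The 5-year RFS percentages quoted for the status groups can be read as Kaplan-Meier estimates at 60 months, an assumption consistent with the curves in Fig. 2b. The sketch below shows how such an estimate is obtained from follow-up times and event indicators; the function names and the toy follow-up data are hypothetical and unrelated to the trial.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        failed = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1 - failed / at_risk
        curve.append((t, surv))
    return curve

def survival_at(curve, horizon):
    """Step-function value of the curve at the given time horizon."""
    surv = 1.0
    for t, s in curve:
        if t > horizon:
            break
        surv = s
    return surv

# Toy follow-up in months; event = 1 is relapse or death, 0 is censored.
times = [12, 24, 36, 60, 72, 80]
events = [1, 0, 1, 0, 1, 0]
rfs_5y = survival_at(kaplan_meier(times, events), horizon=60)
```

Censored patients leave the risk set without lowering the curve, which is why the estimate differs from the raw fraction of patients still event-free at 60 months.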

In the mgMRD-/cMRD+ group (n = 19), manual re-analysis identified novel populations absent at diagnosis, resulting in mgMRD-positivity in 9 out of 19 patients. In Fig. 2f–h, we provide three examples (CD13+CD56+, CD13+CD7+, CD13+CD33- blasts) identified by both manual and computational analysis. Although differences in RFS between mgMRD-/cMRD- (5-year RFS: 56%) and mgMRD-/cMRD+ (5-year RFS: 35%) did not reach statistical significance (log-rank: p = 0.06), we previously reported the complementary value of emerging cell populations in MRD assessment [8]. In the remaining cases, we often observed non-malignant cells with low abundance in regenerating bone marrow that were incorrectly labeled as aberrant by cMRD, such as CD34-CD117dim lymphocyte precursors and CD117++ mast cells. However, some cMRD+ cases could also be explained by technical limitations. In Fig. 2i, we show an extreme case in which the CD45dim blast compartment contained cells with aberrant expression (CD13+HLA-DR-) that, based on scatter (SSClow/FSClow), were identified as cell debris. Because scatter characteristics are not well standardized in flow cytometry data, we could not properly include these parameters in the modeling. For such samples, pre-analytical standardization and quality control are key for evaluating whether samples are fit for computational analysis.

Overall, our results show that cMRD delivers a fast (~3 s) and accurate MRD assessment with clinically relevant relapse associations. Using a quantitative approach to define leukemic phenotypes not only allows for eliminating inter-operator and inter-center variability but also provides utility in re-evaluating AML-MRD gating strategies. Although integration of cMRD into routine diagnostics requires future external, multi-center, and prospective validation to conform to regulatory requirements, the cMRD pipeline was designed with clinical use in mind, avoiding “black box” methodology through robust statistical modeling. Moreover, because it requires only a small training set (n = 18), we expect this approach to be easier to implement in other centers than previously proposed computational methods [3], avoiding common regulatory (e.g., data-sharing) and technical (e.g., batch-effect) difficulties. Consequently, the hurdles to implementing AML-MRD in clinical practice can be reduced.