Table 1 Characteristics of Included Studies Exploring Machine Learning in Paediatric Haematological Malignancies.
Study | Cancer Type | Task/Problem | Input Variable | Output Variable | ML Method | No. of Patients | Cross-Validation | External Validation | Highest AUC | Other Comparative Statistical Scores Used |
|---|---|---|---|---|---|---|---|---|---|---|
Prognosis and Relapse/Recurrence Studies | ||||||||||
He et al. 12 | AML | Prognostic prediction | Expression levels of pyroptosis-related genes | Risk score | LASSO | NA | Not specified | E-MTAB-1216 dataset | 0.893 | HR = 2.04 |
He et al. 13 | ALL | Survival prediction | Clinical characteristics, immunophenotype, genetic data | EFS predictive model | LASSO | 1693 | 10-fold CV | Not specified | 0.822 | C-index = 0.81 |
Cui et al. 14 | Leukaemia (sub-type not specified) | Survival prediction | Clinical characteristics | Survival predictive model | Bayesian inference | 17539 | Not specified | Not specified | C-index = 0.93 | |
Zheng et al. 15 | AML | Prognostic prediction | Gene expression data of m6A-related lncRNAs | OS predictive model | LASSO | 646 | Not specified | TCGA database | 0.685 | C-index = 0.82 |
Bohannan et al. 16 | ALL | Survival prediction | Genomic data | EFS predictive model | RF | 156 | Training (70%) and testing (30%) cohorts used | Not specified | 0.929 | HR = 5.41; C-index = 0.82 |
Gao et al. 17 | B-cell ALL | Survival prediction | Clinical characteristics | OS predictive model | LASSO | 1316 | C-index | TARGET database | 0.898 | C-index = 0.87 |
Lin et al. 18 | B-cell ALL | Relapse prediction | NAD+ metabolism-related genes | Relapse predictive model | RF | NA | Mentioned but not specified | Not specified | 0.8031 | |
Pan et al. 19 | ALL | Relapse prediction | Sociodemographic, clinical, immunological, and cytogenetic data | Relapse predictive model | RF, SVM, LR & DT | 570 | 10-fold CV | Independent test set of 84 patients | 0.904 | Accuracy = 82.9% |
Treatment Response Studies | ||||||||||
Gbadamosi et al. 20 | AML | GO-related response prediction | Genetic data | Treatment response outcome model | LASSO | 301 | 1000-fold CV | Not specified | OS = 0.676; HR = 0.565 | |
Pedreira et al. 21 | ALL | Treatment intensity decision support model | Clinical data | Treatment decision model | NN | 158 | Leave-one-out | Not specified | RHR = 98%; ROR = 21%; RSR = 0% | |
Gal et al. 22 | AML | Complete remission prediction | Gene expression data | Complete remission predictive model | K-NN, SVM & RF | 473 | 5-fold CV | Not specified | 0.840 | |
Kashef et al. 23 | ALL | Treatment prediction | Clinical data | Complete remission predictive model | GBM, RF, GLM | 241 | 5-fold CV | Not specified | 0.8725 | |
Kashef et al. 24 | ALL | Treatment outcome classification | Clinical characteristics & treatment-related toxicity | Classification of treatment outcomes model | DT, SVM, RF, LDA, MLR, GBM | 241 | 10-fold CV | Not specified | 0.870 | Accuracy = 94.9% |
Treatment Toxicity Studies | ||||||||||
Al-Fahad et al. 25 | ALL | Treatment toxicity prediction | MRI-derived information | Classification of cognitive abilities | LASSO | 200 | Training (80%) and testing (20%) cohorts used | Not specified | 0.870 | |
Ramalingam et al. 26 | ALL | Treatment toxicity prediction | Genotypes of SLC19A1, MTHFR, TYMS, and cytogenetic data | Methotrexate-related toxicities | MDR | 115 | 10-fold CV | Not specified | OR = 5.71; -2 Log Likelihood of Reduced Model = 97.104 | |
Zhan et al. 27 | B-cell ALL | Treatment toxicity prediction | SNPs in 16 genes, clinical characteristics and methotrexate delayed clearance | Predictive models for the risk of neutropenia and fever | RF with ADASYN, SVM, DT | 139 | Training (70%) and testing (30%) cohorts used | Not specified | 0.927 | |
Tram et al. 28 | Lymphoma | Treatment toxicity prediction | CT image | Risk of treatment-related late effects | NN | 100 | 5-fold CV | Against human raters | Dice value = 0.988; HR = 3.1 | |
Theruvath et al. 29 | Lymphoma | Dosing prediction | PET/MRI scan data | Enhanced PET/MRI images | NN | 20 | Not specified | Against neural network called SubtlePET | 1 | K statistic = 1 |
Disease Susceptibility/Diagnosis Studies | ||||||||||
Mahmood et al. 30 | ALL | Classifying risk factors | Clinical, genomic and socio-environmental data | Risk score for ALL | CART, RF, GBM & DT | 50 | 10-fold CV | Not specified | Accuracy = 99.83% | |
Kulis et al. 31 | B-Cell Precursor ALL | Classifying risk factors | Antigens measured through flow cytometry | Identification of specific genetic aberrations | GBM | 818 | 5-fold CV | Not specified | OR = 16.90 | |