Table 1 Characteristics of Included Studies Exploring Machine Learning in Paediatric Haematological Malignancies.

From: Machine learning in paediatric haematological malignancies: a systematic review of prognosis, toxicity and treatment response models

| Study | Cancer Type | Task/Problem | Input Variable | Output Variable | ML Method | No. of Patients | Cross-validation | External Validation | Highest AUC | Other Comparative Statistical Scores Used |
|---|---|---|---|---|---|---|---|---|---|---|
| Prognosis and Relapse/Recurrence Studies | | | | | | | | | | |
| He et al. 12 | AML | Prognostic prediction | Expression levels of pyroptosis-related genes | Risk score | LASSO | NA | Not specified | E-MTAB-1216 dataset | 0.893 | HR = 2.04 |
| He et al. 13 | ALL | Survival prediction | Clinical characteristics, immunophenotype, genetic data | EFS predictive model | LASSO | 1693 | 10-fold CV | Not specified | 0.822 | C-index = 0.81 |
| Cui et al. 14 | Leukaemia (sub-type not specified) | Survival prediction | Clinical characteristics | Survival predictive model | Bayesian inference | 17539 | Not specified | Not specified | | C-index = 0.93 |
| Zheng et al. 15 | AML | Prognostic prediction | Gene expression data of m6A-related lncRNAs | OS predictive model | LASSO | 646 | Not specified | TCGA database | 0.685 | C-index = 0.82 |
| Bohannan et al. 16 | ALL | Survival prediction | Genomic data | EFS predictive model | RF | 156 | Training (70%) and testing (30%) cohorts used | Not specified | 0.929 | HR = 5.41; C-index = 0.82 |
| Gao et al. 17 | B-cell ALL | Survival prediction | Clinical characteristics | OS predictive model | LASSO | 1316 | C-index | TARGET database | 0.898 | C-index = 0.87 |
| Lin et al. 18 | B-cell ALL | Relapse prediction | NAD+ metabolism-related genes | Relapse predictive model | RF | NA | Mentioned but not specified | Not specified | 0.8031 | |
| Pan et al. 19 | ALL | Relapse prediction | Sociodemographic, clinical, immunological, and cytogenetic data | Relapse predictive model | RF, SVM, LR & DT | 570 | 10-fold CV | Independent test set of 84 patients | 0.904 | Accuracy = 82.9% |
| Treatment Response Studies | | | | | | | | | | |
| Gbadamosi et al. 20 | AML | GO-related response prediction | Genetic data | Treatment response outcome model | LASSO | 301 | 1000-fold CV | Not specified | | OS = 0.676; HR = 0.565 |
| Pedreira et al. 21 | ALL | Treatment intensity decision support model | Clinical data | Treatment decision model | NN | 158 | Leave-one-out | Not specified | | RHR = 98%; ROR = 21%; RSR = 0% |
| Gal et al. 22 | AML | Complete remission prediction | Gene expression data | Complete remission predictive model | K-NN, SVM & RF | 473 | 5-fold CV | Not specified | 0.840 | |
| Kashef et al. 23 | ALL | Treatment prediction | Clinical data | Complete remission predictive model | GBM, RF, GLM | 241 | 5-fold CV | Not specified | 0.8725 | |
| Kashef et al. 24 | ALL | Treatment outcome classification | Clinical characteristics & treatment-related toxicity | Classification of treatment outcomes model | DT, SVM, RF, LDA, MLR, GBM | 241 | 10-fold CV | Not specified | 0.870 | Accuracy = 94.9% |
| Treatment Toxicity Studies | | | | | | | | | | |
| Al-Fahad et al. 25 | ALL | Treatment toxicity prediction | MRI-derived information | Classification of cognitive abilities | LASSO | 200 | Training (80%) and testing (20%) cohorts used | Not specified | 0.870 | |
| Ramalingam et al. 26 | ALL | Treatment toxicity prediction | Genotypes of SLC19A1, MTHFR, TYMS, and cytogenetic data | Methotrexate-related toxicities | MDR | 115 | 10-fold CV | Not specified | | OR = 5.71; -2 Log Likelihood of Reduced Model = 97.104 |
| Zhan et al. 27 | B-cell ALL | Treatment toxicity prediction | SNPs in 16 genes, clinical characteristics and methotrexate delayed clearance | Predictive models for the risk of neutropenia and fever | RF with ADASYN, SVM, DT | 139 | Training (70%) and testing (30%) cohorts used | Not specified | 0.927 | |
| Tram et al. 28 | Lymphoma | Treatment toxicity prediction | CT image | Risk of treatment-related late effects | NN | 100 | 5-fold CV | Against human raters | | Dice value = 0.988; HR = 3.1 |
| Theruvath et al. 29 | Lymphoma | Dosing prediction | PET/MRI scan data | Enhanced PET/MRI images | NN | 20 | Not specified | Against neural network called SubtlePET | 1 | K statistic = 1 |
| Disease Susceptibility/Diagnosis Studies | | | | | | | | | | |
| Mahmood et al. 30 | ALL | Classifying risk factors | Clinical, genomic and socio-environmental data | Risk score for ALL | CART, RF, GBM & DT | 50 | 10-fold CV | Not specified | | Accuracy = 99.83% |
| Kulis et al. 31 | B-Cell Precursor ALL | Classifying risk factors | Antigens measured through flow cytometry | Identification of specific genetic aberrations | GBM | 818 | 5-fold CV | Not specified | | OR = 16.90 |

Abbreviations: ADASYN, Adaptive Synthetic; ALL, Acute Lymphoblastic Leukaemia; AML, Acute Myeloid Leukaemia; AUC, Area Under the Curve; CART, Classification and Regression Tree; CV, Cross-validation; DT, Decision Tree; GBM, Gradient Boosting Model; GLM, Generalised Linear Model; GO, gemtuzumab ozogamicin; HR, Hazard Ratio; lncRNA, Long noncoding RNA; K-NN, k-nearest neighbour; LASSO, Least Absolute Shrinkage and Selection Operator; LDA, Linear Discriminant Analysis; LR, Linear Regression; MDR, Multifactor Dimensionality Reduction; NN, Neural Network; OR, Odds Ratio; RF, Random Forest; RHR, Rate of High Risk; ROR, Rate of Overestimated Risk; RSR, Rate of Subestimation Risk; SNP, Single Nucleotide Polymorphism; SVM, Support Vector Machine.