Table 3 Model comparisons based on the five machine learning techniques.

From: The major effects of health-related quality of life on 5-year survival prediction among lung cancer survivors: applications of machine learning

| Feature set | Machine learning algorithm | Validation method | N folds | Training set size | Testing set size | Training accuracy | Testing accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | DT | Holdout sampling | | 1,140 | 286 | 0.668 | 0.703 |
| 1 | DT | Cross-validation | 5 | 912 | 286 | 0.625 | 0.692 |
| 1 | LR | Holdout sampling | | 1,140 | 286 | 0.663 | 0.647 |
| 1 | LR | Cross-validation | 5 | 912 | 286 | 0.657 | 0.632 |
| 1 | Bagging | Holdout sampling | | 1,140 | 286 | 0.680 | 0.710 |
| 1 | Bagging | Cross-validation | 5 | 912 | 286 | 0.655 | 0.706 |
| 1 | RF | Holdout sampling | | 1,140 | 286 | 0.675 | 0.713 |
| 1 | RF | Cross-validation | 5 | 912 | 286 | 0.675 | 0.692 |
| 1 | AdaBoost | Holdout sampling | | 1,140 | 286 | 0.668 | 0.696 |
| 1 | Real AdaBoost | Cross-validation | 5 | 912 | 286 | 0.642 | 0.713 |
| 2 | DT | Holdout sampling | | 1,140 | 286 | 0.780 | 0.762 |
| 2 | DT | Cross-validation | 5 | 912 | 286 | 0.758 | 0.745 |
| 2 | LR | Holdout sampling | | 1,140 | 286 | 0.791 | 0.746 |
| 2 | LR | Cross-validation | 5 | 912 | 286 | 0.814 | 0.825 |
| 2 | Bagging | Holdout sampling | | 1,140 | 286 | 0.976 | 0.930 |
| 2 | Bagging | Cross-validation | 5 | 912 | 286 | 0.794 | 0.776 |
| 2 | RF | Holdout sampling | | 1,140 | 286 | 0.949 | 0.916 |
| 2 | RF | Cross-validation | 5 | 912 | 286 | 0.918 | 0.941 |
| 2 | AdaBoost | Holdout sampling | | 1,140 | 286 | 0.943 | 0.878 |
| 2 | Real AdaBoost | Cross-validation | 5 | 912 | 286 | 0.932 | 0.948 |

  1. DT, decision tree; RF, random forest; LR, logistic regression.
  2. Feature set 1 includes sociodemographic and clinical variables.
  3. Feature set 2 includes PRO variables and the variables included in feature set 1.
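
The comparison between the two feature sets can be made explicit with a little arithmetic. The sketch below is illustrative only and is not the authors' analysis code: it transcribes the testing accuracies from Table 3 and computes, for each algorithm and validation method, the gain from adding the PRO variables (feature set 2 minus feature set 1). The names `TEST_ACCURACY`, `accuracy_gains`, and `fold_training_size` are hypothetical. The helper `fold_training_size` rests on the assumption, not stated in the table, that the cross-validation training size of 912 is four of five folds of the 1,140-record training set.

```python
# Testing accuracies transcribed from Table 3, keyed by (algorithm, validation method).
# Tuple order: (feature set 1, feature set 2).
TEST_ACCURACY = {
    ("DT", "holdout"): (0.703, 0.762),
    ("DT", "cv"): (0.692, 0.745),
    ("LR", "holdout"): (0.647, 0.746),
    ("LR", "cv"): (0.632, 0.825),
    ("Bagging", "holdout"): (0.710, 0.930),
    ("Bagging", "cv"): (0.706, 0.776),
    ("RF", "holdout"): (0.713, 0.916),
    ("RF", "cv"): (0.692, 0.941),
    ("AdaBoost", "holdout"): (0.696, 0.878),
    ("Real AdaBoost", "cv"): (0.713, 0.948),
}


def accuracy_gains(table):
    """Return feature-set-2 minus feature-set-1 testing accuracy per row, rounded to 3 places."""
    return {key: round(fs2 - fs1, 3) for key, (fs1, fs2) in table.items()}


def fold_training_size(n_train, n_folds):
    """Records used for fitting in one fold of k-fold CV (assumption: (k - 1) / k of the training set)."""
    return n_train * (n_folds - 1) // n_folds


gains = accuracy_gains(TEST_ACCURACY)
# Configuration with the highest testing accuracy on feature set 2.
best_fs2 = max(TEST_ACCURACY, key=lambda k: TEST_ACCURACY[k][1])

if __name__ == "__main__":
    # List configurations from largest to smallest gain.
    for key, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{key[0]:<14}{key[1]:<9}{gain:+.3f}")
    print("Highest feature set 2 testing accuracy:", best_fs2)
    print("Per-fold training size:", fold_training_size(1140, 5))
```

On these figures every configuration gains testing accuracy when the PRO variables are added. The largest gain is for RF with cross-validation (0.249), and the highest feature set 2 testing accuracy is Real AdaBoost with cross-validation (0.948). Under the stated assumption, `fold_training_size(1140, 5)` reproduces the 912 reported in the table.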