Table 4 Performance of 10 ML algorithms in the training and validation cohorts for early diagnosis of BC.

From: Machine learning model for early diagnosis of breast cancer based on PiRNA expression with CA153

 

| Cohort | Name | Optimal threshold | Accuracy | Positive Precision | Negative Precision | Positive Recall | Negative Recall | F1 score | kappa | AUC | AUC 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Training | AdaBoost | 0.49845 | 0.980769 | 0.972222 | 0.988095 | 0.985915493 | 0.976470588 | 0.98078 | 0.961271 | 0.999171 | 0.9968582964289002–1.0 |
| Training | ANN | 0.517937 | 0.794872 | 0.867925 | 0.757282 | 0.647887324 | 0.917647059 | 0.789803 | 0.577594 | 0.812593 | 0.7392675981258746–0.8760783996022822 |
| Training | DT | 0.5 | 0.788462 | 0.88 | 0.745283 | 0.61971831 | 0.929411765 | 0.781734 | 0.56284 | 0.788152 | 0.7245831383806066–0.8474177767426827 |
| Training | GBDT | 0.464509 | 0.99359 | 1 | 0.988372 | 0.985915493 | 1 | 0.993586 | 0.98706 | 1 | 0.9999999999999999–1.0 |
| Training | KNN | 0.5 | 0.769231 | 0.857143 | 0.728972 | 0.591549296 | 0.917647059 | 0.761298 | 0.52253 | 0.834963 | 0.768916320895988–0.8909322273028321 |
| Training | LGBM | 0.400531 | 0.839744 | 0.838235 | 0.840909 | 0.802816901 | 0.870588235 | 0.839404 | 0.675757 | 0.91251 | 0.863754546957672–0.9525539685923516 |
| Training | LR | 0.510033 | 0.75641 | 0.770492 | 0.747368 | 0.661971831 | 0.835294118 | 0.75395 | 0.503101 | 0.799171 | 0.7230585237547262–0.8629722618044392 |
| Training | RF | 0.394845 | 0.814103 | 0.783784 | 0.841463 | 0.816901408 | 0.811764706 | 0.814356 | 0.626486 | 0.893538 | 0.8455313358116229–0.9405541361030084 |
| Training | SVM | 0.5 | 0.769231 | 0.818182 | 0.742574 | 0.633802817 | 0.882352941 | 0.764504 | 0.525916 | 0.806628 | 0.7299192040598291–0.8709271904608475 |
| Training | XGBoost | 0.68755 | 0.99359 | 1 | 0.988372 | 0.985915493 | 1 | 0.993586 | 0.98706 | 1 | 0.9999999999999999–1.0 |
| Validation | AdaBoost | 0.494679 | 0.691176 | 0.625 | 0.785714 | 0.806451613 | 0.594594595 | 0.68937 | 0.391823 | 0.737576 | 0.6026757097069598–0.8455445075757576 |
| Validation | ANN | 0.42376 | 0.75 | 0.769231 | 0.738095 | 0.64516129 | 0.837837838 | 0.746946 | 0.489399 | 0.759372 | 0.6181468229002831–0.8791060102688009 |
| Validation | DT | 0.883721 | 0.544118 | 0 | 0.544118 | 0 | 1 | 0.383473 | 0 | 0.686138 | 0.5806740723045071–0.7938135915958496 |
| Validation | GBDT | 0.710062 | 0.764706 | 0.857143 | 0.723404 | 0.580645161 | 0.918918919 | 0.756087 | 0.512981 | 0.841325 | 0.7338505747126437–0.9260123026252057 |
| Validation | KNN | 0.6 | 0.691176 | 0.8125 | 0.653846 | 0.419354839 | 0.918918919 | 0.667921 | 0.352087 | 0.771578 | 0.6635252157119272–0.8704733896072797 |
| Validation | LGBM | 0.386878 | 0.823529 | 0.827586 | 0.820513 | 0.774193548 | 0.864864865 | 0.82291 | 0.642419 | 0.839146 | 0.7289324325847764–0.9324775529614239 |
| Validation | LR | 0.519913 | 0.764706 | 0.826087 | 0.733333 | 0.612903226 | 0.891891892 | 0.758754 | 0.515583 | 0.804708 | 0.6856231279418057–0.905282741738066 |
| Validation | RF | 0.486437 | 0.794118 | 0.84 | 0.767442 | 0.677419355 | 0.891891892 | 0.790809 | 0.578388 | 0.841761 | 0.7331673385100805–0.9250475778546713 |
| Validation | SVM | 0.289992 | 0.735294 | 0.666667 | 0.827586 | 0.838709677 | 0.648648649 | 0.734377 | 0.47737 | 0.809939 | 0.6940746753246754–0.908764714600271 |
| Validation | XGBoost | 0.472235 | 0.794118 | 0.774194 | 0.810811 | 0.774193548 | 0.810810811 | 0.794118 | 0.585004 | 0.842197 | 0.7381517033690946–0.9308225108225108 |

  1. K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), decision tree (DT), artificial neural network (ANN), support vector machine (SVM), gradient boosting decision tree (GBDT), light gradient boosting machine (LGBM), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost).
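
The threshold-dependent columns of Table 4 (accuracy, positive and negative precision and recall, F1 score, kappa) can all be derived from a single 2×2 confusion matrix. The sketch below is illustrative only and is not the authors' code. The function name `binary_metrics` and the confusion-matrix counts are assumptions: the table does not report counts, and the values used here (31 positive and 37 negative validation samples) are inferred from the reported ratios, for example 0.774193548 ≈ 24/31 and 0.810810811 ≈ 30/37. Under these assumed counts, the reported F1 score appears consistent with a support-weighted average of the per-class F1 values, and kappa with Cohen's kappa.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute Table 4 style metrics from confusion-matrix counts.

    tp/fn: positive samples predicted positive/negative.
    tn/fp: negative samples predicted negative/positive.
    """
    n = tp + fn + tn + fp
    pos, neg = tp + fn, tn + fp            # true class sizes
    pred_pos, pred_neg = tp + fp, tn + fn  # predicted class sizes

    accuracy = (tp + tn) / n
    # Precision is set to 0 when a class is never predicted (assumed convention).
    pos_precision = tp / pred_pos if pred_pos else 0.0
    neg_precision = tn / pred_neg if pred_neg else 0.0
    pos_recall = tp / pos
    neg_recall = tn / neg

    def f1(p, r):
        return 2 * p * r / (p + r) if (p + r) else 0.0

    # Support-weighted F1 (assumption: this averaging matches the reported column).
    weighted_f1 = (pos * f1(pos_precision, pos_recall)
                   + neg * f1(neg_precision, neg_recall)) / n

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = (pos * pred_pos + neg * pred_neg) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {
        "accuracy": accuracy,
        "pos_precision": pos_precision,
        "neg_precision": neg_precision,
        "pos_recall": pos_recall,
        "neg_recall": neg_recall,
        "f1": weighted_f1,
        "kappa": kappa,
    }


# Inferred (not reported) counts for the validation cohort.
xgb_val = binary_metrics(tp=24, fn=7, tn=30, fp=7)   # XGBoost row
dt_val = binary_metrics(tp=0, fn=31, tn=37, fp=0)    # DT row: all predicted negative
```

With these assumed counts the function reproduces the validation rows for XGBoost and DT to the reported precision. The DT row illustrates a degenerate case: at its selected threshold every validation sample is classified as negative, so accuracy equals the negative-class share (37/68) and kappa is 0.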