Table 1 Predictive performance comparison of multiple classification algorithms on the constructed dataset

From: Integrating protein language and geometric deep learning models for enhanced vaccine antigen prediction

| Dataset | Methods | Accuracy | Precision | Recall | F1 Score | ROC–AUC | PR–AUC | MCC |
|---|---|---|---|---|---|---|---|---|
| All | Ridge regression | 0.857 | 0.360 | 0.742 | 0.485 | 0.881 | 0.574 | 0.450 |
| All | Balanced bagging | 0.886 | 0.419 | 0.650 | 0.510 | 0.840 | 0.493 | 0.463 |
| All | Linear SVC | 0.918 | 0.677 | 0.192 | 0.299 | 0.870 | 0.549 | 0.331 |
| All | Random forest | 0.939 | 0.800 | 0.433 | 0.562 | 0.888 | 0.633 | 0.561 |
| All | XGBoost | 0.949 | 0.791 | 0.600 | 0.683 | 0.885 | 0.692 | 0.663 |
| Bacteria | Ridge regression | 0.870 | 0.363 | 0.857 | 0.510 | 0.917 | 0.519 | 0.504 |
| Bacteria | Balanced bagging | 0.891 | 0.396 | 0.714 | 0.509 | 0.892 | 0.548 | 0.479 |
| Bacteria | Linear SVC | 0.931 | 0.614 | 0.351 | 0.446 | 0.915 | 0.516 | 0.431 |
| Bacteria | Random forest | 0.939 | 0.674 | 0.429 | 0.524 | 0.927 | 0.657 | 0.507 |
| Bacteria | XGBoost | 0.946 | 0.688 | 0.571 | 0.624 | 0.922 | 0.660 | 0.598 |
| Eukaryota | Ridge regression | 0.849 | 0.326 | 0.737 | 0.452 | 0.844 | 0.475 | 0.422 |
| Eukaryota | Balanced bagging | 0.813 | 0.275 | 0.737 | 0.400 | 0.856 | 0.456 | 0.370 |
| Eukaryota | Linear SVC | 0.920 | 0.556 | 0.263 | 0.357 | 0.855 | 0.433 | 0.346 |
| Eukaryota | Random forest | 0.929 | 1.000 | 0.158 | 0.273 | 0.929 | 0.694 | 0.383 |
| Eukaryota | XGBoost | 0.942 | 0.688 | 0.579 | 0.629 | 0.940 | 0.685 | 0.600 |
| Viruses | Ridge regression | 0.800 | 0.500 | 0.417 | 0.455 | 0.792 | 0.572 | 0.335 |
| Viruses | Balanced bagging | 0.808 | 0.526 | 0.417 | 0.465 | 0.716 | 0.453 | 0.354 |
| Viruses | Linear SVC | 0.842 | 0.857 | 0.250 | 0.387 | 0.780 | 0.573 | 0.409 |
| Viruses | Random forest | 0.850 | 0.800 | 0.333 | 0.471 | 0.794 | 0.611 | 0.452 |
| Viruses | XGBoost | 0.867 | 0.786 | 0.458 | 0.579 | 0.764 | 0.583 | 0.532 |

  1. The highest value for each metric is highlighted in bold.
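
As a reading aid for the Precision, Recall, and F1 Score columns, the sketch below recomputes F1 as the harmonic mean of precision and recall using the values reported for ridge regression on the "All" dataset. The function name `f1_from_precision_recall` is an illustrative choice, not something taken from the article, and the article does not state how its metrics were computed. Because the table values are rounded to three decimals, a recomputed F1 may differ from the reported one in the last digit for some rows.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the "All / Ridge regression" row of Table 1.
ridge_all_f1 = f1_from_precision_recall(0.360, 0.742)

# The table reports an F1 score of 0.485 for this row.
print(round(ridge_all_f1, 3))
```

The same check can be applied to any other row by substituting its precision and recall.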