Table 1. Predictive performance comparison of multiple classification algorithms on the constructed dataset

| Dataset | Method | Accuracy | Precision | Recall | F1 Score | ROC–AUC | PR–AUC | MCC |
|---|---|---|---|---|---|---|---|---|
| All | Ridge regression | 0.857 | 0.360 | 0.742 | 0.485 | 0.881 | 0.574 | 0.450 |
| | Balanced bagging | 0.886 | 0.419 | 0.650 | 0.510 | 0.840 | 0.493 | 0.463 |
| | Linear SVC | 0.918 | 0.677 | 0.192 | 0.299 | 0.870 | 0.549 | 0.331 |
| | Random forest | 0.939 | 0.800 | 0.433 | 0.562 | 0.888 | 0.633 | 0.561 |
| | XGBoost | 0.949 | 0.791 | 0.600 | 0.683 | 0.885 | 0.692 | 0.663 |
| Bacteria | Ridge regression | 0.870 | 0.363 | 0.857 | 0.510 | 0.917 | 0.519 | 0.504 |
| | Balanced bagging | 0.891 | 0.396 | 0.714 | 0.509 | 0.892 | 0.548 | 0.479 |
| | Linear SVC | 0.931 | 0.614 | 0.351 | 0.446 | 0.915 | 0.516 | 0.431 |
| | Random forest | 0.939 | 0.674 | 0.429 | 0.524 | 0.927 | 0.657 | 0.507 |
| | XGBoost | 0.946 | 0.688 | 0.571 | 0.624 | 0.922 | 0.660 | 0.598 |
| Eukaryota | Ridge regression | 0.849 | 0.326 | 0.737 | 0.452 | 0.844 | 0.475 | 0.422 |
| | Balanced bagging | 0.813 | 0.275 | 0.737 | 0.400 | 0.856 | 0.456 | 0.370 |
| | Linear SVC | 0.920 | 0.556 | 0.263 | 0.357 | 0.855 | 0.433 | 0.346 |
| | Random forest | 0.929 | 1.000 | 0.158 | 0.273 | 0.929 | 0.694 | 0.383 |
| | XGBoost | 0.942 | 0.688 | 0.579 | 0.629 | 0.940 | 0.685 | 0.600 |
| Viruses | Ridge regression | 0.800 | 0.500 | 0.417 | 0.455 | 0.792 | 0.572 | 0.335 |
| | Balanced bagging | 0.808 | 0.526 | 0.417 | 0.465 | 0.716 | 0.453 | 0.354 |
| | Linear SVC | 0.842 | 0.857 | 0.250 | 0.387 | 0.780 | 0.573 | 0.409 |
| | Random forest | 0.850 | 0.800 | 0.333 | 0.471 | 0.794 | 0.611 | 0.452 |
| | XGBoost | 0.867 | 0.786 | 0.458 | 0.579 | 0.764 | 0.583 | 0.532 |
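
The table does not state how the threshold-based metrics were computed. As a minimal sketch, assuming the standard binary-classification definitions, the following Python shows how accuracy, precision, recall, F1, and MCC follow from confusion-matrix counts. The function names `metrics_from_counts` and `f1_from_precision_recall` are hypothetical helpers introduced for illustration, not part of the original work.

```python
import math


def metrics_from_counts(tp, fp, tn, fn):
    """Threshold-based metrics from binary confusion-matrix counts.

    Assumes the standard definitions; undefined ratios fall back to 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from_precision_recall(precision, recall)
    # Matthews correlation coefficient: uses all four cells of the matrix,
    # so it stays informative when the classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, `f1_from_precision_recall(0.360, 0.742)` gives about 0.485, which agrees with the Ridge regression row for the All dataset. Rows computed from rounded precision and recall can differ from the tabulated F1 in the third decimal place.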