Table 1 Summary of existing ML-based models for thermophilic protein prediction.

From: A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides

| Author (year) | Classifier (a) | Features (b) | Evaluation strategy (c) | Web server availability (d) |
| --- | --- | --- | --- | --- |
| Zhang et al. [31] | PLS | AAC | 5CV/IND | No |
| Zhang et al. [32] | LogitBoost | AAC | 5CV/IND | No |
| Gromiha et al. [27] | NN | AAC | 5CV/IND | No |
| Montanucci et al. [21] | SVM | AAC, DPC | 5CV | Not accessible |
| Lin et al. [20] | SVM | AAC, GGAC | Jackknife | Yes |
| Wang et al. [24] | SVM | AAC, DPC, PCP, CTD | 5CV | No |
| Nakariyakul et al. [28] | SVM | AAC, DPC | 5CV/IND | No |
| Zuo et al. [33] | KNN | AAC | Jackknife | Not accessible |
| Wang et al. [30] | SVM | AAC, GGAC | 5CV/IND | No |
| Fan et al. [25] | SVM | AAC, pka, PSSM | 10CV/IND | No |
| Tang et al. [29] | SVM | k-mer | 5CV | No |
| Feng et al. [26] | SVM | ACC, DPC, PCP, RAAC | 10CV/IND | No |
| Charoenkwan et al. (this study) | SCM | DPS | 10CV/IND | Yes |

  1. (a) KNN, k-nearest neighbor; NN, neural networks; PLS, partial least-squares regression; SCM, scoring card method; SVM, support vector machine.
  2. (b) AAC, amino acid composition; CTD, composition-transition-distribution; DPC, dipeptide composition; DPS, dipeptide propensity scores; GGAC, g-gap dipeptide composition; k-mer, fragment-based technique; pka, acid dissociation constant; PCP, physicochemical properties; PseAAC, pseudo amino acid composition; PSSM, position-specific scoring matrix; RAAC, reduced amino acid composition; TC, tripeptide composition.
  3. (c) 5CV, fivefold cross-validation; 10CV, tenfold cross-validation; Jackknife, jackknife cross-validation; IND, independent test.
  4. (d) Not accessible: the web server was not functional during the preparation of this manuscript.
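
Most of the tabulated methods build on two sequence descriptors, AAC and DPC, with the g-gap variant as a generalization of DPC. The following is a minimal sketch of how such descriptors are commonly computed, not the implementation used by any of the cited studies. The function names `aac` and `dpc`, the `gap` parameter, and the choice to skip non-standard residues are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    counts = Counter(seq.upper())
    # Assumption: non-standard residues (e.g. X, B, Z) are ignored.
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return {a: counts[a] / total for a in AMINO_ACIDS}


def dpc(seq, gap=0):
    """Dipeptide composition (400 values).

    gap=0 pairs adjacent residues; gap=g pairs residues separated by
    g intervening positions (the g-gap dipeptide variant).
    """
    seq = seq.upper()
    pairs = [seq[i] + seq[i + gap + 1] for i in range(len(seq) - gap - 1)]
    # Assumption: pairs containing a non-standard residue are skipped.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs) or 1
    return {a + b: counts[a + b] / total
            for a, b in product(AMINO_ACIDS, repeat=2)}


if __name__ == "__main__":
    example = "AAGG"  # toy sequence, not a real protein
    print(aac(example)["A"])          # fraction of alanine
    print(dpc(example)["AG"])         # fraction of adjacent A-G pairs
    print(dpc(example, gap=1)["AG"])  # fraction of 1-gap A-G pairs
```

Each descriptor has a fixed length (20 for AAC, 400 for DPC) regardless of protein length, which is what allows the classifiers listed in the table to take whole sequences as input.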