Table 1 Statistics of the training and testing datasets after removal of homologous sequences with the CD-HIT program.

From: Identification of subtypes of anticancer peptides based on sequential features and physicochemical properties

Dataset / filtering step        Number of ACPs    Number of non-ACPs
Raw data                        1354              2250
Sequence length > 10 aa         1256              2250
Sequence identity < 90%         992               1980
Training dataset                800               1600
Independent testing dataset     192               380

aa: amino acid; ACPs: anticancer peptides; non-ACPs: non-anticancer peptides.
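As a rough consistency check on Table 1, the Python sketch below transcribes the counts and verifies that the training and independent testing sets together reproduce the non-redundant (< 90% identity) pool. It also reports how many sequences each filtering step removed and what fraction of each class was held out for testing. The helper names (`keep_by_length`, `split_summary`) and the example peptide strings are hypothetical illustrations. The sketch does not reproduce CD-HIT clustering, and the table does not state how the authors drew the training/testing split.

```python
from typing import Dict, List, Tuple

# Counts transcribed from Table 1 as (ACPs, non-ACPs) at each step.
COUNTS: Dict[str, Tuple[int, int]] = {
    "Raw data": (1354, 2250),
    "Sequence length > 10 aa": (1256, 2250),
    "Sequence identity < 90%": (992, 1980),
    "Training dataset": (800, 1600),
    "Independent testing dataset": (192, 380),
}


def keep_by_length(sequences: List[str], min_exclusive: int = 10) -> List[str]:
    """Hypothetical length filter: keep peptides strictly longer than 10 residues."""
    return [s for s in sequences if len(s) > min_exclusive]


def split_summary(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Dict[str, float]]:
    """Check that training + testing equals the non-redundant pool for each class,
    then report sequences removed per step and the held-out test fraction."""
    raw = counts["Raw data"]
    length_ok = counts["Sequence length > 10 aa"]
    pool = counts["Sequence identity < 90%"]
    train = counts["Training dataset"]
    test = counts["Independent testing dataset"]
    summary = {}
    for idx, label in enumerate(("ACPs", "non-ACPs")):
        if train[idx] + test[idx] != pool[idx]:
            raise ValueError(f"{label}: training + testing does not match the pool")
        summary[label] = {
            "removed_by_length": raw[idx] - length_ok[idx],
            "removed_by_cdhit": length_ok[idx] - pool[idx],
            "test_fraction": round(test[idx] / pool[idx], 3),
        }
    return summary


if __name__ == "__main__":
    # Illustrative strings only: the first has exactly 10 residues and is dropped.
    print(keep_by_length(["ACDEFGHIKL", "KLAKLAKKLAKLAK"]))
    for label, stats in split_summary(COUNTS).items():
        print(label, stats)
```

Under these transcribed counts, roughly 19% of each class ends up in the independent testing set, and the training set keeps a 1:2 ratio of ACPs to non-ACPs.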