Table 7 Subsets of training data ASTRAL + CullPDB.

From: Protein Secondary Structure Prediction Based on Data Partition and Semi-Random Subspace Method

Subset    Protein length L    Number of proteins    Number of amino acids
D1        (0, 100]            2260                  161952
D2        (100, 200]          5256                  774167
D3        (200, 300]          3548                  877583
D4        (300, 400]          2382                  822913
D5        (400, 500]          1170                  519422
D6        (500, ∞)            1058                  707309
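
The length intervals in Table 7 are open on the left and closed on the right, so a protein of length exactly 100 falls in D1 and one of length 101 falls in D2. A minimal sketch of how sequences might be assigned to these subsets is given below. The function names `assign_subset` and `partition`, and the assumption that each protein is available as a plain string of residues, are illustrative and do not come from the paper.

```python
from bisect import bisect_left

# Upper bounds of the right-closed intervals D1..D5, taken from Table 7.
# Anything longer than the last bound falls into D6, i.e. (500, inf).
UPPER_BOUNDS = [100, 200, 300, 400, 500]
SUBSETS = ["D1", "D2", "D3", "D4", "D5", "D6"]


def assign_subset(length: int) -> str:
    """Return the Table 7 subset label for a protein of the given length."""
    if length <= 0:
        raise ValueError("protein length must be positive")
    # bisect_left returns the index of the first bound >= length, which
    # matches the (lower, upper] convention used in the table.
    return SUBSETS[bisect_left(UPPER_BOUNDS, length)]


def partition(sequences):
    """Group sequences by subset (hypothetical input: iterable of strings)."""
    groups = {name: [] for name in SUBSETS}
    for seq in sequences:
        groups[assign_subset(len(seq))].append(seq)
    return groups
```

Using `bisect_left` keeps the boundary handling in one place: changing the interval edges only requires editing `UPPER_BOUNDS`.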