Fig. 4: Screening in plausible sequence subspaces.

a Number of sequences remaining in subfamily-A (SF-A), subfamily-B (SF-B), subfamily-C (SF-C), and subfamily-D (SF-D) after each successive step of the stringent selection criteria. b–e Clustering within each subfamily and selection of representative sequences. In each PCA plot, dots of different colors represent different clusters, and the black-labeled dots indicate the positions of the selected representative sequences. The surrounding structural diagrams depict each selected representative sequence: the inner part shows the detailed three-dimensional structure, while the outer semi-transparent part illustrates the peptide’s hydrophobicity distribution.