Figure 6

A column graph showing the number of candidate proteins with each predicted characteristic, grouped by genus within phylum (A), and a bar graph showing the number of publications associated with the candidates (B). Protein characteristics were predicted for 1099 representative sequences related to protein names that the pipeline presented in this study extracted from 332,627 PubMed ‘title+abstract’ texts. The 1099 proteins are considered here as potential vaccine candidates. The predicted characteristics are accessibility to the immune system (Vacceed), transmembrane (TM) domains (TMHMM), the presence of a signal peptide (SP) (SignalP), and glycosylphosphatidylinositol (GPI) anchors (PredGPI). As an example of how to interpret the graphs, 320 of the 1099 candidates are proteins from the genus Plasmodium (a member of the phylum Apicomplexa). Of these 320 proteins, 257 are predicted to be naturally accessible to the immune system, 173 have at least one TM domain, 204 have an SP, and 76 have a GPI anchor. Collectively, the 320 candidates appear in 4055 publications.