Table 3. Manual verification of vaccine candidates derived from automated extraction of protein names from published abstracts.

From: Compilation of parasitic immunogenic proteins from 30 years of published research using machine learning and natural language processing

| Candidate | UniProt ID | H | PubMed ID | Year | Species | Prot. |
|---|---|---|---|---|---|---|
| Circumsporozoite protein | A0A4V0KF74, Q03752, A0A077Y0S6 | 80 | 22252877 | 2011 | Plasmodium yoelii | Yes |
| Citrate synthase | False positive | 27 | 8817831 | 1996 | Plasmodium berghei | No |
| Microneme protein MIC3 | B2D1U3 | 20 | 21632181 | 2011 | Toxoplasma gondii | Yes |
| Liver stage antigen | Q25893 | 15 | 8609407 | 1996 | Plasmodium falciparum | Yes |
| Rhoptry associated protein-1 | A7AS21 | 12 | 12933845 | 2003 | Babesia bovis | Noᵃ |
| Putative kunitz-type protease inhibitor | A0A3Q0KUE0, A0A3Q0KFV5, A0A5K4F6V0, G4VEE0, A0A3Q0KN03, G4VBB1, G4VED8 | 1 | 31736947 | 2019 | Schistosoma mansoni | Yes |
| Ribonuclease T2 | Q6PYW1 | 1 | 28212670 | 2017 | Schistosoma japonicum | Yes |
| Thioredoxin-like | A0A1N6LW58 | 1 | 29335000 | 2018 | Babesia microti (strain RI) | No |
| Hydatid disease diagnostic antigen P-29 | Q9U8G7 | 0 | 32908913 |  | Echinococcus granulosus | Yes |
| Surface antigen 22 | Q70CC3 | 0 | 33689009 | 2021 | Eimeria tenella | Yes |

  1. Candidate = a protein name taken from the sheet [Unique Names] in Supplementary Table S4. This sheet lists 438 unique names drawn from the 1099 representative names extracted from the 1776 positively classified abstracts; the 324 unique names without a warning are considered possible vaccine candidates. Candidates spanning the highest to the lowest h-indexes were selected from this list at regular row intervals for manual verification. UniProt ID = UniProt ID(s) linked to the candidate name (names are mapped via the sheets [Representative Names] and [Clusters] in Supplementary Table S4); multiple IDs for one candidate are representatives from different clusters. UniProt IDs that are underlined exactly match the protein identifier in the publication. H = h-index of the source publications for the protein name. Year = year of publication. Species = source species of the protein. Prot. = 'Yes' or 'No', indicating whether the publication reports testing protein immunogenicity in an animal model.
  2. ᵃ Protein reported in other publications as a possible vaccine candidate.
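The footnote relies on two computations that can be made concrete: the h-index attached to each candidate name, and the selection of rows at regular intervals across the h-index range. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes that H is the standard h-index computed over the citation counts of the publications a name was extracted from, and that "regular row intervals" means evenly spaced positions in the candidate list sorted by descending h-index. The function names `h_index` and `sample_at_intervals` are hypothetical.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that at least h publications have >= h citations each.

    Assumption: H in the table is this standard h-index, computed over the
    citation counts of the publications a protein name was extracted from.
    """
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def sample_at_intervals(rows: Sequence[tuple[str, int]], n: int) -> list[tuple[str, int]]:
    """Pick n (name, h) rows at evenly spaced positions after sorting by h descending.

    Hypothetical helper: when n >= 2 the first (highest h) and last (lowest h)
    rows are always included, so the sample spans the whole h-index range.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    if n <= 0 or not ordered:
        return []
    if n == 1:
        return [ordered[0]]
    n = min(n, len(ordered))
    step = (len(ordered) - 1) / (n - 1)
    return [ordered[round(i * step)] for i in range(n)]


if __name__ == "__main__":
    # Illustrative, made-up candidates and h-indexes (not data from the table).
    candidates = [("protein A", 80), ("protein B", 27), ("protein C", 20),
                  ("protein D", 15), ("protein E", 0)]
    print(h_index([10, 8, 5, 4, 3]))
    print(sample_at_intervals(candidates, 3))
```

With five made-up candidates and `n=3`, the helper returns the rows at positions 0, 2 and 4 of the sorted list, i.e. the highest, a middle, and the lowest h-index. Evenly spaced positions keep the manual check from concentrating on only well-cited or only poorly cited names.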