Table 1 Performance comparison between EasIFA and the baseline models in SwissProt E-RXN ASA test set^a

From: Multi-modal deep learning enables efficient and accurate annotation of enzymatic active sites

Methods	Note	Binary-classification (active site location annotation task)					Multi-classification^d (active site type annotation task)
Methods	Note	Precision	Recall	FPR	F1	MCC-bin	Recall (Binding)	FPR (Binding)	Recall (Catalytic)	FPR (Catalytic)	Recall (Other Site)	FPR (Other Site)	MCC-multi
EasIFA-ESM-bin^b	①	85.78%	79.03%	0.41%	79.15%	0.8010	na	na	na	na	na	na	na
EasIFA-SaProt-bin^c		83.87%	80.57%	0.55%	78.68%	0.7971	na	na	na	na	na	na	na
EasIFA-ESM-multi		85.65%	80.83%	0.48%	80.09%	0.8101	64.85%	0.48%	48.99%	0.02%	8.03%	0.01%	0.8029
EasIFA-ESM-multi	②	85.09%	81.77%	0.51%	80.56%	0.8139	68.47%	0.51%	36.44%	0.02%	7.12%	0.01%	0.8093
EasIFA-SaProt-multi	①	85.39%	80.05%	0.46%	78.85%	0.8006	64.35%	0.46%	48.78%	0.02%	7.77%	0.01%	0.7932
EasIFA-SaProt-multi	②	84.38%	80.96%	0.51%	78.97%	0.8012	67.93%	0.50%	36.47%	0.02%	7.20%	0.01%	0.7957
AEGAN	③	16.84%	56.73%	7.87%	22.15%	0.2449	na	na	50.81%	8.70%	na	na	na
AEGAN	②	16.82%	54.96%	7.73%	21.82%	0.2394	na	na	36.17%	8.62%	na	na	na
BLASTp	①	64.97%	73.13%	1.21%	65.68%	0.6634	59.31%	1.12%	45.71%	0.07%	8.50%	0.03%	0.6618
BLASTp	④	72.57%	73.26%	0.76%	70.41%	0.7089	59.30%	0.71%	46.12%	0.04%	8.28%	0.02%	0.7073
Schrodinger-SiteMap	①	na	na	na	12.21%	0.1096	45.28%	20.69%	na	na	na	na	na

^aBold indicates the best result.
^bUses ESM-2 for enzyme residue sequence representation.
^cUses SaProt for enzyme residue sequence representation.
^dBinding: consistent with the definition of “Binding Site” in UniProt; the amino acid residues that bind substrates, products, and cofactors. Catalytic: consistent with “Active Site” as defined in UniProt; the residues that directly participate in catalysis. Other Site: consistent with the definition of “Site” in UniProt; other amino acid sites of interest, such as the inhibitory sites of proteases.
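The binary-classification columns (Precision, Recall, FPR, F1, MCC-bin) are named after standard confusion-matrix metrics. The following is a minimal sketch of those textbook definitions at the residue level. The helper name `binary_metrics` and the example counts are hypothetical and do not come from the source. The table does not state how values are aggregated across enzymes, so this is an illustration and not the authors' scoring code.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Textbook binary metrics from confusion counts (hypothetical helper)."""

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    fpr = ratio(fp, fp + tn)  # false positive rate
    f1 = ratio(2 * precision * recall, precision + recall)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)  # Matthews correlation coefficient
    return {"precision": precision, "recall": recall,
            "fpr": fpr, "f1": f1, "mcc": mcc}


# Made-up counts: one enzyme with 300 residues, 10 of them annotated sites.
m = binary_metrics(tp=8, fp=2, tn=288, fn=2)
print(m)
```

Because active-site residues are a small fraction of each sequence, FPR stays small even for weak predictors, which is why MCC is a useful companion to F1 on data this imbalanced.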
Note:
① Uses the training set of the SwissProt E-RXN ASA dataset, which contains sequence and structural data for 44,341 enzymes, as both the knowledge base and the sequence alignment database, and scores on its test set of 892 samples. (Empirical rule-based methods do not use this knowledge base.)
② EasIFA uses the training set of the SwissProt E-RXN ASA dataset as its knowledge base. AEGAN uses the model state reported in the literature. Both are scored on the test set of the SwissProt E-RXN ASA dataset, excluding the 225 test samples that overlap with AEGAN’s training set, which leaves 667 samples.
③ AEGAN uses the model state reported in the literature and is scored on the test set of the SwissProt E-RXN ASA dataset without removing the 225 samples that overlap with AEGAN’s training set, for a total of 892 samples.
④ Uses the entire SwissProt, comprising 569,516 sequences, as the sequence alignment database and all enzymes in SwissProt, totaling 139,469 samples, as the knowledge base, and scores on the SwissProt E-RXN ASA test set of 892 samples.
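The overlap removal described in note ② can be sketched as a simple set difference. This is an illustrative example only: the function name `drop_overlap` and the placeholder identifiers are hypothetical, and the source does not say how overlapping samples were identified. Only the counts (892, 225, 667) come from the notes.

```python
def drop_overlap(test_ids, other_training_ids):
    """Keep test samples that do not appear in another model's training set."""
    overlap = set(other_training_ids)
    return [sample_id for sample_id in test_ids if sample_id not in overlap]


# Placeholder identifiers standing in for the 892 test-set enzymes.
test_ids = [f"enzyme_{i}" for i in range(892)]

# Placeholder training set that shares 225 identifiers with the test set,
# plus one entry that is not in the test set at all.
aegan_train_ids = [f"enzyme_{i}" for i in range(225)] + ["unrelated_1"]

filtered = drop_overlap(test_ids, aegan_train_ids)
print(len(filtered))  # 892 - 225 = 667 samples remain
```

Scoring both models on the reduced set keeps the comparison fair, since a model evaluated on samples it was trained on would otherwise receive inflated scores.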
