Fig. 3: Results for utilizing conformal prediction for enzyme function annotation, using a leading classification model. | Nature Communications

From: Functional protein mining with conformal guarantees

We compare the two methods for "EC-calling" proposed by CLEAN (ref. 39), \(\mathsf{max\text{-}sep}\) (max-separation) and \(\mathsf{p\text{-}value}\) (p-value selection), against our conformal method. We report confidence intervals through violin plots for 10 random shuffles of the dataset to ensure coverage across New. A Violin plots of ROC-AUC for conformal, \(\mathsf{p\text{-}value}\), and \(\mathsf{max\text{-}sep}\). B Violin plots of F1 for conformal, \(\mathsf{p\text{-}value}\), and \(\mathsf{max\text{-}sep}\). C Returned EC annotations for a SARS-CoV-2 antibody. Conformal selection correctly identifies SARS-CoV-2 antigen-binding fragments (Fabs) as non-enzymes, whereas the \(\mathsf{max\text{-}sep}\) and \(\mathsf{p\text{-}value}\) methods from CLEAN return possible enzyme annotations. D–F Intuitive overview of selection methods for EC annotation using similarity scores. D shows the cutoff threshold determined by p-values, which rank the query enzyme's distance to each EC cluster center \(\mathrm{EC}_i\) amongst a background of random proteins from the training dataset. E shows our conformal distance threshold. F shows max-separation selection, which aims to select the EC numbers that stand out from the other query-centroid distances; the arrow marks the intuitive "point of max separation" amongst all EC numbers and the query. Source data are provided as a Source Data file.
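
The three selection rules described in the caption can be illustrated with a toy sketch. This is not the CLEAN code or the authors' implementation: the function names, the toy distances, and the threshold `lam` are all hypothetical, and `lam` is assumed to have been calibrated elsewhere. The sketch only aims to show why a fixed distance threshold can return an empty set (no enzyme annotation), while a largest-gap rule always returns at least one EC number.

```python
# Toy illustration only; names and numbers are hypothetical assumptions.

def select_by_threshold(dists, lam):
    """Threshold-style selection: keep every EC whose query-centroid
    distance is at most lam (lam is assumed to be calibrated elsewhere)."""
    return [i for i, d in enumerate(dists) if d <= lam]

def select_by_max_gap(dists):
    """Max-separation-style selection: sort ECs by distance and cut at
    the largest gap between consecutive distances."""
    order = sorted(range(len(dists)), key=lambda i: dists[i])
    if len(order) < 2:
        return order
    gaps = [dists[order[k + 1]] - dists[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))  # position of the last EC kept
    return order[:cut + 1]

def select_by_pvalue(dists, background, alpha):
    """p-value-style selection: for EC i, the empirical p-value is the
    fraction of background distances to centroid i that are <= the
    query's distance; keep EC i if that fraction is <= alpha."""
    selected = []
    for i, d in enumerate(dists):
        col = background[i]
        p = sum(b <= d for b in col) / len(col)
        if p <= alpha:
            selected.append(i)
    return selected

# A query that is far from every EC centroid (e.g. a non-enzyme).
dists = [9.0, 9.5, 12.0]
background = [
    [10.0, 11.0, 8.0, 12.0],   # background distances to centroid 0
    [10.0, 11.0, 13.0, 12.0],  # centroid 1
    [10.0, 11.0, 13.0, 14.0],  # centroid 2
]

threshold_set = select_by_threshold(dists, lam=5.0)
max_gap_set = select_by_max_gap(dists)
pvalue_set = select_by_pvalue(dists, background, alpha=0.3)
```

With these toy values the threshold rule selects nothing, because no distance falls below `lam`, whereas the largest-gap and p-value rules both still return EC indices 0 and 1.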
