Extended Data Fig. 3: JPLE outperforms alternative methods.

a, b, Comparison of the RNA-binding profile reconstructions generated by JPLE trained with 5-mer peptide features to those generated by two linear regression (LR) models trained with deep learning features from TAPE: UniRep (a) and BERT (i.e., Transformer) (b). As in Extended Data Fig. 2, reconstruction PCCs are computed between the reconstructed (r*) and measured (r) RNA-binding profiles. c, Precision-recall curves for RNA-binding profile reconstructions generated by JPLE, AA SID, affinity regression (AR), and the nearest neighbor model (see Methods). Precision (y-axis) is the mean PCC of all reconstructions at least as confident as the threshold (top axes). JPLE confidence is e-dist to the nearest neighbor; AA SID confidence is % amino acid identity; AR confidence is one minus the PCC between the test and nearest-neighbor embeddings; the nearest neighbor model confidence is e-dist to the nearest neighbor. At the AA SID threshold of 70%, a mean PCC of 0.75 is achieved (red line). The recall of each of the four methods at a mean PCC of 0.75 is indicated. The shaded area around each line shows the standard error. d, Distribution of the mean interface predicted aligned errors (PAEs) for RoseTTAFold2NA (RF2NA) and AlphaFold3 (AF3) predicted structures with high-affinity 7-mers (binding) and low-affinity 7-mers (nonbinding) for all 355 RBPs. e, ROC curves comparing the performance of RF2NA, AF3, and JPLE in differentiating high-affinity from low-affinity 7-mers. The predictions for RF2NA and AF3 are the mean interface PAEs (see d). The predictions for JPLE are the predicted z scores on held-out RBPs. Numbers in brackets in the legend are the AUROCs.
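The precision and recall in panel c can be illustrated with a minimal sketch. It assumes that precision is the running mean PCC over reconstructions ranked from most to least confident, and that recall is the fraction of reconstructions retained. The authors' exact procedure is given in their Methods and may differ. The function names, the `higher_is_more_confident` flag, and the toy numbers are hypothetical and not taken from the paper.

```python
import numpy as np

def mean_pcc_recall_curve(confidence, pcc, higher_is_more_confident=True):
    """Running mean PCC (precision) and retained fraction (recall).

    Assumption: one confidence score and one reconstruction PCC per RBP.
    Set higher_is_more_confident=False for distance-like scores such as
    e-dist, where a smaller value means a more confident prediction.
    Tied confidence values are not grouped; each ranked prefix is one point.
    """
    confidence = np.asarray(confidence, dtype=float)
    pcc = np.asarray(pcc, dtype=float)
    # Rank reconstructions from most to least confident.
    key = -confidence if higher_is_more_confident else confidence
    order = np.argsort(key, kind="stable")
    thresholds = confidence[order]
    counts = np.arange(1, len(pcc) + 1)
    precision = np.cumsum(pcc[order]) / counts  # mean PCC of the top-k
    recall = counts / len(pcc)                  # fraction of RBPs kept
    return thresholds, precision, recall

def recall_at_precision(precision, recall, target=0.75):
    """Largest recall whose mean PCC is still at least `target`."""
    ok = precision >= target
    return float(recall[ok].max()) if ok.any() else 0.0

# Toy example with made-up values (not data from the paper).
identity = [90, 80, 70, 60]   # e.g. % amino acid identity
pccs = [0.9, 0.8, 0.4, 0.2]   # reconstruction PCCs
thr, prec, rec = mean_pcc_recall_curve(identity, pccs)
best_recall = recall_at_precision(prec, rec, target=0.75)
```

Under these assumptions, the recall reported for each method at a mean PCC of 0.75 corresponds to `recall_at_precision(..., target=0.75)` applied to that method's own confidence scores.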