Fig. 2: Downstream model performance of different unsupervised pre-training tasks and downstream training procedures in terms of top-L precision (PPV) and global Matthews correlation coefficient (MCC) on an independent test set.
From: RNA contact prediction by data efficient deep learning

The red line shows the DCA baseline performance for PPV (refs. 21, 24), the orange line the shallow neural network CoCoNet (ref. 25), and the dotted blue line the performance of the best trained model. Square markers show the respective score averaged over several models trained with different early-stopping metrics; early stopping is performed on a small holdout set taken from the training dataset. Error bars span the best and worst scores. For fine-tuned XGBoost, results are shown separately for the top-L and global metrics used for backbone fine-tuning. Our approach improves both PPV and MCC over the baseline.
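
As an illustration of the two metrics named in this caption, the following minimal sketch computes a top-L precision and a global MCC for a single predicted contact map. It is not the authors' evaluation code: the function names, the minimum sequence separation `min_sep`, and the score `threshold` used to binarize predictions for the MCC are all assumptions made for this example, since the caption does not specify them.

```python
from math import sqrt


def upper_pairs(n, min_sep):
    """Return all index pairs (i, j) with j - i >= min_sep (upper triangle)."""
    return [(i, j) for i in range(n) for j in range(i + min_sep, n)]


def top_l_ppv(scores, contacts, min_sep=1):
    """Precision of the L highest-scoring pairs, where L is the sequence length.

    `scores` and `contacts` are L x L nested lists; only the upper triangle
    is read. `min_sep` is an assumed cutoff, not a value from the paper.
    """
    n = len(scores)
    pairs = upper_pairs(n, min_sep)
    # Rank candidate pairs by predicted score, highest first, and keep L of them.
    ranked = sorted(pairs, key=lambda p: scores[p[0]][p[1]], reverse=True)[:n]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j in ranked if contacts[i][j])
    return hits / len(ranked)


def global_mcc(scores, contacts, threshold=0.5, min_sep=1):
    """Matthews correlation coefficient over all candidate pairs.

    Predictions are binarized at `threshold` (an assumed value).
    """
    n = len(scores)
    tp = fp = tn = fn = 0
    for i, j in upper_pairs(n, min_sep):
        predicted = scores[i][j] >= threshold
        actual = bool(contacts[i][j])
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the MCC is undefined (an empty row or column in the confusion matrix).
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

The top-L metric only looks at the highest-ranked pairs and so ignores how the remaining scores are calibrated, whereas the MCC depends on every pair and on the chosen threshold.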