Fig. 3: Comparison of self-supervised methods for transfer learning tasks.

From: A self-supervised deep learning method for data-efficient training in genomics

Self-GenomeNet representations outperform other baseline methods, such as language models [20] trained by predicting single nucleotides, 3-grams, or 6-grams, Contrastive Predictive Coding [21], and Contrastive-sc [18], when pre-trained on the bacteria dataset and then fine-tuned for the effector gene detection and bacteriophage detection tasks. We also provide an additional evaluation in which Self-GenomeNet is trained on a wider range of datasets comprising bacterial, viral, and human data (generic Self-GenomeNet). This model achieves even higher performance than the bacteria-only Self-GenomeNet, showing that pre-training on a wider range of data further improves performance. The context and encoder model weights are initialized with the training results from the SSL task and are then further trained (fine-tuned) on the new supervised task together with an additional linear layer on top. The labels “Supervised” and “7-mer frequency profile” correspond to settings without any pre-training, where the weights are randomly initialized for the supervised task. The first of these models uses the same architecture as in the SSL setting and similarly takes one-hot encoded sequences as input; the second is the CNN model developed by Fiannaca et al. [31] and uses a 7-mer frequency profile as input. a Overview of the datasets and tasks used for evaluation. b Class-balanced accuracy for the effector gene detection, bacteriophage detection, and protozoa-fungi prediction tasks. This figure was created in part with BioRender.com. The phage icon was created by DBCLS, is licensed under a CC BY 4.0 license, and was modified; the original icon was sourced from Bioicons.
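
The fine-tuning setup described in the caption (a pre-trained encoder whose weights are further trained on the supervised task, with one additional linear layer on top) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the encoder architecture, hidden sizes, the checkpoint name "self_genomenet_encoder.pt", and the two-class output are assumptions chosen for brevity, and the Self-GenomeNet context network is omitted.

```python
import os
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy sequence encoder over one-hot encoded nucleotides (A, C, G, T)."""
    def __init__(self, hidden_dim=128):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=4, out_channels=64, kernel_size=9, padding=4)
        self.gru = nn.GRU(input_size=64, hidden_size=hidden_dim, batch_first=True)

    def forward(self, x):                 # x: (batch, 4, seq_len), one-hot DNA
        h = torch.relu(self.conv(x))      # (batch, 64, seq_len)
        _, h_n = self.gru(h.transpose(1, 2))
        return h_n.squeeze(0)             # (batch, hidden_dim) sequence representation

class FineTuneClassifier(nn.Module):
    """Pre-trained encoder plus one additional linear layer for the downstream task."""
    def __init__(self, encoder, hidden_dim=128, num_classes=2):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        return self.head(self.encoder(x))

encoder = Encoder()
# Initialize the encoder with weights learned during the self-supervised task.
# "self_genomenet_encoder.pt" is a hypothetical checkpoint name.
if os.path.exists("self_genomenet_encoder.pt"):
    encoder.load_state_dict(torch.load("self_genomenet_encoder.pt"))

model = FineTuneClassifier(encoder, num_classes=2)         # e.g. phage vs. non-phage
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # all weights are fine-tuned
criterion = nn.CrossEntropyLoss()

def training_step(batch_x, batch_y):
    """One supervised fine-tuning step on labelled downstream data."""
    optimizer.zero_grad()
    loss = criterion(model(batch_x), batch_y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch all encoder weights remain trainable during fine-tuning, matching the caption's description that the pre-trained weights are further trained on the supervised task rather than frozen; the "Supervised" baseline corresponds to skipping the checkpoint loading step so that all weights start from random initialization.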
