Fig. 2: mRNABERT captures multi-level evolutionary homology information. | Nature Communications

From: mRNABERT: advancing mRNA sequence design with a universal language model and comprehensive dataset

High-dimensional embeddings are projected into two dimensions using t-SNE. Panels A and B show the mRNABERT model without contrastive learning; the remaining four panels (C-F) show the mRNABERT model. A, C Vocabulary embeddings from the model. Each point represents a codon or nucleotide, with colors corresponding to the amino acids encoded by the codons. B, D Codons clustered by amino acid properties. Codons that encode the same amino acid, or amino acids with similar biochemical properties, tend to lie close together. E Classification of different sequence types, including all regions of mRNA and lncRNA sequences that are highly similar to mRNA. F Species and sequence data randomly sampled from the retained dataset; each point represents a complete mRNA sequence. ARI Adjusted Rand Index, FMI Fowlkes-Mallows Index.
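The caption names a t-SNE projection and two clustering agreement scores (ARI and FMI). A minimal sketch of that kind of analysis, assuming scikit-learn, is given below. The embeddings are synthetic stand-ins rather than mRNABERT outputs, the helper `project_and_score` is hypothetical, and the use of k-means on the 2-D coordinates is an assumption: the caption does not state which clustering method was used or whether the scores were computed on the projected or the original embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score, fowlkes_mallows_score


def project_and_score(embeddings, labels, n_clusters, perplexity=10.0, seed=0):
    """Project embeddings to 2-D with t-SNE, cluster the projection, score it.

    Hypothetical helper: k-means on the t-SNE coordinates is an assumption,
    not a method taken from the figure caption.
    """
    # t-SNE requires perplexity < number of samples.
    coords = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(embeddings)
    predicted = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(coords)
    # Both scores compare predicted clusters against the reference labels
    # and are invariant to how the cluster ids are numbered.
    ari = adjusted_rand_score(labels, predicted)
    fmi = fowlkes_mallows_score(labels, predicted)
    return coords, ari, fmi


# Synthetic stand-in data: 4 groups of 16 points in a 32-dimensional space.
rng = np.random.default_rng(0)
n_groups, per_group, dim = 4, 16, 32
centers = rng.normal(scale=10.0, size=(n_groups, dim))
labels = np.repeat(np.arange(n_groups), per_group)
embeddings = centers[labels] + rng.normal(size=(len(labels), dim))

coords, ari, fmi = project_and_score(embeddings, labels, n_clusters=n_groups)
print(f"2-D coordinates: {coords.shape}, ARI={ari:.3f}, FMI={fmi:.3f}")
```

With real data, `embeddings` would be replaced by model outputs (one row per codon or per sequence) and `labels` by the reference grouping, such as amino acid, sequence type, or species. Higher ARI and FMI values indicate closer agreement between the clusters and the reference grouping.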
