Table 1 Performance comparison on CDS downstream tasks using Spearman’s correlation
From: mRNABERT: advancing mRNA sequence design with a universal language model and comprehensive dataset
| Category | Model | mRFP expression | Fungal expression | E. coli proteins | mRNA stability | Tc-riboswitch | SARS-CoV-2 vaccine degradation |
|---|---|---|---|---|---|---|---|
| Nucleotide-based | TextCNN | 0.62 | 0.53 | 0.39 | 0.01 | 0.41 | 0.55 |
| | RNABERT | 0.40 | 0.41 | 0.39 | 0.16 | 0.47 | 0.64 |
| | RNA-FM | 0.80 | 0.59 | 0.43 | 0.34 | 0.58 | 0.74 |
| | UTRLM | 0.68 | 0.48 | 0.19 | 0.23 | 0.19 | 0.78 |
| | RNAErnie | 0.79 | 0.42 | 0.19 | 0.16 | 0.34 | 0.81 |
| | ERNIE-RNA | 0.50 | 0.31 | 0.30 | 0.44 | 0.39 | 0.87 |
| kmer-based | 3UTRBERT | 0.85 | 0.69 | 0.47 | 0.50 | 0.30 | 0.84 |
| Codon-based | TF-IDF | 0.57 | 0.68 | 0.44 | 0.54 | 0.49 | 0.69 |
| | TextCNN | 0.78 | 0.76 | 0.36 | 0.26 | 0.43 | 0.80 |
| | Codon2vec | 0.77 | 0.61 | 0.43 | 0.33 | 0.56 | 0.70 |
| | mRNAFM | 0.88 | 0.78 | 0.49 | 0.48 | 0.26 | 0.85 |
| | CaLM | 0.86 | 0.75 | 0.58 | 0.44 | 0.37 | 0.84 |
| | CodonBERT | 0.88 | 0.89 | 0.57 | 0.35 | 0.48 | 0.78 |
| Combined | mRNABERT | 0.89 | 0.89 | 0.58 | 0.56 | 0.58 | 0.89 |
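
Each value in the table is a Spearman's correlation between model predictions and measured labels. As a reminder of what that metric computes, here is a minimal sketch in plain Python: rank both lists (tied values share their average rank), then take the Pearson correlation of the ranks. The function names `average_ranks` and `spearman` are illustrative and are not taken from the paper's code; the paper does not state which implementation it used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted, measured):
    """Spearman's correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(predicted), average_ranks(measured)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the paper.
    toy_predicted = [0.1, 0.4, 0.35, 0.8]
    toy_measured = [1.0, 3.0, 2.0, 4.0]
    print(spearman(toy_predicted, toy_measured))
```

Because only ranks matter, a model scores well here whenever it orders sequences correctly, even if its predicted values are on a different scale from the measurements.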