Table 1 Performance comparison on CDS downstream tasks using Spearman’s correlation

From: mRNABERT: advancing mRNA sequence design with a universal language model and comprehensive dataset

 

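| Model | mRFP expression | Fungal expression | E. coli proteins | mRNA stability | Tc-riboswitch | SARS-CoV-2 vaccine degradation |
|---|---|---|---|---|---|---|
| Nucleotide-based | | | | | | |
| TextCNN | 0.62 | 0.53 | 0.39 | 0.01 | 0.41 | 0.55 |
| RNABERT | 0.40 | 0.41 | 0.39 | 0.16 | 0.47 | 0.64 |
| RNA-FM | 0.80 | 0.59 | 0.43 | 0.34 | **0.58** | 0.74 |
| UTRLM | 0.68 | 0.48 | 0.19 | 0.23 | 0.19 | 0.78 |
| RNAErnie | 0.79 | 0.42 | 0.19 | 0.16 | 0.34 | 0.81 |
| ERNIE-RNA | 0.50 | 0.31 | 0.30 | 0.44 | 0.39 | 0.87 |
| kmer-based | | | | | | |
| 3UTRBERT | 0.85 | 0.69 | 0.47 | 0.50 | 0.30 | 0.84 |
| Codon-based | | | | | | |
| TF-IDF | 0.57 | 0.68 | 0.44 | 0.54 | 0.49 | 0.69 |
| TextCNN | 0.78 | 0.76 | 0.36 | 0.26 | 0.43 | 0.80 |
| Codon2vec | 0.77 | 0.61 | 0.43 | 0.33 | 0.56 | 0.70 |
| mRNAFM | 0.88 | 0.78 | 0.49 | 0.48 | 0.26 | 0.85 |
| CaLM | 0.86 | 0.75 | **0.58** | 0.44 | 0.37 | 0.84 |
| CodonBERT | 0.88 | **0.89** | 0.57 | 0.35 | 0.48 | 0.78 |
| Combined | | | | | | |
| mRNABERT | **0.89** | **0.89** | **0.58** | **0.56** | **0.58** | **0.89** |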

  1. The best-performing result for each task is indicated in bold.
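
For readers unfamiliar with the metric in the caption, the following is a minimal sketch of how Spearman's correlation between measured and predicted values can be computed: both lists are converted to ranks (ties share the average rank), and the Pearson correlation of the ranks is returned. The function names `average_ranks` and `spearman` and the example numbers are illustrative assumptions, not the evaluation code or data behind the table.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(y_true, y_pred):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


# Made-up values for illustration only (not taken from the table).
measured = [0.2, 1.5, 0.9, 3.1, 2.4]
predicted = [0.1, 0.8, 1.0, 2.0, 2.5]
print(round(spearman(measured, predicted), 2))  # 0.8
```

Because only ranks enter the calculation, a model is rewarded for ordering sequences correctly even when its predicted values are on a different scale from the measurements.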