Fig. 5: Comprehensive benchmark of RNA language models for secondary structure prediction across diverse datasets. | Nature Communications


From: ERNIE-RNA: an RNA language model with structure-enhanced representations

Fig. 5

a Schematic of the unified evaluation framework. For every language model, token embeddings are extracted and fed into a common downstream network for fine-tuning. Uniquely among the models, ERNIE-RNA also offers its attention maps as an alternative input feature and supports zero-shot prediction directly from its pre-trained heads. b–e Violin plots comparing the F1 score distributions of the models on four benchmark datasets of increasing generalization difficulty: b the standard bpRNA-1m TS0 test set (sample size n = 1305); c the RIVAS TestSetB (n = 430); d the bpRNA-new dataset (n = 5388); and e the RNA3DB-2D test set (n = 158). In each panel, the red dashed line marks, for reference, the performance of the best traditional dynamic programming (DP) method on that dataset. Within each violin, the white center line of the inner box indicates the median, the box spans the interquartile range (IQR; 25th to 75th percentiles), and the whiskers extend up to 1.5 × IQR beyond the box. The red star marks the mean F1 score.
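The per-sequence F1 scores and the box statistics summarized in panels b–e can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes F1 is computed by exact matching of base pairs parsed from dot-bracket notation (some benchmarks also credit pairs shifted by one position), assigns an F1 of 0 when no pairs are shared, computes quartiles by linear interpolation, and draws whiskers with the common Tukey convention (to the most extreme score within 1.5 × IQR of the box). The helper names `pairs_from_dot_bracket`, `base_pair_f1` and `box_summary` are hypothetical.

```python
import statistics

# Assumption: (), [], {} and <> denote base pairs; [] etc. carry pseudoknots.
# Any other character (dots, letters) is treated as unpaired and ignored.
_CLOSE_TO_OPEN = {")": "(", "]": "[", "}": "{", ">": "<"}


def pairs_from_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Parse a dot-bracket string into a set of (i, j) base pairs, i < j."""
    stacks: dict[str, list[int]] = {op: [] for op in _CLOSE_TO_OPEN.values()}
    pairs: set[tuple[int, int]] = set()
    for idx, ch in enumerate(structure):
        if ch in stacks:                      # opening bracket: remember position
            stacks[ch].append(idx)
        elif ch in _CLOSE_TO_OPEN:            # closing bracket: pair with last opener
            stack = stacks[_CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unbalanced {ch!r} at position {idx}")
            pairs.add((stack.pop(), idx))
    if any(stacks.values()):
        raise ValueError("unbalanced opening bracket")
    return pairs


def base_pair_f1(predicted: str, reference: str) -> float:
    """Harmonic mean of base-pair precision and recall (exact matches only)."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    tp = len(pred & ref)
    if tp == 0:                               # assumed convention: no shared pairs -> 0
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def box_summary(scores: list[float]) -> dict[str, float]:
    """Median, quartiles, Tukey whisker ends and mean of per-sequence scores."""
    # "inclusive" = linear interpolation between order statistics (an assumption).
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [s for s in scores if lo_fence <= s <= hi_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),           # most extreme score within the fences
        "whisker_high": max(inside),
        "mean": statistics.fmean(scores),     # plotted as the red star
    }
```

Applying `base_pair_f1` to every sequence in a test set and passing the resulting list to `box_summary` yields the quantities drawn for one violin: the median line, the box edges, the whisker ends and the mean marker. The violin outline itself is a kernel density estimate of the same scores and is not reproduced here.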
