Fig. 1: G4mer model overview and performance across multiple benchmarks. | Nature Communications

From: G4mer: An RNA language model for transcriptome-wide identification of G-quadruplexes and disease variants from population-scale genetic data

a rG4 structures fall into canonical and noncanonical subtypes, the latter including long-loop, bulge, and two-quartet structures. b G4mer builds on an RNA language model that was pre-trained on the entire human transcriptome and fine-tuned on experimentally detected rG4 sequences. c Comparison of transformer-based G4mer (red) and CNN-based rG4detector (blue) for rG4 binary prediction (top) and rG4 subtype multiclass prediction (bottom), evaluated by accuracy, ROC-AUC, and PR-AUC. d PR-AUC of G4mer (red) compared with rG4detector (blue), cGcC (purple), and G4Hunter (gray) across sequence lengths from the G4RNA database. Gray bars indicate the number of sequences per length bin; colored points show PR-AUC. e PR-AUC comparison across the top 5 experimental rG4 detection protocols. The top bar plot shows the number of sequences; the bottom plot shows PR-AUC values for each model.
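
For readers who want to reproduce the kind of per-length comparison shown in panel d, the following is a minimal sketch of how PR-AUC could be computed within sequence-length bins. It is not the authors' evaluation code: the record layout `(sequence, label, score)`, the function names, and the bin width of 20 nt are all assumptions made for illustration, and PR-AUC is approximated here by step-wise average precision, which may differ slightly from the estimator used in the paper.

```python
from collections import defaultdict


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: iterable of 0/1 ground-truth values (1 = rG4).
    scores: iterable of model scores, higher = more likely rG4.
    Tied scores are treated as a single threshold.
    """
    labels = list(labels)
    scores = list(scores)
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")  # PR-AUC is undefined without positives

    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    ap, tp, prev_tp = 0.0, 0, 0
    i = 0
    while i < len(ranked):
        # Consume every example that shares the current score.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        precision = tp / j  # j examples are predicted positive so far
        recall_gain = (tp - prev_tp) / n_pos
        ap += recall_gain * precision
        prev_tp = tp
        i = j
    return ap


def pr_auc_by_length(records, bin_width=20):
    """Group (sequence, label, score) records by length bin.

    Returns {bin_start: (n_sequences, pr_auc)}. The bin width is a
    hypothetical choice, not a value taken from the figure.
    """
    bins = defaultdict(lambda: ([], []))
    for seq, label, score in records:
        start = (len(seq) // bin_width) * bin_width
        bins[start][0].append(label)
        bins[start][1].append(score)
    return {
        start: (len(lab), average_precision(lab, sc))
        for start, (lab, sc) in sorted(bins.items())
    }
```

Calling `pr_auc_by_length` once per model on the same records yields, for each bin, both the sequence count (the gray bars) and one PR-AUC value per model (the colored points).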
