Fig. 4: Transcript isoform classification and splice junction analysis using basecalled reads from five tools across human cell lines (A549, HCT116, and NA12878) and Spike-In RNA Variant (SIRV) data.
From: A dual context-aware basecaller for nanopore direct RNA sequencing

a Radar plots comparing de novo splice junction detection performance using ESPRESSO across human cell lines (A549: n = 1,254,612 reads; HCT116: n = 987,488 reads; NA12878: n = 8,540,683 reads). Metrics include precision, recall, true positive rate (TPR), true negative rate (TNR), and mean perfect read coverage (MPRC) at splice junctions. All values are normalized to the best-performing value, which is shown in parentheses. b Bar plots showing the number of reads classified into each transcript category: FSM, ISM, SEX, NIC, NNC, and NCD. c UpSet plot (bottom) showing the intersection sizes of annotated transcripts discovered in the A549 dataset by Illumina (using Salmon) and by the basecallers indicated by the linked filled dots below each bar. Intersections with fewer than 300 elements are not displayed. The total number of annotated transcripts identified by each method is shown in parentheses, and each method is distinguished by a unique color. Strip plot (top) showing the abundance distribution for each basecaller across the annotated transcript sets. d UpSet plot (top-left panel) showing the intersection sizes of SIRV transcripts discovered by the basecallers indicated by the linked filled dots below each bar. The total number of SIRV transcripts identified by each basecaller is shown in the left-hand horizontal bars. Scatter plots (remaining panels) showing the correlation between transcript expression estimates obtained from Coral, Guppy, RODAN, GCRTcall, and Dorado reads and the known SIRV input concentrations. The analysis was performed on n = 68 SIRV transcripts. The solid line represents the linear regression fit, and the shaded band indicates the 95% confidence interval (CI) of the regression estimate. Pearson’s correlation coefficient (r) and Spearman’s rank correlation coefficient (ρ) are provided for each basecaller. Expression is quantified as abundance (read count). Source data are provided as a Source Data file.
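
As an illustration of the two correlation statistics reported in panel d, the following is a minimal sketch of how Pearson's r and Spearman's ρ could be computed between known input concentrations and read-count abundances. It is not the authors' analysis code: the function names and the toy values are hypothetical, and whether any transformation (for example, a log scale) was applied before computing the coefficients is not stated in the caption, so none is assumed here.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (NOT taken from the Source Data file):
# known input concentrations and observed read counts for five transcripts.
known_concentration = [1.0, 2.0, 4.0, 8.0, 16.0]
read_count = [12, 19, 45, 70, 160]

r = pearson(known_concentration, read_count)
rho = spearman(known_concentration, read_count)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Pearson's r measures how close the relationship is to a straight line, whereas Spearman's ρ depends only on rank order, so a basecaller whose read counts rise monotonically but non-linearly with input concentration can score a high ρ with a lower r.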