Fig. 3: InPACT outperforms other methods on IPA identification and quantification.

a Motif enrichment of the canonical polyA signal (AAUAAA) in the upstream region (50 nt) of the IPA sites identified by InPACT, APAIQ and IPAFinder in different RNA-seq datasets, assessed by hypergeometric test using the MEME Suite (two-sided). b, c Cumulative distribution curves of the distance between the true-positive identified IPA sites and the ground truth, defined by A-seq (b) and 3P-seq (c). An identified IPA site located within 50 nt of the ground truth was regarded as a true positive. All datasets are from HEK293 cells. d Cumulative distribution curve of the distance between the true-positive identified IPA sites and the ground truth (Iso-seq). This dataset is from human small airway epithelial cells. e, f Two examples of InPACT-identified IPA isoforms in human small airway epithelial cells. The plots show the RNA-seq read coverage at the HPS1 (e) and CDC23 (f) loci, together with the annotation assembled from long-read Iso-seq and the InPACT-identified IPA isoforms. g, h Precision (g) and sensitivity (h) of InPACT, APAIQ and IPAFinder in identifying IPA sites from simulated RNA-seq data at sequencing coverage levels ranging from 10X to 50X. Identified IPA sites located within 50 nt of a predefined polyA site were classified as true positives (TP), while all others were classified as false positives (FP). Replicates were used for each coverage level (n = 5 random simulations). Precision is presented as mean ± SD (g). Sensitivity is presented as box plots (h). The center lines denote the median values, and the boxes are bounded by the 25th and 75th percentiles. The whiskers extend to the maximum and minimum values within 1.5 times the interquartile range (IQR) from each end of the box. i Cumulative distribution curve of the error in the relative usage of IPA sites determined by InPACT, APAIQ and IPAFinder in the simulated RNA-seq data at a sequencing coverage level of 50X.
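
The 50-nt matching rule behind panels g and h can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `nearest_distance` and `evaluate_sites` are hypothetical, sites are treated as plain integer coordinates on a single chromosome and strand, "within 50 nt" is assumed to mean a distance of at most 50, and sensitivity is assumed to be the fraction of predefined polyA sites recovered by at least one identified site (the caption does not define it).

```python
from bisect import bisect_left


def nearest_distance(site, sorted_positions):
    """Distance from `site` to the closest coordinate in a sorted list."""
    if not sorted_positions:
        return float("inf")
    i = bisect_left(sorted_positions, site)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(abs(site - pos) for pos in candidates)


def evaluate_sites(predicted, truth, window=50):
    """Classify predicted sites as TP/FP and summarise precision and sensitivity.

    Assumption: a predicted site is a TP if it lies within `window` nt of any
    ground-truth site; every other predicted site is an FP.
    """
    truth_sorted = sorted(truth)
    pred_sorted = sorted(predicted)

    tp = sum(1 for p in predicted if nearest_distance(p, truth_sorted) <= window)
    fp = len(predicted) - tp

    # Assumed definition: ground-truth sites with at least one nearby prediction.
    recovered = sum(1 for t in truth if nearest_distance(t, pred_sorted) <= window)

    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = recovered / len(truth) if truth else 0.0
    return {"TP": tp, "FP": fp, "precision": precision, "sensitivity": sensitivity}


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    print(evaluate_sites(predicted=[100, 1040, 5000], truth=[120, 1000, 3000]))
```

In a real evaluation the comparison would be run per chromosome and strand, and repeated for each simulated replicate before summarising as mean ± SD or box plots.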