StringTie3 improves total RNA-seq assembly by resolving nascent and mature transcripts

Abstract

Accurate transcript assembly from rRNA-depleted (total) RNA sequencing (RNA-seq) data remains challenging because existing methods often conflate incomplete, nascent RNA with fully processed mature isoforms, leading to misassemblies and quantification errors. Here, we present StringTie3, a major update to the widely used StringTie assembler, specifically designed for total RNA-seq. StringTie3 introduces a nascent mode that models co-transcriptional splicing to separate nascent from mature transcripts, and a refined long-read module that distinguishes genuine polyadenylation sites from poly(A)-priming artifacts. Across short-, long- and hybrid-read datasets, StringTie3 substantially reduces assembly errors and outperforms existing tools. In Argonaute knockout experiments, nascent-mode analysis reveals that single knockouts predominantly alter nascent transcripts while leaving mature RNA largely unchanged, whereas double or triple knockouts disrupt both fractions. In breast cancer samples, certain extracellular matrix and tumor suppressor genes show discordant nascent and mature expression, suggesting posttranscriptional regulation. StringTie3 provides a framework for investigating transcriptional and posttranscriptional processes in total RNA-seq data.
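The abstract says the long-read module separates genuine polyadenylation sites from poly(A)-priming artifacts but does not state the rule it uses. A common heuristic for the internal oligo(dT) priming problem discussed in ref. 11 is to ask whether the genome immediately downstream of a read's 3′ end is itself A-rich, since oligo(dT) can anneal there without any real poly(A) tail. The sketch below implements that heuristic only; the function name, the 20-bp window and the 60% threshold are hypothetical choices for illustration, not StringTie3's criterion.

```python
def looks_like_internal_priming(genome_seq: str, end_pos: int, strand: str,
                                window: int = 20, min_a_frac: float = 0.6) -> bool:
    """Flag a read 3' end whose downstream genomic window is A-rich.

    genome_seq: chromosome sequence, indexed 0-based.
    end_pos:    0-based genomic coordinate of the read's 3'-most aligned base.
    strand:     transcript strand, '+' or '-'.
    window, min_a_frac: hypothetical defaults, not StringTie3 parameters.
    """
    if strand == "+":
        # Transcript 3' direction is rightward: inspect bases after end_pos.
        downstream = genome_seq[end_pos + 1:end_pos + 1 + window].upper()
        base = "A"
    elif strand == "-":
        # Transcript 3' direction is leftward; an A-run on the transcript
        # appears as a T-run on the plus-strand reference.
        downstream = genome_seq[max(0, end_pos - window):end_pos].upper()
        base = "T"
    else:
        raise ValueError("strand must be '+' or '-'")
    if not downstream:
        return False  # nothing left to inspect (e.g., at a contig end)
    return downstream.count(base) / len(downstream) >= min_a_frac
```

A genuine polyadenylation site usually returns False here, because the tail is added after transcription and is not encoded in the genome; a read end sitting just upstream of a genomic A-run returns True and would be treated as a likely priming artifact under this assumption.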

Fig. 1: Comparison of poly(A)+ selection and rRNA depletion reveals misassemblies arising from nascent transcripts.
Fig. 2: Modeling partially spliced nascent transcripts and mature isoforms within StringTie3’s splicing graph.
Fig. 3: Comparison of StringTie3 (nascent mode), StringTie2 and Scallop2 in rRNA-depleted RNA-seq evaluated against the CHESS 3.0 annotation.
Fig. 4: Precision, sensitivity and coverage differences for StringTie2 versus StringTie3 (nascent mode) in poly(A)+ and rRNA-depleted libraries.
Fig. 5: Long-read assembly performance of StringTie3, StringTie2, IsoQuant and Bambu in annotation-free mode.
Fig. 6: Argonaute knockouts reveal transcriptional versus posttranscriptional regulation.
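Fig. 6 and the abstract contrast changes in nascent and mature RNA to separate transcriptional from posttranscriptional effects. One simple way to express such discordance, borrowed from the exon-intron split analysis of ref. 4 (which compares exonic and intronic log fold changes), is to compare log2 fold changes of nascent and mature abundance for the same gene. This is an illustrative sketch, not the authors' pipeline: the function names and the pseudocount are assumptions, and a real analysis would model replicates with a package such as edgeR or limma (refs 50, 51).

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of normalized abundances; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def nascent_mature_discordance(nascent_ctrl: float, nascent_trt: float,
                               mature_ctrl: float, mature_trt: float,
                               pseudocount: float = 1.0):
    """Return (delta_nascent, delta_mature, delta_mature - delta_nascent).

    A difference near 0 means mature RNA tracks nascent RNA, as expected for
    purely transcriptional regulation; a large |difference| hints at
    posttranscriptional regulation (cf. exon-intron split analysis, ref. 4).
    """
    d_nascent = log2_fold_change(nascent_trt, nascent_ctrl, pseudocount)
    d_mature = log2_fold_change(mature_trt, mature_ctrl, pseudocount)
    return d_nascent, d_mature, d_mature - d_nascent
```

For example, a gene whose nascent abundance doubles while its mature abundance stays flat yields a difference of about −1, the pattern the abstract describes for single Argonaute knockouts.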

Data availability

The RNA-seq data for DLPFC poly(A) and DLPFC RiboZ libraries are available through the Lieber Institute for Brain Development at http://eqtl.brainseq.org/phase2/ and http://eqtl.brainseq.org/phase1/, respectively. The neuron-differentiation long- and short-read RNA-seq data are available in the Gene Expression Omnibus (GEO) under accession number GSE245325, and the breast cancer RNA-seq dataset is available in GEO under accession number GSE103001.

The ENCODE Consortium datasets used in the LRGASP challenges for human WTC‑11 and H1‑mix samples across ONT dRNA, PacBio cDNA and Illumina cDNA are available from ENCODE under accessions ENCSR392BGY, ENCSR673UKZ, ENCSR507JOF, ENCSR967FTZ, ENCSR154RVC and ENCSR731MFY. LRGASP sample and file accessions used in this study (including ENCSR and ENCFF identifiers for WTC‑11 and H1‑mix) are provided in Supplementary Data 13. Illumina short‑read files from these datasets were used only to derive the splice‑junction BED file supplied to minimap2. The Argonaute dataset is available under accession GSE146688. VASA-seq data are available under accession GSE176588; our analyses used only the Mus musculus runs (n = 45).
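Short-read splice junctions can be collected from spliced alignments by reading the `N` (skipped-region) operations in each SAM CIGAR string. The sketch below shows that step and writes one BED-style line per junction with its read support. minimap2 accepts junctions through its `--junc-bed` option; the six-column layout used here (chrom, start, end, name, score, strand), the placeholder name `junc`, the `min_reads` filter and taking strand from the aligner's XS tag are assumptions, so check the minimap2 manual for the exact format it expects.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the reference (SAM spec)


def introns_from_cigar(chrom, pos1, cigar):
    """Yield 0-based, half-open (chrom, start, end) introns for each 'N' op.

    pos1 is the 1-based leftmost mapping position (the SAM POS field).
    """
    ref = pos1 - 1  # switch to 0-based coordinates
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "N":
            yield (chrom, ref, ref + n)
        if op in REF_CONSUMING:
            ref += n


def junction_bed_lines(alignments, min_reads=1):
    """Count junction support over (chrom, pos1, cigar, strand) records and
    return tab-separated lines: chrom, start, end, name, read count, strand."""
    counts = Counter()
    for chrom, pos1, cigar, strand in alignments:
        for intron in introns_from_cigar(chrom, pos1, cigar):
            counts[intron + (strand,)] += 1
    lines = []
    for (chrom, start, end, strand), n in sorted(counts.items()):
        if n >= min_reads:
            lines.append(f"{chrom}\t{start}\t{end}\tjunc\t{n}\t{strand}")
    return lines
```

Soft clips (`S`) and insertions (`I`) do not move the reference coordinate, whereas deletions (`D`) do, which is why only reference-consuming operations advance `ref`.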

Code availability

StringTie3 is implemented in C++ and is freely available as open-source software under the MIT license at https://github.com/gpertea/stringtie/ and is archived at Zenodo (https://doi.org/10.5281/zenodo.17604767; ref. 52). Additional instructions for running nascent mode, including parameters and command-line flags, can be found in the project documentation.

References

  1. Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat. Rev. Genet. 20, 631–656 (2019).

  2. Yao, L. et al. A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat. Biotechnol. 40, 1056–1065 (2022).

  3. Chu, T. et al. Chromatin run-on and sequencing maps the transcriptional regulatory landscape of glioblastoma multiforme. Nat. Genet. 50, 1553–1564 (2018).

  4. Gaidatzis, D., Burger, L., Florescu, M. & Stadler, M. B. Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Nat. Biotechnol. 33, 722–729 (2015).

  5. Ameur, A. et al. Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human brain. Nat. Struct. Mol. Biol. 18, 1435–1440 (2011).

  6. Zhao, S., Zhang, Y., Gamini, R., Zhang, B. & von Schack, D. Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion. Sci. Rep. 8, 4781 (2018).

  7. Adiconis, X. et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat. Methods 10, 623–629 (2013).

  8. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).

  9. Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).

  10. Choquet, K. et al. Pre-mRNA splicing order is predetermined and maintains splicing fidelity across multi-intronic transcripts. Nat. Struct. Mol. Biol. 30, 1064–1076 (2023).

  11. Svoboda, M., Frost, H. R. & Bosco, G. Internal oligo(dT) priming introduces systematic bias in bulk and single-cell RNA sequencing count data. NAR Genom. Bioinform. 4, lqac035 (2022).

  12. Viscardi, M. J. & Arribere, J. A. Poly(a) selection introduces bias and undue noise in direct RNA-sequencing. BMC Genomics 23, 530 (2022).

  13. Shumate, A., Wong, B., Pertea, G. & Pertea, M. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput. Biol. 18, e1009730 (2022).

  14. Zhang, Q., Shi, Q. & Shao, M. Accurate assembly of multi-end RNA-seq data with Scallop2. Nat. Comput. Sci. 2, 148–152 (2022).

  15. Collado-Torres, L. et al. Regional heterogeneity in gene expression, regulation, and coherence in the frontal cortex and hippocampus across development and schizophrenia. Neuron 103, 203–216 (2019).

  16. Jaffe, A. E. et al. Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis. Nat. Neurosci. 21, 1117–1125 (2018).

  17. Wenric, S. et al. Transcriptome-wide analysis of natural antisense transcripts shows their potential role in breast cancer. Sci. Rep. 7, 17452 (2017).

  18. Ulicevic, J. et al. Uncovering the dynamics and consequences of RNA isoform changes during neuronal differentiation. Mol. Syst. Biol. 20, 767–798 (2024).

  19. Varabyou, A. et al. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biol. 24, 249 (2023).

  20. Nido, G. S. et al. Common gene expression signatures in Parkinson’s disease are driven by changes in cell composition. Acta Neuropathol. Commun. 8, 55 (2020).

  21. Pardo-Palacios, F. J. et al. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. Nat. Methods 21, 1349–1363 (2024).

  22. Prjibelski, A. D. et al. Accurate isoform discovery with IsoQuant using long reads. Nat. Biotechnol. 41, 915–918 (2023).

  23. Chen, Y. et al. Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nat. Methods 20, 1187–1195 (2023).

  24. Chu, Y. et al. Argonaute binding within 3′-untranslated regions poorly predicts gene repression. Nucleic Acids Res. 48, 7439–7453 (2020).

  25. Cernilogar, F. M. et al. Chromatin-associated RNAi components contribute to transcriptional regulation in Drosophila. Nature 480, 391–395 (2011).

  26. Zaytseva, O. et al. Transcriptional repression of Myc underlies the tumour suppressor function of AGO1 in Drosophila. Development 147, dev190231 (2020).

  27. Huang, V. et al. Ago1 interacts with RNA polymerase II and binds to the promoters of actively transcribed genes in human cancer cells. PLoS Genet. 9, e1003821 (2013).

  28. Mayr, C., Hemann, M. T. & Bartel, D. P. Disrupting the pairing between let-7 and Hmga2 enhances oncogenic transformation. Science 315, 1576–1579 (2007).

  29. Dang, C. MYC on the path to cancer. Cell 149, 22–35 (2012).

  30. Matsui, M. et al. Activation of LDL receptor expression by small RNAs complementary to a noncoding transcript that overlaps the LDLR promoter. Chem. Biol. 17, 1344–1355 (2010).

  31. Wagschal, A. et al. Genome-wide identification of microRNAs regulating cholesterol and triglyceride homeostasis. Nat. Med. 21, 1290–1297 (2015).

  32. Su, H., Trombly, M. I., Chen, J. & Wang, X. Essential and overlapping functions for mammalian Argonautes in microRNA silencing. Genes Dev. 23, 304–317 (2009).

  33. De Martino, D. & Bravo-Cordero, J. J. Collagens in cancer: structural regulators and guardians of cancer progression. Cancer Res. 83, 1386–1392 (2023).

  34. Chen, X. et al. COL5A1 promotes triple-negative breast cancer progression by activating tumor cell-macrophage crosstalk. Oncogene 43, 1742–1756 (2024).

  35. Shi, Y. et al. Reduced expression of METTL3 promotes metastasis of triple-negative breast cancer by m6A methylation-mediated COL3A1 up-regulation. Front. Oncol. 10, 1126 (2020).

  36. Kwon, J. J., Factora, T. D., Dey, S. & Kota, J. A Systematic review of miR-29 in cancer. Mol. Ther. Oncolytics 12, 173–194 (2019).

  37. Zhu, J. et al. Chaperone Hsp47 drives malignant growth and invasion by modulating an ECM gene network. Cancer Res. 75, 1580–1591 (2015).

  38. Wang, Y., Mei, X., Song, W., Wang, C. & Qiu, X. LncRNA LINC00511 promotes COL1A1-mediated proliferation and metastasis by sponging miR-126-5p/miR-218-5p in lung adenocarcinoma. BMC Pulm. Med. 22, 272 (2022).

  39. Wang, Y. et al. MiR-410 is overexpressed in liver and colorectal tumors and enhances tumor cell growth by silencing FHL1 via a direct/indirect mechanism. PLoS ONE 9, e108708 (2014).

  40. Wang, J. et al. lncRNA ZNRD1-AS1 promotes malignant lung cell proliferation, migration, and angiogenesis via the miR-942/TNS1 axis and is positively regulated by the m6A reader YTHDC2. Mol. Cancer 21, 229 (2022).

  41. Pool, A., Poldsam, H., Chen, S., Thomson, M. & Oka, Y. Recovery of missing single-cell RNA-sequencing data with optimized transcriptomic references. Nat. Methods 20, 1506–1515 (2023).

  42. Salmen, F. et al. High-throughput total RNA sequencing in single cells using VASA-seq. Nat. Biotechnol. 40, 1780–1793 (2022).

  43. Morales, J. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604, 310–315 (2022).

  44. Shinder, I. & Pertea, M. Filtered CHESS 3.0.1 human transcript annotation for StringTie3 benchmarking. Zenodo https://doi.org/10.5281/zenodo.18223655 (2026).

  45. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

  46. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  47. Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).

  48. Shinder, I., Hu, R., Ji, H. J., Chao, K. & Pertea, M. EASTR: identifying and eliminating systematic alignment errors in multi-exon genes. Nat. Commun. 14, 7223 (2023).

  49. Pertea, G. & Pertea, M. GFF utilities: GffRead and GffCompare. F1000Res. 9, 304 (2020).

  50. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

  51. Ritchie, M. E. et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

  52. Pertea, G., Shinder, I. & Pertea, M. StringTie 3.0.3. Zenodo https://doi.org/10.5281/zenodo.17604767 (2025).

Acknowledgements

This work was supported in part by National Science Foundation grant DBI-2412449 (to M.P.) and National Institutes of Health grants R01-MH123567 (to M.P.) and R35-GM156470 (to M.P.).

Author information

Contributions

I.S. conceived the study, designed and contributed to the implementation of the software, conducted computational analyses, analyzed and interpreted the results, and wrote the manuscript. Z.R. and R.H. aided in the analysis of results. G.P. assisted with software development. M.P. conceived the study, contributed to software, assisted in writing and editing the manuscript, and supervised the entire project. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Ida Shinder or Mihaela Pertea.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Adam Ameur, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Lei Tang and Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Annotation‑guided long‑read assembly performance across three modalities.

Per‑replicate scatter plots of precision (%) (x‑axis) versus the number of reference transcripts correctly reconstructed by each assembler (y‑axis; ×10³) for assemblies run in annotation‑guided mode against the CHESS 3.0.1 reference. Rows show (top to bottom): neuron‑differentiation ONT cDNA (days 0/3/5; three biological replicates each), LRGASP ONT dRNA (H1‑mix and WTC11; three replicates each), and LRGASP PacBio cDNA (H1‑mix and WTC11; three replicates each). Colors denote methods (StringTie2, StringTie3, StringTie3 nascent mode, IsoQuant, Bambu); shapes indicate replicate identity (see panel legends). “X” marks the per‑method mean across replicates.
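Each point in these panels pairs a precision with a count of correctly reconstructed references, and the “X” is the per-method mean over replicates. The sketch below shows that aggregation under stated assumptions: precision is taken as matching transcripts divided by all assembled transcripts (a gffcompare-style definition), match counts are assumed to be computed already, and the row layout and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def precision_pct(matching: int, assembled: int) -> float:
    """Share of assembled transcripts that match a reference, in percent."""
    return 100.0 * matching / assembled if assembled else 0.0


def per_method_means(rows):
    """rows: iterable of (method, replicate, matching_refs, assembled_transcripts).

    Returns {method: (mean_precision_pct, mean_matching_refs)}, i.e. the
    coordinates of the per-method 'X' marker.
    """
    by_method = defaultdict(list)
    for method, _replicate, matching, assembled in rows:
        by_method[method].append((precision_pct(matching, assembled), matching))
    return {m: (mean(p for p, _ in vals), mean(k for _, k in vals))
            for m, vals in by_method.items()}
```

Averaging the two axes separately, as here, places the marker at the centroid of the replicate points rather than recomputing precision from pooled counts.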

Extended Data Fig. 2 SIRV-only assembly accuracy.

Scatter plots show SIRV-only precision (x-axis) versus the number of matching transcripts (true positives, TPs), that is, correctly assembled SIRV transcripts (y-axis), in annotation-free (left column) and annotation-guided (right column) modes. Each point represents one replicate. In the guided panels, CHESS was provided to each method but SIRV models were withheld to test generalization to novel transcript discovery beyond the guidance set; metrics in all four panels are computed only on SIRVs.
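Restricting metrics to SIRVs means discarding every assembled transcript outside the spike-in sequences before computing precision and TPs. A minimal sketch, assuming the SIRV spike-ins sit on their own contigs whose names start with "SIRV", that each assembled transcript has already been matched (or not) to a reference ID by a comparison tool, and that TPs count distinct matched references; the function name and input layout are hypothetical.

```python
def sirv_only_metrics(assembled, sirv_prefix="SIRV"):
    """assembled: iterable of (contig, matched_ref_id or None), one per
    assembled transcript. Returns (precision_pct, true_positives) computed
    only over transcripts on SIRV contigs."""
    sirv = [(contig, ref) for contig, ref in assembled if contig.startswith(sirv_prefix)]
    if not sirv:
        return 0.0, 0
    matched = sum(ref is not None for _, ref in sirv)
    true_positives = len({ref for _, ref in sirv if ref is not None})
    return 100.0 * matched / len(sirv), true_positives
```

Because the SIRV models were withheld from the guidance annotation, any SIRV match in the guided panels has to come from de novo discovery, which is what makes this subset a test of generalization.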

Extended Data Fig. 3 Transcript length profiles in assemblies from short-read, long-read, and hybrid data of matched samples.

A. Reference transcripts (CHESS 3.0.1) were binned by exonic length (x-axis). For each modality (short‑read rRNA‑depleted, poly(A)‑selected long‑read, and hybrid, that is, one short‑read and one long‑read library from the same day), the y‑axis shows the fraction of expressed references in each bin that were exactly reconstructed. A transcript was considered expressed if it had coverage ≥ 1 read per bp in any modality, and exactly reconstructed if its intron chain was identical to the reference and its start and end coordinates were within ±100 bp of the reference start and end. Curves plot the weighted average across assemblies, with weights proportional to the number of expressed references in each bin. Shaded areas indicate 95% confidence intervals across assemblies. Numbers above the x‑axis represent the median number of expressed references per bin across assemblies. Results aggregate 14 short‑read assemblies, 9 long‑read assemblies, and 42 hybrid assemblies from the iPSC‑to‑neuron series. B. Longest reference reconstructed per assembly (kb). Each point represents one assembly. Box plots display the median (center line), 25th and 75th percentiles (box bounds), and whiskers extending to the most extreme data points within 1.5× the interquartile range; points beyond the whiskers represent outliers. The same 14 short‑read, 9 long‑read, and 42 hybrid assemblies described in panel A were analyzed.
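The caption's three definitions (expressed, exactly reconstructed, weighted bin average) translate directly into small predicates. The sketch below assumes "coverage" means mean per-base read coverage, represents transcripts with a hypothetical `Transcript` record, and uses illustrative function names; it is not the authors' evaluation code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    chrom: str
    strand: str
    start: int            # genomic start of the first exon
    end: int              # genomic end of the last exon
    introns: tuple = ()   # sorted ((intron_start, intron_end), ...); empty if single-exon


def is_expressed(coverages_per_bp, min_cov=1.0):
    """Expressed if per-base coverage reaches min_cov in any modality."""
    return any(c >= min_cov for c in coverages_per_bp)


def exactly_reconstructed(ref: Transcript, asm: Transcript, tol: int = 100) -> bool:
    """Identical intron chain, and both ends within +/- tol bp of the reference."""
    return (ref.chrom == asm.chrom and ref.strand == asm.strand
            and ref.introns == asm.introns
            and abs(ref.start - asm.start) <= tol
            and abs(ref.end - asm.end) <= tol)


def weighted_bin_fraction(per_assembly):
    """per_assembly: [(n_expressed, n_exact), ...] for one length bin.

    Weighting each assembly's fraction n_exact / n_expressed by n_expressed
    simplifies to sum(n_exact) / sum(n_expressed).
    """
    total = sum(n for n, _ in per_assembly)
    if total == 0:
        return math.nan
    return sum(k for _, k in per_assembly) / total
```

For instance, two assemblies that recover 5 of 10 and 27 of 30 expressed references in a bin give (5 + 27) / 40 = 0.8, closer to the larger assembly's 0.9 than the unweighted mean of 0.7.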

Extended Data Fig. 4 StringTie3’s nascent mode algorithm.

A. Candidate transcript selection and quantification in a gene locus. Read coverage is shown at the top, with regions corresponding to the candidate transcript highlighted in orange. Arcs below the coverage plot represent splice junctions supported by spliced reads; thicker arcs indicate stronger read support. The splice graph, with the candidate transcript (that is, the heaviest path) highlighted in orange, is shown below. Arrows beneath the coverage plot indicate the genomic regions corresponding to the nodes in the splice graph. Dashed nodes and edges (for example, nodes 1_2 and 5_6) represent intronic nodes that, due to sparse coverage, are not included in the splice graph when using the non-nascent mode. A flow network (highlighted in green) is then constructed using all nodes from the heaviest path, with edges connecting two nodes if a transfrag starts at one and ends at the other. B. Nascent transcript quantification. Splice graphs and flow networks corresponding to the two nascent transcripts derived from the transcript in panel A are shown. The paths of the nascent transcripts are highlighted in orange in the splice graphs.
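Panel A describes picking the heaviest path through the splice graph and then building a flow network whose edges join the first and last node of each transfrag. Because splice-graph nodes can be ordered by genomic position, the graph is a DAG and the heaviest path follows from one dynamic-programming pass. In this sketch, node weights stand in for coverage, transfrags are tuples of node indices, and a transfrag contributes an edge only if all of its nodes lie on the path; these are simplifying assumptions, and StringTie's actual scoring and flow-based quantification (ref. 8) are more involved.

```python
def heaviest_path(weights, edges):
    """Heaviest path in a splice-graph DAG.

    weights: node weights (e.g., coverage); nodes 0..n-1 in genomic order.
    edges:   (u, v) pairs with u < v, so index order is a topological order.
    Returns (total_weight, [nodes on the path]).
    """
    n = len(weights)
    if n == 0:
        return 0, []
    preds = [[] for _ in range(n)]
    for u, v in edges:
        if not 0 <= u < v < n:
            raise ValueError(f"edge {(u, v)} does not go forward in genomic order")
        preds[v].append(u)
    best = list(weights)  # best[v]: weight of the heaviest path ending at v
    back = [None] * n     # back[v]: predecessor of v on that path
    for v in range(n):    # every predecessor of v is finalized before v
        for u in preds[v]:
            if best[u] + weights[v] > best[v]:
                best[v] = best[u] + weights[v]
                back[v] = u
    node = max(range(n), key=best.__getitem__)
    total = best[node]
    path = []
    while node is not None:  # walk predecessors back to the path start
        path.append(node)
        node = back[node]
    return total, path[::-1]


def flow_network_edges(path, transfrags):
    """(first, last) edges for every transfrag lying entirely on the path."""
    on_path = set(path)
    return sorted({(tf[0], tf[-1]) for tf in transfrags
                   if tf and all(node in on_path for node in tf)})
```

With weights [5, 1, 10, 3] and edges 0→1, 0→2, 1→3, 2→3, the heaviest path is 0→2→3 (weight 18); a transfrag through node 1 is then left out of that path's flow network because it is incompatible with the candidate transcript.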

Supplementary information

Supplementary Information (PDF)

Supplementary Tables 1–3, Supplementary Figs. 1–14 and Supplementary Notes 1–4.

Reporting Summary (PDF)

Peer Review File (PDF)

Supplementary Data (XLSX)

Excel workbook containing 13 tabs: (1–2) Per-sample assembly precision and sensitivity for rRNA-depleted and poly(A)+ DLPFC libraries; (3) Transcript overlap between library types; (4) Long-read alignment statistics; (5–8) Differential expression results for AGO1, AGO2, AGO1/AGO2 and AGO1/AGO2/AGO3 knockouts; (9) Breast cancer tumor versus normal differential expression; (10–12) Runtime and memory benchmarks; (13) LRGASP sample and file accessions.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shinder, I., Pertea, G., Hu, R. et al. StringTie3 improves total RNA-seq assembly by resolving nascent and mature transcripts. Nat Methods (2026). https://doi.org/10.1038/s41592-026-03080-3

  • DOI: https://doi.org/10.1038/s41592-026-03080-3
