  • Brief Communication

SNP calling, haplotype phasing and allele-specific analysis with long RNA-seq reads

Abstract

Long-read RNA sequencing is a powerful technology to link transcript structures to genetic variants, but this type of analysis is not often performed owing to the lack of end-user tools. Here we introduce longcallR for joint single-nucleotide polymorphism calling, haplotype phasing and allele-specific analysis, which achieves high accuracy on benchmark datasets. Applied to 202 human samples, longcallR identified 88 significant allele-specific splicing events per sample on average, of which 46% involved unannotated junctions.
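As an illustration of what calling a junction "allele-specific" can mean once reads are phased, the sketch below compares junction usage between the two haplotypes with a two-sided Fisher's exact test. The choice of test, the function name and the read counts are assumptions made for this example; the abstract does not state which statistical test longcallR applies.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout used here:
      a = haplotype-1 reads supporting the junction
      b = haplotype-1 reads spanning the region without the junction
      c, d = the same two counts for haplotype 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x junction reads on haplotype 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p for x in range(lowest, highest + 1)
        if (p := prob(x)) <= p_observed * (1 + 1e-9)
    )


# Made-up counts: the junction is seen only on haplotype 1.
p_skewed = fisher_exact_two_sided(3, 0, 0, 3)
# Made-up counts: both haplotypes use the junction equally.
p_balanced = fisher_exact_two_sided(5, 5, 5, 5)
```

With so few reads even a fully haplotype-specific junction yields only a modest p-value, which is one reason a genome-wide analysis would also need adequate coverage and a multiple-testing correction.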

Fig. 1: Overview of the longcallR algorithm.
Fig. 2: Allele-specific analysis using longcallR.

Data availability

The raw reads for HG002a:MAS:cDNA are available at https://downloads.pacbcloud.com/public/dataset/Kinnex-full-length-RNA/DATA-Revio-HG002-1/. The raw reads for HG002b:MAS:cDNA, HG002c:MAS:cDNA, HG002d:MAS:cDNA, HG004:MAS:cDNA, HG005:MAS:cDNA, HG002a:ISO:cDNA, HG002b:ISO:cDNA, HG002c:ISO:cDNA, HG004:ISO:cDNA and HG005:ISO:cDNA can be accessed at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data_RNAseq/. The WTC11:ONT:cDNA reads can be found in ENCODE under ENCSR539ZXJ. The raw reads for HG002a:ONT:cDNA, HG002b:ONT:cDNA, HG004:ONT:cDNA and HG005:ONT:cDNA are available at https://s3.amazonaws.com/gtl-public-data/giab/bams/cDNA/05_09_23_R941_GIAB_cDNA_PCS111_NA26105_Guppy_6.4.6_sup.pass.fastq.gz.hg38.bam, https://s3.amazonaws.com/gtl-public-data/giab/bams/cDNA/05_09_23_R941_GIAB_cDNA_PCS111_NA27730_Guppy_6.4.6_sup.pass.fastq.gz.hg38.bam, https://s3.amazonaws.com/gtl-public-data/giab/bams/cDNA/05_09_23_R941_GIAB_cDNA_PCS111_NA24143_Guppy_6.4.6_sup.pass.fastq.gz.hg38.bam and https://s3.amazonaws.com/gtl-public-data/giab/bams/cDNA/05_09_23_R941_GIAB_cDNA_PCS111_NA24631_Guppy_6.4.6_sup.pass.fastq.gz.hg38.bam. The HG002:ONT:dRNA, HG004:ONT:dRNA and HG005:ONT:dRNA datasets, generated by the University of Hong Kong, are available from the NCBI under SRX26304755, SRX26304756 and SRX26304757. HPRC human samples are available at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=working/HPRC/. For ground-truth DNA variants, the VCF files for HG002, HG004 and HG005 are available at GIAB v4.2.1. The VCF for WTC11 is provided by the Allen Institute at https://open.quiltdata.com/b/allencell/tree/aics/wtc11_short_read_genome_sequence/. The reference genome GRCh38 can be downloaded from NCBI. All generated SNP calls, ASJs and underlying data for the figures are available via Zenodo at https://doi.org/10.5281/zenodo.17842979 (ref. 27). Source data are provided with this paper.

Code availability

LongcallR is available via GitHub at https://github.com/huangnengCSU/longcallR. LongcallR-nn is available via GitHub at https://github.com/huangnengCSU/longcallR-nn. LongcallR scripts are available via GitHub at https://github.com/huangnengCSU/longcallR_scripts. The Nextflow pipeline for longcallR is available via GitHub at https://github.com/huangnengCSU/longcallR-nf.

References

  1. Loving, R. K. et al. Long-read sequencing transcriptome quantification with lr-kallisto. PLoS Comput. Biol. 21, e101369 (2025).

  2. Jousheghani, Z. Z. et al. Oarfish: enhanced probabilistic modeling leads to improved accuracy in long read transcriptome quantification. Bioinformatics 41, i304–i313 (2025).

  3. Chen, Y. et al. Context-aware transcript quantification from long-read RNA-seq data with Bambu. Nat. Methods 20, 1187–1195 (2023).

  4. Prjibelski, A. D. et al. Accurate isoform discovery with IsoQuant using long reads. Nat. Biotechnol. 41, 915–918 (2023).

  5. Gao, Y. et al. ESPRESSO: robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data. Sci. Adv. 9, eabq5072 (2023).

  6. Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).

  7. Pardo-Palacios, F. J. et al. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nat. Methods 21, 793–797 (2024).

  8. Quinones-Valdez, G., Amoah, K. & Xiao, X. Long-read RNA-seq demarcates cis- and trans-directed alternative RNA splicing. Nat. Commun. 16, 9603 (2025).

  9. Tang, A. D. et al. Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat. Commun. 11, 1438 (2020).

  10. Tilgner, H., Grubert, F., Sharon, D. & Snyder, M. P. Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc. Natl Acad. Sci. USA 111, 9869–9874 (2014).

  11. Tang, A. D. et al. Detecting haplotype-specific transcript variation in long reads with FLAIR2. Genome Biol. 25, 173 (2024).

  12. Deonovic, B., Wang, Y., Weirather, J., Wang, X.-J. & Au, K. F. IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Res. 45, e32 (2017).

  13. Edge, P. & Bansal, V. Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing. Nat. Commun. 10, 4660 (2019).

  14. Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).

  15. Zheng, Z. et al. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat. Comput. Sci. 2, 797–803 (2022).

  16. Ahsan, M. U., Liu, Q., Fang, L. & Wang, K. NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks. Genome Biol. 22, 261 (2021).

  17. Huang, N. et al. NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data. Bioinformatics 39, btac824 (2023).

  18. Wang, B. et al. Variant phasing and haplotypic expression from long-read sequencing in maize. Commun. Biol. 3, 78 (2020).

  19. De Souza, V. B. C. et al. Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data. Genome Biol. 24, 91 (2023).

  20. Zheng, Z. et al. Clair3-RNA: a deep learning-based small variant caller for long-read RNA sequencing data. Nat. Commun. 16, 11553 (2025).

  21. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  22. Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37, 561–566 (2019).

  23. Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).

  24. Mansi, L. et al. REDIportal: millions of novel A-to-I RNA editing events from thousands of RNAseq experiments. Nucleic Acids Res. 49, D1012–D1019 (2021).

  25. Lippert, R. et al. Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem. Brief Bioinform. 3, 23–31 (2002).

  26. Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).

  27. Huang, N. & Li, H. longcallR: SNP calling, haplotype phasing and allele-specific analysis with long RNA-seq reads. Zenodo https://doi.org/10.5281/zenodo.17842979 (2026).

Acknowledgements

This work is supported by the National Institutes of Health (grant nos. R01HG010040, R01HG014175, U24CA294203, U01HG013748 and U41HG010972 to H.L.). We thank the National Human Genome Research Institute for funding the following grants supporting the creation of the human pangenome reference: U41HG010972, U01HG010971, U01HG013760, U01HG013755, U01HG013748, U01HG013744 and R01HG011274, and the HPRC (BioProject ID: PRJNA730823).

Author information

Contributions

N.H. and H.L. designed the algorithm, implemented longcallR and drafted the paper. N.H. conducted the performance evaluation of longcallR for SNP calling, haplotype phasing and allele-specific analysis. The HPRC provided the 202 MAS-seq human sample data.

Corresponding author

Correspondence to Heng Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Andrey Prjibelski and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editors: Lei Tang and Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Accuracy of SNP calling and haplotype phasing.

A, Precision and recall of SNP calling on PacBio datasets for longcallR (circles) and Clair3-RNA (triangles). Colors represent different samples. Genomic SNPs from GIAB are taken as the ground truth. B, Precision and recall of SNP calling on Nanopore datasets. C, Phasing switch error rates of longcallR-phase (green) and WhatsHap (yellow) across PacBio and Nanopore datasets. Trio-based HG002 SNP phasing from GIAB is taken as the ground truth. Lower switch error rates indicate higher phasing accuracy. D, Phasing Hamming error rates of longcallR-phase and WhatsHap across PacBio and Nanopore datasets.
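The two phasing metrics in panels C and D can be sketched as follows for a single phase block. This is a minimal illustration under common textbook definitions, not the evaluation code used for the figure: the function names and the toy haplotypes are assumptions, and each list holds the allele (0 or 1) assigned to haplotype 1 at consecutive heterozygous SNPs.

```python
from typing import Sequence


def switch_error_rate(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of adjacent SNP pairs whose relative phase disagrees with the truth."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must cover the same SNPs")
    if len(pred) < 2:
        return 0.0
    # agree[i] is True when the predicted haplotype-1 allele matches the truth at SNP i.
    agree = [p == t for p, t in zip(pred, truth)]
    # A switch occurs wherever agreement flips between neighbouring SNPs.
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(pred) - 1)


def hamming_error_rate(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of SNPs placed on the wrong haplotype, under the better of the two labelings."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must cover the same SNPs")
    if not pred:
        return 0.0
    mismatches = sum(p != t for p, t in zip(pred, truth))
    # Haplotype labels are arbitrary, so also consider the flipped labeling.
    return min(mismatches, len(pred) - mismatches) / len(pred)


truth = [0, 0, 0, 0, 0, 0, 0, 0]
one_long_switch = [0, 0, 0, 1, 1, 1, 1, 1]   # phase flips once and stays flipped
one_point_error = [0, 0, 0, 1, 0, 0, 0, 0]   # a single SNP is misassigned

long_switch_rates = (switch_error_rate(one_long_switch, truth),
                     hamming_error_rate(one_long_switch, truth))
point_error_rates = (switch_error_rate(one_point_error, truth),
                     hamming_error_rate(one_point_error, truth))
```

The toy cases show why both metrics are reported: a single long switch counts once in the switch error rate but misplaces many SNPs in the Hamming sense, whereas an isolated misassigned SNP counts as two switches but only one Hamming error.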

Source data

Source Data Fig. 2

Upset plot source data, Venn plot source data, cumulative count of significant ASJ genes across HPRC samples, count of known and novel ASJ events across HPRC samples.

Source Data Extended Data Fig. 1

Precision and recall of SNP calling on PacBio datasets for longcallR and Clair3-RNA, precision and recall of SNP calling on Nanopore datasets, phasing switch error rates of longcallR-phase and WhatsHap across PacBio and Nanopore datasets, phasing Hamming error rates of longcallR-phase and WhatsHap across PacBio and Nanopore datasets.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, N., Li, H. & Human Pangenome Reference Consortium. SNP calling, haplotype phasing and allele-specific analysis with long RNA-seq reads. Nat Methods (2026). https://doi.org/10.1038/s41592-026-03045-6

  • DOI: https://doi.org/10.1038/s41592-026-03045-6
