Fig. 4: miniQuant-H improves the accuracy of gene isoform quantification.

a, Schematic of how miniQuant estimates gene isoform abundance. Left, miniQuant leverages the strengths of both long and short reads. Middle, miniQuant-L and miniQuant-H are developed for two data scenarios. miniQuant-H integrates the likelihoods of long and short reads using a gene- and data-specific weight αc, which is determined by a machine learning model that takes data and gene structure features (for example, K-value) as input. Right, gene isoform abundances are estimated from the hybrid likelihood using the EM algorithm. b, Comparison of the median MARD of miniQuant-H with uniform (from 0 to 1, purple) and gene-specific (red) weights across combinations of short-read and long-read sequencing depths (n = 9 combinations) under GENCODE annotation. c, Lines represent the median MARD of miniQuant-H with increasing short-read sequencing depth under GENCODE annotation. Long reads at three sequencing depths from three protocols (cDNA-PacBio, cDNA-ONT and dRNA-ONT) are used. Red arrows, recommended combinations of sequencing depths. d, Comparison of the median MARD of five short-read tools, eight long-read tools and miniQuant-H across three short-read and three long-read sequencing depths under GENCODE annotation. e, Bar plot showing the mean MARD for ERCC (left) and SIRV (right) spike-in transcripts of SIRV-set4 from the LRGASP consortium (n = 6 biological replicates). Bars within the dashed line indicate the de novo sample-specific annotations identified by Cufflinks and StringTie. Asterisk, technological issues specified in Supplementary Note 6. f, Dot chart showing the median MARD of miniQuant-L and miniQuant-H at different expression levels under GENCODE (left) and sample-specific (right) annotations. g, Dot chart showing the median MARD of short-read-based tools (left), long-read-based tools (right) and miniQuant-H within different K-value groups. Asterisk, LIQA does not quantify genes with a single isoform. LR, long reads; SR, short-read pairs (2 × 150 bp); M, million.
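
The caption reports MARD throughout but does not define it. As a minimal sketch, assuming the commonly used definition of the mean absolute relative difference (per-isoform |estimate - truth| / (estimate + truth), taken as 0 when both values are 0, then averaged), the metric could be computed as below. The function name `mard` and the example values are hypothetical and are not taken from miniQuant itself.

```python
def mard(estimated, truth):
    """Mean absolute relative difference between two abundance vectors.

    Assumed definition (not stated in the caption): for each isoform,
    ARD = |est - ref| / (est + ref), or 0 when both are 0; MARD is the mean.
    Values range from 0 (perfect agreement) to 1.
    """
    if len(estimated) != len(truth):
        raise ValueError("abundance vectors must have the same length")
    if len(estimated) == 0:
        raise ValueError("abundance vectors must not be empty")

    total = 0.0
    for est, ref in zip(estimated, truth):
        denom = est + ref
        # Isoforms absent from both vectors contribute 0 to the sum.
        if denom > 0:
            total += abs(est - ref) / denom
    return total / len(estimated)


# Hypothetical abundances (for example, TPM) for three isoforms.
example = mard([1.0, 0.0, 3.0], [1.0, 0.0, 1.0])
```

Under this assumed definition, an isoform that is expressed in the ground truth but estimated as zero (or the reverse) contributes the maximum value of 1, which is why lower MARD indicates better quantification in the panels above.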