Abstract
Current single-cell RNA atlases largely capture polyadenylated transcripts while missing critical regulatory layers from noncoding RNA. To address this, we develop TotalX, a generalizable framework that adapts total RNA profiling for use in standard droplet-based platforms and captures a broad complement of coding and noncoding RNAs using a unified pipeline. Applying this approach to the developing human brain, we generate a dataset mapping diverse RNA biotypes across all neuronal and non-neuronal lineages, revealing biotype-specific expression programs with cell-type and temporal specificity. Tracking microRNA dynamics in Cajal–Retzius neurons, which are transient, early-born neurons of the cortex, we show the enrichment and target anticorrelation of MIR137, a microRNA associated with schizophrenia and intellectual disability, suggesting tight regulatory control. We also apply TotalX to human peripheral blood mononuclear cells and identify transcriptional modules that combine coding and noncoding RNAs, as well as tRNA dynamics. In addition, we analyze dengue-infected hepatocytes and capture non-adenylated viral transcripts that distinguish infection states. This expanded coverage supports a more complete understanding of cellular identity and gene regulation at the atlas scale.
Main
Non-polyadenylated RNAs comprise a substantial and functionally diverse portion of the transcriptome. These include microRNAs (miRNAs), long noncoding RNAs (lncRNAs), small nucleolar and small nuclear RNAs (snoRNAs, snRNAs), circular RNAs (circRNAs), transfer RNAs (tRNAs), histone mRNAs1,2 and many viral transcripts that naturally lack polyA tails3,4. Together, they have essential roles in transcriptional regulation, RNA processing, translation, stress response and cell fate transitions5,6,7,8. Yet, these transcripts remain largely invisible to conventional single-cell RNA sequencing (scRNA-seq) platforms, which rely on polyA capture and are consequently biased toward protein-coding mRNAs9.
While specialized protocols such as Smart-seq-total10, VASA-seq11, RamDA-seq12, snapTotal-seq13, MATQ-seq14 and scComplete-seq15 have expanded single-cell profiling to include some species of non-polyadenylated RNAs, these methods often require custom equipment, custom enzymes, extensive sample processing or bespoke computational pipelines—factors that limit their scalability and integration into high-throughput and widely adopted workflows. Moreover, most of these approaches either underperform or entirely fail to capture mature miRNAs, which have critical roles in development, immunity and disease. Therefore, none of these methods have been adopted to generate data at the atlas scale, leaving them largely confined to proof-of-principle or small-scale studies. As a result, large portions of RNA biology are systematically excluded from most single-cell studies, leaving blind spots in efforts to map regulatory networks, cell states and disease mechanisms that involve noncoding transcripts.
Here we present a generalizable and scalable framework for total RNA profiling in single cells that is fully compatible with commercial high-throughput microfluidic single-cell analysis systems16. Our approach captures both short and long polyadenylated and non-polyadenylated RNAs using a minimal set of biochemical and computational modifications, enabling broad adoption without sacrificing throughput, accessibility or interoperability with existing pipelines and datasets.
We applied this framework to over 500,000 single cells spanning diverse biological systems. In peripheral blood mononuclear cells (PBMCs), we detected robust cell-type-specific expression of noncoding RNAs, including miRNAs, tRNAs and lncRNAs, and revealed the regulatory architecture of coding and noncoding co-expression modules. In cells infected with a non-polyadenylated RNA virus, we captured both host and viral transcripts, revealing antiviral programs otherwise missed by polyA-based protocols. In the developing human brain, we uncovered hundreds of dynamically regulated noncoding RNAs that trace developmental transitions and lineage specification.
Altogether, our work establishes a robust, accessible and scalable strategy for total RNA profiling at single-cell resolution. The comprehensive datasets presented in this study reveal regulatory architectures otherwise undetected by conventional approaches.
Results
TotalX streamlines high-throughput detection of noncoding RNA in single cells
To enable the robust detection of noncoding RNA species in a scalable single-cell framework, we adapted the principles of Smart-seq-total to a droplet-based 3′ chemistry, resulting in a method we term TotalX (Fig. 1a). TotalX uses a custom template-switching oligo (dUTSO) and uracil-DNA glycosylase (UDG) digestion after reverse transcription, as established in Smart-seq-total10. It further integrates Cas9-mediated ribosomal RNA (rRNA) depletion at the pre-amplified cDNA stage (depletion of abundant sequences by hybridization (DASH)) (Extended Data Fig. 1a–c), while preserving full compatibility with standard microfluidics hardware and software (Extended Data Fig. 1d,e and Supplementary Note). In addition, we use custom oligos to index small fragments through direct amplification (Supplementary Table 1 and Methods).
a, Schematic overview of the TotalX protocol. Total RNA is polyadenylated and reverse transcribed using a custom template-switching oligo (dUTSO). After reverse transcription, the TSO is digested with UDG and rRNA is depleted at the pre-amplified (Pre-amp) cDNA level using Cas9-based DASH. Long (>400 bp) and short (<400 bp) fragments are indexed separately, with optional inclusion of a gel-purified miRNA fraction (~18–50 bp), and pooled for sequencing. b, Gene detection efficiency across technologies. Comparison of the average number of genes per cell as a function of UMIs for TotalX (green), VASA-seq (orange) and 10x Genomics 3′ chemistry (blue), across binned depths. c, Unique genes detected per cell after UMI downsampling. Gene detection following normalization to 20,000 UMIs per cell. TotalX yields high gene complexity similar to VASA-seq and higher than the standard 10x Genomics Chromium 3′ workflow. The central line indicates the median, the box denotes the interquartile range (25th–75th percentiles) and the whiskers extend to the minimum and maximum values. The width of each violin represents the kernel density of the data. Sample sizes are indicated above each plot. d, Total number of unique genes detected per RNA biotype. Radial plots show numbers of unique genes detected in a representative experiment for each method, broken down by RNA biotype: protein-coding RNA, lncRNA, miscellaneous RNA (miscRNA), miRNA, snoRNA, snRNA, tRNA and histone RNA. Ratios represent the proportion of detected genes relative to the total number of annotated genes within each biotype. Only genes detected in 10 or more cells were counted. e, Improved detection of miRNAs using mixed library input. The scatterplot shows the average counts per million (CPM) per cell of TotalX alone (x-axis) versus TotalX with an added miRNA fraction (TotalX miRNA(+)) (y-axis) in HEK293T cells.
Known HEK293T-specific miRNAs (red) reach expression levels comparable to those of low- and moderate-protein-coding genes (blue). Inset: proportion of reads mapping to the genome, indicating a trade-off with miRNA inclusion. f, Coverage profiles for selected miRNAs. Read depth plots for MIR17, MIR222 and MIR221 showing mature miRNA arms (gray regions). Schematic in a created in BioRender; Isakova, A. https://biorender.com/vsxwvmu (2026).
We benchmarked TotalX against VASA-seq (VASA-drop version), a high-performing noncoding RNA method that requires specialized custom equipment11, and the standard 10x Genomics 3′ platform. TotalX achieves a comparable number of genes per unique molecular identifier (UMI) to VASA-seq, with a similar slope of gene recovery as a function of sequencing depth (Fig. 1b). After normalizing to 20,000 UMIs per cell, TotalX detects a comparable number of genes per cell to VASA-seq, and reaches the scale typically achievable on the commercial 10x Genomics platform by profiling over 11,000 cells in a single run (Fig. 1c and Extended Data Fig. 1f–h). At this experiment scale, TotalX captured a broad diversity of noncoding RNAs, including lncRNAs, miRNAs, snoRNAs, snRNAs, tRNAs and histone RNAs, showing its ability to profile diverse RNA biotypes in one experiment (Fig. 1d). We also assessed technical noise across RNA biotypes and found that TotalX shows noise characteristics comparable to those of VASA-seq, with noncoding RNAs showing the expected higher variability owing to their lower abundance (Extended Data Fig. 1i,j).
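The depth normalization used for this comparison can be illustrated with a minimal sketch. It assumes per-cell gene-count vectors of deduplicated UMIs and draws UMIs without replacement down to a fixed target; the function names, the random seed and the choice to leave shallower cells unchanged are illustrative assumptions, not the procedure described in Methods.

```python
import numpy as np

def downsample_counts(counts, target_umis=20_000, seed=0):
    """Subsample one cell's gene-count vector to `target_umis` UMIs.

    Assumption: UMIs are drawn without replacement, which corresponds to a
    multivariate hypergeometric draw over genes. Cells already at or below
    the target depth are returned unchanged.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() <= target_umis:
        return counts.copy()
    rng = np.random.default_rng(seed)
    return rng.multivariate_hypergeometric(counts, target_umis)

def genes_detected(counts):
    """Number of genes with at least one UMI in the cell."""
    return int(np.count_nonzero(counts))

# Hypothetical cell with five genes and 80,015 UMIs in total
cell = [50_000, 10, 0, 30_000, 5]
sub = downsample_counts(cell)
print(sub.sum(), genes_detected(sub))
```

Comparing `genes_detected` across methods at the same target depth removes sequencing depth as a confounder of gene complexity.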
To deepen detection of short RNA fragments, we performed size selection for miRNA-enriched cDNA (~18–50 bp) and mixed this with longer fragments during library preparation (Extended Data Fig. 1b,c). All RNAs are tagged with UMIs; thus, the final libraries, even when enriched, are demultiplexed to count unique RNA molecules. This optional ‘miRNA(+)’ strategy increased detection of endogenous HEK293T-enriched miRNAs17 (Fig. 1e,f). The miRNAs were detected at expression levels comparable to those of low-to-moderately expressed protein-coding genes, indicating strong capture sensitivity. We observed a trade-off, however: the inclusion of size-selected short RNA fragments led to an ~30% decrease in reads confidently mapped to the genome and transcriptome (Fig. 1e), as shorter reads are more likely to map ambiguously or represent adapters with no insert (Extended Data Fig. 2a–c). Nevertheless, TotalX miRNA(+) still retained strong detection of both long and short noncoding RNAs, outperforming the baseline version of TotalX in biotype diversity (Extended Data Fig. 2d,e). Because DASH represents an additional rRNA-directed cleanup step, we compared miRNA(+) implementations of TotalX without DASH and with two rounds of DASH (Extended Data Fig. 2a–e). These analyses showed that DASH had only a modest effect on the miRNA(+) version of the protocol, producing a slightly broader distribution of per-cell miRNA fractions without altering the mean (Extended Data Fig. 2f).
To benchmark against bulk references, we compared pseudo-bulk profiles from TotalX, TotalX miRNA(+), and VASA-seq to bulk total RNA-seq18,19 and bulk miRNA-seq datasets20,21 from HEK293T cells (Extended Data Fig. 2g,h). TotalX showed strong concordance with bulk for protein-coding and lncRNA transcripts, comparable to VASA-seq. For miRNAs, the miRNA(+) modification improved agreement with bulk miRNA-seq, supporting the utility of this enrichment step.
Read coverage profiles for individual miRNA genes, such as MIR17, MIR221 and MIR222, revealed precise detection of mature miRNA arms (5p or 3p) by TotalX (Fig. 1f). These profiles confirm that TotalX not only captures miRNA presence, but also preserves their biologically relevant processing patterns, which could be informative for downstream functional analysis and interpretation.
Cell-type-resolved profiling of noncoding RNAs in human PBMCs
We next applied TotalX to human PBMCs to evaluate its ability to profile coding and noncoding RNAs across diverse immune cell types (Fig. 2a). Using protein-coding genes for initial clustering and annotation (Extended Data Fig. 3a and Methods), we identified major expected populations including T and B lymphocytes, monocytes, natural killer (NK) cells, dendritic cells, plasmablasts and proerythroblasts.
a, Schematic comparison of protocols applied to PBMCs. TotalX incorporates enzymatic polyadenylation followed by reverse transcription using a custom TSO (10x-dUTSO). A non-polyadenylated 10x-dUTSO protocol is used for comparison. b, UMAP projection of TotalX-profiled PBMCs. Cell types were identified using protein-coding gene expression and canonical markers. Mono, monocytes. Two monocyte states are shown: regular and a transitional state (^t), each outlined by dashed-line contours. c, Cell type frequencies across protocols. The bar plot and bubble chart compare the number and relative proportion of recovered cell types between TotalX and 10x-dUTSO datasets. The bubble size reflects the fraction of each cell type within each dataset. The bar plot shows the number of cells detected in both datasets. d, Cell-type-specific expression of noncoding RNAs. Dot plots show representative lncRNA, histone RNA, miRNA, scaRNA and snoRNA markers across annotated immune cell types. e, Relationship between tRNA supply and amino acid demand. The scatterplot shows global correlation across cell types (Pearson r = 0.79, P = 1.59 × 10⁻⁶⁶), assessed using a two-sided Pearson correlation test. Individual amino acids are highlighted; arginine and glycine show relative oversupply, whereas tryptophan and phenylalanine show undersupply compared with other amino acids. f, T cell-enriched gene co-expression module (module 24). Top left: GO enrichment terms for protein-coding genes in the module, assessed using a one-sided hypergeometric test with Benjamini–Hochberg FDR correction (FDR < 0.05). Bottom left: biotype composition of the module; bottom right: module expression across cell types. The module includes MIR150, SNORA26 and CD28. g, Platelet-specific gene co-expression module (module 7).
Top: GO enrichment reveals association with platelet activation and megakaryocyte development, assessed using a one-sided hypergeometric test with Benjamini–Hochberg FDR correction (FDR < 0.05). The module is composed largely of lncRNAs (for example, SMANTIS, SMILR, LINC01750) co-expressed with GP9 and ITGA2B. Schematic in a created in BioRender; Isakova, A. https://biorender.com/fljv382 (2026).
Compared with a version of the same protocol lacking the polyadenylation step (10x-dUTSO), TotalX recovered a similar diversity of immune cell types (Fig. 2b) and allowed for more consistent detection of low-RNA-content cell types, including platelets and basophils, although we do not claim comprehensive detection of rare populations (Fig. 2c). We observed that monocytes were sensitive to protocol conditions, especially the presence of ATP used during the enzymatic polyadenylation step. In agreement with previous reports22, the presence of ATP led to a transcriptional transition state in monocytes characterized by downregulation of LYZ and upregulation of JARID2 and GAB2, accompanied by a global reduction in gene expression and a partial loss of canonical identity (Extended Data Fig. 3b–g). Beyond this monocyte-specific effect, pseudobulk comparisons of protein-coding gene expression showed high concordance between TotalX and 10x-dUTSO across other PBMC cell types, with only a small set of genes enriched for known rapidly turned-over transcripts (for example, DUSP1, JUNB, JUND, ZFP36) showing higher abundance in TotalX (Extended Data Fig. 4a).
TotalX enabled detection of noncoding RNAs that are consistently expressed in defined immune cell populations, supporting their role in maintaining cell identity. Wilcoxon rank-sum testing identified hundreds of differentially expressed noncoding RNAs (log2(fold change (FC)) > 1, adjusted P value < 0.05) across cell types (Extended Data Fig. 4b,c and Supplementary Table 2). For example, MIR650 was enriched in plasmablasts, MIR147B in conventional dendritic cells, type 2 (cDC2s) and MIR150 in T cells, proerythroblasts and naive B cells. Several lncRNAs also showed cell-type enrichment, including LINC00299 in NK and natural killer T (NKT) cells, and PELATON in monocytes. We additionally detected structured noncoding RNAs such as SCARNA6 (plasmablasts) and SNORD13 (plasmacytoid dendritic cells (pDCs)) (Fig. 2d), suggesting broad recovery of functionally relevant small RNAs.
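The marker-selection criteria above (rank-sum test, log2(FC) > 1, adjusted P < 0.05) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline from Methods: the test is taken to be two-sided, P values are adjusted with the Benjamini–Hochberg procedure, and the helper names and pseudocount are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest P value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def marker_genes(expr, in_group, lfc_min=1.0, alpha=0.05, pseudocount=1e-9):
    """Genes enriched in one cell group versus all other cells.

    expr: cells x genes matrix of normalized expression (assumed input).
    in_group: boolean mask marking cells of the group of interest.
    """
    a, b = expr[in_group], expr[~in_group]
    pvals = np.array([
        mannwhitneyu(a[:, j], b[:, j], alternative="two-sided").pvalue
        for j in range(expr.shape[1])
    ])
    lfc = np.log2((a.mean(axis=0) + pseudocount) / (b.mean(axis=0) + pseudocount))
    padj = bh_adjust(pvals)
    hits = np.where((lfc > lfc_min) & (padj < alpha))[0]
    return hits, lfc, padj
```

Applied per cell type against all remaining cells, the returned indices would correspond to candidate cell-type-enriched transcripts.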
Because TotalX captures tRNAs, we next asked whether there is alignment between tRNA availability and amino acid demand based on codon usage (Methods). Across cell types, we observed a strong correlation (Pearson r = 0.79, P = 1.59 × 10⁻⁶⁶). However, individual amino acids such as arginine and glycine showed relatively higher tRNA supply compared with others, while tryptophan and phenylalanine were more limited (Fig. 2e and Extended Data Fig. 4d). These modest deviations indicate that while codon usage strongly predicts tRNA availability, there may be additional influences shaping tRNA pools.
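A supply-versus-demand comparison of this kind can be sketched as below. The sketch assumes that tRNA counts have already been summed per amino acid and that demand has been derived from expression-weighted codon usage; both inputs, the numbers and the function name are hypothetical, and the exact definitions are those given in Methods.

```python
import numpy as np
from scipy.stats import pearsonr

def supply_vs_demand(trna_counts, aa_demand):
    """Correlate per-amino-acid tRNA supply with codon-derived demand.

    Both inputs map amino acid -> positive value (assumed nonzero).
    Values are converted to fractions before comparison.
    """
    amino_acids = sorted(set(trna_counts) & set(aa_demand))
    supply = np.array([trna_counts[a] for a in amino_acids], dtype=float)
    demand = np.array([aa_demand[a] for a in amino_acids], dtype=float)
    supply /= supply.sum()
    demand /= demand.sum()
    r, p = pearsonr(supply, demand)  # two-sided test by default
    # Positive log2 ratio: relative oversupply; negative: undersupply
    log2_ratio = dict(zip(amino_acids, np.log2(supply / demand)))
    return r, p, log2_ratio

# Illustrative numbers only, not data from this study
trna = {"Arg": 30, "Gly": 40, "Leu": 50, "Phe": 10, "Trp": 5}
demand = {"Arg": 20, "Gly": 30, "Leu": 55, "Phe": 20, "Trp": 10}
r, p, ratio = supply_vs_demand(trna, demand)
```

The per-amino-acid log2 ratio separates amino acids whose tRNA pools exceed codon demand from those that fall short, which is the kind of deviation noted for arginine, glycine, tryptophan and phenylalanine.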
To investigate how noncoding RNAs are co-regulated with protein-coding genes, we performed weighted gene co-expression network analysis (Methods) across all TotalX-profiled PBMCs. We identified over 30 gene modules comprising both coding and noncoding transcripts (Extended Data Fig. 5a,b and Supplementary Table 3). For example, module 24, enriched in T cells and CD14+ monocytes, included MIR150, SNORA26 and CD28 and was associated with gene ontology (GO) terms related to lymphocyte activation (Fig. 2f). Module 7, a platelet-specific module, included lncRNAs such as SMANTIS, SMILR and LINC01750, co-expressed with canonical platelet transcripts GP9 and ITGA2B, and enriched for terms related to platelet activation and megakaryocyte development (Fig. 2g).
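The GO enrichment reported for these modules is assessed with a one-sided hypergeometric test (Fig. 2f,g). A minimal sketch of that over-representation test follows; the function name and the gene numbers are illustrative assumptions, and multiple-testing correction across terms is not shown.

```python
from scipy.stats import hypergeom

def ora_pvalue(n_universe, n_term, n_module, n_overlap):
    """One-sided over-representation P value.

    Probability of observing at least `n_overlap` term-annotated genes when
    `n_module` genes are drawn from a universe of `n_universe` genes, of
    which `n_term` carry the annotation.
    """
    # sf(k - 1) gives P(X >= k) for the hypergeometric distribution
    return float(hypergeom.sf(n_overlap - 1, n_universe, n_term, n_module))

# Hypothetical example: 10 of 100 module genes annotated to a 200-gene term
p = ora_pvalue(n_universe=20_000, n_term=200, n_module=100, n_overlap=10)
```

In practice, one P value would be computed per GO term and the resulting set adjusted using the Benjamini–Hochberg procedure.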
To further assess whether TotalX captures expected miRNA–mRNA regulatory relationships, we analyzed correlations between detected miRNAs and their validated or predicted targets in PBMCs (42 miRNAs with expressed targets; Extended Data Fig. 5c and Methods). Anticorrelated miRNA–mRNA pairs were more prevalent than positively correlated pairs and included many previously reported interactions, such as MIR150/EP300 (ref. 23), MIR663A/JUNB and JUND (ref. 24), MIR21/CCR7 (ref. 25) and others (Extended Data Fig. 5d), consistent with canonical miRNA-mediated repression. This analysis shows that TotalX captures biologically meaningful miRNA–target relationships.
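The miRNA–target comparison can be illustrated with a short sketch. It assumes Spearman rank correlation computed over matched expression profiles (for example, per-cell-type means); the correlation measure used for PBMCs is specified in Methods and may differ, and all names and values here are hypothetical.

```python
from scipy.stats import spearmanr

def mirna_target_correlations(mirna_expr, target_expr):
    """Spearman correlation between one miRNA and each candidate target.

    mirna_expr: sequence of expression values across groups or cells.
    target_expr: dict mapping target gene -> values over the same groups.
    Returns gene -> (rho, P value).
    """
    results = {}
    for gene, values in target_expr.items():
        rho, p = spearmanr(mirna_expr, values)
        results[gene] = (float(rho), float(p))
    return results

def fraction_anticorrelated(results):
    """Share of miRNA-target pairs with a negative correlation coefficient."""
    rhos = [rho for rho, _ in results.values()]
    return sum(rho < 0 for rho in rhos) / len(rhos)

# Illustrative values only: TARGET_A falls as the miRNA rises
mir = [1, 2, 3, 4, 5, 6]
targets = {"TARGET_A": [6, 5, 4, 3, 2, 1], "TARGET_B": [1, 2, 3, 4, 5, 6]}
res = mirna_target_correlations(mir, targets)
```

An excess of negative coefficients among validated or predicted pairs is the pattern expected under miRNA-mediated repression.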
These results highlight that noncoding RNAs are not only detectable but also systematically co-expressed with functional coding genes in a cell-type-specific manner, suggesting their potential roles in immune cell identity and function.
Co-detection of non-polyadenylated viral transcripts and host transcriptome in DENV2-infected cells
We next evaluated whether TotalX enables simultaneous detection of non-polyadenylated viral RNAs and host transcripts within individual cells. As a test case, we profiled human Huh7 cells infected with dengue virus serotype 2 (DENV2)—a non-polyadenylated flavivirus whose RNA genome encodes a single polyprotein composed of both structural and nonstructural proteins (C-prM/M-E-NS1-NS5) and a family of structured, noncoding RNAs (sfRNA1–4) that originate from the 3′ untranslated region (UTR)26,27 (Fig. 3a and Extended Data Fig. 6a).
a, Overview of the DENV2 genome structure and infection design. The positive-strand RNA genome encodes a polyprotein comprising structural (C-prM/M-E) and nonstructural proteins (NS1–NS5), flanked by 5′ and 3′ UTRs. The 3′ UTR gives rise to four highly structured, noncoding subgenomic RNAs (sfRNA1–4). Huh7 cells were infected with DENV2 and profiled using TotalX. b, Host gene expression correlation with viral load. Heatmap of coding and noncoding host genes (rows) sorted by increasing viral load across infected cells (columns). POLY and sfRNA levels contribute to a composite viral load score. c, UMAP projection of DENV2-infected and uninfected cells. Clustering is based on host protein-coding gene expression. d, Abundance of sfRNA1 and POLY in infected cells. Feature plots confirm detection of viral noncoding and coding transcripts at the single-cell level. e, Infection states resolved by unsupervised clustering of single-cell host transcriptomes. Infected cells separate into two transcriptionally distinct clusters: active and quiescent host response. f, TF enrichment analysis. Gene set enrichment analysis shows activation of ATF4 and DDIT3, among others, in infected cells with active host response. g, Examples of gene expression versus viral load. POLY and sfRNA1 increase monotonically with load, while host stress response and interferon genes diverge between active and quiescent states. Normalized expression values are scaled between 0 and 1 for each gene. Dots represent single cells; lines indicate LOWESS-smoothed trends for each cluster. Colors denote distinct clusters as indicated in the legend. h, Viral load distribution across states. Quiescent and active infection groups both contain cells with low and high viral load, indicating that viral load alone does not determine host response. i, Enrichment of chromatin and RNA regulation pathways. 
Reactome GO enrichment showing histone deacetylation and acetylation signatures in the quiescent response, high viral load (quiescent_highVL) state. Enrichment significance was assessed using a one-sided over-representation test (hypergeometric test), with P values adjusted for multiple comparisons using the Benjamini–Hochberg FDR. Significance is indicated with asterisks: ***, FDR < 0.001; NS, non-significant. j, Module scores for infection-associated programs. Violin plots show activation of UPR, ER stress, granule formation and MAPK signaling across groups. UPR and ER stress increase in quiescent cells with high viral load (quiescent_highVL), but MAPK signaling remains suppressed. Schematic in a created in BioRender; Isakova, A. https://biorender.com/hbpc3kw (2026).
We applied TotalX to infected and mock-treated Huh7 cells and successfully recovered viral transcripts including both protein-coding regions (POLY) and structured noncoding RNAs (sfRNAs) in the infected cells. The number of viral UMIs per 10,000 host UMIs exceeded previous reports28, highlighting the sensitivity of TotalX to short, structured RNAs (Extended Data Fig. 6). To quantify infection per cell, we defined a viral load score based on summed abundance of POLY and sfRNA1–4. This score correlated with broad host transcriptomic changes across both protein-coding and noncoding genes (Fig. 3b).
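A per-cell viral load of this kind can be sketched as follows. The sketch sums POLY and sfRNA1–4 UMIs and expresses them per 10,000 host UMIs; combining the two quantities in this way, the feature names and the guard against empty cells are assumptions for illustration rather than the exact score defined in Methods.

```python
import numpy as np

# Assumed feature names for the viral transcripts in the count matrix
VIRAL_FEATURES = ("POLY", "sfRNA1", "sfRNA2", "sfRNA3", "sfRNA4")

def viral_load_per_10k(counts, feature_names, viral_features=VIRAL_FEATURES):
    """Viral UMIs per 10,000 host UMIs for each cell.

    counts: cells x features UMI matrix; feature_names: column labels.
    """
    counts = np.asarray(counts, dtype=float)
    is_viral = np.isin(np.asarray(feature_names), viral_features)
    viral = counts[:, is_viral].sum(axis=1)
    host = counts[:, ~is_viral].sum(axis=1)
    # Avoid division by zero for cells without host UMIs
    return 1e4 * viral / np.maximum(host, 1.0)

names = ["POLY", "sfRNA1", "sfRNA2", "sfRNA3", "sfRNA4", "ACTB", "ALB"]
counts = [[100, 50, 0, 0, 0, 10_000, 5_000],   # infected cell
          [0, 0, 0, 0, 0, 12_000, 8_000]]      # uninfected cell
print(viral_load_per_10k(counts, names))  # [100.   0.]
```

Sorting cells by this score would give the ordering of columns used to relate host expression to viral load.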
Infected cells partitioned into two transcriptional states—‘active response’ and ‘quiescent response’—based on host gene expression (Fig. 3c–e). Despite similar viral levels, cells in the quiescent response state failed to mount the canonical antiviral response and transcriptionally looked similar to uninfected cells (Fig. 3c–e). This heterogeneity is consistent with the infection conditions used (multiplicity of infection (MOI) = 1, 48 h postinfection; Methods), which are expected to yield a mixture of uninfected cells, secondary-infected cells and more advanced primary-infected cells within the population. Transcription factor (TF) enrichment analysis using GSEA29 showed that active response was associated with ATF4 and DDIT3 activation—regulators of the unfolded protein response (UPR) and endoplasmic reticulum (ER) stress (Fig. 3f)—consistent with previous studies30. By contrast, quiescent cells did not show upregulation of these targets.
Notably, we found a subset of ‘quiescent highVL’ cells: those with high viral load but suppressed host activation. While these cells contained abundant levels of both POLY and sfRNA transcripts (Fig. 3g,h), they maintained baseline (uninfected) levels of EIF1 and S100P and showed higher levels of AFP, APOB and MAVS compared with the ‘active response’ group (Fig. 3g). In addition, the ‘quiescent highVL’ cells showed muted inflammatory and stress-response signatures, including downregulation of mitogen-activated protein kinase (MAPK) signaling (Fig. 3j). This observation prompted us to explore possible mechanisms underlying the suppression of host responses in these cells. Although previous studies have shown that sfRNA can inhibit host antiviral defenses31,32, the precise factors at play in our system remain unclear. To investigate further, we performed differential expression analysis between high (highVL) and low (lowVL) viral load cells within both the active and quiescent response states (Extended Data Fig. 7 and Supplementary Table 4). In the ‘quiescent highVL’ group, we observed strong enrichment of chromatin regulatory signatures, including histone deacetylase (HDAC) and histone acetyltransferase (HAT) pathways (Fig. 3i), implicating epigenetic remodeling in the suppression of host responses. Previous studies have shown that HDAC inhibition during DENV2 infection can modulate inflammatory cytokine production33; here we provide a transcriptome-wide, single-cell-level quantification of this effect, highlighting a potential mechanism of immune evasion through chromatin-based silencing.
Region- and cell-type-specific expression of noncoding RNAs in the developing human brain
To chart the cellular and spatial distribution of noncoding RNAs in the human brain, we applied TotalX to neurosurgical and fetal tissue collected across four developmental stages: gestational week 19 (GW19), 9 months (9MO), 8 years (8YO) and 16 years (16YO) (Fig. 4a). The dataset includes 22 biological samples from 4 individuals, covering a broad array of brain regions, including the prefrontal (PFC) and primary visual (V1) cortices, thalamus, hippocampus, cerebellum, insula, meninges, choroid plexus, ventricular zones and ocular tissue (Fig. 4b and Extended Data Fig. 8a,b).
a, Schematic of sampled brain regions. Dissected regions include the cortex (frontal, temporal and occipital), subventricular zone (SVZ), hippocampus, thalamus, cerebellar cortex, choroid plexus, insula, meninges, middle temporal gyrus and eye tissue. Developmental time points span gestational week 19, 9 months postnatal, 8 years and 16 years. CUSA refers to the sample obtained from cavitron ultrasonic surgical aspirator. b, Cell class composition across age and region. The stacked bar plot shows the distribution of major cell classes (for example, neuronal, glial, epithelial, immune) across sampled regions at each developmental stage. c, UMAP projection of 301,515 single cells profiled using TotalX. Cells are colored by cell type (main plot), cell class (top right) and age (top left). Arcs connect the most cell-type-specific noncoding RNA markers (top 400 per cell type, selected based on log2(FC) > 1 and adjusted P value < 0.01, determined using a two-sided Wilcoxon rank-sum test with Benjamini–Hochberg FDR correction)—including miRNAs, lncRNAs, snRNAs and histone RNAs—to their genomic coordinates. Arcs are colored by gene biotype, illustrating relationships among RNA biotypes, chromosomal loci and cell classes. d, Selected cell-type-specific noncoding RNA markers. Dot plots show the expression of noncoding RNAs across brain cell types, grouped by broad cell classes. chplx, choroid plexus; CGE, MGE and LGE, caudal, medial and lateral ganglionic eminence, respectively; IPC, intermediate progenitor cell; SMC, smooth muscle cell.
We profiled 301,515 single cells using the TotalX-miRNA(+) protocol, which enabled robust detection of diverse noncoding RNA biotypes across a broad range of cell types and developmental stages. Dimensionality reduction revealed distinct transcriptional clusters corresponding to canonical neural (including inhibitory (IN) and excitatory (EN) neurons), glial, immune and non-neural cell types, with clear separation by developmental stage and brain region (Fig. 4c). Cell types were annotated based on established coding gene markers (Extended Data Fig. 8c)34,35,36,37 and further characterized by their noncoding RNA expression profiles.
Noncoding RNA expression varied markedly across cell types and developmental stages. Histone mRNAs were highly expressed in rapidly dividing progenitors such as radial glia, while snRNAs and lncRNAs were enriched in immune and differentiated neuronal populations. In total, we identified hundreds of differentially expressed non-polyadenylated transcripts, revealing their widespread association with cell identity and developmental state (Extended Data Fig. 8d).
To examine how noncoding RNA expression relates to genomic context, we mapped the top cell-type-specific noncoding RNA markers to their chromosomal coordinates (Fig. 4c). This analysis uncovered biotype-, time- and lineage-specific coordinated expression of entire noncoding RNA clusters: for instance, multiple histone genes clustered on chromosome 6 were co-expressed predominantly in progenitors, while multiple snRNAs within a chromosome 1 cluster showed coordinated enrichment in microglia. These findings show that cell-type-specific regulation of noncoding gene transcription extends beyond individual genes to encompass entire chromosomal clusters, requiring coordinated activation across extended genomic regions.
We next looked for cell-type-specific noncoding RNA markers. Figure 4d highlights selected noncoding RNAs across major cell types and RNA biotypes. Dot plots and heatmaps show the expression of representative lncRNAs, miRNAs and snoRNAs across neuronal, glial, immune, epithelial and mesenchymal populations. Several miRNAs showed striking cell-type specificity. For example, MIR17 was highly expressed in Cajal–Retzius (CR) cells38,39 and deep-layer excitatory neurons (EN-deep_layer), while MIR128-2 was detected in the same populations but showed peak expression in lateral ganglionic eminence-derived interneurons (IN-LGE-FOXP1). MIR9-1 was enriched in astrocytes, MIR92B in ependymal cells and MIR125B1 in early neurons. We also observed MIR147B expression in macrophages but not in microglia, and broad expression of MIR23A and MIR27A across immune, stromal and epithelial populations, as well as in a subset of radial glia (Extended Data Fig. 8e).
Beyond miRNAs, other biotypes showed similarly specific expression patterns. For example, LINC00299 was enriched in astrocytes and ependymal cells, SNORA7A and LNCAROD in microglia, and SNORA11F in neurons. ENSG00000258312 marked choroid plexus ependymal cells, while LINC01644 and PCAT19 were detected in endothelial cells. SNHG18 and SNORA78 were broadly expressed across mesenchymal and epithelial progenitor populations (Fig. 4d).
Together, these data show that noncoding RNAs, like coding genes, exhibit distinct and dynamic patterns of cell-type and temporal specificity. Their abundance adds to the layered regulatory complexity of brain development and highlights the utility of total RNA profiling for resolving the noncoding transcriptome at single-cell resolution.
Noncoding RNA dynamics across developing neuronal and glial lineages
To investigate the role of noncoding RNAs in neuronal lineage commitment, we focused on developing excitatory and inhibitory neurons (Fig. 5a,b). Using partition-based graph abstraction (PAGA) and pseudotime alignment, we reconstructed developmental trajectories from radial glia (RG) toward glutamatergic neuronal lineages, and from ganglionic eminence progenitors toward GABAergic neuronal lineages40. While most neurons in our dataset remain in immature states, we observed robust expression of canonical early differentiation markers. These included EOMES in intermediate progenitors, DCX in newborn neurons, and SATB2 and SLC17A7 in developing excitatory neurons, as well as DLX1, DLX2 and GAD1 in GABAergic interneuron precursors (Fig. 5a,b and Extended Data Figs. 8a,c and 9a,b), consistent with previous studies of the developing human brain34,36.
a, UMAP of neuronal lineages annotated by marker gene expression. Expression of canonical glutamatergic (SLC17A7) and GABAergic (GAD1) markers delineates excitatory and inhibitory trajectories across developing brain regions, including cortical, subplate and ganglionic eminence-derived neuroblasts. b, Pseudotime trajectories of glutamatergic and GABAergic neuronal lineages. Pseudotime was inferred using PAGA. CGE, MGE and LGE, caudal, medial and lateral ganglionic eminence, respectively. c, Selected gene expression across neuronal progenitors and mature neuron types. The heatmap shows relative expression of protein-coding genes, lncRNAs, snoRNAs, snRNAs and miRNAs across major excitatory and inhibitory neuronal lineages. Expression values represent normalized means across 100 pseudotime-defined bins. The color bar indicates pseudotime, consistent with b. d, Dot plot of disease-associated miRNA abundance across neuronal cell types. e, Read coverage profile of the MIR137 gene locus. Aggregate read depth across the MIR137 gene region (chr1(–): 98,046,070–98,046,171), showing precise coverage of the mature 3p arm by TotalX in developing neuronal lineages. f, Spearman correlation between MIR137 and its targets. Histogram of Spearman correlation coefficients across MIR137 and all other genes. Correlations were assessed using a two-sided Spearman rank correlation test. Some MIR137 validated targets are significantly anticorrelated (Benjamini–Hochberg adjusted P value (FDR) < 0.05), consistent with post-transcriptional repression. Vertical lines and gene names denote top anticorrelated protein-coding genes. g, GO enrichment for MIR137 targets. Reactome GO analysis of negatively correlated (Spearman ρ < −0.3, Benjamini–Hochberg adjusted P value (FDR) < 0.01) validated MIR137 targets reveals enrichment for pathways related to nervous system development, WNT signaling and axon guidance. TKI, tyrosine kinase inhibitors.
Along these trajectories, we identified dozens of developmentally dynamic noncoding RNAs, including miRNAs, lncRNAs and snRNAs (Fig. 5c). Several small RNAs with known neurological relevance, such as MIR222, MIR103A1 and MIR128-2 (refs. 41,42,43), showed transient expression peaks in early neurons and declined with maturation, suggesting potential roles in fate specification and lineage progression.
We performed similar analyses for glial trajectories, reconstructing differentiation paths from RG into glia-restricted progenitor cells (GPCs)44 to astrocytes or oligodendrocyte precursor cells (OPCs), and mature oligodendrocytes (Extended Data Fig. 9c–f). We observed comparable dynamic patterns of noncoding RNA expression along glial lineages. LINC00299 was upregulated in late-stage astrocytes, while MIR219A2HG showed increased expression at later stages of OPC differentiation. MIR143 was enriched in mature astrocytes, whereas MIR568 showed a dynamic trajectory—highly expressed in radial glia, downregulated in astrocyte progenitors and then re-expressed during astrocyte and OPC maturation. Notably, MIR23A was also expressed in RG and re-emerged in mature oligodendrocytes, suggesting a possible reuse of regulatory programs at distinct stages of glial development.
miRNA dynamics and target repression across developing neuronal trajectories
CR cells are among the earliest-born neurons in the developing cortex, although they largely disappear by birth. During development, they occupy the outermost cortical layer I and guide radial neuron migration by secreting reelin (RELN)45,46. Among the developing neuronal populations, CR cells emerged as a hotspot of miRNA expression, showing higher overall miRNA abundance than other neuronal subtypes (Extended Data Fig. 10a). This suggests that CR cells, although transient, may rely on enhanced post-transcriptional regulation during early cortical development. Within this population, MIR137 stood out for its specificity: it was highly expressed in CR cells and nearly absent from other neuronal subtypes at GW19 (Fig. 5d). MIR137 is also one of the loci most strongly associated with schizophrenia and intellectual disability in genome-wide association studies (GWAS)47,48, with established roles in synaptic development and epigenetic regulation49. Additional miRNAs, including MIR17 and MIR222, which were previously linked to neuronal survival and axon outgrowth50,51, also showed enriched expression across several populations during early neurogenesis.
To investigate whether these miRNAs exert post-transcriptional control within the same cells, we examined the expression of their validated targets. For MIR137, TotalX read coverage revealed detection of both 5p and 3p arms, with expression dominated by the 3p arm (Fig. 5e)—consistent with previous findings showing that miR-137-3p accounts for 97% of all miR-137-specific reads across 71 sequencing experiments compiled by miRBase52. MIR137 was selectively expressed in early neurons, particularly CR cells, whereas many of its known targets—genes involved in cell cycle progression, chromatin remodeling, RNA splicing and synaptic vesicle transport—were enriched in neuronal progenitors and showed mutually exclusive expression with MIR137. Spearman correlation analysis confirmed a significant anticorrelation between MIR137 and some of its validated targets (adjusted P value < 0.05) (Fig. 5f), consistent with miRNA-mediated repression53. Among the top anticorrelated genes were SLC38A2 and CDC42, the latter of which has previously been shown to be downregulated by MIR137 in cancer cells54. GO enrichment analysis further implicated MIR137 targets in nervous system development and Wnt signaling (Fig. 5g), underscoring their relevance to early neurogenesis.
We performed similar analyses for MIR17, MIR125B-2 and MIR128-2 (Extended Data Fig. 10c–e) and found that both validated and predicted targets53,55 were consistently anticorrelated with miRNA expression (adjusted P value < 0.05).
Together, these findings support a critical role for noncoding RNAs—particularly miRNAs—in shaping early neuronal lineage commitment, and position CR cells as a unique miRNA-regulated population during cortical development.
Discussion
This study introduces TotalX, a scalable and accessible framework for total RNA profiling in single cells that enables the simultaneous detection of both polyadenylated and non-polyadenylated transcripts. By building upon a widely adopted droplet scRNA-seq platform with minimal protocol and software modifications, TotalX overcomes long-standing limitations in single-cell transcriptomics in which noncoding RNAs—particularly short or non-polyadenylated species—have remained largely inaccessible. Through benchmarking, optimization and application across diverse biological contexts, we show that TotalX provides a biotype-rich molecular view of individual cells while preserving throughput and compatibility with existing pipelines.
Using TotalX, we detected diverse classes of noncoding RNAs, including miRNAs, tRNAs, lncRNAs, snoRNAs, snRNAs and histone RNAs, at single-cell resolution across immune cells, virally infected hepatocytes and the developing human brain. In PBMCs, we identified noncoding RNA markers of cell identity and co-expression modules linking coding and noncoding genes. In DENV2-infected cells, TotalX captured both coding and structured viral RNAs and identified a transcriptionally quiescent but high viral load infection state potentially linked to chromatin regulation. In the developing brain, we mapped the landscape of noncoding RNA expression across regions and developmental time points, uncovering biotype-specific enrichment patterns and chromosomal localization of cell-type-restricted noncoding RNAs. These findings highlight the biological value of simultaneously capturing coding and noncoding RNA programs in single cells.
CR cells, among the earliest-born neurons in the cerebral cortex, have a foundational role in brain development by secreting reelin and guiding the laminar architecture45. Although transient, they scaffold the future cortex. Our finding that CR cells show enriched expression of MIR137—a miRNA strongly associated with schizophrenia risk47—raises the possibility that post-transcriptional dysregulation in this population could have lasting consequences for cortical circuit assembly. Because MIR137 regulates genes involved in chromatin remodeling, synaptic function and neuronal maturation, its misexpression during the narrow developmental window when CR cells are active may contribute to neurodevelopmental disorders. These observations potentially position CR cells as both structural architects and a regulatory vulnerability point in psychiatric disease.
A key advantage of TotalX lies in its ability to resolve dynamic expression of small RNAs, such as miRNAs, and link them to functional outcomes. For instance, in developing neuronal lineages, we identified temporally restricted expression of several miRNAs and showed strong anticorrelation with their validated targets, suggesting direct post-transcriptional repression within the same cell. This level of resolution is difficult to achieve using traditional polyA-capture methods or indirect inferences from bulk data. More broadly, the ability to recover miRNA–target relationships, histone mRNA bursts and coordinated tRNA-codon programs opens new avenues for studying cell state transitions, stress responses and translational control. TotalX can, in principle, be applied to both single cells and single nuclei; however, in this study, we focused on cell-based profiling, as mature miRNAs are predominantly cytoplasmic56.
Despite these advances, TotalX has limitations. The addition of short RNA libraries, while improving small RNA detection, can reduce the overall proportion of mappable reads and requires careful optimization of library mixing ratios. Detection of circRNAs and very low-abundance transcripts remains limited, and future protocol iterations may benefit from dedicated capture strategies or enrichment steps. In particular, low-abundance noncoding RNAs—such as certain lncRNAs, snoRNAs and miRNAs—are also potentially underrepresented or missed entirely in the current workflow. Furthermore, although our modified Cell Ranger pipeline enables short read processing, specialized quantification tools may improve detection sensitivity and assignment accuracy for certain noncoding classes, such as miRNA.
Altogether, TotalX generates rich, integrative data fully compatible with widely used workflows, such as single-cell atlases and Perturb-seq screens on the 10x Genomics Chromium platform57,58. Resolving coding and noncoding elements simultaneously within single cells provides a more complete molecular phenotype, essential for uncovering subtle regulatory interactions, rare cell states and context-specific responses.
Methods
HEK293T cell isolation
HEK293T human embryonic kidney cells (female; ATCC, CRL-3216) were cultured in complete Dulbecco’s modified Eagle’s medium (DMEM) high-glucose medium (Thermo Fisher Scientific, 11965092) supplemented with 10% fetal bovine serum (Thermo Fisher Scientific, 16000044), 1 mM sodium pyruvate (Thermo Fisher Scientific, 11360070) and 100 μg ml−1 penicillin–streptomycin (Thermo Fisher Scientific, 15070063). Cells were dissociated using 0.25% Trypsin–EDTA (Thermo Fisher Scientific, 25200056) for 2–4 min at 37 °C and collected for analysis.
PBMC isolation
PBMCs were isolated from a leukoreduction system chamber collected from a healthy 61-year-old adult donor at Stanford Blood Center. The leukoreduction system chamber product was diluted 1:4 in phosphate-buffered saline (PBS) supplemented with 2% fetal bovine serum (FBS; Thermo Fisher Scientific, 16000044) and layered onto Ficoll-Paque Plus (GE Healthcare, 17-1440-02) for density gradient centrifugation at 400 × g for 30 min at room temperature, with the brake off. The mononuclear cell layer was collected and washed twice in PBS plus 2% FBS, and residual red blood cells were lysed using ACK lysis buffer (Thermo Fisher Scientific, A1049201) for 3 min at room temperature. Cells were then washed, counted and resuspended at 1 × 10⁷ cells per ml in CryoStor CS10 cryopreservation medium (Stemcell Technologies, 07930), aliquoted into cryovials and frozen at −80 °C in a controlled-rate freezing container before long-term storage in liquid nitrogen.
Before single-cell capture, cryopreserved PBMCs were rapidly thawed at 37 °C, transferred to pre-warmed RPMI 1640 medium (Thermo Fisher Scientific, 11875093) supplemented with 10% FBS, and washed twice in PBS plus 0.04% BSA (Thermo Fisher Scientific, AM2616). Viability and concentration were assessed using the Luna automated cell counter (Logos Biosystems) with Trypan Blue exclusion. Final input cell concentrations were adjusted to 750–1,000 cells per µl.
Dengue virus infection of Huh7 cells
Huh7 human hepatoma cells (male; Apath LLC, catalog number AP-HUH7) were cultured in DMEM (Thermo Fisher Scientific, 11965092) supplemented with 10% FBS (Thermo Fisher Scientific, 16000044), 1 mM sodium pyruvate (Thermo Fisher Scientific, 11360070) and 100 U ml−1 penicillin–streptomycin (Thermo Fisher Scientific, 15140122) at 37 °C in a 5% CO2 humidified incubator. Cells were seeded in 6-well plates and infected at ~70% confluence with DENV2 (strain 16681) at an MOI of 1 as previously described28.
For infection, virus-containing medium was diluted in serum-free DMEM and added to cells for 1 h at 37 °C with gentle rocking every 15 min. After incubation, the inoculum was removed and cells were washed once with PBS (Thermo Fisher Scientific, 10010023) before the medium was replaced with fresh complete medium. Cells were incubated for 48 h postinfection to allow for robust viral replication. Mock-infected controls were treated identically with virus-free media.
At the endpoint, cells from two replicate wells (approximately 1 × 10⁶ cells per well) were collected by trypsinization (0.25% Trypsin–EDTA; Thermo Fisher Scientific, 25200056), pelleted by centrifugation (300 × g, 5 min) and resuspended in PBS for downstream processing. Cell viability was assessed using the Luna automated cell counter (Logos Biosystems), and cells were diluted to 750–1,000 cells per µl before encapsulation for single-cell TotalX.
Human brain tissue processing
Fetal brain samples (19 gestational weeks) were obtained from Advanced Bioscience Resources and shipped overnight in cold preservation solution. Informed consent for the use of fetal brain tissue was obtained by the tissue provider from the donor’s legal representative, in accordance with applicable federal and state regulations. Samples were provided to the investigators in a fully deidentified form. As no identifiable private information was available to the research team, informed consent was not directly applicable to our study and no institutional review board approval was required under Stanford University guidelines. All subsequent experimental use of these tissues was conducted under preapproved Stem Cell Research Oversight protocols at Stanford University. Intact samples were dissected into anatomical regions by licensed neuropathologists, minced using sterile razor blades and digested in HBSS (Thermo Fisher, 24020117) containing 10 mg ml−1 Liberase (Roche, 5401119001) and 200 µg ml−1 DNase I (Worthington, LS002007) for 40 min at 37 °C with gentle agitation. The digestion was repeated once. Samples were then incubated in Accutase (Innovative Cell Technologies, AT104) supplemented with 200 µg ml−1 DNase I for 15 min at 37 °C. Red blood cells were removed using Histopaque-1077 (Sigma, 10771) by layering the suspension at a 2:1 ratio and centrifuging at 400 × g for 30 min at 25 °C (low acceleration, no brake). The buffy coat was collected and washed in HBSS containing 0.1% polyvinyl alcohol (Sigma, P8136).
Neurosurgical brain tissue samples (from 9-month-old, 8-year-old and 16-year-old individuals) were obtained during epilepsy surgeries at Stanford Hospital with informed consent from the donor’s legal representatives under a protocol approved by the Stanford Institutional Review Board. Samples were provided to the investigators with deidentified personal information, except for the age. Tissue was transferred on ice immediately from the operating room and processed within 1 h of resection. Samples were manually minced and enzymatically dissociated using papain (Worthington Biochemical, LK003176) or Accutase (Innovative Cell Technologies, AT104) supplemented with DNase I (Worthington, LS002007), depending on sample integrity and downstream application. Cell suspensions were filtered through 40-µm strainers (Corning, 352340) and used directly for downstream assays.
Following dissociation, cortical samples showed high levels of myelin debris and CD45+ vascular macrophages. These populations were depleted in two steps. First, magnetic bead-based negative selection was performed using anti-CD45 microbeads (Miltenyi Biotec, 130-045-801), followed by 30 min of incubation at 4 °C and magnetic separation using MS columns (Miltenyi Biotec, 130-096-433). Second, residual myelin was depleted by incubation with Myelin Removal Beads II following the manufacturer’s protocol. Post-depletion, cell suspensions were counted using a Luna automated cell counter (Logos Biosystems) and viability was assessed using LIVE/DEAD viability/cytotoxicity reagent (Thermo Fisher Scientific, L3224). Final cell concentrations were adjusted to 750–1,000 cells per µl before encapsulation.
Single-cell total RNA sequencing with TotalX
Single-cell total RNA sequencing was performed using TotalX, a modified 10x Genomics Chromium 3′ v3.1 workflow incorporating enzymatic strategies adapted from Smart-seq-total10. The protocol enables the simultaneous capture of polyadenylated and non-polyadenylated transcripts through a co-incubation approach integrating 3′ polyadenylation, 5′ capping and reverse transcription (RT) in a single reaction.
RT master mix was prepared by supplementing the standard 10x RT reagents with the following components: E. coli poly(A) polymerase (New England Biolabs, M0276S; final 1 U µl−1), ATP (10 mM), Vaccinia capping enzyme (NEB, M2080S; 0.1 U µl−1), S-adenosylmethionine (2 mM; NEB, M2080S), guanosine triphosphate (10 mM; NEB, M2080S) and a custom biotinylated template-switching oligonucleotide (5′ biotin-ATGGCUCGGAGAUGUGUAUAAGAGACAGUCUrGrG+G; obtained from Integrated DNA Technologies (IDT)). Reagents were mixed immediately before loading and kept on ice until droplet generation. Cells or nuclei were loaded into the 10x Genomics Chromium controller following the manufacturer’s protocol for 3′ v3.1 chemistry.
cDNA amplification and rRNA depletion
Following reverse transcription, emulsion droplets were broken according to 10x recovery steps. cDNA was then treated with UDG (NEB, M0280S) at 37 °C for 50–60 min to remove dUTSO fragments. This was followed by 5 cycles of amplification using 10x proprietary primers and a custom spike-in primer (according to the TotalX protocol, Supplementary Note). The pre-amplified cDNA was cleaned up using 1.8× SPRIselect (Beckman Coulter, B23317), and cDNA corresponding to rRNA was depleted by Cas9-mediated cleavage using guide RNAs targeting mitochondrial and cytoplasmic ribosomal sequences (DASH). CRISPR ribonucleoprotein complexes were assembled from Alt-R S.p. Cas9 Nuclease V3 (IDT, 1081059), tracrRNA (IDT, 1072532) and a custom pool of 57 crRNAs (IDT) designed based on Smart-seq-total DASH oligos (Supplementary Table 1). The reaction was incubated at 37 °C for 60 min, followed by treatment with proteinase K (Thermo Fisher Scientific, EO0491) for 10 min at 56 °C. cDNA was then cleaned up using SPRIselect and amplified for an additional 7–10 cycles using 10x proprietary primers and a custom spike-in primer (see Supplementary Note for the detailed TotalX protocol).
Library construction, size selection and sequencing
Amplified cDNA was separated into long- and short-fragment pools using SPRIselect magnetic beads (Beckman Coulter, catalog number B23318). To obtain the long-fragment pool, SPRIselect beads were added to amplified cDNA at a 0.6× bead-to-sample ratio, and bound fragments were retained according to the manufacturer’s instructions. Unbound fragments in the supernatant were further purified at a 1.2× bead ratio (1.8× final) to isolate the short-fragment pool. The long-fragment pool was processed according to the Chromium Single Cell 3′ v3.1 protocol (10x Genomics), using the Single Index Kit T Set A (PN-1000213/PN-2000240). Short fragments were indexed by PCR amplification using the 10x Genomics 3′ v3.1 library amplification mix, SI primer and custom index primers (Supplementary Table 1). For each 40 μl reaction, the following components were combined: 20 μl Amp Mix, 4 μl SI primer, 1 μl custom index primer (10 μM), 1 μl amplified cDNA (1 ng μl−1) and 14 μl nuclease-free water. PCR conditions were as follows: 98 °C for 3 min; 12 cycles of 98 °C for 15 s, 62 °C for 20 s and 72 °C for 2 min; and a final extension at 72 °C for 1 min. PCR products were purified with a 1.2× SPRIselect bead cleanup and eluted in 30 μl nuclease-free water. The resulting libraries were either split for further purification to remove very short fragments (below 200–250 bp) and obtain the small RNA fraction or used for optional miRNA enrichment. The miRNA(+) fragments were further size selected from the short-fragment pool by enrichment of fragments corresponding to small RNAs (18–30 bp) using the Pippin Prep gel system (Sage Science, 3% agarose), following the manufacturer’s protocol. Short and long pools were recombined in desired ratios before final library quality control and sequencing. Resulting libraries were sequenced on a NovaSeq X instrument (Illumina) with the following configuration: read 1, 28 bp; index 1, 8 bp; index 2, 0 bp; and read 2, 91 bp.
All custom oligonucleotides and guide RNA sequences generated in this study are provided in Supplementary Table 1 and were synthesized by IDT. No unique biological materials were generated. Requests for additional information should be directed to the corresponding author.
Data processing and alignment
Raw sequencing data were demultiplexed using Cell Ranger mkfastq (v8.0.1), and read 2 was trimmed using Cutadapt59 with the following parameters: -u 6 -a 'AAAAAAAAAA;min_overlap=10' -m 18. Trimmed reads were then processed using cellranger count with a dual-pass alignment strategy to enable quantification of both long and short RNA species. Two custom reference transcriptomes were constructed from the GRCh38 primary assembly (Ensembl release 109; GRCh38.p13). Gene annotations were sourced from GENCODE v44 (ref. 60) and manually curated to include an expanded set of biotypes.
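The effect of the read 2 trimming parameters can be illustrated with a short sketch in plain Python. This is a simplified emulation and not Cutadapt itself: it requires an exact run of ten A bases, whereas Cutadapt tolerates a small error rate, and the function name trim_read2 and the example reads are hypothetical.

```python
from typing import Optional

HEAD_CUT = 6          # corresponds to -u 6: drop the first 6 bases of read 2
POLY_A = "A" * 10     # corresponds to -a 'AAAAAAAAAA;min_overlap=10'
MIN_LEN = 18          # corresponds to -m 18: discard reads shorter than 18 nt


def trim_read2(seq: str) -> Optional[str]:
    """Exact-match emulation of the trimming step (assumption: no mismatches)."""
    seq = seq[HEAD_CUT:]              # remove the leading bases first
    hit = seq.find(POLY_A)            # leftmost run of at least 10 A bases
    if hit != -1:
        seq = seq[:hit]               # remove the poly(A) run and everything 3' of it
    return seq if len(seq) >= MIN_LEN else None


# Hypothetical read: 6 leading bases, a 22-nt insert, then the added poly(A) tail.
insert = "TGAGGTAGTAGGTTGTATAGTT"
print(trim_read2("NNNNNN" + insert + "A" * 15 + "CTGTCTC"))
```

Under these assumptions, a 22-nt insert survives the length filter, whereas an insert shorter than 18 nt is discarded.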
The primary (long RNA) reference included the following biotypes: protein_coding, lncRNA, miRNA, snRNA, snoRNA, small Cajal body-specific RNA (scaRNA), tRNA, Mt_tRNA, vault_RNA, misc_RNA, antisense, scRNA, and immunoglobulin and T cell receptor genes (for example, IG_*, TR_*). tRNA annotations were appended using curated entries from the high-confidence set in GtRNAdb61. For viral infection studies, the DENV2 genome (GenBank NC_001474; GCF_000871845.1)62 and its corresponding annotation (modified_dengue.gtf) were appended.
The secondary (small RNA) reference was restricted to short RNA biotypes (miRNA, snoRNA, scaRNA) to enhance mapping specificity. tRNAs and viral transcripts were excluded from this reference.
Alignment, barcode correction, UMI collapsing and gene quantification were performed using Cell Ranger’s internal rules, with modified alignment parameters specified in the cellranger-8.0.1/lib/bin/parameters.toml file to improve the capture of short RNA species (≥18 bp). Key modified parameters included the following:
star_parameters = '--outFilterMismatchNoverLmax=0.05 --outFilterMatchNmin=18 --outFilterScoreMinOverLread=0 --outFilterMatchNminOverLread=0'
Many small noncoding transcripts—including miRNAs and other short RNAs—were found to be located within introns, exons or overlapping lncRNAs (Extended Data Fig. 1e). In the standard Cell Ranger pipeline, reads overlapping multiple features are not counted, leading to underrepresentation of these transcripts. To address this, a two-step mapping and quantification strategy was used.
Initial barcode calling and quality filtering were performed using the long RNA reference. Barcodes passing these filters were then used to extract matching data from the short RNA alignment. Following Cell Ranger’s counting logic, only reads carrying the tag xf:i:25 in the BAM file were used for UMI counting and generation of the final gene expression matrix. The final gene expression matrix was constructed by substituting miRNA and other short RNA feature counts from the short-reference run into the long-reference matrix based on barcode concordance. Only barcodes confidently detected in the long RNA run were retained for downstream analysis.
Custom Python scripts were developed to merge the count matrices, resolve feature overlaps, ensure UMI count accuracy and annotate transcript biotypes (see GitHub repository63).
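The substitution step described above can be sketched as follows. This is a minimal illustration using nested dictionaries, not the scripts in the repository; the function name merge_counts, the barcodes and the gene HOSTGENE are hypothetical, and the sketch assumes that counts for short RNA features in the long-reference run are discarded in favor of the short-reference counts.

```python
from typing import Dict, Set

Counts = Dict[str, Dict[str, int]]  # barcode -> gene -> UMI count


def merge_counts(long_run: Counts, short_run: Counts, short_features: Set[str]) -> Counts:
    """Replace short RNA feature counts in the long-reference matrix."""
    merged: Counts = {}
    # Only barcodes called in the long-reference run are retained.
    for barcode, genes in long_run.items():
        # Keep long-reference counts for everything except short RNA features.
        row = {g: c for g, c in genes.items() if g not in short_features}
        # Substitute short RNA counts from the matching barcode, if present.
        for gene, count in short_run.get(barcode, {}).items():
            if gene in short_features and count > 0:
                row[gene] = count
        merged[barcode] = row
    return merged


long_run = {"AAAC": {"ACTB": 5, "MIR137": 0, "HOSTGENE": 7}, "CCCG": {"ACTB": 2}}
short_run = {"AAAC": {"MIR137": 3}, "TTTT": {"MIR137": 9}}
print(merge_counts(long_run, short_run, {"MIR137"}))
```

Barcodes seen only in the short-reference run (TTTT in this toy input) are dropped, which mirrors the barcode concordance rule.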
Comparison with other single-cell methods
Publicly available HEK293T scRNA-seq datasets generated using the VASA-seq droplet-based protocol were downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GSE176588). For the 10x Genomics Chromium v3.1 dataset, data files were obtained directly from the 10x Genomics dataset page. To enable direct comparison, all datasets were first harmonized by mapping genes to Ensembl gene IDs, and only genes present in all datasets were retained for downstream analyses. Gene symbols were standardized by removing version suffixes (for example, ‘.1’, ‘.2’), and gene detection rates were estimated based on unique, nonambiguous gene symbols within each dataset.
For the TotalX dataset, to minimize the impact of putative doublets, barcodes corresponding to the top 8% of total UMI counts (reflecting the predicted doublet rate) were excluded. For datasets containing more than 1,000 cells, only cells with at least 5,000 detected UMIs were retained; smaller datasets were included in full. To compare gene detection rates and account for differences in sequencing depth between protocols, UMI counts for each cell were randomly downsampled to a fixed target of 20,000 UMIs per cell using a custom script. Cells with fewer than the target UMI count were excluded, except in datasets with fewer than 1,000 cells, in which all cells were kept. Gene detection was assessed as the number of genes with at least one UMI detected per cell following downsampling. For comparisons of mean expression values between protocols, only genes shared across datasets were included.
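A minimal sketch of per-cell UMI downsampling is given below. It is not the custom script used in the study; the function names and the toy counts are hypothetical, and the exception for datasets with fewer than 1,000 cells is not modeled.

```python
import random
from typing import Dict, Optional


def downsample_umis(counts: Dict[str, int], target: int,
                    rng: random.Random) -> Optional[Dict[str, int]]:
    """Draw `target` UMIs without replacement from one cell's counts."""
    if sum(counts.values()) < target:
        return None                      # cell has too few UMIs and is excluded
    # Expand counts into one entry per UMI, then sample without replacement.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    kept: Dict[str, int] = {}
    for gene in rng.sample(pool, target):
        kept[gene] = kept.get(gene, 0) + 1
    return kept


def genes_detected(counts: Dict[str, int]) -> int:
    """Number of genes with at least one UMI."""
    return sum(1 for n in counts.values() if n >= 1)


rng = random.Random(0)                   # fixed seed for reproducibility
cell = {"GENE_A": 50, "GENE_B": 30, "GENE_C": 1}
down = downsample_umis(cell, 40, rng)    # the study used a target of 20,000
print(genes_detected(down))
```

Sampling without replacement guarantees that no gene ends up with more UMIs than it had before downsampling.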
Comparison with bulk RNA-seq and bulk miRNA-seq
Pseudo-bulk expression profiles were generated from TotalX, TotalX miRNA(+) and VASA-drop HEK293T datasets by summing UMI counts across all cells. These profiles were compared with two bulk total RNA-seq datasets (GSE198050, GSE106476) and two bulk miRNA-seq datasets (GSE261577, PRJNA994505). For each pairwise comparison, expression values of genes detected in both datasets were log-transformed (log1p), and Pearson correlations were computed separately for each RNA biotype.
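The pseudo-bulk comparison can be sketched as below, assuming that each profile is a mapping from gene to summed counts and that a gene-to-biotype lookup is available. The function names, the min_genes cutoff and the toy counts are illustrative assumptions.

```python
import math
from typing import Dict, List


def pseudobulk(cells: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum UMI counts for each gene across all cells."""
    totals: Dict[str, int] = {}
    for cell in cells:
        for gene, n in cell.items():
            totals[gene] = totals.get(gene, 0) + n
    return totals


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")


def biotype_correlations(a: Dict[str, float], b: Dict[str, float],
                         biotype_of: Dict[str, str], min_genes: int = 3) -> Dict[str, float]:
    """Pearson r of log1p values per biotype, using genes detected in both profiles."""
    groups: Dict[str, List[str]] = {}
    for gene in a:
        if a[gene] > 0 and b.get(gene, 0) > 0:
            groups.setdefault(biotype_of[gene], []).append(gene)
    return {
        bt: pearson([math.log1p(a[g]) for g in genes], [math.log1p(b[g]) for g in genes])
        for bt, genes in groups.items() if len(genes) >= min_genes
    }


cells = [{"ACTB": 10, "MALAT1": 7, "NEAT1": 1}, {"ACTB": 5, "MALAT1": 2, "XIST": 3}]
print(pseudobulk(cells))
```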
Doublet removal, cell-type annotation and downstream analysis
Doublets were removed using Scrublet64 with thresholds selected to match the expected doublet rate. Integration across brain samples was performed with scVI65 on protein-coding highly variable genes. Dimensionality reduction and clustering were conducted in Scanpy66 using the scVI latent space, followed by principal component analysis and uniform manifold approximation and projection (UMAP)67 for visualization. Differential gene expression analysis was performed using the Wilcoxon rank-sum test with Benjamini–Hochberg false discovery rate (FDR) correction. Noncoding RNA expression was quantified using log-normalized counts. Gene co-expression modules were identified using the Python implementation of weighted gene co-expression network analysis68, with soft thresholding and module detection as previously described.
For pseudotime analyses, developmental trajectories were inferred using PAGA and diffusion pseudotime in Scanpy. For further details, refer to the GitHub page63.
miRNA–mRNA interaction analysis in PBMCs
We identified miRNAs expressed in at least 100 cells and present in ≥2% of at least one PBMC cell type. For each retained miRNA, validated targets (miRTarBase55) and predicted targets (TargetScan53) were retrieved, and only targets expressed in PBMCs (log1p(expression) > 0.1 and detected in >1% of cells) were kept for downstream analysis. Correlations between miRNA and mRNA expression were computed using Pearson correlation across all single cells, and interactions were categorized as anticorrelated, positively correlated or uncorrelated using a significance threshold of |r| > 0.2 and P < 0.05. For visualization, cells were ordered by miRNA expression and binned in groups of 20 cells, and average miRNA and target-gene expression were computed within each bin to generate expression trajectories. All analysis steps and parameters follow the workflow implemented in the accompanying notebook on the GitHub page63.
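The binning used for visualization can be sketched as follows. This is an illustrative reimplementation rather than the notebook code; the function name binned_trajectory and the toy values are hypothetical, and a final bin with fewer than 20 cells is kept as is.

```python
from typing import List, Sequence, Tuple


def binned_trajectory(mirna: Sequence[float], target: Sequence[float],
                      bin_size: int = 20) -> List[Tuple[float, float]]:
    """Order cells by miRNA expression and average both genes per bin."""
    order = sorted(range(len(mirna)), key=lambda i: mirna[i])
    rows = []
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]          # cells in this bin
        rows.append((sum(mirna[i] for i in idx) / len(idx),
                     sum(target[i] for i in idx) / len(idx)))
    return rows


# Toy data: target expression falls as miRNA expression rises.
mirna = [float(i) for i in range(40)]
target = [40.0 - m for m in mirna]
for mirna_mean, target_mean in binned_trajectory(mirna, target):
    print(mirna_mean, target_mean)
```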
Quantification of tRNA availability and demand
To estimate cell-type-specific amino acid demand, we first obtained protein-coding sequences from GENCODE v44. For each protein-coding gene, the corresponding protein sequence was extracted, and the frequency of each standard amino acid was tabulated. Within each cell type, gene-level UMI counts were summed across all cells, yielding total expression for each gene per cell type. For genes with a mapped protein sequence, we calculated the total number of each amino acid required by multiplying the gene’s summed expression by the corresponding amino acid counts from the protein sequence. The aggregate for each amino acid per cell type thus reflects the overall translational demand for each amino acid. Amino acid demand matrices were stored as tables with amino acids as rows and cell types as columns. To quantify tRNA availability, we identified all tRNA genes present in the reference transcriptome (filtered for names containing the ‘tRNA’ string). For each tRNA gene, the associated amino acid was inferred from the gene name, using regular expression parsing (for example, extracting the amino acid label preceding ‘_tRNA’). We then summed UMI counts for all tRNA genes by cell type. For each amino acid, total tRNA expression was computed as the sum of all tRNA gene UMIs corresponding to that amino acid within each cell type. This yielded a matrix of tRNA ‘supply’ per amino acid per cell type.
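The demand and supply calculations for a single cell type can be sketched as follows. The tRNA gene names in the example (for instance, Ala_tRNA-AGC-1) are a hypothetical naming scheme chosen to match the description above, and the function names are illustrative. Demand is keyed by one-letter amino acid codes and supply by the label parsed from the gene name, so a lookup between the two would be needed before comparing them.

```python
import re
from collections import Counter
from typing import Dict

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
TRNA_NAME = re.compile(r"([A-Za-z]+)_tRNA")   # label preceding '_tRNA' (assumed format)


def amino_acid_demand(expr: Dict[str, int], protein_seq: Dict[str, str]) -> Counter:
    """Summed expression multiplied by amino acid counts of each protein."""
    demand: Counter = Counter()
    for gene, umis in expr.items():
        seq = protein_seq.get(gene)
        if seq is None:
            continue                              # gene without a mapped protein
        for aa, n in Counter(seq).items():
            if aa in STANDARD_AA:
                demand[aa] += umis * n
    return demand


def trna_supply(expr: Dict[str, int]) -> Counter:
    """Sum tRNA gene UMIs per amino acid label parsed from the gene name."""
    supply: Counter = Counter()
    for gene, umis in expr.items():
        match = TRNA_NAME.search(gene)
        if match:
            supply[match.group(1)] += umis
    return supply


# Toy summed counts for one cell type.
expr = {"GENE1": 10, "Ala_tRNA-AGC-1": 3, "Ala_tRNA-CGC-2": 2, "Gly_tRNA-GCC-1": 5}
print(amino_acid_demand(expr, {"GENE1": "MKK"}), trna_supply(expr))
```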
Quantification of DENV2
DENV2 viral RNA abundance was quantified at the single-cell level using two complementary approaches. First, for each cell, the total viral RNA load was computed as the sum of normalized expression values for five DENV2-derived genes: sfRNA1, sfRNA2, sfRNA3, sfRNA4 and POLY. The resulting value (viral_load) was further normalized to the total cellular transcript count to calculate the percentage of viral transcripts per cell (viral_pct).
In parallel, a composite viral load score was generated using the score_genes function in Scanpy (v1.11.0), which computes the average expression of the specified viral genes after subtracting the average expression of a control gene set matched for overall expression distribution. This was performed for both the full viral gene set (viral_load_score, based on sfRNA1–4 and POLY) and for the noncoding flaviviral RNAs alone (sfrna_load, based on sfRNA1–4).
These viral burden metrics—absolute, relative and control adjusted—were used to stratify cells for all downstream analyses, including response clustering and pathway enrichment. Unless otherwise specified, viral_load was used for most analyses.
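The absolute and relative metrics can be sketched for a single cell as follows. The sketch assumes that the denominator for viral_pct is the sum of all values in the same normalized expression vector, which is one reading of the description above; the function name and the toy values are hypothetical, and the control-adjusted score computed with score_genes is not reproduced here.

```python
from typing import Dict, Tuple

VIRAL_GENES = ("sfRNA1", "sfRNA2", "sfRNA3", "sfRNA4", "POLY")


def viral_metrics(norm_expr: Dict[str, float]) -> Tuple[float, float]:
    """Return (viral_load, viral_pct) for one cell."""
    viral_load = sum(norm_expr.get(g, 0.0) for g in VIRAL_GENES)
    total = sum(norm_expr.values())               # assumed denominator
    viral_pct = 100.0 * viral_load / total if total > 0 else 0.0
    return viral_load, viral_pct


print(viral_metrics({"sfRNA1": 2.0, "POLY": 3.0, "ACTB": 15.0}))
```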
Gene Ontology and Reactome pathway enrichment analyses of differentially expressed genes were performed using the Python package GSEApy69 and g:Profiler70, respectively, applying a hypergeometric test with FDR correction to identify significantly enriched pathways among top-ranked genes.
Correlation between miRNA and its targets
To assess the relationship between miRNA and target gene expression at single-cell resolution, only cells expressing the miRNA of interest were included in the analysis. For each cell, miRNA expression was quantified, and cells were partitioned into 30 quantile-based bins according to miRNA abundance using pandas.qcut. This approach enabled a robust estimation of monotonic relationships between miRNA and target gene expression across the full dynamic range of miRNA levels observed.
Validated and predicted targets for each miRNA were compiled from TarBase55 and TargetScan 8.0 (ref. 53), respectively, and subsequently filtered to retain only those genes detected in the dataset. For each target gene, the mean expression within each miRNA expression bin was calculated. The Spearman rank correlation coefficient (ρ) between the binned mean miRNA expression and the corresponding target gene expression profile was then computed using the scipy.stats.spearmanr function. Only gene–miRNA pairs with sufficient coverage (at least five bins with non-missing values) were retained for further analysis. An analogous procedure was applied to all non-target genes to generate a background distribution of correlation coefficients, providing a reference for target specificity.
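A dependency-free sketch of the binned correlation is shown below. The analysis itself used pandas.qcut and scipy.stats.spearmanr; here, equal-count chunks of the sorted cells approximate the quantile bins (the two can differ when many cells share the same miRNA value), and the Spearman coefficient is computed as the Pearson correlation of average ranks. All function names and the toy data are illustrative.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")


def binned_spearman(mirna: Sequence[float], target: Sequence[float],
                    n_bins: int = 30, min_bins: int = 5) -> Optional[float]:
    """Spearman rho between per-bin means, using only miRNA-expressing cells."""
    cells = sorted(((m, t) for m, t in zip(mirna, target) if m > 0),
                   key=lambda pair: pair[0])
    n = len(cells)
    means = []
    for b in range(n_bins):
        chunk = cells[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            means.append((sum(m for m, _ in chunk) / len(chunk),
                          sum(t for _, t in chunk) / len(chunk)))
    if len(means) < min_bins:
        return None                                # insufficient bin coverage
    xs, ys = zip(*means)
    return pearson(average_ranks(xs), average_ranks(ys))


mirna = [float(i) for i in range(1, 301)]
target = [1000.0 - m for m in mirna]               # toy, perfectly anticorrelated
print(binned_spearman(mirna, target))
```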
To elucidate the functional relevance of strongly anticorrelated miRNA targets, pathway enrichment analysis was performed. Validated target genes showing significant negative correlation (ρ < 0, adjusted P < 0.05) were subjected to pathway enrichment using the Reactome 2022 human gene set via the Enrichr interface (gseapy v0.10.8). Pathways with an FDR below 0.05 were considered significantly enriched. The top 20 enriched pathways were visualized as a scatterplot of –log10(FDR) versus the pathway name, with the point size proportional to the odds ratio.
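The selection of targets passed to enrichment can be sketched as below, with a plain Python Benjamini–Hochberg adjustment standing in for whichever implementation was used. The function names, gene labels and P values are toy placeholders, not results from the study.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_anticorrelated(results: Sequence[Tuple[str, float, float]],
                          alpha: float = 0.05) -> List[str]:
    """Keep genes with rho < 0 and adjusted P below alpha."""
    adjusted = benjamini_hochberg([p for _, _, p in results])
    return [gene for (gene, rho, _), q in zip(results, adjusted)
            if rho < 0 and q < alpha]


# Toy (gene, rho, P) tuples.
toy = [("TARGET_A", -0.6, 0.005), ("TARGET_B", 0.4, 0.01), ("TARGET_C", -0.1, 0.9)]
print(select_anticorrelated(toy))
```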
Ethics statement
All human materials used in this study were obtained and used in accordance with applicable institutional and regulatory requirements. Fetal brain tissue samples were obtained from Advanced Bioscience Resources (ABR) as fully de-identified biospecimens; informed consent for donation and research use was obtained by the tissue provider from the donor’s legal representative. Under Stanford University guidelines, use of these de-identified samples was classified as not human subjects research and did not require additional Stanford IRB review. This study did not involve human embryonic stem cells, embryo culture or stem cell derivation. Tissue processing was performed under Stanford Stem Cell Research Oversight (SCRO) protocol no. 735. PBMCs were commercially sourced from the Stanford Blood Center donor bank as de-identified donor material collected for research use under the bank’s ethics oversight procedures. Neurosurgical samples were collected under Stanford IRB protocol IRB-12625 with informed consent.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Raw sequencing data generated from HEK293T and Huh7 cell line experiments have been deposited in GEO (accession number GSE315939)71. Raw sequencing data derived from human neurosurgical tissue, fetal samples and PBMCs from living donors are deposited in Synapse72. Processed data are available via Figshare at https://doi.org/10.6084/m9.figshare.29803628 (ref. 73).
Code availability
Custom analysis scripts and instructions for Cell Ranger modifications are available via GitHub63 and Zenodo at https://doi.org/10.5281/zenodo.18177678 (ref. 74).
References
Chen, L.-L. & Kim, V. N. Small and long non-coding RNAs: past, present, and future. Cell 187, 6451–6485 (2024).
Mattick, J. S. et al. Long non-coding RNAs: definitions, functions, challenges and recommendations. Nat. Rev. Mol. Cell Biol. 24, 430–447 (2023).
Tycowski, K. T. et al. Viral noncoding RNAs: more surprises. Genes Dev. 29, 567–584 (2015).
Edgil, D., Polacek, C. & Harris, E. Dengue virus utilizes a novel strategy for translation initiation when cap-dependent translation is inhibited. J. Virol. 80, 2976–2986 (2006).
Matera, A. G., Terns, R. M. & Terns, M. P. Non-coding RNAs: lessons from the small nuclear and small nucleolar RNAs. Nat. Rev. Mol. Cell Biol. 8, 209–220 (2007).
Marzluff, W. F., Wagner, E. J. & Duronio, R. J. Metabolism and regulation of canonical histone mRNAs: life without a poly(A) tail. Nat. Rev. Genet. 9, 843–854 (2008).
Melton, C., Judson, R. L. & Blelloch, R. Opposing microRNA families regulate self-renewal in mouse embryonic stem cells. Nature 463, 621–626 (2010).
Wagner, V. et al. Characterizing expression changes in noncoding RNAs during aging and heterochronic parabiosis across mouse tissues. Nat. Biotechnol. 42, 109–118 (2024).
Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Mol. Cell 65, 631–643.e4 (2017).
Isakova, A., Neff, N. & Quake, S. R. Single-cell quantification of a broad RNA spectrum reveals unique noncoding patterns associated with cell types and states. Proc. Natl Acad. Sci. USA 118, e2113568118 (2021).
Salmen, F. et al. High-throughput total RNA sequencing in single cells using VASA-seq. Nat. Biotechnol. 40, 1780–1793 (2022).
Hayashi, T. et al. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs. Nat. Commun. 9, 619 (2018).
Niu, Y., Luo, J. & Zong, C. Single-cell total-RNA profiling unveils regulatory hubs of transcription factors. Nat. Commun. 15, 5941 (2024).
Sheng, K., Cao, W., Niu, Y., Deng, Q. & Zong, C. Effective detection of variation in single-cell transcriptomes using MATQ-seq. Nat. Methods 14, 267–270 (2017).
Dinçaslan, F. B., Ngang, S. W. Y., Tan, R. Z. & Cheow, L. F. Automated high-throughput profiling of single-cell total transcriptome with scComplete-seq. Nucleic Acids Res. 53, gkaf699 (2025).
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Tian, W. et al. High-throughput functional microRNAs profiling by recombinant AAV-based microRNA sensor arrays. PLoS ONE 7, e29551 (2012).
Herdy, B. et al. Analysis of NRAS RNA G-quadruplex binding proteins reveals DDX3X as a novel interactor of cellular G-quadruplex containing transcripts. Nucleic Acids Res. 46, 11592–11604 (2018).
Smolin, E. A., Buyan, A. I., Lyabin, D. N., Kulakovskiy, I. V. & Eliseeva, I. A. RNA-seq data of ALKBH5 and FTO double knockout HEK293T human cells. Data Brief 42, 108187 (2022).
Lee, S., Lee, S., Desnick, R., Yasuda, M. & Lai, E. C. Noncanonical role of ALAS1 as a heme-independent inhibitor of small RNA-mediated silencing. Science 386, 1427–1434 (2024).
Sekar, V. et al. Detection of transcriptome-wide microRNA–target interactions in single cells with agoTRIBE. Nat. Biotechnol. 42, 1296–1302 (2024).
Zanovello, P., Bronte, V., Rosato, A., Pizzo, P. & Di Virgilio, F. Responses of mouse lymphocytes to extracellular ATP. II. Extracellular ATP causes cell type-dependent lysis and DNA fragmentation. J. Immunol. 145, 1545–1550 (1990).
Guo, Y.-H. et al. Wnt/β-catenin pathway transactivates microRNA-150 that promotes EMT of colorectal cancer cells by suppressing CREB signaling. Oncotarget 7, 42513–42526 (2016).
Tili, E. et al. Resveratrol decreases the levels of miR-155 by upregulating miR-663, a microRNA targeting JunB and JunD. Carcinogenesis 31, 1561–1566 (2010).
Smigielska-Czepiel, K. et al. Dual role of miR-21 in CD4+ T-cells: activation-induced miR-21 supports survival of memory T-cells and regulates CCR7 expression in naive T-cells. PLoS ONE 8, e76217 (2013).
Kuhn, R. J. et al. Structure of dengue virus: implications for flavivirus organization, maturation, and fusion. Cell 108, 717–725 (2002).
Li, L. et al. The flavivirus precursor membrane-envelope protein complex: structure and maturation. Science 319, 1830–1834 (2008).
Zanini, F., Pu, S.-Y., Bekerman, E., Einav, S. & Quake, S. R. Single-cell transcriptional dynamics of flavivirus infection. eLife 7, e32942 (2018).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Peña, J. & Harris, E. Dengue virus modulates the unfolded protein response in a time-dependent manner. J. Biol. Chem. 286, 14226–14236 (2011).
Funk, A. et al. RNA structures required for production of subgenomic flavivirus RNA. J. Virol. 84, 11407–11417 (2010).
Schuessler, A. et al. West Nile virus noncoding subgenomic RNA contributes to viral evasion of the type I interferon-mediated antiviral response. J. Virol. 86, 5708–5718 (2012).
Delgado, F. G., Cárdenas, P. & Castellanos, J. E. Valproic acid downregulates cytokine expression in human macrophages infected with dengue virus. Diseases 6, 59 (2018).
Braun, E. et al. Comprehensive cell atlas of the first-trimester developing human brain. Science 382, eadf1226 (2023).
Heffel, M. G. et al. Temporally distinct 3D multi-omic dynamics in the developing human brain. Nature 635, 481–489 (2024).
Wang, L. et al. Molecular and cellular dynamics of the developing human neocortex. Nature 647, 169–178 (2025).
Li, Y. et al. Spatiotemporal transcriptome atlas reveals the regional specification of the developing human brain. Cell 186, 5892–5909.e22 (2023).
Bielle, F. et al. Multiple origins of Cajal–Retzius cells at the borders of the developing pallium. Nat. Neurosci. 8, 1002–1012 (2005).
Damilou, A. et al. Developmental Cajal–Retzius cell death contributes to the maturation of layer 1 cortical inhibition and somatosensory processing. Nat. Commun. 15, 6501 (2024).
Nowakowski, T. J. et al. Spatiotemporal gene expression trajectories reveal developmental hierarchies of the human cortex. Science 358, 1318–1323 (2017).
Franzoni, E. et al. miR-128 regulates neuronal migration, outgrowth and intrinsic excitability via the intellectual disability gene Phf6. eLife 4, e04263 (2015).
Bruno, I. G. et al. Identification of a microRNA that activates gene expression by repressing nonsense-mediated RNA decay. Mol. Cell 42, 500–510 (2011).
Zhang, Y. et al. MiR-130a regulates neurite outgrowth and dendritic spine density by targeting MeCP2. Protein Cell 7, 489–500 (2016).
Liu, D. D. et al. Purification and characterization of human neural stem and progenitor cells. Cell 186, 1179–1194.e15 (2023).
Frotscher, M. Dual role of Cajal–Retzius cells and reelin in cortical development. Cell Tissue Res. 290, 315–322 (1997).
Gil-Sanz, C. et al. Cajal–Retzius cells instruct neuronal migration by coincidence signaling between secreted and contact-dependent guidance cues. Neuron 79, 461–477 (2013).
The Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium. Genome-wide association study identifies five new schizophrenia loci. Nat. Genet. 43, 969–976 (2011).
Willemsen, M. H. et al. Chromosome 1p21.3 microdeletions comprising DPYD and MIR137 are associated with intellectual disability. J. Med. Genet. 48, 810–818 (2011).
Mahmoudi, E. & Cairns, M. J. MiR-137: an important player in neural development and neoplastic transformation. Mol. Psychiatry 22, 44–55 (2017).
Nowakowski, T. J. et al. Regulation of cell-type-specific transcriptomes by microRNA networks during human brain development. Nat. Neurosci. 21, 1784–1792 (2018).
Chen, J.-A. et al. Mir-17-3p controls spinal neural progenitor patterning by regulating Olig2/Irx3 cross-repressive loop. Neuron 69, 721–735 (2011).
Kozomara, A., Birgaoanu, M. & Griffiths-Jones, S. miRBase: from microRNA sequences to function. Nucleic Acids Res. 47, D155–D162 (2019).
McGeary, S. E. et al. The biochemical basis of microRNA targeting efficacy. Science 366, eaav1741 (2019).
Liu, M. et al. miR-137 targets Cdc42 expression, induces cell cycle G1 arrest and inhibits invasion in colorectal cancer cells. Int. J. Cancer 128, 1269–1279 (2011).
Skoufos, G. et al. TarBase-v9.0 extends experimentally supported miRNA–gene interactions to cell-types and virally encoded miRNAs. Nucleic Acids Res. 52, D304–D310 (2024).
Bartel, D. P. Metazoan microRNAs. Cell 173, 20–51 (2018).
Adamson, B. et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell 167, 1867–1882.e21 (2016).
Dixit, A. et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866.e17 (2016).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10 (2011).
Mudge, J. M. et al. GENCODE 2025: reference gene annotation for human and mouse. Nucleic Acids Res. 53, D966–D975 (2025).
Chan, P. P. & Lowe, T. M. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic Acids Res. 44, D184–D189 (2016).
Kinney, R. M. et al. Construction of infectious cDNA clones for dengue 2 virus: strain 16681 and its attenuated vaccine derivative, strain PDK-53. Virology 230, 300–308 (1997).
Isakova, A. Scalable single-cell total RNA sequencing unifies coding and non-coding transcriptomics. Source code. GitHub https://github.com/alinaisakova/TotalX (2025).
Wolock, S. L., Lopez, R. & Klein, A. M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 8, 281–291.e9 (2019).
Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://doi.org/10.48550/arXiv.1802.03426 (2020).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008).
Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).
Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191–W198 (2019).
Isakova, A. Scalable single-cell total RNA sequencing unifies coding and non-coding transcriptomics—cell lines raw data. GEO https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE315939 (2025).
Isakova, A. Scalable single-cell total RNA sequencing unifies coding and non-coding transcriptomics—raw data. Synapse https://www.synapse.org/Synapse:syn72382564 (2026).
Isakova, A. Scalable single-cell total RNA sequencing unifies coding and non-coding transcriptomics. Processed data. Figshare https://doi.org/10.6084/m9.figshare.29803628 (2025).
Isakova, A. Scalable single-cell total RNA sequencing unifies coding and non-coding transcriptomics. Source code. Zenodo https://doi.org/10.5281/zenodo.18177678 (2026).
Acknowledgements
This study was supported by the Chan Zuckerberg Biohub and the Knight Initiative for Brain Resilience. A.I. was additionally supported by a Doc.Mobility Fellowship from the Swiss National Science Foundation. S.E. is a Chan Zuckerberg Biohub San Francisco Investigator who is also supported by the National Institute of Allergy and Infectious Diseases grant R01AI158569, Defense Threat Reduction Fundamental Research to Counter Weapons of Mass Destruction grant HDTRA11810039 and Investigator Initiated Awards W81XWH1910235 and W81XWH2210283 as well as Expansion Award W81XWH2110456 from the Department of Defense office of the Congressionally Directed Medical Research Programs/Peer Reviewed Medical Research Program. Finally, we are deeply grateful to the donors and their families for their invaluable contributions to this research. Human brain tissue used in this study was obtained through the coordinated efforts of the brain research neurosurgery teams at Stanford University. We thank these extended teams for their expertise, dedication and care in tissue acquisition.
Author information
Contributions
A.I., S.R.Q. and I.L.W. conceptualized the study. A.I. designed the study. A.I. and I.C. performed HEK293T and PBMC TotalX experiments. S.S. performed Huh7 and Huh7–DENV2 cell culture. A.I., D.D.L., A.E.E. and R.S. performed brain tissue processing. A.I. performed TotalX on the brain samples. A.M.D. and N.N. sequenced TotalX samples. A.I. designed the computational pipeline and performed data analysis. A.I. wrote the paper with direct input from D.D.L., S.S., S.E. and S.R.Q. All authors reviewed and approved the final paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks Dominic Gruen, Kun Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 TotalX experimental workflow and analysis pipeline.
a. Schematic overview of the TotalX workflow. Total RNA is enzymatically polyadenylated and reverse transcribed using a custom template-switching oligo (TSO). The TSO is then selectively removed, followed by Cas9-based rRNA depletion (DASH) at the cDNA level. Resulting cDNA fragments are size-separated, with short (<400 bp) and long (>400 bp) fractions indexed independently and pooled for sequencing. An optional gel-purified miRNA fraction (~18–50 bp) enables targeted enrichment of mature small RNAs. b. Structure of the short-fragment indexed library. The short-fragment indexed libraries can optionally undergo Pippin gel size selection to obtain miRNA-enriched libraries. c. Workflow for combining short RNA, miRNA-enriched and standard cDNA fractions. The resulting composite library captures a broad range of transcript sizes and biotypes. d. Cell Ranger-based data analysis pipeline of TotalX. e. Examples of gene coverage profiles for representative miRNAs located within a lncRNA locus (top), an intron of a protein-coding gene (middle) and an exon of a protein-coding gene (bottom). The BAM file used for the coverage plots was generated with the modified Cell Ranger pipeline. f. Number of cells per dataset for 10x Genomics, VASA-seq, and TotalX libraries. g. Relationship between UMIs per cell and number of genes detected per cell across datasets and library configurations. h. Per-cell UMI comparison between TotalX and VASA-drop across RNA biotypes. i. Relationship between per-gene mean expression and CV² for 10×3′, VASA-drop, and TotalX. All methods show a Poisson-like regime at low mean (slope ≈ −1) and CV² saturation at higher mean. j. Negative-binomial overdispersion (ϕ) across RNA biotypes for 10×3′, VASA-drop and TotalX. Non-coding RNAs show higher ϕ than protein-coding genes in both methods, but TotalX and VASA-drop display similar overdispersion distributions, indicating comparable technical noise.
Extended Data Fig. 2 Performance benchmarking of TotalX libraries by gene complexity, biotype diversity, and throughput.
a. Schematic overview of library preparation workflows, including 10x-CustomTSO (10x-dUTSO), TotalX, and TotalX with or without DASH treatment. Three representative configurations are illustrated: TotalX with miRNA enrichment and two rounds of DASH (miRNA(+)/DASH(++)), TotalX with miRNA enrichment but no DASH (miRNA(+)/DASH(–)), and standard TotalX without additional modifications. b. Summary of sequencing library construction and pooling strategies, for each tested library configuration. c. Percentage of reads mapped to the genome and confidently mapped to the transcriptome across different library types. d. Fraction of uniquely mapped annotated reads per cell, stratified by RNA biotype. e. Contribution of non-coding RNA biotypes (for example, snRNA, tRNA, snoRNA, miRNA, miscRNA, lncRNA) to uniquely mapped reads per cell across methods. f. miRNA UMI fraction across TotalX protocol variants and VASA-seq. Mean and median values are shown as a dot and a line respectively. g. Correlation of pseudo-bulk profiles from TotalX, TotalX miRNA(+), and VASA-drop with two bulk total RNA-seq datasets from HEK293T cells (GSE198050, GSE106476), shown separately as scatter plots and summary statistics. h. Correlation of pseudo-bulk miRNA profiles from TotalX, TotalX miRNA(+), and VASA-drop with two bulk miRNA-seq datasets (GSE261577, PRJNA994505).
Extended Data Fig. 3 Comparative profiling of PBMC subtypes across protocols and sensitivity of monocytes to enzymatic polyadenylation.
a. Dot plot showing expression of canonical marker genes across annotated immune cell types, including major T cell subsets, monocytes, NK cells, dendritic cells, basophils, plasmablasts, and platelets. Marker genes were used for manual annotation of cell clusters in TotalX, 10x-dUTSO, and TotalX+miRNA(+) protocols. b. UMAP of PBMCs colored by protocol, demonstrating that TotalX and TotalX + retain consistent clustering and recover all expected immune populations observed in the 10x-dUTSO control. c. Left: Bubble plot of relative frequencies of immune cell types across protocols. Right: Absolute numbers of cells recovered by TotalX, TotalX pA + + and 10x-dUTSO. TotalX recovers comparable or greater numbers of low-RNA-content populations (for example, platelets, basophils, proerythroblasts). d. Violin plots showing the number of genes (left) and UMIs (right) detected per cell type. Low-content cell types such as platelets and basophils show improved gene detection under TotalX, while other major cell types are comparable across protocols. e. UMAP focused on monocyte populations reveals a distinct subcluster of CD14+ monocytes in TotalX + showing transcriptional divergence, indicative of sensitivity to protocol conditions. f. Dot plot showing selective upregulation of JARID2, GAB2, and APOBEC3A, along with downregulation of canonical monocyte markers (for example, LYZ, CD14), in CD14+ monocytes exposed to increasing concentrations of ATP and polyA polymerase. These cells also show enrichment for apoptosis and chromatin remodeling genes. g. Violin plots showing global decrease in gene detection and increased mitochondrial RNA content in the sensitive monocyte subpopulation, consistent with stress or partial identity loss under certain enzymatic conditions.
Extended Data Fig. 4 Detection of coding and non-coding genes in PBMCs.
a. Pseudobulk comparison of protein-coding gene expression between TotalX and 10x-dUTSO across PBMC cell types. Scatterplots show pseudobulk expression levels of protein-coding genes for each recovered PBMC cell type. Genes are colored by highly variable status, and transcripts with the largest differences between methods are labeled. b. Number of genes detected per RNA biotype (minimum expression in ≥3 cells) in the TotalX PBMC dataset. Biotypes include protein-coding genes, lncRNAs, tRNAs, snoRNAs, snRNAs, histone RNAs, miRNAs, and miscellaneous RNAs, reflecting TotalX’s capacity to capture broad transcript classes. c. Cell-type specific differentially expressed genes (DEGs) stratified by RNA biotype. Bars indicate the number of DEGs (FDR < 0.01) for each cell type and RNA class. While protein-coding genes remain dominant, multiple non-coding biotypes exhibit strong cell-type–specific expression – for example, lncRNAs in monocytes, miRNAs in lymphocytes, and snRNAs in plasmacytoid dendritic cells (pDCs). d. Relationship between amino acid demand (based on codon usage of expressed genes) and tRNA supply (measured by TotalX) across immune cell types. The strong correlation (Pearson r = 0.79, p = 1.59e–66) confirms internal consistency and biological relevance of captured tRNA profiles. Selected amino acids such as Arginine and Glycine show relative oversupply, whereas Tryptophan and Phenylalanine are undersupplied.
Extended Data Fig. 5 Cell-type specificity and co-expression patterns of non-coding RNAs in PBMCs.
a. Fraction of non-coding genes in each WGCNA module. Several modules (for example, 1, 2, 6, 7) include substantial non-coding content, suggesting that cell identity and function are encoded in mixed coding/non-coding programs. Total gene counts per module are indicated above each bar. b. Eigengene expression of co-expression modules across PBMC cell types, as identified by WGCNA. Blue modules are enriched in protein-coding content, whereas red modules include higher fractions of non-coding RNAs. Module 24 is enriched in CD14+ monocytes and T cells, while Module 7 is platelet-specific. c. Sankey diagram summarizing the filtering steps used to identify expressed miRNAs, their expressed target genes, and all predicted or validated miRNA-mRNA interactions in PBMCs. Interactions are further classified by correlation sign (anti-correlated, positively correlated, or uncorrelated) based on Pearson correlation across cells (|r| > 0.2, p < 0.05). d. For a subset of miRNAs, miRNA expression and the expression of their top anti-correlated predicted or validated target genes are plotted across cells after binning the dataset into groups of 20 cells. Each panel reports the number of predicted and validated targets.
Extended Data Fig. 6 Sensitivity of TotalX for viral transcript detection and host response stratification in DENV2-infected cells.
a. Quality control metrics comparing uninfected (Huh7) and DENV2-infected (Huh7-DENV2) samples. Violin plots show total UMI counts, number of detected genes, and mitochondrial transcript percentage per cell. b. Histogram showing the distribution of viral load per cell, defined as dengue virus–derived UMIs per 10,000 total UMIs. c. Scatter plots showing normalized expression of individual viral transcripts (POLY, sfRNA1–4) versus viral load per cell. Expression of both structural (POLY) and non-coding (sfRNAs) viral RNAs scales with total viral burden. Cells are colored by transcriptional state: quiescent response (blue) and active response (red). d. Volcano plot comparing gene expression between active and silent infection states. Several transcription factors (for example, ATF3, DDIT3) and immune response genes are upregulated in active response, while chromatin remodeling factors and proliferation-associated genes are enriched in the silent state. Gene names are colored as follows: coding in gray, non-coding in green and viral in blue.
Extended Data Fig. 7 Viral load–based stratification of host response in DENV2-infected cells.
a. Ridge plot showing the distribution of viral RNA levels across transcriptional clusters. Cells are grouped into two major infection states—quiescent infection (blue) and active infection (red)—based on host transcriptomic profiles. Although overall viral load can be similar between states, active cells display a robust host response, while quiescent cells remain largely unresponsive. b. Viral load across infection groups, colored by cell cycle phase (G1, S, G2/M). c. UMAP embeddings of infected cells colored by cell cycle phase (top) and infection group (bottom): quiescent_lowVL, quiescent_highVL, active_lowVL, active_highVL. d. Gene expression across four states. Cells were stratified into quiescent/active × low/high viral load groups. Dot plot shows expression patterns of differentially expressed genes. e. GO enrichment analysis for genes differentially expressed between high and low viral load cells within each infection state. Top: In quiescent response, high-load cells are enriched for chromatin remodeling, RNA splicing, and nucleosome-related processes. Bottom: In active response, high-load cells show enrichment for unfolded protein response (UPR), ER stress, and protein targeting to membranes. f. Violin plots showing module scores for key biological processes across the four infection groups. Enrichment significance was assessed using a one-sided over-representation test (hypergeometric test), with P values adjusted for multiple comparisons using the Benjamini–Hochberg FDR. Significance is indicated as ***FDR < 0.001, **FDR < 0.01, and *FDR < 0.05.
Extended Data Fig. 8 Regional coverage, cell-type diversity, and age-dependent non-coding RNA expression in the developing human brain.
a. UMAP embedding of 301,515 single cells profiled using TotalX, colored by brain region of origin. The dataset includes cortical and subcortical structures, non-parenchymal compartments (for example, meninges, choroid plexus), and ocular-associated tissues (for example, optic region, eyelid), reflecting broad anatomical sampling across developmental stages. b. Number of cells recovered from each brain region across four developmental timepoints: GW19, 9MO, 8YO, and 16YO. Bars are color-coded by region and capture both prenatal and postnatal contributions to the dataset. c. Dot plot showing marker gene expression across annotated cell types. Cell types span neuronal, glial, immune, and non-neural lineages and are hierarchically clustered by expression patterns. d. Cell-type–resolved gene detection across development. Left: Dot plot showing the number of cells per annotated cell type at each age. Center: Violin plots depicting the number of detected genes per cell. Right: Bar plots showing the number of differentially expressed non-coding RNAs per cell type and RNA biotype (Wilcoxon test, log2 fold change > 1, FDR < 0.01). Histone RNAs are enriched in proliferating progenitors, while other biotypes vary by lineage and developmental stage. e. Selected cell-type–specific non-coding RNA markers. Dot plots display expression of non-coding RNAs across brain cell types, grouped by broad cell classes. Same as in Fig. 4d.
Extended Data Fig. 9 Regional and molecular diversity of glial and neuronal subtypes in the developing human brain.
a. UMAP embedding of excitatory and inhibitory neuron subtypes, colored by cell type. Major cell types include Cajal-Retzius cells, postnatal neurons, deep and upper-layer excitatory neurons (EN), interneurons from MGE, CGE, and LGE origins (for example, IN-MGE-SST, IN-CGE-VIP, IN-LGE-FOXP1), and radial glia (RG), reflecting broad neuronal lineage heterogeneity. b. Bar plot showing regional contributions to each neuronal subtype. Bars represent the fraction of cells per region for each annotated neuronal cell type. c. UMAP embedding of glial subtypes, including astrocytes, oligodendrocytes, OPCs, OPC-proliferating cells, and RG-like progenitors. d. Bar plot showing regional contributions to each glial cell type. As in (b), bars indicate the fraction of cells derived from each brain region. e. Diffusion map of the glial lineage, showing trajectories from RG to mature OPCs, astrocytes, and oligodendrocytes. Top: Expression gradients of canonical markers (for example, AQP4, PDGFRA) across the lineage. Bottom: Pseudotime-colored embedding with annotations of cell type transitions. f. Heatmaps showing expression of selected coding and non-coding RNA markers across astrocytes, OPCs, and oligodendrocytes that change along the pseudotime trajectory. Rows represent genes; columns represent cells ordered by pseudotime (100 bins). Color bars denote pseudotime, consistent with panel (e).
Extended Data Fig. 10 Cell-type–specific expression and predicted target relationships for non-coding RNAs in developing neurons.
a. Dot plot showing the cell-type–specific expression of selected miRNAs across neuronal cell types. b. Dot plot showing the expression of selected mRNA marker genes across neuronal subtypes, using the same format as in (a). c-e. Genomic loci, correlation distributions, and pathway enrichments for representative miRNAs with strong cell-type–specific expression and anticorrelated targets. Same as in Fig. 5. c. MIR125B2. Left: Genomic coordinates and coverage profile. Center: Distribution of Spearman correlation coefficients between MIR125B2 expression and predicted or validated target gene expression across cells. Right: Reactome pathway enrichment analysis of anticorrelated validated targets (odds ratio is represented by dot size). d. MIR1262; panels follow the same format as in c. e. MIR17; panels follow the same format as in c.
Supplementary information
Supplementary Information
Supplementary Note.
Supplementary Table 1
Custom oligonucleotides used in this study.
Supplementary Table 2
Differentially expressed genes (DEGs) identified in the PBMC dataset (adjusted P < 0.05).
Supplementary Table 3
Gene modules identified in the PBMC dataset, comprising both coding and noncoding genes.
Supplementary Table 4
Differentially expressed genes in Huh7–DENV2: quiescent_highVL versus quiescent_lowVL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Isakova, A., Liu, D.D., Cvijović, I. et al. Scalable single-cell total RNA sequencing unifies coding and noncoding transcriptomics. Nat Biotechnol (2026). https://doi.org/10.1038/s41587-026-03068-6
DOI: https://doi.org/10.1038/s41587-026-03068-6







