Main

Non-polyadenylated RNAs comprise a substantial and functionally diverse portion of the transcriptome. These include microRNAs (miRNAs), long noncoding RNAs (lncRNAs), small nucleolar and small nuclear RNAs (snoRNAs, snRNAs), circular RNAs (circRNAs), transfer RNAs (tRNAs), histone mRNAs1,2 and many viral transcripts that naturally lack polyA tails3,4. Together, they have essential roles in transcriptional regulation, RNA processing, translation, stress response and cell fate transitions5,6,7,8. Yet, these transcripts remain largely invisible to conventional single-cell RNA sequencing (scRNA-seq) platforms, which rely on polyA capture and are consequently biased toward protein-coding mRNAs9.

While specialized protocols such as Smart-seq-total10, VASA-seq11, RamDA-seq12, snapTotal-seq13, MATQ-seq14 and scComplete-seq15 have expanded single-cell profiling to include some species of non-polyadenylated RNAs, these methods often require custom equipment, custom enzymes, extensive sample processing or bespoke computational pipelines—factors that limit their scalability and integration into high-throughput and widely adopted workflows. Moreover, most of these approaches either underperform or entirely fail to capture mature miRNAs, which have critical roles in development, immunity and disease. Therefore, none of these methods have been adopted to generate data at the atlas scale, leaving them largely confined to proof-of-principle or small-scale studies. As a result, large portions of RNA biology are systematically excluded from most single-cell studies, leaving blind spots in efforts to map regulatory networks, cell states and disease mechanisms that involve noncoding transcripts.

Here we present a generalizable and scalable framework for total RNA profiling in single cells that is fully compatible with commercial high-throughput microfluidic single-cell analysis systems16. Our approach captures both short and long polyadenylated and non-polyadenylated RNAs using a minimal set of biochemical and computational modifications, enabling broad adoption without sacrificing throughput, accessibility or interoperability with existing pipelines and datasets.

We applied this framework to over 500,000 single cells spanning diverse biological systems. In peripheral blood mononuclear cells (PBMCs), we detected robust cell-type-specific expression of noncoding RNAs, including miRNAs, tRNAs and lncRNAs, and revealed the regulatory architecture of coding and noncoding co-expression modules. In cells infected with a non-polyadenylated RNA virus, we captured both host and viral transcripts, revealing antiviral programs otherwise missed by polyA-based protocols. In the developing human brain, we uncovered hundreds of dynamically regulated noncoding RNAs that trace developmental transitions and lineage specification.

Altogether, our work establishes a robust, accessible and scalable strategy for total RNA profiling at single-cell resolution. The comprehensive datasets presented in this study reveal regulatory architectures otherwise undetected by conventional approaches.

Results

TotalX streamlines high-throughput detection of noncoding RNA in single cells

To enable the robust detection of noncoding RNA species in a scalable single-cell framework, we adapted the principles of Smart-seq-total to a droplet-based 3′ chemistry, resulting in a method we term TotalX (Fig. 1a). TotalX uses a custom template-switching oligo (dUTSO) and uracil-DNA glycosylase (UDG) digestion after reverse transcription, as established in Smart-seq-total10. It further integrates Cas9-mediated ribosomal RNA (rRNA) depletion at the pre-amplified cDNA stage (depletion of abundant sequences by hybridization (DASH)) (Extended Data Fig. 1a–c), while preserving full compatibility with standard microfluidics hardware and software (Extended Data Fig. 1d,e and Supplementary Note). In addition, we use custom oligos to index small fragments through direct amplification (Supplementary Table 1 and Methods).

Fig. 1: TotalX enables scalable detection of coding and noncoding RNAs, including miRNAs, in single cells.
Fig. 1: TotalX enables scalable detection of coding and noncoding RNAs, including miRNAs, in single cells.
Full size image

a, Schematic overview of the TotalX protocol. Total RNA is polyadenylated and reverse transcribed using a custom template-switching oligo (dUTSO). After reverse transcription, the TSO is digested with UDG and rRNA is depleted at the pre-amplified (Pre-amp) cDNA level using Cas9-based DASH. Long (>400 bp) and short (<400 bp) fragments are indexed separately, with optional inclusion of a gel-purified miRNA fraction (~18–50 bp) and pooled for sequencing. b, Gene detection efficiency across technologies. Comparison of the average number of genes per cell as a function of UMIs for TotalX (green), VASA-seq (orange) and 10x Genomics 3′ chemistry (blue), across binned depths. c,Unique genes detected per cell after UMI downsampling. Gene detection following normalization to 20,000 UMIs per cell. TotalX yields high gene complexity similar to VASA-seq and higher than the standard 10x Genomics Chromium 3′ workflow. The central line indicates the median, the box denotes the interquartile range (25th–75th percentiles) and the whiskers extend to the minimum and maximum values. The width of each violin represents the kernel density of the data. Sample sizes are indicated above each plot. d, Total number of unique genes detected per RNA biotype. Radial plots show numbers of unique genes detected in a representative experiment for each method, broken down by RNA biotype: protein-coding RNA, lncRNA, miscellaneous RNA (miscRNA), miRNA, snoRNA, snRNA, tRNA and histone RNA. Ratios represent the proportion of detected genes relative to the total number of annotated genes within each biotype. Only genes detected in 10 or more cells were counted. e, Improved detection of miRNAs using mixed library input. The scatterplot shows the average counts per million (CPM) per cell of TotalX alone (x-axis) versus TotalX with an added miRNA fraction (TotalX miRNA(+)) (y-axis) in HEK293T cells. Known HEK293T-specific miRNAs (red) reach expression levels comparable to those of low- and moderate-protein-coding genes (blue). Inset: proportion of reads mapping to the genome, indicating a trade-off with miRNA inclusion. f, Coverage profiles for selected miRNAs. Read depth plots for MIR17, MIR222 and MIR221 showing mature miRNA arms (gray regions). Schematic in a created in BioRender; Isakova, A. https://biorender.com/vsxwvmu (2026).

We benchmarked TotalX against VASA-seq (VASA-drop version), a high-performing noncoding RNA method that requires specialized custom equipment11, and the standard 10x Genomics 3′ platform. TotalX achieves a comparable number of genes per unique molecular identifier (UMI) to VASA-seq, with a similar slope of gene recovery as a function of sequencing depth (Fig. 1b). After normalizing to 20,000 UMIs per cell, TotalX detects a comparable number of genes per cell to VASA-seq, and reaches the scale typically achievable on the commercial 10x Genomics platform by profiling over 11,000 cells in a single run (Fig. 1c and Extended Data Fig. 1f–h). At this experiment scale, TotalX captured a broad diversity of noncoding RNAs, including lncRNAs, miRNAs, snoRNAs, snRNAs, tRNAs and histone RNAs, showing its ability to profile diverse RNA biotypes in one experiment (Fig. 1d). We also assessed technical noise across RNA biotypes and found that TotalX shows noise characteristics comparable to those of VASA-seq, with noncoding RNAs showing the expected higher variability owing to their lower abundance (Extended Data Fig. 1i,j).

To deepen detection of short RNA fragments, we performed size selection for miRNA-enriched cDNA (~18–50 bp) and mixed this with longer fragments during library preparation (Extended Data Fig. 1b,c). All RNAs are tagged using UMI incorporation, and thus, the final libraries, even enriched, are demultiplexed to count unique RNA molecules. This optional ‘miRNA(+)’ strategy increased detection of endogenous HEK293T-enriched miRNAs17 (Fig. 1e,f). The miRNAs were detected at expression levels comparable to those of low-to-moderately expressed protein-coding genes, indicating strong capture sensitivity. We observed a trade-off; however, the inclusion of size-selected short RNA fragments led to an ~30% decrease of reads confidently mapped to the genome and transcriptome (Fig. 1e), as shorter reads are more likely to map ambiguously or represent adapters with no insert (Extended Data Fig. 2a–c). Nevertheless, TotalX miRNA(+) still retained strong detection of both long and short noncoding RNA, outperforming the TotalX baseline version in biotype diversity (Extended Data Fig. 2d,e). Because DASH represents an additional rRNA-directed cleanup step, we compared TotalX implementations without as well as with two rounds of DASH in miRNA(+) configurations (Extended Data Fig. 2a–e). These analyses showed that DASH had only a modest effect on the miRNA(+) version of the protocol, producing a slightly broader distribution of per-cell miRNA fractions without altering the mean (Extended Data Fig. 2f).

To benchmark against bulk references, we compared pseudo-bulk profiles from TotalX, TotalX miRNA(+), and VASA-seq to bulk total RNA-seq18,19 and bulk miRNA-seq datasets20,21 from HEK293T cells (Extended Data Fig. 2g,h). TotalX showed strong concordance with bulk for protein-coding and lncRNA transcripts, comparable to VASA-seq. For miRNAs, the miRNA(+) modification improved agreement with bulk miRNA-seq, supporting the utility of this enrichment step.

Read coverage profiles for individual miRNA genes, such as MIR17, MIR221 and MIR222, revealed precise detection of mature miRNA arms (5p or 3p) by TotalX (Fig. 1f). These profiles confirm that TotalX not only captures miRNA presence, but also preserves their biologically relevant processing patterns, which could be informative for downstream functional analysis and interpretation.

Cell-type-resolved profiling of noncoding RNAs in human PBMCs

We next applied TotalX to human PBMCs to evaluate its ability to profile coding and noncoding RNAs across diverse immune cell types (Fig. 2a). Using protein-coding genes for initial clustering and annotation (Extended Data Fig. 3a and Methods), we identified major expected populations including T and B lymphocytes, monocytes, natural killer (NK) cells, dendritic cells, plasmablasts and proerythroblasts.

Fig. 2: TotalX enables cell-type-specific profiling of noncoding RNAs in human PBMCs.
Fig. 2: TotalX enables cell-type-specific profiling of noncoding RNAs in human PBMCs.
Full size image

a, Schematic comparison of protocols applied to PBMCs. TotalX incorporates enzymatic polyadenylation followed by reverse transcription using a custom TSO (10x-dUTSO). A non-polyadenylated 10x-dUTSO protocol is used for comparison. b, UMAP projection of TotalX-profiled PBMCs. Cell types were identified using protein-coding gene expression and canonical markers. Mono, monocytes. Two monocyte states are shown: regular and a transitional state (^t), each outlined by dashed-line contours. c, Cell type frequencies across protocols. The bar plot and bubble chart compare the number and relative proportion of recovered cell types between TotalX and 10x-dUTSO datasets. The bubble size reflects the fraction of each cell type within each dataset. The bar plot shows the number of cells detected in both datasets. d, Cell-type-specific expression of noncoding RNAs. Dot plots show representative lncRNA, histone RNA, miRNA, scaRNA and snoRNA markers across annotated immune cell types. e, Relationship between tRNA supply and amino acid demand. The scatterplot shows global correlation across cell types (Pearson r = 0.79, P = 1.59 × 10−66), assessed using a two-sided Pearson correlation test. Individual amino acids are highlighted; arginine and glycine show more supply while tryptophan and phenylalanine show undersupply compared with other amino acids. f, T cell-enriched gene co-expression module (module 24). Top left: GO enrichment terms for protein-coding genes in the module, assessed using a one-sided hypergeometric test with P values adjusted for multiple comparisons using the Benjamini–Hochberg FDR < 0.05. Bottom left: biotype composition of the module; bottom right: module expression across cell types. The module includes MIR150, SNORA26 and CD28. g, Platelet-specific gene co-expression module (module 7). Top: GO enrichment reveals association with platelet activation and megakaryocyte development, assessed using a one-sided hypergeometric test with Benjamini–Hochberg FDR correction (FDR < 0.05). The module is composed largely of lncRNAs (for example, SMANTIS, SMILR, LINC01750) co-expressed with GP9 and ITGA2B. Schematic in a created in BioRender; Isakova, A. https://biorender.com/fljv382 (2026).

Compared with a version of the same protocol lacking the polyadenylation step (10x-dUTSO), TotalX recovered a similar diversity of immune cell types (Fig. 2b) and allowed for more consistent detection of low-RNA-content cell types, including platelets and basophils, although we do not claim comprehensive detection of rare populations (Fig. 2c). We observed that monocytes were sensitive to protocol conditions, especially the presence of ATP used during the enzymatic polyadenylation step. In agreement with previous reports22, the presence of ATP led to a transcriptional transition state in monocytes characterized by downregulation of LYZ and upregulation of JARID2 and GAB2, accompanied by a global reduction in gene expression and a partial loss of canonical identity (Extended Data Fig. 3b–g). Beyond this monocyte-specific effect, pseudobulk comparisons of protein-coding gene expression showed high concordance between TotalX and 10x-dUTSO across other PBMC cell types, with only a small set of genes enriched for known rapidly turned-over transcripts (for example, DUSP1, JUNB, JUND, ZFP36) showing higher abundance in TotalX (Extended Data Fig. 4a).

TotalX enabled detection of noncoding RNAs that are consistently expressed in defined immune cell populations, supporting their role in maintaining cell identity. Wilcoxon rank-sum testing identified hundreds of differentially expressed noncoding RNAs (log2(fold change (FC)) > 1, adjusted P value < 0.05) across cell types (Extended Data Fig. 4b,c and Supplementary Table 2). For example, MIR650 was enriched in plasmablasts, MIR147B in conventional dendritic cells, type 2 (cDC2s) and MIR150 in T cells, proerythroblasts and naive B cells. Several lncRNAs also showed cell-type enrichment, including LINC00299 in NK and natural killer T (NKT) cells, and PELATON in monocytes. We additionally detected structured noncoding RNAs such as SCARNA6 (plasmablasts) and SNORD13 (plasmacytoid dendritic cells (pDCs)) (Fig. 2d), suggesting broad recovery of functionally relevant small RNAs.

Because TotalX captures tRNAs, we next asked whether there is alignment between tRNA availability and amino acid demand based on codon usage (Methods). Across cell types, we observed a strong correlation (Pearson r = 0.79, P = 1.59 × 10−66). However, individual amino acids such as arginine and glycine showed relatively higher tRNA supply compared with others, while tryptophan and phenylalanine were more limited (Fig. 2e and Extended Data Fig. 4d). These modest deviations indicate that while codon usage strongly predicts tRNA availability, there may be additional influences shaping tRNA pools.

To investigate how noncoding RNAs are co-regulated with protein-coding genes, we performed weighted gene co-expression network analysis (Methods) across all TotalX-profiled PBMCs. We identified over 30 gene modules comprising both coding and noncoding transcripts (Extended Data Fig. 5a,b and Supplementary Table 3). For example, module 24, enriched in T cells and CD14+ monocytes, included MIR150, SNORA26 and CD28 and was associated with gene ontology (GO) terms related to lymphocyte activation (Fig. 2f). Module 7, a platelet-specific module, included lncRNAs such as SMANTIS, SMILR and LINC01750, co-expressed with canonical platelet transcripts GP9 and ITGA2B, and enriched for terms related to platelet activation and megakaryocyte development (Fig. 2g).

To further assess whether TotalX captures expected miRNA–mRNA regulatory relationships, we analyzed correlations between detected miRNAs and their validated or predicted targets in PBMCs (42 miRNAs with expressed targets; Extended Data Fig. 5c and Methods). We observed the prevalence of anticorrelated miRNA–mRNA pairs compared with positively correlated, including many previously reported interactions, such as MIR150/EP300 (ref. 23), MIR663A/JUNB and JUND (ref. 24), MIR21/CCR7 (ref. 25) and others (Extended Data Fig. 5d) consistent with canonical miRNA-mediated repression. This analysis shows that TotalX captures biologically meaningful miRNA–target relationships.

These results highlight that noncoding RNAs are not only detectable but also systematically co-expressed with functional coding genes in a cell-type-specific manner, suggesting their potential roles in immune cell identity and function.

Co-detection of non-polyadenylated viral transcripts and host transcriptome in DENV2-infected cells

We next evaluated whether TotalX enables simultaneous detection of non-polyadenylated viral RNAs and host transcripts within individual cells. As a test case, we profiled human Huh7 cells infected with dengue virus serotype 2 (DENV2)—a non-polyadenylated flavivirus whose RNA genome encodes a single polyprotein composed of both structural and nonstructural proteins (C-prM/M-E-NS1-NS5) and a family of structured, noncoding RNAs (sfRNA1–4) that originate from the 3′ untranslated region (UTR)26,27 (Fig. 3a and Extended Data Fig. 6a).

Fig. 3: Distinct infection states and host responses in DENV2-infected cells.
Fig. 3: Distinct infection states and host responses in DENV2-infected cells.
Full size image

a, Overview of the DENV2 genome structure and infection design. The positive-strand RNA genome encodes a polyprotein comprising structural (C-prM/M-E) and nonstructural proteins (NS1–NS5), flanked by 5′ and 3′ UTRs. The 3′ UTR gives rise to four highly structured, noncoding subgenomic RNAs (sfRNA1–4). Huh7 cells were infected with DENV2 and profiled using TotalX. b, Host gene expression correlation with viral load. Heatmap of coding and noncoding host genes (rows) sorted by increasing viral load across infected cells (columns). POLY and sfRNA levels contribute to a composite viral load score. c, UMAP projection of DENV2-infected and uninfected cells. Clustering is based on host protein-coding gene expression. d, Abundance of sfRNA1 and POLY in infected cells. Feature plots confirm detection of viral noncoding and coding transcripts at the single-cell level. e, Infection states resolved by unsupervised clustering of single-cell host transcriptomes. Infected cells separate into two transcriptionally distinct clusters: active and quiescent host response. f, TF enrichment analysis. Gene set enrichment analysis shows activation of ATF4 and DDIT3, among others, in infected cells with active host response. g, Examples of gene expression versus viral load. POLY and sfRNA1 increase monotonically with load, while host stress response and interferon genes diverge between active and quiescent states. Normalized expression values are scaled between 0 and 1 for each gene. Dots represent single cells; lines indicate LOWESS-smoothed trends for each cluster. Colors denote distinct clusters as indicated in the legend. h, Viral load distribution across states. Quiescent and active infection groups both contain cells with low and high viral load, indicating that viral load alone does not determine host response. i, Enrichment of chromatin and RNA regulation pathways. Reactome GO enrichment showing histone deacetylation and acetylation signatures in the quiescent response, high viral load (quiescent_highVL) state. Enrichment significance was assessed using a one-sided over-representation test (hypergeometric test), with P values adjusted for multiple comparisons using the Benjamini–Hochberg FDR. Significance is indicated with asterisks: ***, FDR < 0.001; NS, non-significant. j, Module scores for infection-associated programs. Violin plots show activation of UPR, ER stress, granule formation and MAPK signaling across groups. UPR and ER stress increase in quiescent cells with high viral load (quiescent_highVL), but MAPK signaling remains suppressed. Schematic in a created in BioRender; Isakova, A. https://biorender.com/hbpc3kw (2026).

We applied TotalX to infected and mock-treated Huh7 cells and successfully recovered viral transcripts including both protein-coding regions (POLY) and structured noncoding RNAs (sfRNAs) in the infected cells. The number of viral UMIs per 10,000 host UMIs exceeded previous reports28, highlighting the sensitivity of TotalX to short, structured RNAs (Extended Data Fig. 6). To quantify infection per cell, we defined a viral load score based on summed abundance of POLY and sfRNA14. This score correlated with broad host transcriptomic changes across both protein-coding and noncoding genes (Fig. 3b).

Infected cells partitioned into two transcriptional states—‘active response’ and ‘quiescent response’—based on host gene expression (Fig. 3c–e). Despite similar viral levels, cells in the quiescent response state failed to mount the canonical antiviral response and transcriptionally looked similar to uninfected cells (Fig. 3c–e). This heterogeneity is consistent with the infection conditions used (multiplicity of infection (MOI) = 1, 48 h postinfection; Methods), which are expected to yield a mixture of uninfected cells, secondary-infected cells and more advanced primary-infected cells within the population. Transcription factor (TF) enrichment analysis using GSEA29 showed that active response was associated with ATF4 and DDIT3 activation—regulators of the unfolded protein response (UPR) and endoplasmic reticulum (ER) stress (Fig. 3f)—consistent with previous studies30. By contrast, quiescent cells did not show upregulation of these targets.

Notably, we found a subset of ‘quiescent highVL’ cells—those with high viral load but suppressed host activation. While these cells contained abundant levels of both POLY and sfRNA transcripts (Fig. 3g,h), they maintained baseline (uninfected) levels of EIF1 and S100P and showed higher levels of AFP, APOB and MAVS compared with the ‘active response’ group (Fig. 3g). In addition, the ‘quiescent highVL’ cells showed muted inflammatory and stress-response signatures, including downregulation of mitogen-activated protein kinase (MAPK) signaling (Fig. 3i). This observation prompted us to explore possible mechanisms underlying the suppression of host responses in these cells. Although previous studies have shown that sfRNA can inhibit host antiviral defenses31,32, the precise factors at play in our system remain unclear. To investigate further, we performed differential expression analysis across high (highVL) and low viral load (lowVL) cells within both active and quiescent cell response states (Extended Data Fig. 7 and Supplementary Table 4). In the ‘quiescent highVL’ group, we observed strong enrichment of chromatin regulatory signatures, including histone deacetylase (HDAC) and histone acetyltransferase (HAT) pathways (Fig. 3j), implicating epigenetic remodeling in the suppression of host responses. Previous studies have shown that HDAC inhibition during DENV2 infection can modulate inflammatory cytokine production33; here we provide a transcriptome-wide, single-cell-level quantification of this effect, highlighting a potential mechanism of immune evasion through chromatin-based silencing.

Region- and cell-type-specific expression of noncoding RNAs in the developing human brain

To chart the cellular and spatial distribution of noncoding RNAs in the human brain, we applied TotalX to neurosurgical and fetal tissue collected across four developmental stages: gestational week 19 (GW19), 9 months (9MO), 8 years (8YO) and 16 years (16YO) (Fig. 4a). The dataset includes 22 biological samples from 4 individuals, covering a broad array of brain regions, including the prefrontal (PFC) and primary visual (V1) cortices, thalamus, hippocampus, cerebellum, insula, meninges, choroid plexus, ventricular zones and ocular tissue (Fig. 4b and Extended Data Fig. 8a,b).

Fig. 4: TotalX reveals cell-type-specific expression of noncoding RNAs across diverse regions of the developing human brain.
Fig. 4: TotalX reveals cell-type-specific expression of noncoding RNAs across diverse regions of the developing human brain.
Full size image

a, Schematic of sampled brain regions. Dissected regions include the cortex (frontal, temporal and occipital), subventricular zone (SVZ), hippocampus, thalamus, cerebellar cortex, choroid plexus, insula, meninges, middle temporal gyrus and eye tissue. Developmental time points span gestational week 19, 9 months postnatal, 8 years and 16 years. CUSA refers to the sample obtained from cavitron ultrasonic surgical aspirator. b, Cell class composition across age and region. The stacked bar plot shows the distribution of major cell classes (for example, neuronal, glial, epithelial, immune) across sampled regions at each developmental stage. c, UMAP projection of 301,515 single cells profiled using TotalX. Cells are colored by cell type (main plot), cell class (top right) and age (top left). Arcs connect the most cell-type-specific noncoding RNA markers (top 400 per cell type, selected based on log2(FC) > 1 and adjusted P value < 0.01, determined using a two-sided Wilcoxon rank-sum test with Benjamini–Hochberg FDR correction)—including miRNAs, lncRNAs, snRNAs and histone RNAs—to their genomic coordinates. Arcs are colored by gene biotype, illustrating relationships among RNA biotypes, chromosomal loci and cell classes. d, Selected cell-type-specific noncoding RNA markers. Dot plots show the expression of noncoding RNAs across brain cell types, grouped by broad cell classes. chplx, choroid plexus; CGE, MGE and LGE, caudal, medial and lateral ganglionic eminence, respectively; IPC, intermediate progenitor cell; SMC, smooth muscle cell.

We profiled 301,515 single cells using the TotalX-miRNA(+) protocol, which enabled robust detection of diverse noncoding RNA biotypes across a broad range of cell types and developmental stages. Dimensionality reduction revealed distinct transcriptional clusters corresponding to canonical neural (including inhibitory (IN) and excitatory (EN) neurons), glial, immune and non-neural cell types, with clear separation by developmental stage and brain region (Fig. 4c). Cell types were annotated based on established coding gene markers (Extended Data Fig. 8c)34,35,36,37 and further characterized by their noncoding RNA expression profiles.

Noncoding RNA expression varied markedly across cell types and developmental stages. Histone mRNAs were highly expressed in rapidly dividing progenitors such as radial glia, while snRNAs and lncRNAs were enriched in immune and differentiated neuronal populations. In total, we identified hundreds of differentially expressed non-polyadenylated transcripts, revealing their widespread association with cell identity and developmental state (Extended Data Fig. 8d).

To examine how noncoding RNA expression relates to genomic context, we mapped the top cell-type-specific noncoding RNA markers to their chromosomal coordinates (Fig. 4c). This analysis uncovered biotype-, time- and lineage-specific coordinated expression of entire noncoding RNA clusters: for instance, multiple histone genes clustered on chromosome 6 were co-expressed predominantly in progenitors, while multiple snRNAs within a chromosome 1 cluster showed coordinated enrichment in microglia. These findings show that cell-type-specific regulation of noncoding gene transcription extends beyond individual genes to encompass entire chromosomal clusters, requiring coordinated activation across extended genomic regions.

We next looked for cell-type-specific noncoding RNA markers. Figure 4d highlights selected noncoding RNAs across major cell types and RNA biotypes. Dot plots and heatmaps show the expression of representative lncRNAs, miRNAs and snoRNAs across neuronal, glial, immune, epithelial and mesenchymal populations. Several miRNAs showed striking cell-type specificity. For example, MIR17 was highly expressed in Cajal–Retzius (CR) cells38,39 and deep-layer excitatory neurons (EN-deep_layer), while MIR128-2 was detected in the same populations but showed peak expression in lateral ganglionic eminence-derived interneurons (IN-LGE-FOXP1). MIR9-1 was enriched in astrocytes, MIR92B in ependymal cells and MIR125B1 in early neurons. We also observed MIR147B expression in macrophages but not in microglia, and broad expression of MIR23A and MIR27A across immune, stromal and epithelial populations, as well as in a subset of radial glia (Extended Data Fig. 8e).

Beyond miRNAs, other biotypes showed similarly specific expression patterns. For example, LINC00299 was enriched in astrocytes and ependymal cells, SNORA7A and LNCAROD in microglia, and SNORA11F in neurons. ENSG00000258312 marked choroid plexus ependymal cells, while LINC01644 and PCAT19 were detected in endothelial cells. SNHG18 and SNORA78 were broadly expressed across mesenchymal and epithelial progenitor populations (Fig. 4d).

Together, these data show that noncoding RNA, along with coding genes, exhibit distinct and dynamic patterns of cell-type and temporal specificity. Their abundance deepens the layered regulatory complexity of brain development and highlights the utility of total RNA profiling for resolving the noncoding transcriptome at single-cell resolution.

Noncoding RNA dynamics across developing neuronal and glial lineages

To investigate the role of noncoding RNAs in neuronal lineage commitment, we focused on developing excitatory and inhibitory neurons (Fig. 5a,b). Using partition-based graph abstraction (PAGA) and pseudotime alignment, we reconstructed developmental trajectories from radial glia (RG) toward both glutamatergic neuronal lineages, and from ganglionic eminence progenitors toward GABAergic neuronal lineages40. While most neurons in our dataset remain in immature states, we observed robust expression of canonical early differentiation markers. These included EOMES in intermediate progenitors, DCX in newborn neurons, and SATB2 and SLC17A7 in developing excitatory neurons, as well as DLX1, DLX2 and GAD1 in GABAergic interneuron precursors (Fig. 5a,b and Extended Data Figs. 8a,c and 9a,b), consistent with previous studies of the developing human brain34,36.

Fig. 5: Cell-type-resolved expression of disease-relevant miRNAs across developing neuronal lineages.
Fig. 5: Cell-type-resolved expression of disease-relevant miRNAs across developing neuronal lineages.
Full size image

a, UMAP of neuronal lineages annotated by marker gene expression. Expression of canonical glutamatergic (SLC17A7) and GABAergic (GAD1) markers delineates excitatory and inhibitory trajectories across developing brain regions, including cortical, subplate and ganglionic eminence-derived neuroblasts. b, Pseudotime trajectories of glutamatergic and GABAergic neuronal lineages. Pseudotime was inferred using PAGA. CGE, MGE and LGE, caudal, medial and lateral ganglionic eminence, respectively. c, Selected gene expression across neuronal progenitors and mature neuron types. The heatmap shows relative expression of protein-coding genes, lncRNAs, snoRNAs, snRNAs and miRNAs across major excitatory and inhibitory neuronal lineages. Expression values represent normalized means across 100 pseudotime-defined bins. The color bar indicates pseudotime, consistent with b. d, Dot plot of disease-associated miRNA abundance across neuronal cell types. e, Read coverage profile of the MIR137 gene locus. Aggregate read depth across the MIR137 gene region (chr1(–): 98,046,070–98,046,171), showing precise coverage of the mature 3p arm by TotalX in developing neuronal lineages. f, Spearman correlation between MIR137 and its targets. Histogram of Spearman correlation coefficients across MIR137 and all other genes. Correlations were assessed using a two-sided Spearman rank correlation test. Some MIR137 validated targets are significantly anticorrelated (Benjamini–Hochberg adjusted P value (FDR) < 0.05), consistent with post-transcriptional repression. Vertical lines and gene names denote top anticorrelated protein-coding genes. g, GO enrichment for MIR137 targets. Reactome GO analysis of negatively correlated (Spearman ρ < −0.3, Benjamini–Hochberg adjusted P value (FDR) < 0.01) validated MIR137 targets reveals enrichment for pathways related to nervous system development, WNT signaling and axon guidance. TKI, tyrosine kinase inhibitors.

Along these trajectories, we identified dozens of developmentally dynamic noncoding RNAs—including miRNAs, lncRNAs and snRNAs (Fig. 5c). Several small RNAs with known neurological relevance, such as MIR222, MIR103A1 and MIR128-241,42,43, showed transient expression peaks in early neurons and declined with maturation, suggesting potential roles in fate specification and lineage progression.

We performed similar analyses for glial trajectories, reconstructing differentiation paths from RG into glia-restricted progenitor cells (GPCs)44 to astrocytes or oligodendrocyte precursor cells (OPCs), and mature oligodendrocytes (Extended Data Fig. 9c–f). We observed comparable dynamic patterns of noncoding RNA expression along glial lineages. LINC00299 was upregulated in late-stage astrocytes, while MIR219A2HG showed increased expression at later stages of OPC differentiation. MIR143 was enriched in mature astrocytes, whereas MIR568 showed a dynamic trajectory—highly expressed in radial glia, downregulated in astrocyte progenitors and then re-expressed during astrocyte and OPC maturation. Notably, MIR23A was also expressed in RG and re-emerged in mature oligodendrocytes, suggesting a possible reuse of regulatory programs at distinct stages of glial development.

miRNA dynamics and target repression across developing neuronal trajectories

CR cells are one of the earliest-born neurons in the developing cortex, although they largely disappear by birth. During development, they occupy the outermost cortical layer I and guide radial neuron migration by secretion of reelin (RELN)45,46. Among the developing neuronal populations, CR cells emerged as a hotspot of miRNA expression, showing higher overall miRNA abundance compared with other neuronal subtypes (Extended Data Fig. 10a). This suggests that CR cells—although transient—may rely on enhanced post-transcriptional regulation during early cortical development. Within this population, MIR137 stood out for its specificity: it was highly expressed in CR cells and nearly absent from other neuronal subtypes at GW19 (Fig. 5d). MIR137 is also one of the strongest genome-wide association studies (GWAS)-associated loci for schizophrenia and intellectual disability47,48, with established roles in synaptic development and epigenetic regulation49. Additional miRNAs, including MIR17 and MIR222—previously linked to neuronal survival and axon outgrowth50,51—also showed enriched expression across several populations during early neurogenesis.

To investigate whether these miRNAs exert post-transcriptional control within the same cells, we examined the expression of their validated targets. For MIR137, TotalX read coverage revealed detection of both 5p and 3p arms, with expression dominated by the 3p arm (Fig. 5e)—consistent with previous findings showing that miR-137-3p accounts for 97% of all miR-137-specific reads across 71 sequencing experiments compiled by miRBase52. MIR137 was selectively expressed in early neurons, particularly CR cells, whereas many of its known targets—genes involved in cell cycle progression, chromatin remodeling, RNA splicing and synaptic vesicle transport—were enriched in neuronal progenitors and showed mutually exclusive expression with MIR137. Spearman correlation analysis confirmed a significant anticorrelation between MIR137 and some of its validated targets (adjusted P value < 0.05) (Fig. 5f), consistent with miRNA-mediated repression53. Among the top anticorrelated genes were SLC38A2 and CDC42, the latter of which has previously been shown to be downregulated by MIR137 in cancer cells54. GO enrichment analysis further implicated MIR137 targets in nervous system development and Wnt signaling (Fig. 5g), underscoring their relevance to early neurogenesis.

We performed similar analyses for MIR17, MIR125B-2 and MIR128-2 (Extended Data Fig. 10c–e) and found that both validated and predicted targets53,55 were consistently anticorrelated with miRNA expression (adjusted P value < 0.05).

Together, these findings support a critical role for noncoding RNAs—particularly miRNAs—in shaping early neuronal lineage commitment, and position CR cells as a unique miRNA-regulated population during cortical development.

Discussion

This study introduces TotalX, a scalable and accessible framework for total RNA profiling in single cells that enables the simultaneous detection of both polyadenylated and non-polyadenylated transcripts. By building upon a widely adopted droplet scRNA-seq platform with minimal protocol and software modifications, TotalX overcomes long-standing limitations in single-cell transcriptomics in which noncoding RNAs—particularly short or non-polyadenylated species—have remained largely inaccessible. Through benchmarking, optimization and application across diverse biological contexts, we show that TotalX provides a biotype-rich molecular view of individual cells while preserving throughput and compatibility with existing pipelines.

Using TotalX, we detected diverse classes of noncoding RNAs, including miRNAs, tRNAs, lncRNAs, snoRNAs, snRNAs and histone RNAs, at single-cell resolution across immune cells, virally infected hepatocytes and the developing human brain. In PBMCs, we identified noncoding RNA markers of cell identity and co-expression modules linking coding and noncoding genes. In DENV2-infected cells, TotalX captured both coding and structured viral RNAs and identified a transcriptionally quiescent but high viral load infection state potentially linked to chromatin regulation. In the developing brain, we mapped the landscape of noncoding RNA expression across regions and developmental time points, uncovering biotype-specific enrichment patterns and chromosomal localization of cell-type-restricted noncoding RNAs. These findings highlight the biological value of simultaneously capturing coding and noncoding RNA programs in single cells.

CR cells, among the earliest-born neurons in the cerebral cortex, have a foundational role in brain development by secreting reelin and guiding the laminar architecture45. Although transient, they scaffold the future cortex. Our finding that CR cells show enriched expression of MIR137—a miRNA strongly associated with schizophrenia risk47—raises the possibility that post-transcriptional dysregulation in this population could have lasting consequences for cortical circuit assembly. Because MIR137 regulates genes involved in chromatin remodeling, synaptic function and neuronal maturation, its misexpression during the narrow developmental window when CR cells are active may contribute to neurodevelopmental disorders. These observations potentially position CR cells as both structural architects and a regulatory vulnerability point in psychiatric disease.

A key advantage of TotalX lies in its ability to resolve dynamic expression of small RNAs, such as miRNAs, and link them to functional outcomes. For instance, in developing neuronal lineages, we identified temporally restricted expression of several miRNAs and showed strong anticorrelation with its validated targets, suggesting direct post-transcriptional repression within the same cell. This level of resolution is difficult to achieve using traditional polyA-capture methods or indirect inferences from bulk data. More broadly, the ability to recover miRNA–target relationships, histone mRNA bursts and coordinated tRNA-codon programs opens new avenues for studying cell state transitions, stress responses and translational control. TotalX can, in principle, be applied to both single cells and single nuclei; however, in this study, we focused on cell-based profiling, as mature miRNAs are predominantly cytoplasmic56.

Despite these advances, TotalX has limitations. The addition of short RNA libraries, while improving small RNA detection, can reduce the overall proportion of mappable reads and requires careful optimization of library mixing ratios. Detection of circRNAs and very low-abundance transcripts remains limited, and future protocol iterations may benefit from dedicated capture strategies or enrichment steps. In particular, low-abundance noncoding RNAs—such as certain lncRNAs, snoRNAs and miRNAs—are also potentially underrepresented or missed entirely in the current workflow. Furthermore, although our modified Cell Ranger pipeline enables short read processing, specialized quantification tools may improve detection sensitivity and assignment accuracy for certain noncoding classes, such as miRNA.

Altogether, TotalX generates rich, integrative data fully compatible with widely used workflows, such as single-cell atlases and Perturb-seq screens on the 10x Genomics Chromium platform57,58. Resolving coding and noncoding elements simultaneously within single cells provides a more complete molecular phenotype, essential for uncovering subtle regulatory interactions, rare cell states and context-specific responses.

Methods

HEK293T cell isolation

HEK293T human embryonic kidney cells (female; ATCC, CRL-3216) were cultured in complete Dulbecco’s modified Eagle’s medium (DMEM) high-glucose medium (Thermo Fisher Scientific, 11965092) supplemented with 10% fetal bovine serum (Thermo Fisher Scientific, 16000044), 1 mM sodium pyruvate (Thermo Fisher Scientific, 11360070) and 100 μg ml−1 penicillin–streptomycin (Thermo Fisher Scientific, 15070063). Cells were dissociated using 0.25% Trypsin–EDTA (Thermo Fisher Scientific, 25200056) for 2–4 min at 37 °C and collected for analysis.

PBMC isolation

PBMCs were isolated from a leukoreduction system chamber collected from a healthy 61-year-old adult donor at Stanford Blood Center. The leukoreduction system chamber product was diluted 1:4 in phosphate-buffered saline (PBS) supplemented with 2% fetal bovine serum (FBS; Thermo Fisher Scientific, 16000044) and layered onto Ficoll-Paque Plus (GE Healthcare, 17-1440-02) for density gradient centrifugation at 400 × g for 30 min at room temperature, with the brake off. The mononuclear cell layer was collected and washed twice in PBS plus 2% FBS, and residual red blood cells were lysed using ACK lysis buffer (Thermo Fisher Scientific, A1049201) for 3 min at room temperature. Cells were then washed, counted and resuspended at 1 × 107 cells per ml in CryoStor CS10 cryopreservation medium (Stemcell Technologies, 07930), aliquoted into cryovials and frozen at −80 °C in a controlled-rate freezing container before long-term storage in liquid nitrogen.

Before single-cell capture, cryopreserved PBMCs were rapidly thawed at 37 °C, transferred to pre-warmed RPMI 1640 medium (Thermo Fisher Scientific, 11875093) supplemented with 10% FBS, and washed twice in PBS plus 0.04% BSA (Thermo Fisher Scientific, AM2616). Viability and concentration were assessed using the Luna automated cell counter (Logos Biosystems) with Trypan Blue exclusion. Final input cell concentrations were adjusted to 750–1,000 cells per µl.

Dengue virus infection of Huh7 cells

Huh7 human hepatoma cells (male; Apath LLC, catalog number AP-HUH7) were cultured in DMEM (Thermo Fisher Scientific, 11965092) supplemented with 10% FBS (Thermo Fisher Scientific, 16000044), 1 mM sodium pyruvate (Thermo Fisher Scientific, 11360070) and 100 U ml−1 penicillin–streptomycin (Thermo Fisher Scientific, 15140122) at 37 °C in a 5% CO2 humidified incubator. Cells were seeded in 6-well plates and infected at ~70% confluence with DENV2 (strain 16681) at an MOI of 1 as previously described28.

For infection, virus-containing medium was diluted in serum-free DMEM and added to cells for 1 h at 37 °C with gentle rocking every 15 min. After incubation, the inoculum was removed and cells were washed once with PBS (Thermo Fisher Scientific, 10010023) before the medium was replaced with fresh complete medium. Cells were incubated for 48 h postinfection to allow for robust viral replication. Mock-infected controls were treated identically with virus-free media.

At the endpoint, cells from two replicate wells (approximately 1 × 106 cells per well) were collected by trypsinization (0.25% Trypsin–EDTA; Thermo Fisher Scientific, 25200056), pelleted by centrifugation (300 × g, 5 min) and resuspended in PBS for downstream processing. Cell viability was assessed using the Luna automated cell counter (Logos Biosystems), and cells were diluted to 750–1,000 cells per µl before encapsulation for single-cell TotalX.

Human brain tissue processing

Fetal brain samples (19 gestational weeks) were obtained from Advanced Bioscience Resources and shipped overnight in cold preservation solution. Informed consent for the use of fetal brain tissue was obtained by the tissue provider from the donor’s legal representative, in accordance with applicable federal and state regulations. Samples were provided to the investigators in a fully deidentified form. As no identifiable private information was available to the research team, informed consent was not directly applicable to our study and no institutional review board approval was required under Stanford University guidelines. All subsequent experimental use of these tissues was conducted under preapproved Stem Cell Research Oversight protocols at Stanford University. Intact samples were dissected into anatomical regions by licensed neuropathologists, minced using sterile razor blades and digested in HBSS (Thermo Fisher, 24020117) containing 10 mg ml−1 Liberase (Roche, 5401119001) and 200 µg ml−1 DNase I (Worthington, LS002007) for 40 min at 37 °C with gentle agitation. The digestion was repeated once. Samples were then incubated in Accutase (Innovative Cell Technologies, AT104) supplemented with 200 µg ml−1 DNase I for 15 min at 37 °C. Red blood cells were removed using Histopaque-1077 (Sigma, 10771) by layering the suspension at a 2:1 ratio and centrifuging at 400 × g for 30 min at 25 °C (low acceleration, no brake). The buffy coat was collected and washed in HBSS containing 0.1% polyvinyl alcohol (Sigma, P8136).

Neurosurgical brain tissue samples (from 9-month-old, 8-year-old and 16-year-old individuals) were obtained during epilepsy surgeries at Stanford Hospital with informed consent from the donor’s legal representatives under a protocol approved by the Stanford Institutional Review Board. Samples were provided to the investigators with deidentified personal information, except for the age. Tissue was transferred on ice immediately from the operating room and processed within 1 h of resection. Samples were manually minced and enzymatically dissociated using papain (Worthington Biochemical, LK003176) or Accutase (Innovative Cell Technologies, AT104) supplemented with DNase I (Worthington, LS002007), depending on sample integrity and downstream application. Cell suspensions were filtered through 40-µm strainers (Corning, 352340) and used directly for downstream assays.

Following dissociation, cortical samples showed high levels of myelin debris and CD45+ vascular macrophages. These populations were depleted in two steps. First, magnetic bead-based negative selection was performed using anti-CD45 microbeads (Miltenyi Biotec, 130-045-801), followed by 30 min of incubation at 4 °C and magnetic separation using MS columns (Miltenyi Biotec, 130-096-433). Second, residual myelin was depleted by incubation with Myelin Removal Beads II following the manufacturer’s protocol. Post-depletion, cell suspensions were counted using a Luna automated cell counter (Logos Biosystems) and viability was assessed using LIVE/DEAD viability/cytotoxicity reagent (Thermo Fisher Scientific, L3224). Final cell concentrations were adjusted to 750–1,000 cells per µl before encapsulation.

Single-cell total RNA sequencing with TotalX

Single-cell total RNA sequencing was performed using TotalX, a modified 10x Genomics Chromium 3′ v3.1 workflow incorporating enzymatic strategies adapted from Smart-seq-total10. The protocol enables the simultaneous capture of polyadenylated and non-polyadenylated transcripts through a co-incubation approach integrating 3′ polyadenylation, 5′ capping and reverse transcription (RT) in a single reaction.

RT master mix was prepared by supplementing the standard 10x RT reagents with the addition of the following components: E. coli poly(A) polymerase (New England Biolabs, M0276S; final 1 U µl−1), ATP (10 mM), Vaccinia capping enzyme (NEB, M2080S; 0.1 U µl−1), S-adenosylmethionine (2 mM; NEB, M2080S), guanosine triphosphate (10 mM; NEB, M2080S) and a custom biotinylated template-switching oligonucleotide (5′ biotin-ATGGCUCGGAGAUGUGUAUAAGAGACAGUCUrGrG+G; obtained from Integrated DNA Technologies (IDT)). Reagents were mixed immediately before loading and kept on ice until droplet generation. Cells or nuclei were loaded into the 10x Genomics Chromium controller following the manufacturer’s protocol for 3′ v3.1 chemistry.

cDNA amplification and rRNA depletion

Following reverse transcription, emulsion droplets were broken according to 10x recovery steps. cDNA was then treated with UDG (NEB, M0280S) to remove dUTSO fragments at 37 °C for 50–60 min. This was followed by 5 cycles of amplification using 10x proprietary primers and a custom spike in primer (according to TotalX protocol, Supplementary Note). Pre-amped cDNA (for 5 cycles) was cleaned up using 1.8X SPRI Select (Beckman Coulter, B23317), and cDNA corresponding to rRNA was depleted by Cas9-mediated cleavage using guide RNAs targeting mitochondrial and cytoplasmic ribosomal sequences (DASH). CRISPR ribonucleoprotein complexes were assembled from Alt-R S.p. Cas9 Nuclease V3 (NEB, 1081059), tracrRNA (IDT, 1072532) and a custom pool of 57 crRNAs (IDT) designed based on Smart-seq-total DASH oligos (Supplementary Table 1) and obtained from Integrated DNA Technologies. The reaction was incubated at 37 °C for 60 min, followed by treatment with proteinase K (Thermo Fisher Scientific, EO0491) for 10 min at 56 °C. cDNA was then cleaned up using SPRI Select and amplified for an additional 7–10 cycles using 10x proprietary primers and a custom spike in primer (see Supplementary Note for the detailed TotalX protocol).

Library construction, size selection and sequencing

Amplified cDNA was separated into long- and short-fragment pools using SPRIselect magnetic beads (Beckman Coulter, catalog number B23318). To obtain the long-fragment pool, SPRIselect beads were added to amplified cDNA at a 0.6× bead-to-sample ratio, and bound fragments were retained according to the manufacturer’s instructions. Unbound fragments in the supernatant were further purified at a 1.2× bead ratio (1.8× final) to isolate the short-fragment pool. The long-fragment pool was processed according to the Chromium Single Cell 3′ v3.1 protocol (10x Genomics), using the Single Index Kit T Set A (PN-1000213/PN-2000240). Short fragments were indexed using custom primers through PCR amplification containing 10x Genomics 3′ v3.1 library amplification mix, SI primer and custom index primers (Supplementary Table 1) For each 40 μl reaction, the following components were combined: 20 μl Amp Mix, 4 μl SI primer, 1 μl custom index primer (10 μM), 1 μl amplified cDNA (1 ng μl−1) and 14 μl nuclease-free water. PCR conditions were as follows: 98 °C for 3 min; 12 cycles of 98 °C for 15 s, 62 °C for 20 s and 72 °C for 2 min; and a final extension at 72 °C for 1 min. PCR products were purified with a 1.2X SPRIselect bead cleanup and eluted in 30 μl nuclease-free water. The resulting libraries were either split for further purification to remove very short fragments (below 200–250 bp) and obtain the small RNA fraction or used for optional miRNA enrichment. The miRNA(+) fragments were further size selected from the short-fragment pool by enrichment of fragments corresponding to small RNAs (18–30 bp) using the Pippin Prep gel system (Sage Science, 3% agarose), following the manufacturer’s protocol. Short and long pools were recombined in desired ratios before final library quality control and sequencing. Resulting libraries were sequenced on a NovaSeq X instrument (Illumina) with the following configuration: read 1, 28 bp; index 1, 8 bp; index 2, 0 bp; and read 2, 91 bp. All custom oligonucleotides and guide RNA sequences generated in this study are provided in Supplementary Table 1 and were synthesized by IDT. No unique biological materials were generated. Requests for additional information should be directed to the corresponding author.

Data processing and alignment

Raw sequencing data were demultiplexed using Cell Ranger mkfastq (v8.0.1), and read 2 was trimmed using Cutadapt59 with the following parameters: -u 6 -a ‘AAAAAAAAAA;min_overlap=10’ -m 18. Trimmed reads were then processed using cellranger count with a dual-pass alignment strategy to enable quantification of both long and short RNA species. Two custom reference transcriptomes were constructed from the GRCh38 primary assembly (Ensembl release 109; GRCh38.p13). Gene annotations were sourced from GENCODE v44 (ref. 60) and manually curated to include an expanded set of biotypes.

The primary (long RNA) reference included the following biotypes: protein_coding, lncRNA, miRNA, snRNA, snoRNA, small Cajal body-specific RNA (scaRNA), tRNA, Mt_tRNA, vault_RNA, misc_RNA, antisense, scRNA, and immunoglobulin and T cell receptor genes (for example, IG_*, TR_*). tRNA annotations were appended using curated entries from the high-confidence set in GtRNAdb61. For viral infection studies, the DENV2 genome (GenBank NC_001474; GCF_000871845.1)62 and its corresponding annotation (modified_dengue.gtf) were appended.

The secondary (small RNA) reference was restricted to short RNA biotypes (miRNA, snoRNA, scaRNA) to enhance mapping specificity. tRNAs and viral transcripts were excluded from this reference.

Alignment, barcode correction, UMI collapsing and gene quantification were performed using Cell Ranger’s internal rules, with modified alignment parameters specified in the cellranger-8.0.1/lib/bin/parameters.toml file to improve the capture of short RNA species (≤18 bp). Key modified parameters included the following:

star_parameters = ‘–outFilterMismatchNoverLmax=0.05–outFilterMatchNmin=18–outFilterScoreMinOverLread=0–outFilterMatchNminOverLread=0’

Many small noncoding transcripts—including miRNAs and other short RNAs—were found to be located within introns, exons or overlapping lncRNAs (Extended Data Fig. 1e). In the standard Cell Ranger pipeline, reads overlapping multiple features are not counted, leading to underrepresentation of these transcripts. To address this, a two-step mapping and quantification strategy was used.

Initial barcode calling and quality filtering were performed using the long RNA reference. Barcodes passing these filters were then used to extract matching data from the short RNA alignment. In accordance with the cellranger logic, only reads containing the tag xf:Z:25 in the BAM file were used for UMI counting and generation of the final gene expression matrix. The final gene expression matrix was constructed by substituting miRNA and other short RNA feature counts from the short-reference run into the long-reference matrix based on barcode concordance. Only barcodes confidently detected in the long RNA run were retained for downstream analysis.

Custom Python scripts were developed to merge the count matrices, resolve feature overlaps, ensure UMI count accuracy and annotate transcript biotypes (see GitHub repository63).

Comparison with other single-cell methods

Publicly available HEK293T scRNA-seq datasets generated using the VASA-seq droplet-based protocol were downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GSE176588). For the 10x Genomics Chromium v3.1 dataset, data files were obtained directly from the 10x Genomics dataset page. To enable direct comparison, all datasets were first harmonized by mapping genes to Ensembl gene IDs, and only genes present in all datasets were retained for downstream analyses. Gene symbols were standardized by removing version suffixes (for example, ‘.1’, ‘.2’), and gene detection rates were estimated based on unique, nonambiguous gene symbols within each dataset.

For the TotalX dataset, to minimize the impact of putative doublets, barcodes corresponding to the top 8% of total UMI counts (reflecting the predicted doublet rate) were excluded. For datasets containing more than 1,000 cells, only cells with at least 5,000 detected UMIs were retained; smaller datasets were included in full. To compare gene detection rates and account for differences in sequencing depth between protocols, UMI counts for each cell were randomly downsampled to a fixed target of 20,000 UMIs per cell using a custom script. Cells with fewer than the target UMI count were excluded, except in datasets with fewer than 1,000 cells, in which all cells were kept. Gene detection was assessed as the number of genes with at least one UMI detected per cell following downsampling. For comparisons of mean expression values between protocols, only genes shared across datasets were included.

Comparison with bulk RNA-seq and bulk miRNA-seq

Pseudo-bulk expression profiles were generated from TotalX, TotalX miRNA(+) and VASA-drop HEK293T datasets by summing UMI counts across all cells. These profiles were compared with two bulk total RNA-seq datasets (GSE198050, GSE106476) and two bulk miRNA-seq datasets (GSE261577, PRJNA994505). Genes detected in both datasets being compared were log-transformed (log1p), and Pearson correlations were computed separately for each RNA biotype.

Doublet removal, cell-type annotation and downstream analysis

Doublets were removed using Scrublet64 with thresholds selected to match the expected doublet rate. Integration across brain samples was performed using scVI65 using protein-coding highly variable genes. Dimensionality reduction and clustering were conducted in Scanpy66 using the scVI latent space, followed by principal component analysis and uniform manifold approximation and projection (UMAP)67 for visualization. Differential gene expression analysis was performed using the Wilcoxon rank-sum test with Benjamini–Hochberg false discovery rate (FDR) correction. Noncoding RNA expression was quantified using log-normalized counts. Gene co-expression modules were identified using the Python implementation of weighted gene co-expression network analysis68, with soft thresholding and module detection as previously described.

For pseudotime analyses, developmental trajectories were inferred using PAGA and diffusion pseudotime in Scanpy. For further details, refer to the GitHub page63.

miRNA–mRNA interaction analysis in PBMCs

We identified miRNAs expressed in at least 100 cells and present in ≥2% of at least one PBMC cell type. For each retained miRNA, validated targets (miRTarBase55) and predicted targets (TargetScan53) were retrieved, and only targets expressed in PBMCs (log1p(expression) > 0.1 and detected in >1% of cells) were kept for downstream analysis. Correlations between miRNA and mRNA expression were computed using Pearson correlation across all single cells, and interactions were categorized as anticorrelated, positively correlated or uncorrelated using a significance threshold of |r| > 0.2 and P < 0.05. For visualization, cells were ordered by miRNA expression and binned in groups of 20 cells, and average miRNA and target-gene expression were computed within each bin to generate expression trajectories. All analysis steps and parameters follow the workflow implemented in the accompanying notebook on the GitHub page63.

Quantification of tRNA availability and demand

To estimate cell-type-specific amino acid demand, we first obtained protein-coding sequences from GENCODE v44. For each protein-coding gene, the corresponding protein sequence was extracted, and the frequency of each standard amino acid was tabulated. Within each cell type, gene-level UMI counts were summed across all cells, yielding total expression for each gene per cell type. For genes with a mapped protein sequence, we calculated the total number of each amino acid required by multiplying the gene’s summed expression by the corresponding amino acid counts from the protein sequence. The aggregate for each amino acid per cell type thus reflects the overall translational demand for each amino acid. Amino acid demand matrices were stored as tables with amino acids as rows and cell types as columns. To quantify tRNA availability, we identified all tRNA genes present in the reference transcriptome (filtered for names containing the ‘tRNA’ string). For each tRNA gene, the associated amino acid was inferred from the gene name, using regular expression parsing (for example, extracting the amino acid label preceding ‘_tRNA’). We then summed UMI counts for all tRNA genes by cell type. For each amino acid, total tRNA expression was computed as the sum of all tRNA gene UMIs corresponding to that amino acid within each cell type. This yielded a matrix of tRNA ‘supply’ per amino acid per cell type.

Quantification of DENV2

DENV2 viral RNA abundance was quantified at the single-cell level using two complementary approaches. First, for each cell, the total viral RNA load was computed as the sum of normalized expression values for five DENV2-derived genes: sfRNA1, sfRNA2, sfRNA3, sfRNA4 and POLY. The resulting value (viral_load) was further normalized to the total cellular transcript count to calculate the percentage of viral transcripts per cell (viral_pct).

In parallel, a composite viral load score was generated using the score_genes function in Scanpy (v1.11.0), which computes the average expression of the specified viral genes after subtracting the average expression of a control gene set matched for overall expression distribution. This was performed for both the full viral gene set (viral_load_score, based on sfRNA1–4 and POLY) and for the noncoding flaviviral RNAs alone (sfrna_load, based on sfRNA1–4).

These viral burden metrics—absolute, relative and control adjusted—were used to stratify cells for all downstream analyses, including response clustering and pathway enrichment. Unless otherwise specified, viral_load was used for most analyses.

Gene Ontology and Reactome pathway enrichment analyses of differentially expressed genes were performed using the Python package GSEApy69 and g:Profiler70, respectively, applying a hypergeometric test with FDR correction to identify significantly enriched pathways among top-ranked genes.

Correlation between miRNA and its targets

To assess the relationship between miRNA and target gene expression at single-cell resolution, only cells expressing the miRNA of interest were included in the analysis. For each cell, miRNA expression was quantified, and cells were partitioned into 30 quantile-based bins according to miRNA abundance using pandas.qcut. This approach enabled a robust estimation of monotonic relationships between miRNA and target gene expression across the full dynamic range of miRNA levels observed.

Validated and predicted targets for each miRNA were compiled from TarBase55 and TargetScan 8.0 (ref. 53), respectively, and subsequently filtered to retain only those genes detected in the dataset. For each target gene, the mean expression within each miRNA expression bin was calculated. The Spearman rank correlation coefficient (ρ) between the binned mean miRNA expression and the corresponding target gene expression profile was then computed using the scipy.stats.spearmanr function. Only gene–miRNA pairs with sufficient coverage (at least five bins with non-missing values) were retained for further analysis. An analogous procedure was applied to all non-target genes to generate a background distribution of correlation coefficients, providing a reference for target specificity.

To elucidate the functional relevance of strongly anticorrelated miRNA targets, pathway enrichment analysis was performed. Validated target genes showing significant negative correlation (ρ < 0, adjusted P < 0.05) were subjected to pathway enrichment using the Reactome 2022 human gene set via the Enrichr interface (gseapy v0.10.8). Pathways with an FDR below 0.05 were considered significantly enriched. The top 20 enriched pathways were visualized as a scatterplot of –log10(FDR) versus the pathway name, with the point size proportional to the odds ratio.

Ethics statement

All human materials used in this study were obtained and used in accordance with applicable institutional and regulatory requirements. Fetal brain tissue samples were obtained from Advanced Bioscience Resources (ABR) as fully de-identified biospecimens; informed consent for donation and research use was obtained by the tissue provider from the donor’s legal representative. Under Stanford University guidelines, use of these de-identified samples was classified as not human subjects research and did not require additional Stanford IRB review. This study did not involve human embryonic stem cells, embryo culture or stem cell derivation. Tissue processing was performed under Stanford Stem Cell Research Oversight (SCRO) protocol no. 735. PBMCs were commercially sourced from the Stanford Blood Center donor bank as de-identified donor material collected for research use under the bank’s ethics oversight procedures. Neurosurgical samples were collected under Stanford IRB protocol IRB-12625 with informed consent.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.