An improved reference library and method for accurate cell-type deconvolution of bulk-tissue miRNA data

Zhu, Shaoying; Yang, Hui; Liu, Jun; Fu, Qingsheng; Huang, Wei; Chen, Qi; Teschendorff, Andrew E.; He, Yungang; Yang, Zhen

doi:10.1038/s41467-025-60521-x

Article
Open access
Published: 01 July 2025

Nature Communications volume 16, Article number: 5508 (2025)

Abstract

MicroRNAs (miRNAs) play key roles in development and disease, and have great biomarker potential. However, because miRNA expression is highly cell-type specific, identifying miRNA biomarkers from complex tissues is hampered by the underlying cell-type heterogeneity. Because current single-cell RNA-Seq protocols lag behind in quantifying miRNA expression, and most miRNA profiling samples lack matched mRNA expression or DNA methylation data for cell-type deconvolution, there is an urgent need for computational methods that estimate cell-type proportions from bulk-tissue miRNA data. Here we present a novel miRNA expression reference library and deconvolution tool for cell-type composition estimation of complex tissues. We show that our tool is accurate and robust for deconvolution in whole blood as well as in different solid tissues. By applying this tool to a range of different biological contexts, we demonstrate its value for screening of age-associated miRNAs, for monitoring the immune landscape in infectious diseases like COVID-19, as well as for identifying cell-type-specific miRNA biomarkers for early diagnosis and prognosis of human cancers. Our work establishes a computational framework for accurate cell-type mixture deconvolution of miRNA data.

Introduction

MicroRNAs (miRNAs) are a class of small noncoding RNAs that serve as expression regulators, directing the cleavage, degradation, and/or translational inhibition of their target transcripts¹. They play vital roles in a broad spectrum of physiological processes in multicellular organisms, including cell cycle, proliferation, differentiation, development, and apoptosis². A substantial proportion of miRNAs present highly tissue/cell-type-specific expression patterns that are predictive of cell identity^3,4,5,6. Altered expression levels of miRNAs are widely observed in tissues or body fluids between healthy and disease conditions, or across different disease stages, and often correlate with clinical phenotypes and outcomes, indicating their capability to serve as predictive, diagnostic or prognostic biomarkers^7,8. Therefore, the comprehensive and accurate cataloguing of miRNA-based biomarkers across tissues and cell types is indispensable for identifying the functional status of cells, as well as for understanding the underlying gene regulation mechanisms in physiological and pathological conditions in complex tissues.

With the advent of high-throughput techniques, large amounts of miRNA profiling data derived from different tissues in normal and diseased states have accumulated in public repositories, such as The Cancer Genome Atlas (TCGA)⁹ and the Encyclopedia of DNA Elements (ENCODE) project¹⁰. The number of miRNA biomarkers has been increasing at an unprecedented rate with intensive exploration of these troves of publicly available data. However, the vast majority of these molecular profiles are derived from bulk tissue sequencing of cell mixtures, where cellular heterogeneity usually acts as a confounding factor. This intra-sample heterogeneity is altered by physiological or functional changes in the samples, which poses great challenges for omics studies¹¹. For example, in case-control or longitudinal studies, observed changes in a particular miRNA could result either from altered expression in a particular cell type or from alterations in cell-type composition¹². This uncertainty in the source of data variation could lead to decreased statistical power and an increased false positive rate. On the other hand, shifts in relative cell composition play an important role in various physiological processes, such as embryogenesis, morphogenesis and immune activities^13,14,15. Cell-type quantification can help explain the etiology of many diseases and may serve as a pathological indicator^16,17,18. Notably in cancer research, the tumor microenvironment, including the density of infiltrating lymphocytes, is critical for cancer progression, patient prognosis and therapy response^19,20,21. Therefore, resolving cell types and compositions from these miRNA expression profiling data is extremely valuable for identifying miRNAs as disease markers and enhancing our understanding of their underlying biological mechanisms.

Over the past decade, single-cell RNA sequencing (scRNA-seq) technologies have been developed that allow the characterization and quantification of cellular heterogeneity by providing an overview of global gene expression within a population of cells²². Unfortunately, current single-cell RNA-Seq protocols still lag behind in quantifying miRNA expression, and only a very limited number of studies have been published in this area^23,24,25. As an alternative, in-silico cell-type deconvolution is attractive for identifying cell-type compositions. Computational deconvolution has the advantage of being cheap and fast, whilst also offering a degree of robustness to noise and confounding variables²⁶. Currently, gene expression and DNA methylation profiles are widely used for reference-based cell-type deconvolution to resolve cell mixtures of complex tissues²⁷. These approaches rely on a priori reference profiles of specific markers from the constituent cell types, which are usually generated from sorted cell types or from more elaborate single-cell profiling data²⁸. The well-known deconvolution method CIBERSORT and its successor CIBERSORTx, for instance, use a support vector regression approach and references generated from purified cell populations or single-cell data for cell composition estimation^29,30. Although cell-type deconvolution can be achieved for samples with matched RNA-seq or DNA methylation data, this is rarely the case, and most miRNA data in the public domain do not have such matched multi-omics profiles available. In the Gene Expression Omnibus (GEO) database, for instance, 116,400 miRNA expression profiling samples are already available, whereas only about 5% have corresponding mRNA profiling data. This situation seriously hinders the in-depth analysis of miRNA data. Therefore, there is an urgent need to develop deconvolution tools based on miRNA expression profiles, both to analyze cell heterogeneity in complex tissues and to identify miRNA biomarkers.

In this study, we constructed miRNA reference matrices from sorted cells and developed DeconmiR, a deconvolution method that resolves the relative proportions of different cell types from miRNA profiles of mixture samples. We evaluated our method on various experimental and clinical samples, where it showed high prediction accuracy on many evaluation metrics. We further applied DeconmiR to different scenarios, such as deconvolution of data from acute myeloid leukemia and solid tumor samples, and found that the cell proportions estimated by DeconmiR are informative for patient prognosis. In addition, the method is valuable for screening age-associated miRNAs and for monitoring the immune landscape in infectious diseases like COVID-19. By adjusting for the estimated cell-type composition, we identified smoking-associated differentially expressed miRNAs occurring in the epithelial compartment, which were validated in smoking-related lung cancer. These results indicate that our method is an effective tool for inferring cell-type fractions from miRNA profiling data, and demonstrate that miRNAome data are of great significance for accurate cellular heterogeneity analysis of complex tissues.

Results

Building the miRNA expression reference matrix for blood samples

Reference-dependent deconvolution methods rely entirely on a library of cell-type-specific markers. Ideally, a reference matrix should consist of miRNAs whose expression variation is maximized across distinct cell types but minimized within each cell type^31,32,33. We first considered the case of peripheral blood, for which the main constituent cell types are well known. We collected miRNA-seq data for a total of 257 samples encompassing six purified leukocyte subtypes (CD4+ T cell, CD8+ T cell, B cell, NK cell, monocyte and neutrophil) obtained from 47 individuals in the study of Juzenas et al. (Supplementary Data 1-2)³⁴. The average purity reported by FACS sorting was 90.2% across all samples (Fig. S1). We selected a total of 197 samples with purity higher than 85% for reference construction and in-silico mixture sample generation by a leave-one-out strategy. We used the limma approach to identify differentially expressed miRNAs (DEMs) by comparing each cell type against all others (see Methods)³⁵, which led to the selection of 145 non-redundant miRNAs. As a sanity check, we used t-distributed Stochastic Neighbor Embedding (t-SNE) to cluster the samples based on the signature miRNAs and reveal the structure of the data. We verified that samples segregate distinctly according to cell type (Fig. 1A). The reference matrix was then constructed by calculating the average expression of each miRNA within every individual cell type (Supplementary Data 3). Fig. 1B depicts the clustering heatmap generated from the reference library, which adequately discriminates lymphoid-derived from myeloid-derived cells; this was also validated by a cell-type similarity heatmap based on pair-wise Kendall rank correlation coefficients (Fig. S2A, B). Many of the signature miRNAs have previously been identified as specifically expressed in immune cells, reflecting their roles in cell growth and differentiation. For instance, miR-155 is highly expressed in, and mediates important functions within, lymphocytes (both B and T cells)^36,37. The miR-181 family has a profound impact on the development of human NK cells from hematopoietic progenitor cells³⁸. Gene Ontology (GO) enrichment analysis of the miRNA targets revealed a strong and specific enrichment of terms representing the functional phenotype of the corresponding cell types (Fig. 1C). For example, “Regulation of T cell differentiation” and “T cell receptor signaling” are enriched in T lymphocytes, whereas “Regulation of myeloid cell differentiation” and “Myeloid cell development” are mainly enriched in monocytes and neutrophils. This suggests that our procedure identifies miRNA markers that discriminate different cell types and reflect the growth and differentiation functions of those cells.
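
The averaging step used to build the reference matrix can be illustrated with a minimal sketch. The function name `build_reference`, the data layout and the expression values are hypothetical and chosen only for illustration (miR-223 is not discussed in the text and serves as a toy second marker); marker selection with limma is assumed to have been done beforehand.

```python
from collections import defaultdict
from statistics import mean


def build_reference(expr, cell_type_of, markers):
    """Average each marker miRNA over the purified samples of every cell type.

    expr:         {sample_id: {mirna: expression value}}
    cell_type_of: {sample_id: cell type label}
    markers:      signature miRNAs selected beforehand
    Returns {mirna: {cell_type: mean expression}}.
    """
    samples_by_type = defaultdict(list)
    for sample, ctype in cell_type_of.items():
        samples_by_type[ctype].append(sample)
    return {
        m: {ct: mean(expr[s][m] for s in samples)
            for ct, samples in samples_by_type.items()}
        for m in markers
    }


# Toy values, not taken from the paper.
expr = {
    "s1": {"miR-155": 8.0, "miR-223": 2.0},
    "s2": {"miR-155": 9.0, "miR-223": 3.0},
    "s3": {"miR-155": 2.0, "miR-223": 10.0},
}
cell_type_of = {"s1": "B cell", "s2": "B cell", "s3": "Neutrophil"}
reference = build_reference(expr, cell_type_of, ["miR-155", "miR-223"])
print(reference["miR-155"])
```

Each row of the resulting matrix is one signature miRNA and each column one cell type, which is the shape a reference-based deconvolution routine expects.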

**Fig. 1: Reference library construction for accurate cell type composition estimation of real and simulated datasets.**

Concordance between true and estimated cell proportions

The identified miRNA-based cell markers are intended for cell-type deconvolution. We designed DeconmiR, a method for miRNA-profile-based cell-type deconvolution that uses Robust Partial Correlation (RPC), whose validity was demonstrated for DNA methylation-based deconvolution in our previous work^39,40. We first assessed our method using the 257 miRNA profiles from immune cells with known proportions, as indicated by the purity of the corresponding cell types. In this benchmark, the cell-type proportions predicted by our method identified the major cell component of the pure-cell samples. For example, for the pure-cell samples consisting only of CD4+ T cells, our method consistently predicted CD4+ T cells as the dominant immune cell type, with an average predicted proportion of 0.91 (Fig. 1D), whereas the predicted proportions for the other cell types were generally very low (Fig. 1E). The predictions also agree well with the ground truth for the other cell types, as illustrated in Fig. S3. Overall, we observed high agreement between the predicted and FACS-estimated cell proportions in this benchmark, with an average root-mean-square error (RMSE) of 0.11 for the six cell types (Fig. 1F).
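
To make the idea of robust-regression deconvolution concrete, the following is a minimal sketch in the spirit of RPC, not the DeconmiR implementation. It assumes a Huber-weighted iteratively reweighted least-squares fit of the bulk profile on the reference columns (with an intercept), followed by clipping negative coefficients to zero and rescaling to sum to 1. All function names, the tuning constant `k` and the toy reference are assumptions.

```python
from statistics import median


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def weighted_ls(X, y, w):
    """Weighted least squares via the normal equations (X'WX) beta = X'Wy."""
    p = len(X[0])
    xtwx = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, X)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, X, y)) for j in range(p)]
    return solve(xtwx, xtwy)


def robust_deconvolve(reference, bulk, k=1.345, n_iter=20):
    """reference: one row per miRNA, one column per cell type.
    bulk: one expression value per miRNA. Returns fractions summing to 1."""
    X = [[1.0] + list(row) for row in reference]   # prepend intercept column
    w = [1.0] * len(bulk)                          # start from ordinary LS
    for _ in range(n_iter):
        beta = weighted_ls(X, bulk, w)
        resid = [yi - sum(b * x for b, x in zip(beta, xi))
                 for xi, yi in zip(X, bulk)]
        # Robust scale from the median absolute residual; guard against zero.
        scale = median(abs(r) for r in resid) / 0.6745 or 1e-12
        # Huber weights: miRNAs with large residuals are down-weighted.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    coef = [max(c, 0.0) for c in beta[1:]]         # drop intercept, clip at 0
    total = sum(coef)
    return [c / total for c in coef]


# Toy reference (6 miRNAs x 3 cell types) and a noise-free mixture.
reference = [[10.0, 1.0, 1.0], [1.0, 10.0, 1.0], [1.0, 1.0, 10.0],
             [8.0, 2.0, 1.0], [2.0, 9.0, 3.0], [1.0, 3.0, 7.0]]
truth = [0.5, 0.3, 0.2]
bulk = [sum(t * v for t, v in zip(truth, row)) for row in reference]
print([round(f, 3) for f in robust_deconvolve(reference, bulk)])
```

The clipping and rescaling step enforces that the reported fractions are non-negative and sum to one, which an unconstrained regression does not guarantee on its own.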

Because flow-cytometric estimates of cell-type proportions are themselves subject to error, we further evaluated whether our method could recover cell proportions from the miRNA profiles of cell mixtures. To this end, we prepared simulated bulk tissues by generating in-silico mixtures of immune cells for which the exact mixing proportions are known. One hundred mixtures were generated, each with randomly assigned cell weights, and the proportions exhibited a reasonable dynamic range. For all six cell types, the estimated cell fractions displayed high agreement with the true proportions. DeconmiR obtained near-perfect R² values across all cell types, with RMSEs ranging from 1% to 3.2% (Fig. 1G), confirming that our method can accurately quantify blood cell-type proportions using the reference library.
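
A minimal sketch of how such in-silico mixtures and the two accuracy metrics could be computed is given here. It assumes that mixtures are weighted sums of cell-type profiles with uniformly drawn, renormalized weights and that R² is the squared Pearson correlation; neither detail is stated in this section, and all names are illustrative.

```python
import math
import random


def random_proportions(n_types, rng):
    """Draw non-negative weights and rescale them to sum to 1."""
    w = [rng.random() for _ in range(n_types)]
    total = sum(w)
    return [x / total for x in w]


def mix(profiles, proportions):
    """Weighted sum of cell-type profiles (one list of miRNA values per type)."""
    n_mirna = len(profiles[0])
    return [sum(p * prof[i] for p, prof in zip(proportions, profiles))
            for i in range(n_mirna)]


def rmse(truth, est):
    """Root-mean-square error between true and estimated proportions."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, est)) / len(truth))


def r_squared(truth, est):
    """Squared Pearson correlation between true and estimated proportions."""
    mt, me = sum(truth) / len(truth), sum(est) / len(est)
    cov = sum((t - mt) * (e - me) for t, e in zip(truth, est))
    vt = sum((t - mt) ** 2 for t in truth)
    ve = sum((e - me) ** 2 for e in est)
    return cov * cov / (vt * ve)


rng = random.Random(0)                       # fixed seed for reproducibility
mixtures = [random_proportions(6, rng) for _ in range(100)]
print(len(mixtures), round(sum(mixtures[0]), 6))
```

In a benchmark, `rmse` and `r_squared` would be evaluated per cell type, comparing the column of true proportions across the 100 mixtures with the corresponding estimates.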

DeconmiR could reasonably capture rare cell components

To further evaluate the capability of DeconmiR to deal with rare cell components, we also performed deconvolution on mixtures in which an individual cell type was present at varying low proportions. We generated 100 in silico mixture samples by setting the proportion of one cell type to 1%, 3%, 5%, 7% and 9%, respectively, with the proportions of the other components randomly assigned. The estimates from DeconmiR were close to the ground truth, with small variance across the different proportions for the six cell types (Fig. 2A). DeconmiR achieved a small average RMSE of 0.05 and R² values above 0.5 for the six cell types (Fig. 2B), which illustrates its robustness in inferring the fraction of rare components.
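
The rare-component design can be sketched as follows: one cell type is pinned to a small fraction and the remaining weights are drawn at random and rescaled to fill the rest. The function name and the uniform sampling of the remaining weights are assumptions made for illustration.

```python
import random


def rare_mixture(n_types, rare_index, rare_fraction, rng):
    """Proportions in which one cell type is fixed at `rare_fraction` and the
    others share the remaining mass in random proportions."""
    others = [rng.random() for _ in range(n_types - 1)]
    scale = (1.0 - rare_fraction) / sum(others)
    props = [x * scale for x in others]
    props.insert(rare_index, rare_fraction)   # put the rare type in its column
    return props


rng = random.Random(1)
for fraction in (0.01, 0.03, 0.05, 0.07, 0.09):
    props = rare_mixture(6, rare_index=2, rare_fraction=fraction, rng=rng)
    print(fraction, [round(p, 3) for p in props])
```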

**Fig. 2: Accuracy evaluation of DeconmiR and comparison with other methods.**

Independent validation of the miRNA profiling based cell type deconvolution

To further validate our estimates using independent real samples, we next applied our method to an in-house miRNA profiling dataset generated from the whole blood of 14 adults with known proportions of the six immune cell types. Again, our method achieved highly accurate cell-type fraction estimates when compared to the ground-truth cell-type composition (Fig. 2C). The validation dataset showed low RMSE and high R² values across the different cell types, particularly for CD4+ T cell, B cell, NK cell and monocyte (Fig. 2C). Therefore, our method retained the ability to accurately deconvolute cell fractions for samples from independent datasets.

DeconmiR compares favorably to other deconvolution methods

Having validated our method, we next performed a detailed comparison with other reference-based deconvolution methods that have been applied to gene expression data. Specifically, we compared our method to an approach based on solving linear equations (SLE), to non-negative least squares (NNLS, used by methods such as GEDIT⁴¹), to state-of-the-art deconvolution algorithms like CIBERSORT²⁹, and to the quadratic programming (QP) approach used by DeconRNASeq⁴². For each algorithm, we used the reference library to estimate the proportions of the in silico mixture samples, and compared the algorithms in terms of RMSE and R² values. In terms of RMSE, DeconmiR outperformed all other methods for five of the six cell types; the exception was monocyte, where QP performed slightly better than DeconmiR (Fig. 2D). DeconmiR also outperformed all other methods in terms of the average RMSE across all cell types (Fig. 2E). In terms of R², all methods attained relatively high values, with DeconmiR achieving marginally higher ones (Fig. 2D, E). These results indicate the high prediction accuracy of DeconmiR, which compares favorably with the other methods.

DeconmiR is robust to external noise in different cell-type contexts

To survey miRNA-based deconvolution in the context of other cell types, we constructed references using miRNA profiles from epithelial and non-epithelial cell types obtained from FANTOM5⁴³, and applied our method to mixture scenarios of three epithelial cell types, consisting of human mammary epithelial cell (HMEC), renal proximal tubular epithelial cell (RPTEC) and bronchial epithelial cell (BEC) from RNA_atlas⁴⁴, as well as three non-epithelial cell types, consisting of aortic endothelial cell (AEC), cardiac fibroblast (CF) and prostate smooth muscle cell (PSMC) from microRNAome⁴⁵. Once again, DeconmiR achieved accurate predictions in both epithelial and non-epithelial cell contexts (Fig. S4A and B), with an average R² of 0.96 and RMSE of 0.11 for epithelial cell types, and an average R² of 0.93 and RMSE of 0.10 for non-epithelial cell types. Because this analysis was performed on in-silico mixtures reconstructed from miRNA profiles that are independent of those used to build the reference signature matrices, it already implicitly assesses the robustness of the algorithm to natural levels of noise, as encountered for instance between different sequencing platforms or experimental protocols. However, to further improve our understanding of the robustness of the method, we next investigated its performance under increasingly higher levels of noise (see Methods). When adding increasing levels of noise to the reconstructed epithelial mixtures, we observed that our method remained relatively stable in terms of RMSE, whereas R² decreased as the noise level increased, indicating that the method performs best at low noise levels and is sensitive to high noise levels (Fig. S4A). Similar observations were made for the non-epithelial cell types (Fig. S4B). These data indicate the robustness of the miRNA-based approach for estimating cell proportions and confirm its value for cell heterogeneity analysis of complex solid tissues.

Besides deep sequencing, microarrays are also widely used for parallel expression analysis of defined miRNAs. The accuracy of deconvolution methods is therefore likely to be subject to technical variation when they are applied to different platforms. We thus applied our method to miRNA profiling data derived from microarrays to assess its cross-platform effectiveness. We collected miRNA profiling data from the Affymetrix Multispecies miRNA-1 Array platform for six purified leukocyte subtypes and prepared in-silico mixture samples. Deconvolution of the simulated datasets comprising the six cell types indicated that our method could predict cell proportions from microarray profiles with a fair degree of accuracy, with an average R² of 0.74 and RMSE of 0.09 across the six cell types (Fig. S4C). A relatively higher level of prediction accuracy was observed for four cell types (B cell, NK cell, CD4+ T cell and CD8+ T cell), whereas the results for monocyte and neutrophil deviated somewhat from the true proportions. Here we also assessed the robustness of our method under increasingly higher levels of noise. This time, performance in terms of both R² and RMSE deteriorated rapidly as more noise was added, indicating that high-level noise may affect robustness across platforms (Fig. S4C). In summary, our analysis indicates that deconvolution performance in heterogeneous mixtures depends on the cell types and on the noise level within the data.

Potential diagnostic value of DeconmiR for acute myeloid leukemia

Having demonstrated the accuracy and robustness of our method for cellular heterogeneity deconvolution, we investigated changes in cell fractions in acute myeloid leukemia (AML), using samples obtained from TCGA (Supplementary Data 4). We first estimated the cell-type composition of these samples by applying our method to the miRNA profiling data. As depicted in Fig. 3A, AML samples present a heterogeneous composition of the six immune cell types. For comparison, we then applied CIBERSORT to the transcriptome data and EpiDISH to the DNA methylome data from the matched samples. The CIBERSORT method based on the LM22 library distinguishes 22 hematopoietic cell types, including seven T cell subtypes, two B cell subtypes, plasma cells, NK cells and myeloid subsets (Fig. S5A). We aggregated these subdivided cell types into the six major immune cell types to calculate their proportions. For instance, the cell proportions of CD4 naïve, CD4 memory resting, CD4 memory activated, follicular helper and Tregs were summed to obtain the total proportion of CD4+ T cells. For the EpiDISH-based results, proportions for seven different cell types were given (Fig. S5B). The results showed a moderate but significant concordance between the estimates obtained from the different deconvolution methods. A Pearson correlation coefficient (PCC) of 0.24 with a p-value of 1.4e-10 was observed between the cell compositions estimated by DeconmiR and CIBERSORT (Fig. 3B), whereas a PCC of 0.26 with a p-value of 4.3e-12 was obtained between DeconmiR and EpiDISH (Fig. 3B). Relatively better concordance was observed for B cell and NK cell according to the estimated RMSE values (Fig. 3B). One possible reason for the relatively low consistency between results obtained from different platforms is the limited correspondence between omics layers (miRNA and mRNA levels are not highly correlated). A second explanation is that CIBERSORT yields absolute cell fractions for 22 cell types in a given sample, which lack an exact correspondence to the cell types provided by DeconmiR. Nevertheless, these results indicate that the estimated cell-type proportions are broadly concordant across different types of omics data.
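
The aggregation of fine-grained subtypes into major cell types, and the correlation used to compare two sets of estimates, can be sketched as below. The subtype labels are assumed to follow LM22-style column names and should be adapted to the actual output; the grouping shown for B cells and all numeric values are illustrative only.

```python
import math

# Assumed LM22-style labels for the CD4+ T cell subtypes named in the text.
CD4_SUBTYPES = [
    "T cells CD4 naive",
    "T cells CD4 memory resting",
    "T cells CD4 memory activated",
    "T cells follicular helper",
    "T cells regulatory (Tregs)",
]


def aggregate(fractions, groups):
    """Sum fine-grained fractions into coarser categories.

    fractions: {subtype: fraction}; groups: {major type: [subtype, ...]}.
    Subtypes missing from `fractions` count as zero.
    """
    return {major: sum(fractions.get(s, 0.0) for s in subs)
            for major, subs in groups.items()}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy fractions for one sample, not taken from the paper.
fine = {"T cells CD4 naive": 0.10, "T cells CD4 memory resting": 0.05,
        "T cells regulatory (Tregs)": 0.02,
        "B cells naive": 0.08, "B cells memory": 0.04}
groups = {"CD4+ T cell": CD4_SUBTYPES,
          "B cell": ["B cells naive", "B cells memory"]}
major = aggregate(fine, groups)
print({k: round(v, 2) for k, v in major.items()})
```

Once both methods report the same six categories, `pearson` can be applied to the paired estimates across samples to obtain a concordance measure of the kind reported above.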

**Fig. 3: Deconvolution of AML and healthy samples using DeconmiR.**

We further compared the cellular fractions of AML samples with those of healthy controls. To do this, we estimated the cell proportions of 77 additional whole blood samples from Juzenas et al. (Fig. S5C). We found significant differences in the proportions of CD8+ T cell, NK cell and neutrophil between AML and healthy samples. Specifically, CD8+ T cells presented significantly higher fractions in AML samples, while the fractions of NK cells and neutrophils were significantly lower in AML. Notably, we observed a higher level of heterogeneity in AML samples, which show a much wider distribution of cellular composition for all six cell types (Fig. 3C). Considering the important role of cell composition in cancer prognosis, we performed hierarchical clustering of the AML samples based on the estimated cellular fractions to define subgroups. Two groups (Group A vs. Group B) can be distinguished by apparent differences in the fractions of four cell types: CD4+ T cell, CD8+ T cell, NK cell and neutrophil. Group A is enriched for higher fractions of CD8+ T cells and NK cells, whereas Group B presents higher fractions of CD4+ T cells and neutrophils (Fig. 3D). Kaplan–Meier analysis indicated that the cellular fractions characterize differences in overall survival between the two groups (Fig. 3E). These results indicate the value of miRNA-based cell-type deconvolution for cancer prognosis and the potential clinical use of our method for patient stratification.

DeconmiR improves sensitivity for aging related miRNA selection

Age-related changes in miRNA expression have been previously reported and functionally described⁴⁶. To illustrate the application of our method to the identification of aging-associated miRNAs, we applied DeconmiR to miRNA profiling data from a cohort of 38 samples, with ages ranging from 24 to 79, for which whole blood samples were taken and miRNA profiles were generated using the Affymetrix Multispecies miRNA-1 Array (Supplementary Data 5)⁴⁷. We obtained the cell proportions for these samples using our method. The results indicated that lymphocytes, including CD4+ T cells and B cells, constitute the majority of the samples and neutrophils make up the rest (Fig. 4A). We then obtained the top components of data variation using singular value decomposition (SVD). Using a principal component regression analysis, the top components were correlated against sample age and other potential confounders, as well as the inferred cell weights, with the aim of assessing which of these factors contribute to the inter-sample variability. We found that the top components of data variance were explained more by cell-type heterogeneity than by age (Fig. 4B). The first principal component is significantly associated with the proportions of B cell, NK cell and monocyte, and the third principal component is significantly associated with the proportion of neutrophil, whereas age and gender are only weakly associated with the top principal components.

**Fig. 4: DeconmiR improves detection sensitivity of aging-related miRNAs in blood and reveals aberrant cell type composition with SARS-CoV-2 infection.**

To determine the effect of the observed confounding between age and cell composition at the level of individual miRNAs, we examined age-associated miRNAs with and without adjustment for cell-type composition. Using a linear regression model without adjustment, a total of 14 miRNAs were identified as associated with age, 8 positively and 6 negatively correlated (q-value < 0.05, Fig. 4C and Supplementary Data 6). After adjusting for the cell-type composition estimated by DeconmiR, we observed 27 age-associated miRNAs, nearly double the number found without adjustment. Of these miRNAs, 17 are positively and 10 negatively correlated with age (q-value < 0.05, Fig. 4D and Supplementary Data 6). Importantly, 13 of the 14 age-associated miRNAs identified without cell-type adjustment are also among those identified with adjustment (Fig. 4D and Fig. S6). This analysis confirms that DeconmiR substantially improves the sensitivity of age-related miRNA screening over the unadjusted analysis.
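
Why adjustment can uncover additional associations is easiest to see with a single covariate. The sketch below removes the linear effect of one estimated cell fraction from both the miRNA expression and age, then correlates the residuals (a partial correlation). It is a simplified stand-in for a multi-covariate linear model; the helper names and the toy data, constructed so that expression depends on both age and the cell fraction, are assumptions.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def residuals(y, x):
    """Residuals of a simple least-squares regression of y on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def pearson(x, y):
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))


def adjusted_association(expr, age, cell_fraction):
    """Correlation between expression and age after removing the linear
    effect of one cell-type fraction from both variables."""
    return pearson(residuals(expr, cell_fraction), residuals(age, cell_fraction))


# Toy cohort: expression rises with age but is also driven by a cell fraction.
age = [24.0, 35.0, 46.0, 57.0, 68.0, 79.0]
fraction = [0.60, 0.40, 0.70, 0.30, 0.50, 0.45]
expr = [0.05 * a - 10.0 * f for a, f in zip(age, fraction)]

raw = pearson(expr, age)
adjusted = adjusted_association(expr, age, fraction)
print(round(raw, 2), round(adjusted, 2))
```

In this toy example the unadjusted correlation is diluted by the variation in cell composition, while the adjusted one recovers the age effect, which mirrors the gain in sensitivity described above.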

Cell type composition change in the blood samples of COVID-19 patients

The COVID-19 pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to an enormous number of infections and deaths since its origin in 2019. Susceptibility to the coronavirus and its clinical manifestations vary greatly among individuals. Many studies have indicated that aberrations in the immune landscape could be a key mechanism underlying the differences in clinical manifestation among COVID-19 patients^48,49. To test whether our method could capture the immune properties of COVID-19 patients, we obtained whole-blood miRNA-seq data from a cohort of 47 individuals (Supplementary Data 7)⁵⁰. Applying DeconmiR to the miRNA-seq data and CIBERSORT to the RNA-seq data from matched samples, we again observed concordant results between the two methods (Fig. S7A). We then compared the immune cell-type composition of COVID-19 patients and control samples and found a significant difference between the two groups. The proportions of the lymphoid lineage, including CD4+ T cell, CD8+ T cell and NK cell, decreased significantly in COVID-19 patients, whereas the myeloid lineage increased significantly. Notable differences were observed for CD4+ T cell and neutrophil (Fig. 4F). To further explore whether these cellular compositions could characterize the clinical features of COVID-19 patients, we also collected data from another cohort comprising 47 PBMC samples from 18 patients with different clinical features (Supplementary Data 8)⁵¹. Here again, concordant estimates were obtained using DeconmiR and CIBERSORT (Fig. S7B). When comparing the cellular compositions among groups of patients with different clinical features, we found that the proportion of monocytes was gradually elevated, notably in patients with severe COVID-19, while the proportion of CD4+ T cells decreased significantly (Fig. 4G). This observation is supported by other studies using more elaborate single-cell methods⁵². Collectively, this analysis further confirmed the accuracy of our deconvolution analysis in dissecting pathogenesis and revealed dynamic changes in the intrinsic immune landscape in the inflamed state of COVID-19 patients.

DeconmiR reveals immune cell infiltration in breast cancer

Having demonstrated our method in blood, we next aimed to assess the effectiveness of miRNA-based deconvolution on real mixtures of epithelial and immune subtypes. To do this, we applied our method to breast tissue and constructed a reference matrix containing 10 cell types from four categories: epithelial, fibroblast, adipocyte and immune cells. We used miRNA profiling data for the 7 immune cell subtypes from FANTOM5⁴³, and further collected profiles for human breast epithelial cell (HMEC), fibroblast and adipocyte. We then repeated the same procedure as before to obtain 271 cell-type-specific miRNAs, which discriminated samples from different cell types accurately (Fig. S8A, Supplementary Data 9). The reference matrix constructed from these miRNAs could also discriminate immune from other cell types (Fig. 5A and Fig. S8B). Comparison of these miRNAs with the blood reference indicated that a substantial fraction of them are specific to the breast reference (Fig. S8C). We applied this reference to 1000 in silico mixture samples and observed high consistency between predicted and true cell-type fractions, with R² values above 0.95 and RMSE below 0.06 (Fig. S8D). We then applied our method to an independent dataset of four samples of a purified breast epithelium cell line and 36 whole blood samples from the microRNAome project⁴⁵. For the breast epithelium samples, epithelial cells were estimated as the dominant component in all four samples, in line with the high epithelial purity expected (Fig. 5B), whereas for the 36 whole blood samples, immune cells constituted the major category (Fig. 5C). These results demonstrate the reliability of our reference matrix for cell proportion estimation in solid tissues such as breast.

**Fig. 5: Reference construction and cell type deconvolution of human breast tissues.**

Next, we assessed immune cell infiltration in breast cancer samples. To do this, we estimated the epithelial and immune cell contents of 1182 normal and breast cancer samples from TCGA (Supplementary Data 10). Both normal and tumor samples exhibit substantial immune cell infiltration, and all major cell categories differ significantly between normal and tumor samples. In general, an increased epithelial fraction is observed in tumor samples, whereas the fibroblast and adipocyte fractions are decreased. For the immune content, both the lymphoid and myeloid lineages present decreased fractions, whereas the macrophage fraction increased dramatically (Fig. 5D). A more in-depth analysis revealed heterogeneous changes among the different immune cell types: CD8+ T cell, B cell and NK cell present increased fractions, whereas CD4+ T cell and neutrophil decreased significantly. Notably, monocytes were absent from both normal and tumor samples, as expected (Fig. S9A). Because tumor-associated macrophages (TAMs) have been associated with poor outcomes in cancer patients, we examined whether the macrophage infiltration level estimated by our method can be used for patient prognosis^53,54. By classifying all tumor samples into high and low groups based on macrophage fraction, we found that a higher proportion of macrophages was associated with worse prognosis (Fig. 5E). Thus, the immune cell fractions obtained using DeconmiR can serve as an effective prognostic indicator for breast cancer.
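
The high/low stratification followed by a survival comparison can be sketched as follows. The cutoff used in the study is not stated in this section, so a median split is assumed here; the Kaplan-Meier estimator is the standard product-limit form, and the function names and toy follow-up data are illustrative.

```python
from statistics import median


def split_by_median(values):
    """Label each sample 'high' or 'low' relative to the median (assumed cutoff)."""
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient
    events: 1 if death was observed at that time, 0 if censored
    Returns [(time, survival probability)] at each observed event time.
    """
    surv, curve = 1.0, []
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
    return curve


# Toy data, not taken from the paper.
macrophage_fraction = [0.10, 0.40, 0.20, 0.50]
labels = split_by_median(macrophage_fraction)
print(labels)
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

In practice one curve would be computed per group and the two compared with a log-rank test, which is outside the scope of this sketch.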

Finally, we explored the characteristics of the immune cell landscape across breast cancer subtypes defined by immunohistochemistry (IHC) or PAM50. Fig. S9B shows that ER+ tumors present a greater proportion of total immune cells, but with different trends in the lymphoid and myeloid lineages: the lymphoid fraction is higher and the myeloid fraction is lower in ER+ tumors. For intrinsic subtypes defined by PAM50 clustering, both the total immune and the lymphoid fractions were highest in the Luminal A subtype (Fig. S9C). In contrast, the myeloid fraction was highest in Luminal B tumors, and also in the Her2+ and Basal subtypes, indicating significant heterogeneity of the immune infiltrate among subgroups. We further investigated immune infiltration patterns among different tumor samples. As depicted in Fig. S9D, the ER+ or PR+ subtypes and the Luminal A subtype were found to have higher fractions of CD4+ T cells and NK cells, but a lower fraction of neutrophils. These results suggest that our method can provide important insights into the dynamics of immune cell infiltration in solid tumors and offer valuable indicators for potential molecular classification and patient prognosis.

Cell type deconvolution across tumor types to identify cancer biomarkers

To further demonstrate the suitability of our method for other solid epithelial tissues (e.g. lung, liver, kidney, etc.), we constructed tissue-type-specific reference matrices using the corresponding epithelial and fibroblast cell lines together with the purified immune cell subtypes used earlier. In addition, we constructed a generic matrix for a hierarchical two-step deconvolution procedure, as adopted by the hEpiDISH method, whose effectiveness has been validated in complex epithelial tissues⁴⁰. For this purpose, we used different epithelial and fibroblast cell lines with reasonable sample numbers (n = 10), as well as immune cell types, to construct the miRNA reference database for the three main cell types (Fig. S10 and Supplementary Data 11). We performed deconvolution on 16 epithelial cancer types for which miRNA profiling data are available from TCGA. We would expect consistency with independent cell-type fraction estimates obtained using a different molecular assay, such as RNA-seq. To test this, we applied the ESTIMATE algorithm to the matched RNA-seq profiles of normal-adjacent and cancer samples profiled as part of TCGA⁵⁵. Comparison of the total immune fraction estimated using DeconmiR on the miRNA profile with the corresponding immune score obtained using ESTIMATE on the matched RNA-seq profile revealed remarkably good agreement for both normal and cancer tissue and across all cancer types (Fig. S11). These findings attest to the general pan-cancer applicability of DeconmiR.

We next demonstrate the effectiveness of our method for inferring cell-type-specific DEMs as cancer biomarkers in solid epithelial tissue. We first considered DEMs between normal and cancer, using breast cancer as an example owing to the availability of miRNA profiles from breast cancer cell lines. We constructed an approximate gold-standard set of breast cancer DEMs by intersecting the lists of DEMs derived from comparisons of normal to breast cancer cell lines in the FANTOM5 and microRNAome projects, in which 372 and 408 DEMs were identified, respectively (Supplementary Data 12). The overlap of DEMs from the two comparisons defines a high-confidence set of 160 true positives occurring within the epithelial compartment of breast cancers, as well as a list of 1504 true negatives, which are not differentially expressed in either comparison (Fig. 6A and Fig. S12). Treating the proportions of the three cell types (epithelial, fibroblast and immune) as covariates in the differential expression analysis of the TCGA breast cancer cohort, we obtained a sensitivity of 30% and a specificity of 74% (Fig. 6B). An unadjusted analysis achieved a higher sensitivity (41%) but a lower specificity (62%), and the state-of-the-art reference-free approach surrogate variable analysis (SVA)⁵⁶ performed similarly to the unadjusted analysis (sensitivity 42%, specificity 61%). Thus, the increased specificity supports the view that DeconmiR is able to remove false positives attributable to changes in cell-type composition between normal and cancer. This approach provides reasonably accurate estimates of cell-type fractions, while also controlling the false-positive rate for cancer biomarker identification in complex epithelial tissues.
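The sensitivity and specificity figures quoted above can be computed from a set of called DEMs and the gold-standard true-positive and true-negative lists. The following is a minimal Python sketch of that arithmetic only; the function name and the toy miRNA identifiers are hypothetical and are not taken from the DeconmiR package.

```python
def sensitivity_specificity(called, true_positives, true_negatives):
    """Score a set of called DEMs against gold-standard positive/negative sets."""
    called = set(called)
    true_positives = set(true_positives)
    true_negatives = set(true_negatives)
    tp = len(called & true_positives)   # gold-standard DEMs that were called
    fn = len(true_positives - called)   # gold-standard DEMs that were missed
    fp = len(called & true_negatives)   # gold-standard non-DEMs that were called
    tn = len(true_negatives - called)   # gold-standard non-DEMs left uncalled
    return tp / (tp + fn), tn / (tn + fp)


# Toy example with hypothetical miRNA names (not real data).
gold_pos = {"miR-a", "miR-b", "miR-c", "miR-d"}
gold_neg = {"miR-e", "miR-f", "miR-g", "miR-h", "miR-i"}
called = {"miR-a", "miR-b", "miR-e"}

sens, spec = sensitivity_specificity(called, gold_pos, gold_neg)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Called miRNAs that fall in neither gold-standard list are ignored here, which is one reasonable reading of an "approximate gold standard"; the original analysis may have handled them differently.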

**Fig. 6: Applications of DeconmiR in solid tumor to identify cancer biomarkers.**

Finally, to demonstrate how miRNA-based cell type deconvolution may provide new biological insights, for instance by revealing smoking-associated miRNA changes in epithelial cells, we collected miRNA profiles from a cohort of 347 bronchial airway samples with different smoking exposure statuses (GSE93284)⁵⁷. A set of 175 miRNAs significantly differentially expressed between current and former smokers was obtained as those with smoking-related expression changes in bronchial epithelium, of which 89 were upregulated and 86 downregulated (Supplementary Data 13). We examined whether these smoking-associated miRNAs can be observed in the epithelial compartment of lung cancer, so that they could be used as early diagnostic markers with bronchial epithelium as a surrogate tissue. For this purpose, we collected miRNA profiles from another cohort of 48 pairs of lung adenocarcinoma and non-malignant lung tissue (GSE110907)⁵⁸ and applied our method to infer cell type fractions. As expected, lung cancer samples mainly contained epithelial and immune cells, and an elevated epithelial fraction was observed in tumor samples (Fig. S13A). We then sought to identify cell-type-specific DEMs in these two cell types by applying the CellDMC method⁵⁹, which was originally designed for identifying cell-type-specific differentially methylated positions from DNA methylome data, although its effectiveness for transcriptome data has also been validated⁶⁰. We verified that many top-ranked smoking-associated miRNAs in bronchial epithelium were predicted to be expressed specifically in the epithelial or immune compartments of the lung tissue (Fig. 6C, D). The epithelial-specific miR-183-3p, for instance, has been found to be significantly upregulated in lung cancer tissues, where it is associated with lymph node metastasis and poor prognosis of lung cancer patients^61,62. The immune-specific miR-139-5p has been identified as a tumor suppressor downregulated in lung adenocarcinoma⁶³. 
Members of the miR-139 family have been found to regulate PD-1 and other immunocytokines, and therefore play an important role in immune evasion of cancer^64,65. The up- and downregulated smoking miRNAs exhibited a highly significant trend toward positive correlations between their expression levels and the estimated epithelial and immune cell-type fractions, respectively (Fig. 6E). To further validate these epithelial-specific smoking miRNAs, we also estimated the fractions of different cell types for 559 lung adenocarcinoma samples from TCGA, and similar patterns were observed (Fig. S13B). We confirmed this correlation pattern between expression and epithelial cell fraction for the top-ranked upregulated epithelial smoking miRNAs (Fig. 6F), and also for immune miRNAs (Fig. S13C, D). Moreover, their variance in expression increases with the fraction of epithelial cells in cancer, with a concomitant decrease in variance in samples with a higher immune cell content. In addition, the top-ranked epithelial-specific miRNAs with respect to lung cancer exhibited a highly significant trend toward positive and negative correlations in independent validation (Fig. 6G). Up- and downregulated epithelial-specific miRNAs exhibited increased and decreased levels of expression in lung cancer, respectively (Fig. 6H). Thus, the smoking-dysregulated bronchial miRNAs identified here represent alterations occurring in the epithelial cell of origin of smoking-related lung cancer and may therefore mark cells that are selected during lung carcinogenesis.

Discussion

Computational cell type deconvolution has been demonstrated to be a useful strategy for dissecting the heterogeneity of complex tissues^66,67,68. However, in silico deconvolution methods that apply to miRNA profiling data have remained unavailable. Although cell type fraction information can be obtained for samples with multi-omics data available, such as mRNA profiling, only about 5% of samples have such data. We therefore developed DeconmiR, a deconvolution predictor that infers cell type composition from miRNA expression profiling data. Using both simulated and experimental data, we demonstrate that our method produces high-accuracy predictions for a diverse range of cell types and is robust to data noise. Its precision is comparable to that of other gene expression- or DNA methylation-based deconvolution tools. Its success indicates that noncoding RNA has great potential for exploring the cellular composition of complex tissues.

Genome-wide miRNA profiling data have been accumulating rapidly in the past few years⁶⁹. However, a key challenge in interpreting the biological meaning of these data is that expression measurements are generally performed on heterogeneous specimens. Ensemble measurements on bulk samples can be severely affected by cell-type composition, which can vary substantially between individuals⁷⁰. Surprisingly, studies still continue to ignore this important confounding factor when investigating miRNA profiling data^71,72,73. Although recently emerged single-cell small RNA sequencing promises to facilitate the understanding of sample heterogeneity, the application of such methods to various tissues is still limited. Many of these single-cell miRNA profiles were generated from purified cell lines, with no real tissue samples available; in addition, most of these datasets profile very limited numbers of miRNAs or cells^23,24,25. Hence, we still lack a systematic survey of cellular heterogeneity and its influence on RNA omics studies from the perspective of miRNA profiling. Our tool is therefore necessary for capturing the spatiotemporal complexity behind the diversity of RNA biology.

To guarantee its performance in cell type quantification for complex tissues, DeconmiR uses several approaches to obtain highly cell-type-specific miRNAs and build the reference signature matrix that is most informative for deconvolution. First, we collected a large cohort dataset that profiled a sufficient number of samples for different immune cell types. Including more replicates for each cell type gives rise to a more robust signature matrix for deconvolution. Second, we filtered the most informative miRNAs that discriminate different cell types by applying pairwise comparisons with approaches analogous to the t-test. Our results indicated that good performance in discriminating cell types can be achieved with a library of fewer than 200 miRNAs, far fewer than the number of genes used by transcriptome-based deconvolution methods, such as the LM22 library created and used by CIBERSORT²⁹, or the number of CpGs used by DNA methylome-based methods, which generally include hundreds or even thousands of sites^31,39,74,75. Finally, we verified that performance can be improved by logarithmic transformation of the data for miRNA profiling-based deconvolution. Data scaling has been indicated as important for reducing noise in omics data. There is, however, still debate on whether it improves accuracy for transcriptome-based deconvolution methods. Several tools require a log-transformation for RNA-seq-based computational deconvolution^76,77, whereas others consider that log-transformed data violate the linearity assumption of regression and lead to worse estimates^66,78. The latter show the necessity of data scaling prior to deconvolution but conclude that non-log-transformed TPM, RSEM or FPKM values are more appropriate for accurate estimation of cell type proportions. 
In our experience, more accurate deconvolution can be achieved on the logarithmic scale when deconvolving miRNA expression profiles. We speculate that this is because the inflated nature of RPM values leads to extremely high noise levels, and because outliers and abnormal distributions of the profiling data greatly affect model performance, sometimes even making correct estimation impossible. In addition, we note that many newly developed methods use weighted least squares approaches combined with scRNA-seq-derived references when analyzing RNA-seq samples with highly skewed distributions^79,80,81. These methods achieve high accuracy in estimating rare cell types. However, they are prone to errors when applied to noisy data, which is exactly the case for miRNA profiling data. We therefore selected the RPC method, as it is more robust for miRNA data with high noise levels.

We assessed the performance of DeconmiR on a set of benchmark datasets, where it produced accurate results and was robust to noise when tested on mixtures of human immune cells. We also showcased the effectiveness of DeconmiR in different application contexts. When applied to AML, a human malignancy, DeconmiR revealed differences in cell type composition between disease and control samples. Cellular fractions predicted by our method were able to separate samples into subgroups, which suggests potential value for the clinical diagnosis and prognosis of AML. When applied to an age-related miRNA profiling study, it identified twice as many age-related miRNAs as the analysis unadjusted for cell type, confirming that DeconmiR can dramatically increase screening sensitivity. In addition, it successfully captured the dynamic changes of major cell types in the blood of COVID-19 patients, showing its value for exploring immune responses in infectious diseases.

We also extended the reference-based framework to complex tissues other than blood. In doing so, we provide not only detailed miRNA reference databases for individual tissue types built from several independent datasets, but also a generic reference that can reliably estimate the fractions of epithelial, immune and stromal (mainly fibroblast) cells in complex tissues using a hierarchical deconvolution strategy⁴⁰. Using our novel reference matrix, we demonstrate that complex tissues exhibit widespread contamination by immune cells. Using breast tissue as an example, our analysis shows that the reference matrix can be used not only to estimate the cell content of samples, but also to infer cancer-specific miRNA biomarkers occurring in the epithelial compartment by controlling sensitivity and specificity. In the case of lung cancer, we show that many of the top-ranked smoking-associated miRNAs detected in profiles of the bronchial airway can be detected specifically in the epithelial compartment of lung cancer samples, while many others occur in the immune compartment. These cell-type-specific miRNA changes in the bronchial and lung epithelium may affect or reflect deregulation of regulatory networks that contribute causally to lung cancer development. In addition, these results have profound implications for future miRNA profiling studies that aim to use easily accessible surrogate tissues to identify miRNA alterations representative of those originating in the epithelial or immune compartments of solid tissues. We believe that, with more work demonstrating the value of miRNA profiling data in this area, such analyses could help improve biological interpretation, prioritize candidates that require functional validation, and elucidate causal pathways to different diseases.

We also note limitations for broader applications of this method. One is the collinearity observed between some closely related cell types in miRNA-seq data, which may confound deconvolution⁸². In addition, we observed that some miRNAs in the reference library belong to the same family and share common seed regions or clusters, for example the miR-181 family members miR-181a-5p, miR-181b-5p, miR-181c-5p and miR-181d-5p in the blood reference. In total, ten miRNAs were identified as originating from the same clusters or sharing identical seed sequences, representing 7% of the miRNAs in the library. These highly correlated miRNAs may introduce redundant information into the reference library, potentially leading to higher weights for specific cell types. Thus, careful interpretation of the deconvolution results is needed to minimize potential bias toward particular cell types. Nevertheless, we foresee that our tool will be of broad benefit to scientific and clinical fields.

In summary, we present a novel reference-based deconvolution algorithm, DeconmiR, for in silico dissection of the cell type composition of complex tissues from miRNA profiling data. It offers researchers flexibility in terms of cell types and platforms, and constitutes a useful addition to existing tools for tissue decomposition that use transcriptome or DNA methylome data. We expect the proposed method to find broad application in different scenarios, increasing our understanding of the roles of cellular heterogeneity in different biological processes and facilitating miRNA biomarker identification in physiological and disease samples.

Methods

Data collection and reference profile generation from circulating cells

The first critical step involves the selection and assembly of a library that incorporates miRNA signatures characteristic of the component cell types using pure cell-type profiling data. We therefore collected miRNA-seq data from FANTOM5, TCGA and other miRNA profiling studies for individual cell types and tissues. We first tested our method on peripheral blood, as it is the most commonly used biospecimen for biological analyses and is easily accessible. We used a publicly available miRNA-seq dataset from Juzenas et al., generated from FACS-sorted peripheral blood cells of 43 individuals (GEO Accession: GSE100467)³⁴. Specifically, we used purified leukocyte subtypes of the lymphoid lineage, including CD4+ T cells (T helper cells), CD8+ T cells (cytotoxic T cells), B cells (CD19+) and natural killer cells (NK cells, CD56+), and of the myeloid lineage, including monocytes (CD14+) and neutrophils (CD15+). In total, 257 samples were collected for these six cell types. The genome-wide miRNA profiling data, quantified as RPM, were log-transformed after adding a pseudo-count of 1 to regularize the data for downstream analysis. We used a leave-one-out strategy to construct the reference profile for immune cell types. We first selected samples with purity higher than 85% for reference construction and in silico mixture generation. Given n samples for a particular cell type within the dataset, n-1 samples were used to derive the signature matrix, and the remaining sample was used to synthesize the in silico mixture samples. We used the well-known empirical Bayes framework of moderated t-statistics (limma) to derive DEMs between each of the six cell types and the remaining ones. This tool has been widely used to identify differential features between sample groups and provides reliable inference even with small sample sizes³⁵. 
miRNAs were selected as differentially expressed at a false discovery rate (FDR) threshold of 0.05, while also requiring an expression fold change larger than 2 between the two groups. For each cell type, we picked the top 50 DEMs, and the non-redundant set of miRNAs across all cell types was kept. The expression centroids were then calculated for each miRNA to generate the final signature matrix.
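The last steps of this procedure (log transformation with a pseudo-count, merging the top-ranked markers per cell type into a non-redundant set, and computing per-cell-type centroids) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the DeconmiR implementation: the marker ranking is assumed to have been produced upstream (for example by limma), the log base is assumed to be 2, the centroid is assumed to be the arithmetic mean, and all function names and toy values are hypothetical.

```python
import math


def log2_rpm(sample, pseudo=1.0):
    """Log-transform one sample's RPM values after adding a pseudo-count."""
    return {mirna: math.log2(rpm + pseudo) for mirna, rpm in sample.items()}


def union_of_top_markers(ranked_markers, top_n=50):
    """Keep the top_n ranked markers per cell type and merge without duplicates."""
    selected = []
    for markers in ranked_markers.values():
        for mirna in markers[:top_n]:
            if mirna not in selected:
                selected.append(mirna)
    return selected


def centroid_matrix(samples_by_type, markers):
    """Mean log-expression of each marker within each cell type."""
    signature = {}
    for cell_type, samples in samples_by_type.items():
        logged = [log2_rpm(s) for s in samples]
        signature[cell_type] = {
            m: sum(s[m] for s in logged) / len(logged) for m in markers
        }
    return signature


# Toy purified profiles in RPM (hypothetical values and miRNA names).
samples_by_type = {
    "B cell": [{"miR-x": 3.0, "miR-y": 0.0}, {"miR-x": 15.0, "miR-y": 0.0}],
    "NK cell": [{"miR-x": 0.0, "miR-y": 7.0}, {"miR-x": 0.0, "miR-y": 7.0}],
}
# Ranked DEM lists per cell type, assumed to come from an upstream DE analysis.
ranked = {"B cell": ["miR-x", "miR-y"], "NK cell": ["miR-y", "miR-x"]}

markers = union_of_top_markers(ranked, top_n=1)
signature = centroid_matrix(samples_by_type, markers)
print(markers)
print(signature)
```

In a leave-one-out setting, `centroid_matrix` would be called on the n-1 retained samples per cell type, with the held-out sample reserved for building mixtures.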

To generate reference matrices for solid tissues, we gathered miRNA profiling data for 28 purified cell types or cell lines from a separate publicly available FANTOM5 study⁴³, which mainly covers epithelial, endothelial and fibroblast cells from ten different tissues, including the breast, lung, liver, esophagus, kidney, prostate, respiratory tract, skin and vessel, as well as purified samples representing seven major immune cell types (CD4+ and CD8+ T cells, B cells, NK cells, neutrophils, monocytes and macrophages) and adipocytes from breast tissue. We required at least three samples for each cell type. To derive signature matrices, a custom limma-based wrapper function was used for all pairwise comparisons between candidate cell types within each tissue. In addition, we constructed a generic matrix that is used for a hierarchical deconvolution strategy and for cell-type deconvolution of general epithelial tissues. To do this, the miRNA profiles were categorized into three major cell groups (epithelial, fibroblast and immune), and differentially expressed miRNAs were selected by comparisons among the three groups, each comparing one group with the other two. The reference was then obtained by averaging the expression values of each selected miRNA within each of the three main cell groups. Finally, a second reference matrix consisting of only the seven immune cell types was constructed to estimate their relative proportions. We required that the second matrix share no overlap with the first matrix, which comprises the three cell-type groups. This strategy ensures that the inference of immune subtype fractions is not significantly confounded by variation in the epithelial and fibroblast proportions, and has been validated by a previous study of DNA methylation-based deconvolution.

Deconvolution method development for bulk tissue data

As mentioned above, one application of the identified marker miRNAs and the associated reference profiles is cellular deconvolution, in which the reference profiles generated in the first stage are used to estimate cellular fractions for bulk samples. As the benchmark method, we leveraged the statistical framework of Robust Partial Correlation (RPC), which has been used for DNA methylation and gene expression data. This method has shown high accuracy and robustness to noise in many scenarios. Four other widely used statistical methods, namely solving linear equations (SLE), non-negative least squares (NNLS), support vector regression (SVR) and quadratic programming (QP), were also included for comparison. In addition, we used a tree-based strategy to estimate cell-type fractions for all nine cell types (epithelial, fibroblast and the seven immune subtypes listed above) in any given tissue. Briefly, we apply DeconmiR with RPC and the generic reference to obtain cell-type fractions for the epithelial, fibroblast and total immune components. Then, we reapply DeconmiR with RPC and the immune reference database to estimate relative proportions for all seven immune cell subtypes. Finally, these relative proportions are rescaled by multiplying them by the total immune fraction estimated in the first step to obtain absolute proportions of all seven immune subtypes in the sample of interest. This hierarchical approach has shown advantages over constructing a single reference that encompasses all cell types⁴⁰. All these functions and references were wrapped into the R package DeconmiR (https://github.com/zyangx/DeconmiR). We have also developed a comprehensive and interactive web resource (https://www.compgenelab.info/DeconmiR/). With this platform, users can easily obtain cell fraction estimates from miRNA profiling data for a specific tissue of interest. 
To evaluate the concordance between deconvolution results across multi-omics data for matched AML samples from TCGA, we obtained the RNA-seq data and calculated cell fractions for the six immune cell types by applying the well-known deconvolution method CIBERSORT with the associated LM22 reference library²⁹, and obtained the DNA methylome data and applied EpiDISH with its reference library, which is based on cell-type-specific CpGs screened using DNase Hypersensitive Sites (DHS)³⁹. Furthermore, in the absence of flow-cytometry-based estimates of the different cell types, the total immune cell fractions obtained by DeconmiR for solid tumors from TCGA were compared with RNA-seq-based estimates from the ESTIMATE algorithm⁵⁵. To evaluate deconvolution performance, the root-mean-square error (RMSE) and Pearson correlation coefficient (PCC) were calculated for the simulated and real-world datasets.
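The rescaling step of the hierarchical procedure described above amounts to simple arithmetic, sketched here in Python. The two sets of fractions are hypothetical stand-ins for the outputs of the first (generic) and second (immune) deconvolution steps, only three immune subtypes are shown for brevity, and the function name is not part of the DeconmiR package. The renormalization of the relative proportions is a defensive assumption added for this sketch.

```python
def rescale_immune_fractions(total_immune, relative_immune):
    """Convert relative immune-subtype proportions into absolute sample fractions."""
    # Assumption: renormalize so the relative proportions sum to 1 before scaling.
    total_relative = sum(relative_immune.values())
    return {
        subtype: total_immune * value / total_relative
        for subtype, value in relative_immune.items()
    }


# Step 1 (hypothetical output): generic reference, three main components.
step1 = {"epithelial": 0.6, "fibroblast": 0.2, "immune": 0.2}
# Step 2 (hypothetical output): immune reference, relative proportions.
step2 = {"CD4+ T cell": 0.5, "B cell": 0.25, "macrophage": 0.25}

absolute_immune = rescale_immune_fractions(step1["immune"], step2)

# Combine the non-immune fractions with the absolute immune-subtype fractions.
combined = {k: v for k, v in step1.items() if k != "immune"}
combined.update(absolute_immune)
print(combined)
```

Because the immune subtypes are scaled by the total immune fraction, the combined fractions sum to the same total as the first-step estimates.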

Generation of pseudo bulk mixtures for model evaluation

To evaluate whether our method can capture cell composition from the miRNA expression profiles of cell mixtures, we generated in silico mixtures of immune and non-immune cells as simulated bulk tissues. First, we prepared 100 different in silico mixtures of the purified immune cell miRNA profiles, with weights randomly drawn from a uniform (0,1) distribution and constrained to sum to 1. The RMSE between the estimated and known weights of the corresponding cell types was calculated to assess the performance of each algorithm. R² values between estimated and true weights were also calculated.
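A minimal Python sketch of this evaluation setup is given below. It makes several assumptions that the text does not specify: the sum-to-one constraint is imposed by dividing the uniform draws by their total, mixtures are formed as a weighted sum of the pure profiles on whatever scale they are supplied, and R² is taken as the squared Pearson correlation. All function names and toy profiles are hypothetical.

```python
import math
import random


def random_weights(n_types, rng):
    """Draw weights from Uniform(0, 1) and rescale them so they sum to 1."""
    raw = [rng.random() for _ in range(n_types)]
    total = sum(raw)
    return [w / total for w in raw]


def mix_profiles(pure_profiles, weights):
    """Weighted sum of pure cell-type profiles (one list of values per cell type)."""
    n_features = len(pure_profiles[0])
    return [
        sum(w * profile[i] for w, profile in zip(weights, pure_profiles))
        for i in range(n_features)
    ]


def rmse(estimated, true):
    """Root-mean-square error between estimated and true weights."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true))


def r_squared(estimated, true):
    """Squared Pearson correlation between estimated and true weights."""
    n = len(true)
    mean_e, mean_t = sum(estimated) / n, sum(true) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, true))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov ** 2 / (var_e * var_t)


rng = random.Random(0)                      # fixed seed for reproducibility
weights = random_weights(6, rng)            # six immune cell types
toy_mixture = mix_profiles([[8.0, 0.0], [0.0, 4.0]], [0.25, 0.75])
print(weights, toy_mixture)
print(rmse([0.6, 0.4], [0.5, 0.5]), r_squared([0.1, 0.2, 0.3], [0.2, 0.4, 0.6]))
```

Repeating `random_weights` and `mix_profiles` 100 times and passing each mixture to a deconvolution routine would give the estimated weights to score with `rmse` and `r_squared`.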

Flow cytometry and miRNA expression analysis of adult whole blood samples

We further evaluated our deconvolution method on an independent cohort of samples collected in-house for validation. To do this, we collected whole blood samples from 14 adult donors (12 males and 2 females; average age 51.9 years) at the First Affiliated Hospital, Wannan Medical College (Wuhu, Anhui, P. R. China), with their informed consent. Each of the 14 whole blood samples was divided into two equal aliquots, each containing a different anticoagulant: heparin, citrate or EDTA. For each sample, portions of the aliquot in heparin were used for conventional immune profiling, including flow cytometry, manual 5-part white blood cell differential and CBC with automated 5-part white blood cell differential, so that proportions of eight different cell types (CD4+ T cells, CD8+ T cells, B cells, NK cells, monocytes, neutrophils, eosinophils and basophils) were acquired. In parallel, another aliquot of each sample was subjected to high-throughput miRNA sequencing. Details of the RNA extraction and post-processing of the miRNA profiling data are given in Supplementary Methods. All surveys and experiments were approved by the Ethics Committee for Clinical Research of the First Affiliated Hospital of Wannan Medical College. miRNA expression profiles quantified as RPM were obtained and log2-transformed. miRNA sequencing data from this experiment have been deposited in GEO at NCBI and are accessible through accession number GSE268516.

Public external datasets for in-silico mixture construction and validation

Dataset 1 was obtained from the RNAatlas project⁴⁴. This corresponds to miRNA-seq profiles of HMEC, RPTEC and BEC, which were used for in silico mixture sample generation.
Dataset 2 was obtained from the microRNAome project⁴⁵. This corresponds to miRNA-seq profiles of AEC, CF and PSMC, which were also used for in silico mixture sample generation, together with 4 HMEC samples and whole blood samples from 36 individuals used for purity estimation.
Dataset 3 was obtained from Florence et al. (GEO Accession: GSE28489)⁸³. This dataset includes microarray-based miRNA expression profiles for six sorted immune cell types (CD4+ T cell, CD8+ T cell, B cell, NK cell, monocyte, neutrophil).
Dataset 4 was obtained from the TCGA acute myeloid leukemia (AML) cohort and 16 other solid cancer cohorts, including BLCA, BRCA, CESC, COAD, ESCA, HNSC, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PRAD, READ, THCA and UCEC⁹. For the AML cohort, miRNA expression profiling data from whole blood samples were obtained. For the solid cancer cohorts, miRNA expression profiling data from both normal and tumor samples were used.
Dataset 5 was obtained from Maider et al. (GEO Accession: GSE89042)⁴⁷. This dataset includes miRNA microarray data profiled on whole blood from 38 healthy individuals aged 24 to 79.
Dataset 6 was obtained from Togami et al. (GEO Accession: GSE182152)⁵⁰. This corresponds to miRNA profiling data from whole blood of 62 coronavirus disease 2019 (COVID-19) patients and healthy controls, profiled on two different sequencing platforms.
Dataset 7 was obtained from Zheng et al. (GEO Accession: GSE157859)⁵¹. This corresponds to miRNA profiling data from peripheral blood mononuclear cells (PBMC) of 18 COVID-19 patients with different clinical manifestations at different clinical stages.
Dataset 8 was obtained from Pavel et al. (GEO Accession: GSE93284)⁵⁷. This corresponds to miRNA profiling data from a cohort of 347 bronchial airway samples with different smoking exposure statuses.
Dataset 9 was obtained from Yu et al. (GEO Accession: GSE110907)⁵⁸. This corresponds to miRNA profiles from a cohort of 48 pairs of lung adenocarcinoma and non-malignant lung tissue.

For the TCGA AML cohort of Dataset 4, we also obtained the genome-wide RNA-seq and DNA methylome 450K data for the matched samples. For the other cancer types from TCGA, as well as Datasets 6 and 7, the RNA-seq data for matched samples were also obtained. The detailed description, data processing and reference construction procedures for all of the above datasets can be found in Supplementary Methods.

Data quality control and cell type adjustment for biomarker identification

To determine the associations between inter-sample variability and cell type heterogeneity in each dataset, the top components of data variation were obtained using singular value decomposition (SVD). The number of significantly variable singular vectors was determined by Random Matrix Theory (RMT). The significant components of variation were then correlated with phenotypic and technical factors, as well as inferred cell type fractions, to assess the relative contributions of the biological variables to data variability, and the results were represented as a P-value heatmap between components and the different factors. To identify miRNAs that are truly associated with biological factors, such as aging or smoking, P-values of association were determined by multiple linear regression in each dataset, with the estimated cell-type fractions included as covariates for adjustment.
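The effect of including a cell-type fraction as a covariate in a multiple linear regression can be illustrated with a small, self-contained Python sketch. It fits ordinary least squares through the normal equations and compares the coefficient for the factor of interest with and without adjustment. The data are invented so that expression depends only on the cell-type fraction, which in turn tracks age; the helper names are hypothetical, only one fraction is used as a covariate to avoid collinearity with the intercept, and P-values are not computed here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols_coefficients(design, response):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Invented toy data: expression is driven only by the cell-type fraction,
# but the fraction itself increases with age.
age = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
fraction = [0.10, 0.20, 0.15, 0.35, 0.30, 0.45]
expression = [1.0 + 4.0 * f for f in fraction]

# Unadjusted model: expression ~ intercept + age
unadjusted = ols_coefficients([[1.0, a] for a in age], expression)
# Adjusted model: expression ~ intercept + age + cell-type fraction
adjusted = ols_coefficients([[1.0, a, f] for a, f in zip(age, fraction)], expression)

print("age coefficient, unadjusted:", unadjusted[1])
print("age coefficient, adjusted:  ", adjusted[1])
```

In this toy setting the unadjusted model attributes part of the composition effect to age, whereas the adjusted model assigns it to the cell-type fraction, which is the rationale for the covariate adjustment described above.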

miRNA target prediction and functional enrichment analysis

For each immune cell-specific miRNA set obtained, we collected the experimentally validated targets from miRTarBase⁸⁴. We then performed enrichment analysis of the miRNA targets on Gene Ontology terms using the clusterProfiler package to facilitate the functional annotation of cell type-dependent miRNAs⁸⁵. Enriched GO terms for biological processes were selected at an FDR-adjusted P-value < 0.05.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The newly generated miRNA-seq data in this study have been deposited in the NCBI’s GEO database under accession number GSE268516. Other data analyzed in this manuscript are publicly available from GEO, the TCGA data portal (https://gdc.cancer.gov), the FANTOM5 (https://fantom.gsc.riken.jp/5/suppl/De_Rie_et_al_2017/), the RNA Atlas (http://r2platform.com/rna_atlas/) and the microRNAome (https://bioconductor.org/packages/devel/data/experiment/html/microRNAome.html). Accession codes for data from GEO include GSE100467, GSE138734, GSE28489, GSE89042, GSE182152, GSE157859, GSE93284 and GSE110907. All the publicly available datasets used in this work are summarized in Supplementary Data 1. Source data are provided with this paper.

Code availability

The open-source DeconmiR R package is available from the GitHub repository (https://github.com/zyangx/DeconmiR/). The DeconmiR web server is available at https://www.compgenelab.info/DeconmiR/. A CodeOcean capsule is also available at https://codeocean.com/capsule/2943926/tree.

References

Bartel, D. P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297 (2004).
Hwang, H. W. & Mendell, J. T. MicroRNAs in cell proliferation, cell death, and tumorigenesis. Br. J. Cancer 94, 776–780 (2006).
Landgraf, P. et al. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401–1414 (2007).
Ludwig, N. et al. Distribution of miRNA expression across human tissues. Nucleic Acids Res. 44, 3865–3877 (2016).
Schwarzer, A. et al. The non-coding RNA landscape of human hematopoiesis and leukemia. Nat. Commun. 8, 218 (2017).
Yu, H. R. et al. Comparison of the functional microRNA expression in immune cell subsets of neonates and adults. Front Immunol. 7, 615 (2016).
Wang, J., Chen, J. & Sen, S. MicroRNA as Biomarkers and Diagnostics. J. Cell Physiol. 231, 25–30 (2016).
Condrat, C. E. et al. miRNAs as biomarkers in disease: latest findings regarding their role in diagnosis and prognosis. Cells 9 https://doi.org/10.3390/cells9020276 (2020).
Chu, A. et al. Large-scale profiling of microRNAs for The Cancer Genome Atlas. Nucleic Acids Res. 44, e3 (2016).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Shen-Orr, S. S. et al. Cell type-specific gene expression differences in complex tissues. Nat. Methods 7, 287–289 (2010).
Wang, N. et al. Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues. Sci. Rep. 6, 18909 (2016).
Moore, J. L., Du, Z. & Bao, Z. Systematic quantification of developmental phenotypes at single-cell resolution during embryogenesis. Development 140, 3266–3274 (2013).
Lyck, L., Kroigard, T. & Finsen, B. Unbiased cell quantification reveals a continued increase in the number of neocortical neurones during early post-natal development in mice. Eur. J. Neurosci. 26, 1749–1764 (2007).
Altboum, Z. et al. Digital cell quantification identifies global immune cell dynamics during influenza infection. Mol. Syst. Biol. 10, 720 (2014).
Capurro, A., Bodea, L. G., Schaefer, P., Luthi-Carter, R. & Perreau, V. M. Computational deconvolution of genome wide expression data from Parkinson’s and Huntington’s disease brain tissues using population-specific expression analysis. Front. Neurosci. 8, 441 (2014).
Carr, E. J. et al. The cellular composition of the human immune system is shaped by age and cohabitation. Nat. Immunol. 17, 461–468 (2016).
Piera-Velazquez, S. & Jimenez, S. A. Endothelial to Mesenchymal Transition: Role in Physiology and in the Pathogenesis of Human Diseases. Physiol. Rev. 99, 1281–1324 (2019).
Galon, J. et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 313, 1960–1964 (2006).
Croci, D. O. et al. Dynamic cross-talk between tumor and immune cells in orchestrating the immunosuppressive network at the tumor microenvironment. Cancer Immunol. Immunother. 56, 1687–1700 (2007).
Fridman, W. H., Pages, F., Sautes-Fridman, C. & Galon, J. The immune contexture in human tumours: impact on clinical outcome. Nat. Rev. Cancer 12, 298–306 (2012).
Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).
Faridani, O. R. et al. Single-cell sequencing of the small-RNA transcriptome. Nat. Biotechnol. 34, 1264–1266 (2016).
Wang, N. et al. Single-cell microRNA-mRNA co-sequencing reveals non-genetic heterogeneity and mechanisms of microRNA regulation. Nat. Commun. 10, 95 (2019).
Isakova, A., Neff, N. & Quake, S. R. Single-cell quantification of a broad RNA spectrum reveals unique noncoding patterns associated with cell types and states. Proc. Natl. Acad. Sci. USA 118 https://doi.org/10.1073/pnas.2113568118 (2021).
Erkkila, T. et al. Probabilistic analysis of gene expression measurements from heterogeneous tissues. Bioinformatics 26, 2571–2577 (2010).
Avila Cobos, F., Alquicira-Hernandez, J., Powell, J. E., Mestdagh, P. & De Preter, K. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat. Commun. 11, 5650 (2020).
Sturm, G. et al. Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology. Bioinformatics 35, i436–i445 (2019).
Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).
Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37, 773–782 (2019).
Koestler, D. C. et al. Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL). BMC Bioinform. 17, 120 (2016).
Gervin, K. et al. Systematic evaluation and validation of reference and library selection methods for deconvolution of cord blood DNA methylation data. Clin. Epigenetics 11, 125 (2019).
Bell-Glenn, S., Thompson, J. A., Salas, L. A. & Koestler, D. C. A novel framework for the identification of reference DNA methylation libraries for reference-based deconvolution of cellular mixtures. Front. Bioinform. 2 https://doi.org/10.3389/fbinf.2022.835591 (2022).
Juzenas, S. et al. A comprehensive, cell specific microRNA catalogue of human peripheral blood. Nucleic Acids Res. 45, 9290–9301 (2017).
Smyth, G. K. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, 3 (2004).
Seddiki, N., Brezar, V., Ruffin, N., Levy, Y. & Swaminathan, S. Role of miR-155 in the regulation of lymphocyte immune function and disease. Immunology 142, 32–38 (2014).
Chen, L., Gao, D., Shao, Z., Zheng, Q. & Yu, Q. miR-155 indicates the fate of CD4(+) T cells. Immunol. Lett. 224, 40–49 (2020).
Cichocki, F. et al. Cutting edge: microRNA-181 promotes human NK cell development by regulating Notch signaling. J. Immunol. 187, 6171–6175 (2011).
Teschendorff, A. E., Breeze, C. E., Zheng, S. C. & Beck, S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinform. 18, 105 (2017).
Zheng, S. C. et al. A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix. Epigenomics 10, 925–940 (2018).
Nadel, B. B. et al. The Gene Expression Deconvolution Interactive Tool (GEDIT): accurate cell type quantification from gene expression data. Gigascience 10 https://doi.org/10.1093/gigascience/giab002 (2021).
Gong, T. & Szustakowski, J. D. DeconRNASeq: a statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data. Bioinformatics 29, 1083–1085 (2013).
de Rie, D. et al. An integrated expression atlas of miRNAs and their promoters in human and mouse. Nat. Biotechnol. 35, 872–878 (2017).
Lorenzi, L. et al. The RNA Atlas expands the catalog of human non-coding RNAs. Nat. Biotechnol. 39, 1453–1465 (2021).
McCall, M. N. et al. Toward the human cellular microRNAome. Genome Res. 27, 1769–1781 (2017).
Kinser, H. E. & Pincus, Z. MicroRNAs as modulators of longevity and the aging process. Hum. Genet. 139, 291–308 (2020).
Munoz-Culla, M. et al. Progressive changes in non-coding RNA profile in leucocytes with age. Aging (Albany NY) 9, 1202–1218 (2017).
Zhang, J. Y. et al. Single-cell landscape of immunological responses in patients with COVID-19. Nat. Immunol. 21, 1107–1118 (2020).
Wen, W. et al. Immune cell profiling of COVID-19 patients in the recovery stage by single-cell sequencing. Cell Discov. 6, 31 (2020).
Togami, Y. et al. Significance of interferon signaling based on mRNA-microRNA integration and plasma protein analyses in critically ill COVID-19 patients. Mol. Ther. Nucleic Acids 29, 343–353 (2022).
Zheng, H. Y. et al. Longitudinal transcriptome analyses show robust T cell immunity during recovery from COVID-19. Signal Transduct. Target Ther. 5, 294 (2020).
Ren, X. et al. COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas. Cell 184, 1895–1913.e1819 (2021).
Bingle, L., Brown, N. J. & Lewis, C. E. The role of tumour-associated macrophages in tumour progression: implications for new anticancer therapies. J. Pathol. 196, 254–265 (2002).
Lewis, C. E. & Pollard, J. W. Distinct role of macrophages in different tumor microenvironments. Cancer Res. 66, 605–612 (2006).
Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).
Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007).
Pavel, A. B. et al. Alterations in Bronchial Airway miRNA Expression for Lung Cancer Detection. Cancer Prev. Res. (Phila.) 10, 651–659 (2017).
Yu, N. et al. Identification of tumor suppressor miRNAs by integrative miRNA and mRNA sequencing of matched tumor-normal samples in lung adenocarcinoma. Mol. Oncol. 13, 1356–1368 (2019).
Zheng, S. C., Breeze, C. E., Beck, S. & Teschendorff, A. E. Identification of differentially methylated cell types in epigenome-wide association studies. Nat. Methods 15, 1059–1066 (2018).
Jaakkola, M. K. & Elo, L. L. Estimating cell type-specific differential expression using deconvolution. Brief Bioinform. 23 https://doi.org/10.1093/bib/bbab433 (2022).
Xu, F. et al. Up-regulation of microRNA-183-3p is a potent prognostic marker for lung adenocarcinoma of female non-smokers. Clin. Transl. Oncol. 16, 980–985 (2014).
Zhang, T. et al. Clinical significance of miR-183-3p and miR-182-5p in NSCLC and their correlation. Cancer Manag. Res. 13, 3539–3550 (2021).
Hagihara, Y. et al. Identification of tumor-suppressive miR-139-3p-regulated genes: TRIP13 as a therapeutic target in lung adenocarcinoma. Cancers (Basel) 15 https://doi.org/10.3390/cancers15235571 (2023).
Chen, X. et al. Paeonol inhibits melanoma growth by targeting PD1 through upregulation of miR-139-5p. Biochem. Biophys. Res. Commun. 656, 86–96 (2023).
Zheng, X., Zhang, Y., Wu, S., Jiang, B. & Liu, Y. MiR-139-3p Targets CHEK1 modulating DNA repair and cell viability in lung squamous carcinoma cells. Mol. Biotechnol. 64, 832–840 (2022).
Chen, Z. & Wu, A. Progress and challenge for computational quantification of tissue immune cells. Brief Bioinform. 22 https://doi.org/10.1093/bib/bbaa358 (2021).
Nadel, B. B. et al. Systematic evaluation of transcriptomics-based deconvolution methods and references using thousands of clinical samples. Brief Bioinform. 22 https://doi.org/10.1093/bib/bbab265 (2021).
Song, J. & Kuan, P. F. A systematic assessment of cell type deconvolution algorithms for DNA methylation data. Brief Bioinform. 23 https://doi.org/10.1093/bib/bbac449 (2022).
Zhou, L., Li, X., Liu, Q., Zhao, F. & Wu, J. Small RNA transcriptome investigation based on next-generation sequencing technology. J. Genet. Genomics 38, 505–513 (2011).
Altschuler, S. J. & Wu, L. F. Cellular heterogeneity: do differences make a difference? Cell 141, 559–563 (2010).
Kennedy, E. M. et al. Placental microRNA expression associates with birthweight through control of adipokines: results from two independent cohorts. Epigenetics 16, 770–782 (2021).
Heinemann, F. G. et al. Serum miR-122-5p and miR-206 expression: non-invasive prognostic biomarkers for renal cell carcinoma. Clin. Epigenetics 10, 11 (2018).
Ormseth, M. J. et al. Development and Validation of a MicroRNA Panel to Differentiate Between Patients with Rheumatoid Arthritis or Systemic Lupus Erythematosus and Controls. J. Rheumatol. 47, 188–196 (2020).
Salas, L. A. et al. An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray. Genome Biol. 19, 64 (2018).
Salas, L. A. et al. Enhanced cell deconvolution of peripheral blood using DNA methylation for high-resolution immune profiling. Nat. Commun. 13, 761 (2022).
Hunt, G. J., Freytag, S., Bahlo, M. & Gagnon-Bartsch, J. A. dtangle: accurate and robust cell type deconvolution. Bioinformatics 35, 2093–2099 (2019).
Andrade Barbosa, B. et al. Bayesian log-normal deconvolution for enhanced in silico microdissection of bulk gene expression data. Nat. Commun. 12, 6106 (2021).
Avila Cobos, F., Vandesompele, J., Mestdagh, P. & De Preter, K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics 34, 1969–1979 (2018).
Wang, X., Park, J., Susztak, K., Zhang, N. R. & Li, M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10, 380 (2019).
Tsoucas, D. et al. Accurate estimation of cell-type composition from gene expression data. Nat. Commun. 10, 2975 (2019).
Cobos, F. A. et al. Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes. Genome Biol. 24, 177 (2023).
Garmire, L. X. et al. Challenges and perspectives in computational deconvolution of genomics data. Nat. Methods 21, 391–400 (2024).
Allantaz, F. et al. Expression profiling of human immune cell subsets identifies miRNA-mRNA regulatory relationships correlated with cell type specific expression. PLoS ONE 7, e29979 (2012).
Huang, H. Y. et al. miRTarBase 2020: updates to the experimentally validated microRNA-target interaction database. Nucleic Acids Res. 48, D148–D154 (2020).
Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Camb.) 2, 100141 (2021).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (91959106 to Z.Y.), the Foundation of the Shanghai Municipal Education Commission (24RGZNC02 to Z.Y.), the Key University Science Research Project of Anhui Province (2023AH051768 to J.L.), the Climbing Peak Training Program for Innovative Technology team of Yijishan Hospital, Wannan Medical College (PF201904 to H.Y.), the Peak Training Program for Scientific Research of Yijishan Hospital, Wannan Medical College (GF2019G15 to H.Y.), the Open Project of Key Laboratory of Non-coding RNA Transformation Research of Anhui Higher Education Institution, Wannan Medical College (RNA202205 to H.Y.), and the talent project of the First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College) (YR202422 to H.Y.).

Author information

These authors contributed equally: Shaoying Zhu, Hui Yang.

Authors and Affiliations

Center for Medical Research and Innovation of Pudong Hospital, Fudan University Pudong Medical Center, and Shanghai Key Laboratory of Medical Epigenetics, International Co-laboratory of Medical Epigenetics and Metabolism (Ministry of Science and Technology), Institutes of Biomedical Sciences, Fudan University, 200032, Shanghai, China
Shaoying Zhu & Zhen Yang
Anhui Province Key Laboratory of Non-coding RNA Basic and Clinical Transformation, Tissue Bank, Central Laboratory, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui, China
Hui Yang, Wei Huang & Qi Chen
Department of Gastrointestinal Surgery, The First Affiliated Hospital of Wannan Medical College (Yijishan Hospital of Wannan Medical College), Wuhu, Anhui, China
Jun Liu & Qingsheng Fu
CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, 200031, Shanghai, China
Andrew E. Teschendorff
Shanghai Fifth People’s Hospital, Fudan University, 200032, Shanghai, China
Yungang He
Intelligent Medicine Institute, Shanghai Medical College, Fudan University, 131 Dongan Road, 200032, Shanghai, China
Yungang He & Zhen Yang

Authors

Shaoying Zhu
Hui Yang
Jun Liu
Qingsheng Fu
Wei Huang
Qi Chen
Andrew E. Teschendorff
Yungang He
Zhen Yang

Contributions

Z.Y. conceived and designed the study. Z.Y. wrote the manuscript. S.Z. and Y.H. performed the statistical and bioinformatics analyses. J.L., Q.F., W.H., and Q.C. collected and processed samples. H.Y. performed experimental validation analysis. A.E.T. provided useful feedback.

Corresponding author

Correspondence to Zhen Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1-13

Reporting Summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhu, S., Yang, H., Liu, J. et al. An improved reference library and method for accurate cell-type deconvolution of bulk-tissue miRNA data. Nat Commun 16, 5508 (2025). https://doi.org/10.1038/s41467-025-60521-x

Received: 26 June 2024
Accepted: 27 May 2025
Published: 01 July 2025
Version of record: 01 July 2025
DOI: https://doi.org/10.1038/s41467-025-60521-x
