Abstract
Single-cell RNA-seq (scRNA-seq) technologies provide unprecedented resolution, capturing transcriptomes at the level of individual cells. One of the biggest challenges in scRNA-seq data analysis is cell type annotation, which is usually inferred by cell separation approaches. In-silico algorithms that accurately identify individual cell types in ongoing single-cell sequencing studies are crucial for unlocking cellular heterogeneity and understanding the biological basis of diseases. In this study, we focus on robustly identifying cell types in scRNA-seq data. We conduct a comparative analysis of methods established in biology, such as Seurat, Leiden, and WGCNA, as well as the network-based method Infomap, statistical inference via Stochastic Block Models (SBM), and single-cell Graph Neural Networks (scGNN). We also analyze preprocessing pipelines to identify and optimize their key components, explicitly considering their role in mitigating inherent data noise and potential batch effects for robust cell type identification. Leveraging three independent datasets, PBMC, ROSMAP, and MOp, we apply clustering algorithms to cell-cell networks derived from gene expression data. Our findings reveal that clusters identified by multiresolution Infomap and Leiden show a closer alignment, with Infomap standing out as a particularly effective approach. Infomap notably offers valuable insights for the precise characterization of cellular landscapes related to neurodegeneration and immunology in scRNA-seq.
Acknowledgements
ROSMAP is supported by P30AG10161, P30AG72975, R01AG15819, R01AG17917, U01AG46152, and U01AG61356. This work utilized Indiana University Jetstream2 CPU resources through allocation BIO230158 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296. The instance used has 32 CPU cores and 125 GB of RAM.
Funding
F. N. S. was supported by NIH grant R01-AI175239. S. L. was supported by the CLEAR-AD Diversity Scholarship (NIH U19 AG074879). S. C. was supported by the ADNI Health Equity Scholarship (ADNI HESP), a sub-award of NIA grant U19 AG024904. M. Y. was supported by the Alzheimer’s Association (AARF-22-722571). A. J. S. was supported by multiple NIH grants (P30 AG010133, P30 AG072976, R01 AG019771, R01 AG057739, U19 AG024904, R01 LM013463, R01 AG068193, T32 AG071444, U01 AG068057, U01 AG072177, and U19 AG074879). K. N. was supported by NIH grants R01LM012535, U01AG072177, and U19AG0748790. J. W. was supported by NIH grant R01DK138504. S. F. was supported by NIH grants U19 AG074879, U01AG072177, and R01-AI175239.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
The Omega Index \(\Omega (s_1, s_2)\) measures the similarity between two clustering solutions that may include overlapping clusters. To calculate it, we first determine the observed agreement, \(\text {Obs}(s_1, s_2)\), as the proportion of pairs of elements that both clustering solutions place together in the same number of clusters. This is expressed as
\[\text {Obs}(s_1, s_2) = \frac{1}{N} \sum _{j} A_j,\]
where \(A_j\) is the count of pairs that both solutions agree to assign to \(j\) clusters, and \(N\) is the total number of pairs. We then calculate the expected agreement, \(\text {Exp}(s_1, s_2)\), as
\[\text {Exp}(s_1, s_2) = \frac{1}{N^2} \sum _{j} N_{j1} N_{j2},\]
where \(N_{j1}\) and \(N_{j2}\) represent the total pairs assigned to \(j\) clusters in solutions 1 and 2, respectively. Finally, the Omega Index is calculated as
\[\Omega (s_1, s_2) = \frac{\text {Obs}(s_1, s_2) - \text {Exp}(s_1, s_2)}{1 - \text {Exp}(s_1, s_2)}.\]
This index is at most 1, with 1 indicating perfect agreement between the clustering solutions; a value of 0 corresponds to the agreement expected by chance.
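As an illustration of the definition above, the following minimal Python sketch computes the Omega Index from two clustering solutions. It is not the implementation used in this study: the function names `pair_counts` and `omega_index`, and the representation of each solution as a list of sets of element labels, are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def pair_counts(clusters, items):
    """Map every unordered pair of items to the number of clusters containing both."""
    counts = {pair: 0 for pair in combinations(sorted(items), 2)}
    for cluster in clusters:
        for pair in combinations(sorted(cluster), 2):
            counts[pair] += 1
    return counts


def omega_index(s1, s2):
    """Omega Index between two (possibly overlapping) clustering solutions.

    Each solution is a list of sets of hashable, sortable element labels
    (an assumed representation for this sketch).
    """
    items = set().union(*s1, *s2)
    c1 = pair_counts(s1, items)
    c2 = pair_counts(s2, items)
    n_pairs = len(c1)  # N: total number of pairs
    if n_pairs == 0:
        raise ValueError("at least two elements are required")

    # Observed agreement: pairs placed together in the same number of clusters.
    obs = sum(1 for pair in c1 if c1[pair] == c2[pair]) / n_pairs

    # Expected agreement: sum over j of N_j1 * N_j2, divided by N^2.
    n_j1 = Counter(c1.values())
    n_j2 = Counter(c2.values())
    exp = sum(n_j1[j] * n_j2[j] for j in n_j1) / n_pairs**2

    if exp == 1.0:
        # Degenerate case: every pair has the same count in both solutions.
        return 1.0
    return (obs - exp) / (1.0 - exp)


if __name__ == "__main__":
    overlapping = [{"a", "b", "c"}, {"c", "d"}]
    print(omega_index(overlapping, overlapping))  # identical solutions -> 1.0
```

The sketch enumerates all pairs explicitly, so its cost grows quadratically with the number of elements; it is intended for checking the formula on small inputs rather than for full scRNA-seq datasets.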
ARI obtained using different methods in ROSMAP: Adjusted Rand Index (ARI) between cell types and detected clusters for SBM, Seurat, Infomap, Leiden, and WGCNA in the ROSMAP full dataset. Both the weighted and unweighted versions of the same networks were considered for algorithms that can handle both. The zoomed-in panel illustrates the ARI across different Markov times using Infomap.
ROSMAP network: Illustration of the networks obtained from the ROSMAP dataset. These networks are generated using Seurat and the alternative pipelines.
ARI vs tuning parameter - ROSMAP: ARI across different resolution parameters for the network generated from the ROSMAP dataset.
MOp network: Illustration of the networks obtained from the MOp dataset. These networks are generated using Seurat and the alternative pipelines.
ARI vs tuning parameter - MOp: ARI across different resolution parameters for the network generated from the MOp dataset.
ARI across different configurations of alternative preprocessing pipelines for the MOp dataset. Same as Fig. 10 but for the top 5000 procedures.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nasrollahi, F.S.F., Silva, F.N., Liu, S. et al. Network clustering algorithms and preprocessing pipelines for robust cell type identification in single-cell RNA sequencing data. Sci Rep (2026). https://doi.org/10.1038/s41598-026-49033-w
DOI: https://doi.org/10.1038/s41598-026-49033-w








