Network clustering algorithms and preprocessing pipelines for robust cell type identification in single-cell RNA sequencing data

Nasrollahi, Fatemeh Sadat Fatemi; Silva, Filipi Nascimento; Liu, Shiwei; Chaudhuri, Soumilee; Yu, Meichen; Wang, Juexin; Nho, Kwangsik; Saykin, Andrew J.; Bennett, David A.; Sporns, Olaf; Fortunato, Santo

doi:10.1038/s41598-026-49033-w

Article
Open access
Published: 15 May 2026

Fatemeh Sadat Fatemi Nasrollahi¹,
Filipi Nascimento Silva¹,
Shiwei Liu²,
Soumilee Chaudhuri²,
Meichen Yu²,
Juexin Wang³,
Kwangsik Nho²,
Andrew J. Saykin²,
David A. Bennett⁴,
Olaf Sporns⁵ &
Santo Fortunato¹

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies provide unprecedented resolution, capturing the transcriptome at the level of individual cells. One of the biggest challenges in scRNA-seq data analysis is cell type annotation, which is usually inferred by cell separation approaches. In silico algorithms that accurately identify individual cell types in ongoing single-cell sequencing studies are crucial for unlocking cellular heterogeneity and understanding the biological basis of diseases. In this study, we focus on robustly identifying cell types in scRNA-seq data. We conduct a comparative analysis of methods established in biology, such as Seurat, Leiden, and WGCNA, alongside the network-based method Infomap, statistical inference via Stochastic Block Models (SBM), and single-cell Graph Neural Networks (scGNN). We also analyze preprocessing pipelines to identify and optimize their key components, explicitly considering their role in mitigating inherent data noise and potential batch effects for robust cell type identification. Leveraging three independent datasets (PBMC, ROSMAP, and MOp), we apply clustering algorithms to cell-cell networks derived from gene expression data. Our findings reveal that clusters identified by multiresolution Infomap and Leiden show a closer alignment, with Infomap standing out as a particularly effective approach. Infomap notably offers valuable insights for the precise characterization of cellular landscapes related to neurodegeneration and immunology in scRNA-seq data.

Acknowledgements

ROSMAP is supported by P30AG10161, P30AG72975, R01AG15819, R01AG17917, U01AG46152, and U01AG61356. This work used Indiana University Jetstream2 CPU resources through allocation BIO230158 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296. The compute instance had 32 CPU cores and 125 GB of RAM.

Funding

F. N. S. was supported by NIH grant R01-AI175239. S. L. was supported by the CLEAR-AD Diversity Scholarship (NIH U19 AG074879). S. C. was supported by the ADNI Health Equity Scholarship (ADNI HESP), a sub-award of NIA grant U19 AG024904. M. Y. was supported by Alzheimer’s Association grant AARF-22-722571. A. J. S. was supported by multiple NIH grants (P30 AG010133, P30 AG072976, R01 AG019771, R01 AG057739, U19 AG024904, R01 LM013463, R01 AG068193, T32 AG071444, U01 AG068057, U01 AG072177, and U19 AG074879). K. N. was supported by NIH grants R01LM012535, U01AG072177, and U19AG0748790. J. W. was supported by NIH grant R01DK138504. S. F. was supported by NIH grants U19 AG074879, U01AG072177, and R01-AI175239.

Author information

Authors and Affiliations

Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Fatemeh Sadat Fatemi Nasrollahi, Filipi Nascimento Silva & Santo Fortunato
Center for Neuroimaging and the Indiana Alzheimer’s Disease Research Center, Indiana University, IN, USA
Shiwei Liu, Soumilee Chaudhuri, Meichen Yu, Kwangsik Nho & Andrew J. Saykin
Luddy School of Informatics, Computing, and Engineering, Indiana University, Indianapolis, IN, USA
Juexin Wang
Rush Alzheimer’s Disease Center (Drs. Bennett, Schneider, and Wilson) and Rush Institute for Healthy Aging (Drs. Bienias and Evans), Rush University Medical Center, Chicago, IL, USA
David A. Bennett
Department of Psychology, Indiana University, IN, USA
Olaf Sporns

Corresponding authors

Correspondence to Fatemeh Sadat Fatemi Nasrollahi or Santo Fortunato.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The Omega Index $\Omega (s_1, s_2)$ measures the similarity between two clustering solutions, $s_1$ and $s_2$, that may include overlapping clusters. To calculate it, we first determine the observed agreement, $\text {Obs}(s_1, s_2)$: the fraction of pairs of elements that the two solutions place together in the same number of clusters. This is expressed as

$$\text {Obs}(s_1, s_2) = \sum _{j=0}^{\min (J,K)} \frac{A_j}{N}$$

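where $A_j$ is the number of pairs that both solutions place together in exactly $j$ clusters, $N$ is the total number of pairs, and $J$ and $K$ are the maximum numbers of clusters in which any pair appears together in solutions $s_1$ and $s_2$, respectively. We then calculate the expected agreement, $\text {Exp}(s_1, s_2)$, as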

$$\text {Exp}(s_1, s_2) = \sum _{j=0}^{\min (J,K)} \frac{N_{j1} \cdot N_{j2}}{N^2}$$

where $N_{j1}$ and $N_{j2}$ are the numbers of pairs placed together in exactly $j$ clusters in solutions $s_1$ and $s_2$, respectively. Finally, the Omega Index is calculated as

$$\Omega (s_1, s_2) = \frac{\text {Obs}(s_1, s_2) - \text {Exp}(s_1, s_2)}{1 - \text {Exp}(s_1, s_2)}$$

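The index equals 1 when the two solutions agree perfectly and is 0 when the observed agreement matches the agreement expected by chance; it can take negative values when agreement falls below that expectation.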
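As an illustration, the definition above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function names `pair_counts` and `omega_index`, the representation of each solution as a list of sets, and the toy clusterings are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_counts(clusters):
    """Count, for every unordered pair, how many clusters contain both members."""
    counts = Counter()
    for cluster in clusters:
        for pair in combinations(sorted(cluster), 2):
            counts[pair] += 1
    return counts


def omega_index(solution1, solution2, items):
    """Omega Index between two (possibly overlapping) clustering solutions.

    Assumption: each solution is a list of sets whose members all appear
    in `items`; items must be sortable (e.g. integers or strings).
    """
    n_items = len(set(items))
    n_pairs = n_items * (n_items - 1) // 2  # N: total number of pairs
    if n_pairs == 0:
        raise ValueError("need at least two items")

    c1 = pair_counts(solution1)
    c2 = pair_counts(solution2)

    # Pairs that share at least one cluster in either solution.
    seen = set(c1) | set(c2)

    # Observed agreement: pairs absent from both solutions agree at j = 0;
    # the remaining pairs agree when their co-membership counts match.
    agreeing = (n_pairs - len(seen)) + sum(1 for p in seen if c1[p] == c2[p])
    obs = agreeing / n_pairs

    # N_j1 and N_j2: number of pairs placed together in exactly j clusters.
    n1 = Counter(c1.values())
    n2 = Counter(c2.values())
    n1[0] = n_pairs - len(c1)
    n2[0] = n_pairs - len(c2)
    exp = sum(n1[j] * n2[j] for j in n1) / n_pairs**2

    if exp == 1.0:
        # Degenerate case: every pair has the same count in both solutions.
        return 1.0
    return (obs - exp) / (1 - exp)


items = range(6)
disjoint = [{0, 1, 2}, {3, 4, 5}]
overlapping = [{0, 1, 2, 3}, {2, 3, 4, 5}]
print(omega_index(disjoint, disjoint, items))     # identical solutions: 1.0
print(omega_index(disjoint, overlapping, items))  # 54/129, about 0.419
```

Counting co-memberships per pair, rather than comparing cluster labels directly, is what lets the index handle elements that belong to more than one cluster. The pairwise loop is quadratic in cluster size, so this sketch suits small examples rather than full single-cell datasets.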

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nasrollahi, F.S.F., Silva, F.N., Liu, S. et al. Network clustering algorithms and preprocessing pipelines for robust cell type identification in single-cell RNA sequencing data. Sci Rep (2026). https://doi.org/10.1038/s41598-026-49033-w

Received: 08 July 2025
Accepted: 13 April 2026
Published: 15 May 2026
DOI: https://doi.org/10.1038/s41598-026-49033-w