Scientific Reports
Network clustering algorithms and preprocessing pipelines for robust cell type identification in single-cell RNA sequencing data
  • Article
  • Open access
  • Published: 15 May 2026

  • Fatemeh Sadat Fatemi Nasrollahi1,
  • Filipi Nascimento Silva1,
  • Shiwei Liu2,
  • Soumilee Chaudhuri2,
  • Meichen Yu2,
  • Juexin Wang3,
  • Kwangsik Nho2,
  • Andrew J. Saykin2,
  • David A. Bennett4,
  • Olaf Sporns5 &
  • …
  • Santo Fortunato1 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Biological techniques
  • Computational biology and bioinformatics

Abstract

Single-cell RNA sequencing (scRNA-seq) technologies provide unprecedented resolution, capturing the transcriptome at the level of individual cells. One of the biggest challenges in scRNA-seq data analysis is cell type annotation, which is usually inferred by cell separation approaches. In silico algorithms that accurately identify individual cell types in ongoing single-cell sequencing studies are crucial for unlocking cellular heterogeneity and understanding the biological basis of diseases. In this study, we focus on robustly identifying cell types in scRNA-seq data. We conduct a comparative analysis of methods established in biology, such as Seurat, Leiden, and WGCNA, alongside the network-based method Infomap, statistical inference via Stochastic Block Models (SBM), and single-cell Graph Neural Networks (scGNN). We also analyze preprocessing pipelines to identify and optimize their key components, explicitly considering their role in mitigating inherent data noise and potential batch effects for robust cell type identification. Using three independent datasets (PBMC, ROSMAP, and MOp), we apply clustering algorithms to cell-cell networks derived from gene expression data. Our findings reveal that clusters identified by multiresolution Infomap and Leiden show a closer alignment, with Infomap standing out as a particularly effective approach. Infomap notably offers valuable insights for the precise characterization of cellular landscapes related to neurodegeneration and immunology in scRNA-seq data.

Acknowledgements

ROSMAP is supported by P30AG10161, P30AG72975, R01AG15819, R01AG17917, U01AG46152, and U01AG61356. This work used an Indiana University Jetstream2 CPU instance through allocation BIO230158 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296. The instance has 32 CPU cores and 125 GB of RAM.

Funding

F. N. S. was supported by NIH grant R01-AI175239. S. L. was supported by the CLEAR-AD Diversity Scholarship (NIH U19 AG074879). S. C. was supported by the ADNI Health Equity Scholarship (ADNI HESP), a sub-award of NIA grant U19 AG024904. M. Y. was supported by the Alzheimer’s Association (AARF-22-722571). A. J. S. was supported by multiple NIH grants (P30 AG010133, P30 AG072976, R01 AG019771, R01 AG057739, U19 AG024904, R01 LM013463, R01 AG068193, T32 AG071444, U01 AG068057, U01 AG072177, and U19 AG074879). K. N. was supported by NIH grants R01LM012535, U01AG072177, and U19AG0748790. J. W. was supported by NIH grant R01DK138504. S. F. was supported by NIH grants U19 AG074879, U01AG072177, and R01-AI175239.

Author information

Authors and Affiliations

  1. Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA

    Fatemeh Sadat Fatemi Nasrollahi, Filipi Nascimento Silva & Santo Fortunato

  2. Center for Neuroimaging and the Indiana Alzheimer’s Disease Research Center, Indiana University, IN, USA

    Shiwei Liu, Soumilee Chaudhuri, Meichen Yu, Kwangsik Nho & Andrew J. Saykin

  3. Luddy School of Informatics, Computing, and Engineering, Indiana University, Indianapolis, IN, USA

    Juexin Wang

  4. Rush Alzheimer’s Disease Center (Drs. Bennett, Schneider, and Wilson) and Rush Institute for Healthy Aging (Drs. Bienias and Evans), Rush University Medical Center, Chicago, IL, USA

    David A. Bennett

  5. Department of Psychology, Indiana University, IN, USA

    Olaf Sporns

Authors
  1. Fatemeh Sadat Fatemi Nasrollahi
  2. Filipi Nascimento Silva
  3. Shiwei Liu
  4. Soumilee Chaudhuri
  5. Meichen Yu
  6. Juexin Wang
  7. Kwangsik Nho
  8. Andrew J. Saykin
  9. David A. Bennett
  10. Olaf Sporns
  11. Santo Fortunato

Corresponding authors

Correspondence to Fatemeh Sadat Fatemi Nasrollahi or Santo Fortunato.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The Omega Index \(\Omega (s_1, s_2)\) measures the similarity between two clustering solutions, \(s_1\) and \(s_2\), that may include overlapping clusters. To calculate it, we first determine the observed agreement, \(\text {Obs}(s_1, s_2)\): the fraction of pairs of cells that the two solutions place together in the same number of clusters. This is expressed as

$$\text {Obs}(s_1, s_2) = \sum _{j=0}^{\min (J,K)} \frac{A_j}{N}$$

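where \(A_j\) is the number of pairs that both solutions place together in exactly \(j\) clusters, \(N\) is the total number of pairs, and \(J\) and \(K\) are the largest numbers of clusters shared by any single pair in solutions 1 and 2, respectively. We then calculate the expected agreement, \(\text {Exp}(s_1, s_2)\), as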

$$\text {Exp}(s_1, s_2) = \sum _{j=0}^{\min (J,K)} \frac{N_{j1} \cdot N_{j2}}{N^2}$$

where \(N_{j1}\) and \(N_{j2}\) represent the total pairs assigned to \(j\) clusters in solutions 1 and 2, respectively. Finally, the Omega Index is calculated as

$$\Omega (s_1, s_2) = \frac{\text {Obs}(s_1, s_2) - \text {Exp}(s_1, s_2)}{1 - \text {Exp}(s_1, s_2)}$$

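The index is at most 1, which indicates perfect agreement between the clustering solutions. A value of 0 means the observed agreement equals the agreement expected by chance, and negative values are possible when the observed agreement falls below that level.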
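The three formulas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pair_counts` and `omega_index` are hypothetical, each solution is assumed to be a list of sets of hashable, orderable cell identifiers, and cells missing from a solution are treated as sharing no cluster with any other cell. Because the index is corrected for chance, it can fall below 0 when two solutions agree less often than expected.

```python
from collections import Counter
from itertools import combinations


def pair_counts(cover):
    """Count, for every pair of items, how many clusters contain both."""
    counts = Counter()
    for cluster in cover:
        for pair in combinations(sorted(set(cluster)), 2):
            counts[pair] += 1
    return counts


def omega_index(cover1, cover2):
    """Omega Index between two (possibly overlapping) clustering solutions."""
    items = sorted(set().union(*cover1, *cover2))
    n_pairs = len(items) * (len(items) - 1) // 2  # N in the formulas
    if n_pairs == 0:
        raise ValueError("need at least two items to form a pair")

    shared1 = pair_counts(cover1)
    shared2 = pair_counts(cover2)

    agree = Counter()  # A_j: pairs given the same count j by both solutions
    n1 = Counter()     # N_j1: pairs sharing j clusters in solution 1
    n2 = Counter()     # N_j2: pairs sharing j clusters in solution 2
    for pair in combinations(items, 2):
        j1, j2 = shared1[pair], shared2[pair]
        n1[j1] += 1
        n2[j2] += 1
        if j1 == j2:
            agree[j1] += 1

    observed = sum(agree.values()) / n_pairs
    # Terms with j > min(J, K) vanish because one of the two counts is zero.
    expected = sum(n1[j] * n2[j] for j in set(n1) & set(n2)) / n_pairs**2
    if expected == 1.0:
        # Degenerate case (every pair has the same count in both solutions);
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    overlapping = [{"a", "b", "c"}, {"c", "d"}]  # cell "c" is in two clusters
    disjoint = [{"a", "b"}, {"c", "d"}]
    print(omega_index(overlapping, overlapping))
    print(omega_index(overlapping, disjoint))
```

Enumerating all pairs costs time quadratic in the number of cells, so this form is only practical for small examples; it is meant to make the definitions of \(A_j\), \(N_{j1}\), and \(N_{j2}\) concrete.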

Fig. 21

ARI obtained using different methods in ROSMAP: Adjusted Rand Index (ARI) between cell types and detected clusters for SBM, Seurat, Infomap, Leiden, and WGCNA in the ROSMAP full dataset. Both the weighted and unweighted versions of the same networks were considered for algorithms that can handle both. The zoomed-in panel illustrates the ARI across different Markov times using Infomap.
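For readers unfamiliar with the ARI used in these figures, the sketch below computes it from a contingency table of reference cell types and cluster assignments, using the standard pair-counting form of the Adjusted Rand Index. It is an illustration only: the function name `adjusted_rand_index` and the toy labels are hypothetical, and nothing here is taken from the authors' code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat (non-overlapping) labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two cells")

    # Contingency table: cells sharing a (cell type, cluster) combination.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)  # cell type sizes
    cols = Counter(labels_pred)  # cluster sizes

    # Number of pairs placed together in both, in the reference, in the clusters.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in rows.values())
    together_pred = sum(comb(c, 2) for c in cols.values())

    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        # Degenerate case (e.g. a single group in both labelings);
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Hypothetical toy data: six cells, two annotated types, two clusters.
    cell_types = ["T", "T", "T", "B", "B", "B"]
    clusters = [0, 0, 1, 1, 1, 1]
    print(round(adjusted_rand_index(cell_types, clusters), 4))
```

Like the Omega Index, the ARI is corrected for chance: identical partitions score 1, and random assignments score close to 0.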

Fig. 22

ROSMAP network: Illustration of the networks obtained from ROSMAP dataset. These networks are generated using Seurat and the alternative pipelines.

Fig. 23

ARI vs tuning parameter - ROSMAP: ARI across different resolution parameters for the network generated from the ROSMAP dataset.

Fig. 24

MOp network: Illustration of the networks obtained from the MOp dataset. These networks are generated using Seurat and the alternative pipelines.

Fig. 25

ARI vs tuning parameter - MOp: ARI across different resolution parameters for the network generated from the MOp dataset.

Fig. 26

ARI across different configurations of alternative preprocessing pipelines for the MOp dataset. Same as Fig. 10 but for the top 5000 procedures.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nasrollahi, F.S.F., Silva, F.N., Liu, S. et al. Network clustering algorithms and preprocessing pipelines for robust cell type identification in single-cell RNA sequencing data. Sci Rep (2026). https://doi.org/10.1038/s41598-026-49033-w

  • Received: 08 July 2025

  • Accepted: 13 April 2026

  • Published: 15 May 2026

  • DOI: https://doi.org/10.1038/s41598-026-49033-w

Scientific Reports (Sci Rep)

ISSN 2045-2322 (online)

© 2026 Springer Nature Limited