Abstract
Cytometry-based single-cell proteomics (SCP) has emerged as a powerful technique that advances our understanding of complex biological systems at a new level of granularity. Various methods have been developed to process cytometry-based SCP data, but identifying well-performing processing workflows for a specific dataset remains extremely challenging. Here, we develop ANPELA, an out-of-the-box method for navigating proteomic data processing through large-scale screening. It enables a machine learning-based comparison of the performance of thousands of processing workflows in identifying cell subpopulations and inferring pseudo-time trajectories. Several case studies are then analyzed, highlighting its ability to identify optimal data processing workflows for cytometry-based SCP studies. A new package is also deployed to ensure multiscenario usability (desktop software, R package and online server), data security (local and open-source execution) and a user-friendly interface (interactive and visual applications). Overall, ANPELA can be used by a broad audience, including those without coding skills, and is freely accessible and downloadable at https://idrblab.org/anpela/. Its execution time ranges from minutes to hours depending on the size of the analyzed data.
Key points
- ANPELA is a tool for evaluating the utility of proteomic data processing workflows and is intended to facilitate the automatic selection of the most appropriate processing methods for single-cell proteomic data.
- ANPELA enables a systematic assessment of existing data processing methods for cell subpopulation identification and pseudo-time trajectory inference.
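To illustrate why screening can involve so many candidate workflows, the following minimal Python sketch enumerates every combination of one method per processing step. The step names and the placeholder method labels (such as `comp_A`) are assumptions made for illustration; they are not ANPELA's actual method list, and the sketch does not reproduce how ANPELA builds or scores workflows.

```python
from itertools import product

# Hypothetical processing steps with placeholder method labels.
# These are illustrative assumptions, not ANPELA's actual options.
STEPS = {
    "compensation": ["none", "comp_A", "comp_B"],
    "transformation": ["trans_A", "trans_B", "trans_C", "trans_D"],
    "normalization": ["none", "norm_A", "norm_B"],
    "signal_cleaning": ["none", "clean_A", "clean_B"],
}


def enumerate_workflows(steps):
    """Return every combination of one method per step, keyed by step name."""
    names = list(steps)
    return [
        dict(zip(names, combo))
        for combo in product(*(steps[name] for name in names))
    ]


workflows = enumerate_workflows(STEPS)
print(len(workflows))  # 3 * 4 * 3 * 3 = 108 candidate workflows
```

Because the count is the product of the number of methods per step, adding a few methods to each step quickly pushes the total into the thousands, which is why an automated comparison is useful.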
Data availability
All datasets analyzed in this protocol are available for download at https://idrblab.org/anpela/ANPELA_exampledata.zip. These datasets are also accessible in the SingPro database (https://idrblab.org/singpro/) through IDs SCP57021, SCP11272, SCP43132, SCP77365, SCP80719, SCP47065, SCP37430, SCP36391, SCP96723 and SCP93731.
Code availability
All source code for this protocol is available under a GPL v3 license and can be obtained from GitHub at https://github.com/idrblab/ANPELA. The ANPELA web platform is freely available for academic purposes at https://idrblab.org/anpela. The ANPELA desktop software is available for academic use at https://idrblab.org/anpela/ANPELA-Setup.exe.
Acknowledgements
We acknowledge the National Natural Science Foundation of China (grant nos. 22220102001, 82373790, and 82404511); Natural Science Foundation of Zhejiang (grant no. RG25H300001); National Key R&D Programs of China (grant no. 2024YFA1307503); Information Technology Center and State Key Lab of CAD&CG, Zhejiang University.
Author information
Contributions
F.Z. conceived the idea and designed the research. H.C.S., Y. Zhou, R.Y.J. and Y.X.L. wrote the code. H.C.S., Y. Zhou, C.B.G. and Z.Q.P. conducted the benchmark studies. H.C.S., Y. Zhou, M.J.M., X.C.L., B.H.C., T.L.N., Y. Zhang, Y.T.Z., X.N.S., H.Y., X.S., W.Q.X. and B.L.Z. performed the statistical analysis. Y.B.D., J.N.D., S.Q.L., T.T.F., Y. Zhang, M.X. and Q.X.Y. visualized the results. F.Z. and T.T.F. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Protocols thanks Florian Ingelfinger and the other, anonymous reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Key references
Zhang, Y. et al. Adv. Sci. 10, e2207061 (2023): https://doi.org/10.1002/advs.202207061
Jurburg, S. D. et al. Microbiome 10, 225 (2022): https://doi.org/10.1186/s40168-022-01423-8
Tang, J. et al. Brief. Bioinform. 21, 621–636 (2020): https://doi.org/10.1093/bib/bby127
Tang, J. et al. Mol. Cell. Proteomics 18, 1683–1699 (2019): https://doi.org/10.1074/mcp.RA118.001169
Cui, X. et al. Front. Pharmacol. 10, 127 (2019): https://doi.org/10.3389/fphar.2019.00127
Extended data
Extended Data Fig. 1 Graphical User Interface of ANPELA and Preparation of Required Data.
(a) The navigation bar of the ANPELA software includes ‘HOME’, ‘Single-cell Proteomics’, ‘Textual Tutorial’ and ‘Interactive Tutorial’. Clicking on ‘Single-cell Proteomics’ initiates data upload and processing. Clicking on ‘Textual Tutorial’ downloads the textual tutorial. Clicking on ‘Interactive Tutorial’ opens a step-by-step interactive tutorial. (b) The essential data required by ANPELA include FCS files (i.e., raw data files) generated from cytometry-based SCP experiments and a metadata file describing the correspondence between the raw data files and the experimental conditions. The metadata file is a user-created file named ‘metadata.csv’ that contains the key information in two columns.
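As a rough illustration of the two-column metadata file described in panel (b), the Python sketch below renders such a table as CSV text. The column names `filename` and `condition`, as well as the example file names and conditions, are hypothetical; the caption states only that ‘metadata.csv’ has two columns, so the textual tutorial should be consulted for the exact headers ANPELA expects.

```python
import csv
import io


def build_metadata_csv(file_conditions):
    """Render a two-column metadata table as CSV text.

    The header names 'filename' and 'condition' are hypothetical; the source
    only says that metadata.csv holds key information in two columns.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["filename", "condition"])
    for fcs_name, condition in file_conditions:
        writer.writerow([fcs_name, condition])
    return buffer.getvalue()


# Hypothetical FCS file names and experimental conditions.
example = [("sample_01.fcs", "control"), ("sample_02.fcs", "treated")]
metadata_text = build_metadata_csv(example)
print(metadata_text)
```

The returned text could then be saved with `open("metadata.csv", "w", newline="").write(metadata_text)` and placed alongside the FCS files.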
Extended Data Fig. 2 GUI of Processing & Assessment and Outcomes of the Assessment.
(a) Simplified GUI for data processing and performance assessment in the ANPELA desktop software. (b) The results of the performance assessment consist of ‘Ranking_Table.csv’ and ‘Ranking_Figure.pdf’, which record the criteria values and the performance levels, respectively, for all executed data processing workflows.
Supplementary information
Supplementary Information
Supplementary Figs. 1–6, Tables 1–3 and Methods 1–3.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, H., Zhou, Y., Jiang, R. et al. Navigating the data processing for cytometry-based single-cell proteomics. Nat Protoc (2025). https://doi.org/10.1038/s41596-025-01257-2
DOI: https://doi.org/10.1038/s41596-025-01257-2