Abstract
Recent advances have enabled the generation of both unpaired (separate profiling) and paired (simultaneous measurement) single-cell multi-modal datasets, driving rapid development of single-cell multi-modal integration tools. Nevertheless, a comprehensive benchmark is still urgently needed to assess these algorithms across different integrated dataset types, integrated modalities, dataset sizes and levels of data quality. Here we present a systematic benchmark of 40 single-cell multi-modal integration algorithms involving DNA, RNA, protein and spatial omics modalities for paired, unpaired and mosaic datasets (mixtures of paired and unpaired datasets). We evaluated usability, accuracy and robustness to help researchers select integration methods suited to their datasets and applications. Our benchmark provides valuable guidance in the rapidly evolving field of single-cell multi-omics.
Data availability
The BMMC 10X Multiome and CITE-seq datasets (ref. 52) analyzed in this study are available at https://openproblems.bio/events/2021-09_neurips. The raw sequencing files of the BMMC Multiome datasets used in this study are available from the GEO database under accession number GSE194122. The HSPC 10X Multiome and CITE-seq datasets (ref. 53) are available at https://www.kaggle.com/competitions/open-problems-multimodal/data. The SHARE-seq skin data (ref. 54) can be downloaded from the GEO database under accession numbers GSM4156608 and GSM4156597. The COVID-19 CITE-seq data (ref. 55) are available from ArrayExpress under accession number E-MTAB-10026. The human WBC CITE-seq data (ref. 38) are available at https://atlas.fredhutch.org/nygc/multimodal-pbmc/. The 10X NSCLC CITE-seq, 10X kidney cancer CITE-seq, 10X mouse brain Multiome and 10X PBMC Multiome datasets were downloaded from the 10X Genomics website (https://www.10xgenomics.com/datasets/). For spatial multi-omic integration tasks, we obtained the SPOTS mouse spleen data (ref. 46) from GEO (GSE198353), the mouse thymus data (ref. 41) from https://zenodo.org/records/10362607 (ref. 63) and the human lymph node data (ref. 41) from https://drive.google.com/drive/folders/1RlU3JmHg_LZM1d-o6QORvykYPoulWWMI. The processed input datasets for all benchmark methods are available in a public figshare repository (https://figshare.com/projects/Single-cell_multimodal_integration_benchmark_SCMMIB_register_report_Stage_2_study_/221476).
Code availability
The source code of the evaluation metrics Python package and the scripts for reproducing the figures in the stage 2 manuscript are available in a GitHub repository: https://github.com/bm2-lab/SCMMI_Benchmark/. A pipeline for running all benchmark methods is available at https://github.com/bm2-lab/SCMMIB_pipeline. An interactive website with detailed supplementary results of this study is available at https://bm2-lab.github.io/SCMMIB-reproducibility/. The code is also archived in the Zenodo repository: https://doi.org/10.5281/zenodo.14792951 (ref. 64).
References
Method of the Year 2019: single-cell multimodal omics. Nat. Methods 17, 1 (2020).
Baysoy, A., Bai, Z., Satija, R. & Fan, R. The technological landscape and applications of single-cell multi-omics. Nat. Rev. Mol. Cell Biol. 24, 695–713 (2023).
Efremova, M. & Teichmann, S. A. Computational methods for single-cell omics across modalities. Nat. Methods 17, 14–17 (2020).
Zhang, D. et al. Spatial epigenome-transcriptome co-profiling of mammalian tissues. Nature 616, 113–122 (2023).
Vickovic, S. et al. SM-Omics is an automated platform for high-throughput spatial multi-omics. Nat. Commun. 13, 795 (2022).
Lee, M. Y. Y., Kaestner, K. H. & Li, M. Benchmarking algorithms for joint integration of unpaired and paired single-cell RNA-seq and ATAC-seq data. Genome Biol. 24, 244 (2023).
Dou, J. et al. Bi-order multimodal integration of single-cell data. Genome Biol. 23, 112 (2022).
Kim, H. J., Lin, Y., Geddes, T. A., Yang, J. Y. H. & Yang, P. CiteFuse enables multi-modal analysis of CITE-seq data. Bioinformatics 36, 4137–4143 (2020).
Gong, B., Zhou, Y. & Purdom, E. Cobolt: integrative analysis of multimodal single-cell sequencing data. Genome Biol. 22, 351 (2021).
Zuo, C., Dai, H. & Chen, L. Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data. Bioinformatics 37, 4091–4099 (2021).
Ma, A. et al. Single-cell biological network inference using a heterogeneous graph transformer. Nat. Commun. 14, 964 (2023).
Gao, H. et al. A universal framework for single-cell multi-omics data integration with graph convolutional networks. Brief. Bioinform. 24, bbad081 (2023).
Cao, Z.-J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022).
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).
Kriebel, A. R. & Welch, J. D. UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization. Nat. Commun. 13, 780 (2022).
Gao, C. et al. Iterative single-cell multi-omic integration using online learning. Nat. Biotechnol. 39, 1000–1007 (2021).
Chen, S. et al. Integration of spatial and single-cell data across modalities with weakly linked features. Nat. Biotechnol. 42, 1096–1106 (2023).
Velten, B. et al. Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO. Nat. Methods 19, 179–186 (2022).
He, Z. et al. Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS. Nat. Biotechnol. 42, 1594–1605 (2024).
Argelaguet, R. et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 21, 111 (2020).
Litinetskaya, A. et al. Integration and querying of multimodal single-cell data with PoE-VAE. Preprint at bioRxiv https://doi.org/10.1101/2022.03.16.484643 (2025).
Jain, M. S. et al. MultiMAP: dimensionality reduction and integration of multimodal data. Genome Biol. 22, 346 (2021).
Ashuach, T. et al. MultiVI: deep generative model for the integration of multimodal data. Nat. Methods 20, 1222–1231 (2023).
Cao, K., Hong, Y. & Wan, L. Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona. Bioinformatics 38, 211–219 (2021).
Cao, Y. et al. Integrated analysis of multimodal single-cell data with structural similarity. Nucleic Acids Res. 50, e121 (2022).
Jin, S., Zhang, L. & Nie, Q. scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles. Genome Biol. 21, 25 (2020).
Xiong, L., Tian, K., Li, Y. & Zhang, Q. C. Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space. Nat. Commun. 13, 6118 (2022).
Liu, Q., Chen, S., Jiang, R. & Wong, W. H. Simultaneous deep generative modelling and clustering of single-cell genomic data. Nat. Mach. Intell. 3, 536–544 (2021).
Lakkis, J. et al. A multi-use deep learning method for CITE-seq and single-cell RNA-seq data integration with cell surface protein prediction and imputation. Nat. Mach. Intell. 4, 940–952 (2022).
Ren, L., Wang, J., Li, Z., Li, Q. & Yu, G. scMCs: a framework for single-cell multi-omics data integration and multiple clusterings. Bioinformatics 39, btad133 (2023).
Lin, X., Tian, T., Wei, Z. & Hakonarson, H. Clustering of single-cell multi-omics data with a multimodal deep learning method. Nat. Commun. 13, 7705 (2022).
Minoura, K., Abe, K., Nam, H., Nishikawa, H. & Shimamura, T. A mixture-of-experts deep generative model for integrated analysis of single-cell multiomics data. Cell Rep. Methods 1, 100071 (2021).
Zhang, Z. et al. scMoMaT jointly performs single cell mosaic integration and multi-modal bio-marker detection. Nat. Commun. 14, 384 (2023).
Zuo, C. & Chen, L. Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data. Brief. Bioinform. 22, bbaa287 (2021).
Li, G. et al. A deep generative model for multi-view profiling of single-cell RNA-seq and ATAC-seq data. Genome Biol. 23, 20 (2022).
Du, J. H., Cai, Z. & Roeder, K. Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT. Proc. Natl Acad. Sci. USA 119, e2214414119 (2022).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587 (2021).
Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).
Chen, H., Ryu, J., Vinyard, M. E., Lerer, A. & Pinello, L. SIMBA: single-cell embedding along with features. Nat. Methods 21, 1003–1013 (2023).
Long, Y. et al. Deciphering spatial domains from spatial multi-omics with SpatialGlue. Nat. Methods 21, 1658–1667 (2024).
Ghazanfar, S., Guibentif, C. & Marioni, J. C. Stabilized mosaic single-cell data integration using unshared features. Nat. Biotechnol. 42, 284–292 (2024).
Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272–282 (2021).
Cao, K., Bai, X., Hong, Y. & Wan, L. Unsupervised topological alignment for single-cell multi-omics integration. Bioinformatics 36, i48–i56 (2020).
Cao, K., Gong, Q., Hong, Y. & Wan, L. A unified computational framework for single-cell data integration with optimal transport. Nat. Commun. 13, 7419 (2022).
Ben-Chetrit, N. et al. Integration of whole transcriptome spatial profiling with protein markers. Nat. Biotechnol. 41, 788–793 (2023).
Sha, L. et al. Integrated spatial transcriptomic and proteomic analysis of fresh frozen tissue based on Stereo-seq. Preprint at bioRxiv https://doi.org/10.1101/2023.04.28.538364 (2023).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Zhou, X., Dong, K. & Zhang, S. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nat. Comput. Sci. 3, 894–906 (2023).
Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).
Hu, Y. et al. Benchmarking algorithms for single-cell multi-omics prediction and integration. Nat. Methods 21, 2182–2194 (2024).
Luecken, M. D. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) (NeurIPS, 2021).
Burkhardt, D. et al. Open problems — multimodal single-cell integration. Kaggle https://kaggle.com/competitions/open-problems-multimodal (2022).
Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116 (2020).
Stephenson, E. et al. Single-cell multi-omics analysis of the immune response in COVID-19. Nat. Med. 27, 904–916 (2021).
Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550–572 (2023).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).
Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022).
Lance, C. et al. Multimodal single cell data integration challenge: results and lessons learned. In Proc. NeurIPS 2021 Competitions and Demonstrations Track Vol. 176 (eds Kiela, D., Ciccone, M. & Caputo, B.) 162–176 (PMLR, 2022).
Liu, J., Huang, Y., Singh, R., Vert, J. P. & Noble, W. S. Jointly embedding multiple single-cell omics measurements. Algorithms Bioinform. 143, 10 (2019).
Wagner, F., Yan, Y. & Yanai, I. K-nearest neighbor smoothing for high-throughput single-cell RNA-seq data. Preprint at bioRxiv https://doi.org/10.1101/217737 (2018).
Long, Y. Dataset for SpatialGlue. Zenodo https://doi.org/10.5281/zenodo.7857496 (2023).
Fu, S. et al. Benchmarking single-cell multi-modal data integrations. Zenodo https://doi.org/10.5281/zenodo.14792951 (2025).
Acknowledgements
Q.L. was supported by the National Natural Science Foundation of China (grant nos. T2425019, 32341008 and 62088101), the National Key Research and Development Program of China (grant nos. 2021YFF1201200 and 2021YFF1200900), the Shanghai Pilot Program for Basic Research, the Shanghai Science and Technology Innovation Action Plan-Key Specialization in Computational Biology, the Shanghai Shuguang Scholars Project, the Shanghai Excellent Academic Leader Project, the Shanghai Municipal Science and Technology Major Project (grant no. 2021SHZDZX0100) and the Fundamental Research Funds for the Central Universities. S.F. was supported by the National Natural Science Foundation of China (grant no. 32400521) and the China Postdoctoral Science Foundation (grant nos. 2023M742651 and GZC20231946).
Author information
Contributions
S.F., S.W. and Q.L. conceived the project. S.F. and S.W. designed the stage 1 proposal and performed stage 2 data analysis with help from D.S., G.L. and Y.G. S.F., S.W. and Q.L. wrote the paper with input from all authors. Q.L. supervised the entire project. All authors read and approved the final paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Laura Cantini and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Extended data
Extended Data Fig. 1 Summary table of scRNA and scATAC multimodal datasets analyzed in the SCMMIB study.
Details of cell numbers, feature counts and non-zero ratios in all scRNA and scATAC datasets for paired or unpaired integration method evaluations. Datasets used in unpaired scRNA and scATAC diagonal integration were further evaluated using the attributes of the gene activity matrix (GAM).
Extended Data Fig. 2 Summary table of scRNA and ADT multimodal datasets analyzed in the SCMMIB study.
Details of cell numbers, feature counts and non-zero ratios in all scRNA and ADT datasets for paired scRNA and ADT integration benchmark.
Extended Data Fig. 3 Summary table of multimodal datasets analyzed in mosaic multimodal integration tasks in the SCMMIB study.
Details of cell numbers, feature counts and non-zero ratios of datasets for the unpaired mosaic integration benchmark. The characteristics of the paired and unpaired simulation datasets are separated by commas.
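Extended Data Figs. 1–3 report a non-zero ratio for every matrix but do not spell out the formula. A common definition, assumed here, is the number of non-zero entries divided by the total number of entries (cells × features). The Python sketch below computes the three summary columns for a toy sparse matrix stored as one dictionary per cell; the function name matrix_summary and this storage layout are hypothetical and are not taken from the SCMMIB package.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse layout: one dict per cell mapping feature index -> value,
# holding only the stored (usually non-zero) entries.
SparseCounts = List[Dict[int, float]]


def matrix_summary(cells: SparseCounts, n_features: int) -> Tuple[int, int, float]:
    """Return (number of cells, number of features, non-zero ratio).

    Assumption: the non-zero ratio is the fraction of entries in the
    cells x features matrix that are not zero.
    """
    n_cells = len(cells)
    if n_cells == 0 or n_features == 0:
        return n_cells, n_features, 0.0
    # Skip explicitly stored zeros so they do not inflate the ratio.
    nnz = sum(1 for row in cells for value in row.values() if value != 0)
    return n_cells, n_features, nnz / (n_cells * n_features)


# Toy matrix: 3 cells x 4 features with 4 non-zero entries (one stored zero).
toy = [{0: 5.0, 2: 1.0}, {1: 2.0}, {3: 7.0, 0: 0.0}]
print(matrix_summary(toy, n_features=4))  # (3, 4, 0.3333333333333333)
```

Ignoring explicitly stored zeros keeps the ratio faithful to the data even when a file format keeps zero entries in its sparse structure.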
Extended Data Fig. 4 Details of scalability, accuracy, and robustness metrics for the paired scRNA and scATAC integration methods.
(a) Scalability line plot of running time, peak memory, and GPU memory for all paired scRNA and scATAC integration algorithms. Algorithms using GPU acceleration are plotted with dashed lines. (b) Heatmap of summarized accuracy metrics for paired scRNA and scATAC integration methods in Fig. 2b. The summarized metric scores are shown in each cell of the heatmap. (c) Evaluation for algorithms with optional batch parameters. The metric results calculated with batch information input are labeled with the ‘(batch)’ suffix in figure. (d) Heatmap of summarized robustness metrics for paired scRNA and scATAC integration methods in Fig. 2c. The summarized metric scores are shown in each cell of the heatmap. (e) Non-zero ratios of the ground truth data used for paired scRNA and scATAC imputation evaluation across 5 repeated runs. Box plots show the median (centre line), the 25th and 75th percentiles (bounds of the box), and whiskers extend to 1.5 × IQR (interquartile range).
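The box-plot convention stated in this caption (whiskers extending to 1.5 × IQR) can be made concrete with a short Python sketch. It assumes linearly interpolated quartiles and Tukey-style whiskers that stop at the most extreme observation inside the 1.5 × IQR fences, which is also the default behaviour of matplotlib's boxplot (whis=1.5); the plotting code behind these figures may differ. The function name box_plot_stats is hypothetical.

```python
import statistics
from typing import Dict, List


def box_plot_stats(values: List[float]) -> Dict[str, float]:
    """Quartiles and whisker ends for one box (needs at least 2 values).

    Assumptions: quartiles use linear interpolation (method="inclusive"),
    and each whisker stops at the most extreme observation lying within
    1.5 x IQR of the box; anything beyond would be drawn as an outlier.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Example: a non-zero ratio over 5 repeated runs, with one unusually high run.
print(box_plot_stats([0.10, 0.12, 0.11, 0.13, 0.40]))
```

In this example the run at 0.40 falls outside the upper fence, so the upper whisker ends at 0.13 and that run would be plotted as a separate point.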
Extended Data Fig. 5 Details of scalability, accuracy, and robustness metrics for the paired scRNA and ADT integration methods.
(a) Scalability line plot of running time, peak memory, and GPU memory for all paired scRNA and ADT integration algorithms. Algorithms using GPU acceleration are plotted with dashed lines. (b) Heatmap of summarized accuracy metrics for paired scRNA and ADT integration methods in Fig. 3b. The summarized metric scores are shown in each cell of the heatmap. (c) Heatmap of summarized robustness metrics for paired scRNA and ADT integration methods in Fig. 3d.
Extended Data Fig. 6 Details of scalability, accuracy, and robustness metrics for unpaired scRNA and scATAC diagonal integration methods.
(a) Scalability line plot of running time, peak memory, and GPU memory for all unpaired scRNA and scATAC diagonal integration algorithms. Algorithms using GPU acceleration are plotted with dashed lines. (b) Heatmap of summarized accuracy metrics for unpaired scRNA and scATAC diagonal integration methods in Fig. 4c. The summarized metric scores are shown in each cell of the heatmap. (c) Heatmap of summarized robustness metrics for unpaired scRNA and scATAC diagonal integration methods in Fig. 4f.
Extended Data Fig. 7 Details of usability, accuracy, and robustness metrics for the unpaired scRNA and scATAC mosaic integration methods.
(a) Scalability line plot of running time, peak memory, and GPU memory for all unpaired scRNA and scATAC mosaic integration algorithms. Algorithms using GPU acceleration are plotted with dashed lines. (b) Heatmap of summarized accuracy metrics for unpaired scRNA and scATAC mosaic integration methods in Fig. 5b. The summarized metric scores are shown in each cell of the heatmap. (c) Heatmap of summarized robustness metrics for unpaired scRNA and scATAC mosaic integration methods in Fig. 5e. The summarized metric scores are shown in each cell of the heatmap. (d) Non-zero ratios of the ground truth data used for mosaic scRNA and scATAC imputation evaluation across 5 repeated runs. (e,f) Stability of algorithm embedding output across 5 repeated runs, evaluated with (e) the absolute coefficient of variation and (f) the standard deviation of metric values for all accuracy metrics. Box plots show the median (centre line), the 25th and 75th percentiles (bounds of the box), and whiskers extend to 1.5 × IQR (interquartile range).
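The stability panels summarize how much each accuracy metric varies across the 5 repeated runs. The Python sketch below computes both statistics per metric. It assumes the sample standard deviation (n - 1 denominator) and defines the absolute coefficient of variation as |SD / mean|; the caption specifies neither choice. The function name run_stability and the metric values in the example are made up for illustration.

```python
import math
import statistics
from typing import Dict, List


def run_stability(metric_runs: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Standard deviation and absolute coefficient of variation per metric.

    Assumptions: sample standard deviation (n - 1 denominator, so at least
    2 runs are needed) and abs_cv = |SD / mean|, left undefined (NaN) when
    the mean is exactly zero.
    """
    summary = {}
    for metric, values in metric_runs.items():
        sd = statistics.stdev(values)
        mean = statistics.fmean(values)
        abs_cv = abs(sd / mean) if mean != 0 else math.nan
        summary[metric] = {"sd": sd, "abs_cv": abs_cv}
    return summary


# Hypothetical scores of one method on one dataset over 5 repeated runs.
runs = {"ARI": [0.70, 0.72, 0.68, 0.71, 0.69], "FOSCTTM": [0.05] * 5}
for name, result in run_stability(runs).items():
    print(name, round(result["sd"], 4), round(result["abs_cv"], 4))
```

Reporting both statistics is useful because the coefficient of variation rescales the spread by the metric's mean, whereas the standard deviation stays on the metric's own scale.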
Extended Data Fig. 8 Details of usability, accuracy and robustness metrics for the unpaired scRNA and ADT mosaic integration methods.
(a) Scalability line plot of running time, peak memory, and GPU memory for all unpaired scRNA and ADT mosaic integration algorithms. Algorithms using GPU acceleration are plotted with dashed lines. (b) Heatmap of summarized accuracy metrics for unpaired scRNA and ADT mosaic integration methods in Fig. 6b. The summarized metric scores are shown in each cell of the heatmap. (c) Heatmap of summarized robustness metrics for unpaired scRNA and ADT mosaic integration methods in Fig. 6c. The summarized metric scores are shown in each cell of the heatmap. (d,e) Stability of algorithm embedding output across 5 repeated runs, evaluated with (d) the absolute coefficient of variation and (e) the standard deviation of metric values for all accuracy metrics.
Extended Data Fig. 9 Guidelines for single-cell multimodal integrations.
Recommendations of the optimal method for each integration task. Methods were recommended based on their overall rankings in usability, accuracy and robustness. For more specific details, users should refer to the conclusions in the corresponding Results sections. SOTA, state-of-the-art method.
Extended Data Fig. 10 Summary table of accuracy metrics used in this study.
The groups, characteristics and required input formats of the accuracy metrics used in this study are summarized in this figure. ASW, average silhouette width; GC, graph connectivity; iLISI, integration local inverse Simpson's index; ARI, adjusted Rand index; FOSCTTM, fraction of samples closer than true match; AUPR, area under the precision-recall curve; AUROC, area under the receiver operating characteristic curve.
Supplementary information
Supplementary Information
Supplementary Figures 1–22 and Supplementary Notes 1–3.
Source data
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 7
Statistical source data.
Source Data Extended Data Fig. 8
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fu, S., Wang, S., Si, D. et al. Benchmarking single-cell multi-modal data integrations. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02737-9
DOI: https://doi.org/10.1038/s41592-025-02737-9