Abstract
Single-cell datasets are routinely collected to investigate changes in cellular state between control cells and the corresponding cells in a treatment condition, such as exposure to a drug or infection by a pathogen. To better understand heterogeneity in treatment response, it is desirable to deconvolve variations enriched in treated cells from those shared with controls. However, standard computational models of single-cell data are not designed to explicitly separate these variations. Here, we introduce contrastive variational inference (contrastiveVI; https://github.com/suinleelab/contrastiveVI), a framework for deconvolving variations in treatment–control single-cell RNA sequencing (scRNA-seq) datasets into shared and treatment-specific latent variables. Using three treatment–control scRNA-seq datasets, we apply contrastiveVI to perform a variety of analysis tasks, including visualization, clustering and differential expression testing. We find that contrastiveVI consistently achieves results that agree with known ground truths and often highlights subtle phenomena that may be difficult to ascertain with standard workflows. We conclude by generalizing contrastiveVI to accommodate joint transcriptome and surface protein measurements.
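The split into shared and treatment-specific latent variables can be illustrated with a toy sketch. This is not the contrastiveVI model itself: the linear decoder, the dimensions and all function names below are hypothetical, and the choice to fix the treatment-specific variables of control cells at zero is an assumption made here for illustration.

```python
import random

# Hypothetical toy dimensions, chosen only for illustration.
N_GENES, N_SHARED, N_SALIENT = 6, 2, 2


def make_decoder(rng):
    """Random linear map from [shared, salient] latents to per-gene values.

    A stand-in for a learned decoder; not the model described in the paper.
    """
    return [
        [rng.gauss(0, 1) for _ in range(N_SHARED + N_SALIENT)]
        for _ in range(N_GENES)
    ]


def sample_latents(is_treated, rng):
    """Draw shared latents for every cell and salient latents for treated cells."""
    shared = [rng.gauss(0, 1) for _ in range(N_SHARED)]
    if is_treated:
        salient = [rng.gauss(0, 1) for _ in range(N_SALIENT)]
    else:
        # Assumption: control cells carry no treatment-specific variation.
        salient = [0.0] * N_SALIENT
    return shared, salient


def decode(decoder, shared, salient):
    """Map the concatenated latent vector to gene-level values."""
    latent = shared + salient
    return [sum(w * x for w, x in zip(row, latent)) for row in decoder]


rng = random.Random(0)
decoder = make_decoder(rng)

control_shared, control_salient = sample_latents(False, rng)
treated_shared, treated_salient = sample_latents(True, rng)

control_expr = decode(decoder, control_shared, control_salient)
treated_expr = decode(decoder, treated_shared, treated_salient)

# Zeroing the salient part of a treated cell leaves only the variation
# it shares with controls.
treated_background_only = decode(decoder, treated_shared, [0.0] * N_SALIENT)
```

In this sketch, the difference between treated_expr and treated_background_only is exactly the contribution of the treatment-specific variables, which is the kind of variation the framework aims to isolate.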
Data availability
All datasets analyzed in this paper are publicly available. The simulated dataset was generated using the scsim package found at https://github.com/dylkot/scsim. The MIX-seq dataset from McFarland et al. (ref. 2) was downloaded from the authors’ Figshare repository (https://figshare.com/articles/dataset/MIX-seq_data/10298696). The Haber et al. (ref. 20), Norman et al. (ref. 4) and Papalexi et al. (ref. 30) datasets were downloaded from the National Institutes of Health Gene Expression Omnibus (GEO; accession codes GSE92332, GSE133344 and GSE153056, respectively). Our code for downloading and preprocessing these datasets is available at https://github.com/suinleelab/contrastiveVI-reproducibility.
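As a convenience, the GEO accession codes above can be turned into landing-page links. The following is a minimal sketch, not part of the authors' download code; the dictionary and function names are hypothetical, and it assumes the standard GEO accession query URL.

```python
from urllib.parse import urlencode

# Accession codes as listed in the data availability statement.
GEO_ACCESSIONS = {
    "Haber et al.": "GSE92332",
    "Norman et al.": "GSE133344",
    "Papalexi et al.": "GSE153056",
}

# Assumption: the standard GEO accession query endpoint.
GEO_QUERY_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_landing_page(accession):
    """Return the GEO landing-page URL for a series accession such as GSE92332."""
    if not accession.startswith("GSE") or not accession[3:].isdigit():
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return f"{GEO_QUERY_URL}?{urlencode({'acc': accession})}"


for dataset, accession in GEO_ACCESSIONS.items():
    print(f"{dataset}: {geo_landing_page(accession)}")
```

The actual download and preprocessing steps used for the paper are in the reproducibility repository linked above; this helper only resolves accession codes to their GEO pages.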
Code availability
Our Python software package with scvi-tools (ref. 44) implementations of the contrastiveVI and totalContrastiveVI models is available at https://github.com/suinleelab/contrastiveVI. Code for reproducing the specific results in this paper is available at https://github.com/suinleelab/contrastiveVI-reproducibility.
References
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
McFarland, J. M. et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 11, 4296 (2020).
Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).
Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).
McGinnis, C. S. et al. MULTI-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. Nat. Methods 16, 619–626 (2019).
Zou, J. Y., Hsu, D. J., Parkes, D. C. & Adams, R. P. Contrastive learning using spectral methods. Adv. Neural Inf. Process. Syst. 26, 2238–2246 (2013).
Abid, A., Zhang, M. J., Bagaria, V. K. & Zou, J. Exploring patterns enriched in a dataset with contrastive principal component analysis. Nat. Commun. 9, 2134 (2018).
Jones, A., Townes, W. F., Li, D. & Engelhardt, B. E. Contrastive latent variable modeling with application to case–control sequencing experiments. Ann. Appl. Stat. 16, 1268–1291 (2022).
Li, D., Jones, A. & Engelhardt, B. Probabilistic contrastive principal component analysis. Preprint at arXiv https://doi.org/10.48550/arXiv.2012.07977 (2020).
Severson, K. A., Ghosh, S. & Ng, K. Unsupervised learning with contrastive latent variable models. In Proceedings of the AAAI Conference on Artificial Intelligence 33, 4862–4869 (AAAI, 2019).
Abid, A. & Zou, J. Contrastive variational autoencoder enhances salient features. Preprint at arXiv https://doi.org/10.48550/arXiv.1902.04601 (2019).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J.-P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).
Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2021).
Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations (ICLR, 2014).
Vassilev, L. T. et al. In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science 303, 844–848 (2004).
DeTomaso, D. & Yosef, N. Hotspot identifies informative gene modules across modalities of single-cell genomics. Cell Syst. 12, 446–456 (2021).
Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10, 390 (2019).
Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).
Loonen, L. M. et al. Reg3γ-deficient mice have altered mucus distribution and increased mucosal inflammatory responses to the microbiota and enteric pathogens in the ileum. Mucosal Immunol. 7, 939–947 (2014).
Farr, L. et al. Cd74 signaling links inflammation to intestinal epithelial cell regeneration and promotes mucosal healing. Cell. Mol. Gastroenterol. Hepatol. 10, 101–112 (2020).
Koeberle, S. C. et al. Distinct and overlapping functions of glutathione peroxidases 1 and 2 in limiting NF-κB-driven inflammation through redox-active mechanisms. Redox Biol. 28, 101388 (2020).
Gerbe, F. et al. Intestinal epithelial tuft cells initiate type 2 mucosal immunity to helminth parasites. Nature 529, 226–230 (2016).
Campello, R. J., Moulavi, D., Zimek, A. & Sander, J. Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans. Knowl. Discov. Data 10, 1–51 (2015).
ENCODE Project Consortium. The ENCODE (Encyclopedia of DNA Elements) Project. Science 306, 636–640 (2004).
ENCODE Project Consortium. A user’s guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol. 9, e1001046 (2011).
Rouillard, A. D. et al. The Harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database 2016, baw100 (2016).
Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).
Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).
Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272–282 (2021).
Chanput, W., Mes, J. J. & Wichers, H. J. THP-1 cell line: an in vitro cell model for immune modulation approach. Int. Immunopharmacol. 23, 37–45 (2014).
Bhat, M. Y. et al. Comprehensive network map of interferon γ signaling. J. Cell Commun. Signal. 12, 745–751 (2018).
Garcia-Diaz, A. et al. Interferon receptor signaling pathways regulating PD-L1 and PD-L2 expression. Cell Rep. 19, 1189–1201 (2017).
Crabbé, J. & van der Schaar, M. Label-free explainability for unsupervised models. In International Conference on Machine Learning 4391–4420 (PMLR, 2022).
Lin, C., Chen, H., Kim, C. & Lee, S.-I. Contrastive corpus attribution for explaining representations. In 11th International Conference on Learning Representations (ICLR, 2023).
Ashuach, T., Reidenbach, D. A., Gayoso, A. & Yosef, N. PeakVI: a deep generative model for single-cell chromatin accessibility analysis. Cell Rep. Methods 2, 100182 (2022).
Gut, G., Stark, S. G., Rätsch, G. & Davidson, N. R. pmVAE: learning interpretable single-cell representations with pathway modules. Preprint at bioRxiv https://doi.org/10.1101/2021.01.28.428664 (2021).
Rybakov, S., Lotfollahi, M., Theis, F. J. & Wolf, F. A. Learning interpretable latent autoencoder representations with annotations of feature sets. Preprint at bioRxiv https://doi.org/10.1101/2020.12.02.401182 (2020).
Blei, D. M., Kucukelbir, A. & McAuliffe, J. D. Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112, 859–877 (2017).
Villani, C. Optimal Transport: Old and New, Vol. 338 (Springer, 2009).
Weinberger, E., Lopez, R., Hutter, J.-C. & Regev, A. Disentangling shared and group-specific variations in single-cell transcriptomics data with multiGroupVI. Preprint at bioRxiv https://doi.org/10.1101/2022.12.13.520349 (2022).
Wolf, F. A., Angerer, P. & Theis, F. J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022).
Boyeau, P. et al. Deep generative models for detecting differential expression in single cells. In Machine Learning in Computational Biology (MLCB, 2019).
Khatri, P., Sirota, M. & Butte, A. J. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput. Biol. 8, e1002375 (2012).
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013).
Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Methodol. 57, 289–300 (1995).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations (ICLR, 2015).
Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-seq. eLife 8, e43803 (2019).
Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).
Buitinck, L. et al. API design for machine learning software: experiences from the scikit-learn project. Preprint at arXiv https://doi.org/10.48550/arXiv.1309.0238 (2013).
Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).
Haghverdi, L., Lun, A. T., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).
Acknowledgements
We thank members of the Lee laboratory for their helpful feedback on this work. This work was funded by NSF DBI-1552309 and DBI-1759487 (E.W., C.L. and S.-I.L.) and by NIH R35-GM-128638 and R01-NIA-AG-061132 (E.W., C.L. and S.-I.L.). E.W. was supported by the National Science Foundation Graduate Research Fellowship under grant no. DGE-2140004.
Author information
Contributions
E.W. and C.L. contributed equally. E.W. conceived the study with input from S.-I.L. E.W. implemented an initial prototype of contrastiveVI, and C.L. wrote the final refactored scvi-tools implementation and associated tests. E.W. and C.L. both applied the model to analyze the datasets considered in this work with input from S.-I.L. S.-I.L. supervised the work. All authors participated in writing the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Natalie Davidson, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Lei Tang and Lin Tang, in collaboration with the Nature Methods team.
Supplementary information
Supplementary Information
Supplementary Figs. 1–32, Tables 1–21 and Notes 1–15
About this article
Cite this article
Weinberger, E., Lin, C. & Lee, SI. Isolating salient variations of interest in single-cell data with contrastiveVI. Nat Methods 20, 1336–1345 (2023). https://doi.org/10.1038/s41592-023-01955-3
DOI: https://doi.org/10.1038/s41592-023-01955-3