Isolating salient variations of interest in single-cell data with contrastiveVI

Abstract

Single-cell datasets are routinely collected to investigate changes in cellular state between control cells and the corresponding cells in a treatment condition, such as exposure to a drug or infection by a pathogen. To better understand heterogeneity in treatment response, it is desirable to deconvolve variations enriched in treated cells from those shared with controls. However, standard computational models of single-cell data are not designed to explicitly separate these variations. Here, we introduce contrastive variational inference (contrastiveVI; https://github.com/suinleelab/contrastiveVI), a framework for deconvolving variations in treatment–control single-cell RNA sequencing (scRNA-seq) datasets into shared and treatment-specific latent variables. Using three treatment–control scRNA-seq datasets, we apply contrastiveVI to perform a variety of analysis tasks, including visualization, clustering and differential expression testing. We find that contrastiveVI consistently achieves results that agree with known ground truths and often highlights subtle phenomena that may be difficult to ascertain with standard workflows. We conclude by generalizing contrastiveVI to accommodate joint transcriptome and surface protein measurements.
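
The split into shared and treatment-specific latent variables can be illustrated with a toy generative sketch. This is not the authors' implementation: the linear decoder, the Poisson noise, the dimensions and all names (`W_SHARED`, `W_SALIENT`, `sample_cells`) are hypothetical, and the choice to fix the treatment-specific variables of control cells at zero is an assumption made here to show what "treatment-specific" could mean in such a model.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
N_GENES, N_SHARED, N_SALIENT = 50, 3, 2

rng = np.random.default_rng(0)
# Toy linear "decoder" weights, reused for control and treated cells.
W_SHARED = rng.normal(scale=0.3, size=(N_SHARED, N_GENES))
W_SALIENT = rng.normal(scale=0.3, size=(N_SALIENT, N_GENES))


def sample_cells(n_cells, treated, rng):
    """Draw toy counts from shared (z) and treatment-specific (t) latents."""
    z = rng.normal(size=(n_cells, N_SHARED))
    if treated:
        t = rng.normal(size=(n_cells, N_SALIENT))
    else:
        # Assumption: control cells carry no treatment-specific variation.
        t = np.zeros((n_cells, N_SALIENT))
    # Log-linear expression rate; Poisson noise is a simplification.
    rate = np.exp(z @ W_SHARED + t @ W_SALIENT)
    counts = rng.poisson(rate)
    return counts, z, t


control_counts, control_z, control_t = sample_cells(200, treated=False, rng=rng)
treated_counts, treated_z, treated_t = sample_cells(200, treated=True, rng=rng)
```

In this sketch, any structure in `treated_counts` that `W_SHARED` alone cannot explain is attributable to `t`, which is the kind of variation a contrastive model aims to isolate.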

Fig. 1: Overview of contrastiveVI.
Fig. 2: contrastiveVI isolates idasanutlin-induced variations in cancer cell lines.
Fig. 3: contrastiveVI uncovers cell type-specific responses to pathogen infections in mouse intestinal epithelial cells.
Fig. 4: contrastiveVI’s salient latent space isolates CRISPR perturbation-induced variations in a large-scale Perturb-seq experiment.
Fig. 5: totalContrastiveVI isolates perturbation-induced variations in joint RNA and protein measurements.

Data availability

All datasets analyzed in this paper are publicly available. The simulated dataset was generated using the scsim package found at https://github.com/dylkot/scsim. The MIX-seq dataset from McFarland et al. (ref. 2) was downloaded from the authors’ Figshare repository (https://figshare.com/articles/dataset/MIX-seq_data/10298696). The Haber et al. (ref. 20), Norman et al. (ref. 4) and Papalexi et al. (ref. 30) datasets were downloaded from the National Institutes of Health Gene Expression Omnibus (GEO; accession codes GSE92332, GSE133344 and GSE153056, respectively). Our code for downloading and preprocessing these datasets is available at https://github.com/suinleelab/contrastiveVI-reproducibility.
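
As a convenience, the GEO accession codes listed above can be turned into landing-page links with a few lines of Python. This is a minimal sketch and not the authors' download code, which lives in the reproducibility repository. The names `GEO_ACCESSIONS` and `geo_url` are hypothetical, and the URL pattern is the standard GEO accession query page.

```python
# Accession codes as stated in the data availability statement.
GEO_ACCESSIONS = {
    "Haber et al.": "GSE92332",
    "Norman et al.": "GSE133344",
    "Papalexi et al.": "GSE153056",
}

# Standard GEO accession landing page.
GEO_BASE = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_url(accession):
    """Return the GEO landing-page URL for a series accession (GSE...)."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return f"{GEO_BASE}?acc={accession}"


if __name__ == "__main__":
    for dataset, accession in GEO_ACCESSIONS.items():
        print(f"{dataset}: {geo_url(accession)}")
```

The landing pages only describe each series; the exact files and preprocessing steps used in the paper should be taken from the authors' repository.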

Code availability

Our Python software package with scvi-tools (ref. 44) implementations of the contrastiveVI and totalContrastiveVI models is available at https://github.com/suinleelab/contrastiveVI. Code for reproducing the specific results in this paper is available at https://github.com/suinleelab/contrastiveVI-reproducibility.

References

  1. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

  2. McFarland, J. M. et al. Multiplexed single-cell transcriptional response profiling to define cancer vulnerabilities and therapeutic mechanism of action. Nat. Commun. 11, 4296 (2020).

  3. Dixit, A. et al. Perturb-seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell 167, 1853–1866 (2016).

  4. Norman, T. M. et al. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science 365, 786–793 (2019).

  5. McGinnis, C. S. et al. MULTI-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. Nat. Methods 16, 619–626 (2019).

  6. Zou, J. Y., Hsu, D. J., Parkes, D. C. & Adams, R. P. Contrastive learning using spectral methods. Adv. Neural Inf. Process. Syst. 26, 2238–2246 (2013).

  7. Abid, A., Zhang, M. J., Bagaria, V. K. & Zou, J. Exploring patterns enriched in a dataset with contrastive principal component analysis. Nat. Commun. 9, 2134 (2018).

  8. Jones, A., Townes, W. F., Li, D. & Engelhardt, B. E. Contrastive latent variable modeling with application to case–control sequencing experiments. Ann. Appl. Stat. 16, 1268–1291 (2022).

  9. Li, D., Jones, A. & Engelhardt, B. Probabilistic contrastive principal component analysis. Preprint at arXiv https://doi.org/10.48550/arXiv.2012.07977 (2020).

  10. Severson, K. A., Ghosh, S. & Ng, K. Unsupervised learning with contrastive latent variable models. In Proceedings of the AAAI Conference on Artificial Intelligence 33, 4862–4869 (AAAI, 2019).

  11. Abid, A. & Zou, J. Contrastive variational autoencoder enhances salient features. Preprint at arXiv https://doi.org/10.48550/arXiv.1902.04601 (2019).

  12. Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).

  13. Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S. & Vert, J.-P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9, 284 (2018).

  14. Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121–130 (2021).

  15. Mimitou, E. P. et al. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat. Methods 16, 409–412 (2019).

  16. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations (ICLR, 2014).

  17. Vassilev, L. T. et al. In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science 303, 844–848 (2004).

  18. DeTomaso, D. & Yosef, N. Hotspot identifies informative gene modules across modalities of single-cell genomics. Cell Syst. 12, 446–456 (2021).

  19. Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S. & Theis, F. J. Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10, 390 (2019).

  20. Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).

  21. Loonen, L. M. et al. Reg3γ-deficient mice have altered mucus distribution and increased mucosal inflammatory responses to the microbiota and enteric pathogens in the ileum. Mucosal Immunol. 7, 939–947 (2014).

  22. Farr, L. et al. Cd74 signaling links inflammation to intestinal epithelial cell regeneration and promotes mucosal healing. Cell. Mol. Gastroenterol. Hepatol. 10, 101–112 (2020).

  23. Koeberle, S. C. et al. Distinct and overlapping functions of glutathione peroxidases 1 and 2 in limiting NF-κB-driven inflammation through redox-active mechanisms. Redox Biol. 28, 101388 (2020).

  24. Gerbe, F. et al. Intestinal epithelial tuft cells initiate type 2 mucosal immunity to helminth parasites. Nature 529, 226–230 (2016).

  25. Campello, R. J., Moulavi, D., Zimek, A. & Sander, J. Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans. Knowl. Discov. Data 10, 1–51 (2015).

  26. ENCODE Project Consortium. The ENCODE (Encyclopedia of DNA Elements) Project. Science 306, 636–640 (2004).

  27. ENCODE Project Consortium. A user’s guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol. 9, e1001046 (2011).

  28. Rouillard, A. D. et al. The Harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database 2016, baw100 (2016).

  29. Frangieh, C. J. et al. Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).

  30. Papalexi, E. et al. Characterizing the molecular regulation of inhibitory immune checkpoints with multimodal single-cell screens. Nat. Genet. 53, 322–331 (2021).

  31. Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272–282 (2021).

  32. Chanput, W., Mes, J. J. & Wichers, H. J. THP-1 cell line: an in vitro cell model for immune modulation approach. Int. Immunopharmacol. 23, 37–45 (2014).

  33. Bhat, M. Y. et al. Comprehensive network map of interferon γ signaling. J. Cell Commun. Signal. 12, 745–751 (2018).

  34. Garcia-Diaz, A. et al. Interferon receptor signaling pathways regulating PD-L1 and PD-L2 expression. Cell Rep. 19, 1189–1201 (2017).

  35. Crabbé, J. & van der Schaar, M. Label-free explainability for unsupervised models. In International Conference on Machine Learning 4391–4420 (PMLR, 2022).

  36. Lin, C., Chen, H., Kim, C. & Lee, S.-I. Contrastive corpus attribution for explaining representations. In 11th International Conference on Learning Representations (ICLR, 2023).

  37. Ashuach, T., Reidenbach, D. A., Gayoso, A. & Yosef, N. PeakVI: a deep generative model for single-cell chromatin accessibility analysis. Cell Rep. Methods 2, 100182 (2022).

  38. Gut, G., Stark, S. G., Rätsch, G. & Davidson, N. R. pmVAE: learning interpretable single-cell representations with pathway modules. Preprint at bioRxiv https://doi.org/10.1101/2021.01.28.428664 (2021).

  39. Rybakov, S., Lotfollahi, M., Theis, F. J. & Wolf, F. A. Learning interpretable latent autoencoder representations with annotations of feature sets. Preprint at bioRxiv https://doi.org/10.1101/2020.12.02.401182 (2020).

  40. Blei, D. M., Kucukelbir, A. & McAuliffe, J. D. Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112, 859–877 (2017).

  41. Villani, C. Optimal Transport: Old and New, Vol. 338 (Springer, 2009).

  42. Weinberger, E., Lopez, R., Hutter, J.-C. & Regev, A. Disentangling shared and group-specific variations in single-cell transcriptomics data with multiGroupVI. Preprint at bioRxiv https://doi.org/10.1101/2022.12.13.520349 (2022).

  43. Wolf, F. A., Angerer, P. & Theis, F. J. Scanpy: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  44. Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40, 163–166 (2022).

  45. Boyeau, P. et al. Deep generative models for detecting differential expression in single cells. In Machine Learning in Computational Biology (MLCB, 2019).

  46. Khatri, P., Sirota, M. & Butte, A. J. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput. Biol. 8, e1002375 (2012).

  47. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013).

  48. Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).

  49. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Methodol. 57, 289–300 (1995).

  50. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations (ICLR, 2015).

  51. Kotliar, D. et al. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-seq. eLife 8, e43803 (2019).

  52. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).

  53. Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576 (2017).

  54. Buitinck, L. et al. API design for machine learning software: experiences from the scikit-learn project. Preprint at arXiv https://doi.org/10.48550/arXiv.1309.0238 (2013).

  55. Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37, 547–554 (2019).

  56. Haghverdi, L., Lun, A. T., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36, 421–427 (2018).

Acknowledgements

We thank members of the Lee laboratory for their helpful feedback on this work. This work was funded by NSF DBI-1552309 and DBI-1759487 and by NIH R35-GM-128638 and R01-NIA-AG-061132 (E.W., C.L. and S.-I.L.). E.W. was supported by the National Science Foundation Graduate Research Fellowship under grant no. DGE-2140004.

Author information

Contributions

E.W. and C.L. contributed equally. E.W. conceived the study with input from S.-I.L. E.W. implemented an initial prototype of contrastiveVI, and C.L. wrote the final refactored scvi-tools implementation and associated tests. E.W. and C.L. both applied the model to analyze the datasets considered in this work with input from S.-I.L. S.-I.L. supervised the work. All authors participated in writing the manuscript.

Corresponding author

Correspondence to Su-In Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Natalie Davidson, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Lei Tang and Lin Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–32, Tables 1–21 and Notes 1–15

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Weinberger, E., Lin, C. & Lee, S.-I. Isolating salient variations of interest in single-cell data with contrastiveVI. Nat Methods 20, 1336–1345 (2023). https://doi.org/10.1038/s41592-023-01955-3

  • DOI: https://doi.org/10.1038/s41592-023-01955-3
