A diffusion model conditioned on compound bioactivity profiles for generating high-content images
  • Article
  • Open access
  • Published: 03 April 2026

  • Steven Cook1,
  • Jason Chyba1,
  • Laura Gresoro1,
  • Doug Quackenbush1,
  • Minhua Qiu1,
  • Peter Kutchukian2,
  • Eric J. Martin3,
  • Peter Skewes-Cox3 &
  • William J. Godinez3 

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Machine learning
  • Phenotypic screening
  • Virtual drug screening
  • Virtual screening

Abstract

High-content imaging (HCI) provides a rich snapshot of compound-induced phenotypic outcomes, augmenting our understanding of how compounds affect cellular systems. Generative imaging models for HCI provide a route towards anticipating the phenotypic outcomes of chemical perturbations in silico at unprecedented scale and speed. Here, we developed Profile-Diffusion (pDIFF), a generative method that uses a profile-to-image latent diffusion model, conditioned on in silico bioactivity profiles, to generate high-content images displaying the cellular outcomes induced by compound treatment. We trained and evaluated a pDIFF model using high-content images from a Cell Painting assay profiling 3750 molecules (3375 training compounds and 375 held-out compounds) with corresponding in silico bioactivity profiles. Using the held-out set, we demonstrate that pDIFF provides better visual depictions of the phenotypic responses of compounds that are structurally dissimilar to the training compounds than a baseline profile-to-image latent diffusion model trained on substructural molecular descriptors only. In a virtual hit expansion scenario, pDIFF yielded a statistically significant improvement in expansion outcomes, as measured by nearest-neighbor retrieval accuracy, over expansions based on compound structural representations, bioactivity profiles, or generative imaging models that use only substructural molecular descriptors. These results showcase the potential of the methodology to speed up and improve the search for novel phenotypically active molecules.
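
To make the evaluation metric concrete, the following is a minimal sketch of how a nearest-neighbor retrieval accuracy could be computed. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the use of cosine similarity, the toy two-dimensional feature vectors, and the labels "A", "B", and "C" are all hypothetical. Each query stands in for a feature vector of a held-out compound, each reference for a feature vector of a compound with a known label, and a query counts as a hit when its most similar reference carries the same label.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nn_retrieval_accuracy(
    queries: Sequence[Sequence[float]],
    query_labels: Sequence[str],
    references: Sequence[Sequence[float]],
    reference_labels: Sequence[str],
) -> float:
    """Fraction of queries whose most similar reference has the same label."""
    if not queries or not references:
        raise ValueError("queries and references must be non-empty")
    hits = 0
    for query, label in zip(queries, query_labels):
        # Index of the reference with the highest cosine similarity to this query.
        best = max(
            range(len(references)),
            key=lambda i: cosine_similarity(query, references[i]),
        )
        if reference_labels[best] == label:
            hits += 1
    return hits / len(queries)


# Hypothetical toy data: three labelled references and four queries.
references = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
reference_labels = ["A", "B", "C"]
queries = [[0.9, 0.1], [0.1, 0.8], [0.7, 0.6], [-0.2, 0.9]]
query_labels = ["A", "B", "B", "C"]

accuracy = nn_retrieval_accuracy(queries, query_labels, references, reference_labels)
print(accuracy)  # 0.5: the first two queries retrieve a matching label, the last two do not
```

In practice the feature vectors would come from whichever representation is being compared (for example structural descriptors, bioactivity profiles, or features extracted from generated images), so the same function can score each representation on equal terms.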

Data availability

The data used in this study are proprietary to Novartis and are not publicly available due to intellectual property restrictions. An example dataset is available in the pDIFF code repository.

Code availability

The code and an example dataset for pDIFF are available in the Supplementary Code and at https://github.com/Novartis/pDIFF.

Acknowledgements

We thank Frederick Lo for help with data logistics and pre-processing. We thank Mark A. Bray for fruitful discussions.

Author information

Authors and Affiliations

  1. Novartis Biomedical Research, San Diego, 92121, CA, USA

    Steven Cook, Jason Chyba, Laura Gresoro, Doug Quackenbush & Minhua Qiu

  2. Novartis Biomedical Research, Cambridge, 02139, MA, USA

    Peter Kutchukian

  3. Novartis Biomedical Research, Emeryville, 94608, CA, USA

    Eric J. Martin, Peter Skewes-Cox & William J. Godinez

Contributions

S.C. and W.J.G. designed and led the study. S.C. developed, implemented, and evaluated pDIFF. J.C., L.G., and D.Q. developed and ran the imaging assay. E.J.M. developed the algorithm to compute the in silico bioactivity profiles and provided feedback. M.Q., P.K., and P.S.-C. provided feedback. S.C., M.Q., and W.J.G. analyzed and interpreted the results. S.C. and W.J.G. wrote the article. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Steven Cook or William J. Godinez.

Ethics declarations

Competing interests

All authors are (or were at the time of their involvement with the studies) employees of Novartis.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information (PDF).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cook, S., Chyba, J., Gresoro, L. et al. A diffusion model conditioned on compound bioactivity profiles for generating high-content images. Sci Rep (2026). https://doi.org/10.1038/s41598-026-44976-6

  • Received: 24 January 2025

  • Accepted: 15 March 2026

  • Published: 03 April 2026

  • DOI: https://doi.org/10.1038/s41598-026-44976-6

Keywords

  • Generative AI for drug discovery
  • In silico HCI
  • Virtual screening