Abstract
High-content imaging (HCI) provides a rich snapshot of compound-induced phenotypic outcomes that augments our understanding of how compounds affect cellular systems. Generative imaging models for HCI provide a route towards anticipating the phenotypic outcomes of chemical perturbations in silico at unprecedented scale and speed. Here, we developed Profile-Diffusion (pDIFF), a generative method that uses a profile-to-image latent diffusion model conditioned on in silico bioactivity profiles to generate high-content images displaying the cellular outcomes induced by compound treatment. We trained and evaluated a pDIFF model using high-content images from a Cell Painting assay profiling 3750 molecules (3375 training compounds and 375 held-out compounds) with corresponding in silico bioactivity profiles. Using the held-out set, we demonstrate that pDIFF provides improved visual depictions of the phenotypic responses of compounds that are structurally dissimilar to training compounds, compared to a baseline profile-to-image latent diffusion model trained on substructural molecular descriptors only. In a virtual hit expansion scenario, pDIFF yielded a statistically significant improvement in expansion outcomes, as measured by nearest-neighbor retrieval accuracy, compared to expansions based on compound structural representations, bioactivity profiles, and generative imaging models based only on substructural molecular descriptors. These results showcase the potential of the methodology to speed up and improve the search for novel phenotypically active molecules.
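As an illustration of the evaluation metric named above, the following is a minimal sketch of one common way to compute nearest-neighbor retrieval accuracy. It is not taken from the pDIFF code: the function name `nn_retrieval_accuracy`, the use of cosine similarity, the label-matching criterion, and the toy vectors are all assumptions made for illustration, and the study's exact protocol may differ.

```python
import numpy as np


def nn_retrieval_accuracy(query_vecs, query_labels, ref_vecs, ref_labels):
    """Fraction of queries whose cosine-nearest reference shares their label.

    Hypothetical helper: assumes every vector is non-zero and that a
    retrieval counts as correct when the labels match exactly.
    """
    q = np.asarray(query_vecs, dtype=float)
    r = np.asarray(ref_vecs, dtype=float)
    # L2-normalise rows so that the dot product equals cosine similarity.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    sims = q @ r.T                      # shape: (n_queries, n_refs)
    nearest = sims.argmax(axis=1)       # index of the closest reference
    hits = np.asarray(ref_labels)[nearest] == np.asarray(query_labels)
    return float(hits.mean())


# Toy example (made-up 2-D feature vectors, not real image embeddings).
ref_vecs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
ref_labels = ["A", "B", "C"]
query_vecs = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
query_labels = ["A", "B", "C"]

accuracy = nn_retrieval_accuracy(query_vecs, query_labels, ref_vecs, ref_labels)
print(f"nearest-neighbor retrieval accuracy: {accuracy:.3f}")
```

In this toy case the first two queries retrieve a reference with the matching label and the third does not, so two of three retrievals are correct. In a hit expansion setting the query vectors would be features derived from each representation being compared (for example, generated images, structural fingerprints, or bioactivity profiles), which is what allows the representations to be ranked by a single number.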
Data availability
The data used in this study are proprietary to Novartis and are not publicly available due to intellectual property restrictions. An example dataset is available in the pDIFF code repository.
Code availability
The code and an example dataset for pDIFF are available in Supplementary Code and at https://github.com/Novartis/pDIFF.
Acknowledgements
We thank Frederick Lo for help with data logistics and pre-processing. We thank Mark A. Bray for fruitful discussions.
Author information
Contributions
S.C. and W.J.G. designed and led the study. S.C. developed, implemented, and evaluated pDIFF. J.C., L.G., and D.Q. developed and ran the imaging assay. E.J.M. developed the algorithm to compute the in silico bioactivity profiles and provided feedback. M.Q., P.K., and P.S.-C. provided feedback. S.C., M.Q., and W.J.G. analyzed and interpreted the results. S.C. and W.J.G. wrote the article. All authors reviewed the manuscript.
Ethics declarations
Competing interests
All authors are (or were at the time of their involvement with the studies) employees of Novartis.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cook, S., Chyba, J., Gresoro, L. et al. A diffusion model conditioned on compound bioactivity profiles for generating high-content images. Sci Rep (2026). https://doi.org/10.1038/s41598-026-44976-6