Scientific Reports
Interpretable machine learning rationalizes carbonic anhydrase inhibition via conformal and counterfactual prediction
  • Article
  • Open access
  • Published: 11 February 2026

  • Masoumeh Shams Ghamsary (affiliation 1, equal contribution),
  • Milad Rayka (affiliation 1, equal contribution) &
  • S. Shahab Naghavi (affiliation 1)

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Cancer
  • Computational biology and bioinformatics
  • Drug discovery

Abstract

Human carbonic anhydrase (hCA) isoforms IX and XII are promising anticancer targets. Yet their selective inhibition remains elusive because of their close similarity to the abundant hCA II, whose off-target inhibition causes harmful side effects. To address this issue, we introduce an interpretable machine learning framework to predict inhibition across hCA II, IX, and XII. Our approach combines rigorous data curation, systematic benchmarking of classical and deep learning models, and integration of conformal prediction for uncertainty quantification with counterfactual explanations for molecular interpretability. After extensive benchmarking, we find that Support Vector Machines with extended-connectivity fingerprints consistently outperform more complex models, underscoring the importance of data quality and validation over algorithmic complexity. Conformal prediction provides rigorous activity estimation, while counterfactual analysis rationalizes the structural features governing isoform selectivity; together, they enable interpretable guidance for inhibitor design. To further test the model, we apply it to the selective inhibitor SLC-0111 and obtain a result consistent with experiment. Our model reiterates the experimental finding that modifications in the tail region strongly affect molecular selectivity, emphasizing the tail group as a key structural determinant for differentiating inhibitor activity among hCA isoforms II, IX, and XII. To facilitate adoption, we also release CAInsight, user-friendly software with a graphical interface for virtual screening and generative design of selective hCA inhibitors.
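
The abstract names conformal prediction as the uncertainty layer but does not say which variant is used. As a rough illustration only, the sketch below assumes a class-conditional (Mondrian) inductive conformal classifier with nonconformity defined as one minus the predicted class probability. The function names, the toy probabilities, and the two-class labeling (0 = inactive, 1 = active) are hypothetical and are not taken from the paper or from CAInsight; the probabilities could come from any underlying classifier.

```python
from typing import List, Sequence


def mondrian_p_values(
    cal_probs: Sequence[Sequence[float]],
    cal_labels: Sequence[int],
    test_probs: Sequence[float],
) -> List[float]:
    """Return one conformal p-value per class for a single test compound.

    cal_probs  : per-class probabilities for each calibration compound
    cal_labels : true class index of each calibration compound
    test_probs : per-class probabilities for the test compound
    """
    p_values = []
    for cls in range(len(test_probs)):
        # Assumed nonconformity score: 1 - probability of the class in question.
        # Only calibration compounds of that class are used (Mondrian setting).
        cal_scores = [
            1.0 - probs[cls]
            for probs, label in zip(cal_probs, cal_labels)
            if label == cls
        ]
        test_score = 1.0 - test_probs[cls]
        n_at_least = sum(1 for s in cal_scores if s >= test_score)
        p_values.append((n_at_least + 1) / (len(cal_scores) + 1))
    return p_values


def prediction_set(p_values: Sequence[float], significance: float) -> List[int]:
    """Keep every class whose p-value exceeds the chosen significance level."""
    return [cls for cls, p in enumerate(p_values) if p > significance]


# Toy calibration data (hypothetical): probabilities for [inactive, active].
cal_probs = [
    [0.90, 0.10], [0.80, 0.20], [0.60, 0.40], [0.55, 0.45],  # true inactives
    [0.30, 0.70], [0.20, 0.80], [0.10, 0.90], [0.40, 0.60],  # true actives
]
cal_labels = [0, 0, 0, 0, 1, 1, 1, 1]

p = mondrian_p_values(cal_probs, cal_labels, test_probs=[0.15, 0.85])
print(p)                        # [0.2, 0.8]
print(prediction_set(p, 0.25))  # [1]     -> confidently "active"
print(prediction_set(p, 0.10))  # [0, 1]  -> stricter level, both labels kept
```

Lowering the significance level asks for a stronger guarantee, so the prediction set grows; a set with both labels signals that the model cannot separate the classes for that compound at the requested confidence.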

Data availability

The data used for this project were retrieved from the ChEMBL database (https://www.ebi.ac.uk/chembl/). All code needed to reproduce the results and figures and to install CAInsight is available at https://github.com/miladrayka/hca_ml.

Acknowledgements

The authors acknowledge the support and resources from the Center for High-Performance Computing (SARMAD) at Shahid Beheshti University of Iran. This work is based upon research funded by Iran National Science Foundation (INSF) under project No.4037187.

Funding

This work is based upon research funded by Iran National Science Foundation (INSF) under project No.4037187.

Author information

Author notes
  1. These authors contributed equally: Masoumeh Shams Ghamsary and Milad Rayka.

Authors and Affiliations

  1. Department of Physical and Computational Chemistry, Shahid Beheshti University, Tehran, 1983969411, Iran

    Masoumeh Shams Ghamsary, Milad Rayka & S. Shahab Naghavi

Contributions

SSN conceptualized the study. MR and MSQ contributed equally to this study by conducting the experiments and writing the manuscript. SSN revised the manuscript. All authors read the manuscript, provided feedback, and approved its final form.

Corresponding author

Correspondence to S. Shahab Naghavi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Ghamsary, M.S., Rayka, M. & Naghavi, S.S. Interpretable machine learning rationalizes carbonic anhydrase inhibition via conformal and counterfactual prediction. Sci Rep (2026). https://doi.org/10.1038/s41598-026-39771-2

  • Received: 02 November 2025

  • Accepted: 06 February 2026

  • Published: 11 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-39771-2

Keywords

  • Human carbonic anhydrase
  • Machine learning
  • Deep learning
  • Selective inhibitor
  • Activity
  • Interpretability
  • Uncertainty quantification