Deciphering DEL pocket patterns through contrastive learning
  • Article
  • Open access
  • Published: 16 February 2026

  • Wenyi Zhang, ORCID: orcid.org/0009-0008-4969-1099 (affiliations 1, 2)
  • Yuxing Wang, ORCID: orcid.org/0009-0004-8271-0416 (affiliations 1, 2)
  • Rui Zhan, ORCID: orcid.org/0009-0005-5055-7610 (affiliations 1, 2)
  • Runtong Qian, ORCID: orcid.org/0009-0005-8743-4833 (affiliations 1, 2)
  • Qi Hu, ORCID: orcid.org/0000-0001-7803-7821 (affiliations 1, 2)
  • …
  • Jing Huang, ORCID: orcid.org/0000-0001-9639-2907 (affiliations 1, 2)

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biophysics
  • High-throughput screening
  • Protein function predictions
  • Structural biology

Abstract

DNA-encoded libraries (DELs) facilitate high-throughput screening of trillions of molecules against protein targets through split-pool synthesis and DNA tagging. Despite their potential, only a few DEL-derived compounds have advanced to clinical trials or reached the market. A better understanding of the defining characteristics of target proteins, particularly those with binding pockets suitable for DEL screening, is critical to improving success rates. However, existing approaches remain limited in assessing pocket flexibility and functional similarity. To address these challenges, we present ErePOC, a pocket representation model based on contrastive learning with ESM-2 embeddings. ErePOC captures both structural and functional features of binding pockets, enabling identification of shared characteristics among DEL targets. By integrating analyses of low-dimensional physicochemical properties and high-dimensional ErePOC embeddings, we provide a comprehensive view of DEL target space. ErePOC reaches 98% precision in downstream classification tasks, demonstrating high performance in pocket representation. We then apply it to predict human proteins suitable for DEL screening and uncover enrichment across 18 functional categories. This work establishes a framework for enhancing DEL-based drug discovery through more effective target selection and pocket similarity analysis.
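
The abstract names contrastive learning over ESM-2 embeddings but does not state the loss function or architecture used in ErePOC. As a rough illustration of the general idea only, the sketch below implements a generic InfoNCE-style contrastive loss and a cosine-similarity comparison in NumPy. The function names, the temperature value, and the random arrays standing in for pooled pocket embeddings are all assumptions made for this example, not details taken from the paper or its repository.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not the ErePOC objective).

    anchors[i] and positives[i] are treated as a positive pair; every other
    row of `positives` serves as a negative for anchors[i].
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature                   # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))        # positives lie on the diagonal

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between pocket embeddings."""
    e = l2_normalize(embeddings)
    return e @ e.T

# Placeholder data: random vectors stand in for pooled pocket embeddings.
rng = np.random.default_rng(0)
pockets = rng.normal(size=(8, 32))
views = pockets + 0.05 * rng.normal(size=pockets.shape)  # slightly perturbed "positives"

print("loss, matched pairs:  ", info_nce_loss(pockets, views))
print("loss, shuffled pairs: ", info_nce_loss(pockets, views[::-1]))
print("similarity matrix shape:", cosine_similarity_matrix(pockets).shape)
```

In a setup like this, a lower loss means that paired embeddings sit closer together than unpaired ones, and the similarity matrix is the kind of quantity one could use to compare pockets once embeddings are learned. How ErePOC actually defines positive pairs and computes pocket similarity is described in the full paper and code, not here.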

Data availability

The datasets generated and analyzed in this study have been deposited in the Zenodo database under https://doi.org/10.5281/zenodo.18033921. The raw data supporting the findings of this study, including model outputs and processed datasets, are available from Zenodo under this accession code. Source data underlying all quantitative figures of manageable size are provided as individual Source data Excel files with this manuscript, including Figs. 1, 2, 5 and 7, and Supplementary Figs. S1–S4, S9, S12–S13, S15 and S17–S19. Because the datasets underlying the t-SNE visualizations (Fig. 4 and Supplementary Figs. S5–S11 and S14) are large, individual data points are not provided as Source data files; however, the full processed inputs required to reproduce these figures are available on Zenodo. Raw data related to Figs. 1, 2 and 7 and Supplementary Figs. S2, S4, S13, S17 and S19 are also provided on Zenodo. PyMOL session files (.pse) used for structural visualization in Fig. 6 and Supplementary Fig. S16 are available on Zenodo as well. The BioLiP2, AlphaFill, and AF2-predicted protein structure data used in this study are publicly available from the BioLiP, AlphaFill, and AlphaFold databases at https://zhanggroup.org/BioLiP/, https://alphafill.eu/, and https://alphafold.ebi.ac.uk/download, respectively. Lists of target proteins for BioLiP2, AlphaFill, and AlphaFold-predicted human proteins, as well as PDB code mappings for DEL and FDA-AD targets, are also available on Zenodo. Source data are provided with this paper.

Code availability

All custom source code and algorithms generated in this study are publicly available at https://github.com/JingHuangLab/ErePOC, along with the processed data required to reproduce all results reported in this manuscript. There are no restrictions on access.

Acknowledgements

This work was supported by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang, grant numbers 2023C03109 (J.H.) and 2024SSYS0036 (J.H.); the National Natural Science Foundation of China, grants 32171247 (J.H.), T2596084 (J.H.), and 32501101 (W.Z.); the Zhejiang Provincial Natural Science Foundation, grant LQ23F020011 (W.Z.); the State Key Laboratory of Gene Expression; and the Westlake Education Foundation. The authors thank the Westlake University Supercomputer Center for computational resources and related assistance.

Author information

Authors and Affiliations

  1. State Key Laboratory of Gene Expression, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China

    Wenyi Zhang, Yuxing Wang, Rui Zhan, Runtong Qian, Qi Hu & Jing Huang

  2. Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China

    Wenyi Zhang, Yuxing Wang, Rui Zhan, Runtong Qian, Qi Hu & Jing Huang

Contributions

J.H. and Q.H. conceived the study. W.Z., Y.W., R.Z., and R.Q. designed the experiments. All authors analyzed the results. J.H. and W.Z. wrote the manuscript.

Corresponding author

Correspondence to Jing Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Holly Soutter, Noel O’Boyle, who co-reviewed with Melissa F. Adasme, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhang, W., Wang, Y., Zhan, R. et al. Deciphering DEL pocket patterns through contrastive learning. Nat Commun (2026). https://doi.org/10.1038/s41467-026-69663-y

  • Received: 17 June 2025

  • Accepted: 04 February 2026

  • Published: 16 February 2026

  • DOI: https://doi.org/10.1038/s41467-026-69663-y
