Graph-based deep learning approach for high-throughput protein-DNA interaction scoring

Zhao, Yi-hao; Wang, Ying; Shen, Chao; Jiang, De-jun; Gu, Shu-kai; Zhao, Hui-feng; You, Zi-yi; Hou, Ting-jun; Kang, Yu

doi:10.1038/s41401-025-01688-3

Article
Published: 01 December 2025

Acta Pharmacologica Sinica volume 47, pages 977–989 (2026)

Abstract

Accurately quantifying protein-DNA interactions (PDIs) is critical for understanding biological processes and facilitating drug design. However, the inherent flexibility of nucleic acids limits the availability of experimentally determined structures of PDI complexes, posing a significant challenge for training reliable scoring functions (SFs). To address this, we developed PDIScore, a novel deep learning-based SF for PDI prediction. PDIScore utilizes a comprehensive graph representation to capture nucleotide flexibility, employs a scalable GraphGPS architecture with BigBird linear global attention to handle large interaction interfaces, and leverages Mixture Density Networks (MDNs) to model residue-nucleotide distance distributions. PDIScore was trained on a self-collected dataset of ~7000 protein-nucleic acid complex structures and validated on three rigorous test sets that evaluate its screening, docking, and ranking capabilities. The results showed that PDIScore significantly outperformed existing methods: it achieved the best screening power on the screening set (e.g., EF_1% = 14.13, AUROC = 0.82 using AlphaFold3 structures), the highest docking success rate on the docking set (48.94% top-1), and superior ranking capability on the ranking set (PCC = 0.50). Case studies demonstrated PDIScore’s ability to elucidate biological mechanisms (e.g., adenovirus transcription, SOCS1 regulation) and its interpretability at the nucleotide level for identifying key interaction sites. PDIScore represents a robust, generalizable tool with significant potential for advancing PDI-related research and therapeutic design.
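To make the MDN idea concrete, the following is a minimal sketch of how a mixture-density score over residue-nucleotide distances could be computed. It is not the PDIScore implementation: the function names, the tuple layout of `pairs`, and the 8.0 Å cutoff are all assumptions chosen for illustration, and in a real model the mixture parameters would be predicted by the network rather than supplied by hand.

```python
import math


def mixture_log_density(d, weights, means, sigmas):
    """Log density of a distance d under a 1-D Gaussian mixture.

    weights must be strictly positive and sum to 1; sigmas must be positive.
    """
    # Per-component log(weight * Normal(d; mean, sigma)).
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi)
        - 0.5 * ((d - m) / s) ** 2
        for w, m, s in zip(weights, means, sigmas)
    ]
    # Log-sum-exp over components for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def interface_score(pairs, cutoff=8.0):
    """Sum log-likelihoods over residue-nucleotide pairs within a cutoff.

    pairs: iterable of (distance, weights, means, sigmas) tuples
           (hypothetical layout). cutoff is an assumed value in angstroms.
    Higher scores mean the observed distances are more probable under
    the mixture distributions.
    """
    total = 0.0
    for d, weights, means, sigmas in pairs:
        if d <= cutoff:
            total += mixture_log_density(d, weights, means, sigmas)
    return total


if __name__ == "__main__":
    # Two toy pairs with hand-written mixture parameters.
    toy_pairs = [
        (5.2, [0.7, 0.3], [5.0, 7.5], [0.6, 1.0]),
        (6.8, [0.5, 0.5], [4.5, 7.0], [0.8, 0.8]),
    ]
    print(interface_score(toy_pairs))
```

Because the total is a sum of per-pair terms, the contribution of each nucleotide can in principle be read off by grouping the terms by nucleotide, which is one plausible route to the nucleotide-level interpretability described above.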

**Fig. 1: The model architecture of PDIScore.**

**Fig. 2: Screening and ranking powers of SFs on the screening set.**

**Fig. 3: Docking powers of SFs on the docking benchmark with two cases.**

**Fig. 4: Ranking powers of SFs on the ranking dataset.**

**Fig. 5: PDIScore predicts the transcriptional activity of the promoter in the regulation of SOCS1 by Egr1.**

**Fig. 6: The decomposed contributions of PDIScore at the nucleotide level.**

Data availability

The datasets are available at https://doi.org/10.5281/zenodo.13764615. The code is available at https://github.com/roger-yh-zhao/pdiscore.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (22220102001, 22307112, 92370130, 22303081) and the Postdoctoral Fellowship Program of CPSF (GZB20230648).

Author information

These authors contributed equally: Yi-hao Zhao, Ying Wang.

Authors and Affiliations

College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
Yi-hao Zhao, Ying Wang, Shu-kai Gu, Hui-feng Zhao, Zi-yi You, Ting-jun Hou & Yu Kang
Department of Clinical Pharmacy, the First Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, 310003, China
Chao Shen
Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004, China
De-jun Jiang
Zhejiang Provincial Key Laboratory for Intelligent Drug Discovery and Development, Jinhua, 321016, China
Ting-jun Hou & Yu Kang

Contributions

TJH and YK designed the research study. YHZ developed the method and wrote the code. YHZ, YW, CS, DJJ, SKG, HFZ, and ZYY performed the analysis. YHZ and TJH wrote the paper. All authors read and approved the manuscript.

Corresponding authors

Correspondence to Ting-jun Hou or Yu Kang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information (DOCX)

affinity (XLSX)

cluster (XLSX)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, Yh., Wang, Y., Shen, C. et al. Graph-based deep learning approach for high-throughput protein-DNA interaction scoring. Acta Pharmacol Sin 47, 977–989 (2026). https://doi.org/10.1038/s41401-025-01688-3

Received: 14 July 2025
Accepted: 02 October 2025
Published: 01 December 2025
Version of record: 01 December 2025
Issue date: April 2026
DOI: https://doi.org/10.1038/s41401-025-01688-3
