A federated graph learning method to realize multi-party collaboration for molecular discovery

Abstract

Optimizing molecular resource utilization for molecular discovery requires collaborative efforts across research institutions and organizations to accelerate progress. However, given the high research value of both successful and unsuccessful molecules produced by each institution (or organization), these findings are typically kept highly private and confidential until formal publication or commercialization, with even failed molecules rarely disclosed. This confidentiality requirement presents a great challenge for most existing methods when collaboratively handling molecular data with heterogeneous distributions under stringent privacy constraints. Here we propose FedLG (federated learning Lanczos graph), a federated graph learning method that leverages the Lanczos algorithm to facilitate collaborative model training across multiple parties, achieving reliable prediction performance under strict privacy protection conditions. Compared with various existing federated learning methods, FedLG exhibits excellent model performance on 18 benchmark datasets in a simulated federated learning environment. Under different privacy-preserving mechanism settings, FedLG demonstrates robust performance and resistance to noise. Leave-one-client-out experiments and comparison tests across each simulated institution show that FedLG achieves improved heterogeneous data aggregation capabilities and more promising outcomes than localized training. In addition, we incorporate Bayesian optimization into FedLG to show its scalability and further stabilize model performance. Overall, FedLG can be considered an effective method to realize multi-party collaboration while ensuring that sensitive molecular information is protected from potential leakage.
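
For readers unfamiliar with the Lanczos algorithm named in the abstract, the following is a minimal sketch of the textbook iteration (ref. 28) in plain Python. It is not the FedLG implementation: this preview does not describe how FedLG applies the algorithm during aggregation, and the function name `lanczos`, its parameters and the small test matrix are all illustrative assumptions. The sketch only shows how a symmetric matrix is reduced to a tridiagonal one through an orthonormal basis.

```python
import math


def matvec(a, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(x, y):
    """Euclidean inner product of two vectors."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


def lanczos(a, v0, steps, tol=1e-12):
    """Textbook Lanczos iteration for a symmetric matrix `a`.

    Returns (alphas, betas, basis): `alphas` is the diagonal and `betas`
    the off-diagonal of the tridiagonal matrix, where betas[j] couples
    basis[j] and basis[j + 1]. Hypothetical helper, not taken from FedLG.
    """
    norm = math.sqrt(dot(v0, v0))
    v = [x / norm for x in v0]          # normalized start vector
    v_prev = [0.0] * len(v0)
    beta = 0.0
    alphas, betas, basis = [], [], [v]
    for _ in range(steps):
        w = matvec(a, v)
        alpha = dot(w, v)
        # Three-term recurrence: remove components along v and v_prev.
        w = [w_i - alpha * v_i - beta * p_i
             for w_i, v_i, p_i in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if beta < tol:                  # invariant subspace found; stop early
            break
        betas.append(beta)
        v_prev, v = v, [w_i / beta for w_i in w]
        basis.append(v)
    return alphas, betas, basis


if __name__ == "__main__":
    # Illustrative 3x3 symmetric matrix (for example, a small graph operator).
    a = [[2.0, 1.0, 0.0],
         [1.0, 2.0, 1.0],
         [0.0, 1.0, 2.0]]
    alphas, betas, basis = lanczos(a, [1.0, 0.0, 0.0], steps=3)
    print(alphas, betas)
```

In graph learning the matrix is typically a graph adjacency or Laplacian operator, and the tridiagonal output supports a low-rank spectral approximation, as in LanczosNet (ref. 29). This version omits reorthogonalization, which longer runs on larger matrices generally need for numerical stability.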

Fig. 1: Diagram showing the process flow of the FedLG method.
Fig. 2: GNN architectures.
Fig. 3: Comparison of effectiveness, performance and computational efficiency of the FedLG method using the MPNN model.

Data availability

All datasets used in this study for method construction are freely available. The MoleculeNet (BBBP, BACE, SIDER, Tox21, ToxCast, ESOL, Lipo and FreeSolv) datasets are available at http://moleculenet.ai/datasets-1. The LIT-PCBA (ALDH1, FEN1, GBA, KAT2A, MAPK1, PKM2 and VDR) datasets are available at https://github.com/idrugLab/FP-GNN/blob/main/Data.rar. The DrugBank and BIOSNAP datasets are available at https://github.com/kexinhuang12345/CASTER/tree/master/DDE/data. The CoCrystal dataset is available at https://github.com/Saoge123/ccgnet/tree/main/data.

Code availability

All source code for this study is available via GitHub at https://github.com/Turningl/FedLG and Zenodo at https://doi.org/10.5281/zenodo.16872722 (ref. 67).

References

  1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  2. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

  3. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).

  4. Hartono, N. T. P. et al. How machine learning can help select capping layers to suppress perovskite degradation. Nat. Commun. 11, 4172 (2020).

  5. Jiang, Y. et al. Coupling complementary strategy to flexible graph neural network for quick discovery of coformer in diverse co-crystal materials. Nat. Commun. 12, 5950 (2021).

  6. Cao, Y. et al. Perovskite light-emitting diodes based on spontaneously formed submicrometre-scale structures. Nature 562, 249–253 (2018).

  7. Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016).

  8. Müller, S. Small-molecule-mediated G-quadruplex isolation from human cells. Nat. Chem. 2, 1095–1098 (2010).

  9. Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).

  10. Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

  11. Tran-Nguyen, V.-K., Jacquemard, C. & Rognan, D. LIT-PCBA: an unbiased data set for machine learning and virtual screening. J. Chem. Inf. Model. 60, 4263–4273 (2020).

  12. Wishart, D. S. et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 36, D901–D906 (2008).

  13. Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).

  14. Tan, L. et al. Tackling assay interference associated with small molecules. Nat. Rev. Chem. 8, 319–339 (2024).

  15. Durant, G. et al. The future of machine learning for small-molecule drug discovery will be driven by data. Nat. Comput. Sci. 4, 735–743 (2024).

  16. Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 1–19 (2019).

  17. Zhu, W. et al. Federated learning of molecular properties with graph neural networks in a heterogeneous setting. Patterns 3, 100521 (2022).

  18. Xiong, Z. et al. Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches. Sci. China Life Sci. 65, 529–539 (2022).

  19. Heyndrickx, W. et al. MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 (2024).

  20. Gilmer, J. et al. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017).

  21. Veličković, P. et al. Graph attention networks. Preprint at https://arxiv.org/abs/1710.10903 (2018).

  22. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations https://openreview.net/pdf?id=SJU4ayYgl (ICLR, 2017).

  23. Ning, Y. et al. GFedKRL: graph federated knowledge re-learning for effective molecular property prediction via privacy protection. In International Conference on Artificial Neural Networks. 426–438 (Springer, 2023).

  24. Cao, X., Jia, J., Zhang, Z. & Gong, N. Z. FedRecover: recovering from poisoning attacks in federated learning using historical information. In Proc. 2023 IEEE Symposium on Security and Privacy (SP) 1366–1383 (IEEE, 2023).

  25. Gupta, S. et al. Recovering private text in federated learning of language models. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022) https://papers.neurips.cc/paper_files/paper/2022/file/35b5c175e139bff5f22a5361270fce87-Paper-Conference.pdf (2022).

  26. Zhang, K. et al. Flip: a provable defense framework for backdoor mitigation in federated learning. In International Conference on Learning Representations https://openreview.net/pdf?id=Xo2E217_M4n (ICLR, 2022).

  27. Chen, J. et al. FederEI: federated library matching framework for electron ionization mass spectrum based compound identification. Anal. Chem. 96, 15840–15845 (2024).

  28. Lanczos, C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl Bur. Stand. 45, 255 (1950).

  29. Liao, R., Zhao, Z., Urtasun, R. & Zemel, R. S. Lanczosnet: multi-scale deep graph convolutional networks. In International Conference on Learning Representations (ICLR, 2019).

  30. Olkin, I. & Rubin, H. Multivariate beta distributions and independence properties of the Wishart distribution. Ann. Math. Stat. 35, 261–269 (1964).

  31. Alaggan, M., Gambs, S. & Kermarrec, A.-M. Heterogeneous differential privacy. J. Priv. Confid. 7(2) (2016).

  32. McInnes, L. et al. UMAP: Uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).

  33. Pelikan, M. Bayesian Optimization Algorithm. In Hierarchical Bayesian Optimization Algorithm Vol. 170, 31–48 (Springer, Berlin, Heidelberg, 2005).

  34. Wu, C. et al. A federated graph neural network framework for privacy-preserving personalization. Nat. Commun. 13, 3091 (2022).

  35. Liu, J., Lou, J., Xiong, L., Liu, J. & Meng, X. Projected federated averaging with heterogeneous differential privacy. Proc. VLDB Endow. 15, 828–840 (2021).

  36. Wang, L. et al. Enhancing federated learning with in-cloud unlabeled data. In Proc. IEEE 38th International Conference on Data Engineering (ICDE) 136–149 (IEEE, 2022).

  37. Lin, T. et al. Ensemble distillation for robust model fusion in federated learning. In 34th Conference on Neural Information Processing Systems (NeurIPS 2020) https://proceedings.neurips.cc/paper/2020/file/18df51b97ccd68128e994804f3eccc87-Paper.pdf (2020).

  38. Li, Q. et al. Practical one-shot federated learning for cross-silo setting. Preprint at https://arxiv.org/abs/2010.01017 (2020).

  39. Shao, J., Wu, F. & Zhang, J. Selective knowledge sharing for privacy-preserving federated distillation without a good teacher. Nat. Commun. 15, 349 (2024).

  40. Park, J. et al. Sageflow: robust federated learning against both stragglers and adversaries. In 35th Conference on Neural Information Processing Systems (NeurIPS 2021) https://proceedings.neurips.cc/paper/2021/file/076a8133735eb5d7552dc195b125a454-Paper.pdf (2021).

  41. Xie, C. et al. Zeno++: Robust fully asynchronous SGD. In Proceedings of the 37th International Conference on Machine Learning (eds III, H. D. & Singh, A.) 10495–10503 (PMLR, 2020).

  42. Huang, K., Xiao, C., Hoang, T. N., Glass, L. M. & Sun, J. CASTER: predicting drug interactions with chemical substructure representation. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) 702–709 (2020).

  43. Li, Y. et al. An adaptive graph learning method for automated molecular interactions and properties predictions. Nat. Mach. Intell. 4, 645–651 (2022).

  44. Zeng, X. et al. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. Nat. Mach. Intell. 4, 1004–1016 (2022).

  45. Zhang, X., Kang, Y., Chen, K., Fan, L. & Yang, Q. Trading off privacy, utility and efficiency in federated learning. ACM Trans. Intell. Syst. Technol. 14, 1–32 (2023).

  46. Cai, H., Zhang, H., Zhao, D., Wu, J. & Wang, L. FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Brief. Bioinform. 23, bbac408 (2022).

  47. Hanser, T. Federated learning for molecular discovery. Curr. Opin. Struct. Biol. 79, 102545 (2023).

  48. Boiko, D. A. et al. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).

  49. Mirza, A. et al. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. Nat. Chem. 17, 1027–1034 (2025).

  50. McDuff, D. et al. Towards accurate differential diagnosis with large language models. Nature 642, 451–457 (2025).

  51. Farayola, O. A. et al. Data privacy and security in IT: a review of techniques and challenges. Comput. Sci. IT Res. J. 5, 606–615 (2024).

  52. Weber, R. H. Internet of things—new security and privacy challenges. Comput. Law Secur. Rev. 26, 23–30 (2010).

  53. Smith, V., Chiang, C.-K., Sanjabi, M. & Talwalkar, A. S. Federated multi-task learning. In Advances in Neural Information Processing Systems 30 (NIPS 2017) https://papers.nips.cc/paper_files/paper/2017/file/6211080fa89981f66b1a0c9d55c61d0f-Paper.pdf (2017).

  54. Liu, L. et al. GEM-2: next generation molecular property prediction network by modeling full-range many-body interactions. Preprint at http://arxiv.org/abs/2208.05863 (2022).

  55. Hussain, M. S., Zaki, M. J. & Subramanian, D. Triplet interaction improves graph transformers: accurate molecular graph learning with triplet graph transformers. Preprint at http://arxiv.org/abs/2402.04538 (2024).

  56. Wallach, I. & Heifets, A. Most ligand-based classification benchmarks reward memorization rather than generalization. J. Chem. Inf. Model. 58, 916–932 (2018).

  57. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

  58. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076 (1962).

  59. Li, P. et al. TrimNet: learning molecular representation from triplet messages for biomedicine. Brief. Bioinform. 22, bbaa266 (2021).

  60. Gao, W., Tang, Z., Zhao, J. & Chelikowsky, J. R. Efficient full-frequency GW calculations using a Lanczos method. Phys. Rev. Lett. 132, 126402 (2024).

  61. Ma, W., Lou, Q., Kazemi, A., Faraone, J. & Afzal, T. Super efficient neural network for compression artifacts reduction and super resolution. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 460–468 (2024).

  62. Wang, S., Zhang, Z. & Zhang, T. Improved analyses of the randomized power method and block Lanczos method. Preprint at https://arxiv.org/pdf/1508.06429 (2015).

  63. Bergstra, J., Yamins, D. & Cox, D. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In Proc. 30th International Conference on Machine Learning (eds Dasgupta, S. & McAllester, D.) 115–123 (PMLR, 2013).

  64. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).

  65. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019) https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf (2019).

  66. Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. Preprint at https://arxiv.org/abs/1903.02428 (2019).

  67. Zhang, L. et al. A federated graph learning method to realize multi-party collaboration for molecular discovery. Zenodo https://doi.org/10.5281/zenodo.16872722 (2025).

Acknowledgements

This work was supported by the China Ministry of Science and Technology (2020YFA0710203), the Joint Funds of the National Natural Science Foundation of China (U23A2081), the National Natural Science Foundation of China (22201271, 92261105, 22025304, 22033007 and 22221003), the Anhui Provincial Natural Science Foundation (2108085UD06 and 2208085UD04), the Anhui Provincial Key Research and Development Project (2023z04020010 and 2022a05020053), USTC Research Funds of the Double First-Class Initiative (YD2060002029 and YD2060006005), the Joint Funds from Hefei National Synchrotron Radiation Laboratory (KY2060000180 and KY2060000195) and the Fundamental Research Funds for the Central Universities (WK2060000088). The AI-driven experiments, simulations and model training were performed on the robotic AI-Scientist platform of the Chinese Academy of Sciences.

Author information

Contributions

Y. Wu, J.J., K.C. and Y.Z. conceived the study and supervised the research. L.Z. designed and implemented the computational framework and carried out the benchmarks and case studies with the assistance of J.Z., R.H., Y. Wang and L.L. L.Z. wrote the initial draft of the paper with assistance from Y.Z. Y. Wu, Y.Z., K.C. and L.Z. contributed major revisions. All authors discussed the results and provided feedback on the paper.

Corresponding authors

Correspondence to Yanyong Zhang, Kong Chen, Jun Jiang or Yuen Wu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Thierry Hanser, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Data flow of training/validation/testing splitting for simulated private institutional and open-access molecular databases.

The whole data partitioning process is consistent across 18 benchmark datasets.

Extended Data Fig. 2 Data distribution of multiple local models in the BBBP dataset.

a, Data distribution of random seed 1234. b, Data distribution of random seed 4567. c, Data distribution of random seed 7890. PLM1: private local model 1. PLM2: private local model 2. PLM3: private local model 3. OLM: open-access local model.

Extended Data Fig. 3 Data distribution of multiple local models in the FEN1 dataset.

a, Data distribution of random seed 1234. b, Data distribution of random seed 4567. c, Data distribution of random seed 7890. PLM1: private local model 1. PLM2: private local model 2. PLM3: private local model 3. OLM: open-access local model.

Extended Data Fig. 4 Data distribution of multiple local models in the CoCrystal dataset.

a, Data distribution of random seed 1234. b, Data distribution of random seed 4567. c, Data distribution of random seed 7890. PLM1: private local model 1. PLM2: private local model 2. PLM3: private local model 3. OLM: open-access local model. The data are partitioned by the first molecule in each pair of molecules (see Methods).

Supplementary information

Supplementary Information

Supplementary Figs. 1–7, Notes 1–14, Tables 1–32 and References.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, L., Zhang, J., Huang, R. et al. A federated graph learning method to realize multi-party collaboration for molecular discovery. Nat Mach Intell 8, 246–256 (2026). https://doi.org/10.1038/s42256-026-01184-1

  • DOI: https://doi.org/10.1038/s42256-026-01184-1
