A federated graph learning method to realize multi-party collaboration for molecular discovery

Zhang, Liang; Zhang, Juan; Huang, Rui; Wang, Yiwen; Liu, Linjing; Zhang, Yanyong; Chen, Kong; Jiang, Jun; Wu, Yuen

doi:10.1038/s42256-026-01184-1

Article
Published: 10 February 2026

Nature Machine Intelligence volume 8, pages 246–256 (2026)

Abstract

Optimizing molecular resource utilization for molecular discovery requires collaborative efforts across research institutions and organizations to accelerate progress. However, given the high research value of both successful and unsuccessful molecules produced by each institution (or organization), these findings are typically kept highly private and confidential until formal publication or commercialization, with even failed molecules rarely disclosed. This confidentiality requirement presents a great challenge for most existing methods when collaboratively handling molecular data with heterogeneous distributions under stringent privacy constraints. Here we propose FedLG (federated learning Lanczos graph), a federated graph learning method that leverages the Lanczos algorithm to facilitate collaborative model training across multiple parties, achieving reliable prediction performance under strict privacy protection conditions. Compared with various existing federated learning methods, FedLG exhibits excellent model performance on 18 benchmark datasets in a simulated federated learning environment. Under different privacy-preserving mechanism settings, FedLG demonstrates robust performance and resistance to noise. Leave-one-client-out experiments and comparison tests across each simulated institution show that FedLG achieves improved heterogeneous data aggregation capabilities and more promising outcomes than localized training. In addition, we incorporate Bayesian optimization into FedLG to show its scalability and further stabilize model performance. Overall, FedLG can be considered an effective method to realize multi-party collaboration while ensuring that sensitive molecular information is protected from potential leakage.
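
The abstract names the Lanczos algorithm but this preview does not describe how FedLG applies it. As background only, the following is a minimal sketch of the textbook Lanczos iteration (ref. 29) on the adjacency matrix of a small graph. The function name `lanczos`, the six-node chain graph and the use of NumPy with full reorthogonalization are illustrative assumptions, not taken from the FedLG code.

```python
import numpy as np


def lanczos(A, k, seed=0):
    """Run up to k Lanczos steps on a symmetric matrix A (illustrative helper).

    Returns Q (n x m, orthonormal columns) and a tridiagonal T (m x m)
    with T = Q.T @ A @ Q, where m <= k.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)          # random unit start vector
    Q, alphas, betas = [q], [], []
    for j in range(k):
        w = A @ Q[j]
        alphas.append(Q[j] @ w)     # diagonal entry of T
        # Full reorthogonalization against all previous Lanczos vectors;
        # this also removes the alpha and beta terms of the three-term recurrence.
        for v in Q:
            w -= (v @ w) * v
        beta = np.linalg.norm(w)    # off-diagonal entry of T
        if j == k - 1 or beta < 1e-10:
            break                   # requested size reached or Krylov space exhausted
        betas.append(beta)
        Q.append(w / beta)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return np.column_stack(Q), T


# Hypothetical example graph: a six-node chain (adjacency matrix only).
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

Q, T = lanczos(A, k=3)
ritz = np.linalg.eigvalsh(T)        # Ritz values approximate extreme eigenvalues of A
print(T.shape, ritz)
```

The eigenvalues of the small matrix T (Ritz values) always lie within the spectrum of A, which is why a few Lanczos steps can give a compact spectral summary of a graph.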

**Fig. 1: Diagram showing the process flow of the FedLG method.**

**Fig. 3: Comparison of effectiveness, performance and computational efficiency of the FedLG method using the MPNN model.**

Data availability

All datasets used in this study for method construction are freely available. The MoleculeNet (BBBP, BACE, SIDER, Tox21, ToxCast, ESOL, Lipo and FreeSolv) datasets are available at http://moleculenet.ai/datasets-1. The LIT-PCBA (ALDH1, FEN1, GBA, KAT2A, MAPK1, PKM2 and VDR) datasets are available at https://github.com/idrugLab/FP-GNN/blob/main/Data.rar. The DrugBank and BIOSNAP datasets are available at https://github.com/kexinhuang12345/CASTER/tree/master/DDE/data. The CoCrystal dataset is available at https://github.com/Saoge123/ccgnet/tree/main/data.

Code availability

All source code for this study is available via GitHub at https://github.com/Turningl/FedLG and Zenodo at https://doi.org/10.5281/zenodo.16872722 (ref. 67).

References

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).
Hartono, N. T. P. et al. How machine learning can help select capping layers to suppress perovskite degradation. Nat. Commun. 11, 4172 (2020).
Jiang, Y. et al. Coupling complementary strategy to flexible graph neural network for quick discovery of coformer in diverse co-crystal materials. Nat. Commun. 12, 5950 (2021).
Cao, Y. et al. Perovskite light-emitting diodes based on spontaneously formed submicrometre-scale structures. Nature 562, 249–253 (2018).
Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016).
Müller, S. Small-molecule-mediated G-quadruplex isolation from human cells. Nat. Chem. 2, 1095–1098 (2010).
Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).
Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
Tran-Nguyen, V.-K., Jacquemard, C. & Rognan, D. LIT-PCBA: an unbiased data set for machine learning and virtual screening. J. Chem. Inf. Model. 60, 4263–4273 (2020).
Wishart, D. S. et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 36, D901–D906 (2008).
Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).
Tan, L. et al. Tackling assay interference associated with small molecules. Nat. Rev. Chem. 8, 319–339 (2024).
Durant, G. et al. The future of machine learning for small-molecule drug discovery will be driven by data. Nat. Comput. Sci. 4, 735–743 (2024).
Yang, Q., Liu, Y., Chen, T. & Tong, Y. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10, 1–19 (2019).
Zhu, W. et al. Federated learning of molecular properties with graph neural networks in a heterogeneous setting. Patterns 3, 100521 (2022).
Xiong, Z. et al. Facing small and biased data dilemma in drug discovery with enhanced federated learning approaches. Sci. China Life Sci. 65, 529–539 (2022).
Heyndrickx, W. et al. MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 (2024).
Gilmer, J. et al. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017).
Veličković, P. et al. Graph attention networks. Preprint at https://arxiv.org/abs/1710.10903 (2018).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations https://openreview.net/pdf?id=SJU4ayYgl (ICLR, 2017).
Ning, Y. et al. GFedKRL: graph federated knowledge re-learning for effective molecular property prediction via privacy protection. In International Conference on Artificial Neural Networks. 426–438 (Springer, 2023).
Cao, X., Jia, J., Zhang, Z. & Gong, N. Z. FedRecover: recovering from poisoning attacks in federated learning using historical information. In Proc. 2023 IEEE Symposium on Security and Privacy (SP) 1366–1383 (IEEE, 2023).
Gupta, S. et al. Recovering private text in federated learning of language models. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022) https://papers.neurips.cc/paper_files/paper/2022/file/35b5c175e139bff5f22a5361270fce87-Paper-Conference.pdf (2022).
Zhang, K. et al. Flip: a provable defense framework for backdoor mitigation in federated learning. In International Conference on Learning Representations https://openreview.net/pdf?id=Xo2E217_M4n (ICLR, 2022).
Chen, J. et al. FederEI: federated library matching framework for electron ionization mass spectrum based compound identification. Anal. Chem. 96, 15840–15845 (2024).
Lanczos, C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl Bur. Stand. 45, 255 (1950).
Liao, R., Zhao, Z., Urtasun, R. & Zemel, R. S. Lanczosnet: multi-scale deep graph convolutional networks. In International Conference on Learning Representations (ICLR, 2019).
Olkin, I. & Rubin, H. Multivariate beta distributions and independence properties of the Wishart distribution. Ann. Math. Stat. 35, 261–269 (1964).
Alaggan, M., Gambs, S. & Kermarrec, A.-M. Heterogeneous differential privacy. J. Priv. Confid. 7(2) (2016).
McInnes, L. et al. UMAP: Uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
Pelikan, M. Bayesian Optimization Algorithm. In Hierarchical Bayesian Optimization Algorithm Vol. 170, 31–48 (Springer, Berlin, Heidelberg, 2005).
Wu, C. et al. A federated graph neural network framework for privacy-preserving personalization. Nat. Commun. 13, 3091 (2022).
Liu, J., Lou, J., Xiong, L., Liu, J. & Meng, X. Projected federated averaging with heterogeneous differential privacy. Proc. VLDB Endow. 15, 828–840 (2021).
Wang, L. et al. Enhancing federated learning with in-cloud unlabeled data. In Proc. IEEE 38th International Conference on Data Engineering (ICDE) 136–149 (IEEE, 2022).
Lin, T. et al. Ensemble distillation for robust model fusion in federated learning. In 34th Conference on Neural Information Processing Systems (NeurIPS 2020) https://proceedings.neurips.cc/paper/2020/file/18df51b97ccd68128e994804f3eccc87-Paper.pdf (2020).
Li, Q. et al. Practical one-shot federated learning for cross-silo setting. Preprint at https://arxiv.org/abs/2010.01017 (2020).
Shao, J., Wu, F. & Zhang, J. Selective knowledge sharing for privacy-preserving federated distillation without a good teacher. Nat. Commun. 15, 349 (2024).
Park, J. et al. Sageflow: robust federated learning against both stragglers and adversaries. In 35th Conference on Neural Information Processing Systems (NeurIPS 2021) https://proceedings.neurips.cc/paper/2021/file/076a8133735eb5d7552dc195b125a454-Paper.pdf (2021).
Xie, C. et al. Zeno++: robust fully asynchronous SGD. In Proc. 37th International Conference on Machine Learning (eds Daumé III, H. & Singh, A.) 10495–10503 (PMLR, 2020).
Huang, K., Xiao, C., Hoang, T. N., Glass, L. M. & Sun, J. CASTER: predicting drug interactions with chemical substructure representation. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) 702–709 (2020).
Li, Y. et al. An adaptive graph learning method for automated molecular interactions and properties predictions. Nat. Mach. Intell. 4, 645–651 (2022).
Zeng, X. et al. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. Nat. Mach. Intell. 4, 1004–1016 (2022).
Zhang, X., Kang, Y., Chen, K., Fan, L. & Yang, Q. Trading off privacy, utility and efficiency in federated learning. ACM Trans. Intell. Syst. Technol. 14, 1–32 (2023).
Cai, H., Zhang, H., Zhao, D., Wu, J. & Wang, L. FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Brief. Bioinform. 23, bbac408 (2022).
Hanser, T. Federated learning for molecular discovery. Curr. Opin. Struct. Biol. 79, 102545 (2023).
Boiko, D. A. et al. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Mirza, A. et al. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. Nat. Chem. 17, 1027–1034 (2025).
McDuff, D. et al. Towards accurate differential diagnosis with large language models. Nature 642, 451–457 (2025).
Farayola, O. A. et al. Data privacy and security in IT: a review of techniques and challenges. Comput. Sci. IT Res. J. 5, 606–615 (2024).
Weber, R. H. Internet of things—new security and privacy challenges. Comput. Law Secur. Rev. 26, 23–30 (2010).
Smith, V., Chiang, C.-K., Sanjabi, M. & Talwalkar, A. S. Federated multi-task learning. In Advances in Neural Information Processing Systems 30 (NIPS 2017) https://papers.nips.cc/paper_files/paper/2017/file/6211080fa89981f66b1a0c9d55c61d0f-Paper.pdf (2017).
Liu, L. et al. GEM-2: next generation molecular property prediction network by modeling full-range many-body interactions. Preprint at http://arxiv.org/abs/2208.05863 (2022).
Hussain, M. S., Zaki, M. J. & Subramanian, D. Triplet interaction improves graph transformers: accurate molecular graph learning with triplet graph transformers. Preprint at http://arxiv.org/abs/2402.04538 (2024).
Wallach, I. & Heifets, A. Most ligand-based classification benchmarks reward memorization rather than generalization. J. Chem. Inf. Model. 58, 916–932 (2018).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076 (1962).
Li, P. et al. TrimNet: learning molecular representation from triplet messages for biomedicine. Brief. Bioinform. 22, bbaa266 (2021).
Gao, W., Tang, Z., Zhao, J. & Chelikowsky, J. R. Efficient full-frequency GW calculations using a Lanczos method. Phys. Rev. Lett. 132, 126402 (2024).
Ma, W., Lou, Q., Kazemi, A., Faraone, J. & Afzal, T. Super efficient neural network for compression artifacts reduction and super resolution. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision 460–468 (2024).
Wang, S., Zhang, Z. & Zhang, T. Improved analyses of the randomized power method and block Lanczos method. Preprint at https://arxiv.org/pdf/1508.06429 (2015).
Bergstra, J., Yamins, D. & Cox, D. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In Proc. 30th International Conference on Machine Learning (eds Dasgupta, S. & McAllester, D.) 115–123 (PMLR, 2013).
Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019) https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf (2019).
Fey, M. & Lenssen, J. E. Fast graph representation learning with PyTorch Geometric. Preprint at https://arxiv.org/abs/1903.02428 (2019).
Zhang, L. et al. A federated graph learning method to realize multi-party collaboration for molecular discovery. Zenodo https://doi.org/10.5281/zenodo.16872722 (2025).

Acknowledgements

This work was supported by the China Ministry of Science and Technology (2020YFA0710203), the Joint Funds of the National Natural Science Foundation of China (U23A2081), the National Natural Science Foundation of China (22201271, 92261105, 22025304, 22033007 and 22221003), the Anhui Provincial Natural Science Foundation (2108085UD06 and 2208085UD04), the Anhui Provincial Key Research and Development Project (2023z04020010 and 2022a05020053), USTC Research Funds of the Double First-Class Initiative (YD2060002029 and YD2060006005), the Joint Funds from Hefei National Synchrotron Radiation Laboratory (KY2060000180 and KY2060000195) and the Fundamental Research Funds for the Central Universities (WK2060000088). The AI-driven experiments, simulations and model training were performed on the robotic AI-Scientist platform of the Chinese Academy of Sciences.

Author information

Authors and Affiliations

Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei, China
Liang Zhang, Juan Zhang, Rui Huang, Yiwen Wang, Linjing Liu, Kong Chen, Jun Jiang & Yuen Wu
National Key Laboratory of Deep Space Exploration, Deep Space Exploration Laboratory, Hefei, China
Liang Zhang, Juan Zhang, Rui Huang, Yiwen Wang, Linjing Liu, Kong Chen & Yuen Wu
School of Artificial Intelligence and Data Science, University of Science and Technology of China, Hefei, China
Yanyong Zhang
Hefei National Laboratory, University of Science and Technology of China, Hefei, China
Jun Jiang

Contributions

Y. Wu, J.J., K.C. and Y.Z. conceived the study and supervised the research. L.Z. designed and implemented the computational framework, carrying out benchmarks and case studies with the assistance of J.Z., R.H., Y. Wang and L.L. L.Z. wrote the initial draft of the paper with assistance from Y.Z. Y. Wu, Y.Z., K.C. and L.Z. contributed major revisions. All authors discussed the results and provided feedback on the paper.

Corresponding authors

Correspondence to Yanyong Zhang, Kong Chen, Jun Jiang or Yuen Wu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Thierry Hanser, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Data flow of training/validation/testing splitting for simulated private institutional and open-access molecular databases.

The whole data partitioning process is consistent across 18 benchmark datasets.

Extended Data Fig. 2 Data distribution of multiple local models in BBBP dataset.

a, Data distribution of random seed 1234. b, Data distribution of random seed 4567. c, Data distribution of random seed 7890. PLM1: private local model 1. PLM2: private local model 2. PLM3: private local model 3. OLM: open-access local model.

Extended Data Fig. 3 Data distribution of multiple local models in FEN1 dataset.

a, Data distribution of random seed 1234. b, Data distribution of random seed 4567. c, Data distribution of random seed 7890. PLM1: private local model 1. PLM2: private local model 2. PLM3: private local model 3. OLM: open-access local model.

Extended Data Fig. 4 Data distribution of multiple local models in CoCrystal dataset.

a, Data distribution of random seed 1234. b, Data distribution of random seed 4567. c, Data distribution of random seed 7890. PLM1: private local model 1. PLM2: private local model 2. PLM3: private local model 3. OLM: open-access local model. Data distribution is partitioned by the first molecule in each pair of molecules (see Methods).

Supplementary information

Supplementary Information

Supplementary Figs. 1–7, Notes 1–14, Tables 1–32 and References.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, L., Zhang, J., Huang, R. et al. A federated graph learning method to realize multi-party collaboration for molecular discovery. Nat Mach Intell 8, 246–256 (2026). https://doi.org/10.1038/s42256-026-01184-1

Received: 29 November 2024
Accepted: 09 January 2026
Published: 10 February 2026
Version of record: 10 February 2026
Issue date: February 2026
DOI: https://doi.org/10.1038/s42256-026-01184-1