Periodicity-aware deep learning for polymers

A preprint version of the article is available at ChemRxiv.

Abstract

Deep learning has revolutionized chemical research by accelerating the discovery and understanding of complex chemical systems. However, polymer chemistry lacks a unified deep learning framework owing to the complexity of polymer structures. Existing self-supervised learning methods simplify polymers into repeating units and neglect their inherent periodicity, thereby limiting the models’ ability to generalize across tasks. To address this, we propose a periodicity-aware deep learning framework for polymers, PerioGT. In pre-training, a chemical knowledge-driven periodicity prior is constructed and incorporated into the model through contrastive learning. Then, periodicity prompts are learned in fine-tuning based on the prior. Additionally, a graph augmentation strategy is employed, which integrates additional conditions via virtual nodes to model complex chemical interactions. PerioGT achieves state-of-the-art performance on 16 downstream tasks. Wet-lab experiments highlight PerioGT’s potential in the real world, identifying two polymers with potent antimicrobial properties. Our results demonstrate that introducing the periodicity prior effectively enhances model performance.
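As an illustration only, the virtual-node augmentation mentioned in the abstract can be sketched in Python on a toy graph. This is not the PerioGT implementation: the `Graph` class, the `add_virtual_node` function, the zero-padding of the condition vector and the choice to connect the virtual node to every atom are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy undirected graph: one feature vector per node plus an edge list."""
    node_features: list[list[float]]
    edges: list[tuple[int, int]] = field(default_factory=list)


def add_virtual_node(graph: Graph, condition: list[float]) -> Graph:
    """Return a copy of `graph` with one extra node that carries `condition`
    and is connected to every original node (hypothetical helper)."""
    width = len(graph.node_features[0])
    if len(condition) > width:
        raise ValueError("condition vector is wider than the node features")
    # Assumption: zero-pad the condition so all nodes share one feature width.
    padded = list(condition) + [0.0] * (width - len(condition))
    virtual_index = len(graph.node_features)
    new_features = [list(f) for f in graph.node_features] + [padded]
    new_edges = list(graph.edges) + [(i, virtual_index) for i in range(virtual_index)]
    return Graph(new_features, new_edges)


# Three-atom toy repeating unit with 2-dimensional placeholder features.
unit = Graph([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], [(0, 1), (1, 2)])
augmented = add_virtual_node(unit, condition=[0.5])
```

Here `augmented` has four nodes, and the last one holds the padded condition and links to each atom, so a message-passing or attention layer could in principle mix the condition into every atom representation. The input graph is left unchanged.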

Fig. 1: Overview of PerioGT.
Fig. 2: PA-level similarity analysis.
Fig. 3: Instance-level similarity analysis.
Fig. 4: Latent space analysis.

Data availability

The pre-training dataset is available via GitHub at https://github.com/RUIMINMA1996/PI1M (ref. 25). The Tg, Tm and ρ datasets can be downloaded from ref. 51, and the Egc, Egb, Eat, Ei, Eea, nc and ε0 datasets are available via The Georgia Institute of Technology at https://khazana.gatech.edu/dataset/ (ref. 68). The EA and IP datasets are available via GitHub at https://github.com/coleygroup/polymer-chemprop-data (ref. 30). The OPV dataset is available via ACS at https://pubs.acs.org/doi/suppl/10.1021/acs.jpclett.8b00635/suppl_file/jz8b00635_si_002.txt (ref. 69). The PE dataset is available via UC Santa Barbara at https://pedatamine.org/ (ref. 70). The MAR1 and MAR2 datasets are available via Zenodo at https://doi.org/10.5281/zenodo.17035498 (ref. 71). These data are available via GitHub at https://github.com/wuyuhui-zju/PerioGT. The pre-trained, fine-tuned checkpoints and the processed datasets are available via Zenodo at https://doi.org/10.5281/zenodo.17035498 (ref. 71). Source data are provided with this paper.

Code availability

The code of PerioGT is available via GitHub at https://github.com/wuyuhui-zju/PerioGT and via Zenodo at https://doi.org/10.5281/zenodo.17035498 (ref. 71).

References

  1. Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024).

  2. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).

  3. Zhang, H. et al. Algorithm for optimized mRNA design improves stability and immunogenicity. Nature 621, 396–403 (2023).

  4. Rinehart, N. I. et al. A machine-learning tool to predict substrate-adaptive conditions for Pd-catalyzed C–N couplings. Science 381, 965–972 (2023).

  5. Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).

  6. Patra, T. K. Data-driven methods for accelerating polymer design. ACS Polym. Au 2, 8–26 (2021).

  7. Martin, T. B. & Audus, D. J. Emerging trends in machine learning: a polymer perspective. ACS Polym. Au 3, 239–258 (2023).

  8. Struble, D. C., Lamb, B. G. & Ma, B. A prospective on machine learning challenges, progress, and potential in polymer science. MRS Commun. 14, 752–770 (2024).

  9. Ge, W., De Silva, R., Fan, Y., Sisson, S. A. & Stenzel, M. H. Machine learning in polymer research. Adv. Mater. 37, 2413695 (2025).

  10. Yang, J., Tao, L., He, J., McCutcheon, J. R. & Li, Y. Machine learning enables interpretable discovery of innovative polymers for gas separation membranes. Sci. Adv. 8, eabn9545 (2022).

  11. Tao, L., Varshney, V. & Li, Y. Benchmarking machine learning models for polymer informatics: an example of glass transition temperature. J. Chem. Inf. Model. 61, 5395–5413 (2021).

  12. Arora, A. et al. Random forest predictor for diblock copolymer phase behavior. ACS Macro Lett. 10, 1339–1345 (2021).

  13. Tao, L. et al. Discovery of multi-functional polyimides through high-throughput screening using explainable machine learning. Chem. Eng. J. 465, 142949 (2023).

  14. Li, H. et al. Machine learning-accelerated discovery of heat-resistant polysulfates for electrostatic energy storage. Nat. Energy 10, 90–100 (2025).

  15. Sun, W. et al. Machine learning–assisted molecular design and efficiency prediction for high-performance organic photovoltaic materials. Sci. Adv. 5, eaay4275 (2019).

  16. Meenakshisundaram, V., Hung, J.-H., Patra, T. K. & Simmons, D. S. Designing sequence-specific copolymer compatibilizers using a molecular-dynamics-simulation-based genetic algorithm. Macromolecules 50, 1155–1166 (2017).

  17. Kim, C., Chandrasekaran, A., Huan, T. D., Das, D. & Ramprasad, R. Polymer genome: a data-powered polymer informatics platform for property predictions. J. Phys. Chem. C 122, 17575–17585 (2018).

  18. Gong, D. et al. Machine learning guided structure function predictions enable in silico nanoparticle screening for polymeric gene delivery. Acta Biomater. 154, 349–358 (2022).

  19. Patel, R. A., Borca, C. H. & Webb, M. A. Featurization strategies for polymer sequence or composition design by machine learning. Mol. Syst. Des. Eng. 7, 661–676 (2022).

  20. Tamasi, M. J. et al. Machine learning on a robotic platform for the design of polymer-protein hybrids. Adv. Mater. 34, 12 (2022).

  21. Zhang, X. Y. et al. Polymer-unit fingerprint (PUFp): an accessible expression of polymer organic semiconductors for machine learning. ACS Appl. Mater. Interfaces 15, 21537–21548 (2023).

  22. Tropsha, A., Isayev, O., Varnek, A., Schneider, G. & Cherkasov, A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat. Rev. Drug Discov. 23, 141–155 (2024).

  23. Webb, M. A., Jackson, N. E., Gil, P. S. & de Pablo, J. J. Targeted sequence design within the coarse-grained polymer genome. Sci. Adv. 6, eabc6216 (2020).

  24. Tao, L., Byrnes, J., Varshney, V. & Li, Y. Machine learning strategies for the structure-property relationship of copolymers. iScience 25, 104585 (2022).

  25. Ma, R. & Luo, T. PI1M: a benchmark database for polymer informatics. J. Chem. Inf. Model. 60, 4684–4690 (2020).

  26. Miccio, L. A. & Schwartz, G. A. From chemical structure to quantitative polymer properties prediction through convolutional neural networks. Polymer 193, 122341 (2020).

  27. Yan, C., Feng, X. M., Wick, C., Peters, A. & Li, G. Q. Machine learning assisted discovery of new thermoset shape memory polymers based on a small training dataset. Polymer 214, 12 (2021).

  28. Zang, X., Zhao, X. & Tang, B. Hierarchical molecular graph self-supervised learning for property prediction. Commun. Chem. 6, 34 (2023).

  29. Antoniuk, E. R., Li, P., Kailkhura, B. & Hiszpanski, A. M. Representing polymers as periodic graphs with learned descriptors for accurate polymer property predictions. J. Chem. Inf. Model. 62, 5435–5445 (2022).

  30. Aldeghi, M. & Coley, C. W. A graph representation of molecular ensembles for polymer property prediction. Chem. Sci. 13, 10486–10498 (2022).

  31. Zhang, S. et al. Deep learning-assisted design of novel donor–acceptor combinations for organic photovoltaic materials with enhanced efficiency. Adv. Mater. 37, 2407613 (2025).

  32. Gurnani, R. et al. AI-assisted discovery of high-temperature dielectrics for energy storage. Nat. Commun. 15, 6107 (2024).

  33. Park, J. et al. Prediction and interpretation of polymer properties using the graph convolutional network. ACS Polym. Au 2, 213–222 (2022).

  34. Chen, D. et al. Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In Proc. AAAI Conference on Artificial Intelligence 3438–3445 (AAAI Press, 2020).

  35. Wang, Y., Wang, J., Cao, Z. & Barati Farimani, A. Molecular contrastive learning of representations via graph neural networks. Nat. Mach. Intell. 4, 279–287 (2022).

  36. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  37. Qiang, B. et al. Bridging the gap between chemical reaction pretraining and conditional molecule generation with a unified model. Nat. Mach. Intell. 5, 1476–1485 (2023).

  38. Kuenneth, C. & Ramprasad, R. polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics. Nat. Commun. 14, 4099 (2023).

  39. Xu, C., Wang, Y. & Barati Farimani, A. TransPolymer: a transformer-based language model for polymer property predictions. npj Comput. Mater. 9, 64 (2023).

  40. Lin, T.-S. et al. BigSMILES: a structurally-based line notation for describing macromolecules. ACS Cent. Sci. 5, 1523–1531 (2019).

  41. Lin, T. S., Rebello, N. J., Lee, G. H., Morris, M. A. & Olsen, B. D. Canonicalizing BigSMILES for polymers with defined backbones. ACS Polym. Au 2, 486–500 (2022).

  42. Schneider, L., Walsh, D., Olsen, B. & de Pablo, J. Generative BigSMILES: an extension for polymer informatics, computer simulations & ML/AI. Digit. Discov. 3, 51–61 (2024).

  43. Luo, Y. et al. Masked graph modeling with multi-view contrast. In Proc. 40th International Conference on Data Engineering 2584–2597 (IEEE, 2024).

  44. Tan, H., Lei, J., Wolf, T. & Bansal, M. Vimpac: video pre-training via masked token prediction and contrastive learning. Preprint at https://www.arxiv.org/abs/2106.11250 (2021).

  45. Chaitanya, K., Erdil, E., Karani, N. & Konukoglu, E. Contrastive learning of global and local features for medical image segmentation with limited annotations. Adv. Neural Inf. Process. Syst. 33, 12546–12558 (2020).

  46. Moriwaki, H., Tian, Y.-S., Kawashita, N. & Takagi, T. Mordred: a molecular descriptor calculator. J. Cheminform. 10, 4 (2018).

  47. RDKit: open-source cheminformatics (RDKit, 2021); http://www.rdkit.org

  48. Liu, P. et al. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 55, 1–35 (2021).

  49. Fang, T., Zhang, Y., Yang, Y., Wang, C. & Chen, L. Universal prompt tuning for graph neural networks. Adv. Neural Inf. Process. Syst. 36, 52464–52489 (2023).

  50. Fang, Y. et al. Knowledge graph-enhanced molecular contrastive learning with functional prompt. Nat. Mach. Intell. 5, 542–553 (2023).

  51. Liu, G., Zhao, T., Xu, J., Luo, T. & Jiang, M. Graph rationalization with environment-based augmentations. In Proc. 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 1069–1078 (ACM, 2022).

  52. Wang, T. & Isola, P. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In Proc. 37th International Conference on Machine Learning 9929–9939 (PMLR, 2020).

  53. Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).

  54. Mookherjee, N., Anderson, M. A., Haagsman, H. P. & Davidson, D. J. Antimicrobial host defence peptides: functions and clinical potential. Nat. Rev. Drug Discov. 19, 311–332 (2020).

  55. Huang, J. et al. Identification of potent antimicrobial peptides via a machine-learning pipeline that mines the entire space of peptide sequences. Nat. Biomed. Eng. 7, 797–810 (2023).

  56. Shabani, S. et al. Synthetic peptide branched polymers for antibacterial and biomedical applications. Nat. Rev. Bioeng. 2, 343–361 (2024).

  57. Zhou, M. et al. A dual-targeting antifungal is effective against multidrug-resistant human fungal pathogens. Nat. Microbiol. 9, 1325–1339 (2024).

  58. Phuong, P. T. et al. Effect of hydrophobic groups on antimicrobial and hemolytic activity: developing a predictive tool for ternary antimicrobial polymers. Biomacromolecules 21, 5241–5255 (2020).

  59. Furka, Á. Forty years of combinatorial technology. Drug Discov. Today 27, 103308 (2022).

  60. Bai, P., Liu, X. & Lu, H. Geometry-aware line graph transformer pre-training for molecular property prediction. Preprint at https://www.arxiv.org/abs/2309.00483 (2023).

  61. Li, H., Zhao, D. & Zeng, J. KPGT: knowledge-guided pre-training of graph transformer for molecular property prediction. In Proc. 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 857–867 (ACM, 2022).

  62. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 4171–4186 (ACL, 2019).

  63. He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 9729–9738 (IEEE, 2020).

  64. Ying, C. et al. Do Transformers really perform bad for graph representation? Adv. Neural Inf. Process. Syst. 34, 28877–28888 (2021).

  65. Rampášek, L. et al. Recipe for a general, powerful, scalable graph transformer. Adv. Neural Inf. Process. Syst. 35, 14501–14515 (2022).

  66. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).

  67. Corso, G., Cavalleri, L., Beaini, D., Liò, P. & Veličković, P. Principal neighbourhood aggregation for graph nets. Adv. Neural Inf. Process. Syst. 33, 13260–13271 (2020).

  68. Kuenneth, C. et al. Polymer informatics with multi-task learning. Patterns 2, 100238 (2021).

  69. Nagasawa, S., Al-Naamani, E. & Saeki, A. Computer-aided screening of conjugated polymers for organic solar cell: classification by random forest. J. Phys. Chem. Lett. 9, 2639–2646 (2018).

  70. Schauser, N. S., Kliegle, G. A., Cooke, P., Segalman, R. A. & Seshadri, R. Database creation, visualization, and statistical learning for polymer Li+-electrolyte design. Chem. Mater. 33, 4863–4876 (2021).

  71. Wu, Y. Datasets and checkpoints for PerioGT. Zenodo https://doi.org/10.5281/zenodo.17035498 (2025).

  72. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning 1263–1272 (PMLR, 2017).

  73. Veličković, P. et al. Graph attention networks. In Proc. 6th International Conference on Learning Representations (ICLR, 2018).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant no. 52293381), the Zhejiang Provincial Natural Science Foundation of China (grant no. LR25E030001) and the National Key Research and Development Program of China (grant no. 2022YFB3807300). This work was also supported by the Transvascular Implantation Devices Research Institute China under grant nos. KY012024007 and KY012024009.

Author information

Contributions

Y.W. conceived the main idea and conducted the in silico experiments. C.W. was responsible for chemical synthesis and biological characterization. Y.W. and C.W. wrote the paper together. X.S. and T.Z. participated in discussions and provided many suggestions for the wet-lab experiments. P.Z. and J.J. guided the whole project. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Peng Zhang or Jian Ji.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Jacob Gissinger and Boran Ma for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Application of PerioGT in antimicrobial polymer discovery.

a, Pairwise combinations of diacrylates (pink) and amines (blue) generated polymers via Michael addition. A subset of 150 polymers was synthesized and labeled for antimicrobial activity. PerioGT was fine-tuned to predict the activity of the unlabeled products, and the 30 candidates with the highest predicted activity were selected for validation. b, Distribution of the minimum inhibitory concentration (MIC) in the training set and in the top 30 candidates selected by each model (n = 3 biologically independent replicates). The contour shows the kernel density estimation, with a white dot for the median, a thick bar for the interquartile range, and thin lines for the 95% confidence intervals. Lower MIC indicates better antimicrobial activity. The region above the dashed line indicates no antimicrobial activity (MIC > 64 μg ml−1). c, The two polymers with the lowest MIC (8 μg ml−1), predicted by PerioGT and evaluated by wet-lab experiments. d, Live/dead staining assay. The assay was performed using N01 to label live cells (green) and propidium iodide (PI) to label dead cells (red). Scale bar: 20 μm. e, MRSA colony counting after incubation with polymers 1 and 2 for 9 h. Data are presented as mean values ± s.d., n = 3 biologically independent replicates. Lower values indicate fewer surviving bacterial colonies. f, Membrane potential disturbance induced by different treatments. Phosphate-buffered saline (PBS) and Triton X-100 (TX) were included as negative and positive controls, respectively. g, TEM characterization of MRSA after incubation with polymers 1 and 2 for 5 min. Scale bar: 200 nm. Panel a created with BioRender.com.
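The screening workflow in panel a (enumerate pairwise combinations, set aside the labeled subset, rank the rest by predicted activity and keep the best candidates) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the building-block labels `D1`, `A1` and so on, and the stand-in scoring function are all hypothetical, and a fine-tuned model would take the place of `predict_activity`.

```python
from itertools import product
from typing import Callable, Iterable

Pair = tuple[str, str]


def enumerate_library(diacrylates: Iterable[str], amines: Iterable[str]) -> list[Pair]:
    """All pairwise (diacrylate, amine) combinations, one per candidate polymer."""
    return list(product(diacrylates, amines))


def select_top_k(candidates: Iterable[Pair], labeled: set[Pair],
                 predict_activity: Callable[[Pair], float], k: int = 30) -> list[Pair]:
    """Rank the unlabeled candidates by predicted activity (higher is better)
    and keep the k best."""
    unlabeled = [c for c in candidates if c not in labeled]
    return sorted(unlabeled, key=predict_activity, reverse=True)[:k]


# Hypothetical building-block names and a stand-in scoring function.
library = enumerate_library(["D1", "D2"], ["A1", "A2", "A3"])
labeled = {("D1", "A1")}
toy_scores = {pair: float(i) for i, pair in enumerate(library)}
top = select_top_k(library, labeled, toy_scores.__getitem__, k=2)
```

With the toy scores above, `top` contains the two highest-scoring unlabeled pairs. In the setting described in the caption, `k` would be 30 and the labeled set would hold the 150 synthesized polymers.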

Supplementary information

Supplementary Information

Supplementary Notes 1–3, Figs. 1–6 and Tables 1–11.

Peer Review File

Supplementary Data 1

The polymer libraries and building blocks constructed in the case study.

Source data

Source Data Fig. 2

Source data for Fig. 2.

Source Data Fig. 3

Source data for Fig. 3.

Source Data Fig. 4

Source data for Fig. 4.

Source Data Extended Data Fig. 1

Source data for Extended Data Fig. 1.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, Y., Wang, C., Shen, X. et al. Periodicity-aware deep learning for polymers. Nat Comput Sci 5, 1214–1226 (2025). https://doi.org/10.1038/s43588-025-00903-9

  • DOI: https://doi.org/10.1038/s43588-025-00903-9
