A neural symbolic model for space physics

A preprint version of the article is available at arXiv.

Abstract

Symbolic regression, a key problem in discovering physics formulas from observational data, faces persistent challenges in scalability and interpretability. We introduce PhyE2E, an AI framework designed to discover physically meaningful symbolic expressions. PhyE2E decomposes the symbolic regression problem into subproblems via second-order neural network derivatives, and employs a transformer architecture to translate data into symbolic formulas in an end-to-end manner. The generated expressions are further refined via Monte Carlo tree search and genetic programming. We leverage a large language model to synthesize extensive expressions resembling real physics, and train the model to recover these formulas directly from data. Comprehensive evaluations demonstrate that PhyE2E outperforms existing state-of-the-art approaches, delivering superior symbolic accuracy, fitting precision and unit consistency. We deployed PhyE2E on five critical applications in space physics. The AI-derived formulas exhibit excellent agreement with empirical data from satellites and astronomical telescopes. We improved NASA’s 1993 formula for solar activity and provided an explicit symbolic explanation of the long-term solar cycle. We also found that the decay of near-Earth plasma pressure is proportional to the square of the distance r from the Earth’s centre, with subsequent mathematical derivations validated by independent satellite observations. Furthermore, we found symbolic formulas relating solar extreme ultraviolet emission lines to temperature, electron density and magnetic-field variations. The formulas obtained are consistent with properties previously hypothesized by physicists.
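The divide-and-conquer step described above rests on a standard separability identity: if the mixed second partial derivative ∂²f/∂xᵢ∂xⱼ vanishes everywhere, f splits additively between xᵢ and xⱼ, so the regression problem decomposes into independent subproblems. The sketch below is a minimal numerical illustration of that idea, not the authors' implementation (PhyE2E probes the second derivatives of a trained neural-network surrogate; here we probe a plain callable, and the function names, tolerances and step sizes are illustrative):

```python
import itertools
import numpy as np

def mixed_second_partial(f, x, i, j, h=1e-3):
    """Central-difference estimate of d^2 f / (dx_i dx_j) at point x."""
    def shift(di, dj):
        y = x.copy()
        y[i] += di * h
        y[j] += dj * h
        return f(y)
    return (shift(1, 1) - shift(1, -1) - shift(-1, 1) + shift(-1, -1)) / (4 * h * h)

def additive_groups(f, dim, n_probes=20, tol=1e-4, seed=0):
    """Group variables that remain coupled after testing additive separability.

    If d^2 f / (dx_i dx_j) is ~0 at every probed point, f is treated as
    additively separable in x_i and x_j; coupled pairs are merged with a
    small union-find, and the connected components are returned.
    """
    rng = np.random.default_rng(seed)
    parent = list(range(dim))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    for i, j in itertools.combinations(range(dim), 2):
        coupled = any(
            abs(mixed_second_partial(f, rng.uniform(0.5, 1.5, dim), i, j)) > tol
            for _ in range(n_probes)
        )
        if coupled:
            parent[find(i)] = find(j)

    groups = {}
    for v in range(dim):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# f(x) = x0*x1 + sin(x2): x0 and x1 stay coupled, x2 separates additively.
f = lambda x: x[0] * x[1] + np.sin(x[2])
print(additive_groups(f, 3))  # -> [[0, 1], [2]]
```

Each recovered group can then be handed to the end-to-end transformer as a smaller subproblem; multiplicative separability admits an analogous test on log|f| under suitable sign assumptions.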

Fig. 1: The overall PhyE2E framework.
Fig. 2: Performance on the synthetic and AI Feynman datasets.
Fig. 3: Performance of sunspot intensity prediction.
Fig. 4: Performance of plasma-sheet pressure prediction and solar differential rotation prediction.
Fig. 5: Performance of contribution function of emission line predictions and lunar tide signal of plasma layer predictions.

Data availability

AI Feynman data can be downloaded from the Feynman Symbolic Regression Database (https://space.mit.edu/home/tegmark/aifeynman.html). The formulas generated by the LLM, together with the training and test datasets, can be downloaded71 from https://figshare.com/articles/dataset/PhyE2E_datas/29615831/1. The SSN data can be downloaded from the SILSO website (https://www.sidc.be/SILSO/datafiles). The plasma pressure data can be downloaded from the Geotail and Time History of Events and Macroscale Interactions during Substorms website (https://themis.igpp.ucla.edu/overview_data.shtml). The solar rotational angular velocity data can be found in the table presented by Snodgrass in ‘Magnetic rotation of the solar photosphere’52. We collected the contribution function of emission lines data from the CHIANTI website (http://www.chiantidatabase.org). Lunar tide signal data were downloaded from the Radiation Belt Storm Probe Electric Field and Waves Instrument official website (https://www.space.umn.edu/rbspefw-data/). Source data are provided with this paper.

Code availability

Code for running PhyE2E, including both training and test modules, is available at https://github.com/Jie0618/PhysicsRegression, with a permanent version available72 via Zenodo at https://doi.org/10.5281/zenodo.16305086. The pretrained PhyE2E model can be downloaded at https://figshare.com/articles/dataset/PhyE2E_datas/29615831/1.

References

  1. Cranmer, M. D. Interpretable Machine Learning for the Physical Sciences. PhD thesis, Princeton Univ. (2023).

  2. Pearce Williams, L. Faraday’s discovery of electromagnetic induction. Contemp. Phys. 5, 28–37 (1963).

  3. Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).

  4. De Florio, M., Kevrekidis, I. G. & Karniadakis, G. E. AI-Lorenz: a physics-data-driven framework for black-box and gray-box identification of chaotic systems with symbolic regression. Chaos Solitons Fractals 188, 115538 (2024).

  5. Ahmadi Daryakenari, N., De Florio, M., Shukla, K. & Karniadakis, G. E. AI-Aristotle: a physics-informed framework for systems biology gray-box identification. PLoS Comput. Biol. 20, e1011916 (2024).

  6. Schmidt, M. D. & Lipson, H. Age–fitness Pareto optimization. In Proc. 12th Annual Conference on Genetic and Evolutionary Computation 543–544 (Association for Computing Machinery, 2010).

  7. Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).

  8. La Cava, W., Helmuth, T., Spector, L. & Moore, J. H. A probabilistic and multi-objective analysis of lexicase selection and ε-lexicase selection. Evol. Comput. 27, 377–402 (2019).

  9. La Cava, W., Singh, T. R., Taggart, J., Suri, S. & Moore, J. H. Learning concise representations for regression by evolving networks of trees. In 7th International Conference on Learning Representations Vol. 6 (ed. Sainath, T.) 3987–4002 (ICLR, 2019).

  10. Virgolin, M., Alderliesten, T., Witteveen, C. & Bosman, P. A. Improving model-based genetic programming for symbolic regression of small expressions. Evol. Comput. 29, 211–237 (2021).

  11. McCormick, T. gplearn: genetic programming in Python. GitHub https://github.com/trevorstephens/gplearn (2019).

  12. De Franca, F. & Aldeia, G. Interaction-transformation evolutionary algorithm for symbolic regression. Evol. Comput. 29, 367–390 (2021).

  13. Arnaldo, I., Krawiec, K. & O’Reilly, U.-M. Multiple regression genetic programming. In Proc. 2014 Annual Conference on Genetic and Evolutionary Computation (ed. Igel, C.) 879–886 (Association for Computing Machinery, 2014).

  14. Kommenda, M., Burlacu, B., Kronberger, G. & Affenzeller, M. Parameter identification for symbolic regression using nonlinear least squares. Genet. Program. Evolvable Mach. 21, 471–501 (2020).

  15. Virgolin, M., Alderliesten, T. & Bosman, P. A. Linear scaling with and within semantic backpropagation-based genetic programming for symbolic regression. In Proc. Genetic and Evolutionary Computation Conference (ed. López-Ibáñez, M.) 1084–1092 (Association for Computing Machinery, 2019).

  16. Kamienny, P.-A., Lample, G., Lamprier, S. & Virgolin, M. Deep generative symbolic regression with Monte-Carlo-tree-search. Proc. Mach. Learn. Res. 202, 15655–15668 (2023).

  17. Lu, Q., Tao, F., Zhou, S. & Wang, Z. Incorporating actor–critic in Monte Carlo tree search for symbolic regression. Neural Comput. Appl. 33, 8495–8511 (2021).

  18. Xu, Y., Liu, Y. & Sun, H. Reinforcement symbolic regression machine. In Proc. 12th International Conference on Learning Representations (ICLR, 2024).

  19. Xie, Y. et al. An efficient and generalizable symbolic regression method for time series analysis. Preprint at http://arxiv.org/abs/2409.03986 (2024).

  20. Sun, F., Liu, Y., Wang, J.-X. & Sun, H. Symbolic physics learner: discovering governing equations via Monte Carlo tree search. In Proc. 11th International Conference on Learning Representations (ICLR, 2023).

  21. Valipour, M., You, B., Panju, M. & Ghodsi, A. SymbolicGPT: a generative transformer model for symbolic regression. Preprint at https://arxiv.org/abs/2106.14131 (2021).

  22. Chen, T., Li, Z., Xu, P. & Zheng, H. Bootstrapping OTS-Funcimg pre-training model (Botfip): a comprehensive multimodal scientific computing framework and its application in symbolic regression task. Complex Intell. Syst. 11, 417 (2025).

  23. Xing, H., Salleb-Aouissi, A. & Verma, N. Automated symbolic law discovery: a computer vision approach. In Proc. AAAI Conference on Artificial Intelligence 660–668 (AAAI, 2021).

  24. Kamienny, P.-A., d’Ascoli, S., Lample, G. & Charton, F. End-to-end symbolic regression with transformers. Adv. Neural Inf. Process. Syst. 35, 10269–10281 (2022).

  25. Biggio, L., Bendinelli, T., Neitz, A., Lucchi, A. & Parascandolo, G. Neural symbolic regression that scales. Proc. Mach. Learn. Res. 139, 936–945 (2021).

  26. Shojaee, P., Meidani, K., Barati Farimani, A. & Reddy, C. Transformer-based planning for symbolic regression. Adv. Neural Inf. Process. Syst. 36, 45907–45919 (2023).

  27. Udrescu, S.-M. & Tegmark, M. AI Feynman: a physics-inspired method for symbolic regression. Sci. Adv. 6, eaay2631 (2020).

  28. Cranmer, M. Interpretable machine learning for science with PySR and SymbolicRegression.jl. Preprint at https://arxiv.org/abs/2305.01582 (2023).

  29. Burlacu, B., Kronberger, G. & Kommenda, M. Operon C++: an efficient genetic programming framework for symbolic regression. In Proc. 2020 Genetic and Evolutionary Computation Conference Companion 1562–1570 (Association for Computing Machinery, 2020).

  30. Grayeli, A., Sehgal, A., Costilla Reyes, O., Cranmer, M. & Chaudhuri, S. Symbolic regression with a learned concept library. Adv. Neural Inf. Process. Syst. 37, 44678–44709 (2024).

  31. Shojaee, P., Meidani, K., Gupta, S., Farimani, A. B. & Reddy, C. K. LLM-SR: scientific equation discovery via programming with large language models. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).

  32. Landajuela, M. et al. A unified framework for deep symbolic regression. Adv. Neural Inf. Process. Syst. 35, 33985–33998 (2022).

  33. Tenachi, W., Ibata, R. & Diakogiannis, F. I. Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws. Astrophys. J. 959, 99 (2023).

  34. Udrescu, S.-M. et al. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. Adv. Neural Inf. Process. Syst. 33, 4860–4871 (2020).

  35. Scholl, P., Bieker, K., Hauger, H. & Kutyniok, G. ParFam—(neural guided) symbolic regression via continuous global optimization. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).

  36. Liu, Z. et al. KAN: Kolmogorov–Arnold networks. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).

  37. Liu, Z., Ma, P., Wang, Y., Matusik, W. & Tegmark, M. KAN 2.0: Kolmogorov–Arnold networks meet science. Preprint at https://arxiv.org/abs/2408.10205 (2024).

  38. Jin, Y., Fu, W., Kang, J., Guo, J. & Guo, J. Bayesian symbolic regression. Preprint at https://arxiv.org/abs/1910.08892 (2019).

  39. La Cava, W. et al. Contemporary symbolic regression methods and their relative performance. Adv. Neural Inf. Process. Syst. 2021, 1–16 (2021).

  40. Feynman, R., Leighton, R. & Sands, M. The Feynman Lectures on Physics, Vol. I: The New Millennium Edition: Mainly Mechanics, Radiation, and Heat (Basic Books, 2011).

  41. Feynman, R., Leighton, R. & Sands, M. The Feynman Lectures on Physics, Vol. 2 (Pearson/Addison-Wesley, 1963).

  42. Feynman, R., Leighton, R. & Sands, M. The Feynman Lectures on Physics, Vol. 3 (Pearson/Addison-Wesley, 1963).

  43. SILSO World Data Center. The International Sunspot Number (1749–2023). International Sunspot Number Monthly Bulletin and Online Catalogue (2023).

  44. Hathaway, D. H., Wilson, R. M. & Reichmann, E. J. The shape of the sunspot cycle. Sol. Phys. 151, 177–190 (1994).

  45. Upton, L. A. & Hathaway, D. H. Solar cycle precursors and the outlook for cycle 25. J. Geophys. Res. Space Phys. 128, e2023JA031681 (2023).

  46. Brehm, N. et al. Eleven-year solar cycles over the last millennium revealed by radiocarbon in tree rings. Nat. Geosci. 14, 10–15 (2021).

  47. Wang, C.-P. et al. Empirical modeling of plasma sheet pressure and three-dimensional force-balanced magnetospheric magnetic field structure: 1. Observation. J. Geophys. Res. Space Phys. 118, 6154–6165 (2013).

  48. Yue, C., Wang, C.-P., Zaharia, S. G., Xing, X. & Lyons, L. Empirical modeling of plasma sheet pressure and three-dimensional force-balanced magnetospheric magnetic field structure: 2. Modeling. J. Geophys. Res. Space Phys. 118, 6166–6175 (2013).

  49. Lui, A. T. & Hamilton, D. C. Radial profiles of quiet time magnetospheric parameters. J. Geophys. Res. Space Phys. 97, 19325–19332 (1992).

  50. Hotta, H. & Kusano, K. Solar differential rotation reproduced with high-resolution simulation. Nat. Astron. 5, 1100–1102 (2021).

  51. Vasil, G. M. et al. The solar dynamo begins near the surface. Nature 629, 769–772 (2024).

  52. Snodgrass, H. B. Magnetic rotation of the solar photosphere. Astrophys. J. 270, 288–299 (1983).

  53. Rao, S. et al. Height-dependent differential rotation of the solar atmosphere detected by CHASE. Nat. Astron. 8, 1102–1109 (2024).

  54. Dere, K. P., Landi, E., Mason, H. E., Monsignori Fossi, B. C. & Young, P. R. CHIANTI—an atomic database for emission lines. Astron. Astrophys. Suppl. Ser. 125, 149–173 (1997).

  55. Dufresne, R. P. et al. CHIANTI—an atomic database for emission lines—paper. XVIII. Version 11, advanced ionization equilibrium models: density and charge transfer effects. Astrophys. J. 974, 71 (2024).

  56. Kramida, A., Ralchenko, Y., Reader, J. & NIST ASD Team. NIST Atomic Spectra Database v.5.12 (NIST, accessed 20 October 2024); https://physics.nist.gov/asd

  57. Aschwanden, M. J. Physics of the Solar Corona. An Introduction (Praxis–Springer, 2004).

  58. Mason, H. E. & Monsignori Fossi, B. C. Spectroscopic diagnostics in the VUV for solar and stellar plasmas. Astron. Astrophys. Rev. 6, 123–179 (1994).

  59. Raymond, J. C. & Smith, B. W. Soft X-ray spectrum of a hot plasma. Astrophys. J. Suppl. Ser. 35, 419–439 (1977).

  60. Xiao, C. et al. Evidence for lunar tide effects in Earth’s plasmasphere. Nat. Phys. 19, 486–491 (2023).

  61. RBSP/EFW Data (Univ. Minnesota, accessed 24 September 2025); http://www.space.umn.edu/rbspefw-data/

  62. Zhang, Z., Liu, W. L., Zhang, D. J. & Cao, J. B. Estimating the corotation lag of the plasmasphere based on the electric field measurements of the Van Allen Probes. Adv. Space Res. 73, 758–766 (2024).

  63. Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).

  64. Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://arxiv.org/abs/2307.09288 (2023).

  65. Maurya, A., Ye, J., Rafique, M. M., Cappello, F. & Nicolae, B. Deep optimizer states: towards scalable training of transformer models using interleaved offloading. In Proc. 25th International Middleware Conference 404–416 (Association for Computing Machinery, 2024).

  66. Lample, G. & Charton, F. Deep learning for symbolic mathematics. In International Conference on Learning Representations (ICLR, 2020).

  67. Charton, F. Linear algebra with transformers. Trans. Mach. Learn. Res. (2022).

  68. Bendinelli, T., Biggio, L. & Kamienny, P.-A. Controllable neural symbolic regression. Proc. Mach. Learn. Res. 202, 2063–2077 (2023).

  69. Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).

  70. Fletcher, R. Practical Methods of Optimization 2nd ed. (Wiley, 1987).

  71. Ying, J. PhyE2E_datas. figshare https://figshare.com/articles/dataset/PhyE2E_datas/29615831 (2025).

  72. Ying, J. Jie0618/PhysicsRegression: code for “A neural symbolic model for space physics” version v1.0.0. Zenodo https://doi.org/10.5281/zenodo.16305086 (2025).

Acknowledgements

This work is supported by the National Natural Science Foundation of China grants 52494974 and 62377030, China’s Village Science and Technology City Key Technology funding and Wuxi Research Institute of Applied Technologies.

Author information

Authors and Affiliations

Authors

Contributions

S.-T.Y., Y.Z. and J.M. conceived of the research idea. H.L. and Y.L. were responsible for the construction of datasets. J.Y. was responsible for the training and validation for PhyE2E, the comparison with other baseline methods and the analysis of results. C.Y. and Q.S. conceived of the five applications of space physics. J.Y. implemented the code and conducted the experiments. C.Y. was responsible for the applications of SSN prediction and plasma pressure prediction. Y.C. and Q.S. were responsible for the applications of solar rotational angular velocity prediction and contribution function of emission lines. C.X. was responsible for the application of the lunar tide signal of the plasma layer. J.Y., H.L. and C.Y. wrote the initial draft. Y.Z. and J.M. supervised the entire project and made final revisions to the paper.

Corresponding authors

Correspondence to Shing-Tung Yau, Yuan Zhou or Jianzhu Ma.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Parshin Shojaee and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance on Synthetic and AI Feynman datasets.

a, An example of the D&C procedure, including the Divide-and-Conquer and Backward Aggregation steps. b, Decomposition accuracy for different D&C strategies. c, Symbolic accuracy and average accuracy at R2 > 0.99 under different D&C strategies. Data are presented as mean values ± SEM (n = 5 individual trials for each configuration). d, Accuracy on low-data cases with different physical priors incorporated into PhyE2E. Data are presented as mean values ± SEM (n = 5 individual trials for each configuration).

Source data

Supplementary information

Supplementary Information (PDF)

Supplementary Tables 1–20, Discussion and Figs. 1–4.

Source data

Source Data Fig. 2 (XLSX)

Statistical source data.

Source Data Fig. 3 (XLSX)

Source data of SSN with its corresponding predictions.

Source Data Fig. 4 (XLSX)

Source data of plasma pressure and solar rotational angular velocity with their corresponding predictions.

Source Data Fig. 5 (XLSX)

Source data of contribution functions and lunar tide signal with their corresponding predictions.

Source Data Extended Data Fig. 1 (XLSX)

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ying, J., Lin, H., Yue, C. et al. A neural symbolic model for space physics. Nat Mach Intell 7, 1726–1741 (2025). https://doi.org/10.1038/s42256-025-01126-3
