Abstract
Symbolic regression, a key problem in discovering physics formulas from observational data, faces persistent challenges in scalability and interpretability. We introduce PhyE2E, an AI framework designed to discover physically meaningful symbolic expressions. PhyE2E decomposes the symbolic regression problem into subproblems via second-order neural network derivatives, and employs a transformer architecture to translate data into symbolic formulas in an end-to-end manner. The generated expressions are further refined via Monte Carlo tree search and genetic programming. We leverage a large language model to synthesize extensive expressions resembling real physics, and train the model to recover these formulas directly from data. Comprehensive evaluations demonstrate that PhyE2E outperforms existing state-of-the-art approaches, delivering superior symbolic accuracy, fitting precision and unit consistency. We deployed PhyE2E on five critical applications in space physics. The AI-derived formulas exhibit excellent agreement with empirical data from satellites and astronomical telescopes. We improved NASA’s 1993 formula for solar activity and provided an explicit symbolic explanation of the long-term solar cycle. We also found that the decay of near-Earth plasma pressure is proportional to the square of the distance r from the Earth’s centre, with subsequent mathematical derivations validated by independent satellite observations. Furthermore, we found symbolic formulas relating solar extreme ultraviolet emission lines to temperature, electron density and magnetic-field variations. The formulas obtained are consistent with properties previously hypothesized by physicists.
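The decomposition step mentioned in the abstract relies on a standard observation: if the mixed second derivative of a function with respect to two variables is zero everywhere, those two variables enter the function through separate additive terms. The following is a minimal sketch of that test, not the PhyE2E implementation. It assumes a hand-written function `oracle` in place of the trained neural network and uses central finite differences in place of automatic differentiation; the function names, sample range and tolerance are all illustrative assumptions.

```python
import math
import random
from itertools import combinations


def mixed_partial(f, x, i, j, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j) at point x."""
    def shifted(di, dj):
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)

    numerator = shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)
    return numerator / (4.0 * h * h)


def additively_separable_pairs(f, n_vars, n_samples=200, tol=1e-4, seed=0):
    """Return variable pairs whose mixed second derivative is ~0 at all samples."""
    rng = random.Random(seed)
    # Hypothetical sampling range; a real dataset would define its own domain.
    points = [[rng.uniform(1.0, 2.0) for _ in range(n_vars)]
              for _ in range(n_samples)]
    pairs = []
    for i, j in combinations(range(n_vars), 2):
        worst = max(abs(mixed_partial(f, p, i, j)) for p in points)
        if worst < tol:
            pairs.append((i, j))
    return pairs


def oracle(x):
    """Stand-in for a fitted network (assumption): f = x0 * x1 + sin(x2)."""
    return x[0] * x[1] + math.sin(x[2])


pairs = additively_separable_pairs(oracle, n_vars=3)
print(pairs)  # x2 separates from x0 and x1; x0 and x1 remain coupled
```

Here `x0` and `x1` stay in the same subproblem because their mixed derivative is 1, while `x2` can be split off. Applying the same test to the logarithm of a strictly positive function would detect multiplicative rather than additive separability.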
Data availability
AI Feynman data can be downloaded from the Feynman Symbolic Regression Database (https://space.mit.edu/home/tegmark/aifeynman.html). The formulas generated by the LLM, including the training and test datasets, can be downloaded (ref. 71) from https://figshare.com/articles/dataset/PhyE2E_datas/29615831/1. The sunspot number (SSN) data can be downloaded from the SILSO website (https://www.sidc.be/SILSO/datafiles). The plasma pressure data can be downloaded from the Geotail and Time History of Events and Macroscale Interactions during Substorms (THEMIS) website (https://themis.igpp.ucla.edu/overview_data.shtml). The solar rotational angular velocity data can be found in the table presented by Snodgrass in ‘Magnetic rotation of the solar photosphere’ (ref. 52). We collected the contribution-function data for emission lines from the CHIANTI website (http://www.chiantidatabase.org). Lunar tide signal data were downloaded from the Radiation Belt Storm Probe Electric Field and Waves Instrument official website (https://www.space.umn.edu/rbspefw-data/). Source data are provided with this paper.
Code availability
Code for running PhyE2E, including both training and test modules, is accessible at https://github.com/Jie0618/PhysicsRegression, with a permanent version available (ref. 72) via Zenodo at https://doi.org/10.5281/zenodo.16305086. The pretrained PhyE2E model can be downloaded at https://figshare.com/articles/dataset/PhyE2E_datas/29615831/1.
References
Cranmer, M. D. Interpretable Machine Learning for the Physical Sciences. PhD thesis, Princeton Univ. (2023).
Pearce Williams, L. Faraday’s discovery of electromagnetic induction. Contemp. Phys. 5, 28–37 (1963).
Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).
De Florio, M., Kevrekidis, I. G. & Karniadakis, G. E. AI-Lorenz: a physics-data-driven framework for black-box and gray-box identification of chaotic systems with symbolic regression. Chaos Solitons Fractals 188, 115538 (2024).
Ahmadi Daryakenari, N., De Florio, M., Shukla, K. & Karniadakis, G. E. AI-Aristotle: a physics-informed framework for systems biology gray-box identification. PLoS Comput. Biol. 20, e1011916 (2024).
Schmidt, M. D. & Lipson, H. Age–fitness Pareto optimization. In Proc. 12th Annual Conference on Genetic and Evolutionary Computation 543–544 (Association for Computing Machinery, 2010).
Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).
La Cava, W., Helmuth, T., Spector, L. & Moore, J. H. A probabilistic and multi-objective analysis of lexicase selection and ε-lexicase selection. Evol. Comput. 27, 377–402 (2019).
La Cava, W., Singh, T. R., Taggart, J., Suri, S. & Moore, J. H. Learning concise representations for regression by evolving networks of trees. In 7th International Conference on Learning Representations Vol. 6 (ed. Sainath, T.) 3987–4002 (ICLR, 2019).
Virgolin, M., Alderliesten, T., Witteveen, C. & Bosman, P. A. Improving model-based genetic programming for symbolic regression of small expressions. Evol. Comput. 29, 211–237 (2021).
McCormick, T. gplearn: genetic programming in Python. GitHub https://github.com/trevorstephens/gplearn (2019).
De Franca, F. & Aldeia, G. Interaction-transformation evolutionary algorithm for symbolic regression. Evol. Comput. 29, 367–390 (2021).
Arnaldo, I., Krawiec, K. & O’Reilly, U.-M. Multiple regression genetic programming. In Proc. 2014 Annual Conference on Genetic and Evolutionary Computation (ed. Igel, C.) 879–886 (Association for Computing Machinery, 2014).
Kommenda, M., Burlacu, B., Kronberger, G. & Affenzeller, M. Parameter identification for symbolic regression using nonlinear least squares. Genet. Program. Evolvable Mach. 21, 471–501 (2020).
Virgolin, M., Alderliesten, T. & Bosman, P. A. Linear scaling with and within semantic backpropagation-based genetic programming for symbolic regression. In Proc. Genetic and Evolutionary Computation Conference (ed. López-Ibáñez, M.) 1084–1092 (Association for Computing Machinery, 2019).
Kamienny, P.-A., Lample, G., Lamprier, S. & Virgolin, M. Deep generative symbolic regression with Monte-Carlo-tree-search. Proc. Mach. Learn. Res. 202, 15655–15668 (2023).
Lu, Q., Tao, F., Zhou, S. & Wang, Z. Incorporating actor–critic in Monte Carlo tree search for symbolic regression. Neural Comput. Appl. 33, 8495–8511 (2021).
Xu, Y., Liu, Y. & Sun, H. Reinforcement symbolic regression machine. In Proc. 12th International Conference on Learning Representations (ICLR, 2024).
Xie, Y. et al. An efficient and generalizable symbolic regression method for time series analysis. Preprint at http://arxiv.org/abs/2409.03986 (2024).
Sun, F., Liu, Y., Wang, J.-X. & Sun, H. Symbolic physics learner: discovering governing equations via Monte Carlo tree search. In Proc. 11th International Conference on Learning Representations (ICLR, 2023).
Valipour, M., You, B., Panju, M. & Ghodsi, A. SymbolicGPT: a generative transformer model for symbolic regression. Preprint at https://arxiv.org/abs/2106.14131 (2021).
Chen, T., Li, Z., Xu, P. & Zheng, H. Bootstrapping OTS-Funcimg pre-training model (Botfip): a comprehensive multimodal scientific computing framework and its application in symbolic regression task. Complex Intell. Syst. 11, 417 (2025).
Xing, H., Salleb-Aouissi, A. & Verma, N. Automated symbolic law discovery: a computer vision approach. In Proc. AAAI Conference on Artificial Intelligence 660–668 (AAAI, 2021).
Kamienny, P.-A., d’Ascoli, S., Lample, G. & Charton, F. End-to-end symbolic regression with transformers. Adv. Neural Inf. Process. Syst. 35, 10269–10281 (2022).
Biggio, L., Bendinelli, T., Neitz, A., Lucchi, A. & Parascandolo, G. Neural symbolic regression that scales. Proc. Mach. Learn. Res. 139, 936–945 (2021).
Shojaee, P., Meidani, K., Barati Farimani, A. & Reddy, C. Transformer-based planning for symbolic regression. Adv. Neural Inf. Process. Syst. 36, 45907–45919 (2023).
Udrescu, S.-M. & Tegmark, M. AI Feynman: a physics-inspired method for symbolic regression. Sci. Adv. 6, eaay2631 (2020).
Cranmer, M. Interpretable machine learning for science with PySR and SymbolicRegression.jl. Preprint at https://arxiv.org/abs/2305.01582 (2023).
Burlacu, B., Kronberger, G. & Kommenda, M. Operon C++: an efficient genetic programming framework for symbolic regression. In Proc. 2020 Genetic and Evolutionary Computation Conference Companion 1562–1570 (Association for Computing Machinery, 2020).
Grayeli, A., Sehgal, A., Costilla Reyes, O., Cranmer, M. & Chaudhuri, S. Symbolic regression with a learned concept library. Adv. Neural Inf. Process. Syst. 37, 44678–44709 (2024).
Shojaee, P., Meidani, K., Gupta, S., Farimani, A. B. & Reddy, C. K. LLM-SR: scientific equation discovery via programming with large language models. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Landajuela, M. et al. A unified framework for deep symbolic regression. Adv. Neural Inf. Process. Syst. 35, 33985–33998 (2022).
Tenachi, W., Ibata, R. & Diakogiannis, F. I. Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws. Astrophys. J. 959, 99 (2023).
Udrescu, S.-M. et al. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. Adv. Neural Inf. Process. Syst. 33, 4860–4871 (2020).
Scholl, P., Bieker, K., Hauger, H. & Kutyniok, G. ParFam—(neural guided) symbolic regression via continuous global optimization. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Liu, Z. et al. KAN: Kolmogorov–Arnold networks. In Proc. 13th International Conference on Learning Representations (ICLR, 2025).
Liu, Z., Ma, P., Wang, Y., Matusik, W. & Tegmark, M. KAN 2.0: Kolmogorov–Arnold networks meet science. Preprint at https://arxiv.org/abs/2408.10205 (2024).
Jin, Y., Fu, W., Kang, J., Guo, J. & Guo, J. Bayesian symbolic regression. Preprint at https://arxiv.org/abs/1910.08892 (2019).
La Cava, W. et al. Contemporary symbolic regression methods and their relative performance. Adv. Neural Inf. Process. Syst. 2021, 1–16 (2021).
Feynman, R., Leighton, R. & Sands, M. The Feynman Lectures on Physics, Vol. I: The New Millennium Edition: Mainly Mechanics, Radiation, and Heat (Basic Books, 2011).
Feynman, R., Leighton, R. & Sands, M. The Feynman Lectures on Physics, Vol. 2 (Pearson/Addison-Wesley, 1963).
Feynman, R., Leighton, R. & Sands, M. The Feynman Lectures on Physics, Vol. 3 (Pearson/Addison-Wesley, 1963).
SILSO World Data Center. The International Sunspot Number (1749–2023). International Sunspot Number Monthly Bulletin and Online Catalogue (2023).
Hathaway, D. H., Wilson, R. M. & Reichmann, E. J. The shape of the sunspot cycle. Sol. Phys. 151, 177–190 (1994).
Upton, L. A. & Hathaway, D. H. Solar cycle precursors and the outlook for cycle 25. J. Geophys. Res. Space Phys. 128, e2023JA031681 (2023).
Brehm, N. et al. Eleven-year solar cycles over the last millennium revealed by radiocarbon in tree rings. Nat. Geosci. 14, 10–15 (2021).
Wang, C.-P. et al. Empirical modeling of plasma sheet pressure and three-dimensional force-balanced magnetospheric magnetic field structure: 1. Observation. J. Geophys. Res. Space Phys. 118, 6154–6165 (2013).
Yue, C., Wang, C.-P., Zaharia, S. G., Xing, X. & Lyons, L. Empirical modeling of plasma sheet pressure and three-dimensional force-balanced magnetospheric magnetic field structure: 2. Modeling. J. Geophys. Res. Space Phys. 118, 6166–6175 (2013).
Lui, A. T. & Hamilton, D. C. Radial profiles of quiet time magnetospheric parameters. J. Geophys. Res. Space Phys. 97, 19325–19332 (1992).
Hotta, H. & Kusano, K. Solar differential rotation reproduced with high-resolution simulation. Nat. Astron. 5, 1100–1102 (2021).
Vasil, G. M. et al. The solar dynamo begins near the surface. Nature 629, 769–772 (2024).
Snodgrass, H. B. Magnetic rotation of the solar photosphere. Astrophys. J. 270, 288–299 (1983).
Rao, S. et al. Height-dependent differential rotation of the solar atmosphere detected by CHASE. Nat. Astron. 8, 1102–1109 (2024).
Dere, K. P., Landi, E., Mason, H. E., Monsignori Fossi, B. C. & Young, P. R. CHIANTI—an atomic database for emission lines. Astron. Astrophys. Suppl. Ser. 125, 149–173 (1997).
Dufresne, R. P. et al. CHIANTI—an atomic database for emission lines—paper. XVIII. Version 11, advanced ionization equilibrium models: density and charge transfer effects. Astrophys. J. 974, 71 (2024).
Kramida, A., Ralchenko, Y., Reader, J. & NIST ASD Team. NIST Atomic Spectra Database v.5.12 (NIST, accessed 20 October 2024); https://physics.nist.gov/asd
Aschwanden, M. J. Physics of the Solar Corona. An Introduction (Praxis–Springer, 2004).
Mason, H. E. & Monsignori Fossi, B. C. Spectroscopic diagnostics in the VUV for solar and stellar plasmas. Astron. Astrophys. Rev. 6, 123–179 (1994).
Raymond, J. C. & Smith, B. W. Soft X-ray spectrum of a hot plasma. Astrophys. J. Suppl. Ser. 35, 419–439 (1977).
Xiao, C. et al. Evidence for lunar tide effects in Earth’s plasmasphere. Nat. Phys. 19, 486–491 (2023).
RBSP/EFW Data (Univ. Minnesota, accessed 24 September 2025); http://www.space.umn.edu/rbspefw-data/
Zhang, Z., Liu, W. L., Zhang, D. J. & Cao, J. B. Estimating the corotation lag of the plasmasphere based on the electric field measurements of the Van Allen Probes. Adv. Space Res. 73, 758–766 (2024).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://arxiv.org/abs/2307.09288 (2023).
Maurya, A., Ye, J., Rafique, M. M., Cappello, F. & Nicolae, B. Deep optimizer states: towards scalable training of transformer models using interleaved offloading. In Proc. 25th International Middleware Conference 404–416 (Association for Computing Machinery, 2024).
Lample, G. & Charton, F. Deep learning for symbolic mathematics. In International Conference on Learning Representations (ICLR, 2020).
Charton, F. Linear algebra with transformers. Trans. Mach. Learn. Res. (2022).
Bendinelli, T., Biggio, L. & Kamienny, P.-A. Controllable neural symbolic regression. Proc. Mach. Learn. Res. 202, 2063–2077 (2023).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).
Fletcher, R. Practical Methods of Optimization 2nd ed. (Wiley, 1987).
Ying, J. PhyE2E_datas. figshare https://figshare.com/articles/dataset/PhyE2E_datas/29615831 (2025).
Ying, J. Jie0618/PhysicsRegression: code for “A neural symbolic model for space physics” version v1.0.0. Zenodo https://doi.org/10.5281/zenodo.16305086 (2025).
Acknowledgements
This work is supported by the National Natural Science Foundation of China grants 52494974 and 62377030, China’s Village Science and Technology City Key Technology funding and Wuxi Research Institute of Applied Technologies.
Author information
Contributions
S.-T.Y., Y.Z. and J.M. conceived of the research idea. H.L. and Y.L. were responsible for the construction of datasets. J.Y. was responsible for the training and validation for PhyE2E, the comparison with other baseline methods and the analysis of results. C.Y. and Q.S. conceived of the five applications of space physics. J.Y. implemented the code and conducted the experiments. C.Y. was responsible for the applications of SSN prediction and plasma pressure prediction. Y.C. and Q.S. were responsible for the applications of solar rotational angular velocity prediction and contribution function of emission lines. C.X. was responsible for the application of the lunar tide signal of the plasma layer. J.Y., H.L. and C.Y. wrote the initial draft. Y.Z. and J.M. supervised the entire project and made final revisions to the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Parshin Shojaee and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Performance on Synthetic and AI Feynman datasets.
a, An example of the divide-and-conquer (D&C) procedure, including the Divide-and-Conquer and Backward Aggregation steps. b, Decomposition accuracy for different D&C strategies. c, Symbolic accuracy and the average rate of achieving R² > 0.99 under different D&C strategies. Data are presented as mean values ± SEM (n = 5 individual trials for each configuration). d, Accuracy in low-data cases with different physical priors incorporated into PhyE2E. Data are presented as mean values ± SEM (n = 5 individual trials for each configuration).
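The quantities named in this caption can be computed as in the sketch below, which is an illustration rather than the authors' evaluation code. It assumes that R² is the usual coefficient of determination and that SEM is the sample standard deviation (n − 1 denominator) divided by √n; the five trial values are invented purely to show the arithmetic.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_above(r2_scores, threshold=0.99):
    """Share of formulas whose fit exceeds the R^2 threshold."""
    return sum(score > threshold for score in r2_scores) / len(r2_scores)


def mean_and_sem(values):
    """Mean and standard error of the mean (sample std / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(sample_var / n)


# Hypothetical accuracies from n = 5 trials (made-up numbers, not paper results).
trial_accuracies = [0.80, 0.82, 0.78, 0.81, 0.79]
mean_acc, sem_acc = mean_and_sem(trial_accuracies)
print(f"{mean_acc:.3f} ± {sem_acc:.3f}")
```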
Supplementary information
Supplementary Information
Supplementary Tables 1–20, Discussion and Figs. 1–4.
Source data
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Source data of SSN with its corresponding predictions.
Source Data Fig. 4
Source data of plasma pressure and solar rotational angular velocity with their corresponding predictions.
Source Data Fig. 5
Source data of contribution functions and lunar tide signal with their corresponding predictions.
Source Data Extended Data Fig. 1
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ying, J., Lin, H., Yue, C. et al. A neural symbolic model for space physics. Nat Mach Intell 7, 1726–1741 (2025). https://doi.org/10.1038/s42256-025-01126-3
DOI: https://doi.org/10.1038/s42256-025-01126-3


