
  • Brief Communication

Bi-level identification of governing equations for nonlinear physical systems

A preprint version of the article is available at Research Square.

Abstract

Identifying governing equations from observational data is crucial for understanding nonlinear physical systems but remains challenging because of the risk of overfitting. Here we introduce the Bi-Level Identification of Equations (BILLIE) framework, which simultaneously discovers and validates equations using a hierarchical optimization strategy. A policy gradient algorithm from reinforcement learning is used to solve the bi-level optimization. We demonstrate that BILLIE outperforms baseline methods on canonical nonlinear systems such as turbulent flows and three-body systems. Furthermore, we apply BILLIE to discover RNA and protein velocity equations directly from single-cell sequencing data. The equations identified by BILLIE outperform empirical models in predicting cellular differentiation states, underscoring BILLIE’s potential to reveal fundamental physical laws across a wide range of scientific fields.
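
The bi-level idea in the abstract can be illustrated with a minimal sketch. It assumes a fixed candidate library whose columns are already evaluated on the data, uses independent Bernoulli term-inclusion probabilities as the policy (upper level), and fits coefficients by closed-form ridge regression on a training split (lower level); the reward is the negative validation error minus a per-term penalty. All function names, the penalty weight and the toy data are hypothetical, and this is not the BILLIE implementation (the authors' code is in the repositories listed under Code availability).

```python
import numpy as np


def ridge_fit(A, y, alpha=1e-3):
    """Lower level: closed-form ridge regression on the selected library columns."""
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(k), A.T @ y)


def structure_reward(mask, lib_tr, y_tr, lib_va, y_va, penalty=0.1):
    """Fit coefficients on the training split, then score the structure on validation data."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        val_mse = np.mean(y_va**2)  # an empty model predicts zero everywhere
    else:
        theta = ridge_fit(lib_tr[:, cols], y_tr)
        val_mse = np.mean((lib_va[:, cols] @ theta - y_va) ** 2)
    # Upper-level objective (hypothetical): low validation error and few terms.
    return -val_mse - penalty * cols.size


def bilevel_search(lib_tr, y_tr, lib_va, y_va, iters=400, batch=32, lr=1.0, seed=0):
    """Upper level: REINFORCE over independent Bernoulli term-inclusion probabilities."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(lib_tr.shape[1])
    for _ in range(iters):
        probs = 1.0 / (1.0 + np.exp(-logits))
        masks = (rng.random((batch, logits.size)) < probs).astype(float)
        rewards = np.array([structure_reward(m, lib_tr, y_tr, lib_va, y_va) for m in masks])
        advantage = rewards - rewards.mean()  # batch-mean baseline to reduce variance
        # Gradient of log Bernoulli(mask | sigmoid(logits)) w.r.t. logits is (mask - probs).
        logits += lr * (advantage[:, None] * (masks - probs)).mean(axis=0)
    return 1.0 / (1.0 + np.exp(-logits))


# Toy data: 6 candidate terms, of which only columns 0 and 3 drive the target.
rng = np.random.default_rng(1)
lib = rng.normal(size=(300, 6))
y = 2.0 * lib[:, 0] - 1.5 * lib[:, 3] + 0.1 * rng.normal(size=300)
probs = bilevel_search(lib[:200], y[:200], lib[200:], y[200:])
selected = np.flatnonzero(probs > 0.5)
print("inclusion probabilities:", np.round(probs, 3))
print("selected terms:", selected)
```

The validation split is what lets the outer loop act as a check against overfitting: a term that only improves the training fit earns no reward, while the penalty steadily lowers its inclusion probability.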

Fig. 1: The schematic algorithm of BILLIE.
Fig. 2: Evaluation of BILLIE framework on different physical systems.

Data availability

The datasets used in this study are available in the Zenodo repository at https://doi.org/10.5281/zenodo.15140828 (ref. 59). Peripheral blood mononuclear cell CITE-Seq dataset (related to Extended Data Fig. 1 and Supplementary Fig. 5): the protein and RNA expression profiles were downloaded from the Gene Expression Omnibus database with the accession numbers GSM2695381 (protein) and GSM2695382 (RNA). Mouse hippocampus RNA-Seq dataset (related to Supplementary Figs. 6 and 7): the RNA expression profiles were downloaded from http://pklab.med.harvard.edu/velocyto/DentateGyrus/DentateGyrus.loom. Source data are available with this manuscript.

Code availability

The source code needed to reproduce the results in this study is available on GitHub at https://github.com/HuiningYuan/BILLIE and Code Ocean at https://doi.org/10.24433/CO.0462000.v1 (ref. 60).

References

  1. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).

  2. Gorin, G., Svensson, V. & Pachter, L. Protein velocity and acceleration from single-cell multiomics experiments. Genome Biol. 21, 39 (2020).

  3. Carroll, B. W. & Ostlie, D. A. An Introduction to Modern Astrophysics (Cambridge Univ. Press, 2017).

  4. Batchelor, G. K. An Introduction to Fluid Dynamics (Cambridge Univ. Press, 1967).

  5. Karatzas, I. & Shreve, S. E. Methods of Mathematical Finance Vol. 39 (Springer, 1998).

  6. Achdou, Y., Buera, F. J., Lasry, J.-M., Lions, P.-L. & Moll, B. Partial differential equation models in macroeconomics. Phil. Trans. R. Soc. A 372, 20130397 (2014).

  7. Schuch, N. & Verstraete, F. Computational complexity of interacting electrons and fundamental limitations of density functional theory. Nat. Phys. 5, 732–735 (2009).

  8. Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).

  9. Udrescu, S.-M. & Tegmark, M. AI Feynman: a physics-inspired method for symbolic regression. Sci. Adv. 6, eaay2631 (2020).

  10. Udrescu, S.-M. et al. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. Adv. Neural Inf. Process. Syst. 33, 4860–4871 (2020).

  11. Vastl, M., Kulhánek, J., Kubalík, J., Derner, E. & Babuška, R. Symformer: end-to-end symbolic regression using transformer-based architecture. IEEE Access 12, 37840–37849 (2024).

  12. Sun, F., Liu, Y., Wang, J.-X. & Sun, H. Symbolic physics learner: discovering governing equations via Monte Carlo tree search. In Proc. 11th International Conference on Learning Representations https://openreview.net/forum?id=ZTK3SefE8_Z (OpenReview.net, 2023).

  13. Lemos, P., Jeffrey, N., Cranmer, M., Ho, S. & Battaglia, P. Rediscovering orbital mechanics with machine learning. Mach. Learn. Sci. Technol. 4, 045002 (2023).

  14. Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).

  15. Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).

  16. Champion, K., Zheng, P., Aravkin, A. Y., Brunton, S. L. & Kutz, J. N. A unified sparse optimization framework to learn parsimonious physics-informed models from data. IEEE Access 8, 169259–169271 (2020).

  17. Chen, Z., Liu, Y. & Sun, H. Physics-informed learning of governing equations from scarce data. Nat. Commun. 12, 6136 (2021).

  18. Boninsegna, L., Nüske, F. & Clementi, C. Sparse learning of stochastic dynamical equations. J. Chem. Phys. 148, 241723 (2018).

  19. Zheng, P., Askham, T., Brunton, S. L., Kutz, J. N. & Aravkin, A. Y. A unified framework for sparse relaxed regularized regression: SR3. IEEE Access 7, 1404–1423 (2018).

  20. Champion, K., Lusch, B., Kutz, J. N. & Brunton, S. L. Data-driven discovery of coordinates and governing equations. Proc. Natl Acad. Sci. USA 116, 22445–22451 (2019).

  21. Xu, H., Chang, H. & Zhang, D. DLGA-PDE: discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm. J. Comput. Phys. 418, 109584 (2020).

  22. Xu, H., Zhang, D. & Zeng, J. Deep-learning of parametric partial differential equations from sparse and noisy data. Phys. Fluids 33, 037132 (2021).

  23. Xu, H., Zhang, D. & Wang, N. Deep-learning based discovery of partial differential equations in integral form from sparse and noisy data. J. Comput. Phys. 445, 110592 (2021).

  24. Reinbold, P. A. K., Gurevich, D. R. & Grigoriev, R. O. Using noisy or incomplete data to discover models of spatiotemporal dynamics. Phys. Rev. E 101, 010203 (2020).

  25. Reinbold, P. A., Kageorge, L. M., Schatz, M. F. & Grigoriev, R. O. Robust learning from noisy, incomplete, high-dimensional experimental data via physically constrained symbolic regression. Nat. Commun. 12, 3219 (2021).

  26. Fasel, U., Kutz, J. N., Brunton, B. W. & Brunton, S. L. Ensemble-SINDy: robust sparse model discovery in the low-data, high-noise limit, with active learning and control. Proc. R. Soc. A 478, 20210904 (2022).

  27. Berg, J. & Nyström, K. Data-driven discovery of PDEs in complex datasets. J. Comput. Phys. 384, 239–252 (2019).

  28. Xu, H., Chang, H. & Zhang, D. DL-PDE: deep-learning based data-driven discovery of partial differential equations from discrete and noisy data. Commun. Comput. Phys. 29, 698–728 (2021).

  29. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019).

  30. Raissi, M. & Karniadakis, G. E. Hidden physics models: machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018).

  31. Long, Z., Lu, Y., Ma, X. & Dong, B. PDE-Net: learning PDEs from data. Proc. Mach. Learn. Res. 80, 3208–3216 (2018).

  32. Long, Z., Lu, Y. & Dong, B. PDE-Net 2.0: learning PDEs from data with a numeric–symbolic hybrid deep network. J. Comput. Phys. 399, 108925 (2019).

  33. Rao, C. et al. Encoding physics to learn reaction–diffusion processes. Nat. Mach. Intell. 5, 765–779 (2023).

  34. Kabanikhin, S. I. Definitions and examples of inverse and ill-posed problems. J. Inverse Ill-Posed Probl. 16, 317–357 (2008).

  35. Sutton, R. S., McAllester, D., Singh, S. & Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 12, 1057–1063 (1999).

  36. Silver, D. et al. Deterministic policy gradient algorithms. Proc. Mach. Learn. Res. 32, 387–395 (2014).

  37. Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).

  38. McDonald, P. W. The Computation of Transonic Flow Through Two-Dimensional Gas Turbine Cascades 79825 (American Society of Mechanical Engineers, 1971).

  39. Ferziger, J. H., Perić, M. & Street, R. L. Computational Methods for Fluid Dynamics Vol. 3 (Springer, 2002).

  40. Li, T., Shi, J., Wu, Y. & Zhou, P. On the mathematics of RNA velocity I: theoretical analysis. CSIAM Trans. Appl. Math. 2, 1–55 (2021).

  41. Stoeckius, M. et al. Large-scale simultaneous measurement of epitopes and transcriptomes in single cells. Nat. Methods 14, 865–868 (2017).

  42. Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).

  43. Dhapola, P. et al. Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data. Nat. Commun. 13, 4616 (2022).

  44. Hochgerner, H., Zeisel, A., Lönnerberg, P. & Linnarsson, S. Conserved properties of dentate gyrus neurogenesis across postnatal development revealed by single-cell RNA sequencing. Nat. Neurosci. 21, 290–299 (2018).

  45. Cleveland, W. S. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 74, 829–836 (1979).

  46. Oduguwa, V. & Roy, R. Bi-level optimisation using genetic algorithm. In Proc. 2002 IEEE International Conference on Artificial Intelligence Systems (ICAIS 2002) 322–327 (IEEE, 2002).

  47. Wang, X. et al. Optimizing data usage via differentiable rewards. Proc. Mach. Learn. Res. 119, 9983–9995 (2020).

  48. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (eds Bengio, Y. & LeCun, Y.) https://arxiv.org/abs/1412.6980 (2014).

  49. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  50. Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970).

  51. Hoerl, A. E. & Kennard, R. W. Ridge regression: applications to nonorthogonal problems. Technometrics 12, 69–82 (1970).

  52. Raissi, M., Yazdani, A. & Karniadakis, G. E. Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367, 1026–1030 (2020).

  53. Boffetta, G. et al. Two-dimensional turbulence. Annu. Rev. Fluid Mech. 44, 427–451 (2012).

  54. Kochkov, D. et al. Machine learning-accelerated computational fluid dynamics. Proc. Natl Acad. Sci. USA 118, e2101784118 (2021).

  55. Van Leer, B. Towards the ultimate conservative difference scheme. V. A second-order sequel to Godunov’s method. J. Comput. Phys. 32, 101–136 (1979).

  56. Frisch, U. Turbulence: The Legacy of A. N. Kolmogorov (Cambridge Univ. Press, 1995).

  57. de Silva, B. et al. PySINDy: a Python package for the sparse identification of nonlinear dynamical systems from data. J. Open Source Softw. 5, 2104 (2020).

  58. Kaptanoglu, A. A. et al. PySINDy: a comprehensive Python package for robust sparse system identification. J. Open Source Softw. 7, 3994 (2022).

  59. Li, Z. Bi-level identification of governing equations for nonlinear physical systems. Zenodo https://doi.org/10.5281/zenodo.15140828 (2025).

  60. Li, Z. et al. Bi-level identification of governing equations for nonlinear physical systems. Code Ocean https://doi.org/10.24433/CO.0462000.v1 (2025).

Acknowledgements

This work is supported by the National Natural Science Foundation of China (no. 52376090) and the National Key Research and Development Program of China (no. 2022YFF0504500).

Author information

Contributions

L.Y. and W.H. supervised the project. L.Y., Z.L., H.Y. and W.H. conceived the idea. Z.L. carried out the numerical simulations. Z.L., H.Y., Y.H. and H.D. performed the research. All authors discussed the results and assisted with preparing the paper.

Corresponding authors

Correspondence to Wang Han or Lijun Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Alan Ali Kaptanoglu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Jie Pan, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Table 1 Three sets of BILLIE-identified equations

Extended Data Fig. 1 Identifying RNA velocity and protein velocity on multi-modal single-cell sequencing data.

a, The flow of a gene’s information from unspliced mRNA (denoted u) to spliced mRNA (s) through splicing, and from spliced mRNA (s) to protein (p) through translation. b, Cell types in the single-cell sequencing dataset used to identify RNA velocity and protein velocity; Mono cells were used as training data, and CD4+ T and CD8+ T cells were used as testing data. c, The process of identifying RNA/protein velocity with BILLIE, in which the equations for different genes share the same form (that is, Γ) but have distinct libraries (that is, Ut and Q) and coefficients (that is, θ). d, Cell-level correlation between the original sequencing data and the predictions of the identified equation and the empirical equation for spliced mRNA abundance. e, Relationship between gene-level correlation (between the original sequencing data and the predictions) and data sparsity for spliced mRNA abundance. Each point represents a single gene; 69.5%, 30.5% and 0.5% denote the fractions of genes in the groups defined by data sparsity and prediction performance. Predictions with a Pearson correlation above 0.6 are considered ‘good’; genes with data sparsity above 0.5 are considered ‘very sparse’. f, Spliced mRNA abundance of representative marker genes, from the original sequencing data and from the predictions of the different equations. g, Cell-level correlation between the original sequencing data and the predictions for protein abundance. h, Gene-level correlation for all 7 genes for protein abundance. i, Protein abundance of 4 of the 7 marker genes.
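
The ‘good’ and ‘very sparse’ groupings in panel e can be computed for any prediction matrix with a short sketch. It assumes that expression is stored as a cells × genes array and that data sparsity means the fraction of cells with a zero count for a gene; both are assumptions, as are the function names and the toy numbers. Only the two thresholds (0.6 and 0.5) come from the caption.

```python
import numpy as np

GOOD_R = 0.6       # 'good' prediction: Pearson correlation above 0.6 (from the caption)
SPARSE_FRAC = 0.5  # 'very sparse' gene: data sparsity above 0.5 (from the caption)


def column_pearson(observed, predicted):
    """Pearson correlation between matching columns of two arrays.

    A column with zero variance yields nan (0/0), so such genes should be filtered first.
    """
    obs = observed - observed.mean(axis=0)
    pred = predicted - predicted.mean(axis=0)
    num = (obs * pred).sum(axis=0)
    den = np.sqrt((obs**2).sum(axis=0) * (pred**2).sum(axis=0))
    return num / den


def classify_genes(observed, predicted):
    """Gene-level correlation plus the 'good' and 'very sparse' flags, per gene."""
    r = column_pearson(observed, predicted)
    sparsity = np.mean(observed == 0, axis=0)  # assumed: fraction of zero-count cells
    return r, r > GOOD_R, sparsity > SPARSE_FRAC


# Toy cells x genes matrices (4 cells, 3 genes); the values are made up.
observed = np.array([[0.0, 1.0, 0.0], [2.0, 3.0, 0.0], [4.0, 5.0, 1.0], [6.0, 8.0, 0.0]])
predicted = np.array([[0.5, 1.0, 1.0], [2.0, 2.0, 0.0], [3.5, 6.0, 0.0], [6.0, 7.0, 1.0]])
r, good, very_sparse = classify_genes(observed, predicted)
print(np.round(r, 3), good, very_sparse)

# Cell-level correlation: the same function applied to the transposed matrices.
r_cells = column_pearson(observed.T, predicted.T)
```

Transposing the matrices switches from one correlation per gene (across cells) to one correlation per cell (across genes), which is the distinction between the gene-level and cell-level panels.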

Extended Data Fig. 2 The general workflow of identifying the governing equation of a 2D fluid dynamical system from data.

Given spatiotemporal measurements collected from a physical system (such as the fluid system shown in the first panel on the left), the spatial and temporal derivatives at each location are calculated by polynomial fitting (second panel) and then used to build the overcomplete library Q (third panel). Selecting the appropriate terms from the overcomplete library identifies the dynamics of the system (last panel on the right).
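
This workflow can be sketched end to end on synthetic data. The sketch assumes a 1D heat equation u_t = ν u_xx (with ν = 0.1) in place of the 2D flow, local degree-4 polynomial fits over 7-point windows for the derivatives, and a small hand-picked library Q. The term-selection step uses sequentially thresholded least squares (the SINDy approach of ref. 14) as a simple stand-in; it is not BILLIE’s reinforcement-learning selection. All names are hypothetical.

```python
import numpy as np


def poly_derivatives(f, h, half_width=3, deg=4):
    """First and second derivatives along axis 0 from local polynomial fits.

    Fits are done in index units and rescaled by the grid spacing h; the first and
    last `half_width` points are trimmed because their windows are incomplete.
    """
    offsets = np.arange(-half_width, half_width + 1, dtype=float)
    n, rest = f.shape[0], f.shape[1:]
    d1 = np.empty((n - 2 * half_width,) + rest)
    d2 = np.empty_like(d1)
    for i in range(half_width, n - half_width):
        window = f[i - half_width:i + half_width + 1].reshape(offsets.size, -1)
        coeffs = np.polyfit(offsets, window, deg)  # highest power first, one fit per column
        d1[i - half_width] = coeffs[-2].reshape(rest) / h
        d2[i - half_width] = 2.0 * coeffs[-3].reshape(rest) / h**2
    return d1, d2


def stlsq(Q, ut, threshold=0.02, iters=10):
    """Sequentially thresholded least squares: zero small coefficients, refit the rest."""
    xi = np.linalg.lstsq(Q, ut, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if (~small).any():
            xi[~small] = np.linalg.lstsq(Q[:, ~small], ut, rcond=None)[0]
    return xi


# Synthetic measurements: two decaying modes that solve u_t = nu * u_xx exactly.
nu = 0.1
x = np.linspace(0.0, 2 * np.pi, 128, endpoint=False)
t = np.linspace(0.0, 2.0, 101)
X, T = np.meshgrid(x, t, indexing="ij")  # shape (nx, nt)
U = np.exp(-nu * T) * np.sin(X) + np.exp(-4 * nu * T) * np.sin(2 * X)

w = 3
u_x, u_xx = poly_derivatives(U, x[1] - x[0], half_width=w)   # (nx - 2w, nt)
u_t = poly_derivatives(U.T, t[1] - t[0], half_width=w)[0].T  # (nx, nt - 2w)

# Align everything on the interior grid and build the overcomplete library Q.
u = U[w:-w, w:-w].ravel()
ux = u_x[:, w:-w].ravel()
uxx = u_xx[:, w:-w].ravel()
ut = u_t[w:-w, :].ravel()
names = ["1", "u", "u_x", "u_xx", "u*u_x", "u^2"]
Q = np.column_stack([np.ones_like(u), u, ux, uxx, u * ux, u**2])

xi = stlsq(Q, ut)
for name, coef in zip(names, xi):
    if coef != 0.0:
        print(f"u_t = {coef:.4f} * {name}")
```

With noise-free data, the thresholding should leave only the u_xx term, with a coefficient close to ν. Noisy measurements would typically call for wider fitting windows, a regularized fit, or a tuned threshold.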

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Z., Yuan, H., Han, W. et al. Bi-level identification of governing equations for nonlinear physical systems. Nat Comput Sci 5, 456–466 (2025). https://doi.org/10.1038/s43588-025-00804-x

  • DOI: https://doi.org/10.1038/s43588-025-00804-x