Bi-level identification of governing equations for nonlinear physical systems

Li, Zeyu; Yuan, Huining; Han, Wang; Hou, Yimin; Li, Hongjue; Ding, Haidong; Jiang, Zhiguo; Yang, Lijun

doi:10.1038/s43588-025-00804-x

Brief Communication
Published: 09 May 2025

Nature Computational Science volume 5, pages 456–466 (2025)

A preprint version of the article is available at Research Square.

Abstract

Identifying governing equations from observational data is crucial for understanding nonlinear physical systems but remains challenging due to the risk of overfitting. Here we introduce the Bi-Level Identification of Equations (BILLIE) framework, which simultaneously discovers and validates equations using a hierarchical optimization strategy. The policy gradient algorithm of reinforcement learning is leveraged to achieve the bi-level optimization. We demonstrate BILLIE’s superior performance through comparisons with baseline methods in canonical nonlinear systems such as turbulent flows and three-body systems. Furthermore, we apply the BILLIE framework to discover RNA and protein velocity equations directly from single-cell sequencing data. The equations identified by BILLIE outperform empirical models in predicting cellular differentiation states, underscoring BILLIE’s potential to reveal fundamental physical laws across a wide range of scientific fields.
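The hierarchical strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the train/validation split, the Bernoulli term-selection policy, the ridge-regression lower level, the sparsity penalty and all names (`fit_coefficients`, `bilevel_identify`) are assumptions made for illustration. The lower level fits coefficients for a candidate set of library terms on training data; the upper level uses a REINFORCE-style policy gradient to favor term sets that score well on held-out validation data.

```python
import numpy as np


def fit_coefficients(Q, y, mask, ridge=1e-6):
    """Lower level: ridge-regress y on the library columns selected by mask."""
    theta = np.zeros(Q.shape[1])
    idx = np.flatnonzero(mask)
    if idx.size:
        A = Q[:, idx]
        theta[idx] = np.linalg.solve(A.T @ A + ridge * np.eye(idx.size), A.T @ y)
    return theta


def bilevel_identify(Q_tr, y_tr, Q_val, y_val, steps=300, batch=32,
                     lr=0.5, sparsity_penalty=0.01, seed=0):
    """Upper level: policy gradient over Bernoulli term-selection logits."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(Q_tr.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-logits))
        masks = rng.random((batch, p.size)) < p  # sample candidate equations
        rewards = np.empty(batch)
        for b, m in enumerate(masks):
            theta = fit_coefficients(Q_tr, y_tr, m)          # fit on training data
            val_mse = np.mean((Q_val @ theta - y_val) ** 2)  # score on validation data
            rewards[b] = -val_mse - sparsity_penalty * m.sum()
        # Standardized advantage; zero when every sampled mask scores the same.
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # For a Bernoulli policy, d log pi / d logit = mask - p.
        logits += lr * (adv[:, None] * (masks - p)).mean(axis=0)
    mask = logits > 0
    return mask, fit_coefficients(Q_tr, y_tr, mask)
```

On a toy library with columns 1, x, x² and x³ and data generated from y = 1.5x − 0.8x³ with small noise, this sketch is expected to retain only the x and x³ columns. Scoring candidates on data not used for fitting is the design choice that targets the overfitting risk mentioned in the abstract.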

**Fig. 1: The schematic algorithm of BILLIE.**

**Fig. 2: Evaluation of BILLIE framework on different physical systems.**

Data availability

The datasets used in this study are available in the Zenodo repository at https://doi.org/10.5281/zenodo.15140828 (ref. ⁵⁹). Peripheral blood mononuclear cell CITE-Seq dataset (related to Extended Data Fig. 1 and Supplementary Fig. 5): the protein and RNA expression profiles were downloaded from the Gene Expression Omnibus database with the accession numbers GSM2695381 (protein) and GSM2695382 (RNA). Mouse hippocampus RNA-Seq dataset (related to Supplementary Figs. 6 and 7): the RNA expression profiles were downloaded from http://pklab.med.harvard.edu/velocyto/DentateGyrus/DentateGyrus.loom. Source data are available with this manuscript.

Code availability

The source code to reproduce the results in this study is available on GitHub at https://github.com/HuiningYuan/BILLIE and on Code Ocean at https://doi.org/10.24433/CO.0462000.v1 (ref. ⁶⁰).

References

La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Gorin, G., Svensson, V. & Pachter, L. Protein velocity and acceleration from single-cell multiomics experiments. Genome Biol. 21, 39 (2020).
Carroll, B. W. & Ostlie, D. A. An Introduction to Modern Astrophysics (Cambridge Univ. Press, 2017).
Batchelor, G. K. An Introduction to Fluid Dynamics (Cambridge Univ. Press, 1967).
Karatzas, I. & Shreve, S. E. Methods of Mathematical Finance Vol. 39 (Springer, 1998).
Achdou, Y., Buera, F. J., Lasry, J.-M., Lions, P.-L. & Moll, B. Partial differential equation models in macroeconomics. Phil. Trans. R. Soc. A 372, 20130397 (2014).
Schuch, N. & Verstraete, F. Computational complexity of interacting electrons and fundamental limitations of density functional theory. Nat. Phys. 5, 732–735 (2009).
Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).
Udrescu, S.-M. & Tegmark, M. AI Feynman: a physics-inspired method for symbolic regression. Sci. Adv. 6, eaay2631 (2020).
Udrescu, S.-M. et al. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. Adv. Neural Inf. Process. Syst. 33, 4860–4871 (2020).
Vastl, M., Kulhánek, J., Kubalík, J., Derner, E. & Babuška, R. Symformer: end-to-end symbolic regression using transformer-based architecture. IEEE Access 12, 37840–37849 (2024).
Sun, F., Liu, Y., Wang, J.-X. & Sun, H. Symbolic physics learner: discovering governing equations via Monte Carlo tree search. In Proc. 11th International Conference on Learning Representations https://openreview.net/forum?id=ZTK3SefE8_Z (OpenReview.net, 2023).
Lemos, P., Jeffrey, N., Cranmer, M., Ho, S. & Battaglia, P. Rediscovering orbital mechanics with machine learning. Mach. Learn. Sci. Technol. 4, 045002 (2023).
Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937 (2016).
Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).
Champion, K., Zheng, P., Aravkin, A. Y., Brunton, S. L. & Kutz, J. N. A unified sparse optimization framework to learn parsimonious physics-informed models from data. IEEE Access 8, 169259–169271 (2020).
Chen, Z., Liu, Y. & Sun, H. Physics-informed learning of governing equations from scarce data. Nat. Commun. 12, 6136 (2021).
Boninsegna, L., Nüske, F. & Clementi, C. Sparse learning of stochastic dynamical equations. J. Chem. Phys. 148, 241723 (2018).
Zheng, P., Askham, T., Brunton, S. L., Kutz, J. N. & Aravkin, A. Y. A unified framework for sparse relaxed regularized regression: SR3. IEEE Access 7, 1404–1423 (2018).
Champion, K., Lusch, B., Kutz, J. N. & Brunton, S. L. Data-driven discovery of coordinates and governing equations. Proc. Natl Acad. Sci. USA 116, 22445–22451 (2019).
Xu, H., Chang, H. & Zhang, D. DLGA-PDE: discovery of PDEs with incomplete candidate library via combination of deep learning and genetic algorithm. J. Comput. Phys. 418, 109584 (2020).
Xu, H., Zhang, D. & Zeng, J. Deep-learning of parametric partial differential equations from sparse and noisy data. Phys. Fluids 33, 037132 (2021).
Xu, H., Zhang, D. & Wang, N. Deep-learning based discovery of partial differential equations in integral form from sparse and noisy data. J. Comput. Phys. 445, 110592 (2021).
Reinbold, P. A. K., Gurevich, D. R. & Grigoriev, R. O. Using noisy or incomplete data to discover models of spatiotemporal dynamics. Phys. Rev. E 101, 010203 (2020).
Reinbold, P. A., Kageorge, L. M., Schatz, M. F. & Grigoriev, R. O. Robust learning from noisy, incomplete, high-dimensional experimental data via physically constrained symbolic regression. Nat. Commun. 12, 3219 (2021).
Fasel, U., Kutz, J. N., Brunton, B. W. & Brunton, S. L. Ensemble-SINDy: robust sparse model discovery in the low-data, high-noise limit, with active learning and control. Proc. R. Soc. A 478, 20210904 (2022).
Berg, J. & Nyström, K. Data-driven discovery of PDEs in complex datasets. J. Comput. Phys. 384, 239–252 (2019).
Xu, H., Chang, H. & Zhang, D. DL-PDE: deep-learning based data-driven discovery of partial differential equations from discrete and noisy data. Commun. Comput. Phys. 29, 698–728 (2021).
Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019).
Raissi, M. & Karniadakis, G. E. Hidden physics models: machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018).
Long, Z., Lu, Y., Ma, X. & Dong, B. PDE-Net: learning PDEs from data. Proc. Mach. Learn. Res. 80, 3208–3216 (2018).
Long, Z., Lu, Y. & Dong, B. PDE-Net 2.0: learning PDEs from data with a numeric–symbolic hybrid deep network. J. Comput. Phys. 399, 108925 (2019).
Rao, C. et al. Encoding physics to learn reaction–diffusion processes. Nat. Mach. Intell. 5, 765–779 (2023).
Kabanikhin, S. I. Definitions and examples of inverse and ill-posed problems. J. Inverse Ill-Posed Probl. 16, 317–357 (2008).
Sutton, R. S., McAllester, D., Singh, S. & Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. Adv. Neural Inf. Process. Syst. 12, 1057–1063 (1999).
Silver, D. et al. Deterministic policy gradient algorithms. Proc. Mach. Learn. Res. 32, 387–395 (2014).
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
McDonald, P. W. The Computation of Transonic Flow Through Two-Dimensional Gas Turbine Cascades 79825 (American Society of Mechanical Engineers, 1971).
Ferziger, J. H., Perić, M. & Street, R. L. Computational Methods for Fluid Dynamics Vol. 3 (Springer, 2002).
Li, T., Shi, J., Wu, Y. & Zhou, P. On the mathematics of RNA velocity I: theoretical analysis. CSIAM Trans. Appl. Math. 2, 1–55 (2021).
Stoeckius, M. et al. Large-scale simultaneous measurement of epitopes and transcriptomes in single cells. Nat. Methods 14, 865–868 (2017).
Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).
Dhapola, P. et al. Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data. Nat. Commun. 13, 4616 (2022).
Hochgerner, H., Zeisel, A., Lönnerberg, P. & Linnarsson, S. Conserved properties of dentate gyrus neurogenesis across postnatal development revealed by single-cell RNA sequencing. Nat. Neurosci. 21, 290–299 (2018).
Cleveland, W. S. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 74, 829–836 (1979).
Oduguwa, V. & Roy, R. Bi-level optimisation using genetic algorithm. In Proc. 2002 IEEE International Conference on Artificial Intelligence Systems (ICAIS 2002) 322–327 (IEEE, 2002).
Wang, X. et al. Optimizing data usage via differentiable rewards. Proc. Mach. Learn. Res. 119, 9983–9995 (2020).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (eds Bengio, Y. & LeCun, Y.) https://arxiv.org/abs/1412.6980 (2014).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Hoerl, A. E. & Kennard, R. W. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970).
Hoerl, A. E. & Kennard, R. W. Ridge regression: applications to nonorthogonal problems. Technometrics 12, 69–82 (1970).
Raissi, M., Yazdani, A. & Karniadakis, G. E. Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367, 1026–1030 (2020).
Boffetta, G. et al. Two-dimensional turbulence. Annu. Rev. Fluid Mech. 44, 427–451 (2012).
Kochkov, D. et al. Machine learning-accelerated computational fluid dynamics. Proc. Natl Acad. Sci. USA 118, e2101784118 (2021).
Van Leer, B. Towards the ultimate conservative difference scheme. V. A second-order sequel to Godunov’s method. J. Comput. Phys. 32, 101–136 (1979).
Frisch, U. Turbulence: The Legacy of A. N. Kolmogorov (Cambridge Univ. Press, 1995).
de Silva, B. et al. PySINDy: a Python package for the sparse identification of nonlinear dynamical systems from data. J. Open Source Softw. 5, 2104 (2020).
Kaptanoglu, A. A. et al. PySINDy: a comprehensive Python package for robust sparse system identification. J. Open Source Softw. 7, 3994 (2022).
Li, Z. Bi-level identification of governing equations for nonlinear physical systems. Zenodo https://doi.org/10.5281/zenodo.15140828 (2025).
Li, Z. et al. Bi-level identification of governing equations for nonlinear physical systems. Code Ocean https://doi.org/10.24433/CO.0462000.v1 (2025).

Acknowledgements

This work is supported by the National Natural Science Foundation of China (no. 52376090) and the National Key Research and Development Program of China (no. 2022YFF0504500).

Author information

These authors contributed equally: Zeyu Li, Huining Yuan.

Authors and Affiliations

School of Astronautics, Beihang University, Beijing, China
Zeyu Li, Huining Yuan, Wang Han, Yimin Hou, Hongjue Li, Haidong Ding, Zhiguo Jiang & Lijun Yang

Contributions

L.Y. and W.H. supervised the project. L.Y., Z.L., H.Y. and W.H. conceived the idea. Z.L. carried out the numerical simulations. Z.L., H.Y., Y.H. and H.D. performed the research. All authors discussed the results and assisted during paper preparation.

Corresponding authors

Correspondence to Wang Han or Lijun Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Alan Ali Kaptanoglu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Jie Pan, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Table 1 Three sets of BILLIE-identified equations

Extended Data Fig. 1 Identifying RNA velocity and protein velocity on multi-modal single-cell sequencing data.

a, The process by which a gene’s information passes from unspliced mRNA (denoted as u) to spliced mRNA (s) through splicing, and from spliced mRNA (s) to protein (p) through translation. b, Cell types in the single-cell sequencing dataset used to identify RNA velocity and protein velocity; Mono type cells were used as the training data, and CD4+T and CD8+T type cells were used as the testing data. c, The process of performing RNA/protein velocity identification with BILLIE, in which the equations across different genes share the same form (that is, Γ) while having distinct libraries (that is, U_t and Q) and coefficients (that is, θ). d, The cell-level correlation between the original sequencing data and the predictions made by the identified equation and the empirical equation for the abundance of spliced mRNA. e, The relationship between gene-level correlation (between the original sequencing data and the predictions) and data sparsity for the abundance of spliced mRNA. Each point in the plot represents a single gene, and 69.5%, 30.5% and 0.5% denote the proportions of genes in the groups defined by data sparsity and prediction performance. Predictions with a Pearson correlation over 0.6 are considered ‘good’ predictions; genes with data sparsity over 0.5 are considered ‘very sparse’. f, Spliced mRNA abundance of representative marker genes, including the original sequencing data and the predictions made by the different equations. g, The cell-level correlation between the original sequencing data and the predictions for the abundance of protein. h, The gene-level correlation of all 7 genes for the abundance of protein. i, Protein abundance of 4 of the 7 marker genes.
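The gene-level grouping described for panel e can be sketched as follows. The thresholds (Pearson correlation over 0.6, sparsity over 0.5) come from the caption; the definition of sparsity as the fraction of cells with zero observed abundance, the array layout (cells × genes) and the function name `gene_level_summary` are assumptions made for illustration.

```python
import numpy as np


def gene_level_summary(observed, predicted, good_corr=0.6, sparse_frac=0.5):
    """Per-gene Pearson correlation and sparsity for (cells x genes) arrays."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumed definition: sparsity = fraction of cells with zero observed abundance.
    sparsity = np.mean(observed == 0, axis=0)
    # Pearson correlation computed column-wise (one value per gene).
    o = observed - observed.mean(axis=0)
    p = predicted - predicted.mean(axis=0)
    denom = np.sqrt((o ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    corr = np.divide((o * p).sum(axis=0), denom,
                     out=np.zeros_like(denom), where=denom > 0)
    is_good = corr > good_corr            # 'good' prediction per the caption
    is_very_sparse = sparsity > sparse_frac  # 'very sparse' gene per the caption
    return corr, sparsity, is_good, is_very_sparse
```

Taking the mean of `is_good & is_very_sparse` (and of the other combinations) gives the proportion of genes in each group, which is the kind of quantity the percentages in panel e report.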

Extended Data Fig. 2 The general workflow of identifying the governing equation of a 2D fluid dynamical system from data.

Given spatial–temporal measurements collected from a physical system (such as the fluid system shown in the first panel on the left), the spatial and temporal derivatives at each location are calculated using a polynomial fit (second panel) and are then used to build the overcomplete library Q (third panel). By selecting the proper terms from the overcomplete library, the dynamics of the system can be identified (last panel on the right).
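The three steps of this workflow can be sketched on a simplified case. The example below is an illustration under stated assumptions, not the procedure used in the paper: it works on a field u(t, x) with one spatial dimension rather than two, uses a small hypothetical library (u, u_x, u_xx, u·u_x), and selects terms by sequentially thresholded least squares (the selection rule popularized by SINDy) in place of BILLIE's own selection, which the caption does not describe. The function names are hypothetical.

```python
import math

import numpy as np


def local_poly_derivative(values, spacing, axis, order=1, half_width=3, degree=4):
    """Differentiate along `axis` by fitting a polynomial in a sliding window."""
    v = np.moveaxis(np.asarray(values, dtype=float), axis, 0)
    n = v.shape[0]
    flat = v.reshape(n, -1)
    out = np.empty_like(flat)
    for i in range(n):
        # Shift the window inward near the boundaries so it stays in range.
        lo = min(max(i - half_width, 0), n - 2 * half_width - 1)
        idx = np.arange(lo, lo + 2 * half_width + 1)
        offsets = (idx - i) * spacing
        coeffs = np.polynomial.polynomial.polyfit(offsets, flat[idx], degree)
        # The k-th derivative of the fitted polynomial at offset 0 is k! * c_k.
        out[i] = math.factorial(order) * coeffs[order]
    return np.moveaxis(out.reshape(v.shape), 0, axis)


def stlsq(Q, y, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: zero small coefficients, refit."""
    theta = np.linalg.lstsq(Q, y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(theta) < threshold
        theta[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        theta[keep] = np.linalg.lstsq(Q[:, keep], y, rcond=None)[0]
    return theta


def identify_1d(u, dx, dt):
    """Derivatives -> library Q -> term selection for a field u[t, x]."""
    u = np.asarray(u, dtype=float)
    u_t = local_poly_derivative(u, dt, axis=0)
    u_x = local_poly_derivative(u, dx, axis=1)
    u_xx = local_poly_derivative(u, dx, axis=1, order=2)
    names = ["u", "u_x", "u_xx", "u*u_x"]
    Q = np.column_stack([c.ravel() for c in (u, u_x, u_xx, u * u_x)])
    theta = stlsq(Q, u_t.ravel())
    return dict(zip(names, theta))
```

For a travelling wave such as u = sin(x − 0.7t) + 0.5 sin(2(x − 0.7t) + 1), which satisfies u_t = −0.7 u_x, the sketch is expected to return a coefficient close to −0.7 on `u_x` and zeros elsewhere. With noisy measurements the window width and polynomial degree would need tuning.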

Supplementary information

Supplementary Information

Supplementary Notes 1–7, Figs. 1–8 and Tables 1–4.

Reporting Summary

Transparent Peer Review file

Source data

Source Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Z., Yuan, H., Han, W. et al. Bi-level identification of governing equations for nonlinear physical systems. Nat Comput Sci 5, 456–466 (2025). https://doi.org/10.1038/s43588-025-00804-x

Received: 26 April 2024
Accepted: 09 April 2025
Published: 09 May 2025
Version of record: 09 May 2025
Issue date: June 2025
DOI: https://doi.org/10.1038/s43588-025-00804-x