Scientific Reports
Visualising backward information propagation in deep reinforcement learning from a variational data assimilation perspective
  • Article
  • Open access
  • Published: 02 March 2026


  • Kuo-Ying Wang1 

Scientific Reports (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Mathematics and computing
  • Neuroscience

Abstract

It has long been recognised that variational data assimilation, including four-dimensional variational methods (4D-Var), is grounded in Bayesian inference and gradient-based optimisation. Deep reinforcement learning (RL) employs related mathematical machinery, iteratively minimising a scalar objective function through backward propagation of information. In this study, we do not propose new algorithms or theoretical connections, but instead provide a transparent and visual illustration of these well-established relationships. Using a compact neural network trained to play the classic Snake game, we track the evolution of all network weights at every training iteration. Short-horizon temporal-difference updates yield frequent local gradient steps on a linearised error signal, closely resembling the inner-loop minimisation of incremental 4D-Var, while experience replay repeatedly recomputes gradients under updated parameters, analogous to outer-loop relinearisation about an evolving reference trajectory. This minimal and fully observable system serves as a controlled laboratory for visualising backward information propagation in optimisation processes familiar to both reinforcement learning and variational data assimilation. The resulting comparison offers an interpretable, pedagogical perspective on reinforcement learning using concepts long established in the data-assimilation literature, without claiming new algorithmic insights or applications.
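The inner/outer-loop analogy stated in the abstract can be sketched with a minimal, self-contained toy example. This is an illustrative assumption-laden sketch, not the paper's archived Snake/DQN code: it uses a linear value function with one-hot features on a five-state chain, where each TD(0) update is one gradient step on the squared TD error with a frozen bootstrapped target, and an experience-replay pass recomputes those targets under the current weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment (illustrative, not the paper's Snake game): a 5-state
# chain 0 -> 1 -> 2 -> 3 -> 4, with reward 1 on entering the terminal state.
GAMMA, LR, N_STATES = 0.9, 0.1, 5

def features(s):
    """One-hot features, so the 'network' is a linear value function."""
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

def td_step(w, s, r, s_next):
    """One TD(0) update: a single gradient step on the squared TD error,
    with the bootstrapped target frozen (the linearised error signal)."""
    target = r + GAMMA * (w @ features(s_next))   # held fixed for this step
    delta = target - w @ features(s)              # TD error
    return w + LR * delta * features(s)           # inner-loop gradient step

w = np.zeros(N_STATES)   # the weights whose evolution is being visualised
replay = []              # stored (s, r, s_next) transitions

# Online phase: frequent local gradient steps (inner-loop-like updates).
for episode in range(200):
    s = 0
    while s < N_STATES - 1:
        s_next = s + 1
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        replay.append((s, r, s_next))
        w = td_step(w, s, r, s_next)
        s = s_next

# Experience replay: old transitions revisited, but targets and gradients
# are recomputed under the *current* weights -- analogous to outer-loop
# relinearisation about an evolving reference trajectory in 4D-Var.
for _ in range(5):
    for idx in rng.permutation(len(replay))[:50]:
        s, r, s_next = replay[idx]
        w = td_step(w, s, r, s_next)

# The learned values approach GAMMA ** (steps to the terminal reward).
print(np.round(w, 3))   # approx [0.729, 0.81, 0.9, 1.0, 0.0]
```

Because the replay pass re-evaluates its bootstrapped targets with the updated weights, each pass plays the role of relinearising about a new reference trajectory, while the individual TD steps are the cheap inner-loop gradient descents on a fixed linearised error.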


Data availability

All code, trained models, per-iteration weight movies, and figures are permanently archived at https://doi.org/10.6084/m9.figshare.30795956.


Acknowledgements

We gratefully acknowledge Patrick Loeber, whose Python code was adapted for this work, and the open-source software used throughout: Python (3.9.5), PyTorch, Pygame, PyCharm (2021.1.1 Community Edition), Linux, Cygwin, Fortran, NCAR (National Center for Atmospheric Research, USA) Graphics, and ffmpeg. All figures were produced with NCAR Graphics and Fortran code.

Funding

This work was funded by the National Science and Technology Council under the grant MOST 107-2111-M-008-027-.

Author information

Authors and Affiliations

  1. Department of Atmospheric Sciences, National Central University, Chung-Li, Taiwan

    Kuo-Ying Wang


Contributions

KYW researched, coded, wrote, and reviewed the manuscript.

Corresponding author

Correspondence to Kuo-Ying Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Wang, KY. Visualising backward information propagation in deep reinforcement learning from a variational data assimilation perspective. Sci Rep (2026). https://doi.org/10.1038/s41598-026-42086-x

Download citation

  • Received: 15 December 2025

  • Accepted: 24 February 2026

  • Published: 02 March 2026

  • DOI: https://doi.org/10.1038/s41598-026-42086-x


Scientific Reports (Sci Rep)

ISSN 2045-2322 (online)
