Learning vision-based agile flight via differentiable physics

Zhang, Yuang; Hu, Yu; Song, Yunlong; Zou, Danping; Lin, Weiyao

doi:10.1038/s42256-025-01048-0

Article
Published: 16 June 2025

Nature Machine Intelligence volume 7, pages 954–966 (2025)

A preprint version of the article is available at arXiv.

Abstract

Autonomous aerial robot swarms promise transformative applications, from planetary exploration to search and rescue in complex environments. However, navigating these swarms efficiently in unknown and cluttered spaces without bulky sensors, heavy computation or constant communication between robots remains a major research problem. This paper introduces an end-to-end approach that combines deep learning with first-principles physics through differentiable simulation to enable autonomous navigation by several aerial robots through complex environments at high speed. Our approach directly optimizes a neural network control policy by backpropagating loss gradients through the robot simulation using a simple point-mass physics model. Despite this simplicity, our method excels in both multi-agent and single-agent applications. In multi-agent scenarios, our system demonstrates self-organized behaviour, which enables autonomous coordination without communication or centralized planning. In single-agent scenarios, our system achieved a 90% success rate in navigating through complex unknown environments and demonstrated enhanced robustness compared to previous state-of-the-art approaches. Our system can operate without state estimation and adapt to dynamic obstacles. In real-world forest environments, it navigates at speeds of up to 20 m s⁻¹, doubling the speed of previous imitation-learning-based solutions. Notably, all these capabilities are deployed on a budget-friendly US$21 computer, which costs less than 5% of the GPU-equipped board used in existing systems.

**Fig. 1: Vision-based agile swarm navigation through cluttered environments using an end-to-end neural network controller trained with differentiable physics.**

**Fig. 2: High-speed flight through complex and dynamic environments using a tiny US$21 low-cost computer.**

**Fig. 3: Communication-free swarm navigation.**

**Fig. 4: End-to-end vision-based flight without an explicit odometry module.**

**Fig. 5: Comparison to state-of-the-art vision-based navigation.**

Data availability

The videos of flight recordings of the real-world experiments and collision and flight time data in simulated experiments are available on figshare at https://doi.org/10.6084/m9.figshare.26298379 (ref. ⁷²).

Code availability

The code for single- and multi-agent training and simulation experiments is available on Zenodo at https://doi.org/10.5281/zenodo.15250256 (ref. ⁷³).

References

Schedl, D. C., Kurmi, I. & Bimber, O. An autonomous drone for search and rescue in forests using airborne optical sectioning. Sci. Robot. 6, 1188 (2021).
Xing, J., Cioffi, G., Hidalgo-Carrió, J. & Scaramuzza, D. Autonomous power line inspection with drones via perception-aware MPC. In Proc. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 1086–1093 (IEEE, 2023).
Sage, A. T. et al. Testing the delivery of human organ transportation with drones in the real world. Sci. Robot. 7, 5798 (2022).
Giusti, A. et al. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robot. Autom. Lett. 1, 661–667 (2015).
Gao, F. et al. Teach-repeat-replan: a complete and robust system for aggressive flight in complex environments. IEEE Trans. Robot. 36, 1526–1545 (2020).
Zhou, X., Wang, Z., Ye, H., Xu, C. & Gao, F. Ego-planner: an ESDF-free gradient-based local planner for quadrotors. IEEE Robot. Autom. Lett. 6, 478–485 (2020).
Zhou, X. et al. Swarm of micro flying robots in the wild. Sci. Robot. 7, 5954 (2022).
Loquercio, A. et al. Learning high-speed flight in the wild. Sci. Robot. 6, 5810 (2021).
Zhang, Z. & Scaramuzza, D. Perception-aware receding horizon navigation for MAVs. In Proc. 2018 IEEE International Conference on Robotics and Automation (ICRA) 2534–2541 (IEEE, 2018).
Maimone, M. W., Leger, P. C. & Biesiadecki, J. J. Overview of the Mars Exploration Rovers’ autonomous mobility and vision capabilities. In Proc. IEEE International Conference on Robotics and Automation (ICRA) Space Robotics Workshop (IEEE, 2007).
Hwangbo, J. et al. Learning agile and dynamic motor skills for legged robots. Sci. Robot. 4, 5872 (2019).
Miki, T. et al. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Robot. 7, 2822 (2022).
Choi, S. et al. Learning quadrupedal locomotion on deformable terrain. Sci. Robot. 8, 2256 (2023).
Song, Y., Romero, A., Mueller, M., Koltun, V. & Scaramuzza, D. Reaching the limit in autonomous racing: optimal control versus reinforcement learning. Sci. Robot. https://doi.org/10.1126/scirobotics.adg1462 (2023).
Kaufmann, E. et al. Deep drone acrobatics. In Proc. Robotics: Science and Systems (eds Toussaint, M. et al.) (RSS Foundation, 2020).
Song, Y., Shi, K., Penicka, R. & Scaramuzza, D. Learning perception-aware agile flight in cluttered environments. In Proc. 2023 IEEE International Conference on Robotics and Automation (ICRA) 1989–1995 (IEEE, 2023).
Sadeghi, F. & Levine, S. CAD2RL: real single-image flight without a single real image. In Proc. Robotics: Science and Systems XIII (eds Amato, A. et al.) (RSS Foundation, 2017).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).
Foehn, P. et al. Agilicious: open-source and open-hardware agile quadrotor for vision-based flight. Sci. Robot. 7, 6259 (2022).
Shahzad, M. M. et al. A review of swarm robotics in a nutshell. Drones 7, 269 (2023).
Kegeleirs, M., Grisetti, G. & Birattari, M. Swarm SLAM: challenges and perspectives. Front. Robot. AI 8, 618268 (2021).
Delmerico, J., Cieslewski, T., Rebecq, H., Faessler, M. & Scaramuzza, D. Are we ready for autonomous drone racing? The UZH-FPV drone racing dataset. In Proc. 2019 International Conference on Robotics and Automation (ICRA) 6713–6719 (IEEE, 2019).
Cioffi, G., Bauersfeld, L., Kaufmann, E. & Scaramuzza, D. Learned inertial odometry for autonomous drone racing. IEEE Robot. Autom. Lett. 8, 2684–2691 (2023).
Kaufmann, E. et al. Champion-level drone racing using deep reinforcement learning. Nature 620, 982–987 (2023).
Qin, T., Li, P. & Shen, S. VINS-Mono: a robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 34, 1004–1020 (2018).
Selvaraju, R. R. et al. Grad-cam: visual explanations from deep networks via gradient-based localization. In Proc. IEEE International Conference on Computer Vision (eds Cucchiara, R. et al.) 618–626 (IEEE, 2017).
Song, Y., Naji, S., Kaufmann, E., Loquercio, A. & Scaramuzza, D. Flightmare: a flexible quadrotor simulator. In Proc. Conference on Robot Learning (eds Faust, A. et al.) 1147–1157 (PMLR, 2021).
Zhou, B., Gao, F., Wang, L., Liu, C. & Shen, S. Robust and efficient quadrotor trajectory generation for fast autonomous flight. IEEE Robot. Autom. Lett. 4, 3529–3536 (2019).
Florence, P., Carter, J. & Tedrake, R. Integrated perception and control at high speed: evaluating collision avoidance maneuvers without maps. In Proc. 12th Workshop on the Algorithmic Foundations of Robotics (eds Goldberg, K. et al.) 304–319 (Springer, 2020).
Shah, S., Dey, D., Lovett, C. & Kapoor, A. in Field and Service Robotics (eds Hutter, M. & Siegwart, R.) Ch. 40 (Springer, 2017).
Ross, S., Gordon, G. & Bagnell, D. A reduction of imitation learning and structured prediction to no-regret online learning. In Proc. 14th International Conference on Artificial Intelligence and Statistics (eds Gordon, G. et al.) 627–635 (PMLR, 2011).
Gurumurthy, S., Kolter, J. Z. & Manchester, Z. Deep off-policy iterative learning control. In Proc. 5th Annual Learning for Dynamics & Control Conference (eds Matni, N. et al.) 639–652 (PMLR, 2023).
Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Moschitti, A. et al.) 1724–1734 (Association for Computational Linguistics, 2014).
Suh, H. J., Simchowitz, M., Zhang, K. & Tedrake, R. Do differentiable simulators give better policy gradients? In Proc. International Conference on Machine Learning (eds Chaudhuri, K. et al.) 20668–20696 (PMLR, 2022).
Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proc. International Conference on Machine Learning (eds Dy, J. et al.) 1861–1870 (PMLR, 2018).
Metz, L., Freeman, C. D., Schoenholz, S. S. & Kachman, T. Gradients are not all you need. Preprint at https://arxiv.org/abs/2111.05803 (2021).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Preprint at https://arxiv.org/abs/1912.01703 (2019).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
O’Connell, M. et al. Neural-fly enables rapid learning for agile flight in strong winds. Sci. Robot. 7, 6597 (2022).
Todorov, E., Erez, T. & Tassa, Y. Mujoco: a physics engine for model-based control. In Proc. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (eds Guglielmelli, G. et al.) 5026–5033 (IEEE, 2012).
Faessler, M., Franchi, A. & Scaramuzza, D. Differential flatness of quadrotor dynamics subject to rotor drag for accurate tracking of high-speed trajectories. IEEE Robot. Autom. Lett. 3, 620–626 (2018).
Hu, Y. et al. Seeing through pixel motion: learning obstacle avoidance from optical flow with one camera. IEEE Robot. Autom. Lett. 10, 5871–5878 (2024).
Girshick, R. Fast R-CNN. In Proc. 2015 IEEE International Conference on Computer Vision (ICCV) (eds Ikeuchi, K. et al.) 1440–1448 (IEEE, 2015).
Dugas, C., Bengio, Y., Bélisle, F., Nadeau, C. & Garcia, R. Incorporating second-order functional knowledge for better option pricing. In Proc. 13th Conference on Neural Information Processing Systems (eds Leen, T. et al.) 451–457 (MIT Press, 2000).
Bengio, Y., Simard, P. & Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5, 157–166 (1994).
Shi, B., Bai, X. & Yao, C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2298–2304 (2016).
Scaramuzza, D. et al. Vision-controlled micro flying robots: from system design to autonomous navigation and mapping in GPS-denied environments. IEEE Robot. Autom. Mag. 21, 26–40 (2014).
Liu, S., Mohta, K., Atanasov, N. & Kumar, V. Search-based motion planning for aggressive flight in SE(3). IEEE Robot. Autom. Lett. 3, 2439–2446 (2018).
Neunert, M. et al. Fast nonlinear model predictive control for unified trajectory optimization and tracking. In Proc. 2016 IEEE International Conference on Robotics and Automation (ICRA) (eds Bicchi, A. & De Luca, A.) 1398–1404 (IEEE, 2016).
Falanga, D., Foehn, P., Lu, P. & Scaramuzza, D. Pampc: perception-aware model predictive control for quadrotors. In Proc. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (ed. Maciejewski, A. A.) 1–8 (IEEE, 2018).
Ji, J., Wang, Z., Wang, Y., Xu, C. & Gao, F. Mapless-planner: a robust and fast planning framework for aggressive autonomous flight without map fusion. In Proc. 2021 IEEE International Conference on Robotics and Automation (ICRA) 6315–6321 (IEEE, 2021).
Gao, F., Wu, W., Gao, W. & Shen, S. Flying on point clouds: online trajectory generation and autonomous navigation for quadrotors in cluttered environments. J. Field Robot. 36, 710–733 (2018).
Tordesillas, J. & How, J. P. Mader: trajectory planner in multiagent and dynamic environments. IEEE Trans. Robot. 38, 463–476 (2021).
Wang, W. et al. Tartanair: a dataset to push the limits of visual SLAM. In Proc. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 4909–4916 (IEEE, 2020).
Teed, Z. & Deng, J. DROID-SLAM: deep visual SLAM for monocular, stereo, and RGB-D cameras. In Proc. 34th Conference on Advances in Neural Information Processing Systems (eds Ranzato, M. et al.) 16558–16569 (Curran Associates, 2021).
Vorbach, C., Hasani, R., Amini, A., Lechner, M. & Rus, D. Causal navigation by continuous-time neural networks. In Proc. 34th Conference on Advances in Neural Information Processing Systems (eds Ranzato, M. et al.) 12425–12440 (Curran Associates, 2021).
Li, G. et al. Oil: observational imitation learning. Preprint at https://arxiv.org/abs/1803.01129 (2019).
Kaufmann, E. et al. Deep drone racing: learning agile flight in dynamic environments. In Proc. Conference on Robot Learning (eds Billard, A. et al.) 133–145 (PMLR, 2018).
Loquercio, A. et al. Deep drone racing: from simulation to reality with domain randomization. IEEE Trans. Robot. 36, 1–14 (2019).
Wang, T. & Chang, D. E. Robust navigation for racing drones based on imitation learning and modularization. In Proc. 2021 IEEE International Conference on Robotics and Automation (ICRA) 13724–13730 (IEEE, 2021).
Fu, J., Song, Y., Wu, Y., Yu, F. & Scaramuzza, D. Learning deep sensorimotor policies for vision-based autonomous drone racing. In Proc. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 5243–5250 (IEEE, 2023).
Xing, J., Romero, A., Bauersfeld, L. & Scaramuzza, D. Bootstrapping reinforcement learning with imitation for vision-based agile flight. In Proc. 8th Annual Conference on Robot Learning (2024).
Bhattacharya, A. et al. Vision transformers for end-to-end vision-based quadrotor obstacle avoidance. In Proc. 2025 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2025).
Liang, J. & Lin, M. C. Differentiable physics simulation. In Proc. ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (eds Mohamed, S. et al.) (2020).
Hu, Y. et al. Difftaichi: differentiable programming for physical simulation. In Proc. International Conference on Learning Representations (2020).
Hu, Y. et al. Chainqueen: a real-time differentiable physical simulator for soft robotics. In Proc. 2019 International Conference on Robotics and Automation (ICRA) 6265–6271 (IEEE, 2019).
Bern, J. M., Schnider, Y., Banzet, P., Kumar, N. & Coros, S. Soft robot control with a learned differentiable model. In Proc. 2020 3rd IEEE International Conference on Soft Robotics (RoboSoft) (ed. Tolley, M.) 417–423 (IEEE, 2020).
Ren, J. et al. Diffmimic: efficient motion mimicking with differentiable physics. In Proc. 11th International Conference on Learning Representations (2023).
Jatavallabhula, K. M. et al. gradsim: differentiable simulation for system identification and visuomotor control. In Proc. International Conference on Learning Representations (ICLR) (eds Oh, A. et al.) (2021).
Song, Y., Kim, S. & Scaramuzza, D. Learning quadruped locomotion using differentiable simulation. In Proc. 8th Annual Conference on Robot Learning (2024).
Schwarke, C., Klemm, V., Tordesillas, J., Sleiman, J.-P. & Hutter, M. Learning quadrupedal locomotion via differentiable simulation. Preprint at https://arxiv.org/abs/2404.02887 (2024).
Zhang, Y. & Hu, Y. Data for learning vision-based agile flight via differentiable physics. figshare https://doi.org/10.6084/m9.figshare.26298379 (2025).
Zhang, Y. & Hu, Y. Code for learning vision-based agile flight via differentiable physics. Zenodo https://doi.org/10.5281/zenodo.15250256 (2025).

Acknowledgements

This project is supported by the National Natural Science Foundation of China (Grant Nos 62325109 and U21B2013 to W.L. and Grant No. 62073214 to D.Z.). We thank SJTU SEIEE ⋅ G60 Yun Zhi AI Innovation and Application Research Center for indoor experiment support, J. Li for initial efforts in the multi-agent experiments, and L. Zhang and F. Yu for helping with the experiments and the valuable discussions.

Author information

These authors contributed equally: Yuang Zhang, Yu Hu, Yunlong Song.

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Yuang Zhang, Yu Hu, Danping Zou & Weiyao Lin
University of Zurich, Zurich, Switzerland
Yunlong Song

Contributions

Y.Z. formulated the main ideas, implemented the system, performed the experiments, analysed the data and wrote the paper. Y.H. implemented the system, performed the experiments, produced the data visualization and wrote the paper. Y.S. contributed to project conception, data visualization and writing the paper. D.Z. contributed to the project conception and revised the paper. W.L. directed the research, contributed to the paper writing and provided funding.

Corresponding authors

Correspondence to Danping Zou or Weiyao Lin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Pratik Kunapuli and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Method overview: learning vision-based agile flight via differentiable physics.

The differentiable simulator carries gradients from the output state directly to the control inputs, and hence, to the policy parameters. A single timestep consists of depth map rendering, action prediction, and quadrotor dynamics simulation. The training environment only includes randomly placed obstacles.
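As a rough illustration of how gradients can travel from the output state back to the policy parameters, the following Python sketch differentiates through a rollout of a one-dimensional point mass. It is not the authors' implementation: the depth-map rendering step is omitted, the neural network is replaced by a hypothetical two-parameter linear policy (`kp`, `kd`), and the loss, time step, horizon and learning rate are assumed values chosen only for this example. Derivatives are propagated by hand alongside the state so that the block needs no autodiff library.

```python
def rollout(kp, kd, target=1.0, steps=20, dt=0.1):
    """Simulate a 1D point mass under a linear policy.

    Returns (loss, [dloss/dkp, dloss/dkd]). All names and constants here
    are illustrative assumptions, not values from the paper.
    """
    p, v = 0.0, 0.0
    # Sensitivities of position and velocity w.r.t. (kp, kd).
    dp = [0.0, 0.0]
    dv = [0.0, 0.0]
    loss = 0.0
    dloss = [0.0, 0.0]
    for _ in range(steps):
        # Action prediction: a = kp * (target - p) - kd * v
        err = target - p
        a = kp * err - kd * v
        da = [
            err - kp * dp[0] - kd * dv[0],   # da/dkp
            -v - kp * dp[1] - kd * dv[1],    # da/dkd
        ]
        # Point-mass dynamics: the action is the acceleration.
        v = v + a * dt
        dv = [dv[i] + da[i] * dt for i in range(2)]
        p = p + v * dt
        dp = [dp[i] + dv[i] * dt for i in range(2)]
        # Accumulate a squared tracking error and its gradient.
        e = p - target
        loss += e * e * dt
        dloss = [dloss[i] + 2.0 * e * dp[i] * dt for i in range(2)]
    return loss, dloss


def train(iterations=100, lr=0.05):
    """Plain gradient descent on the policy parameters."""
    kp, kd = 0.5, 0.5
    for _ in range(iterations):
        _, (g_kp, g_kd) = rollout(kp, kd)
        kp -= lr * g_kp
        kd -= lr * g_kd
    return kp, kd
```

Calling `train()` and comparing `rollout(kp, kd)[0]` before and after gives a quick check that the gradient from the simulated state actually improves the policy. In a full system an autodiff framework would replace the hand-written sensitivities.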

Extended Data Fig. 2 A computation graph of the physics simulation.

Temporal gradient decay mitigates gradient explosion. We illustrate the simplified model $v_t = v_{t-1} + a_t$, $p_t = p_{t-1} + v_t$.
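To make the caption's point concrete, the Python sketch below backpropagates by hand through the simplified model (velocity is the previous velocity plus the action, position is the previous position plus the velocity). The caption does not spell out the decay mechanism, so the sketch assumes one plausible reading: a hypothetical per-step factor `decay` that attenuates the gradient each time it is passed back to the previous state. The loss, the sum of positions over the horizon, is likewise an assumption made for illustration.

```python
def action_gradients(horizon, decay=1.0):
    """Gradient of L = sum_t p_t w.r.t. each action a_t.

    Model: v_t = v_{t-1} + a_t, p_t = p_{t-1} + v_t.
    `decay` (assumed, hypothetical) scales the state gradients every time
    they flow one step backward in time; decay=1.0 is plain backpropagation.
    """
    grad_p = 0.0  # dL/dp_t accumulated so far
    grad_v = 0.0  # dL/dv_t accumulated so far
    grads = [0.0] * horizon
    for t in reversed(range(horizon)):
        grad_p += 1.0        # the loss reads p_t directly
        grad_v += grad_p     # p_t = p_{t-1} + v_t
        grads[t] = grad_v    # v_t = v_{t-1} + a_t
        # Pass the gradients to step t-1, attenuated by the decay factor.
        grad_p *= decay
        grad_v *= decay
    return grads


# Without decay the earliest action's gradient grows with the horizon:
# action_gradients(3)       -> [6.0, 3.0, 1.0]
# With decay it stays bounded however long the rollout is:
# action_gradients(3, 0.5)  -> [2.75, 2.0, 1.0]
```

Under these assumptions the undecayed gradient for the first action grows quadratically with the horizon, whereas with `decay=0.5` it never exceeds 4, which is the kind of growth the decay is meant to suppress.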

Supplementary information

Supplementary Information

Supplementary Notes 1–3, Figs. 1–5 and Tables 1–3.

Reporting Summary

Supplementary Video 1

Summary of results and methodology.

Supplementary Video 2

Original recordings of swarm experiments.

Supplementary Video 3

Success rate tests in a dynamic environment.

Supplementary Video 4

Policy learning within the differentiable-physics simulation environment.

Supplementary Video 5

Multi-view playback of the indoor vision-based swarm experiment.

Supplementary Video 6

Extended multi-agent experiment. An agent navigates through a field of other agents.

Supplementary Video 7

Extended real-world odometry-free low-speed obstacle avoidance experiment.

Supplementary Video 8

Depth image and activation map from a sample flight.

Supplementary Video 9

Failure cases due to a limited field of view.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Hu, Y., Song, Y. et al. Learning vision-based agile flight via differentiable physics. Nat Mach Intell 7, 954–966 (2025). https://doi.org/10.1038/s42256-025-01048-0

Received: 15 July 2024
Accepted: 01 May 2025
Published: 16 June 2025
Issue date: June 2025
DOI: https://doi.org/10.1038/s42256-025-01048-0