Abstract
Autonomous aerial robot swarms promise transformative applications, from planetary exploration to search and rescue in complex environments. However, navigating these swarms efficiently in unknown and cluttered spaces without bulky sensors, heavy computation or constant communication between robots remains a major research problem. This paper introduces an end-to-end approach that combines deep learning with first-principles physics through differentiable simulation to enable autonomous navigation by several aerial robots through complex environments at high speed. Our approach directly optimizes a neural network control policy by backpropagating loss gradients through the robot simulation using a simple point-mass physics model. Despite this simplicity, our method excels in both multi-agent and single-agent applications. In multi-agent scenarios, our system demonstrates self-organized behaviour, which enables autonomous coordination without communication or centralized planning. In single-agent scenarios, our system achieved a 90% success rate in navigating through complex unknown environments and demonstrated enhanced robustness compared to previous state-of-the-art approaches. Our system can operate without state estimation and adapt to dynamic obstacles. In real-world forest environments, it navigates at speeds of up to 20 m s⁻¹, doubling the speed of previous imitation-learning-based solutions. Notably, all these capabilities are deployed on a budget-friendly US$21 computer, which costs less than 5% of the GPU-equipped board used in existing systems.
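The core idea of backpropagating a loss through a point-mass simulation into policy parameters can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 1D point mass, a hypothetical two-parameter linear policy in place of the neural network, a squared distance-to-goal loss, and a hand-written backward pass instead of an autodiff framework. The names `rollout`, `loss_and_grad` and `train`, and all constants, are illustrative.

```python
def rollout(w, goal=1.0, dt=0.1, steps=50):
    """Simulate a 1D point mass driven by a linear policy; return loss and states."""
    ps, vs = [0.0], [0.0]
    for _ in range(steps):
        p, v = ps[-1], vs[-1]
        a = w[0] * (goal - p) + w[1] * v  # toy policy (stand-in for the network)
        v = v + dt * a                    # point-mass velocity update
        p = p + dt * v                    # point-mass position update
        ps.append(p)
        vs.append(v)
    loss = sum((p - goal) ** 2 for p in ps[1:]) / steps
    return loss, ps, vs


def loss_and_grad(w, goal=1.0, dt=0.1, steps=50):
    """Backpropagate the loss through the simulation to the policy parameters."""
    loss, ps, vs = rollout(w, goal, dt, steps)
    gw = [0.0, 0.0]
    gp, gv = 0.0, 0.0  # gradients flowing back into p_{t+1} and v_{t+1}
    for t in reversed(range(steps)):
        gp += 2.0 * (ps[t + 1] - goal) / steps  # direct loss term on p_{t+1}
        gv += dt * gp                           # through p_{t+1} = p_t + dt * v_{t+1}
        ga = dt * gv                            # through v_{t+1} = v_t + dt * a_t
        gw[0] += ga * (goal - ps[t])            # a_t depends on w[0]
        gw[1] += ga * vs[t]                     # a_t depends on w[1]
        gp -= w[0] * ga                         # a_t depends on p_t
        gv += w[1] * ga                         # a_t depends on v_t
    return loss, gw


def train(iters=100, lr=0.05):
    """Gradient descent that only accepts steps which lower the loss."""
    w = [0.0, 0.0]
    loss, g = loss_and_grad(w)
    for _ in range(iters):
        cand = [w[i] - lr * g[i] for i in range(2)]
        cand_loss, cand_g = loss_and_grad(cand)
        if cand_loss < loss:
            w, loss, g = cand, cand_loss, cand_g
        else:
            lr *= 0.5  # step was too large; shrink it
    return w, loss
```

Calling `train()` returns the learned parameters and the final loss. In practice an autodiff library would compute the backward pass; it is written out here only to make the gradient path through the physics explicit.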
Data availability
The videos of flight recordings of the real-world experiments and collision and flight time data in simulated experiments are available on figshare at https://doi.org/10.6084/m9.figshare.26298379 (ref. 72).
Code availability
The code for single- and multi-agent training and simulation experiments is available on Zenodo at https://doi.org/10.5281/zenodo.15250256 (ref. 73).
References
Schedl, D. C., Kurmi, I. & Bimber, O. An autonomous drone for search and rescue in forests using airborne optical sectioning. Sci. Robot. 6, 1188 (2021).
Xing, J., Cioffi, G., Hidalgo-Carrió, J. & Scaramuzza, D. Autonomous power line inspection with drones via perception-aware MPC. In Proc. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 1086–1093 (IEEE, 2023).
Sage, A. T. et al. Testing the delivery of human organ transportation with drones in the real world. Sci. Robot. 7, 5798 (2022).
Giusti, A. et al. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robot. Autom. Lett. 1, 661–667 (2015).
Gao, F. et al. Teach-repeat-replan: a complete and robust system for aggressive flight in complex environments. IEEE Trans. Robot. 36, 1526–1545 (2020).
Zhou, X., Wang, Z., Ye, H., Xu, C. & Gao, F. Ego-planner: an ESDF-free gradient-based local planner for quadrotors. IEEE Robot. Autom. Lett. 6, 478–485 (2020).
Zhou, X. et al. Swarm of micro flying robots in the wild. Sci. Robot. 7, 5954 (2022).
Loquercio, A. et al. Learning high-speed flight in the wild. Sci. Robot. 6, 5810 (2021).
Zhang, Z. & Scaramuzza, D. Perception-aware receding horizon navigation for MAVs. In Proc. 2018 IEEE International Conference on Robotics and Automation (ICRA) 2534–2541 (IEEE, 2018).
Maimone, M. W., Leger, P. C. & Biesiadecki, J. J. Overview of the Mars Exploration Rovers’ autonomous mobility and vision capabilities. In Proc. IEEE International Conference on Robotics and Automation (ICRA) Space Robotics Workshop (IEEE, 2007).
Hwangbo, J. et al. Learning agile and dynamic motor skills for legged robots. Sci. Robot. 4, 5872 (2019).
Miki, T. et al. Learning robust perceptive locomotion for quadrupedal robots in the wild. Sci. Robot. 7, 2822 (2022).
Choi, S. et al. Learning quadrupedal locomotion on deformable terrain. Sci. Robot. 8, 2256 (2023).
Song, Y., Romero, A., Mueller, M., Koltun, V. & Scaramuzza, D. Reaching the limit in autonomous racing: optimal control versus reinforcement learning. Sci. Robot. https://doi.org/10.1126/scirobotics.adg1462 (2023).
Kaufmann, E. et al. Deep drone acrobatics. In Proc. Robotics: Science and Systems (eds Toussaint, M. et al.) (RSS Foundation, 2020).
Song, Y., Shi, K., Penicka, R. & Scaramuzza, D. Learning perception-aware agile flight in cluttered environments. In Proc. 2023 IEEE International Conference on Robotics and Automation (ICRA) 1989–1995 (IEEE, 2023).
Sadeghi, F. & Levine, S. CAD2RL: real single-image flight without a single real image. In Proc. Robotics: Science and Systems XIII (eds Amato, A. et al.) (RSS Foundation, 2017).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).
Foehn, P. et al. Agilicious: open-source and open-hardware agile quadrotor for vision-based flight. Sci. Robot. 7, 6259 (2022).
Shahzad, M. M. et al. A review of swarm robotics in a nutshell. Drones 7, 269 (2023).
Kegeleirs, M., Grisetti, G. & Birattari, M. Swarm SLAM: challenges and perspectives. Front. Robot. AI 8, 618268 (2021).
Delmerico, J., Cieslewski, T., Rebecq, H., Faessler, M. & Scaramuzza, D. Are we ready for autonomous drone racing? The UZH-FPV drone racing dataset. In Proc. 2019 International Conference on Robotics and Automation (ICRA) 6713–6719 (IEEE, 2019).
Cioffi, G., Bauersfeld, L., Kaufmann, E. & Scaramuzza, D. Learned inertial odometry for autonomous drone racing. IEEE Robot. Autom. Lett. 8, 2684–2691 (2023).
Kaufmann, E. et al. Champion-level drone racing using deep reinforcement learning. Nature 620, 982–987 (2023).
Qin, T., Li, P. & Shen, S. VINS-Mono: a robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 34, 1004–1020 (2018).
Selvaraju, R. R. et al. Grad-cam: visual explanations from deep networks via gradient-based localization. In Proc. IEEE International Conference on Computer Vision (eds Cucchiara, R. et al.) 618–626 (IEEE, 2017).
Song, Y., Naji, S., Kaufmann, E., Loquercio, A. & Scaramuzza, D. Flightmare: a flexible quadrotor simulator. In Proc. Conference on Robot Learning (eds Faust, A. et al.) 1147–1157 (PMLR, 2021).
Zhou, B., Gao, F., Wang, L., Liu, C. & Shen, S. Robust and efficient quadrotor trajectory generation for fast autonomous flight. IEEE Robot. Autom. Lett. 4, 3529–3536 (2019).
Florence, P., Carter, J. & Tedrake, R. Integrated perception and control at high speed: evaluating collision avoidance maneuvers without maps. In Proc. 12th Workshop on the Algorithmic Foundations of Robotics (eds Goldberg, K. et al.) 304–319 (Springer, 2020).
Shah, S., Dey, D., Lovett, C. & Kapoor, A. in Field and Service Robotics (eds Hutter, M. & Siegwart, R.) Ch. 40 (Springer, 2017).
Ross, S., Gordon, G. & Bagnell, D. A reduction of imitation learning and structured prediction to no-regret online learning. In Proc. 14th International Conference on Artificial Intelligence and Statistics (eds Gordon, G. et al.) 627–635 (PMLR, 2011).
Gurumurthy, S., Kolter, J. Z. & Manchester, Z. Deep off-policy iterative learning control. In Proc. 5th Annual Learning for Dynamics & Control Conference (eds Matni, N. et al.) 639–652 (PMLR, 2023).
Cho, K. et al. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (eds Moschitti, A. et al.) 1724–1734 (Association for Computational Linguistics, 2014).
Suh, H. J., Simchowitz, M., Zhang, K. & Tedrake, R. Do differentiable simulators give better policy gradients? In Proc. International Conference on Machine Learning (eds Chaudhuri, K. et al.) 20668–20696 (PMLR, 2022).
Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proc. International Conference on Machine Learning (eds Dy, J. et al.) 1861–1870 (PMLR, 2018).
Metz, L., Freeman, C. D., Schoenholz, S. S. & Kachman, T. Gradients are not all you need. Preprint at https://arxiv.org/abs/2111.05803 (2021).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Preprint at https://arxiv.org/abs/1912.01703 (2019).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
O’Connell, M. et al. Neural-fly enables rapid learning for agile flight in strong winds. Sci. Robot. 7, 6597 (2022).
Todorov, E., Erez, T. & Tassa, Y. Mujoco: a physics engine for model-based control. In Proc. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (eds Guglielmelli, G. et al.) 5026–5033 (IEEE, 2012).
Faessler, M., Franchi, A. & Scaramuzza, D. Differential flatness of quadrotor dynamics subject to rotor drag for accurate tracking of high-speed trajectories. IEEE Robot. Autom. Lett. 3, 620–626 (2018).
Hu, Y. et al. Seeing through pixel motion: learning obstacle avoidance from optical flow with one camera. IEEE Robot. Autom. Lett. 10, 5871–5878 (2024).
Girshick, R. Fast R-CNN. In Proc. 2015 IEEE International Conference on Computer Vision (ICCV) (eds Ikeuchi, K. et al.) 1440–1448 (IEEE, 2015).
Dugas, C., Bengio, Y., Bélisle, F., Nadeau, C. & Garcia, R. Incorporating second-order functional knowledge for better option pricing. In Proc. 13th Conference on Neural Information Processing Systems (eds Leen, T. et al.) 451–457 (MIT Press, 2000).
Bengio, Y., Simard, P. & Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5, 157–166 (1994).
Shi, B., Bai, X. & Yao, C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2298–2304 (2016).
Scaramuzza, D. et al. Vision-controlled micro flying robots: from system design to autonomous navigation and mapping in GPS-denied environments. IEEE Robot. Autom. Mag. 21, 26–40 (2014).
Liu, S., Mohta, K., Atanasov, N. & Kumar, V. Search-based motion planning for aggressive flight in SE(3). IEEE Robot. Autom. Lett. 3, 2439–2446 (2018).
Neunert, M. et al. Fast nonlinear model predictive control for unified trajectory optimization and tracking. In Proc. 2016 IEEE International Conference on Robotics and Automation (ICRA) (eds Bicchi, A. & De Luca, A.) 1398–1404 (IEEE, 2016).
Falanga, D., Foehn, P., Lu, P. & Scaramuzza, D. Pampc: perception-aware model predictive control for quadrotors. In Proc. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (ed. Maciejewski, A. A.) 1–8 (IEEE, 2018).
Ji, J., Wang, Z., Wang, Y., Xu, C. & Gao, F. Mapless-planner: a robust and fast planning framework for aggressive autonomous flight without map fusion. In Proc. 2021 IEEE International Conference on Robotics and Automation (ICRA) 6315–6321 (IEEE, 2021).
Gao, F., Wu, W., Gao, W. & Shen, S. Flying on point clouds: online trajectory generation and autonomous navigation for quadrotors in cluttered environments. J. Field Robot. 36, 710–733 (2018).
Tordesillas, J. & How, J. P. Mader: trajectory planner in multiagent and dynamic environments. IEEE Trans. Robot. 38, 463–476 (2021).
Wang, W. et al. Tartanair: a dataset to push the limits of visual SLAM. In Proc. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 4909–4916 (IEEE, 2020).
Teed, Z. & Deng, J. DROID-SLAM: deep visual SLAM for monocular, stereo, and RGB-D cameras. In Proc. 34th Conference on Advances in Neural Information Processing Systems (eds Ranzato, M. et al.) 16558–16569 (Curran Associates, 2021).
Vorbach, C., Hasani, R., Amini, A., Lechner, M. & Rus, D. Causal navigation by continuous-time neural networks. In Proc. 34th Conference on Advances in Neural Information Processing Systems (eds Ranzato, M. et al.) 12425–12440 (Curran Associates, 2021).
Li, G. et al. Oil: observational imitation learning. Preprint at https://arxiv.org/abs/1803.01129 (2019).
Kaufmann, E. et al. Deep drone racing: learning agile flight in dynamic environments. In Proc. Conference on Robot Learning (eds Billard, A. et al.) 133–145 (PMLR, 2018).
Loquercio, A. et al. Deep drone racing: from simulation to reality with domain randomization. IEEE Trans. Robot. 36, 1–14 (2019).
Wang, T. & Chang, D. E. Robust navigation for racing drones based on imitation learning and modularization. In Proc. 2021 IEEE International Conference on Robotics and Automation (ICRA) 13724–13730 (IEEE, 2021).
Fu, J., Song, Y., Wu, Y., Yu, F. & Scaramuzza, D. Learning deep sensorimotor policies for vision-based autonomous drone racing. In Proc. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 5243–5250 (IEEE, 2023).
Xing, J., Romero, A., Bauersfeld, L. & Scaramuzza, D. Bootstrapping reinforcement learning with imitation for vision-based agile flight. In Proc. 8th Annual Conference on Robot Learning (2024).
Bhattacharya, A. et al. Vision transformers for end-to-end vision-based quadrotor obstacle avoidance. In Proc. 2025 IEEE International Conference on Robotics and Automation (ICRA) (IEEE, 2025).
Liang, J. & Lin, M. C. Differentiable physics simulation. In Proc. ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations (eds Mohamed, S. et al.) (2020).
Hu, Y. et al. Difftaichi: differentiable programming for physical simulation. In Proc. International Conference on Learning Representations (2020).
Hu, Y. et al. Chainqueen: a real-time differentiable physical simulator for soft robotics. In Proc. 2019 International Conference on Robotics and Automation (ICRA) 6265–6271 (IEEE, 2019).
Bern, J. M., Schnider, Y., Banzet, P., Kumar, N. & Coros, S. Soft robot control with a learned differentiable model. In Proc. 2020 3rd IEEE International Conference on Soft Robotics (RoboSoft) (ed. Tolley, M.) 417–423 (IEEE, 2020).
Ren, J. et al. Diffmimic: efficient motion mimicking with differentiable physics. In Proc. 11th International Conference on Learning Representations (2023).
Jatavallabhula, K. M. et al. gradsim: differentiable simulation for system identification and visuomotor control. In Proc. International Conference on Learning Representations (ICLR) (eds Oh, A. et al.) (2021).
Song, Y., Kim, S. & Scaramuzza, D. Learning quadruped locomotion using differentiable simulation. In Proc. 8th Annual Conference on Robot Learning (2024).
Schwarke, C., Klemm, V., Tordesillas, J., Sleiman, J.-P. & Hutter, M. Learning quadrupedal locomotion via differentiable simulation. Preprint at https://arxiv.org/abs/2404.02887 (2024).
Zhang, Y. & Hu, Y. Data for learning vision-based agile flight via differentiable physics. figshare https://doi.org/10.6084/m9.figshare.26298379 (2025).
Zhang, Y. & Hu, Y. Code for learning vision-based agile flight via differentiable physics. Zenodo https://doi.org/10.5281/zenodo.15250256 (2025).
Acknowledgements
This project is supported by the National Natural Science Foundation of China (Grant Nos 62325109 and U21B2013 to W.L. and Grant No. 62073214 to D.Z.). We thank SJTU SEIEE ⋅ G60 Yun Zhi AI Innovation and Application Research Center for indoor experiment support, J. Li for initial efforts in the multi-agent experiments, and L. Zhang and F. Yu for helping with the experiments and the valuable discussions.
Author information
Contributions
Y.Z. formulated the main ideas, implemented the system, performed the experiments, analysed the data and wrote the paper. Y.H. implemented the system, performed the experiments, produced the data visualization and wrote the paper. Y.S. contributed to project conception, data visualization and writing the paper. D.Z. contributed to the project conception and revised the paper. W.L. directed the research, contributed to the paper writing and provided funding.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Pratik Kunapuli and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Method overview: learning vision-based agile flight via differentiable physics.
The differentiable simulator carries gradients from the output state directly to the control inputs, and hence, to the policy parameters. A single timestep consists of depth map rendering, action prediction, and quadrotor dynamics simulation. The training environment only includes randomly placed obstacles.
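The three stages of a single timestep can be sketched as follows. This is a toy 2D illustration under stated assumptions, not the paper's simulator: obstacles are circles given as hypothetical `(x, y, radius)` tuples, the depth map is a 1D row of ray distances, the policy is any callable supplied by the caller, and the dynamics are a plain point mass. All function names and default values are illustrative.

```python
import math


def render_depth(pos, heading, obstacles, fov=math.pi / 2, n_rays=8, max_depth=10.0):
    """Cast rays across the field of view; return the nearest hit distance per ray."""
    depths = []
    for i in range(n_rays):
        ang = heading - fov / 2 + fov * (i + 0.5) / n_rays
        dx, dy = math.cos(ang), math.sin(ang)
        best = max_depth
        for cx, cy, r in obstacles:
            ox, oy = cx - pos[0], cy - pos[1]
            b = ox * dx + oy * dy                       # projection of centre onto ray
            disc = b * b - (ox * ox + oy * oy - r * r)  # ray-circle discriminant
            if disc >= 0.0:
                t = b - math.sqrt(disc)                 # nearest intersection distance
                if 0.0 < t < best:
                    best = t
        depths.append(best)
    return depths


def simulate_step(pos, vel, accel, dt):
    """Point-mass update: integrate acceleration, then velocity."""
    vel = (vel[0] + accel[0] * dt, vel[1] + accel[1] * dt)
    pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    return pos, vel


def timestep(pos, vel, obstacles, policy, heading=0.0, dt=0.1):
    """One step: depth rendering, action prediction, dynamics simulation."""
    depth = render_depth(pos, heading, obstacles)
    accel = policy(depth, vel)  # `policy` stands in for the neural network
    return simulate_step(pos, vel, accel, dt)
```

For training, each of these stages would need to be differentiable with respect to the state and action so that gradients can flow from the output state back to the policy parameters, as the caption describes.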
Extended Data Fig. 2 A computation graph of the physics simulation.
Temporal gradient decay mitigates gradient explosion. We illustrate the simplified model $v_t = v_{t-1} + a_t$, $p_t = p_{t-1} + v_t$.
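To see why gradients grow with the horizon in this simplified model, note that an action at step t changes every later velocity and therefore every later position, so the sensitivity of the final position p_T to a_t is T − t + 1. The sketch below computes these sensitivities by a backward pass and shows the effect of damping the gradient that flows back through the state at each step. The per-step factor `decay` and the place where it is applied are assumptions made for illustration; the caption does not specify how the paper implements the decay.

```python
def action_sensitivities(horizon, decay=1.0):
    """Gradient of the final position p_T with respect to each action a_1..a_T.

    `decay` scales the gradient passed back through the state at every step
    (decay=1.0 gives the exact, undamped gradient).
    """
    grad_p, grad_v = 1.0, 0.0  # d p_T / d p_T = 1; nothing flows into v_T from later
    grads = [0.0] * horizon
    for t in reversed(range(horizon)):
        grad_v += grad_p   # p_t = p_{t-1} + v_t
        grads[t] = grad_v  # v_t = v_{t-1} + a_t
        grad_p *= decay    # damp the gradient carried back to p_{t-1}
        grad_v *= decay    # damp the gradient carried back to v_{t-1}
    return grads


print(action_sensitivities(5))             # [5.0, 4.0, 3.0, 2.0, 1.0]
print(action_sensitivities(5, decay=0.5))  # [0.3125, 0.5, 0.75, 1.0, 1.0]
```

Without decay, the sensitivity to the earliest action equals the horizon length and keeps growing as the rollout gets longer. With a factor below 1, the action k steps before the end receives (k + 1)·decay^k, which stays bounded for any horizon.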
Supplementary information
Supplementary Information
Supplementary Notes 1–3, Figs. 1–5 and Tables 1–3.
Supplementary Video 1
Summary of results and methodology.
Supplementary Video 2
Original recordings of swarm experiments.
Supplementary Video 3
Success rate tests in a dynamic environment.
Supplementary Video 4
Policy learning within the differentiable-physics simulation environment.
Supplementary Video 5
Multi-view playback of the indoor vision-based swarm experiment.
Supplementary Video 6
Extended multi-agent experiment. An agent navigates through a field of other agents.
Supplementary Video 7
Extended real-world odometry-free low-speed obstacle avoidance experiment.
Supplementary Video 8
Depth image and activation map from a sample flight.
Supplementary Video 9
Failure cases due to a limited field of view.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Hu, Y., Song, Y. et al. Learning vision-based agile flight via differentiable physics. Nat Mach Intell 7, 954–966 (2025). https://doi.org/10.1038/s42256-025-01048-0
DOI: https://doi.org/10.1038/s42256-025-01048-0