Abstract
Deep reinforcement learning (RL) has been successful in a variety of domains but has not yet been directly used to learn biological tasks by interacting with a living nervous system. As proof of principle, we show how to create such a hybrid system trained on a target-finding task. Using optogenetics, we interfaced the nervous system of the nematode Caenorhabditis elegans with a deep RL agent. Agents adapted to strikingly different sites of neural integration and learned site-specific activations to guide animals towards a target, including in cases where agents interfaced with sets of neurons with previously uncharacterized responses to optogenetic modulation. Agents were analysed by plotting their learned policies to understand how different sets of neurons were used to guide movement. Further, the animal and agent generalized to new environments using the same learned policies in food-search tasks, showing that the system achieved cooperative computation rather than the agent acting as a controller for a soft robot. Our system demonstrates that deep RL is a viable tool both for learning how neural circuits can produce goal-directed behaviour and for improving biologically relevant behaviour in a flexible way.
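The abstract describes an agent that observes the animal and issues optogenetic light commands to drive it toward a target; the paper's agent is a soft actor-critic network (Haarnoja et al.; Christodoulou). Purely as an illustration of the closed-loop idea, the sketch below uses tabular Q-learning on a toy one-dimensional "worm", with reward defined as the reduction in distance to the target and a binary light action. The dynamics, track, parameters and learning algorithm here are all placeholder assumptions, not the paper's method.

```python
import random

def reward(prev_pos, pos, target):
    """Reward shaping used in this toy: reduction in distance to the target."""
    return abs(prev_pos - target) - abs(pos - target)

def step(pos, action, rng):
    """Placeholder dynamics: light ON (1) biases motion toward the target end
    of a 0..20 track, light OFF (0) biases it away, with 20% noise."""
    drift = 1 if action == 1 else -1
    move = drift if rng.random() < 0.8 else -drift
    return max(0, min(20, pos + move))

def train(episodes=500, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning stand-in for the paper's deep RL agent."""
    rng = random.Random(seed)
    target = 20
    q = {(s, a): 0.0 for s in range(21) for a in (0, 1)}
    for _ in range(episodes):
        pos = rng.randrange(0, 15)
        for _ in range(100):
            if rng.random() < eps:
                a = rng.choice((0, 1))                       # explore
            else:
                a = max((0, 1), key=lambda x: q[(pos, x)])   # exploit
            nxt = step(pos, a, rng)
            r = reward(pos, nxt, target)
            q[(pos, a)] += alpha * (r + gamma * max(q[(nxt, 0)], q[(nxt, 1)]) - q[(pos, a)])
            pos = nxt
            if pos == target:
                break
    return q

q = train()
```

After training, the greedy policy prefers the light-on action across the track, i.e. the agent has learned which stimulation pattern moves the toy worm toward the target; the real agents learned site-specific, posture-dependent policies rather than a constant duty cycle.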
Data availability
All processed animal tracks used to generate figures are available at https://github.com/ccli3896/RLWorms.git (ref. 63).
Code availability
Analysis code and training code are available at https://github.com/ccli3896/RLWorms.git (ref. 63) and https://doi.org/10.5281/zenodo.11002033 (ref. 65).
References
Romano, D., Donati, E., Benelli, G. & Stefanini, C. A review on animal–robot interaction: from bio-hybrid organisms to mixed societies. Biol. Cybern. 113, 201–225 (2019).
Tankus, A., Fried, I. & Shoham, S. Cognitive-motor brain–machine interfaces. J. Physiol. Paris 108, 38–44 (2014).
Bostrom, N. & Sandberg, A. Cognitive enhancement: methods, ethics, regulatory challenges. Sci. Eng. Ethics 15, 311–341 (2009).
Afraz, S.-R., Kiani, R. & Esteky, H. Microstimulation of inferotemporal cortex influences face categorization. Nature 442, 692–695 (2006).
Bonizzato, M. & Martinez, M. An intracortical neuroprosthesis immediately alleviates walking deficits and improves recovery of leg control after spinal cord injury. Sci. Transl. Med. 13, eabb4422 (2021).
Enriquez-Geppert, S., Huster, R. J. & Herrmann, C. S. Boosting brain functions: Improving executive functions with behavioral training, neurostimulation, and neurofeedback. Int. J. Psychophysiol. 88, 1–16 (2013).
Iturrate, I., Pereira, M. & Millán, J. del R. Closed-loop electrical neurostimulation: challenges and opportunities. Curr. Opin. Biomed. Eng. 8, 28–37 (2018).
Lafer-Sousa, R. et al. Behavioral detectability of optogenetic stimulation of inferior temporal cortex varies with the size of concurrently viewed objects. Curr. Res. Neurobiol. 4, 100063 (2023).
Lu, Y. et al. Optogenetically induced spatiotemporal gamma oscillations and neuronal spiking activity in primate motor cortex. J. Neurophysiol. 113, 3574–3587 (2015).
Salzman, D. C., Britten, K. H. & Newsome, W. T. Cortical microstimulation influences perceptual judgements of motion direction. Nature 346, 174–177 (1990).
Schild, L. C. & Glauser, D. A. Dual color neural activation and behavior control with Chrimson and CoChR in Caenorhabditis elegans. Genetics 200, 1029–1034 (2015).
Xu, J. et al. Thalamic stimulation improves postictal cortical arousal and behavior. J. Neurosci. 40, 7343–7354 (2020).
Park, S.-G. et al. Medial preoptic circuit induces hunting-like actions to target objects and prey. Nat. Neurosci. 21, 364–372 (2018).
Yang, J., Huai, R., Wang, H., Lv, C. & Su, X. A robo-pigeon based on an innovative multi-mode telestimulation system. Biomed. Mater. Eng. 26, S357–S363 (2015).
Holzer, R. & Shimoyama, I. Locomotion control of a bio-robotic system via electric stimulation. In Proc. Institute of Electrical and Electronics Engineers/Robotics Society of Japan International Conference on Intelligent Robot and Systems. Innovative Robotics for Real-World Applications 1514–1519 (IEEE, 1997).
Talwar, S. K. et al. Rat navigation guided by remote control. Nature 417, 37–38 (2002).
Sato, H. et al. A cyborg beetle: insect flight control through an implantable, tetherless microsystem. In Proc. 21st Institute of Electrical and Electronics Engineers International Conference on Micro Electro Mechanical Systems 164–167 (IEEE, 2008); https://doi.org/10.1109/MEMSYS.2008.4443618
Peckham, P. H. & Knutson, J. S. Functional electrical stimulation for neuromuscular applications. Annu. Rev. Biomed. Eng. 7, 327–360 (2005).
Kashin, S. M., Feldman, A. G. & Orlovsky, G. N. Locomotion of fish evoked by electrical stimulation of the brain. Brain Res. 82, 41–47 (1974).
Hinterwirth, A. J. et al. Wireless stimulation of antennal muscles in freely flying Hawkmoths leads to flight path changes. PLoS ONE 7, e52725 (2012).
Sanchez, C. J. et al. Locomotion control of hybrid cockroach robots. J. R. Soc. Interface 12, 20141363 (2015).
Bergmann, E., Gofman, X., Kavushansky, A. & Kahn, I. Individual variability in functional connectivity architecture of the mouse brain. Commun. Biol. 3, 1–10 (2020).
Mueller, S. et al. Individual variability in functional connectivity architecture of the human brain. Neuron 77, 586–595 (2013).
Husson, S. J., Gottschalk, A. & Leifer, A. M. Optogenetic manipulation of neural activity in C. elegans: from synapse to circuits and behaviour. Biol. Cell 105, 235–250 (2013).
Nagel, G. et al. Channelrhodopsin-2, a directly light-gated cation-selective membrane channel. Proc. Natl Acad. Sci. USA 100, 13940–13945 (2003).
Kocabas, A., Shen, C.-H., Guo, Z. V. & Ramanathan, S. Controlling interneuron activity in Caenorhabditis elegans to evoke chemotactic behaviour. Nature 490, 273–277 (2012).
Leifer, A. M., Fang-Yen, C., Gershow, M., Alkema, M. J. & Samuel, A. D. T. Optogenetic manipulation of neural activity in freely moving Caenorhabditis elegans. Nat. Methods 8, 147–152 (2011).
Wen, Q. et al. Proprioceptive coupling within motor neurons drives C. elegans forward locomotion. Neuron 76, 750–761 (2012).
Hernandez-Nunez, L. et al. Reverse-correlation analysis of navigation dynamics in Drosophila larva using optogenetics. eLife 4, e06225 (2015).
Donnelly, J. L. et al. Monoaminergic orchestration of motor programs in a complex C. elegans behavior. PLoS Biol. 11, e1001529 (2013).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Schrittwieser, J. et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at http://arxiv.org/abs/1912.06680 (2019).
Wurman, P. R. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223–228 (2022).
Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).
Ibarz, J. et al. How to train your robot with deep reinforcement learning: lessons we have learned. Int. J. Rob. Res. 40, 698–721 (2021).
Haydari, A. & Yılmaz, Y. Deep reinforcement learning for intelligent transportation systems: a survey. IEEE Trans. Intell. Transp. Syst. 23, 11–32 (2022).
Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proc. 35th International Conference on Machine Learning 1861–1870 (PMLR, 2018).
Yang, X., Jiang, X.-L., Su, Z.-L. & Wang, B. Cyborg moth flight control based on fuzzy deep learning. Micromachines 13, 611 (2022).
Ariyanto, M., Refat, C. M. M., Hirao, K. & Morishima, K. Movement optimization for a cyborg cockroach in a bounded space incorporating machine learning. Cyborg Bionic Syst. 4, 0012 (2023).
Zheng, N. et al. Real-time and precise insect flight control system based on virtual reality. Electron. Lett. 53, 387–389 (2017).
Zheng, N. et al. Abdominal-waving control of tethered bumblebees based on sarsa with transformed reward. IEEE Trans. Cybern. 49, 3064–3073 (2019).
Ardiel, E. L. & Rankin, C. H. An elegant mind: learning and memory in Caenorhabditis elegans. Learn. Mem. 17, 191–201 (2010).
Kim, J. & Shlizerman, E. Deep reinforcement learning for neural control. Preprint at https://arxiv.org/abs/2006.07352 (2020).
Christodoulou, P. Soft actor-critic for discrete action settings. Preprint at https://arxiv.org/abs/1910.07207 (2019).
Wong, C.-C., Chien, S.-Y., Feng, H.-M. & Aoyama, H. Motion planning for dual-arm robot based on soft actor-critic. IEEE Access 9, 26871–26885 (2021).
Sarma, G. P. et al. OpenWorm: overview and recent advances in integrative biological simulation of Caenorhabditis elegans. Phil. Trans. R. Soc. B 373, 20170382 (2018).
Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data 6, 60 (2019).
Nikishin, E. et al. Improving stability in deep reinforcement learning with weight averaging. In Uncertainty in Artificial Intelligence Workshop on Uncertainty in Deep Learning (2018).
Stable Baselines 2.10.2 documentation. Reinforcement Learning Resources https://stable-baselines.readthedocs.io/en/master/guide/rl.html (2021).
Bhardwaj, A., Thapliyal, S., Dahiya, Y. & Babu, K. FLP-18 functions through the G-protein-coupled receptors NPR-1 and NPR-4 to modulate reversal length in Caenorhabditis elegans. J. Neurosci. 38, 4641–4654 (2018).
Riddle, D. L., Blumenthal, T., Meyer, B. J. & Priess, J. R. Mechanosensory Control of Locomotion. C. elegans II 2nd edn (Cold Spring Harbor Laboratory Press, 1997).
Brandt, R., Gergou, A., Wacker, I., Fath, T. & Hutter, H. A Caenorhabditis elegans model of tau hyperphosphorylation: induction of developmental defects by transgenic overexpression of Alzheimer’s disease-like modified tau. Neurobiol. Aging 30, 22–33 (2009).
Jospin, M. et al. A neuronal acetylcholine receptor regulates the balance of muscle excitation and inhibition in Caenorhabditis elegans. PLoS Biol. 7, e1000265 (2009).
Hollenstein, J., Auddy, S., Saveriano, M., Renaudo, E. & Piater, J. Action noise in off-policy deep reinforcement learning: impact on exploration and performance. Trans. Mach. Learn. Res. (2022); https://openreview.net/forum?id=NljBlZ6hmG
Andersen, R. A., Aflalo, T., Bashford, L., Bjånes, D. & Kellis, S. Exploring cognition with brain–machine interfaces. Annu. Rev. Psychol. 73, 131–158 (2022).
Sussillo, D., Stavisky, S. D., Kao, J. C., Ryu, S. I. & Shenoy, K. V. Making brain–machine interfaces robust to future neural variability. Nat. Commun. 7, 1–13 (2016).
Dong, X. et al. Toward a living soft microrobot through optogenetic locomotion control of Caenorhabditis elegans. Sci. Robot. 6, eabe3950 (2021).
Tandon, P. pytorch-soft-actor-critic. GitHub https://github.com/pranz24/pytorch-soft-actor-critic (2022).
Li, C. RLWorms. GitHub https://github.com/ccli3896/RLWorms.git (2024).
Kazemipour, A. Discrete-SAC-PyTorch. GitHub https://github.com/alirezakazemipour/Discrete-SAC-PyTorch (2020).
Li, C. RLWorms. Zenodo https://doi.org/10.5281/zenodo.11002033 (2024).
Acknowledgements
We thank S. Bhupatiraju for discussions about RL and comments on the manuscript. We thank T. Hallacy and A. Yonar for guidance in C. elegans experiments and C. McCartan for input on statistical analyses. We thank J. Lee for providing customized high-power LED light sources. We thank K. Blum, C. Pehlevan, G. Anand, A. Bacanu, B. Brissette, D. Hidalgo, R. Huang, H. Megale, W. Weiter, Y. Ilker Yaman, V. Zhuang and S. Zwick for comments on the manuscript. This work was supported in part by National Institute of General Medical Sciences grant no. 1R01NS117908-01 (S.R.), the Dean’s Competitive Fund from Harvard University (S.R., C.L.), National Institutes of Health grant no. R01EY026025 (G.K.), the Fetzer Foundation (G.K.) and a National Science Foundation Graduate Research Fellowship Program fellowship (C.L.).
Author information
Contributions
All the authors designed the study. C.L. wrote code, performed experiments and did data analysis. All the authors wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Artur Luczak and Greg Wayne for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–6, Tables 1 and 2 and Videos 1–6.
Supplementary Video 1
Line 1 with trained agent. The video is sped up by 8× and shows the animal being led to the target by the trained RL agent when coupled to neurons in Line 1. The target is a red circle, and light flashes are denoted by a blue frame in this and all subsequent supplementary videos.
Supplementary Video 2
Line 1 with random light flashing. The random frequency was matched to the average proportion of light on measured across all standard evaluations in Fig. 2g, calculated to be 0.4647. The video is sped up by 8×. When random flashes of light activate the neurons in Line 1, the animal is unable to reach the target.
Supplementary Video 3
Line 2 with trained agent. The video is sped up by 8× and shows that after training, unlike in the control (Video 4), the RL agent excited the neurons it was coupled to in Line 2 to direct the animal to the target.
Supplementary Video 4
Line 2 with random light flashing. The random frequency was matched to the average proportion of light on measured across all standard evaluations for Line 2 in Fig. 3b, calculated to be 0.2896. The video is sped up by 8×. When random light flashes are used to activate neurons in Line 2 (see Fig. 3a and Supplementary Table 1 for line details), the animal is unable to reach the target.
Supplementary Video 5
Line 3 with trained agent. The video is sped up by 8× and shows that the trained RL agent is able to learn appropriate patterns of light flashes, based on the animal’s posture, to lead it to the target.
Supplementary Video 6
Line 3 with random light flashing. The random frequency was matched to the average proportion of light on measured across all standard evaluations for Line 3 in Fig. 3b, calculated to be 0.3844. The video is sped up by 8×. Again, with random flashes of light inhibiting neurons in Line 3, the animal does not reach the target.
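The random-light controls in Supplementary Videos 2, 4 and 6 flash the light at random while matching the trained agent's average light-on proportion for each line. A minimal sketch of such a duty-cycle-matched Bernoulli schedule follows; the function name, per-step timing and step count are assumptions for illustration, not the paper's implementation.

```python
import random

# Average light-on proportions measured across standard evaluations
# (Fig. 2g for Line 1; Fig. 3b for Lines 2 and 3), as quoted in the captions.
LIGHT_ON_PROPORTION = {"Line 1": 0.4647, "Line 2": 0.2896, "Line 3": 0.3844}

def random_flash_schedule(line, n_steps, seed=None):
    """Sample an independent light on/off decision per control step so the
    long-run duty cycle matches the trained agent for the given line."""
    rng = random.Random(seed)
    p = LIGHT_ON_PROPORTION[line]
    return [rng.random() < p for _ in range(n_steps)]
```

Because each step is sampled independently, the schedule reproduces the agent's stimulation budget but none of its state-dependent structure, which is why the animals under random flashing fail to reach the target.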
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, C., Kreiman, G. & Ramanathan, S. Discovering neural policies to drive behaviour by integrating deep reinforcement learning agents with biological neural networks. Nat Mach Intell 6, 726–738 (2024). https://doi.org/10.1038/s42256-024-00854-2
This article is cited by
- Credible inferences in microbiome research: ensuring rigour, reproducibility and relevance in the era of AI. Nature Reviews Gastroenterology & Hepatology (2025).