
Discovering neural policies to drive behaviour by integrating deep reinforcement learning agents with biological neural networks

Abstract

Deep reinforcement learning (RL) has been successful in a variety of domains but has not yet been directly used to learn biological tasks by interacting with a living nervous system. As proof of principle, we show how to create such a hybrid system trained on a target-finding task. Using optogenetics, we interfaced the nervous system of the nematode Caenorhabditis elegans with a deep RL agent. Agents adapted to strikingly different sites of neural integration and learned site-specific activations to guide animals towards a target, including in cases where agents interfaced with sets of neurons with previously uncharacterized responses to optogenetic modulation. Agents were analysed by plotting their learned policies to understand how different sets of neurons were used to guide movement. Further, the animal and agent generalized to new environments using the same learned policies in food-search tasks, showing that the system achieved cooperative computation rather than the agent acting as a controller for a soft robot. Our system demonstrates that deep RL is a viable tool both for learning how neural circuits can produce goal-directed behaviour and for improving biologically relevant behaviour in a flexible way.
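
To make the closed loop concrete: the agent repeatedly observes the animal's tracked state, chooses a binary action (optogenetic light on or off) and receives reward for progress toward the target. The sketch below is a minimal, self-contained toy of that loop, with a simulated random-walker standing in for the animal and a tabular Q-learner standing in for the deep RL agent (the cited implementations point to discrete soft actor-critic; see refs. 41, 48, 62 and 64, and the Code availability section). All dynamics, discretization and parameters here are illustrative assumptions, not the authors' implementation.

    import numpy as np

    # Toy stand-in for the closed loop: the real system observes a living
    # animal through a camera and acts through optogenetic light; here a
    # random-walker and a tabular Q-learner illustrate the same structure.
    # Everything below is an assumption for illustration only.

    rng = np.random.default_rng(0)
    TARGET = np.array([0.0, 0.0])

    def step_animal(pos, heading, light_on):
        # Heading drifts randomly; light biases turning, standing in for an
        # optogenetically evoked change in locomotion.
        heading += rng.normal(0.0, 0.3) + (0.4 if light_on else 0.0)
        return pos + 0.1 * np.array([np.cos(heading), np.sin(heading)]), heading

    def observe(pos, heading):
        # Discretize the bearing to the target into 8 bins.
        dx, dy = TARGET - pos
        bearing = np.arctan2(dy, dx) - heading
        return int(((bearing + np.pi) % (2 * np.pi)) / (2 * np.pi) * 8) % 8

    Q = np.zeros((8, 2))                  # 8 observations x {light off, light on}
    pos, heading = np.array([1.0, 1.0]), 0.0
    prev_dist = np.linalg.norm(pos - TARGET)

    for t in range(5000):
        s = observe(pos, heading)
        a = int(rng.integers(2)) if rng.random() < 0.1 else int(Q[s].argmax())
        pos, heading = step_animal(pos, heading, a == 1)
        dist = np.linalg.norm(pos - TARGET)
        reward = prev_dist - dist         # reward progress toward the target
        Q[s, a] += 0.1 * (reward + 0.95 * Q[observe(pos, heading)].max() - Q[s, a])
        prev_dist = dist

    print(f"final distance to target: {np.linalg.norm(pos - TARGET):.3f}")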


Fig. 1: A system that integrates deep RL with the C. elegans neural network.
Fig. 2: The system learned to navigate the C. elegans line 1 to a target.
Fig. 3: The system could successfully navigate different optogenetic lines to a target.
Fig. 4: The system learned to navigate different optogenetic lines to a target with neuron-specific strategies.
Fig. 5: Agent policies can predict agent performance on other lines.
Fig. 6: Animals with agents can correct errors and generalize to new situations.


Data availability

All processed animal tracks used to generate figures are available at https://github.com/ccli3896/RLWorms.git (ref. 63).

Code availability

Analysis code and training code are available at https://github.com/ccli3896/RLWorms.git (ref. 63) and https://doi.org/10.5281/zenodo.11002033 (ref. 65).

References

  1. Romano, D., Donati, E., Benelli, G. & Stefanini, C. A review on animal–robot interaction: from bio-hybrid organisms to mixed societies. Biol. Cybern. 113, 201–225 (2019).

  2. Tankus, A., Fried, I. & Shoham, S. Cognitive-motor brain–machine interfaces. J. Physiol. Paris 108, 38–44 (2014).

  3. Bostrom, N. & Sandberg, A. Cognitive enhancement: methods, ethics, regulatory challenges. Sci. Eng. Ethics 15, 311–341 (2009).

  4. Afraz, S.-R., Kiani, R. & Esteky, H. Microstimulation of inferotemporal cortex influences face categorization. Nature 442, 692–695 (2006).

  5. Bonizzato, M. & Martinez, M. An intracortical neuroprosthesis immediately alleviates walking deficits and improves recovery of leg control after spinal cord injury. Sci. Transl. Med. 13, eabb4422 (2021).

  6. Enriquez-Geppert, S., Huster, R. J. & Herrmann, C. S. Boosting brain functions: improving executive functions with behavioral training, neurostimulation, and neurofeedback. Int. J. Psychophysiol. 88, 1–16 (2013).

  7. Iturrate, I., Pereira, M. & Millán, J. d. R. Closed-loop electrical neurostimulation: challenges and opportunities. Curr. Opin. Biomed. Eng. 8, 28–37 (2018).

  8. Lafer-Sousa, R. et al. Behavioral detectability of optogenetic stimulation of inferior temporal cortex varies with the size of concurrently viewed objects. Curr. Res. Neurobiol. 4, 100063 (2023).

  9. Lu, Y. et al. Optogenetically induced spatiotemporal gamma oscillations and neuronal spiking activity in primate motor cortex. J. Neurophysiol. 113, 3574–3587 (2015).

  10. Salzman, D. C., Britten, K. H. & Newsome, W. T. Cortical microstimulation influences perceptual judgements of motion direction. Nature 346, 174–177 (1990).

  11. Schild, L. C. & Glauser, D. A. Dual color neural activation and behavior control with Chrimson and CoChR in Caenorhabditis elegans. Genetics 200, 1029–1034 (2015).

  12. Xu, J. et al. Thalamic stimulation improves postictal cortical arousal and behavior. J. Neurosci. 40, 7343–7354 (2020).

  13. Park, S.-G. et al. Medial preoptic circuit induces hunting-like actions to target objects and prey. Nat. Neurosci. 21, 364–372 (2018).

  14. Yang, J., Huai, R., Wang, H., Lv, C. & Su, X. A robo-pigeon based on an innovative multi-mode telestimulation system. Biomed. Mater. Eng. 26, S357–S363 (2015).

  15. Holzer, R. & Shimoyama, I. Locomotion control of a bio-robotic system via electric stimulation. In Proc. Institute of Electrical and Electronics Engineers/Robotics Society of Japan International Conference on Intelligent Robot and Systems. Innovative Robotics for Real-World Applications 1514–1519 (IEEE, 1997).

  16. Talwar, S. K. et al. Rat navigation guided by remote control. Nature 417, 37–38 (2002).

  17. Sato, H. et al. A cyborg beetle: insect flight control through an implantable, tetherless microsystem. In Proc. 21st Institute of Electrical and Electronics Engineers International Conference on Micro Electro Mechanical Systems 164–167 (IEEE, 2008); https://doi.org/10.1109/MEMSYS.2008.4443618

  18. Peckham, P. H. & Knutson, J. S. Functional electrical stimulation for neuromuscular applications. Annu. Rev. Biomed. Eng. 7, 327–360 (2005).

  19. Kashin, S. M., Feldman, A. G. & Orlovsky, G. N. Locomotion of fish evoked by electrical stimulation of the brain. Brain Res. 82, 41–47 (1974).

  20. Hinterwirth, A. J. et al. Wireless stimulation of antennal muscles in freely flying hawkmoths leads to flight path changes. PLoS ONE 7, e52725 (2012).

  21. Sanchez, C. J. et al. Locomotion control of hybrid cockroach robots. J. R. Soc. Interface 12, 20141363 (2015).

  22. Bergmann, E., Gofman, X., Kavushansky, A. & Kahn, I. Individual variability in functional connectivity architecture of the mouse brain. Commun. Biol. 3, 1–10 (2020).

  23. Mueller, S. et al. Individual variability in functional connectivity architecture of the human brain. Neuron 77, 586–595 (2013).

  24. Husson, S. J., Gottschalk, A. & Leifer, A. M. Optogenetic manipulation of neural activity in C. elegans: from synapse to circuits and behaviour. Biol. Cell 105, 235–250 (2013).

  25. Nagel, G. et al. Channelrhodopsin-2, a directly light-gated cation-selective membrane channel. Proc. Natl Acad. Sci. USA 100, 13940–13945 (2003).

  26. Kocabas, A., Shen, C.-H., Guo, Z. V. & Ramanathan, S. Controlling interneuron activity in Caenorhabditis elegans to evoke chemotactic behaviour. Nature 490, 273–277 (2012).

  27. Leifer, A. M., Fang-Yen, C., Gershow, M., Alkema, M. J. & Samuel, A. D. T. Optogenetic manipulation of neural activity in freely moving Caenorhabditis elegans. Nat. Methods 8, 147–152 (2011).

  28. Wen, Q. et al. Proprioceptive coupling within motor neurons drives C. elegans forward locomotion. Neuron 76, 750–761 (2012).

  29. Hernandez-Nunez, L. et al. Reverse-correlation analysis of navigation dynamics in Drosophila larva using optogenetics. eLife 4, e06225 (2015).

  30. Donnelly, J. L. et al. Monoaminergic orchestration of motor programs in a complex C. elegans behavior. PLoS Biol. 11, e1001529 (2013).

  31. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  32. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  33. Schrittwieser, J. et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020).

  34. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  35. Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  36. Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at https://arxiv.org/abs/1912.06680 (2019).

  37. Wurman, P. R. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223–228 (2022).

  38. Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).

  39. Ibarz, J. et al. How to train your robot with deep reinforcement learning: lessons we have learned. Int. J. Rob. Res. 40, 698–721 (2021).

  40. Haydari, A. & Yılmaz, Y. Deep reinforcement learning for intelligent transportation systems: a survey. IEEE Trans. Intell. Transp. Syst. 23, 11–32 (2022).

  41. Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proc. 35th International Conference on Machine Learning 1861–1870 (PMLR, 2018).

  42. Yang, X., Jiang, X.-L., Su, Z.-L. & Wang, B. Cyborg moth flight control based on fuzzy deep learning. Micromachines 13, 611 (2022).

  43. Ariyanto, M., Refat, C. M. M., Hirao, K. & Morishima, K. Movement optimization for a cyborg cockroach in a bounded space incorporating machine learning. Cyborg Bionic Syst. 4, 0012 (2023).

  44. Zheng, N. et al. Real-time and precise insect flight control system based on virtual reality. Electron. Lett. 53, 387–389 (2017).

  45. Zheng, N. et al. Abdominal-waving control of tethered bumblebees based on Sarsa with transformed reward. IEEE Trans. Cybern. 49, 3064–3073 (2019).

  46. Ardiel, E. L. & Rankin, C. H. An elegant mind: learning and memory in Caenorhabditis elegans. Learn. Mem. 17, 191–201 (2010).

  47. Kim, J. & Shlizerman, E. Deep reinforcement learning for neural control. Preprint at https://arxiv.org/abs/2006.07352 (2020).

  48. Christodoulou, P. Soft actor-critic for discrete action settings. Preprint at https://arxiv.org/abs/1910.07207 (2019).

  49. Wong, C.-C., Chien, S.-Y., Feng, H.-M. & Aoyama, H. Motion planning for dual-arm robot based on soft actor-critic. IEEE Access 9, 26871–26885 (2021).

  50. Sarma, G. P. et al. OpenWorm: overview and recent advances in integrative biological simulation of Caenorhabditis elegans. Phil. Trans. R. Soc. B 373, 20170382 (2018).

  51. Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data 6, 60 (2019).

  52. Nikishin, E. et al. Improving stability in deep reinforcement learning with weight averaging. In Uncertainty in Artificial Intelligence Workshop on Uncertainty in Deep Learning (2018).

  53. Stable Baselines 2.10.2 documentation. Reinforcement Learning Resources https://stable-baselines.readthedocs.io/en/master/guide/rl.html (2021).

  54. Bhardwaj, A., Thapliyal, S., Dahiya, Y. & Babu, K. FLP-18 functions through the G-protein-coupled receptors NPR-1 and NPR-4 to modulate reversal length in Caenorhabditis elegans. J. Neurosci. 38, 4641–4654 (2018).

  55. Riddle, D. L., Blumenthal, T., Meyer, B. J. & Priess, J. R. Mechanosensory Control of Locomotion. C. elegans II 2nd edn (Cold Spring Harbor Laboratory Press, 1997).

  56. Brandt, R., Gergou, A., Wacker, I., Fath, T. & Hutter, H. A Caenorhabditis elegans model of tau hyperphosphorylation: induction of developmental defects by transgenic overexpression of Alzheimer’s disease-like modified tau. Neurobiol. Aging 30, 22–33 (2009).

  57. Jospin, M. et al. A neuronal acetylcholine receptor regulates the balance of muscle excitation and inhibition in Caenorhabditis elegans. PLoS Biol. 7, e1000265 (2009).

  58. Hollenstein, J., Auddy, S., Saveriano, M., Renaudo, E. & Piater, J. Action noise in off-policy deep reinforcement learning: impact on exploration and performance. Transactions on Machine Learning Research (2022); https://openreview.net/forum?id=NljBlZ6hmG

  59. Andersen, R. A., Aflalo, T., Bashford, L., Bjånes, D. & Kellis, S. Exploring cognition with brain–machine interfaces. Annu. Rev. Psychol. 73, 131–158 (2022).

  60. Sussillo, D., Stavisky, S. D., Kao, J. C., Ryu, S. I. & Shenoy, K. V. Making brain–machine interfaces robust to future neural variability. Nat. Commun. 7, 1–13 (2016).

  61. Dong, X. et al. Toward a living soft microrobot through optogenetic locomotion control of Caenorhabditis elegans. Sci. Robot. 6, eabe3950 (2021).

  62. Tandon, P. pytorch-soft-actor-critic. GitHub https://github.com/pranz24/pytorch-soft-actor-critic (2022).

  63. Li, C. RLWorms. GitHub https://github.com/ccli3896/RLWorms.git (2024).

  64. Kazemipour, A. Discrete SAC PyTorch. GitHub https://github.com/alirezakazemipour/Discrete-SAC-PyTorch (2020).

  65. Li, C. RLWorms. Zenodo https://doi.org/10.5281/zenodo.11002033 (2024).


Acknowledgements

We thank S. Bhupatiraju for discussions about RL and comments on the manuscript. We thank T. Hallacy and A. Yonar for guidance in C. elegans experiments and C. McCartan for input on statistical analyses. We thank J. Lee for providing customized high-power LED light sources. We thank K. Blum, C. Pehlevan, G. Anand, A. Bacanu, B. Brissette, D. Hidalgo, R. Huang, H. Megale, W. Weiter, Y. I. Yaman, V. Zhuang and S. Zwick for comments on the manuscript. This work was supported in part by National Institute of General Medical Sciences grant no. 1R01NS117908-01 (S.R.), the Dean’s Competitive Fund from Harvard University (S.R., C.L.), National Institutes of Health grant no. R01EY026025 (G.K.), the Fetzer Foundation (G.K.) and a National Science Foundation Graduate Research Fellowship Program fellowship (C.L.).

Author information

Authors and Affiliations

Authors

Contributions

All the authors designed the study. C.L. wrote code, performed experiments and did data analysis. All the authors wrote the manuscript.

Corresponding authors

Correspondence to Chenguang Li, Gabriel Kreiman or Sharad Ramanathan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Artur Luczak and Greg Wayne for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–6, Tables 1 and 2 and Videos 1–6.

Reporting Summary

Supplementary Video 1

Line 1 with trained agent. The video is sped up 8× and shows the animal being led to the target by the trained RL agent coupled to neurons in Line 1. The target is a red circle, and light flashes are denoted by a blue frame in this and all subsequent supplementary videos.

Supplementary Video 2

Line 1 with random light flashing. The random flash rate was matched to the average proportion of time the light was on across all standard evaluations in Fig. 2g, calculated to be 0.4647. The video is sped up 8×. When random flashes of light activate the neurons in Line 1, the animal is unable to reach the target.

Supplementary Video 3

Line 2 with trained agent. The video is sped up 8× and shows that after training, unlike in the control (Video 4), the RL agent excited the neurons it was coupled to in Line 2 to direct the animal to the target.

Supplementary Video 4

Line 2 with random light flashing. The random flash rate was matched to the average proportion of time the light was on across all standard evaluations for Line 2 in Fig. 3b, calculated to be 0.2896. The video is sped up 8×. When random light flashes are used to activate neurons in Line 2 (see Fig. 3a and Supplementary Table 1 for line details), the animal is unable to reach the target.

Supplementary Video 5

Line 3 with trained agent. The video is sped up 8× and shows that the trained RL agent learned appropriate patterns of light flashes, based on the animal’s posture, to lead it to the target.

Supplementary Video 6

Line 3 with random light flashing. The random flash rate was matched to the average proportion of time the light was on across all standard evaluations for Line 3 in Fig. 3b, calculated to be 0.3844. The video is sped up 8×. Again, with random flashes of light inhibiting neurons in Line 3, the animal does not reach the target.
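
The random-flash controls in Videos 2, 4 and 6 share one structure: each frame, the light is switched on with a fixed probability matched to the trained agent's measured light-on proportion for that line. A minimal sketch of such a rate-matched Bernoulli schedule follows; the function name, frame count and per-frame sampling are assumptions for illustration, not the authors' code.

    import numpy as np

    # Rate-matched random control: per frame, light on with probability
    # equal to the trained agent's light-on proportion for that line
    # (values from the captions above). Frame count is hypothetical.

    LIGHT_ON_PROPORTION = {'line 1': 0.4647, 'line 2': 0.2896, 'line 3': 0.3844}

    def random_flash_schedule(line, n_frames, seed=0):
        rng = np.random.default_rng(seed)
        return rng.random(n_frames) < LIGHT_ON_PROPORTION[line]  # True = light on

    schedule = random_flash_schedule('line 2', n_frames=2400)
    print(f"fraction of frames with light on: {schedule.mean():.4f}")  # ~0.29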

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, C., Kreiman, G. & Ramanathan, S. Discovering neural policies to drive behaviour by integrating deep reinforcement learning agents with biological neural networks. Nat Mach Intell 6, 726–738 (2024). https://doi.org/10.1038/s42256-024-00854-2

