Abstract
Deep reinforcement learning (RL) has been successful in a variety of domains but has not yet been directly used to learn biological tasks by interacting with a living nervous system. As proof of principle, we show how to create such a hybrid system trained on a target-finding task. Using optogenetics, we interfaced the nervous system of the nematode Caenorhabditis elegans with a deep RL agent. Agents adapted to strikingly different sites of neural integration and learned site-specific activations to guide animals towards a target, including in cases where agents interfaced with sets of neurons with previously uncharacterized responses to optogenetic modulation. Agents were analysed by plotting their learned policies to understand how different sets of neurons were used to guide movement. Further, the animal and agent generalized to new environments using the same learned policies in food-search tasks, showing that the system achieved cooperative computation rather than the agent acting as a controller for a soft robot. Our system demonstrates that deep RL is a viable tool both for learning how neural circuits can produce goal-directed behaviour and for improving biologically relevant behaviour in a flexible way.
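The abstract describes an agent that observes the animal and issues optogenetic light commands to drive it toward a target; the paper's agent is a soft actor-critic network (Haarnoja et al.; Christodoulou). Purely as an illustration of the closed-loop idea, the sketch below uses tabular Q-learning on a toy one-dimensional "worm", with reward defined as the reduction in distance to the target and a binary light action. The dynamics, track, parameters and learning algorithm here are all placeholder assumptions, not the paper's method.

```python
import random

def reward(prev_pos, pos, target):
    """Reward shaping used in this toy: reduction in distance to the target."""
    return abs(prev_pos - target) - abs(pos - target)

def step(pos, action, rng):
    """Placeholder dynamics: light ON (1) biases motion toward the target end
    of a 0..20 track, light OFF (0) biases it away, with 20% noise."""
    drift = 1 if action == 1 else -1
    move = drift if rng.random() < 0.8 else -drift
    return max(0, min(20, pos + move))

def train(episodes=500, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning stand-in for the paper's deep RL agent."""
    rng = random.Random(seed)
    target = 20
    q = {(s, a): 0.0 for s in range(21) for a in (0, 1)}
    for _ in range(episodes):
        pos = rng.randrange(0, 15)
        for _ in range(100):
            if rng.random() < eps:
                a = rng.choice((0, 1))                       # explore
            else:
                a = max((0, 1), key=lambda x: q[(pos, x)])   # exploit
            nxt = step(pos, a, rng)
            r = reward(pos, nxt, target)
            q[(pos, a)] += alpha * (r + gamma * max(q[(nxt, 0)], q[(nxt, 1)]) - q[(pos, a)])
            pos = nxt
            if pos == target:
                break
    return q

q = train()
```

After training, the greedy policy prefers the light-on action across the track, i.e. the agent has learned which stimulation pattern moves the toy worm toward the target; the real agents learned site-specific, posture-dependent policies rather than a constant duty cycle.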
Data availability
All processed animal tracks used to generate figures are available at https://github.com/ccli3896/RLWorms.git (ref. 63).
Code availability
Analysis code and training code are available at https://github.com/ccli3896/RLWorms.git (ref. 63) and https://doi.org/10.5281/zenodo.11002033 (ref. 65).
References
Romano, D., Donati, E., Benelli, G. & Stefanini, C. A review on animal–robot interaction: from bio-hybrid organisms to mixed societies. Biol. Cybern. 113, 201–225 (2019).
Tankus, A., Fried, I. & Shoham, S. Cognitive-motor brain–machine interfaces. J. Physiol. Paris 108, 38–44 (2014).
Bostrom, N. & Sandberg, A. Cognitive enhancement: methods, ethics, regulatory challenges. Sci. Eng. Ethics 15, 311–341 (2009).
Afraz, S.-R., Kiani, R. & Esteky, H. Microstimulation of inferotemporal cortex influences face categorization. Nature 442, 692–695 (2006).
Bonizzato, M. & Martinez, M. An intracortical neuroprosthesis immediately alleviates walking deficits and improves recovery of leg control after spinal cord injury. Sci. Transl. Med. 13, eabb4422 (2021).
Enriquez-Geppert, S., Huster, R. J. & Herrmann, C. S. Boosting brain functions: Improving executive functions with behavioral training, neurostimulation, and neurofeedback. Int. J. Psychophysiol. 88, 1–16 (2013).
Iturrate, I., Pereira, M. & Millán, J. del R. Closed-loop electrical neurostimulation: challenges and opportunities. Curr. Opin. Biomed. Eng. 8, 28–37 (2018).
Lafer-Sousa, R. et al. Behavioral detectability of optogenetic stimulation of inferior temporal cortex varies with the size of concurrently viewed objects. Curr. Res. Neurobiol. 4, 100063 (2023).
Lu, Y. et al. Optogenetically induced spatiotemporal gamma oscillations and neuronal spiking activity in primate motor cortex. J. Neurophysiol. 113, 3574–3587 (2015).
Salzman, D. C., Britten, K. H. & Newsome, W. T. Cortical microstimulation influences perceptual judgements of motion direction. Nature 346, 174–177 (1990).
Schild, L. C. & Glauser, D. A. Dual color neural activation and behavior control with Chrimson and CoChR in Caenorhabditis elegans. Genetics 200, 1029–1034 (2015).
Xu, J. et al. Thalamic stimulation improves postictal cortical arousal and behavior. J. Neurosci. 40, 7343–7354 (2020).
Park, S.-G. et al. Medial preoptic circuit induces hunting-like actions to target objects and prey. Nat. Neurosci. 21, 364–372 (2018).
Yang, J., Huai, R., Wang, H., Lv, C. & Su, X. A robo-pigeon based on an innovative multi-mode telestimulation system. Biomed. Mater. Eng. 26, S357–S363 (2015).
Holzer, R. & Shimoyama, I. Locomotion control of a bio-robotic system via electric stimulation. In Proc. Institute of Electrical and Electronics Engineers/Robotics Society of Japan International Conference on Intelligent Robot and Systems. Innovative Robotics for Real-World Applications 1514–1519 (IEEE, 1997).
Talwar, S. K. et al. Rat navigation guided by remote control. Nature 417, 37–38 (2002).
Sato, H. et al. A cyborg beetle: insect flight control through an implantable, tetherless microsystem. In Proc. 21st Institute of Electrical and Electronics Engineers International Conference on Micro Electro Mechanical Systems 164–167 (IEEE, 2008); https://doi.org/10.1109/MEMSYS.2008.4443618
Peckham, P. H. & Knutson, J. S. Functional electrical stimulation for neuromuscular applications. Annu. Rev. Biomed. Eng. 7, 327–360 (2005).
Kashin, S. M., Feldman, A. G. & Orlovsky, G. N. Locomotion of fish evoked by electrical stimulation of the brain. Brain Res. 82, 41–47 (1974).
Hinterwirth, A. J. et al. Wireless stimulation of antennal muscles in freely flying Hawkmoths leads to flight path changes. PLoS ONE 7, e52725 (2012).
Sanchez, C. J. et al. Locomotion control of hybrid cockroach robots. J. R. Soc. Interface 12, 20141363 (2015).
Bergmann, E., Gofman, X., Kavushansky, A. & Kahn, I. Individual variability in functional connectivity architecture of the mouse brain. Commun. Biol. 3, 1–10 (2020).
Mueller, S. et al. Individual variability in functional connectivity architecture of the human brain. Neuron 77, 586–595 (2013).
Husson, S. J., Gottschalk, A. & Leifer, A. M. Optogenetic manipulation of neural activity in C. elegans: from synapse to circuits and behaviour. Biol. Cell 105, 235–250 (2013).
Nagel, G. et al. Channelrhodopsin-2, a directly light-gated cation-selective membrane channel. Proc. Natl Acad. Sci. USA 100, 13940–13945 (2003).
Kocabas, A., Shen, C.-H., Guo, Z. V. & Ramanathan, S. Controlling interneuron activity in Caenorhabditis elegans to evoke chemotactic behaviour. Nature 490, 273–277 (2012).
Leifer, A. M., Fang-Yen, C., Gershow, M., Alkema, M. J. & Samuel, A. D. T. Optogenetic manipulation of neural activity in freely moving Caenorhabditis elegans. Nat. Methods 8, 147–152 (2011).
Wen, Q. et al. Proprioceptive coupling within motor neurons drives C. elegans forward locomotion. Neuron 76, 750–761 (2012).
Hernandez-Nunez, L. et al. Reverse-correlation analysis of navigation dynamics in Drosophila larva using optogenetics. eLife 4, e06225 (2015).
Donnelly, J. L. et al. Monoaminergic orchestration of motor programs in a complex C. elegans behavior. PLoS Biol. 11, e1001529 (2013).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Schrittwieser, J. et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Berner, C. et al. Dota 2 with large scale deep reinforcement learning. Preprint at http://arxiv.org/abs/1912.06680 (2019).
Wurman, P. R. et al. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nature 602, 223–228 (2022).
Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).
Ibarz, J. et al. How to train your robot with deep reinforcement learning: lessons we have learned. Int. J. Rob. Res. 40, 698–721 (2021).
Haydari, A. & Yılmaz, Y. Deep reinforcement learning for intelligent transportation systems: a survey. IEEE Trans. Intell. Transp. Syst. 23, 11–32 (2022).
Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proc. 35th International Conference on Machine Learning 1861–1870 (PMLR, 2018).
Yang, X., Jiang, X.-L., Su, Z.-L. & Wang, B. Cyborg moth flight control based on fuzzy deep learning. Micromachines 13, 611 (2022).
Ariyanto, M., Refat, C. M. M., Hirao, K. & Morishima, K. Movement optimization for a cyborg cockroach in a bounded space incorporating machine learning. Cyborg Bionic Syst. 4, 0012 (2023).
Zheng, N. et al. Real-time and precise insect flight control system based on virtual reality. Electron. Lett. 53, 387–389 (2017).
Zheng, N. et al. Abdominal-waving control of tethered bumblebees based on sarsa with transformed reward. IEEE Trans. Cybern. 49, 3064–3073 (2019).
Ardiel, E. L. & Rankin, C. H. An elegant mind: learning and memory in Caenorhabditis elegans. Learn. Mem. 17, 191–201 (2010).
Kim, J. & Shlizerman, E. Deep reinforcement learning for neural control. Preprint at https://arxiv.org/abs/2006.07352 (2020).
Christodoulou, P. Soft actor-critic for discrete action settings. Preprint at https://arxiv.org/abs/1910.07207 (2019).
Wong, C.-C., Chien, S.-Y., Feng, H.-M. & Aoyama, H. Motion planning for dual-arm robot based on soft actor-critic. IEEE Access 9, 26871–26885 (2021).
Sarma, G. P. et al. OpenWorm: overview and recent advances in integrative biological simulation of Caenorhabditis elegans. Phil. Trans. R. Soc. B 373, 20170382 (2018).
Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data 6, 60 (2019).
Nikishin, E. et al. Improving stability in deep reinforcement learning with weight averaging. In Uncertainty in Artificial Intelligence Workshop on Uncertainty in Deep Learning (2018).
Stable Baselines 2.10.2 documentation. Reinforcement Learning Resources https://stable-baselines.readthedocs.io/en/master/guide/rl.html (2021).
Bhardwaj, A., Thapliyal, S., Dahiya, Y. & Babu, K. FLP-18 functions through the G-protein-coupled receptors NPR-1 and NPR-4 to modulate reversal length in Caenorhabditis elegans. J. Neurosci. 38, 4641–4654 (2018).
Riddle, D. L., Blumenthal, T., Meyer, B. J. & Priess, J. R. Mechanosensory Control of Locomotion. C. elegans II 2nd edn (Cold Spring Harbor Laboratory Press, 1997).
Brandt, R., Gergou, A., Wacker, I., Fath, T. & Hutter, H. A Caenorhabditis elegans model of tau hyperphosphorylation: induction of developmental defects by transgenic overexpression of Alzheimer’s disease-like modified tau. Neurobiol. Aging 30, 22–33 (2009).
Jospin, M. et al. A neuronal acetylcholine receptor regulates the balance of muscle excitation and inhibition in Caenorhabditis elegans. PLoS Biol. 7, e1000265 (2009).
Hollenstein, J., Auddy, S., Saveriano, M., Renaudo, E. & Piater, J. Action noise in off-policy deep reinforcement learning: impact on exploration and performance. Trans. Mach. Learn. Res. (2022); https://openreview.net/forum?id=NljBlZ6hmG
Andersen, R. A., Aflalo, T., Bashford, L., Bjånes, D. & Kellis, S. Exploring cognition with brain–machine interfaces. Annu. Rev. Psychol. 73, 131–158 (2022).
Sussillo, D., Stavisky, S. D., Kao, J. C., Ryu, S. I. & Shenoy, K. V. Making brain–machine interfaces robust to future neural variability. Nat. Commun. 7, 1–13 (2016).
Dong, X. et al. Toward a living soft microrobot through optogenetic locomotion control of Caenorhabditis elegans. Sci. Robot. 6, eabe3950 (2021).
Tandon, P. pytorch-soft-actor-critic. GitHub https://github.com/pranz24/pytorch-soft-actor-critic (2022).
Li, C. RLWorms. GitHub https://github.com/ccli3896/RLWorms.git (2024).
Kazemipour, A. Discrete-SAC-PyTorch. GitHub https://github.com/alirezakazemipour/Discrete-SAC-PyTorch (2020).
Li, C. RLWorms. Zenodo https://doi.org/10.5281/zenodo.11002033 (2024).
Acknowledgements
We thank S. Bhupatiraju for discussions about RL and comments on the manuscript. We thank T. Hallacy and A. Yonar for guidance in C. elegans experiments and C. McCartan for input on statistical analyses. We thank J. Lee for providing customized high-power LED light sources. We thank K. Blum, C. Pehlevan, G. Anand, A. Bacanu, B. Brissette, D. Hidalgo, R. Huang, H. Megale, W. Weiter, Y. Ilker Yaman, V. Zhuang and S. Zwick for comments on the manuscript. This work was supported in part by National Institute of General Medical Sciences grant no. 1R01NS117908-01 (S.R.), the Dean’s Competitive Fund from Harvard University (S.R., C.L.), National Institutes of Health grant no. R01EY026025 (G.K.), the Fetzer Foundation (G.K.) and a National Science Foundation Graduate Research Fellowship Program fellowship (C.L.).
Author information
Contributions
All the authors designed the study. C.L. wrote code, performed experiments and did data analysis. All the authors wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Artur Luczak and Greg Wayne for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–6, Tables 1 and 2 and Videos 1–6.
Supplementary Video 1
Line 1 with trained agent. The video is sped up by 8× and shows the animal being led to the target by the trained RL agent when coupled to neurons in Line 1. The target is a red circle, and light flashes are denoted by a blue frame in this and all subsequent supplementary videos.
Supplementary Video 2
Line 1 with random light flashing. The random frequency was matched to the average proportion of light on measured across all standard evaluations in Fig. 2g, calculated to be 0.4647. The video is sped up by 8×. When random flashes of light activate the neurons in Line 1, the animal is unable to reach the target.
Supplementary Video 3
Line 2 with trained agent. The video is sped up by 8× and shows that after training, unlike in the control (Video 4), the RL agent excited the neurons it was coupled to in Line 2 to direct the animal to the target.
Supplementary Video 4
Line 2 with random light flashing. The random frequency was matched to the average proportion of light on measured across all standard evaluations for Line 2 in Fig. 3b, calculated to be 0.2896. The video is sped up by 8×. When random light flashes are used to activate neurons in Line 2 (see Fig. 3a and Supplementary Table 1 for line details), the animal is unable to reach the target.
Supplementary Video 5
Line 3 with trained agent. The video is sped up by 8× and shows that the trained RL agent is able to learn appropriate patterns of light flashes, based on the animal’s posture, to lead it to the target.
Supplementary Video 6
Line 3 with random light flashing. The random frequency was matched to the average proportion of light on measured across all standard evaluations for Line 3 in Fig. 3b, calculated to be 0.3844. The video is sped up by 8×. Again, with random flashes of light inhibiting neurons in Line 3, the animal does not reach the target.
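The random-light controls in Supplementary Videos 2, 4 and 6 flash the light at random while matching the trained agent's average light-on proportion for each line. A minimal sketch of such a duty-cycle-matched Bernoulli schedule follows; the function name, per-step timing and step count are assumptions for illustration, not the paper's implementation.

```python
import random

# Average light-on proportions measured across standard evaluations
# (Fig. 2g for Line 1; Fig. 3b for Lines 2 and 3), as quoted in the captions.
LIGHT_ON_PROPORTION = {"Line 1": 0.4647, "Line 2": 0.2896, "Line 3": 0.3844}

def random_flash_schedule(line, n_steps, seed=None):
    """Sample an independent light on/off decision per control step so the
    long-run duty cycle matches the trained agent for the given line."""
    rng = random.Random(seed)
    p = LIGHT_ON_PROPORTION[line]
    return [rng.random() < p for _ in range(n_steps)]
```

Because each step is sampled independently, the schedule reproduces the agent's stimulation budget but none of its state-dependent structure, which is why the animals under random flashing fail to reach the target.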
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, C., Kreiman, G. & Ramanathan, S. Discovering neural policies to drive behaviour by integrating deep reinforcement learning agents with biological neural networks. Nat Mach Intell 6, 726–738 (2024). https://doi.org/10.1038/s42256-024-00854-2
This article is cited by
- Credible inferences in microbiome research: ensuring rigour, reproducibility and relevance in the era of AI. Nature Reviews Gastroenterology & Hepatology (2025).