Abstract
Prediction models that generate neuronal spikes from upstream neural activities offer a promising way to re-establish neural functional connectivity. Traditional methods train these models by supervised learning, which requires downstream recordings as ground truth. However, functional downstream activity cannot be recorded when neurological disorders exist. Here we introduce a reinforcement learning (RL)-based point process framework to generate spike trains that directly maximize behavior-level rewards, thus bypassing downstream recordings. This yields a generative spike model that directly transforms upstream activity into spike patterns modulated to desired behavior. We show that these RL-based generative models produce movement-modulated spike patterns akin to downstream recordings from healthy subjects, providing a biomimetic spike encoding framework. This RL framework outperforms existing methods and demonstrates a strong adaptation capability across different decoder settings, highlighting its potential for neural prostheses in restoring transregional communication with biomimetic cortical stimulation.
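As a minimal sketch of the idea, assuming a toy setting that is not the authors' RLPP implementation: a single output neuron emits one Bernoulli spike per time bin, with probability given by a logistic function of a two-channel upstream input. A hypothetical decoder reads a spike as "movement", and a REINFORCE-style update adjusts the weights using only the behavior-level reward. All names here (`train_spike_policy`, `baseline`, the two-channel input) are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real-valued drive to a spike probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_spike_policy(n_trials=2000, lr=0.5, baseline=0.5, seed=0):
    """Train a one-neuron Bernoulli spike generator from rewards only.

    Assumed toy task: behavior 0 (rest) activates upstream channel 0,
    behavior 1 (move) activates upstream channel 1.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]  # one weight per upstream channel
    for t in range(n_trials):
        behavior = t % 2  # alternate the intended behavior
        x = [1.0, 0.0] if behavior == 0 else [0.0, 1.0]
        p = sigmoid(w[0] * x[0] + w[1] * x[1])  # spike probability this bin
        spike = 1 if rng.random() < p else 0    # sample the generated spike
        decoded = spike                         # hypothetical decoder: spike means "move"
        reward = 1.0 if decoded == behavior else 0.0
        advantage = reward - baseline
        # REINFORCE: d/dw log P(spike) = (spike - p) * x for a Bernoulli-logistic unit
        for i in range(2):
            w[i] += lr * advantage * (spike - p) * x[i]
    return w


weights = train_spike_policy()
p_rest = sigmoid(weights[0])  # spike probability when the intended behavior is rest
p_move = sigmoid(weights[1])  # spike probability when the intended behavior is move
```

No downstream recording enters the update; the only teaching signal is whether the decoded behavior matched the intended one.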
Data availability
The datasets that support the findings of this study include both publicly available and proprietary data. Public datasets used in this work are available via refs. 19,20,21. The proprietary dataset was collected from Sprague-Dawley rats for a previously published study (ref. 60). The data are available for research purposes from the corresponding author. A minimal subset of the proprietary dataset, sufficient to interpret and verify the research, is available via GitHub at https://github.com/WuShenghui97/RLPP and via Zenodo at https://doi.org/10.5281/zenodo.17221566 (ref. 64). Source data are provided with this manuscript.
Code availability
The RLPP framework code is available via GitHub at https://github.com/WuShenghui97/RLPP and via Zenodo at https://doi.org/10.5281/zenodo.17221566 (ref. 64).
References
Rao, R. P. Towards neural co-processors for the brain: combining decoding and encoding in brain–computer interfaces. Curr. Opin. Neurobiol. 55, 142–151 (2019).
Belkacem, A. N., Jamil, N., Khalid, S. & Alnajjar, F. On closed-loop brain stimulation systems for improving the quality of life of patients with neurological disorders. Front. Hum. Neurosci. 17, 1085173 (2023).
Bouton, C. E. et al. Restoring cortical control of functional movement in a human with quadriplegia. Nature 533, 247–250 (2016).
Ajiboye, A. B. et al. Restoration of reaching and grasping movements through brain-controlled muscle stimulation in a person with tetraplegia: a proof-of-concept demonstration. Lancet 389, 1821–1830 (2017).
Capogrosso, M. et al. A brain–spine interface alleviating gait deficits after spinal cord injury in primates. Nature 539, 284–288 (2016).
Bryan, M. J., Jiang, L. P. & Rao, R. P. N. Neural co-processors for restoring brain function: results from a cortical model of grasping. J. Neural Eng. 20, 036004 (2023).
Deadwyler, S. A. et al. A cognitive prosthesis for memory facilitation by closed-loop functional ensemble stimulation of hippocampal neurons in primate brain. Exp. Neurol. 287, 452–460 (2017).
Hampson, R. E. et al. Developing a hippocampal neural prosthetic to facilitate human memory encoding and recall. J. Neural Eng. 15, 036014 (2018).
Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P. & Brown, E. N. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. J. Neurophysiol. 93, 1074–1089 (2005).
Song, D. et al. Nonlinear dynamical modeling of human hippocampal CA3-CA1 functional connectivity for memory prostheses. In Proc. 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER) 316–319 (IEEE, 2015).
Qian, C. et al. Binless kernel machine: modeling spike train transformation for cognitive neural prostheses. Neural Comput. 32, 1863–1900 (2020).
Choi, J. S. et al. Eliciting naturalistic cortical responses with a sensory prosthesis via optimized microstimulation. J. Neural Eng. 13, 056007 (2016).
Upadhyay, U., De, A. & Gomez-Rodriguez, M. Deep reinforcement learning of marked temporal point processes. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) 3172–3182 (Curran Associates Inc., 2018).
Li, S. et al. Learning temporal point processes via reinforcement learning. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) 10804–10814 (Curran Associates Inc., 2018).
Zhu, S., Li, S., Peng, Z. & Xie, Y. Imitation learning of neural spatio-temporal point processes. IEEE Trans. Knowl. Data Eng. 34, 5391–5402 (2022).
DiGiovanna, J., Mahmoudi, B., Fortes, J., Principe, J. C. & Sanchez, J. C. Coadaptive brain–machine interface via reinforcement learning. IEEE Trans. Biomed. Eng. 56, 54–64 (2009).
Marsh, B. T., Tarigoppula, V. S. A., Chen, C. & Francis, J. T. Toward an autonomous brain machine interface: integrating sensorimotor reward modulation and reinforcement learning. J. Neurosci. 35, 7374–7387 (2015).
Shen, X., Zhang, X., Huang, Y., Chen, S. & Wang, Y. Task learning over multi-day recording via internally rewarded reinforcement learning based brain machine interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 28, 3089–3099 (2020).
International Brain Laboratory et al. A brain-wide map of neural activity during complex behaviour. Nature 645, 177–191 (2025).
Steinmetz, N., Zatka-Haas, P., Carandini, M. & Harris, K. Main dataset from Steinmetz et al. 2019. figshare https://doi.org/10.6084/M9.FIGSHARE.9598406.V2 (2019).
Steinmetz, N. A., Zatka-Haas, P., Carandini, M. & Harris, K. D. Distributed coding of choice, action and engagement across the mouse brain. Nature 576, 266–273 (2019).
Narayanan, N. S. & Laubach, M. Top-down control of motor cortex ensembles by dorsomedial prefrontal cortex. Neuron 52, 921–931 (2006).
Li, W. et al. The neural mechanism exploration of adaptive motor control: dynamical economic cell allocation in the primary motor cortex. IEEE Trans. Neural Syst. Rehabil. Eng. 25, 492–501 (2016).
Haan, R. D. et al. Neural representation of motor output, context and behavioral adaptation in rat medial prefrontal cortex during learned behavior. Front. Neural Circuits 12, 75 (2018).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Seidler, R. D., Kwak, Y., Fling, B. W. & Bernard, J. A. in Progress in Motor Control (eds Richardson, M. J. et al.) Vol. 782, 39–60 (Springer, 2013).
Cross, L., Cockburn, J., Yue, Y. & O’Doherty, J. P. Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron 109, 724–738.e7 (2020).
Domenech, P., Rheims, S. & Koechlin, E. Neural mechanisms resolving exploitation-exploration dilemmas in the medial prefrontal cortex. Science 369, eabb0184 (2020).
Sugawara, M. & Katahira, K. Dissociation between asymmetric value updating and perseverance in human reinforcement learning. Sci. Rep. 11, 3574 (2021).
Bermudez-Contreras, E. Deep reinforcement learning to study spatial navigation, learning and memory in artificial and biological agents. Biol. Cybern. 115, 131–134 (2021).
Lubianiker, N., Paret, C., Dayan, P. & Hendler, T. Neurofeedback through the lens of reinforcement learning. Trends Neurosci. 45, 579–593 (2022).
Carmena, J. M., Ganguly, K., Dimitrov, D. F. & Wallis, J. D. Reversible large-scale modification of cortical networks during neuroprosthetic control. Nat. Neurosci. 14, 662–667 (2011).
Orsborn, A. L. et al. Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control. Neuron 82, 1380–1393 (2014).
Zhao, Y., Hessburg, J. P., Kumar, J. N. A. & Francis, J. T. Paradigm shift in sensorimotor control research and brain machine interface control: the influence of context on sensorimotor representations. Front. Neurosci. 12, 579 (2018).
Sakellaridi, S. et al. Intrinsic variable learning for brain–machine interface control by human anterior intraparietal cortex. Neuron 102, 694–705.e3 (2019).
Rowald, A. et al. Activity-dependent spinal cord neuromodulation rapidly restores trunk and leg motor functions after complete paralysis. Nat. Med. 28, 260–271 (2022).
Bonizzato, M. et al. Autonomous optimization of neuroprosthetic stimulation parameters that drive the motor cortex and spinal cord outputs in rats and monkeys. Cell Rep. Med. 4, 101008 (2023).
Nieves-Vazquez, H. A., Kim, E. & Ueda, J. Closed-loop estimation of individualized inter-stimulus interval window for transient neuromodulation via paired mechanical and brain stimulation. IEEE. Trans. Med. Robot. Bionics 5, 110–119 (2023).
Golub, M. D. et al. Learning by neural reassociation. Nat. Neurosci. 21, 607–616 (2018).
Mahmoudi, B. & Sanchez, J. C. A symbiotic brain–machine interface through value-based decision making. PLoS ONE 6, e14760 (2011).
Fidêncio, A. X., Klaes, C. & Iossifidis, I. Error-related potentials in reinforcement learning-based brain-machine interfaces. Front. Hum. Neurosci. 16, 806517 (2022).
Tan, J., Zhang, X., Wu, S., Song, Z. & Wang, Y. Hidden brain state-based internal evaluation using kernel inverse reinforcement learning in brain-machine interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 32, 4219–4229 (2024).
Valle, G. et al. Biomimetic computer-to-brain communication enhancing naturalistic touch sensations via peripheral nerve stimulation. Nat. Commun. 15, 1151 (2024).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).
Schulman, J., Moritz, P., Levine, S., Jordan, M. & Abbeel, P. High-dimensional continuous control using generalized advantage estimation. Preprint at https://arxiv.org/abs/1506.02438 (2018).
Mei, H. & Eisner, J. M. The neural Hawkes process: a neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, Inc., 2017).
Cunningham, J. P. & Yu, B. M. Dimensionality reduction for large-scale neural recordings. Nat. Neurosci. 17, 1500–1509 (2014).
Sadtler, P. T. et al. Neural constraints on learning. Nature 512, 423–426 (2014).
Gallego, J. A., Perich, M. G., Miller, L. E. & Solla, S. A. Neural manifolds for the control of movement. Neuron 94, 978–984 (2017).
Wärnberg, E. & Kumar, A. Perturbing low dimensional activity manifolds in spiking neuronal networks. PLoS Comput. Biol. 15, e1007074 (2019).
Gallego, J. A., Perich, M. G., Chowdhury, R. H., Solla, S. A. & Miller, L. E. Long-term stability of cortical population dynamics underlying consistent behavior. Nat. Neurosci. 23, 260–270 (2020).
Perich, M. G., Narain, D. & Gallego, J. A. A neural manifold view of the brain. Nat. Neurosci. 28, 1582–1597 (2025).
Churchland, M. M. et al. Neural population dynamics during reaching. Nature 487, 51–56 (2012).
Pandarinath, C. et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nat. Methods 15, 805–815 (2018).
Abbaspourazad, H., Choudhury, M., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Multiscale low-dimensional motor cortical state dynamics predict naturalistic reach-and-grasp behavior. Nat. Commun. 12, 607 (2021).
Safaie, M. et al. Preserved neural dynamics across animals performing similar behaviour. Nature 623, 765–771 (2023).
Wu, S., Zhang, X. & Wang, Y. Neural manifold constraint for spike prediction models under behavioral reinforcement. IEEE Trans. Neural Syst. Rehabil. Eng. 32, 2772–2781 (2024).
Fetz, E. E., Jackson, A. & Mavoori, J. Long-term motor cortex plasticity induced by an electronic neural implant. Nature 444, 56–60 (2006).
Guggenmos, D. J. et al. Restoration of function after brain damage using a neural prosthesis. Proc. Natl Acad. Sci. USA 110, 21177–21182 (2013).
Wu, S. et al. Spike prediction on primary motor cortex from medial prefrontal cortex during task learning. J. Neural Eng. 19, 046025 (2022).
Hawkes, A. G. Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, 83–90 (1971).
Møller, M. F. A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw. 6, 525–533 (1993).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).
Wu, S. et al. A generative spike prediction model using behavioral reinforcement for re-establishing neural functional connectivity. Zenodo https://doi.org/10.5281/ZENODO.17221566 (2025).
Acknowledgements
This work was supported by STI 2030-Major Projects under grant no. 2021ZD0200403, the Research Grants Council of the Hong Kong Special Administrative Region, China (project no. HKUST C6049-24G), Special Research Support from the Chau Hoi Shuen Foundation under grant no. R9051, the Seed Fund of the Big Data for Bio-Intelligence Laboratory from HKUST under grant no. Z0428 and the Innovation and Technology Commission under grant no. ITCPD/17-9.
Author information
Contributions
S.W. and Y.W. conceived the study, developed the methodology and performed data analyses. S.W., Z.S. and Z.W. developed the code. S.W. and Z.S. performed the analyses for the video demonstration. J.T. and M.L. conducted implementations and analyzed results on open datasets. X.Z., Y.H., S.C. and X.S. performed the rat experiments and collected the dataset. Y.C. and K.L. contributed to histology imaging and electrophysiology studies. S.W. wrote the paper. D.F., J.C.P. and Y.W. reviewed and revised the paper. Y.W. supervised the work. All authors helped to prepare or edit the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Daniel N. Zdeblick and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Video 1 | Evaluation of RLPP-generated spike trains in online decoding. The video demonstrates the behavioral experimental setup and online decoding results. The rat was performing the two-lever discrimination task. The recorded spike trains of M1 neurons were decoded in real time into trajectories in a 2D space by a Kalman filter (KF) as an online prosthetic control. The spike trains generated by the RLPP model were classified by the movement decoder and subsequently decoded into 2D trajectories by a KF. A constraint was applied during model training to limit excessive firing in the generated spike trains. The video presents three consecutive example trials of behavioral decoding, showing the rat’s behavior alongside the recorded spiking activity. The screen cursor position decoded from the experimental recordings accurately matched the rat’s actual movement. In contrast, randomly sampled spike trains failed to control the cursor, reflecting conditions in which M1 recordings are unavailable because of neural pathway damage. Notably, our RL-trained model successfully generated spike trains from online mPFC recordings; these generated spike trains showed modulation patterns similar to those of the experimental data and produced accurate decoding trajectories that aligned with the rat’s behavior.
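The KF decoding step described for the video can be sketched as follows. This is a simplified illustration, not the decoder used in the study: each cursor axis is treated as an independent scalar random walk observed through a single linear, mean-centered firing-rate channel, and the parameters `a`, `h`, `q` and `r` are hypothetical values chosen for the example.

```python
def kalman_step(x, p, y, a=1.0, h=3.0, q=0.01, r=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    y    : current observation (assumed mean-centered firing rate)
    a, h : assumed state-transition and observation gains
    q, r : assumed process and observation noise variances
    """
    # Predict the next state from the random-walk model
    x_pred = a * x
    p_pred = a * p * a + q
    # Correct the prediction with the new observation
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (y - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


def decode_cursor(observations, h=3.0):
    """Decode a 2D trajectory from per-bin (y_x, y_y) observation pairs."""
    state = [(0.0, 1.0), (0.0, 1.0)]  # (estimate, variance) for each axis
    trajectory = []
    for obs in observations:
        state = [kalman_step(x, p, y, h=h) for (x, p), y in zip(state, obs)]
        trajectory.append((state[0][0], state[1][0]))
    return trajectory


# Noise-free toy observations for a cursor held at (2.0, -1.0)
true_pos = (2.0, -1.0)
observations = [(3.0 * true_pos[0], 3.0 * true_pos[1])] * 50
trajectory = decode_cursor(observations)
final_x, final_y = trajectory[-1]
```

In this toy run the decoded position settles close to the true cursor position; a practical decoder would use a multivariate state and many recorded or generated units.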
Source data
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, S., Song, Z., Zhang, X. et al. A generative spike prediction model using behavioral reinforcement for re-establishing neural functional connectivity. Nat Comput Sci 6, 179–192 (2026). https://doi.org/10.1038/s43588-025-00915-5
DOI: https://doi.org/10.1038/s43588-025-00915-5


