Abstract
Brain-inspired reinforcement learning is pivotal for artificial general intelligence, yet current artificial neural network-based hardware lacks critical biological mechanisms such as third-terminal modulated eligibility traces and dynamic reward signaling. Emerging materials address these challenges by efficiently mimicking complex reinforcement learning dynamics. Here, we demonstrate a brain-inspired spiking neural network-based reinforcement learning computing architecture using an α-In2Se3 ferroelectric semiconductor field-effect transistor. By leveraging the intrinsic coupling between in-plane and out-of-plane polarization in α-In2Se3, the multi-terminal conductance modulation in the device enables reward-signal modulation in reinforcement learning. The ferroelectric relaxation is utilized to implement biological eligibility trace decay, thereby enhancing the algorithm’s processing capability. Autonomous driving tasks are then demonstrated with a reinforcement learning neural network constructed from an α-In2Se3 transistor array, in which in-situ reward-based weight updates and eligibility trace decay are performed without any external memory or computing units. Our solution enables a fully functional, energy-efficient, and low-overhead spiking-based reinforcement learning architecture.
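The interplay of eligibility trace and reward described in the abstract can be illustrated with a generic three-factor update. The sketch below is a minimal software analogue, not the authors' device model or code: the exponential form of the decay, the function names (`decay_trace`, `step`), and all parameter values (`tau`, `lr`, the weight bounds) are assumptions chosen for illustration. In the hardware described here, the decay would come from ferroelectric relaxation and the reward from the modulatory terminal.

```python
import math


def decay_trace(trace, dt, tau):
    """Relax an eligibility trace over a time step dt.

    Exponential decay with time constant tau is an assumption; the
    abstract does not state the functional form of the relaxation.
    """
    return trace * math.exp(-dt / tau)


def step(weight, trace, stdp_term, reward,
         dt=1.0, tau=20.0, lr=0.1, w_min=0.0, w_max=1.0):
    """One update of a generic three-factor (reward-modulated) rule.

    stdp_term: contribution of pre/post spike pairing in this step
    reward:    scalar reward signal (the third factor)
    Returns the new (weight, trace) pair. All defaults are hypothetical.
    """
    # The trace decays first, then accumulates the new pairing event.
    trace = decay_trace(trace, dt, tau) + stdp_term
    # The weight changes only when a reward coincides with a live trace.
    weight = weight + lr * reward * trace
    # Keep the weight inside a bounded range, like a bounded conductance.
    weight = min(w_max, max(w_min, weight))
    return weight, trace


if __name__ == "__main__":
    w, e = 0.5, 0.0
    # A spike pairing marks the synapse as eligible; no reward yet.
    w, e = step(w, e, stdp_term=1.0, reward=0.0)
    # The reward arrives 20 time units later, after the trace has decayed.
    w, e = step(w, e, stdp_term=0.0, reward=1.0, dt=20.0)
    print(w, e)
```

In this sketch a delayed reward still changes the weight, but by a smaller amount the longer it arrives after the pairing event, which is the credit-assignment role an eligibility trace plays.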
Data availability
The data that support the plots within this paper and other findings of this study are available from the corresponding author upon request. Source data are provided with this paper.
Code availability
The code for the spiking neural network (SNN) implementing the scheme is available, with detailed explanations, from the corresponding author upon request.
Acknowledgements
This work was supported by the National Key Research and Development Program of China (No. 2023YFB4402300, Y.H.), Natural Science Foundation of China (Nos. 92164204 and 62374063, Y.H.), National Natural Science Foundation of China (No. 62425405, Y.C.), MOST National Key Technologies R&D Programme (No. SQ2022YFA1200118-04, Y.C.), Research Grant Council of Hong Kong (CRS_PolyU502/22, Y.C.), and The Hong Kong Polytechnic University (WZ4X and G-SB6M, Y.C.).
Author information
Contributions
Y.H. and Y.C. conceived and supervised the project. Y.W. designed the experiment. Y.W., W.X., and J.Y. fabricated the devices. Y.W. and W.X. performed the Raman and atomic force microscopy characterizations. Y.W. performed the device measurements. W.X., Y.H., and Y.W. designed and performed the neural network simulations. Y.W., W.X., Y.Z., C.Z., and X.M. analysed the data. Y.W. and Y.C. wrote the paper. All authors discussed the results and implications and reviewed the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Adrian Ionescu, Shuiyuan Wang and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Y., Xiong, W., Yan, J. et al. Brain-inspired synaptic transistors for in-situ spiking reinforcement learning with eligibility trace. Nat Commun (2026). https://doi.org/10.1038/s41467-026-69898-9