Introduction

In the era of intelligent machine vision, target recognition and tracking technologies have gained significance in smart transportation, security surveillance, and autonomous driving1,2,3. However, unpredictable dynamic environments, including occlusion scenarios, pose challenges for deep learning algorithms as they hinder the extraction of target features4,5,6 (Fig. 1a). Conventional approaches based on convolutional neural networks (CNNs) frequently compensate for computational accuracy limitations through architectural expansion, typically by stacking additional convolutional layers7,8,9, which places significant strain on low-energy and lightweight devices. This fundamental constraint is particularly pronounced in dynamic vision applications, where the frame-by-frame computational paradigm enforces extensive multiply-and-accumulate operations for global weight updates, making it challenging to meet the stringent real-time and energy efficiency requirements for edge deployment (Supplementary Fig. 1). In contrast, spiking neural networks (SNNs) employ an event-driven and localized weight-update principle10,11,12,13,14, where information transmission occurs only when neurons fire in response to specific stimuli, with weights being asynchronously updated at critical intervals. This sparse temporal coding paradigm inherently eliminates redundant computational overhead and minimizes power-intensive global operations, making SNNs particularly effective for dynamic real-time tasks in edge computing. Given the inherent advantages of the SNN learning characteristics and the ever-growing complexity of dynamic image processing, a systematic investigation of biorational learning principles based on neuromorphic electronic devices offers a promising pathway for executing efficient dynamic tracking tasks without compromising architectural simplicity.

Fig. 1: Bioinspired spike-timing-dependent plasticity (STDP) learning rules for in-sensor spiking neural networks.

a Schematic diagram of the two similar moving vehicles and the vehicle tracking results under different scenes, during which tracking failures may occur at confusing conditions. b Schematic diagram of the proposed in-sensor neural network by incorporating the advantages of high-order STDP learning rule in multi-dimensional temporal relationship for achieving enhanced feature extraction capability. Δw represents synaptic weight change, Δt represents a couple of electric-light stimuli intervals, and the timing window pair (Δt1, Δt2) denotes the spiking intervals within each triplet signal. c Typical light-electric-dominated conductance tuning behavior in artificial neuromorphic electronic device arrays to map the biological synaptic plasticity. Vbg is defined as gate voltage. d Schematic illustration of pulse stimulations and corresponding response characteristics for biologically expressed multiple-order STDP rules. e Paired-STDP with a single timing window is limited to single-feature-space extraction. f Triplet-STDP with dual spiking timing windows enables enhanced higher-dimensional feature extraction across two feature spaces.

Spike-timing-dependent plasticity (STDP) is a common bioinspired learning rule of SNNs that adjusts synapse weight based on the relative timing of spikes between connected neurons15,16,17. This rule can be implemented at the hardware level, particularly benefiting from the coordinated advancements in neuromorphic devices and diverse STDP models. For example, Leong et al. employed WSe2 floating-gate memtransistors to simulate a set of anti-STDP features, which can reversely adjust the weights to efficiently suppress irrelevant features and improve the convergence rate of the SNN model during the training phase18. Similarly, John et al. implemented diverse STDP behaviors spanning from classic STDP, anti-STDP, to symmetric STDP through reconfigurable optoelectronic devices, providing a way to reshape the function of neural networks19. However, currently reported STDP learning rules generally consider a single timing interval between individual pairs of pre- and postsynaptic spikes, limiting their ability to capture complex temporal patterns essential for feature extraction in multi-dimensional spaces. Given the close link between STDP characteristics and computational capabilities in complex dynamic visual scenes, more efforts on strong spatiotemporal correlation learning rules enabled by neuromorphic electronic blocks should be considered20,21,22,23.

By considering the impact of timing structure on shaping the synaptic plasticity, Triplet-STDP emerges as a sophisticated high-order learning rule that involves more spatiotemporal relationships to execute the multi-dimensional weight plasticity24,25,26,27. This intricate timing-dependent mechanism enhances the feature extraction capability of Triplet-STDP by selectively reinforcing key connections within neural networks (Fig. 1b). However, the complexity of STDP behaviors raises significant challenges for neuromorphic electronic devices in effectively mapping high-order STDP learning rules onto networks for dynamic visual information processing, an achievement that remains unrealized28,29,30. In this work, we demonstrate multiple-order STDP behaviors of an artificial vision system based on van der Waals (vdW) phototransistors to enhance the feature processing capability of SNNs. Benefiting from the interfacial defect-dominated persistent photoconductance (PPC) behavior, as confirmed by low-frequency noise analysis, both the Paired- and Triplet-STDP can be achieved in the phototransistors by multi-dimensionally tuning the coupling effect of optoelectrical and electrical properties on device conductance. As a neuromorphic vision sensor platform, the timing interval- and sequence-dependent Triplet-STDP characteristics of the optoelectronic devices are further investigated, which underscore the critical merits of vision tasks in shaping the high-order learning dynamics within SNNs. Specifically, through precise synaptic weight mapping on the fabricated vdW phototransistor array, we successfully implement a Triplet-STDP-based SNN hardware and achieve enhanced spatiotemporal feature extraction capabilities in hybrid image classification tasks. Significantly, we further integrate the Triplet-STDP mechanism within a YOLO-SNN architecture for vehicle tracking, which achieves 90.44% accuracy even under occlusion conditions. These findings provide a device physics-shaped learning rule that fosters the innovation of SNN-based architectures toward enhanced edge visual computing technologies.

Results

Bioinspired multi-order spike-timing-dependent plasticity

STDP, a pivotal neuroregulatory mechanism in bioscience, significantly influences learning, memory, and perception functions through dynamic synaptic weight adaptation29. The weight-update rules in STDP are governed by precise spiking patterns (including both spike count and temporal sequence), which collectively define characteristic temporal windows that determine the direction and magnitude of synaptic modifications (Fig. 1c, d)31,32,33. In SNNs, these timing-dependent plasticity rules govern the higher-order temporal pattern encoding capability of the network by facilitating non-linear integration of information across multiple timescales. The Paired-STDP paradigm, for example, is constrained by its single temporal window between pre- and postsynaptic spikes and therefore establishes only a unidimensional mapping axis. This severely limits the spatiotemporal feature extraction capacity of networks. As visualized in Fig. 1e, this architecture leads to substantial overlap in the feature space representation of distinct input patterns with subtle temporal differences, significantly compromising classification performance. In contrast, the Triplet-STDP framework introduces a sophisticated dual-window mechanism. It enables hierarchical temporal processing: the primary window captures fundamental spike sequence correlations, while the secondary window resolves finer temporal substructures within these sequences. This multi-scale integration strategy effectively projects similar input patterns into orthogonal feature subspaces (Fig. 1f), dramatically enhancing pattern discriminability by disentangling previously overlapping distributions. Thus, the high-order temporal processing capability of Triplet-STDP is expected to strengthen sparse network connections, further endowing the SNN with enhanced event-correlation sensitivity and feature extraction characteristics for handling complicated dynamic image processing scenes.

Optoelectronic-coupling-dependent synaptic plasticity

Neuromorphic transistor devices rely on the controllable manipulation of conductance changes of the semiconductor channels integrated into the transistor array (e.g., by voltage or light) to map complex biological learning processes7,34,35,36. Given the challenges in demonstrating multi-order STDP behaviors for neuromorphic devices, harnessing rich physical properties to investigate the learning dynamics and build effective learning rules holds significant promise for developing advanced computing technologies. Light and electric stimuli, as typical conductance modulators acting through rich light-electric interactions, can provide ample opportunities to engineer synaptic behaviors, especially for vdW materials featuring a high surface-to-volume ratio and good optoelectronic properties37,38,39,40. Specifically, indium selenide (InSe) is frequently employed as an exemplary platform for investigating optoelectronic phenomena and biomimetic optoelectronic synaptic behaviors (Supplementary Table 1). Its intrinsic photoresponse stems from a direct bandgap that enhances light absorption41, together with a high carrier mobility and low effective mass that promote efficient charge transport and collection42,43.

Hence, for diverse feature shaping and intrinsic mechanism exploration of the Triplet-STDP characteristics, in this work, we develop a vdW phototransistor-based vision sensor device, which consists of a photosensitive InSe flake serving as the semiconductor channel, a thin indium film serving as the passivation layer for the channel region, and a SiO2/Si substrate with a 2D/3D heterostructure (Fig. 2a). The detailed device fabrication process is provided in the Methods section. The corresponding optical microscope image and Raman spectrum show the typical characteristic vibrational peaks of the InSe channel (Supplementary Fig. 2). The cross-sectional transmission electron microscopy (TEM) characterization with energy-dispersive X-ray spectroscopy (EDS) analysis further confirms the structural constitution and high quality of the fabricated devices (Supplementary Fig. 2). Typically, the measured electrical characteristics describe n-type conduction behavior with a clockwise hysteresis loop under dark conditions, which indicates the existence of charge-trapping states in the fabricated devices (Fig. 2b). Under light illumination, the on-current increases and the recorded transfer curves shift toward negative voltage, suggesting a controllable photo-induced n-type doping effect with different light intensities (Supplementary Fig. 3). Distinct from irreversible doping methods, such photo-doping behavior is reversible, either under the application of gate bias or spontaneously, which underscores the tunable synaptic plasticity via light-electric interactions (Supplementary Fig. 4).

Fig. 2: Tunable optoelectronic coupling characteristics and Paired-STDP behavior based on vdW phototransistors.

a Schematic structure of the InSe phototransistors under light-electric modulation. b Transfer characteristics of the InSe phototransistors recorded under dark, light illumination, and dark conditions, successively. The evolution of the photocurrent before and after light illumination suggests the PPC behavior of the InSe phototransistors. The drain–source voltage (Vds) is 0.1 V. Ids denotes the drain–source current. c Persistent photoconductance response under a light spike with a duration of 5 s and amplitude of 87.83 mW cm−2. The inset histogram shows the distribution of conductance decay variation (ΔG) for the 20 InSe phototransistors. The device variation is defined as the ratio of the standard deviation (\(\sigma\)) to the mean (\(\mu\)). d Experimental and fitted weight variation of asymmetric STDP as a function of the time interval of a couple of light-electric spikes. The insets illustrate the applied pulse sequences: in the positive Δt region, the electric pulse precedes the light pulse, denoted as E-L; in the negative Δt region, the light pulse precedes the electric pulse, denoted as L-E. e The evolution of the conductance under the electric-light stimuli sequence in the first quadrant of asymmetric STDP. The measured electric-light pair is a light pulse of 87.83 mW cm−2 (10 ms) and Vbg of −40 V (10 ms). Note that ∆w is defined as (G2 − G1)/(G1 − G0), where G0 is the initial conductance, and G1 and G2 are the conductance values following electrical and light stimuli, respectively. f Experimental and fitted weight variation of abnormal STDP as a function of the time interval of a couple of light-electric spikes. g The evolution of the conductance under the light-electric stimuli sequence in the third quadrant of asymmetric STDP. h The corresponding energy band diagrams to reveal the trap-assisted light-electric (TALE) cooperation behavior. i The corresponding energy band diagrams for TALE competition behavior. The measured light-electric pair is a light pulse of 987.82 mW cm−2 (10 ms) and Vbg of 40 V (10 ms).

Basic weight-update behavior of the fabricated vdW phototransistors underscores their potential in shaping biologically plausible STDP characteristics. Under optical stimulation with intensities ranging from 0.26 to 2.63 mW cm−2, the devices show persistent photoconductance (PPC) behavior characterized by photoconductance rise during illumination followed by exponential decay post-stimulation, consistent with photo-doped transfer characteristics (Supplementary Fig. 5a). Systematic modulation of light intensity, pulse width, and pulse number induces a controllable transition from short-term plasticity (STP) to long-term plasticity (LTP), demonstrating precise optical control over synaptic strength (Supplementary Fig. 5). The observed reversible synaptic potentiation-depression transitions under combined optical and electrical stimulation further confirm the critical role of optoelectronic coupling in synaptic weight modulation (Supplementary Fig. 6). Remarkably, these devices achieve ultra-low energy consumption of 8 fJ per spike for electrical read operations, surpassing most reported organic and vdW-based phototransistors (Supplementary Fig. 7). Device-to-device variation characterization across approximately 20 InSe-based units reveals reproducibility in photodoping behavior and synaptic responses, with a variation of 9.5%, confirming the reliability and uniformity of these devices for neuromorphic applications (Fig. 2c and Supplementary Fig. 8).
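The device-to-device variation quoted above is the ratio of the standard deviation to the mean of a per-device quantity. A minimal sketch of that calculation follows; the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the estimator is not specified here.

```python
from statistics import mean, pstdev


def device_variation(values):
    """Return sigma / mu for a list of per-device measurements.

    Assumption: sigma is the population standard deviation; the text does
    not state whether the population or sample estimator was used.
    """
    mu = mean(values)
    if mu == 0:
        raise ValueError("mean is zero; sigma/mu is undefined")
    return pstdev(values) / abs(mu)


# Hypothetical conductance-decay values (arbitrary units), not measured data.
delta_g = [1.00, 1.10, 0.90, 1.05, 0.95]
print(f"device variation = {device_variation(delta_g):.1%}")
```

With measured ΔG values from the 20 devices in place of `delta_g`, the same function would yield the variation figure reported for the array.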

Typical STDP feature in vdW phototransistors can be described by the synaptic weight change (Δw). Figure 2d exhibits a classical asymmetric Paired-STDP characterized by two synaptic weight curves in the first and third quadrants. Such Δw curves are governed by the time-dependent modulation of device conductance under pulsed stimulation. Taking the formation of Δw in the first quadrant as an example, when a couple of electric-light (E-L) stimuli with intervals (Δt) increasing from 10 to 100 ms are applied, the normalized photoconductance curves exhibit a blue shift tendency (Fig. 2e). Notably, the fitted time constants for positive and negative branches in the classical asymmetric Paired-STDP are 150 ms and 60 ms (Supplementary Note 1), respectively, closely aligning with the millisecond response of human neurons28. Other Paired-STDP behaviors, such as symmetric characteristics located in the first and second quadrants, can also be obtained by tuning the E-L parameters (Supplementary Fig. 9). Interestingly, when a strong light intensity and a positive gate voltage are applied, we observe an atypical STDP behavior differing from typical biological phenomena. As shown in Fig. 2f, Δw evolves oppositely to classic STDP at Δt < 0, with minimal weight change at short time intervals (~10 ms) and increasing at longer intervals. Figure 2g and Supplementary Fig. 10 show the opposite conductance trends at the L-E branch (i.e., the third quadrant for atypical STDP and classical STDP) with different optoelectronic parameters. From the device perspective, such synaptic weight variation is closely related to the controllable regulation of photoconductance through the strong light-electric coupling effect, for which the physical mechanism can be understood in the following section. From a learning rule perspective, this atypical STDP enriches synaptic weight adjustment modalities, offering valuable insights for optimizing weight updating rules of SNN architectures in our work.
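For illustration, the weight change defined in the Fig. 2 caption and an asymmetric Paired-STDP window can be sketched as follows. The single-exponential form on each branch and the unit amplitudes are assumptions made for this sketch (the fitted expression is given in Supplementary Note 1); only the time constants of 150 ms and 60 ms are taken from the text.

```python
import math

TAU_POS_MS = 150.0  # fitted time constant of the positive branch (from the text)
TAU_NEG_MS = 60.0   # fitted time constant of the negative branch (from the text)


def weight_change(g0, g1, g2):
    """Delta-w = (G2 - G1) / (G1 - G0), as defined in the Fig. 2 caption."""
    if g1 == g0:
        raise ValueError("G1 equals G0; the weight change is undefined")
    return (g2 - g1) / (g1 - g0)


def paired_stdp(dt_ms, a_pos=1.0, a_neg=1.0):
    """Asymmetric Paired-STDP window (first and third quadrants).

    Assumption: each branch is a single exponential; a_pos and a_neg are
    hypothetical amplitudes, not fitted values.
    """
    if dt_ms > 0:
        return a_pos * math.exp(-dt_ms / TAU_POS_MS)
    if dt_ms < 0:
        return -a_neg * math.exp(dt_ms / TAU_NEG_MS)
    return 0.0


for dt in (-100, -10, 10, 100):
    print(dt, round(paired_stdp(dt), 3))
```

In this sketch the magnitude of the weight change is largest at short intervals and decays toward zero as |Δt| grows, which is the classical behavior of Fig. 2d; the atypical behavior of Fig. 2f is not modeled.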

Mechanism of the trap-assisted light-electric interactions

The rich STDP behaviors observed in the fabricated vdW phototransistors motivate a systematic investigation of optoelectronic coupling effects on dynamic carrier transport mechanisms. In particular, the PPC effect in optoelectronic devices with tunable temporal decay properties has emerged as a critical determinant for achieving programmable synaptic plasticity and dynamic weight modulation of neuromorphic devices (Supplementary Table 4). Given that such PPC behavior typically stems from defect-mediated charge trapping/detrapping processes at heterointerfaces or within intrinsic vdW materials44,45,46, controlled experiments employing h-BN isolation layers at the 2D/3D semiconductor/dielectric interface are performed for mechanism clarification. Comparative device characterization reveals that h-BN-incorporated structures exhibit photocurrent responses that precisely track illumination signals with microsecond-scale switching dynamics, effectively suppressing PPC effects and demonstrating conventional photodetector behavior (Supplementary Fig. 11). Such a distinct difference in photoconductivity hints at the dominant role of 2D/3D interfacial defects in manipulating the synaptic plasticity of vdW phototransistors. Dynamic carrier transport characterizations are further conducted by low-frequency noise analysis to visualize the hetero-interfacial charge trapping-detrapping phenomenon (Supplementary Fig. 12). The power spectral density (PSD) of the current fluctuations (\(S_{I}\)) for both devices exhibits an ideal 1/f behavior, indicating the uniform distribution of charge fluctuations. Following the semi-empirical equation \(S_{I}/I_{\mathrm{ds}}^{2} = A/f\), a critical parameter, the noise amplitude (A), can be extracted, which marks the level of the charge fluctuation behaviors. As expected, the device without the h-BN layer exhibits a higher A than that of the h-BN-based device, further confirming the critical role of hetero-interfacial defects during the photo-doping process.
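As an illustration of how the noise amplitude A could be extracted from a measured spectrum, the sketch below rearranges the semi-empirical 1/f relation to A = f·S_I/I_ds² and averages over frequency points. The geometric-mean estimator, the function name, and the synthetic data are assumptions for this example, not the extraction procedure used in this work.

```python
import math


def noise_amplitude(freqs_hz, psd_a2_per_hz, i_ds_a):
    """Estimate A in S_I / I_ds^2 = A / f.

    Each frequency point gives one estimate f * S_I / I_ds^2; the geometric
    mean is taken because noise spectra are usually analyzed on log axes
    (an assumption of this sketch).
    """
    if len(freqs_hz) != len(psd_a2_per_hz) or not freqs_hz:
        raise ValueError("frequency and PSD lists must be non-empty and equal in length")
    logs = [math.log(f * s / i_ds_a ** 2) for f, s in zip(freqs_hz, psd_a2_per_hz)]
    return math.exp(sum(logs) / len(logs))


# Synthetic ideal 1/f spectrum with hypothetical values, not measured data.
i_ds = 1e-7    # drain-source current in A
true_a = 1e-6  # dimensionless noise amplitude used to generate the spectrum
freqs = [1.0, 10.0, 100.0, 1000.0]
psd = [true_a * i_ds ** 2 / f for f in freqs]
print(noise_amplitude(freqs, psd, i_ds))
```

Applied to spectra from devices with and without the h-BN layer, a larger returned value would indicate stronger charge fluctuation.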

Based on the elucidated physical picture of the PPC characteristics, the observed rich STDP behaviors can be understood as trap-assisted light-electric (TALE) coupling processes. Taking the STDP feature in the first quadrant (E-L) as an example, the negative electric stimulus results in a charge de-trapping process, which increases the carrier concentration of the vdW channel by lowering its Fermi level (the energy band diagram in Fig. 2h); the following light signal further contributes more free carriers by generating photoexcited electron-hole pairs. Such a TALE cooperation effect on the photoconductivity leads to an exponential decay of Δw as the time interval increases. In contrast, for the STDP feature in the third quadrant (L-E), although the first light signal pulls up the carrier concentration, the following positive electric stimulus induces a depletion effect by raising the EF and trapping carriers, resulting in a TALE competition effect (Fig. 2i). Interestingly, when the light intensity is weak, the electric-induced trapping probability is lower, resulting in light-dominant photoconductance and normal STDP behavior in the third quadrant of Fig. 2d; when the light intensity is strong, the electric-induced trapping probability is higher, and the carrier depletion-dominant photoconductivity leads to an abnormal STDP behavior in the third quadrant of Fig. 2f. A detailed description of the relevant carrier transport process is provided in Supplementary Fig. 13. Consequently, the TALE interaction effect on synaptic plasticity can be categorized into TALE cooperation, light-dominant TALE competition, and electric-dominant TALE competition, expanding the design space for engineered synaptic plasticity beyond biology to fulfill neural network learning rules.

Triplet-spike-timing-dependent plasticity in phototransistor

The uncovered TALE coupling behaviors of the phototransistor enable the diverse emulation of STDP behaviors. Given the advantages of Triplet-STDP-based SNNs in processing complex spatiotemporal information, we further explore the demonstration of Triplet-STDP by manipulating the conductance change under successive triplet signals. The operation schemes featuring two timing windows for Triplet-STDP are sketched in Fig. 3a, b, and can be assigned as two sets of successive triplet stimuli, i.e., light-electric-light (L-E-L) and electric-light-electric (E-L-E). As a result, the recorded photoconductance changes, which show good repeatability, contain diverse TALE coupling characteristics of two paired-spike events, such as electric-dominated negative conductance and light-dominated positive photoconductance behaviors, which provide more room for engineering the high-order STDP paradigm (Supplementary Fig. 14). For example, the L-E-L branch consists of an L-E and an E-L spiking pair, corresponding to enhanced carrier trapping behavior by lowering the Fermi level after light illumination and subsequent re-dominance of photoconductance effects, respectively. For symmetric timing windows of spiking pairs (|Δt1| = |Δt2|), a potentiation STDP characteristic that resembles typical Paired-STDP behavior can be observed (Fig. 3c), suggesting that the light spike plays a dominant role, with photoconductance enhancement outweighing the defect trapping-induced conductance reduction under the weak positive electric pulse. For the E-L-E branch, in contrast, the successive E-L and L-E spiking events introduce the light-dominated and electric-dominated TALE competition effects on the weight update features, respectively, where the effect of defect-captured carriers under the electric pulse exceeds that of photogenerated carriers under the light pulse.

Fig. 3: The demonstration of Triplet-STDP based on controllable photoconductance.

Schematic of the pulse schemes for the post-pre-post (a) and pre-post-pre (b) triplet spiking signals, along with the corresponding current changes, respectively. Note that two sets of triplet spiking configurations of pre-post-pre and post-pre-post are defined as electric-light-electric (E-L-E) and light-electric-light (L-E-L) schemes, respectively. The timing window pair (Δt1, Δt2) denotes the spiking intervals within each triplet signal. For the L-E-L triplet, the spiking pair corresponds to post-pre (Δt1 < 0) and pre-post (Δt2 > 0), whereas for the E-L-E triplet, it corresponds to pre-post (Δt1 > 0) and post-pre (Δt2 < 0). c The evolution of weight variation of the Triplet-STDP for the L-E-L and E-L-E branches with symmetrical spike timing (|Δt1| = |Δt2| ). d Weight variation of the Triplet-STDP as functions of Δt1 and Δt2 with asymmetrical values. For the L-E-L branch, Δt1 = −50 ms and Δt2 varies from 30 to 100 ms; for the E-L-E branch, Δt1 = 50 ms and Δt2 varies from −30 to −100 ms. e The weight variation evolution of the Triplet-STDP for the L-E-L scheme as a function of Δt2 (discrete symbols). The fitted solid curves follow the Triplet-STDP equation. f Weight variation mapping of the Triplet-STDP as functions of Δt1 and Δt2, where potentiation or depression with different synaptic weight variations is obtained under different spike sequences with distinct timing intervals. The inset shows the schematic of the L-E-L and E-L-E sequences. Here, the background color gradients indicate the value of Δw.

To fully investigate the Triplet-STDP characteristics, we examine the synaptic weight variation under a variety of triplet stimulus scenarios with different timing windows. Taking the L-E-L framework as an example, for fixed Δt1, Δw exhibits a distinct decay with increasing Δt2, which is derived from the gradual recovery of empty trapping states that can capture more photogenerated carriers (Fig. 3d). Such phenomena can be further described by the exponential function in Fig. 3e, and the higher the Δt1, the faster the degradation of Δw. This can be understood as the dominant role of the light-dominated TALE competition effect on weight variation gradually weakening as the timing window of Δt1 prolongs. In contrast, for the E-L-E framework, Δw shows a depression behavior and decreases with increasing Δt2 due to the recombination of the photogenerated carriers, which weakens the defect trapping ability during the subsequent electric pulse. Interestingly, we find that the overall evolution of Δw for Triplet-STDP mainly follows the latter timing window, Δt2, showing a last spike pair-dominant STDP behavior. This scenario aligns with neurobiological studies where synaptic plasticity in the visual cortex is especially sensitive to the timing of the final spike as it tends to override earlier ones31,47. To emphasize the flexible tunability of triplet signals in shaping the multi-order STDP behaviors, we systematically summarize experimental Triplet-STDP results across different Δt1 and Δt2. As shown in Fig. 3f, all patterns of Δw can be categorized into two classes, potentiation and depression, according to the corresponding input signals, L-E-L and E-L-E, respectively. Specifically, the Δw patterns present a short-interval prominent trend aligning with the biological high-order STDP learning principles of rapid self-adaptation31 and can be precisely located via a couple of timing windows, which provide the fundamental prototype for fully establishing STDP learning rules.

To advance the practical implementation potential of the proposed Triplet-STDP feature in neuromorphic computing systems, we systematically evaluate its robustness and material universality across different phototransistor platforms. First, device-level characterization of eight representative InSe phototransistors confirms reproducible Triplet-STDP behavior under symmetric timing-window stimulation, establishing baseline performance metrics (Supplementary Fig. 15). Building upon the design principle of the individual trapping-based InSe phototransistor, we further investigate the universality of Triplet-STDP behaviors across typical vdW material systems. For example, molybdenum disulfide (MoS2), a widely studied layered semiconductor material known for its scalability and fabrication compatibility48,49,50, is employed to fabricate a trapping-based MoS2 phototransistor array by incorporating hetero-dielectric layers to enable controlled charge trapping/detrapping dynamics. As expected, the MoS2 phototransistors successfully replicate multiple Triplet-STDP learning rules under optoelectronic stimulations with good device uniformity across the entire array, mirroring the key optoelectronic properties observed in InSe phototransistors (Supplementary Figs. 16 and 17). These results confirm the generalizability of our proposed phototransistor device design among typical vdW semiconductors and provide a solid foundation for subsequent array-level integration and system demonstration. On the other hand, it is worth mentioning that while InSe phototransistors offer an ideal individual device platform for feature shaping of Triplet-STDP, the material vulnerability to chemical solvents such as acetone during the photolithography process leads to severe degradation of carrier transport properties51,52, hindering the practical application of InSe phototransistor arrays in the hardware deployment of Triplet-STDP-based SNNs. This limitation has long been acknowledged as a critical challenge in exploiting InSe electronic devices for scalable logic circuits and in-memory computing53,54. Therefore, considering the good universality of Triplet-STDP behavior in MoS2 phototransistor arrays, we ultimately adopt the MoS2 phototransistor array for the subsequent hardware implementation of SNN-based image processing applications.

High-order spike-timing-dependent-plasticity learning rules

Triplet-STDP features multiple timing windows to enable recognition of complex temporal patterns and high-order spatiotemporal correlations, which is expected to boost the spatiotemporal information processing capability of SNNs55,56. Following the typical Paired-STDP, the construction of high-order STDP learning rules, such as the Triplet-STDP rule that contains two timing windows, starts with a semi-empirical description:

$$W(\Delta t_{1},\Delta t_{2})=A_{3}\,\mathrm{e}^{-\frac{|\Delta t_{1}|}{\tau_{1}}}\,\mathrm{e}^{-\frac{|\Delta t_{2}|}{\tau_{2}}}$$
(1)

where triplet spikes (tpre, tpost, tpost) affect the synaptic weight based on their timing differences, Δt1 = tpost − tpre and Δt2 = tpost − tpost. Here, A3 is the scaling factor, and the time constants τ1 and τ2 play a critical role in determining both the temporal sensitivity to spike timing and the learning window modulation characteristics of the Triplet-STDP mechanism57. To quantitatively assess the uniformity of Triplet-STDP behavior across different vdW phototransistors, we perform systematic characterization and precise extraction of τ values from both device architectures. As evidenced in Supplementary Fig. 18, the extracted time constants exhibit minimal variation across all measured devices, demonstrating exceptional uniformity and robust reproducibility of the STDP characteristics in our fabricated vdW phototransistor arrays.
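Reading Eq. (1) as a product of two exponential windows, the weight change can be evaluated numerically as sketched below. The default values of `a3`, `tau1_ms`, and `tau2_ms` are hypothetical placeholders rather than the fitted parameters of the devices, and the scan over Δt2 only mirrors the interval range quoted for the L-E-L branch in Fig. 3d.

```python
import math


def triplet_stdp(dt1_ms, dt2_ms, a3=1.0, tau1_ms=50.0, tau2_ms=50.0):
    """W(dt1, dt2) = A3 * exp(-|dt1| / tau1) * exp(-|dt2| / tau2), per Eq. (1).

    a3, tau1_ms and tau2_ms are hypothetical defaults, not fitted values.
    """
    if tau1_ms <= 0 or tau2_ms <= 0:
        raise ValueError("time constants must be positive")
    return a3 * math.exp(-abs(dt1_ms) / tau1_ms) * math.exp(-abs(dt2_ms) / tau2_ms)


# L-E-L-like scan: dt1 fixed at -50 ms, dt2 swept from 30 to 100 ms.
for dt2 in (30, 50, 70, 100):
    print(dt2, round(triplet_stdp(-50, dt2), 4))
```

Because the two windows enter multiplicatively, fixing Δt1 only rescales the curve, while the decay along Δt2 is set by τ2 alone; this is a property of the sketch under Eq. (1), not an additional experimental claim.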

To fully utilize the device characteristics of vdW optoelectronic devices to boost the computational capability, the experimentally observed typical Triplet-STDP behaviors are considered to reshape the Triplet-STDP learning rules. Specifically, a weight fine-tuning factor of G is introduced into the Triplet-STDP rule, which can enhance the generalization ability of the Triplet-STDP rules (GT-STDP) and provide greater weight adjustment flexibility to accommodate more diverse Δw changes in Fig. 3 (Supplementary Note 1). Also, we construct a Generalized Paired-STDP (GP-STDP) rule for a fair comparison in subsequent image processing applications. It is worth mentioning that, as we surveyed, recent works on emerging neuromorphic devices with biological features generally demonstrated paired or Triplet-STDP functions, whereas the high-order STDP-based network architectures are rarely explored (Supplementary Table 5). Thus, the comprehensive understanding and construction of multi-order STDP learning rules based on the vdW phototransistor-based vision sensor in this work underpin the following implementation of SNN for processing complicated visual tasks.

Hardware implementation of spike-timing-dependent plasticity

High-order STDP learning rules with multi-dimensional information mapping characteristics are anticipated to endow SNNs with enhanced feature extraction capability and computational performance58. To validate this aspect, we first construct a three-layer fully connected SNN architecture incorporating multi-order STDP learning rules to perform a hybrid classification using a self-designed alphanumeric dataset (Fig. 4a). The built dataset consists of several confusing pairs for typical networks, which provides a better platform to showcase the precise feature extraction capabilities of Triplet-STDP. Initial benchmarking on standard MNIST handwritten digit classification reveals comparable accuracy (>90%) between Paired-STDP and Triplet-STDP networks after 1000 training cycles (Fig. 4b and Supplementary Fig. 19), suggesting the simplicity of MNIST task fails to fully exploit the advanced temporal processing capabilities of Triplet-STDP. However, introducing GT-STDP modulation with <10% tuning factor significantly accelerated convergence, achieving >90% accuracy after 500 iterations and 98.8% after 1000 iterations (Fig. 4c), demonstrating enhanced learning efficiency. The true advantage of Triplet-STDP emerges in complex hybrid classification tasks, where it substantially outperforms Paired-STDP, particularly for challenging character pairs like 1/I and 8/B (Fig. 4d and Supplementary Table 6). This performance gap, deliberately accentuated through controlled network scaling to isolate learning rule effects, underscores the enhanced spatiotemporal feature extraction capacity of Triplet-STDP for demanding pattern recognition applications.

Fig. 4: High-order STDP learning rules-based spiking neural networks (SNNs) for image classification tasks.
figure 4

a Schematic of the high-order STDP-enabled SNN based on the proposed vdW phototransistors with Triplet-STDP behavior. The right panel shows the image classification workflow with the alphanumeric dataset. b Training accuracy of image classification as a function of iteration based on Triplet-STDP and the corresponding effects of different fine-tuning factor coefficients. The inset shows a zoomed-in accuracy plot, where Triplet-STDP exhibits the highest training accuracy when the fine-tuning factor is 6%. c Comparison of training accuracy between Paired-STDP and Triplet-STDP under varying fine-tuning factors, indicating that the generalization performance of the SNN can be improved by the tuning factor within a certain range. d Training accuracy comparison between Paired-STDP and Triplet-STDP learning rules for classifying characters with visually similar features. For example, for the confusing pair of digit 1 and letter I, the feature extraction capability of the Triplet-STDP SNN is distinctly enhanced without complicating the SNN model.

Transitioning from simulation to hardware implementation, we develop a MoS2 phototransistor array-based neuromorphic system (Fig. 5a, b), exhibiting device uniformity (device-to-device variation of 4.89% for threshold voltage and 9.76% for On/Off ratio) (Fig. 5c). The hardware architecture integrates a 6 × 6 phototransistor array with peripheral circuits for performing in-memory computing, system control, and current-mode inference operations (Supplementary Fig. 20). More details can be found in the Methods section. Initially, the trained weights are precisely mapped into each array cell via voltage-bias pulse writing, establishing the target conductance states, which are validated by the conductance distribution profiles (Supplementary Fig. 21). During inference, the selected hybrid patterns are encoded into six asynchronous voltage spike trains as input signals and processed via parallel analog current summation, signal amplification, and conversion to ensure an effective current output (Fig. 5d). By collecting these currents, image classification is successfully implemented, with the measured current ranges matching those expected for each input image (Fig. 5e). As a result, the confusing pattern pairs are well recognized with less than a 10% loss in accuracy compared to simulations (Fig. 5f), confirming the fidelity of hardware-implemented Triplet-STDP for feature extraction within the phototransistor array-based architecture.

Fig. 5: Hardware implementation of the hybrid image classification based on vdW phototransistor array.
figure 5

a Wire bonding of the MoS2 phototransistor array on a printed circuit board (scale bar: 2 mm). b An optical micrograph of the 6 × 6 MoS2 phototransistor array (scale bar: 200 μm). c The statistical distribution of On/Off ratio and threshold voltage (Vth) obtained from the transfer characteristic curves of the phototransistor array. The solid lines indicate the fit by the Gaussian distribution \(f(G)=\frac{1}{\sqrt{2\pi}\,\sigma}\,\mathrm{e}^{-\frac{(G-\mu)^{2}}{2\sigma^{2}}}\), where \(\mu\) and \(\sigma\) represent the mean and standard deviation, respectively. The device variation is defined as the ratio of the standard deviation (\(\sigma\)) to the mean (\(\mu\)). d Schematic diagram indicating how the images are encoded as input to the constructed hardware. e Output current responses of the MoS2 phototransistor array-based neuromorphic computing system to the input images 8, B, 2, and Z during hardware SNN inference. f Hardware inference accuracy across different classification labels.

Spiking neural network mimicking dynamic vehicle tracking

The event-driven characteristics of SNNs make them promising for developing energy-efficient machine vision systems59,60,61. As evidenced by recent advances summarized in Supplementary Table 7, SNN architectures employing diverse learning rules have demonstrated progressive evolution from static image analysis to sophisticated dynamic vision processing, particularly in addressing critical automotive challenges such as real-time object detection and tracking for autonomous driving applications. Vehicle tracking, as one of the most prevalent applications in artificial intelligence-enhanced traffic management systems, imposes a stricter requirement for timely information processing. However, current deep recognition models often struggle with insufficient feature capture, inadequate multi-dimensional processing, and dynamic target detection4,62. Integrating Triplet-STDP with present SNNs potentially offers a solution by enhancing feature extraction ability in complex visual scenes.

To achieve high-performance vehicle tracking, we construct a YOLO-SNN architecture featuring a Triplet-STDP learning rule enabled by the vdW phototransistor vision sensor devices, together with a self-built vehicle dataset containing numerous vehicle occlusion samples (Fig. 6a). Figure 6b depicts how Triplet-STDP governs feature extraction by manipulating the weight update process. Because Triplet-STDP depends on the interrelationships among multiple events, the weight update during model training is further promoted and strengthened when two neurons exhibit strong spatiotemporal correlation, and is suppressed when the correlation is weak, which ensures highly efficient and task-specific feature extraction. Figure 6c, d visualizes the impact of Triplet-STDP on the weight updating process by mapping the trained weights in four typical convolutional kernels of the first layer (3 × 3, 16) before and after the STDP layer. The pronounced weight variation illustrates the enhanced feature extraction capability of Triplet-STDP. The enhanced vehicle tracking capability of the Triplet-STDP-based SNN is visualized in Fig. 6e, f. Notably, when the vehicles overlap, the network without the Triplet-STDP rule can track only a single target, whereas the Triplet-STDP-equipped network still tracks both overlapped targets even though some key features are occluded. Importantly, the Triplet-STDP-equipped network maintains this precise tracking after the two cars separate, whereas the network lacking the Triplet-STDP rule assigns the wrong identification number. Such a mismatch causes tracking errors in which the network may lose or misidentify targets, reducing overall tracking accuracy and reliability. As a result, the incorporation of the Triplet-STDP layer improves detection performance, achieving 90.44% accuracy (Fig. 6g). This represents a 14.12-percentage-point enhancement over the baseline YOLO-SNN architecture (76.32%) and a 5.71-percentage-point improvement over the Paired-STDP variant (84.73%). These quantitative results indicate the boosted feature extraction capability that the high-order temporal processing mechanism of Triplet-STDP brings to SNNs.

Fig. 6: Vehicle tracking application based on the Triplet-STDP-enabled SNN.
figure 6

a Schematic of the YOLO-SNN architecture and the workflow of object detection and tracking. b Illustration of the event correlation-dependent process in the Triplet-STDP learning rule, where synaptic strength adjustments are based on pairwise correlations between neurons in the preceding neuron layer. Visualization of the trained weights before (c) and after (d) the Triplet-STDP layer. The specific values can be found in Supplementary Tables 8 and 9. Representative scenes of vehicle tracking results for the original YOLO-SNN (e) and the Triplet-STDP-enabled SNN (f), which show the better object detection capability of the Triplet-STDP-based SNN. g Comparison of tracking accuracy among the Triplet-STDP-enabled SNN, Paired-STDP-enabled SNN, and the original YOLO-SNN.

Discussion

In this work, we demonstrate the multi-order STDP characteristics of the vdW phototransistor-based vision sensor devices by manipulating the light-electric interaction behaviors, which are desirable for the development of dynamic perception technology. Through control of light-electric interaction effects, we achieve both Paired-STDP and Triplet-STDP behaviors, with the underlying mechanisms revealed through in-depth analysis of dynamic carrier transport properties. The high-order Triplet-STDP learning rule, with its expanded information mapping dimensions, endows the SNNs with advanced feature extraction capabilities. Importantly, such high-order STDP characteristics demonstrate universality and uniformity across the vdW phototransistor array, with device-to-device variation of less than 10%. This consistency enables the Triplet-STDP-enabled SNN to achieve significant performance enhancements during hardware implementation on a 6 × 6 vdW phototransistor array for challenging hybrid object recognition tasks, with a classification accuracy loss within 10% between training and inference phases. Furthermore, the enhanced spatiotemporal learning capability endowed by Triplet-STDP enables high-accuracy vehicle tracking (90.44%), even under complex overlapping conditions. These findings provide a strategic approach for leveraging rich physical device characteristics to implement high-order learning rules, paving the way for high-performance networks and advancing vision perception technologies.

Methods

Device fabrication

The layered InSe flakes are exfoliated from the InSe bulk crystal onto a PDMS film using a mechanical lift-off method and then transferred onto a Si substrate with 300-nm-thick SiO2 using a precision transfer platform. The passivation layer for the channel region is then formed by directly evaporating a 5-nm-thick indium layer on top of the exfoliated InSe flakes, which induces a modulated n-type doping behavior in the InSe devices (Supplementary Fig. 22). This doping effect can significantly improve channel mobility63, resulting in lower operating voltage and reduced power consumption of our synaptic device. Subsequently, an aligned copper grid shadow mask is placed on the channel material and Au contacts (50-nm thick) are deposited as the source and drain electrodes using a thermal evaporator. For the InSe phototransistors with an h-BN layer, an additional h-BN layer is aligned and transferred on top of the InSe after the InSe channel material has been transferred.

To fabricate the MoS2 phototransistor array, Pd bottom electrodes are first deposited on the substrate using electron-beam evaporation, with the electrode patterns defined by photolithography and lift-off. A high-k HfO2 dielectric layer with a thickness of 30 nm is then deposited via atomic layer deposition (ALD). Next, a 2-nm-thick seeding layer is introduced as a charge-trapping layer, followed by the deposition of a second HfO2 layer of 8 nm, forming a multi-layer stacked gate dielectric structure. A CVD-grown MoS2 film (from Six-Carbon Technology) is then transferred onto the designated areas and patterned into channels using photolithography and reactive ion etching. Finally, Cr/Au (10/50 nm) source and drain electrodes are defined by photolithography, deposited via electron-beam evaporation, and structured using a lift-off process.

Structural analysis and electrical characterization

The morphology of the device is characterized by a transmission electron microscope (TEM; JEOL-JEM2100F operated at 200 kV) equipped with an energy-dispersive X-ray spectroscopy (EDS) system. Raman spectra are obtained by Raman spectroscopy (LabRam HR-800, Jobin Yvon, λ: 532 nm). Electrical characterization of the vdW phototransistors is performed in a probe station equipped with a semiconductor parameter analyzer (Keysight B1500A). Source Measurement Units (SMUs) and Waveform Generator/Fast Measurement Units (Keysight B1530A) are built into the analyzer to meet experimental testing needs. Optoelectrical properties of the devices are characterized via a self-made, precisely tunable power laser system, which is combined with a Keysight DSOX2024A digital storage oscilloscope to modulate the input light peak. The 650 nm laser wavelength is selected for optimal matching with the InSe absorption spectrum and enhanced photogenerated carrier efficiency in our devices (Supplementary Fig. 23). To prevent unnecessary fluctuations caused by the environment, all the electrical measurements are performed in a vacuum (<\(10^{-5}\) Torr) under darkness. Low-frequency noise (LFN) is investigated using a Programmable Point-Probe Noise Measuring System (3PNMS) with a noise floor of \(10^{-27}\,\mathrm{A^{2}\,Hz^{-1}}\).

To evaluate device uniformity, a systematic characterization methodology is employed. For assessing the uniformity of individual InSe phototransistors, 20 devices are fabricated across multiple batches under strictly controlled process parameters, including material thickness and electrode deposition procedures. The electrical characteristics of each device are recorded on a semiconductor parameter analyzer (Keysight B1500A) and statistically analyzed. Similarly, for array-level uniformity, all 36 devices in a 6 × 6 MoS2 phototransistor array are independently characterized using a row–column addressing method, enabling accurate evaluation of device-to-device variation within the array through a full-coverage measurement strategy.
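The device variation reported for the array, defined as the ratio of the standard deviation to the mean, can be computed as in the following minimal sketch. The function name and the sample threshold voltages are hypothetical, and the population standard deviation is used here in place of the σ from a Gaussian fit, which may differ slightly for small samples.

```python
import statistics


def device_variation(values):
    """Return sigma / mu for a list of measured device parameters.

    Assumes the mean is nonzero; the sign of the mean carries through.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return sigma / mu


# Hypothetical threshold voltages (V) for a handful of devices.
vth_samples = [1.00, 1.05, 0.95, 1.02, 0.98]
print(f"device-to-device variation: {device_variation(vth_samples):.2%}")
```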

The 6 × 6 transistor array is electrically characterized and validated for multiply-accumulate (MAC) operations using an NI-PXI 4163 modular instrumentation system integrated with a 24-channel PXI-2532B multiplexer for row/column addressing. Each transistor unit is selectively activated through a row–column addressing protocol by applying a bias voltage to the target row via a PXI-4132 SMU, with unselected rows and columns grounded to minimize crosstalk. For MAC validation, synaptic weights are programmed into conductance states by applying pulsed signals, and analog computing is emulated by inputting voltage vectors to rows while measuring the summed output currents across columns.
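The MAC operation described above can be sketched as follows, assuming an idealized linear array in which each cell contributes a current equal to its conductance times the row voltage and the column currents simply add. Real phototransistors are not perfectly linear, and the function name and numerical values are illustrative assumptions.

```python
def crossbar_mac(conductances, voltages):
    """Return column currents I_j = sum_i G[i][j] * V[i].

    conductances: list of rows, each a list of cell conductances (S).
    voltages: one input voltage per row (V).
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one input voltage per row is required")
    return [sum(conductances[i][j] * voltages[i] for i in range(n_rows))
            for j in range(n_cols)]


# Hypothetical 2 x 3 example: programmed weights as conductances.
G = [[1e-6, 2e-6, 0.0],
     [3e-6, 0.0, 1e-6]]
V = [0.5, 1.0]
print(crossbar_mac(G, V))  # summed output current of each column (A)
```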

Training of spiking neural networks for image classification

We construct a directly trained SNN architecture consisting of three layers for the image classification task. The first layer is the input layer, which contains as many neurons as there are pixels in an input image, with each image pixel corresponding to one neuron. This layer converts input pixels into spike sequences through Poisson encoding. The second layer is the processing layer, containing a variable number of excitatory neurons paired with an equal number of inhibitory neurons. Based on the pre-simulation results in Supplementary Fig. 24, the network performs optimally when the neuron count is set to 650 and the number of epochs is 3. When the membrane potential reaches the neuron’s voltage threshold, the excitatory neuron sends an excitatory signal. Simultaneously, the inhibitory neuron connected one-to-one with that excitatory neuron receives the corresponding signal and provides lateral inhibition, meaning that every excitatory neuron except the one that sent the excitatory signal is inhibited. This setup fosters competition among the excitatory neurons. The third layer is the output layer, which outputs the classification result obtained for each image after training.
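The Poisson encoding performed by the input layer can be sketched as below. This is a simplified illustration that assumes 8-bit pixel intensities and a per-step Bernoulli approximation of a Poisson process; the number of time steps and the maximum firing probability are hypothetical parameters, not values taken from this work.

```python
import random


def poisson_encode(pixels, n_steps=100, max_rate=0.5, seed=None):
    """Convert pixel intensities (0-255) into binary spike trains.

    At each time step a pixel spikes with probability
    max_rate * intensity / 255, so brighter pixels fire more often.
    Returns one list of 0/1 values per pixel (one input neuron each).
    """
    rng = random.Random(seed)
    trains = []
    for p in pixels:
        prob = max_rate * p / 255.0
        trains.append([1 if rng.random() < prob else 0
                       for _ in range(n_steps)])
    return trains


# A 28 x 28 image flattened to 784 pixels yields 784 spike trains.
flat_image = [0, 64, 128, 255]          # hypothetical pixel values
spikes = poisson_encode(flat_image, seed=0)
print([sum(train) for train in spikes])  # spike counts grow with intensity
```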

The directly trained SNN is trained on the MNIST handwritten digit dataset and a self-defined alphanumeric dataset. The MNIST handwritten digit dataset is a standard benchmark in image recognition that includes 60,000 training samples and 10,000 testing samples, with each 28 × 28-pixel grayscale image representing digits from 0 to 9. To evaluate the robustness of our proposed model, we also incorporate a self-defined alphanumeric dataset containing the digits 0–9 and both uppercase and lowercase letters A–Z. The network model demonstrates strong classification performance on both the widely used MNIST dataset and our self-defined alphanumeric dataset. It should be clarified that recognition accuracy is influenced not only by the learning rule but also by network size and training data. In general, larger network scales tend to diminish the observable differences between learning rules. To better isolate and evaluate the impact of the learning rule on feature extraction, we deliberately select a smaller network configuration that could highlight the contrast between Paired-STDP and Triplet-STDP learning rules, even at the cost of reduced overall accuracy.

Hardware implementation of the image classification

To implement image recognition in hardware, a computing system integrating a vdW phototransistor array, peripheral electronics (INA226A ADC, DAC7311 DAC), and a microcontroller unit (MCU, STM32F103RCT6) is developed. The phototransistor array is configured in a matrix layout and mounted onto a printed circuit board (PCB) to enable multi-channel measurements. First, the synaptic weights derived from the trained Triplet-STDP-based SNN are deployed into the vdW phototransistor array, allowing each device to emulate a specific synaptic weight through its programmable conductance state. During inference, input images (28 × 28-pixel grayscale frames) are pre-processed by downsampling to 6 × 6-pixel resolution via lossy compression to match the 6-channel input configuration of the vdW phototransistor array. The resized images are then encoded into programmable voltage signals corresponding to pixel intensities and applied as array inputs. Processing involves parallel analog current summation, signal amplification, and conversion to ensure an effective output current. By collecting these currents, image classification is successfully implemented, with the measured current ranges matching those expected for each input image.
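The pre-processing step can be sketched as follows. The text specifies only lossy compression to 6 × 6 pixels, so block averaging is used here as an assumed stand-in, with rounded bin edges because 28 is not divisible by 6. The linear intensity-to-voltage mapping and the value of v_max are likewise hypothetical.

```python
def downsample(image, out_size=6):
    """Average-pool a square grayscale image to out_size x out_size.

    Requires len(image) >= out_size so that every block is non-empty.
    """
    n = len(image)
    edges = [round(k * n / out_size) for k in range(out_size + 1)]
    result = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            block = [image[i][j]
                     for i in range(edges[r], edges[r + 1])
                     for j in range(edges[c], edges[c + 1])]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def to_voltages(image, v_max=1.0):
    """Map 0-255 intensities linearly onto 0..v_max volts."""
    return [[v_max * p / 255.0 for p in row] for row in image]


# Hypothetical 28 x 28 input: bright left half, dark right half.
frame = [[255 if col < 14 else 0 for col in range(28)] for _ in range(28)]
inputs = to_voltages(downsample(frame))
print(inputs[0])  # one row of six input voltages
```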

YOLO-spike-neural-network simulation for vehicle tracking

In the vehicle tracking task, we develop a YOLO-SNN architecture composed of an array of convolutional layers, batch normalization layers, activation functions, YOLO layers, feature concatenation layers, and a deconvolution layer. The architecture begins with convolutional layers that perform spatial downsampling for feature extraction. The batch normalization layers following the convolutional layers normalize the features, which accelerates training and stabilizes the model. The model uses ReLU as the activation function, introducing non-linear characteristics. After sequential convolutions and normalizations, the feature maps are processed through feature concatenation layers and YOLO layers, which produce the target bounding boxes and classification results. To improve the spatial resolution of the feature maps, the network model incorporates a deconvolution layer to reduce the number of channels, followed by further convolutions. To ensure more accurate feature representation, the weights of the convolutional layers are updated using a local, hardware-friendly STDP learning rule.

We train the proposed YOLO-SNN architecture using a self-defined vehicle dataset that encompasses 75 different vehicle categories, with each image initially sized at 224 × 224 pixels. During preprocessing, images are resized to 416 × 416 pixels, followed by image augmentation. The network utilizes an SGD optimizer for parameter updates, with convolution primarily employing 3 × 3 kernels. Batch normalization parameters are set with an epsilon of 0.0001 and a momentum of 0.03 to control the precision and speed of normalization updates. The YOLO model detects features at a scale of 75, with deconvolution layers having 128 channels and using 3 × 3 convolution kernels. This setup optimizes the network’s ability to effectively detect and identify a variety of vehicles.
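The role of the two batch normalization parameters can be illustrated with a simplified single-channel sketch. It assumes the common convention in which the running statistics are updated as (1 − momentum) × running + momentum × batch statistic, and it omits the learnable scale and shift; the framework actually used in this work may follow a different convention.

```python
import math

EPS = 1e-4       # epsilon stated in the text
MOMENTUM = 0.03  # momentum stated in the text


def batch_norm_step(batch, running_mean, running_var):
    """Normalize one channel's activations and update running statistics.

    Returns (normalized values, new running mean, new running variance).
    """
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # EPS keeps the division stable when the batch variance is near zero.
    normalized = [(x - mean) / math.sqrt(var + EPS) for x in batch]
    # A small momentum makes the running statistics change slowly.
    new_mean = (1 - MOMENTUM) * running_mean + MOMENTUM * mean
    new_var = (1 - MOMENTUM) * running_var + MOMENTUM * var
    return normalized, new_mean, new_var


# Hypothetical activations for one channel.
out, m, v = batch_norm_step([1.0, 2.0, 3.0], running_mean=0.0, running_var=1.0)
print(out, m, v)
```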