Introduction

Bioinspired neuromorphic vision (NV) sensors offer a compelling opportunity to reduce data shuttling, latency, and energy consumption by executing specific computing tasks directly inside the sensor1,2,3, and have therefore attracted significant interest in the fields of intelligent robotics, surveillance security, and industrial automation. At their core, NV sensors mimic the roles of photoreceptor and bipolar cells in the human retina4, where photoreceptor cells transform incoming light into electrical signals, which bipolar cells then preprocess in an on/off framework to extract image features, as illustrated in Fig. 1a. Similarly, the photoresponse of NV sensors can be likened to the magnitude and direction of information flow in bipolar cells: a positive response implies an increase of target activations triggered by the source stimulus, while a negative response indicates a decrease5. Notably, negative photoresponses play an important role in nullifying specific features and offsetting certain inputs, thereby allowing NV sensors to process complex interrelationships. Therefore, embedding bipolar optoelectronic functionalities in a single device is pivotal for in-sensor edge processing such as feature extraction, pattern classification, and motion tracking.

Fig. 1: Reconfigurable neuromorphic vision sensing and momentum conservation.

a Schematic illustration of the human retina consisting of photoreceptor and bipolar cells for vision information processing via the on/off framework. Energy band structure of b MAPbI3, and c Bi2O2Se calculated by the density functional theory. d Temperature-dependent steady-state photoluminescence (PL) spectra of the MAPbI3/Bi2O2Se heterojunction. e Diagram of the interband photoexcitation and hot-carrier transition in the momentum-conserved MAPbI3/Bi2O2Se heterojunction.

Very recently, both positive and negative photoelectrical responses have been demonstrated in two-dimensional (2D) van der Waals (vdW) heterostructure phototransistors, where they are regulated by external gate biases, photogating effects, and ferroelectric fields6,7. Because thinner films are favored for better electric-field manipulation, 2D bipolar phototransistors often suffer from suboptimal optoelectronic performance in terms of narrow wavelength range, low photoresponsivity, and small signal-to-noise ratio (SNR), which severely limits recognition precision in complex real-world scenes, especially in dim-light scenarios. In contrast, halide perovskites endowed with direct bandgaps possess robust absorption capabilities irrespective of film thickness. The conduction-band bottom of halide perovskites originates from the degenerate Pb 6p orbitals with a narrow dispersion, leading to a high density of states8. Unfortunately, the inherent ion migration in halide perovskites substantially screens the carrier transport9, consequently restricting the photoconductive gain.

Beyond the absorption loss, high-energy hot carriers above the band edge cause a ~30% loss in optoelectronic efficiency through thermalization10. These hot carriers typically undergo ultrafast relaxation through carrier-phonon scattering on femtosecond timescales, reaching the band edge before they can be extracted11. Thus, it is imperative that the extraction rate of hot carriers competes with the cooling rate. In fact, semiconductors with spatial confinement, including quantum dots, nanowires, and nanofilms, exhibit decelerated hot-carrier cooling due to strong Coulomb interactions12, substantial exciton-binding energies13, and diminished electron-phonon coupling14. Intriguingly, halide perovskites have recently been spotlighted as promising hot-carrier absorbers with slow cooling rates, ascribed to the pronounced hot-phonon bottleneck effect stemming from nonequilibrium longitudinal-optical (LO) phonons15. Integrating halide perovskites within 2D vdW heterojunctions might harness a synergistic effect of hot-carrier extraction and tunable band alignment, which could pave the way for simultaneously enhancing optoelectronic sensing and enabling neuromorphic computing. However, effective hot-carrier extraction from such an integrated heterostructure has rarely been explored.

Herein, we rationally fabricate a momentum-conserved vdW heterostructure capable of ultrasensitive bipolar photoresponses. The momentum conservation between methylammonium lead iodide (MAPbI3) perovskite and the 2D Bi2O2Se semiconductor is demonstrated to significantly promote hot-carrier extraction and interlayer carrier transport at the reconfigurable MAPbI3/Bi2O2Se heterojunction, leading to an ultrahigh photoresponsivity of 6 × 10⁷ AW−1, accompanied by a specific detectivity of 5.2 × 10¹¹ Jones and a dynamic range of 110 dB. Moreover, we fabricate 3 × 3 MAPbI3/Bi2O2Se heterotransistor arrays that effectively perform in-sensor analog multiply-accumulate (MAC) operations to extract key image features under the challenging illumination of moonlight (i.e., 0.1 μW cm−2). The phototransistor arrays are further employed in convolutional neural networks (CNNs) to improve the detection of traffic lights in dim-light environments, delivering remarkable recall rates of 99.2%, 95.8%, and 92.6% for green, red, and yellow signals, respectively.

Results and discussion

Momentum conservation for hot-carrier transitions

Theoretical insights have suggested that the momentum conservation of energy band levels could facilitate interband hot-carrier extraction16,17. Recently, we have demonstrated Bi2O2Se as an emerging 2D semiconductor that is characterized by its distinctive electronic-ionic layered structure with ultrahigh carrier mobility18,19,20. In addition, the [Bi2O2]ₙ²ⁿ⁺ layer adopts 2D square lattices that are structurally compatible with perovskite crystals21, making it an ideal candidate for assisting the carrier transport of halide perovskites. In this context, density functional theory (DFT) was employed to calculate the band structures of MAPbI3 and Bi2O2Se (Supplementary Note 1). Figure 1b reveals that MAPbI3 possesses a direct bandgap (Eg) of ~1.4 eV with its conduction-band minimum (CBM) and valence-band maximum (VBM) both located at the Γ point. Conversely, Bi2O2Se exhibits an indirect bandgap of ~0.8 eV, bridging the VBM at the N point and the CBM at the Γ point (Fig. 1c). In particular, the VBM and CBM of MAPbI3, alongside the CBM of Bi2O2Se, all reside at the Γ point, indicating that these band levels are momentum conserved22. This theoretical alignment was further experimentally verified through the temperature-dependent steady-state photoluminescence (PL) spectra (Fig. 1d), wherein the emerging peak at 965 nm was correlated with the interband transition between the VBM of MAPbI3 and the CBM of Bi2O2Se (i.e., ~1.28 eV), affirming the momentum-matched bandgap at the heterojunction interface. This finding signifies that the MAPbI3/Bi2O2Se heterojunction is primed for interband photocarrier transitions as illustrated in Fig. 1e: the momentum conservation between the VBM of MAPbI3 and the CBM of Bi2O2Se allows direct photoexcitation from the perovskite layer into the high-mobility Bi2O2Se channel, while the momentum-conserved CBMs facilitate hot-carrier extraction from MAPbI3 to Bi2O2Se. Taken together, both are anticipated to markedly boost the photosensing efficiency of the MAPbI3/Bi2O2Se heterostructure.
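The correspondence between the 965 nm PL peak and the ~1.28 eV interband energy follows from E = hc/λ. The short sketch below performs that conversion; the function name and the rounded value of hc are illustrative choices and are not taken from the original analysis.

```python
# hc expressed in eV*nm (rounded CODATA value; an assumption of this sketch)
HC_EV_NM = 1239.841984


def wavelength_to_energy_ev(wavelength_nm: float) -> float:
    """Return the photon energy in eV for a vacuum wavelength given in nm."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


# PL peak attributed to the MAPbI3 VBM -> Bi2O2Se CBM transition
pl_peak_ev = wavelength_to_energy_ev(965.0)
print(f"965 nm corresponds to {pl_peak_ev:.2f} eV")
```

The same helper can be used to check other wavelength and energy pairs quoted in the text, such as the 3.1 eV pump used in the transient absorption measurements.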

MAPbI3/Bi2O2Se heterostructured phototransistors were fabricated in a back-gated field-effect transistor architecture, as depicted in Fig. 2a. An ultrathin Bi2O2Se layer (i.e., ~5 nm) was adopted for effective gate modulation, while the MAPbI3 thickness (i.e., ~140 nm) was optimized to maximize photon absorption. The cross-sectional high-resolution transmission electron microscopy (HRTEM) image in Fig. 2b displays a clear interface between the MAPbI3 and Bi2O2Se domains. Notably, the single-crystal Bi2O2Se layer exhibited a body-centered tetragonal structure, closely adjoining the tetragonal perovskite crystals with micrometer-sized grains (Supplementary Fig. 1)23. This crystalline compatibility is beneficial for forming a high-quality MAPbI3/Bi2O2Se heterojunction, which is important for achieving superior optoelectronic behavior. Figure 2c shows that the MAPbI3/Bi2O2Se heterotransistor reached an excellent mobility exceeding 300 cm2 V−1 s−1 with a high on/off ratio of 10⁷, underscoring the successful formation of a high-quality MAPbI3/Bi2O2Se heterojunction.

Fig. 2: Momentum conservation facilitates hot-carrier transitions.

a Illustration of the MAPbI3/Bi2O2Se heterojunction-based field-effect transistor (FET) for neuromorphic vision sensing. b Cross-sectional high-resolution transmission electron microscopy (HRTEM) image of the MAPbI3/Bi2O2Se heterojunction. The scale bar is 2 nm. c Typical transfer curves of MAPbI3/Bi2O2Se heterojunction-based FETs. d Time-resolved photoluminescence (TRPL) spectroscopy of MAPbI3 and MAPbI3/Bi2O2Se heterojunction films upon the light excitation of 470 nm. Pseudocolor femtosecond transient absorption (fs-TA) plots of e MAPbI3, and f MAPbI3/Bi2O2Se heterojunction. g Normalized fs-TA spectra of MAPbI3 and MAPbI3/Bi2O2Se heterojunction films within the 1.2 ps after the photoexcitation. h Decay dynamics of the hot-carrier temperature within MAPbI3 and MAPbI3/Bi2O2Se heterojunction films. i Decay dynamics of the ground-state photobleaching band peaks at 1.65 eV within MAPbI3 and MAPbI3/Bi2O2Se heterojunction films.

The MAPbI3/Bi2O2Se heterojunction film exhibits excellent light absorption across the broadband visible-NIR spectrum of 300–1500 nm (Supplementary Fig. 2). Photocarrier dynamics at the MAPbI3/Bi2O2Se heterojunction were probed using time-resolved photoluminescence spectroscopy as shown in Fig. 2d. The observed photocarrier decay kinetics align well with the two-body recombination model, indicating that non-geminate recombination predominates within MAPbI324. The heterojunction displayed an accelerated photocarrier decay (i.e., τ1 = 11 ± 1 ns and τ2 = 91 ± 3 ns) as compared with that of MAPbI3 (i.e., τ1 = 68 ± 2 ns and τ2 = 271 ± 5 ns), validating the fast photocarrier transfer from MAPbI3 to Bi2O2Se, which is ascribed to the momentum conservation between the CBMs of MAPbI3 and Bi2O2Se. Furthermore, femtosecond transient absorption (fs-TA) spectroscopy was employed to scrutinize the hot-carrier dynamics, using a pump photon energy of 3.1 eV (i.e., 1.5 eV above the bandgap of MAPbI3) and a low pump fluence of 1 μJ cm−2 to avoid non-radiative Auger recombination25. Figure 2e, f shows the pseudocolor fs-TA plots of the MAPbI3 and MAPbI3/Bi2O2Se heterojunction films, respectively. Within the initial 0.1 ps after photoexcitation, a negative photoinduced absorption (PIA) band centered at 1.58 eV (i.e., the blue region) appeared in both films, which was ascribed to the hot-carrier-induced bandgap renormalization. As the PIA band decayed with a blue shift, as disclosed in the normalized fs-TA spectra (Fig. 2g), a positive ground-state photobleaching (GSB) band centered at 1.65 eV (i.e., the red region) with a high-energy tail extending to 2.0 eV (i.e., the white region) became apparent, further confirming the population of high-energy hot carriers26,27. The high-energy tail of the GSB band was observed to reach thermal equilibrium within 1.2 ps, and its decay dynamics can be elucidated by tracking the temporal profile of the hot-carrier temperature (Tc). Since the energy of the GSB tail is far above the band edge, the hot-carrier distribution can be approximated by the Maxwell–Boltzmann distribution, and Tc is then derived from \(\Delta A(E)=-A_{0}(E)\exp\left(-\frac{E}{k_{\mathrm{B}}T_{\mathrm{c}}}\right)\), where ∆A is the transient absorbance and kB is the Boltzmann constant28. Figure 2h compares the hot-carrier cooling curves of the MAPbI3 and MAPbI3/Bi2O2Se heterojunction films. The inherent polar nature of MAPbI3 results in an exponential decline in Tc, primarily driven by electron-LO-phonon scattering. In particular, the heterojunction exhibited an increased cooling rate, with Tc dropping to ~1500 K within 0.3 ps after photoexcitation as compared with ~1800 K in MAPbI3, indicating efficient extraction of high-energy photocarriers from MAPbI3 into Bi2O2Se enabled by the momentum conservation between their respective CBMs. The relatively modest exciton binding energy (i.e., 19 ± 3 meV) suggests the simultaneous presence of excitons and free carriers at the band edge of MAPbI3 at room temperature29, and the absence of an obvious Moss-Burstein shift in the GSB peak at 1.65 eV implies that the decay dynamics of the MAPbI3 and MAPbI3/Bi2O2Se heterojunction films at the band edge are probably dominated by exciton quenching to free carriers24,26. In addition, Fig. 2i reveals an accelerated decay rate for the GSB peak within the MAPbI3/Bi2O2Se heterojunction on the timescale of 600 ps, providing further evidence of effective photocarrier extraction at the momentum-conserved heterojunction.
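The extraction of Tc from the high-energy GSB tail can be illustrated with a simple linear fit. The sketch below assumes that the prefactor A0 is approximately constant over the fitted window, so that ln(−ΔA) is linear in E with slope −1/(kB Tc), and it follows the sign convention of the Maxwell–Boltzmann expression given in the text. The data are synthetic and the function name is hypothetical; this is not the authors' fitting routine.

```python
import numpy as np

# Boltzmann constant in eV/K
K_B_EV_PER_K = 8.617333262e-5


def fit_carrier_temperature(energy_ev, delta_a):
    """Estimate Tc (K) from the tail, assuming dA(E) = -A0 * exp(-E / (kB * Tc)).

    A0 is treated as energy independent over the fitted window (assumption).
    """
    energy_ev = np.asarray(energy_ev, dtype=float)
    delta_a = np.asarray(delta_a, dtype=float)
    if np.any(delta_a >= 0):
        raise ValueError("this sketch expects a strictly negative tail signal")
    # ln(-dA) = ln(A0) - E / (kB * Tc): a straight line in E
    slope, _intercept = np.polyfit(energy_ev, np.log(-delta_a), 1)
    return -1.0 / (K_B_EV_PER_K * slope)


# Synthetic tail between 1.75 and 2.00 eV generated for Tc = 1500 K
energy = np.linspace(1.75, 2.00, 26)
true_tc = 1500.0
synthetic_delta_a = -1.0e-3 * np.exp(-energy / (K_B_EV_PER_K * true_tc))

fitted_tc = fit_carrier_temperature(energy, synthetic_delta_a)
print(f"fitted Tc = {fitted_tc:.0f} K")
```

Repeating such a fit at each pump-probe delay would yield a cooling curve of the kind compared in Fig. 2h; with measured spectra, the choice of fitting window and noise handling would need more care than shown here.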

Ultrasensitive gate-tunable bipolar photoresponses

The significantly enhanced photocarrier transitions at the MAPbI3/Bi2O2Se heterojunction driven by the momentum conservation are expected to remarkably boost the optoelectronic efficiency of the heterotransistors. Figure 3a presents the transfer curves of MAPbI3/Bi2O2Se heterotransistors under 532 nm illumination with varied light intensities. Positive photoresponses were observed at Vg below 0.5 V, as the source-drain current under illumination surpassed the dark current. Conversely, the heterotransistors rendered negative photoresponses beyond this Vg threshold as shown in Fig. 3b. Remarkably, the photoresponsivity (R), defined as R = Iph/P, where Iph is the net photocurrent and P is the incident light intensity, exceeded the order of 10⁶ AW−1 for both positive and negative responses, outstripping its MAPbI3 and Bi2O2Se counterparts (Supplementary Fig. 3). The specific detectivity (D*) is determined by \(D^{*}=\frac{R\sqrt{A\Delta f}}{i_{\mathrm{n}}}\), where A is the effective detection area, Δf is the operational bandwidth, and in is the noise current. The noise spectrum of the MAPbI3/Bi2O2Se heterotransistor at varied gate voltages was characterized (Supplementary Fig. 4). A remarkable R of 6 × 10⁷ AW−1 was achieved, accompanied by a D* of 5.2 × 10¹¹ Jones and an external quantum efficiency (EQE) of 1.2 × 10⁸%, marking a pronounced advance in optoelectronic performance over prior bipolar photosensors2,30,31. More impressively, the heterotransistors consistently delivered high R values over a broad light-intensity range from 0.1 μW cm−2 to 30 mW cm−2, following a sublinear relationship on the logarithmic scale (Fig. 3c). This led to an expansive dynamic range (DR) of 110 dB defined by \(\mathrm{DR}=20\times\log_{10}\left(\frac{P_{\max}}{P_{\min}}\right)\)32, where Pmax is the maximum light intensity corresponding to the measured maximum photocurrent and Pmin is the minimum one, thus reinforcing its versatile applicability across diverse optoelectronic applications from bright sunlight to dim moonlight.
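As a quick arithmetic check, the quoted dynamic range can be reproduced from the stated intensity limits. The sketch below also includes a helper for the D* expression; the function names are illustrative, the logarithm is assumed to be base 10 (as implied by the dB unit), and the detection area is assumed to be given in cm² so that D* comes out in Jones.

```python
import math


def dynamic_range_db(p_max: float, p_min: float) -> float:
    """DR = 20 * log10(Pmax / Pmin); both intensities in the same units."""
    return 20.0 * math.log10(p_max / p_min)


def specific_detectivity(responsivity: float, area_cm2: float,
                         bandwidth_hz: float, noise_current: float) -> float:
    """D* = R * sqrt(A * df) / i_n, in Jones when A is in cm^2 (assumption)."""
    return responsivity * math.sqrt(area_cm2 * bandwidth_hz) / noise_current


# Intensity limits quoted in the text, both converted to W cm^-2
p_max = 30e-3    # 30 mW cm^-2
p_min = 0.1e-6   # 0.1 uW cm^-2
dr = dynamic_range_db(p_max, p_min)
print(f"DR = {dr:.1f} dB")
```

The ratio of 3 × 10⁵ between the two limits gives about 109.5 dB, consistent with the rounded value of 110 dB.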

Fig. 3: Photoresponse characterization of reconfigurable MAPbI3/Bi2O2Se heterotransistors.

a Transfer curves and b net photocurrents of the MAPbI3/Bi2O2Se heterotransistor under 532 nm illumination with varied light intensities. c Light-intensity dependence of the photoresponsivity R and the specific detectivity D* under 532 nm illumination at the gate voltage of 0 V. d Pseudo-color plot of the photoresponsivity under 532 nm illumination with varied light intensities, in which the gate voltage was tuned from −6 to 6 V. e Dependence of bipolar photoresponsivity on the gate voltage in the visible illumination range of 400–800 nm. f Benchmark of the photoresponsivity in various unipolar and bipolar photosensors. g Linear dependence of the bipolar photoresponsivity on the gate voltage under 532 nm illumination. h Dependence of the net photocurrent on the light intensity under 532 nm illumination with the gate voltage varying from −3 to 3 V. i Photocurrent curves measured under on/off cycles for negative photoresponses. j–l Illustration of band alignment and interlayer carrier transfer of the MAPbI3/Bi2O2Se heterotransistor under the gate voltage of 0 V, −3 V, and 3 V, respectively.

Bipolar photoresponses of MAPbI3/Bi2O2Se heterotransistors can be modulated by adjusting the gate voltage throughout the visible spectrum (Supplementary Fig. 5). The mapping plots of bipolar photoresponses as a function of the gate voltage and the incident light intensity at wavelengths of 532 nm, 640 nm, and 780 nm are depicted in Fig. 3d and Supplementary Fig. 6, providing a full framework for implementing varied photoresponsivities in NV sensors to update the weights for in-sensor convolutional operations33. Figure 3e shows that the photoresponsivity of the heterotransistors remained at a high level, ranging from 10³ AW−1 to 10⁶ AW−1 for both positive and negative photoresponses across the visible spectrum of 400–800 nm, demonstrating robust sensitivity for detecting a broad range of colorful images. Figure 3f benchmarks the photoresponsivity of bipolar photosensors made from various low-dimensional materials with diverse architectures2,30,31,34,35,36, among which the MAPbI3/Bi2O2Se heterotransistor achieved the highest value.

In addition, the linear dependence of bipolar photoresponses on gate voltage across the visible spectrum is depicted in Fig. 3g and Supplementary Fig. 7. This linear relationship allows positive and negative photoresponsivities to be used directly as CNN weights by adjusting gate voltages. The photocurrent of our heterotransistor photodetectors exhibited a sublinear dependence on the light intensity on the logarithmic scale (Fig. 3h and Supplementary Fig. 8). This sublinear photoresponse prevents saturation under high light intensities, similar to human eyes, allowing the device to effectively perceive both bright and dark images37. Moreover, we comprehensively analyzed both the photoresponsivity and photocurrent of all nine devices within the 3 × 3 MAPbI3/Bi2O2Se heterotransistor array under 640 nm illumination (Supplementary Figs. 9 and 10), revealing that the maximum positive photoresponsivity was consistent within (2.4 ± 0.9) × 10⁵ AW−1 and the maximum negative one within (−4.8 ± 1.1) × 10⁵ AW−1 (Supplementary Figs. 11a, b). In addition, all nine devices within the heterotransistor array maintained uniform photocurrents at varied gate voltages under both low and high light intensities (Supplementary Figs. 11c, d). We also subjected the devices to stability testing. Initially, all nine devices in the array exhibited uniform bipolar photoresponses (Supplementary Figs. 12 and 13). After storage in an ambient environment (i.e., 27 ± 1 °C and 55 ± 5% humidity) for 168 h (i.e., one week), the devices largely maintained their bipolar behavior (Supplementary Figs. 14 and 15). It is noteworthy that the critical challenges of ensuring the long-term stability of MAPbI3 perovskite, maintaining uniform film quality, and preventing defects during the film transfer process must be addressed to achieve large-scale production. In addition, the device photoresponse to continuous light pulses was tested under different gate voltages, confirming excellent reproducibility (Fig. 3i and Supplementary Fig. 16). The response time for negative photoresponses was on the order of 10 ms, which is comparable to that of biological bipolar cells38, whereas the response time for positive photoresponses was on the order of 100 ms (Supplementary Fig. 17).

To elucidate the physical mechanisms of the gate-tunable bipolar photoresponses in the MAPbI3/Bi2O2Se heterojunction, ultraviolet photoelectron spectroscopy was employed to determine the VBM and Fermi level (EF) of MAPbI3, which were identified as −5.6 eV and −4.2 eV, respectively (Supplementary Fig. 18). The optical bandgap of MAPbI3 was measured to be 1.6 eV from the steady-state photoluminescence spectrum, thus yielding a MAPbI3 CBM of −4.0 eV. Similarly, the VBM, CBM, and EF of Bi2O2Se were characterized at −5.0 eV, −4.2 eV, and −4.3 eV, respectively39. Thus, a type-I band alignment is proposed for the MAPbI3/Bi2O2Se heterojunction in Fig. 3j, where the conduction band offset ΔEc and the valence band offset ΔEv are calculated to be 0.2 eV and 0.6 eV, respectively. Under a negative Vg, the electrostatic-doping effect is expected to downshift the EF of both Bi2O2Se and MAPbI3, promoting the formation of a p-n junction at the heterointerface as depicted in Fig. 3k. In this configuration, an increased ΔEv restricts the hole transport, while a negligible ΔEc favors the electron transport into the Bi2O2Se channel, resulting in positive photoresponses. In addition, the direct photoexcitation of electrons from the VBM of MAPbI3 to the CBM of Bi2O2Se enabled by the momentum conservation will further enhance the positive photoresponses40. Under a positive Vg, the EF of MAPbI3 approaches its CBM, leading to an n-n type contact with Bi2O2Se as illustrated in Fig. 3l. This configuration increases ΔEc, impedes the electron transport from MAPbI3 to Bi2O2Se, facilitates the hole injection into the Bi2O2Se channel, and eventually yields negative photoresponses. It is noteworthy that the ambipolar nature of MAPbI3 perovskite plays a crucial role in modifying the band alignment via electrostatic gating41, allowing switching between a p-n junction (i.e., positive photoresponses) and an n-n junction (i.e., negative photoresponses) in the MAPbI3/Bi2O2Se heterojunction.
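The flat-band alignment at zero gate bias can be reproduced from the measured band edges. The sketch below derives the MAPbI3 CBM from its VBM and optical gap, computes the two offsets, and classifies the alignment using the textbook definitions of straddling, staggered, and broken gaps. The class and function names are hypothetical, and gate-induced band shifts are not modeled.

```python
from dataclasses import dataclass


@dataclass
class BandEdges:
    """Band edges in eV relative to the vacuum level (more negative = deeper)."""
    cbm: float
    vbm: float


def alignment_type(a: BandEdges, b: BandEdges) -> str:
    """Classify a heterojunction as type-I, type-II, or type-III."""
    # Broken gap: the CBM of one material lies below the VBM of the other
    if a.cbm < b.vbm or b.cbm < a.vbm:
        return "type-III"
    # Straddling gap: one bandgap is fully contained in the other
    a_contains_b = a.cbm >= b.cbm and a.vbm <= b.vbm
    b_contains_a = b.cbm >= a.cbm and b.vbm <= a.vbm
    if a_contains_b or b_contains_a:
        return "type-I"
    # Otherwise the gaps are staggered
    return "type-II"


# Values quoted in the text: MAPbI3 VBM = -5.6 eV with a 1.6 eV optical gap
mapbi3 = BandEdges(cbm=-5.6 + 1.6, vbm=-5.6)
bi2o2se = BandEdges(cbm=-4.2, vbm=-5.0)

delta_ec = mapbi3.cbm - bi2o2se.cbm   # conduction band offset
delta_ev = bi2o2se.vbm - mapbi3.vbm   # valence band offset
kind = alignment_type(mapbi3, bi2o2se)
print(f"dEc = {delta_ec:.1f} eV, dEv = {delta_ev:.1f} eV, {kind}")
```

With these inputs the Bi2O2Se gap sits entirely inside the MAPbI3 gap, which is the straddling (type-I) case described for Fig. 3j.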

Dim-light neuromorphic vision sensing

A 3 × 3 MAPbI3/Bi2O2Se heterotransistor array was fabricated as shown in Fig. 4a. The detailed fabrication process is illustrated in Supplementary Fig. 19. The photocurrents through the MAPbI3/Bi2O2Se heterotransistor surpassed those through the perovskite-only device by at least three orders of magnitude (Supplementary Figs. 20a–c). Photocurrent mapping further confirms the significantly higher photocurrent intensity within the MAPbI3/Bi2O2Se heterojunction zone in comparison to adjacent regions (Supplementary Fig. 20d). This indicates that the predominant photocurrent conduction occurs through the MAPbI3/Bi2O2Se domain, whereby the large electrical resistance and minimal photoresponse of the intervening perovskite layer between pixels act as effective barriers against photocurrent leakage. This array merges the biological functions of photoreceptor and bipolar cells, wherein the photosensing efficiency is enhanced by the momentum-conserved interlayer hot-carrier transfer, while the gate-tunable band alignment paves the way for reconfigurable MAC operations, facilitating the extraction of key image features as illustrated in Fig. 4b, c. In particular, the photoresponsivity Rij of each heterotransistor can be tuned to either positive or negative values by applying the corresponding gate voltage Vij, analogous to updating weights in neural networks. The photocurrents generated from each line are then collectively summed across the array, following Kirchhoff's law33, to execute in-sensor MAC operations of \(I_{\mathrm{out}}=\sum_{i,j=1}^{3}(I_{\mathrm{ph}})_{ij}=\sum_{i,j=1}^{3}R_{ij}\times P_{ij}=I_{\mathrm{d}}-I_{\mathrm{d0}}\), where Id is the readout current of the array under illumination, and Id0 is the readout dark current of the array. Indeed, the photoresponsivity is one of the major factors that influence the capability of NV sensors in complex environments. A high photoresponsivity amplifies photocurrents under dim-light conditions, eliminating the need for additional gain circuits and the associated energy costs for signal boosting. These amplified photocurrents increase the robustness against signal noise, ensuring the accuracy of in-sensor data processing. To quantify the energy consumption of the heterotransistor during a single light pulse, we divide it into two principal components: the energy E1 required for sustaining the photoresponsivity, E1 = Vg×Ig×Tduration, where Vg is the gate voltage needed to sustain the photoresponsivity, Ig is the gate current, and Tduration is the duration of a single light pulse operation; and the energy E2 inherent to the in-sensor computing function, E2 = Ilight×Vds×Tduration. With Tduration set at 100 ms, we estimate that the energy consumption spans from 10⁻³ to 1 nJ depending on the gate voltage.
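The in-sensor MAC readout and the per-pulse energy estimate can be sketched numerically as follows. The responsivity pattern, optical powers, gate current, and bias values are hypothetical placeholders chosen only for illustration; they are not measured device values, and P is treated here as the optical power in watts received by each pixel.

```python
import numpy as np


def mac_readout(responsivity, power):
    """Return I_out = sum_ij R_ij * P_ij for a 3 x 3 array (A/W times W)."""
    r = np.asarray(responsivity, dtype=float)
    p = np.asarray(power, dtype=float)
    if r.shape != (3, 3) or p.shape != (3, 3):
        raise ValueError("expected 3 x 3 responsivity and power matrices")
    # Currents from all pixels add on the shared readout line
    return float(np.sum(r * p))


def pulse_energy(v_g, i_g, i_light, v_ds, t_duration):
    """Return E1 + E2 with E1 = Vg*Ig*T and E2 = Ilight*Vds*T, in joules."""
    e1 = v_g * i_g * t_duration        # sustaining the gate-set responsivity
    e2 = i_light * v_ds * t_duration   # in-sensor computing (channel current)
    return e1 + e2


# Hypothetical bipolar weights of magnitude 1e6 A/W with alternating sign
R = 1e6 * np.array([[1, -1, 1],
                    [-1, 1, -1],
                    [1, -1, 1]], dtype=float)
# Hypothetical uniform optical power of 1 fW on every pixel
P = np.full((3, 3), 1e-15)

i_out = mac_readout(R, P)

# Hypothetical operating point: Vg = 1 V, Ig = 1 pA, Ilight = 1 nA,
# Vds = 0.1 V, and the 100 ms pulse duration stated in the text
energy_j = pulse_energy(1.0, 1e-12, 1e-9, 0.1, 0.1)
print(f"I_out = {i_out:.2e} A, energy per pulse = {energy_j * 1e9:.4f} nJ")
```

Under uniform illumination, five positive and four negative weights of equal magnitude leave a net current equal to that of a single pixel, which shows how negative responsivities cancel part of the input.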

Fig. 4: Implementation of the heterotransistor array for dim-light classification.

a Scanning electron microscopy (SEM) image of as-fabricated MAPbI3/Bi2O2Se heterotransistor array. The scale bar is 10 μm. b Illustration of the reconfigurable heterotransistor array adapted for in-sensor neural network computing. c Schemed circuit for in-sensor analog multiply-accumulate operations. d Training flow of classifying the letters of “P”, “K”, and “U” under the dim-light intensity of 0.1 μW cm−2 by using the heterotransistor arrays with various photoresponsivities. e Output photocurrents of the heterotransistor arrays with various photoresponsivities executing in-sensor analog multiply-accumulate operations for classifying the letter “U”. f Loss and g recognition accuracy for various photoresponsivities over the epoch.

To evaluate the sensing capability of our ultrasensitive heterotransistor array, it was assigned the practical task of classifying the letters “P”, “K”, and “U” under a dim-light intensity of 0.1 μW cm−2, which is comparable to that of natural moonlight (Supplementary Note 2). Figure 4d outlines the training process of the reconfigurable heterotransistor array with varied bipolar photoresponsivities on the order of 10² AW−1, 10⁴ AW−1, and 10⁶ AW−1, respectively. The detailed training algorithm is charted in Supplementary Fig. 21. For each training epoch, a total of 210 Gaussian-noised letters were optically projected onto the heterotransistors. Through the one-hot encoding approach42, each letter was programmed to activate a single output node. The softmax function \(S(I_{\mathrm{out}})=\frac{e^{I_{\mathrm{out}}}}{\sum_{k=1}^{3}e^{R_{k}P_{k}}}\) was utilized as the activation function43, denoting the probability of each letter falling into the class k. After each epoch n, \(R_{ij}^{n}\) was updated via the gradient backpropagation of the loss function, \(R_{ij}^{n+1}=R_{ij}^{n}-\frac{\eta}{M}\sum_{P}\nabla_{R_{ij}^{n}}\psi\), where η is the learning rate, M is the total number of projected letters, ψ is the loss function \(\psi=-\frac{1}{k}\sum_{k=1}^{3}y_{m}\log[S(I_{\mathrm{out}})]\), and ym is the label44. We note that the updated photoresponsivity \(R_{ij}^{n+1}\) was practically implemented in the heterotransistors by adjusting Vg according to the mapping plot in Fig. 3d.
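The training loop described above can be approximated in software as a softmax classifier whose weights play the role of the photoresponsivities. The sketch below is a simplified stand-in rather than the authors' algorithm: the 3 × 3 letter templates, the noise level of 0.2, the learning rate, the zero initialization, and the use of one weight matrix per output node in normalized units are all assumptions, and the loss is the standard cross-entropy averaged over the projected letters.

```python
import numpy as np

# Hypothetical 3 x 3 binary templates for the three letters
LETTERS = {
    "P": [[1, 1, 1], [1, 1, 1], [1, 0, 0]],
    "K": [[1, 0, 1], [1, 1, 0], [1, 0, 1]],
    "U": [[1, 0, 1], [1, 0, 1], [1, 1, 1]],
}


def make_dataset(rng, per_class=70, sigma=0.2):
    """Build 3 * per_class Gaussian-noised letters (210 by default)."""
    xs, ys = [], []
    for label, pattern in enumerate(LETTERS.values()):
        clean = np.array(pattern, dtype=float).ravel()
        xs.append(clean + rng.normal(0.0, sigma, size=(per_class, 9)))
        ys.append(np.full(per_class, label))
    return np.vstack(xs), np.concatenate(ys)


def softmax(z):
    """Row-wise softmax with the usual max subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train(x, y, epochs=50, lr=0.2):
    """Gradient descent on weights r[k, ij]; output current I_k = sum_ij r*P."""
    onehot = np.eye(3)[y]
    r = np.zeros((3, x.shape[1]))          # normalized responsivities
    losses = []
    for _ in range(epochs):
        s = softmax(x @ r.T)               # class probabilities
        losses.append(float(-np.mean(np.sum(onehot * np.log(s + 1e-12), axis=1))))
        grad = (s - onehot).T @ x / len(x)  # cross-entropy gradient, averaged
        r -= lr * grad                      # R^{n+1} = R^n - eta * gradient
    return r, losses


rng = np.random.default_rng(0)
x, y = make_dataset(rng)
r, losses = train(x, y)
accuracy = float(np.mean(np.argmax(x @ r.T, axis=1) == y))
print(f"final loss = {losses[-1]:.3f}, training accuracy = {accuracy:.2%}")
```

In hardware, each trained weight would be mapped to a gate voltage through the responsivity map in Fig. 3d; that mapping, and the effect of readout noise at low responsivity, are not modeled here.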

As the number of training epochs increased, each Rij of the heterotransistor array was progressively refined, ultimately enabling the identification of the target letter when its average output current exceeded that of the other two (Fig. 4e and Supplementary Fig. 22). Notably, the high-photoresponsivity heterotransistor on the order of 10⁶ AW−1 readily yielded amplified readout photocurrents without the need for extra denoising circuits, thus conferring a higher SNR than the low-photoresponsivity ones. An elevated SNR ensures accurate data acquisition, empowering the device to effectively process visual information even in dim-light environments. Figure 4f, g compares the loss function and the corresponding recognition accuracy of phototransistor arrays with varied photoresponsivities. The low photoresponsivity on the order of 10² AW−1 induced pronounced fluctuations in both the loss function and the recognition accuracy, which impeded letter classification. In contrast, the heterotransistors with a photoresponsivity on the order of 10⁴ AW−1 demonstrated a swift decline in the loss function and attained a 100% recognition accuracy after 50 training epochs, which further decreased to 20 epochs for the high photoresponsivity on the order of 10⁶ AW−1, greatly reducing the training required for real-time visual classification tasks.

The heterotransistor array can serve as an in-sensor convolution kernel to extract key features of a full visual image (Supplementary Note 3). A full image was divided into 3 × 3 sub-images that were sequentially projected onto the kernel; following Kirchhoff's law, each sub-image yielded a summed output current, and these sub-Iout values were assembled to reconstruct the kernel-processed image with enhanced features. Benefiting from the ultrasensitive bipolar photoresponses of MAPbI3/Bi2O2Se heterotransistors, a diverse set of tunable photoresponse weights can be implemented within the 3 × 3 kernel to process image data. As exemplified in Fig. 5a, the convolution kernels for Gaussian, Inverse, and Laplacian operators were experimentally implemented in the heterotransistor array to perform noise reduction, image stylization, and edge detection, respectively45. The pixel data of a dim-light image captured at Peking University were converted to optical signals and then projected onto the heterotransistors, wherein the specific configurations of the Gaussian, Inverse, and Laplacian operators were set with their respective Vg values. All operators were successfully applied to the dim-light images. The ultrasensitive heterotransistor array produced more detailed edges of the stone lion and the school gate under the dim-light condition, and the edge-detection entropy of the high-responsivity heterotransistor array (i.e., R = 2 × 10⁶ AW−1) was 1.9 × 10⁻³ compared with 1.5 × 10⁻⁵ for the low-responsivity one (i.e., R = 5 × 10⁴ AW−1), indicating that more effective information was extracted from the image by the high-photoresponsivity heterotransistors.
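The sequential sub-image projection can be mimicked in software by sliding a 3 × 3 weight matrix over the image and performing one MAC readout per position. The sketch below uses common textbook forms of the Gaussian and Laplacian kernels and a single negative center weight as a stand-in for the Inverse operator; these kernel values are assumptions, since the weights actually programmed into the array are not listed here.

```python
import numpy as np

# Textbook 3 x 3 kernels in normalized units (assumed, not the device values)
GAUSSIAN = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
INVERSE = np.array([[0, 0, 0], [0, -1, 0], [0, 0, 0]], dtype=float)


def in_sensor_convolve(image, kernel):
    """Slide the 3 x 3 kernel over the image, one MAC readout per sub-image.

    No padding is applied, so the output is smaller by two pixels per axis.
    The kernel is not flipped, which is equivalent to a true convolution for
    the symmetric kernels used here.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            sub_image = image[i:i + 3, j:j + 3]      # projected sub-image
            out[i, j] = np.sum(kernel * sub_image)   # summed output current
    return out


# A vertical step edge: dark left half, bright right half
step = np.zeros((5, 6))
step[:, 3:] = 1.0
edges = in_sensor_convolve(step, LAPLACIAN)
print(edges)
```

The negative entries of the Laplacian and Inverse kernels are where bipolar responsivity is needed: a sensor restricted to positive weights could realize the Gaussian kernel but not the other two in a single readout.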

Fig. 5: Ultrasensitive dim-light neuromorphic vision sensing.

a Implementation of Gaussian, Laplacian, Inverse operators in the MAPbI3/Bi2O2Se heterotransistor array for the image processing of noise reduction, edge detection, and image stylization, respectively. b Scheme of integrating the MAPbI3/Bi2O2Se heterotransistor array with the YOLOv4 network for the traffic-light detection task under dim-light conditions. c Recall–precision curves of green, red, and yellow traffic lights preprocessed by the heterotransistor array under dim-light conditions. d Mean average precisions (mAP) of the raw and the preprocessed images for green, red, and yellow traffic lights.

To evaluate the effect of varying photoresponsivity levels on image processing, we configured the array into Laplacian convolution kernels by adjusting the gate voltages. As shown in Supplementary Fig. 23, the high-responsivity configurations successfully detected edges in both bright and dim images, whereas the low-responsivity configurations led to significant edge information loss in dim images. The ultrasensitive reconfigurable heterotransistor array holds promises for improving dim-light visual perception in CNNs2,46. In this regard, the heterotransistor array was interconnected with the YOLOv4 neural network to detect traffic lights from complex surroundings47,48. In the preprocessing of traffic-light images, each of the RGB channels underwent individual processing facilitated by a 3 × 3 bipolar phototransistor array that serves as a Gaussian filter. This approach involved the application of distinct wavelength and bias parameters to each channel, meticulously configured to optimize the convolution and denoising phases of image preprocessing. Dim-light images sourced from LISA Traffic Lights Dataset were interfered with Gaussian noises (i.e., standard deviation of 0.2) to serve as deteriorated raw images, which were then projected onto the reconfigurable heterotransistors to emulate simultaneous sensing and preprocessing with convolution kernels. The resultant Iout was transformed back into pixel data and fed into the YOLOv4 neural network for further training and detection tests as illustrated in Fig. 5b. The presence of noise within RGB imagery precipitates the blurring of details and the distortion of color fidelity (Supplementary Fig. 24). Conversely, the application of denoising process facilitates the restoration of visual clarity and intricacy, a transformation underscored by the histogram evolution from the homogenously smooth distribution to one characterized by multiple peaks. 
Integrating denoising into the preprocessing improves the image quality and accelerates the training convergence of the neural network, because the network no longer has to learn to reject noise artifacts. The precision-recall curves of the preprocessed images showed recalls of 99.2%, 95.8%, and 92.6% for green, red, and yellow lights, respectively, compared with 89.1%, 72.6%, and 65.2% for the raw images (Fig. 5c and Supplementary Fig. 25). The mean average precision of traffic-light detection improved from 73.2% to 95.9% with the preprocessing (Fig. 5d), validating the effectiveness of in-sensor processing with the reconfigurable MAPbI3/Bi2O2Se heterotransistors and their strong photoresponses.
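The per-channel 3 × 3 filtering described above can be sketched in software as follows. This is a minimal illustration, not the measurement pipeline: the kernel entries are the textbook Gaussian and Laplacian stencils (assumed here, since the exact values programmed into the array are not listed), the test image is synthetic rather than taken from the LISA dataset, and the names `convolve3x3` and `preprocess_rgb` are hypothetical.

```python
import numpy as np

# Textbook 3 x 3 stencils (assumed values). In hardware, each entry would
# correspond to a gate-tuned positive or negative photoresponsivity.
GAUSSIAN = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]], dtype=float) / 16.0
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)


def convolve3x3(image, kernel):
    """Slide a 3 x 3 kernel over a 2D image and return the valid region."""
    h, w = image.shape
    out = np.empty((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            # One multiply-accumulate per output pixel: sum of weight * intensity.
            out[r, c] = np.sum(kernel * image[r:r + 3, c:c + 3])
    return out


def preprocess_rgb(image_rgb, kernel=GAUSSIAN):
    """Filter the R, G, and B channels independently with the same kernel."""
    channels = [convolve3x3(image_rgb[..., ch], kernel) for ch in range(3)]
    return np.stack(channels, axis=-1)


# Synthetic dim image with a green patch, corrupted by Gaussian noise (std 0.2).
rng = np.random.default_rng(0)
clean = np.zeros((32, 32, 3))
clean[8:24, 8:24, 1] = 0.5
noisy = clean + rng.normal(0.0, 0.2, size=clean.shape)
denoised = preprocess_rgb(noisy)
```

Because both stencils are symmetric, sliding the kernel without flipping it gives the same result as a true convolution. Swapping `GAUSSIAN` for `LAPLACIAN` in `preprocess_rgb` turns the same loop into an edge detector.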

In conclusion, we have demonstrated a reconfigurable NV sensor based on the MAPbI3/Bi2O2Se heterostructure, in which momentum conservation between the MAPbI3 perovskite and the 2D Bi2O2Se semiconductor plays a crucial role in facilitating interlayer hot-carrier transitions, setting a benchmark with a bipolar photoresponsivity of \(10^{7}\) A W−1, a specific detectivity of \(10^{11}\) Jones, and a dynamic range of 110 dB. By adopting these ultrasensitive bipolar photoresponses as CNN weights, the heterotransistors have been tailored to execute reconfigurable in-sensor convolution operations. This capability enables early visual processing tasks even under dim illumination of 0.1 μW cm−2, reminiscent of natural moonlight conditions, obviating the need for additional gain circuits and the associated energy consumption. Our work marks a significant advance towards highly sensitive and energy-efficient in-sensor neuromorphic vision. With its high responsivity and in-sensor computing capabilities, the MAPbI3/Bi2O2Se heterotransistor presents a promising avenue for systems with improved performance and functionality when integrated with neural networks or other electronic components.

Methods

Device fabrications

The devices were fabricated as back-gate field-effect transistors. Bi2O2Se films were grown on a mica substrate by chemical vapor deposition and then transferred to HfO2/silicon substrates using the poly(methyl methacrylate)-assisted method. To fabricate a patterned Bi2O2Se array, a 3 × 3 mask was applied with electron-beam lithography (Raith, Voyager) to define the channel pattern, and the exposed areas were then etched by argon plasma (Leuven Instruments, LCCP−6A reactive ion etcher). The source and drain electrodes (Pd/Au = 5/60 nm) were deposited using a combination of standard electron-beam lithography (Raith, Voyager), electron-beam evaporation (DETECH, DE400), and lift-off processes. The substrate hydrophilicity was enhanced by a slight oxygen-plasma treatment (Harrick Plasma, PDC-002). Subsequently, a solution of methylammonium lead iodide perovskite dissolved in dimethylformamide was spin-coated onto the substrate, followed by anti-solvent treatment with chlorobenzene, and the films were then thermally annealed at 100 °C for 15 min under N2 atmosphere.

Device characterizations and electrical measurements

Cross-sectional transmission electron microscopy (TEM) specimens were prepared using an in situ focused ion beam (FIB) lift-out technique. High-resolution images were subsequently captured by spherical aberration-corrected TEM (Thermo, Themis Z) operating at an accelerating voltage of 200 kV. Time-resolved photoluminescence (TRPL) spectroscopy was performed on an Edinburgh Instruments spectrometer (FLS 1000) with a 468 nm excitation light source. Femtosecond transient absorption (fs-TA) spectroscopy was performed using a commercial spectrometer (Light Conversion, Harpia/Pharos). The pump photon energy was set to 3.1 eV (i.e., 400 nm) with an intensity of 1 μJ cm−2. The time-resolved fs-TA spectra were scanned from 2.07 eV (i.e., 600 nm) to 1.46 eV (i.e., 850 nm) with a temporal resolution of 100 fs. Electrical characterizations were conducted using a probe station (Lakeshore, CRX-VF) equipped with an Agilent B1500A semiconductor parameter analyzer. Incident illumination with varied light intensities was provided by 532 nm (CNI, MDL-III-780 nm), 640 nm (CNI, MDL-III-780 nm), and 780 nm (CNI, MDL-III-780 nm) lasers, and the power density was measured by a power meter (Newport, 843-R).

Implementation of pattern classification

The task of classifying the letters “P”, “K”, and “U” is executed using a multilayer perceptron (MLP) model. The architecture comprises an input layer with 9 neurons \(({x}_{1},{x}_{2},\ldots,{x}_{9})\), corresponding to the 9 pixel values of the input letter image, and an output layer with 3 neurons (\({y}_{{\rm{P}}},{y}_{{\rm{K}}},{y}_{{\rm{U}}}\)), each dedicated to one of the target letters. The input and output layers are connected via a weight matrix W, with the output neuron activation \({a}_{j}\) determined by \({a}_{j}={\sum }_{i=1}^{9}{W}_{{ij}}\cdot {x}_{i}+{b}_{j}\), where \({W}_{{ij}}\) is the weight from the input neuron \(i\) to the output neuron \(j\), and \({b}_{j}\) is the bias term for the output neuron \(j\). The network recognizes a letter by selecting the output neuron with the highest activation value.
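As a rough illustration of the activation rule \({a}_{j}={\sum }_{i}{W}_{{ij}}{x}_{i}+{b}_{j}\) and the highest-activation decision, the sketch below classifies three 3 × 3 letter patterns. The binary templates, the zero-mean matched-filter weights, and the names `make_weights` and `classify` are assumptions made for this example; the weights in this work are obtained by training, not built by hand.

```python
import numpy as np

LETTERS = ("P", "K", "U")

# Hypothetical 3 x 3 binary templates (1 = lit pixel); illustrative only.
TEMPLATES = {
    "P": np.array([[1, 1, 1], [1, 1, 1], [1, 0, 0]], dtype=float),
    "K": np.array([[1, 0, 1], [1, 1, 0], [1, 0, 1]], dtype=float),
    "U": np.array([[1, 0, 1], [1, 0, 1], [1, 1, 1]], dtype=float),
}


def make_weights(templates):
    """Build a 9 x 3 weight matrix from zero-mean templates.

    Subtracting the mean yields both positive and negative weights, which is
    the kind of signed weight a bipolar photoresponse can represent.
    """
    cols = [(templates[k] - templates[k].mean()).ravel() for k in LETTERS]
    return np.stack(cols, axis=1)


def classify(x, W, b):
    """Return (letter, activations) with a_j = sum_i W[i, j] * x[i] + b[j]."""
    a = x @ W + b
    return LETTERS[int(np.argmax(a))], a


W = make_weights(TEMPLATES)
b = np.zeros(3)
letter, activations = classify(TEMPLATES["K"].ravel(), W, b)
```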

In a typical hardware realization, this neural network can be emulated using an array of bipolar sensors, an instance of in-sensor computing. In this work, the photoresponsivity \({R}_{{ij}}\) of each device in the MAPbI3/Bi2O2Se heterotransistor array is tailored through the gate voltage \({{V}_{{\rm{g}}}}_{{ij}}\) to represent the synaptic weight \({W}_{{ij}}\) of a neural network. Upon exposure to light encoding the input data \({P}_{{ij}}\), each device in the array generates a photocurrent \({I}_{{ij}}\) proportional to the product of the input signal and its responsivity: \({I}_{{ij}}={R}_{{ij}}\cdot {P}_{{ij}}\). The total output current, \({I}_{{\rm{out}}}={\sum }_{i=1}^{3}{\sum }_{j=1}^{3}{I}_{{ij}}={\sum }_{i=1}^{3}{\sum }_{j=1}^{3}{R}_{{ij}}\cdot {P}_{{ij}}\), embodies the multiply-accumulate (MAC) operation within the sensor array, with the summation carried out by Kirchhoff’s current law.
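The MAC relation \({I}_{{\rm{out}}}={\sum }_{i,j}{R}_{{ij}}{P}_{{ij}}\) can be checked numerically with a few lines. In this sketch the responsivity map is a Laplacian stencil scaled to the \(10^{6}\) A W−1 level, and the per-device optical power of 1 pW is an assumed illustrative value (the text specifies intensity, not power per device). The function name `readout_current` is hypothetical.

```python
import numpy as np


def readout_current(R, P):
    """Return I_out = sum_ij R_ij * P_ij in amperes.

    R: responsivities in A/W (signed, set by the gate voltages).
    P: optical power on each device in W.
    The element-wise products are the per-device photocurrents; summing them
    models the current addition on a shared output line.
    """
    R = np.asarray(R, dtype=float)
    P = np.asarray(P, dtype=float)
    if R.shape != P.shape:
        raise ValueError("R and P must have the same shape")
    return float(np.sum(R * P))


# Assumed responsivity map: Laplacian stencil scaled to 1e6 A/W.
R = 1e6 * np.array([[0, 1, 0],
                    [1, -4, 1],
                    [0, 1, 0]], dtype=float)

uniform = np.full((3, 3), 1e-12)   # same power on every device
spot = np.zeros((3, 3))
spot[1, 1] = 1e-12                 # light on the centre device only

i_uniform = readout_current(R, uniform)
i_spot = readout_current(R, spot)
```

With signed weights that sum to zero, uniform illumination gives a vanishing output current, while a single illuminated centre pixel gives a negative current. This is the behaviour that makes bipolar responsivity useful for edge detection.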

For the classification task, a dataset comprising 100 instances of “P”, “K”, and “U” letter patterns corrupted with Gaussian noise (standard deviation of 0.2) is split into training and testing sets in a 7:3 ratio. The letters are projected onto the sensor array with the peak incident light intensity set at 0.1 μW cm−2. A comparative analysis is conducted across three sensor arrays with photoresponsivities of \(10^{2}\) A W−1, \(10^{4}\) A W−1, and \(10^{6}\) A W−1, respectively. The photoresponsivity of each heterotransistor is initialized randomly. The projection of the letter image induces a readout current \({I}_{{\rm{out}}}={\sum }_{i=1}^{3}{\sum }_{j=1}^{3}{I}_{{ij}}={\sum }_{i=1}^{3}{\sum }_{j=1}^{3}{R}_{{ij}}\cdot {P}_{{ij}}\), which is activated by the softmax function \(S({I}_{{\rm{out}},m})=\frac{{e}^{\xi \cdot {I}_{{\rm{out}},m}}}{{\sum }_{k=1}^{3}{e}^{\xi \cdot {I}_{{\rm{out}},k}}}\), where \(m\) and \(k\) index the elements of the “P”, “K”, “U” set, \({I}_{{\rm{out}},k}\) is the readout current associated with class \(k\), and ξ is the scaling factor. To account for noise and hysteresis in the simulation and to elucidate the effect of responsivity on recognition accuracy, Gaussian noise with a variance of 0.003 was added to the readout photocurrent. A loss function \(\psi=-\frac{1}{3}{\sum }_{m=1}^{3}{y}_{m}\log [S({I}_{{\rm{out}},m})]\) is defined to quantify the error for training, where \({y}_{m}\) is the label of the input letter for class \(m\). Through iterative training with the backpropagation algorithm at a learning rate of 0.001, the photoresponsivity \({R}_{{ij}}\) of each sensor in the array is fine-tuned by modulating the gate voltage \({V}_{{\rm{g}}}\), thereby minimizing the loss function and enhancing the classification accuracy. Because device-to-device variations in the array are significant non-ideal factors in neuromorphic computing, we compiled the photoresponsivity-gate voltage (R–Vg) data measured from the nine devices into a lookup table. This enables us to accurately simulate the conductance modulation process of these devices.
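The training procedure can be sketched as a softmax readout, a cross-entropy loss, a gradient-descent update, and a final snap of each target responsivity to the nearest measured value. Several simplifications are assumed here and are not taken from the text: one responsivity map per class (array `R` of shape 3 × 9), normalized dimensionless currents, a loss without the constant 1/3 prefactor, and a single shared lookup table instead of one per device. All function names are hypothetical.

```python
import numpy as np


def softmax(currents, xi=1.0):
    """S_m = exp(xi * I_m) / sum_k exp(xi * I_k)."""
    z = xi * np.asarray(currents, dtype=float)
    z = z - z.max()  # numerical stability; leaves the ratios unchanged
    e = np.exp(z)
    return e / e.sum()


def cross_entropy(probs, label):
    """Loss for a one-hot target whose true class index is `label`."""
    return float(-np.log(probs[label]))


def train_step(R, P, label, lr=0.001, xi=1.0, noise_var=0.003, rng=None):
    """One gradient-descent update of R (3 x 9) for one input P (9,).

    If `rng` is given, Gaussian noise with variance `noise_var` is added to
    the readout currents before the activation.
    """
    currents = R @ P
    if rng is not None:
        currents = currents + rng.normal(0.0, np.sqrt(noise_var), currents.shape)
    probs = softmax(currents, xi)
    target = np.zeros_like(probs)
    target[label] = 1.0
    # Gradient of softmax + cross-entropy with respect to R.
    grad = xi * np.outer(probs - target, P)
    return R - lr * grad, cross_entropy(probs, label)


def snap_to_table(R, table):
    """Replace each target value by the nearest entry of a measured table."""
    table = np.asarray(table, dtype=float)
    idx = np.abs(np.asarray(R, dtype=float)[..., None] - table).argmin(axis=-1)
    return table[idx]


# Toy run on a single noiseless pattern (larger learning rate for brevity).
R0 = np.zeros((3, 9))
pattern = np.ones(9)
R1, loss0 = train_step(R0, pattern, label=0, lr=0.1)
_, loss1 = train_step(R1, pattern, label=0, lr=0.1)
```

In a fuller simulation, `snap_to_table` would be applied after each update with a separate table per device, so that only responsivities actually reachable by that device's gate voltage are used.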