Introduction

Machine learning and artificial intelligence based on neural networks (NNs) have shown remarkable capabilities across a wide range of applications, including autonomous driving, weather prediction, speech recognition, and image understanding1,2,3,4. These workloads place substantial demand on accelerators such as graphics processing units, which are well suited to large-scale, parallel multiply-and-accumulate operations. However, the back-and-forth data movement between the physically separated memory and logic units of the conventional von Neumann architecture and the digital data-processing paradigm impose significant limitations on system efficiency5,6. Consequently, there is growing interest in high-efficiency neuromorphic computing hardware (NCH), particularly for intelligent edge devices that can process and store data locally and in an analog manner, akin to the human brain7,8,9.

At the algorithm level, NNs handle weights with effectively unlimited precision, a luxury that NCH cannot afford. To implement NNs at the edge, it is necessary to train and/or infer within device-level nodes that have limited numerical precision. Theoretical simulations have shown that many deep NNs with 8- to 24-bit precision suffer almost no accuracy degradation compared with much higher precision, owing to stochastic rounding schemes and the large number of parameters they usually contain10,11,12,13. On the other hand, excessively low precision (such as <8-bit) may instead lead to performance degradation, particularly in small-sized NNs deployed on edge devices that require high energy efficiency, because each parameter has a greater impact on the overall performance14. Whether a fixed-precision NN is trained directly or obtained by downloading and quantizing a pre-trained NN, devices capable of supporting many distinguishable conductance levels are crucial.

Non-volatile memories, such as floating-gate memories (FGMs)15,16,17,18,19,20, resistive switching memories8,21,22,23,24, phase change memories25,26, and ferroelectric memories27,28,29, have emerged as candidates for NCH. Among them, FGMs are especially promising owing to their non-volatile, charge-based analog storage mode. When utilized as artificial synapses, FGMs exhibit learning rates that align well with those of visual and auditory signals15. Additionally, FGMs offer a large dynamic range and are compatible with standard complementary metal-oxide-semiconductor (CMOS) technology. Furthermore, combining FGMs with emerging two-dimensional (2D) materials to create 2D FGMs holds great promise for highly integrated NCH30,31,32,33, because the atomic thickness of 2D materials affords exceptional gate control and large storage windows, and their van der Waals surfaces facilitate hetero-integration and compatibility with CMOS processes. Nevertheless, the high sensitivity of 2D materials to interfacial states and the defect-related instabilities of dielectrics often result in poor long-term stability, limited endurance, and fewer than one hundred memory states for 2D FGMs31,34,35,36,37,38. This poses a significant challenge for NCH based on 2D FGMs.

Here, we report gate-injection-mode (GIM) 2D FGMs with 8-bit states as candidates for large-scale NCH. Through a coplanar device structure design, the control gate (CG), floating gate, and channel are decoupled, and the stored charges are programmed and erased from the CG through the shared tunneling layer. By adopting a bi-pulse state programming strategy, highly distinguishable (with intervals larger than three times the standard deviation) and stable (with retention times longer than 10,000 s) 8-bit conductance states are achieved at a 3 V programming voltage. This state number, together with the small operation voltage, surpasses other types of nonvolatile memories based on field-effect transistors (FETs), including conventional 2D FGMs, Si-Flash cells, and ferroelectric field-effect transistors (FeFETs). The devices also show a symmetric state programming tendency and good endurance over 10^5 cycles. In addition, 256 fabricated devices exhibit a 94.9% yield, good uniformity, and repeatability. Leveraging the above findings, we then carry out experimental image convolutions and project 38,592 convolutional kernel parameters onto a 9 × 2 device array, with results matching those of simulations well. Finally, we show that fixed-point NNs with 8-bit precision have inference accuracies approaching the ideal values. Our work demonstrates the potential of GIM 2D FGMs for high-performance neuromorphic computing accelerators.

Results

8-bit-precision programming

GIM 2D FGMs with the device structure shown in Fig. 1a were designed to realize numerous distinguishable conductance levels. Here, monolayer/few-layer MoS2, 5-nm Pt, and 8-nm Al2O3 were used as the channel, floating gate (FG), and tunneling/blocking layer, respectively. An individual CG, coplanar with the source and drain terminals, serves as both the charge programming and erasing electrode. Although approximately 22% more area may be required compared with a conventional vertical structure, the coplanar design enables the device to support vertical integration with fewer layers of materials (as analyzed in Supplementary Fig. 1). The detailed fabrication processes can be found in the Methods section. This design has several advantages. First, unlike the vertically overlapped structure of a traditional FGM, the channel, FG, and CG are decoupled into two sections: the channel-Al2O3-FG stack and the CG-Al2O3-FG stack. Hence, the gate programming voltage can be easily regulated by changing the capacitive coupling ratio, which is proportional to the area ratio between the CG and channel (denoted as A4/A0 in the inset of Fig. 1a). Second, a state programming strategy combining two sequential gate voltage pulses with opposite signs can be adopted to de-trap the unstable charges captured in the dielectrics, so that highly stable memory states can be achieved without affecting the channel. Third, state programming is symmetric because of the shared charge tunneling and blocking layer and the identical charge injection and erasing mechanism. These advantages are discussed in detail in the following sections.

Fig. 1: Programming of the GIM 2D FGM.
figure 1

a Device structure of the GIM 2D FGM. The inset shows the top view of the structure; the areas of the channel and gate are denoted as A0 and A4, respectively. MoS2, Pt, and Al2O3 are used as the channel, FG, and tunneling/blocking layer. b Dual-sweep transfer curve showing a large counterclockwise hysteresis loop. It was measured on the device with gate area A4 = 2.31 μm2 and channel width/length of 10.37/1.47 μm (as indicated in the OM image of Fig. 2b). c Two conductance states after programming with −Vtune (deep colors) and without −Vtune (light colors). d, e Schematics and band diagrams of the programming and tuning process. Detailed energy values can be seen in the band alignment diagram in Supplementary Fig. 2. D drain, S source. f 256 states, each sampled for 100 s. The states were programmed using the bi-pulse programming method. g, h The distinguishable neighboring states at different current levels. g Enlarged current-time sampling plots at the corresponding sites denoted in (f) (in blue, red, and purple, respectively). The corresponding histogram plots of the sampled currents are shown in (h). σ is the standard deviation and the fitted curves were obtained by fitting with a normal distribution function. i Benchmarks of the GIM 2D FGM in state number and operation voltage. FeFET ferroelectric field-effect transistor, FGM floating-gate memory. The data are collected from refs. 19,29,30,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73.

The gate-injection mode is evidenced by the counterclockwise hysteresis in the double-sweep transfer curve shown in Fig. 1b. Electrons can be injected into or erased from the FG by applying negative or positive voltages with sufficiently high amplitudes to the gate terminal. The stored charges non-volatilely change the threshold voltage and the conductance of the MoS2 FET. The large memory window (about 78% of the sweep range) results from the high-k Al2O3 dielectric layer, the ultra-thin MoS2 channel, and, most importantly, the high tunneling efficiency enabled by the optimized gate size, which is discussed further below. Theoretically, the memory states, i.e., the conductance states of the FGM, should be stable because of the high energy barrier at the Pt/Al2O3 interface (~4.7 eV, Supplementary Fig. 2)39,40,41,42. Nevertheless, the source-drain current (IDS) decreases immediately after voltage programming, as seen from the light-colored lines in Fig. 1c. This phenomenon is widely observed in 2D FGMs30,43,44 and mainly originates from unstable charges trapped inside the dielectrics during the charge injection/erasing process, which spontaneously de-trap after programming. The same mechanism underlies the well-known bias temperature instability found in many Si-based transistors45, especially those with high-k dielectrics such as Al2O3 that have widely distributed trap states near the conduction band46.

To resolve the above problem, a programming method combining two sequential gate voltage pulses with opposite signs was adopted. Let us use the low-resistance-state programming process as an example to illustrate this strategy (Fig. 1c–e). When a positive programming voltage (Vprog) is applied, the energy band of the Al2O3 tunneling layer is strongly tilted so that a triangle-shaped potential barrier appears (see the first panel of Fig. 1e). Hence, electrons stored in the FG can be erased through Fowler-Nordheim tunneling (FNT, see the first panel of Fig. 1d). The detailed analysis is shown in Supplementary Note 1 and Supplementary Fig. 3. However, some electrons are captured by trap sites inside the tunneling layer during this process (see the second panels of Fig. 1d, e). After Vprog is withdrawn, the trapped electrons de-trap into the FG by thermal activation in a slow relaxation process, which causes IDS to decrease gradually. Note that the subthreshold slope (SS) was nearly unchanged during the relaxation process, implying that the trap states were introduced during device fabrication rather than generated by voltage programming (Supplementary Fig. 4)38. By applying a negative tuning pulse (−Vtune) soon after Vprog (see the third panels of Fig. 1d, e), the relaxation process can be greatly accelerated by de-trapping the trapped electrons into the FG. Nearly all the trapped electrons can be eliminated after an optimized Vtune (see the fourth panels of Fig. 1d, e). As a result, more stable programmed states are attained (stable states in Fig. 1c). The effect of this strategy is obvious and applicable over the whole conductance range, as evidenced by the comparison of time-dependent transfer curves with and without bi-pulse optimization (Supplementary Fig. 6). Temperature-dependent state retention properties were further studied using the Arrhenius equation (Supplementary Figs. 7–9). The greatly decreased activation energy for stored-charge leakage after applying Vtune verifies the de-trapping effect of the bipolar programming strategy47,48.
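As a brief illustration of the Arrhenius analysis referenced above, the sketch below fits temperature-dependent state-decay rates to extract an activation energy; the rate values are hypothetical placeholders, not measured data.

```python
import numpy as np

# Hypothetical decay rates of a programmed state measured at several temperatures;
# the Arrhenius equation gives rate = A * exp(-Ea / (kB * T)).
kB = 8.617e-5                                        # Boltzmann constant (eV/K)
T = np.array([300.0, 325.0, 350.0, 375.0])           # temperatures (K), assumed
rate = np.array([1.2e-4, 4.0e-4, 1.1e-3, 2.7e-3])    # state decay rates (1/s), assumed

# ln(rate) is linear in 1/T with slope -Ea/kB, so a linear fit extracts Ea.
slope, intercept = np.polyfit(1.0 / T, np.log(rate), 1)
Ea = -slope * kB
print(f"activation energy ~ {Ea:.2f} eV")
```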

Using the above programming method, the GIM 2D FGM can have up to 256 distinguishable states (Fig. 1f; the output curves are shown in Supplementary Fig. 10, and the detailed closed-loop programming method and corresponding parameters are given in Supplementary Figs. 11 and 12), which is equivalent to 8-bit precision. The densely distributed states can be distinguished from each other with an over-3σ separation (σ, the standard deviation of a state) between neighboring states (Fig. 1g, h). This state number is comparable to that of advanced commercial Si-Flash cells and unprecedented among previously reported nonvolatile multibit memories based on FETs, including conventional 2D FGMs and FeFETs (Fig. 1i and Supplementary Table 1). Note that most of the state numbers in the compared literature come from continuous voltage programming measurements or current-voltage curves rather than the current-time curves used here, which means that the state stabilities were not well studied. By relaxing the separation criterion to 1σ, a doubled state number of 512 (9-bit precision) can even be achieved (Supplementary Fig. 14). Moreover, the programming voltage can be decreased to 3 V by optimizing the gate size, which is among the lowest reported in the literature (Fig. 1i).
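The 3σ criterion used here can be checked directly from the current-time samples of neighboring states; a minimal sketch with synthetic data standing in for the 100-s traces is given below.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for two neighboring programmed states, each sampled over 100 s.
state_a = rng.normal(loc=10.0e-9, scale=0.05e-9, size=1000)    # read currents (A)
state_b = rng.normal(loc=10.4e-9, scale=0.05e-9, size=1000)

# Neighboring states are counted as distinguishable when their mean separation
# exceeds three times the larger of the two standard deviations.
separation = abs(state_b.mean() - state_a.mean())
sigma = max(state_a.std(), state_b.std())
print("distinguishable (>3σ):", separation > 3 * sigma)
```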

Programmability and reliability

To investigate the programmability of the GIM 2D FGMs, we adopted the device-circuit configuration shown in Fig. 2a. Here, Vprog was applied to a selected gate, a small source-drain bias (VDS) of 0.1 V was applied to the drain terminal while the source was kept grounded, and the two equivalent capacitors of the channel/FG and FG/gate 1 stacks (C0 and C1) were connected in series. In this configuration, the voltage drop on the tunneling layer is Vtunnel = VprogC0/(C0 + C1). As a result, the programming efficiency of a single programming operation is strongly related to the capacitive coupling ratio between the laterally configured equivalent capacitors, that is, the ratio Ci/C0 (i = 1, 2, 3, …) in Fig. 2a. To systematically investigate the gate-area-dependent programmability, we fabricated GIM 2D FGMs with multiple gates of varying area (Fig. 2b, see Supplementary Fig. 15 for the fabrication process and Supplementary Fig. 16 for detailed geometric parameters).
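A short worked example of the capacitive voltage division above, under the assumption that each capacitance scales simply with its overlap area (so that Ci/C0 = Ai/A0):

```python
# Series-capacitor voltage division: V_tunnel = V_prog * C0 / (C0 + C1).
# Assuming each capacitance is proportional to its overlap area (shared dielectric
# and thickness), this reduces to V_tunnel = V_prog / (1 + A1/A0).
def tunnel_voltage(v_prog, area_ratio):
    """area_ratio is A1/A0, the gate area divided by the channel area."""
    return v_prog / (1.0 + area_ratio)

# Example with an area ratio of 0.084 (as used for the devices in Fig. 3):
print(tunnel_voltage(3.0, 0.084))   # ~2.77 V, i.e. ~92% of V_prog across the gate-side oxide
```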

Fig. 2: Performances of the GIM 2D FGMs.
figure 2

a Schematic diagram of the multi-gate GIM 2D FGM and the equivalent circuit when VDS and Vprog are applied to the corresponding terminals. The capacitance (Ci) between a specific gate and the FG is strongly correlated with the gate area. b OM image of the multi-gate device. The channel MoS2 thickness was determined to be about 2.6 nm by atomic force microscopy (see Supplementary Fig. 23), equivalent to four layers of MoS2 (ref. 74). The white-dashed and red-dashed areas denote the floating gate and channel/gate overlapping regions, respectively. The channel’s width/length is 10.37/1.47 μm (see Supplementary Fig. 16 for detailed geometric parameters). Scale bar: 5 μm. c Dual-sweep transfer curves of three selected gates G1, G2, and G3, corresponding to areas A1, A2, and A3 in (b). d Memory window as a function of the area ratio Ai/A0 with a linear fitting curve. Inset: the linear-scale plot. e Vprog during carrier erasing/injection operation as a function of area ratio. The erasing and injection operations were conducted to alternately switch IDS between ~1 μA and ~1 pA. f Retention of 4 exponentially separated states for 10,000 s. The states were read at a source-drain voltage of Vread = 0.1 V. g Endurance performance during 10^5 program/erase cycles. A 5-μs programming pulse and a tuning pulse were applied with a gap of 200 μs in a single programming/erasing operation. Cycle period: 100 ms. The programming schematic is shown as the inset. Vtune = 2 V.

It is worth noting that, because the gates share the same oxide layer and FG, and the capacitance is given by C = εA/(4πkdox), where ε, A, k, and dox are the dielectric constant, effective area, electrostatic force constant, and oxide-layer thickness, respectively, the capacitance ratio Ci/C0 can be directly obtained from the area ratio Ai/A0 (the area ratio between gate i and the channel). As demonstrated in Fig. 2c, d, the dual-sweep transfer curves show a clear dependence of the memory window on the area ratio, with the largest memory window being 10.3 V and the smallest 0.46 V. This difference is a direct result of the area-controlled voltage division on the gate-Al2O3-FG stack. Simulated potential distributions given in Supplementary Fig. 17 show similar results, validating the above analysis. The device behaves more like a transistor, with a steep switch and a negligible memory window, when the area ratio is very large, such as the case with an area ratio of 0.457 in Fig. 2c. Such devices can be implemented as node selectors or activation-function hardware in NNs.

The programming voltage can be decreased while maintaining a large memory window by using a smaller gate area, as shown in Fig. 2e. This dependence agrees well with the simulation results (Supplementary Note 2 and Supplementary Fig. 18). The programming voltage can be as low as 3 V, showing potential for low-power applications. In addition, toward implementing this device as the basic unit of NCH, the ability to update the device’s weights (conductance states) in small increments under the guidance of a backpropagation algorithm is important for on-chip training. This ability was verified by the nearly symmetric state updating in the positive and negative directions, which results from the identical charge injection and erasing mechanism of the coplanar GIM design (Supplementary Fig. 19).

The device also showed stable programmed states for over 10,000 s while maintaining a maximum on-off ratio of over 1 × 10^8 (Fig. 2f). Given the uniform oxide thickness in the channel and gate regions, the device’s retention properties exhibit a clear dependence on the overlap areas between the floating gate and the drain, source, and gate electrodes (Supplementary Figs. 20–22). To further enhance the retention property, an additional blocking layer could be introduced below the source and drain regions to suppress this charge leakage pathway. Good endurance over 10^5 cycles was also observed, demonstrating the reliability required for high-frequency weight-update operations in on-chip training (Fig. 2g).

Repeatability of the 8-bit programming ability

We fabricated 256 devices using a large-scale MoS2 film grown by chemical vapor deposition (CVD) to study the repeatability of the 8-bit programming ability (Fig. 3, see Methods for the detailed fabrication process). The optical microscope (OM) image of the devices is shown in Fig. 3a, in which a typical area ratio is calculated to be 0.084 (see Supplementary Fig. 24 for geometric parameters). Of the 256 devices, 13 were broken, possibly because of discontinuous sites on the large-scale MoS2 film introduced during the material transfer process, resulting in a total yield of 94.9% (243 out of 256 devices). Apart from that, large hysteresis windows and nine evenly distributed programmed states can be observed in the electrical tests (Fig. 3b, c).

Fig. 3: Uniformity and repeatability of 8-bit programming.
figure 3

a OM image of the fabricated 256 devices. Scale bar: 0.2 mm. Inset: OM image of a typical device among the 256 devices (scale bar: 30 μm). The channel’s width/length is 11.22/3.11 μm (see Supplementary Fig. 24 for detailed geometric parameters). b Dual-sweep transfer curves of the devices. About 92.9% of devices have an on-state current exceeding 100 nA (Supplementary Fig. 25). The uniformity of transfer curves is comparable with previous works (Supplementary Fig. 26)31,75. c Current maps of the programmed 9 separate states. Sites in blue denote the 13 broken devices. d 256 programmed states for 120 devices. e Device-to-device variation as a function of state number. f Current distributions of selected adjacent states extracted from (d). Read voltage: 1 V. The programming method and parameters are the same as those used in Supplementary Figs. 11 and 12.

Moreover, we programmed 120 out of 137 devices (a yield of 87.6%) into 256 (8-bit) distinct states, ranging from a current level of 1 pA to 100 nA (the original data are shown in Supplementary Figs. 27–31). The statistics of the state current as a function of device number and state number are presented in Fig. 3d. These 120 devices exhibit an overall low device-to-device variation of below 4% for the programmed states (Fig. 3e, f), which can be largely attributed to the accurate programming method employed and the wide memory windows of the devices.

The 8-bit states, low programming voltage, good stability and endurance, and good repeatability and scalability shown above demonstrate the potential of GIM 2D FGMs for NCH.

Hardware convolutions based on device arrays

Vector-matrix multiplication is the most important operation in NNs, underlying the representation transformations between neighboring layers and the kernel filtering in convolution layers for feature extraction. In this section, we fabricated a 9 × 2 array, which is comparable to other arrays configured for analog computing (see Supplementary Table 3)29,30,31,49,50,51,52,53, and carried out hardware convolutions to demonstrate the potential of GIM 2D FGMs for NCH. The optical image of an array bonded on a chip carrier is shown in Fig. 4a (see Methods and Supplementary Fig. 32 for the array fabrication process, and Supplementary Fig. 33 for geometric parameters). A homemade test system was used to run the convolution process experimentally, as shown in Supplementary Fig. 34. The gate lines were wired out for the programming operation, while the rows and columns were wired out and connected to every device’s drain and source terminals, respectively. To implement a 3 × 3 convolution kernel, the first column stores the positive kernel weights and the second column stores the negative kernel weights. This kernel configuration can eliminate possible parasitic currents (as analyzed in Supplementary Fig. 35). The device structure also shows small parasitic capacitances and device-to-device interference, as thoroughly analyzed in Supplementary Note 3 and Supplementary Figs. 36–39. We adopted a parallel programming method for weight (conductance state) updating (Fig. 4b), i.e., devices in a selected row were programmed simultaneously by gate voltages with the common drain terminals grounded. A row-by-row validation scheme was used to verify the programmed kernel (Supplementary Fig. 40). Additional discussions on the limitations of operating the device array can be found in Supplementary Note 4.
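A minimal sketch of the two-column kernel mapping described above: signed 3 × 3 kernel weights are split into a positive and a negative conductance column, and the differential column current gives the signed multiply-and-accumulate result. The conductance scaling and helper names are illustrative, not the exact mapping used experimentally.

```python
import numpy as np

def map_kernel(kernel, g_max=100e-9):
    """Split a signed 3x3 kernel into positive/negative conductance columns (in siemens)."""
    w = kernel.flatten()
    scale = g_max / np.abs(w).max()
    g_pos = np.where(w > 0, w, 0.0) * scale     # column 1: positive weights
    g_neg = np.where(w < 0, -w, 0.0) * scale    # column 2: negative weights
    return g_pos, g_neg

def kernel_output(patch_voltages, g_pos, g_neg):
    """Differential column currents give the signed dot product via Kirchhoff summation."""
    v = np.asarray(patch_voltages, dtype=float).flatten()
    return v @ g_pos - v @ g_neg

kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)   # e.g. a Sobel edge kernel
g_pos, g_neg = map_kernel(kernel)
patch = np.tile([0.0, 0.05, 0.1], (3, 1))       # a 3x3 patch of drain voltages (V)
print(kernel_output(patch, g_pos, g_neg))       # signed output current (A)
```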

Fig. 4: Hardware convolutions using GIM 2D FGMs array.
figure 4

a Photo (left, scale bar: 1 mm) and OM image (right, scale bar: 40 μm) of the wired 9 × 2 array. The electrode lines are also shown. The channel’s width/length is 3.95/2.47 μm (see Supplementary Fig. 33 for detailed geometric parameters). b Parallel programming method for a selected row of the array. c Illustration of the vector-matrix multiplication operation for image convolution. The kernel weights were mapped as conductance states of the device array before each convolution process. During the convolution process, 3 × 3 patches of pixels were converted to drain voltage inputs patch-by-patch, with the patches sliding through the whole image row-by-row. d Conductance maps of three kinds of kernels that were mapped to the array. e The corresponding convolution results mapped into the source-drain current. f Comparisons between output current distributions in (e) (hardware) and software-based convolution results. The results have been normalized. g Illustration of the VGG16 convolutional base structure. There are 5 convolution blocks with each containing several convolution layers and a pooling layer. h, i Hardware-based conductance maps of the two convolution layers in block 1 (h) and the corresponding software-based weight maps (i). j, k The corresponding histogram plots of hardware-based conductances (j) and software-based weights (k).

Figure 4c uses the convolution of the image ‘0’ from the MNIST dataset as an example to illustrate the inference process. The image pixels were converted into voltages according to their greyscale values and grouped into 3 × 3 patches. The pixels of each patch were then applied as drain inputs to the array, and the output currents at the source terminals were collected as the convolution results. With different kinds of kernels programmed separately onto the device array (Fig. 4d and Supplementary Fig. 41), the output images after convolution show different features (Fig. 4e). The convolution results for another image from the Fashion MNIST dataset and the convolution results with large current outputs are also shown in Supplementary Figs. 42 and 43. The experimental output images show almost the same distributions as those of software-based convolutions (Fig. 4f), demonstrating that the array works well as a physical kernel for feature extraction.
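The patch-by-patch inference flow can be summarized by the following sketch (greyscale pixels converted to drain voltages and each 3 × 3 patch read out as one differential output current); the voltage scaling and the example kernel are assumptions for illustration.

```python
import numpy as np

def hardware_convolution(image, g_pos, g_neg, v_max=0.1):
    """Slide 3x3 patches over a greyscale image; each patch of drain voltages yields
    one differential output current, i.e. one pixel of the convolved feature map."""
    voltages = image / image.max() * v_max          # greyscale -> drain voltages (V)
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = voltages[i:i + 3, j:j + 3].flatten()
            out[i, j] = patch @ g_pos - patch @ g_neg
    return out

# Illustrative usage with a stand-in image and a Laplacian kernel split into two columns.
rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(28, 28)).astype(float)
kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], float).flatten()
g_scale = 100e-9 / np.abs(kernel).max()
g_pos = np.clip(kernel, 0, None) * g_scale
g_neg = np.clip(-kernel, 0, None) * g_scale
feature_map = hardware_convolution(image, g_pos, g_neg)   # shape (26, 26)
```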

Considering the 8-bit states realized on GIM 2D FGMs, more complex kernels can be mapped onto the 9 × 2 array for high-level feature extraction. Take the convolutional base of the large-scale convolutional neural network (CNN) VGG16 as an example; it contains five convolution blocks, each comprising several convolution layers and a pooling layer (Fig. 4g). All 38,592 kernel parameters in the first block were mapped onto the 9 × 2 array kernel-by-kernel, as shown in Fig. 4h. The hardware-based kernel weights show almost the same landscape as the software-based values (Fig. 4i). A more direct comparison can be seen from the distributions of conductance and weight values (Fig. 4j, k). The above result demonstrates the hardware’s capability for vector-matrix multiplication and motivates incorporating GIM 2D FGMs into the whole body of large-scale NNs to validate their potential for constructing advanced NCH.
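For reference, the 38,592 parameters quoted above are consistent with the weight counts of the two 3 × 3 convolution layers in VGG16's first block (biases excluded, assuming only kernel weights are mapped to conductances):

```python
# Weight counts of the two 3x3 convolution layers in VGG16 block 1 (biases excluded):
conv1 = 3 * 3 * 3 * 64      # RGB input, 64 filters        -> 1,728 weights
conv2 = 3 * 3 * 64 * 64     # 64-channel input, 64 filters -> 36,864 weights
print(conv1 + conv2)        # 38,592
```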

Convolutional neural networks with 8-bit precision

The accuracy of NNs with limited numerical states (fixed-point NNs) is an important issue for the practical application of NCH. We note that downloading a pre-trained NN to a local NCH and quantizing the weights with limited numerical states (quantization after training) is generally a more energy-efficient approach. Therefore, to demonstrate the potential of GIM 2D FGM arrays for NCH (Fig. 5a), pre-trained large-scale convolutional neural networks (CNNs) were used for ImageNet dataset recognition (Fig. 5b). Here, the large numbers of parameters in these CNNs were quantized to the 8-bit states of the GIM 2D FGM using a nearest-rounding method. According to the simulation results with different bit precisions (the 4-bit, 5-bit, 6-bit, and 7-bit states adopted are shown in Supplementary Fig. 44), 8-bit precision is sufficient for the CNNs to achieve high recognition accuracy compared with their unlimited-precision versions (Fig. 5c and Supplementary Fig. 45). It is important to note that, while 8-bit precision achieves higher recognition accuracy (89.43%) than lower precisions (such as 88.96% for 7-bit) for the smallest MobileNet model, 7-bit precision is sufficient for the larger Xception model. This suggests that larger CNNs can operate effectively with lower bit precision. However, from a practical perspective, deploying small-sized NNs on edge devices is typically more energy-efficient. Therefore, the higher 8-bit precision storage for these small-sized CNNs is crucial for enhancing their performance.
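A minimal sketch of quantization after training as described above: each pre-trained weight tensor is mapped by nearest rounding onto 2^n uniformly spaced levels. The uniform level grid and per-tensor scaling are simplifying assumptions; the experimentally measured conductance states could be substituted for the grid.

```python
import numpy as np

def quantize_nearest(weights, n_bits=8):
    """Map a weight tensor onto 2**n_bits uniformly spaced levels by nearest rounding."""
    w_min, w_max = weights.min(), weights.max()
    step = (w_max - w_min) / (2 ** n_bits - 1)
    return weights if step == 0 else w_min + np.round((weights - w_min) / step) * step

w = np.random.randn(3, 3, 64, 64).astype(np.float32)     # stand-in for a pre-trained kernel
w_q = quantize_nearest(w, n_bits=8)
print(np.unique(w_q).size)                                # at most 256 distinct values
```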

Fig. 5: Image recognition using CNNs with different precisions.
figure 5

a Schematic of a large-scale GIM 2D FGM array for vector-matrix multiplication in neural networks. b Schematic of CNNs for ImageNet image recognition. c, d Comparison of top-5 accuracy for CNNs quantized with different precisions after (c) and during (d) training. The quantization process used the nearest-rounding scheme. The numbers of parameters are given in brackets following the models’ names (4.3 million for MobileNet, 22.9 million for Xception).

An alternative approach involves directly training fixed-point NNs on the NCH with limited states (quantization during training). Even though this approach consumes much more energy and time than quantization after training, mainly because of the large-scale weight updating, it offers greater flexibility by adapting to specific tasks through weight fine-tuning. Through simulation of quantization during training (Fig. 5d), we observed that the advantage of 8-bit precision over lower precisions is still very obvious for both the MobileNet and Xception models. This is further supported by results from a simpler model for MNIST recognition (Supplementary Fig. 46). However, an overall accuracy decrease is observed across all fixed precisions compared with quantization after training, likely because of the reduced efficiency of the training process caused by inaccurate weight updates at lower precisions.
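One hedged way to picture why low-precision weight updates reduce training efficiency is to confine the weights to a fixed level grid and re-round after every update, so that gradient steps smaller than about half a level spacing are lost; this is an illustrative sketch, not the exact simulation procedure used here.

```python
import numpy as np

def apply_update(weights_q, gradient, lr, levels):
    """Weights are confined to a fixed grid of levels; every update is re-rounded to
    the grid, so gradient steps below about half a level spacing are lost."""
    proposed = weights_q - lr * gradient
    idx = np.argmin(np.abs(proposed[..., None] - levels), axis=-1)
    return levels[idx]

levels = np.linspace(-1.0, 1.0, 2 ** 4)          # e.g. a 4-bit grid of allowed weights
rng = np.random.default_rng(0)
w = rng.choice(levels, size=(8, 8))
grad = rng.normal(scale=0.01, size=(8, 8))
w_new = apply_update(w, grad, lr=0.1, levels=levels)
print(np.count_nonzero(w_new != w))              # many small updates are rounded away
```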

Another important point is the choice of rounding scheme. In the above simulations, a nearest-rounding scheme was adopted. However, according to previous reports10,11,12,13, a stochastic rounding scheme can enhance NN performance. To validate this, we reran the simulations using a stochastic rounding scheme (Supplementary Fig. 47), and the results showed a clear accuracy increase for all the fixed precisions, especially lower precisions such as 5-bit and 6-bit, confirming the benefits of stochastic rounding. Combined with the demonstrated vector-matrix multiplication capability and the high repeatability of 8-bit programming, GIM 2D FGMs show great promise for system-level-integrated vector-matrix multiplication arrays in NN accelerators.
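For comparison with nearest rounding, a stochastic-rounding sketch is given below: each value is rounded up or down with probability proportional to its fractional remainder, so the rounding error is unbiased in expectation.

```python
import numpy as np

def quantize_stochastic(weights, step, rng=None):
    """Round each weight to a multiple of `step`, up or down with probability given by
    the fractional remainder, so the quantization is unbiased in expectation."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(weights, dtype=float) / step
    low = np.floor(scaled)
    frac = scaled - low
    return (low + (rng.random(scaled.shape) < frac)) * step

vals = np.full(100_000, 0.3)
print(quantize_stochastic(vals, step=1.0).mean())   # ~0.3, whereas nearest rounding gives 0.0
```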

Discussion

To sum up, we have designed 2D floating-gate memories working in a gate-injection mode as potential device units for large-scale NCH. The CG, floating gate, and channel are decoupled through this design, so that a bi-pulse state programming strategy could be adopted to realize 8-bit conductance states. This is because the subsequent tuning voltage promotes the de-trapping of unstable charges captured by dielectric defects that have a lower potential barrier. The states are highly distinguishable, with intervals larger than three times the standard deviation, and very stable, with retention times longer than 10,000 s. The devices also show good endurance over 10^5 cycles. In addition, because charges are injected and erased through the CG via the shared Al2O3 layer by FNT, the state programming is almost symmetric. By changing the capacitance ratio through the area of the CG, a 3 V programming voltage can be achieved. Moreover, the 256 fabricated devices exhibit a 94.9% yield, good uniformity, and repeatability. A 9 × 2 device array was then fabricated, and experimental image convolutions were carried out with results matching those of software simulations well. Leveraging the device’s multi-state programming capability, we successfully transferred 38,592 convolutional kernel parameters from a pre-trained VGG16 network to the array. Finally, we studied the image recognition accuracies of fixed-point NNs with different levels of precision. Notably, whether the NNs were obtained by downloading pre-trained networks or by directly training networks locally, the inference accuracies at 8-bit precision approached the ideal values. Our work validates the potential of GIM 2D FGMs for high-performance neuromorphic computing accelerators.

Methods

Device fabrication

A p-doped silicon substrate with 300-nm thermally oxidized SiO2 was first coated with poly(methyl methacrylate) (PMMA) and baked for 2 min at 150 °C. After that, the Pt floating gate was patterned and deposited by electron beam lithography (EBL) and electron beam evaporation, respectively. After a standard lift-off process, an 8-nm-thick layer of Al2O3 was deposited on the floating gate by atomic layer deposition (ALD). The ALD was performed at 150 °C, using water and trimethylaluminum as precursors. Then, MoS2 (purchased from Shanghai Onway Technology Co., Ltd.) mechanically exfoliated with Scotch tape was transferred onto the top surface of the Al2O3/Pt stack by a standard wet-transfer method, using polypropylene carbonate and polydimethylsiloxane as holders. Finally, source, drain, and gate electrodes of Cr/Au (8/80 nm) were patterned and deposited using EBL and thermal evaporation. To fabricate the 256 devices, a large-scale few-layer MoS2 film was grown by CVD on a 1 × 0.5 cm sapphire substrate. The CVD-grown material was transferred with PMMA, patterned by EBL, and etched with CF4 and O2 by reactive ion etching.

Array fabrication

Before the fabrication of the wired 9 × 2 array, the CVD-grown MoS2 was transferred onto a substrate on which the wiring metal patterns were pre-deposited, following the transfer process illustrated in Supplementary Fig. 48. During the fabrication process, a 25-nm-thick layer of ALD-deposited HfO2 was used as the insulating layer to isolate the overlapping drain and gate lines in the array. The other array fabrication steps were the same as the device fabrication process described above.

Electronic measurements

Except for the 9 × 2 array, the electronic performance of the as-fabricated devices was tested on a probe station (Lakeshore, TTP4) under high vacuum (<10^−6 Torr), equipped with a Keysight B1500A semiconductor analyzer system. All tests on the 9 × 2 array were conducted on a homemade probe station equipped with an electrical testing system (National Instruments, cDAQ-9189) under ambient conditions.

Simulation of large-scale CNNs

The adopted large-scale CNNs are pre-trained models loaded from the Keras platform and were handled with Python scripts for convenient access to the internal weights. The ImageNet samples used for evaluation were all collected from the ILSVRC2012 validation data set, which contains 50,000 images, each labelled with its class. Before evaluation, all the pre-trained weights of the CNNs were replaced by the normalized conductance states with the corresponding bit precisions. During the evaluation of each CNN, the 50,000 images were cropped to 224 × 224 pixels and fed to the model one by one. The output scores were translated into the recognized class for each image, and the correctly recognized images were counted to calculate the final recognition accuracy on this data set. The three-layer FCNN was also constructed on the Keras platform layer-by-layer. The ReLU function was used as the activation function, cross-entropy was used as the loss function, and a learning rate of 10^−3 was adopted for model training.
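A minimal sketch of this evaluation flow, assuming the TensorFlow/Keras API and a simple uniform quantizer standing in for the measured conductance states; only a single input is shown in place of the full 50,000-image loop.

```python
import numpy as np
import tensorflow as tf

# Uniform nearest-rounding quantizer (see the earlier sketch); the measured conductance
# states could be substituted for this uniform level grid.
def quantize_nearest(w, n_bits=8):
    lo, hi = float(w.min()), float(w.max())
    step = (hi - lo) / (2 ** n_bits - 1)
    return w if step == 0 else lo + np.round((w - lo) / step) * step

# Load a pre-trained model and replace every weight tensor with a quantized copy.
model = tf.keras.applications.MobileNet(weights="imagenet")
model.set_weights([quantize_nearest(w) for w in model.get_weights()])

# Single hypothetical 224 x 224 input; in the actual evaluation the 50,000 ILSVRC2012
# validation images are preprocessed and fed one by one, and the top-5 predictions
# are compared against the ground-truth labels.
x = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0
x = tf.keras.applications.mobilenet.preprocess_input(x)
top5 = np.argsort(model.predict(x)[0])[-5:]     # indices of the five highest scores
```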