Ferroelectric NAND for efficient hardware bayesian neural networks

Song, Minsuk; Koo, Ryun-Han; Kim, Jangsaeng; Han, Chang-Hyeon; Yim, Jiyong; Ko, Jonghyun; Yoo, Sijung; Choe, Duk-hyun; Kim, Sangwook; Shin, Wonjun; Kwon, Daewoong

doi:10.1038/s41467-025-61980-y

Download PDF

Article
Open access
Published: 25 July 2025

Ferroelectric NAND for efficient hardware bayesian neural networks

Nature Communications volume 16, Article number: 6879 (2025) Cite this article

8297 Accesses
12 Citations
7 Altmetric
Metrics details

Subjects

This article has been updated

Abstract

The rapid advancement of artificial intelligence has enabled breakthroughs in diverse fields, including autonomous systems and medical diagnostics. However, conventional deterministic neural networks struggle to capture uncertainty, limiting their reliability when handling real-world data, which are often noisy, imbalanced, or scarce. Bayesian neural networks address this limitation by representing weights as probabilistic distributions, allowing for natural uncertainty quantification and improved robustness. Despite their advantages, hardware-based implementations face significant challenges due to the difficulty of independently tuning both the mean and variance of weight distributions. Herein, we propose a 3D ferroelectric NAND-based Bayesian neural network system that leverages incremental step pulse programming technology to achieve efficient and scalable probabilistic weight control. The page-level programming capabilities and intrinsic device-to-device variations enable gaussian weight distributions in a single programming step, without structural modifications. By modulating the incremental step pulse programming voltage step, we achieve precise weight distribution control. The proposed system demonstrates successful uncertainty estimation, enhanced energy efficiency, and robustness to external noise for medical images.

Bringing uncertainty quantification to the extreme-edge with memristor-based Bayesian neural networks

Article Open access 20 November 2023

Neural sampling machine with stochastic synapse allows brain-like learning and inference

Article Open access 11 May 2022

Uncertainty quantification in multivariable regression for material property prediction with Bayesian neural networks

Article Open access 08 May 2024

Introduction

Rapid advancements in artificial intelligence (AI) have catalyzed the evolution of various machine learning paradigms. Among these, artificial neural networks (ANNs) have significantly enhanced learning efficiency and generalization capabilities, achieving breakthroughs in applications such as image recognition, autonomous driving, healthcare diagnostics, and weather prediction^1,2,3,4,5,6. However, the growing complexity of ANN models has led to an exponential increase in the computational demand, exposing the inherent limitations of the traditional von Neumann architecture, particularly its memory and processing bottlenecks. To address these challenges, hardware-accelerated neural networks that employ non-volatile memory (NVM) have emerged as compelling alternatives. By integrating data storage and parallel computation through vector-matrix multiplication (VMM) operations that leverage Kirchhoff’s current law for input voltages and stored conductances directly within the memory array, these architectures can overcome the inefficiencies of conventional designs and offer a more scalable and energy-efficient solution for next-generation AI workloads^{7,8,9,10,11,12}.

As synaptic elements in neural networks, NVMs should not only store weight information but also represent analog values for precise computations through complex VMM operations, requiring a multilevel capability to encode analog weights, stable retention, and robust endurance^13,14,15. In this regard, hafnium oxide (HfO₂)-based ferroelectric field-effect transistors (FeFETs) have gained significant attention for their low programming voltage, fast switching speed, high scalability, stable multilevel states, robust reliability, and CMOS compatibility^{16,17,18,19,20,21,22} among various NVM technologies, such as memristors^{23,24,25,26,27}, ferroelectric devices^{28,29,30,31,32,33,34,35}, magnetic random-access memory (MRAM)^36,37,38, and flash memory^{39,40,41,42,43}. Therefore, the implementation of hardware neural networks using ferroelectric-based synaptic devices has attracted significant attention from both academia and industry.

Existing neural networks lack robust uncertainty quantification and fail to provide reliable assessments of their predictions^44,45. These models often exhibit overconfidence and produce incorrect predictions with high certainty, which poses significant risks in high-reliability applications such as healthcare and autonomous driving⁴⁶. Furthermore, real-world data frequently contains noise or deviates from the training distribution, making it challenging for conventional models to deliver accurate and reliable predictions⁴⁷. In hardware-based neural networks, additional challenges arise from phenomena such as endurance, retention, and drift, which can alter the weight values over time, leading to significant performance degradation^48,49.

To address these challenges, Bayesian neural networks (BNNs) have been developed by applying Bayes’ theorem to conventional neural networks. Unlike traditional models, which represent weights as fixed values, BNNs model weights as probability distributions, enabling quantitative uncertainty evaluation (Fig. 1a)^50,51,52,53. This capability makes BNNs particularly suitable for applications in safety-critical domains, such as autonomous driving, robotics, and medical diagnosis, where understanding and managing uncertainty are essential. As the posterior distribution is computationally intractable, Bayesian methods typically approximate it using techniques such as variational inference (VI)^54,55 and Markov chain Monte Carlo (MCMC) sampling^56,57. BNNs can instantiate conventional neural networks by sampling weight values from these distributions. During the training, multiple networks were generated by sampling M times from the BNN. When the same input data are provided to these networks, a predictive distribution of the output is obtained. Unlike traditional networks that primarily focus on fitting data, BNNs inherently model uncertainty, mitigate overfitting, and perform well even in scenarios with limited data. In addition, their probabilistic framework allows them to assess uncertainty, rendering them effective at identifying and classifying previously unseen images.

**Fig. 1: Proposed Bayesian neural network architecture using 3D FeNAND.**

To implement a BNN in hardware where weights are represented as probability distributions, an effective method for realizing these weight distributions is essential. Recent studies have proposed various approaches for expressing the synaptic weight distributions of BNNs in hardware^{58,59,60,61,62,63}. The inherent stochasticity of NVMs aligns well with the probabilistic nature of the synapses in BNNs. For instance, cycle-to-cycle variability has been utilized by performing repeated program-and-erase cycles, repeated read operations, and combining multiple cells to represent the distribution of a single synapse’s distribution^58,59,60,61. In addition, random telegraph noise (RTN) induced by traps has been explored as a method for generating weight distributions⁶². Another approach involves directly mapping the weights sampled M times in the software to M hardware arrays⁶³. However, these methods have several challenges. Achieving the desired distribution through repeated programming and erasing of a single cell is time-intensive and lacks efficiency, which is a significant limitation. Furthermore, the proposed architecture requires substantial design modifications or the addition of peripheral circuits to accommodate the vast number of parameters required for modern deep learning, which leads to increased hardware complexity and design challenges. Additionally, RTNs are inherently difficult to manage, often resulting in excessively narrow distributions during repeated read operations, which pose considerable challenges to the efficient and practical realization of BNNs in hardware systems.

In this study, we demonstrate the practical implementation of BNNs in commercial 3D NAND architecture without any structural changes (Fig. 1b). By utilizing ferroelectric materials, which enable low-voltage operation and the programming of multiple cells sharing a word line (WL) simultaneously through the NAND structure, the proposed approach achieves significant improvements in time/area efficiency and reliability (Fig. 1c). Leveraging page-level programming of 3D ferroelectric NAND-type memory (FeNAND) and device-to-device variation, BNN weight distribution is implemented in a single process. The incremental step pulse programming (ISPP) technique, which is a standard programming method in NAND flash memory, is employed to generate multiple states, creating Gaussian weight distributions without the need for additional circuitry. The wide-ranging standard deviation (σ) of these distributions is simply controlled by adjusting the ISPP voltage step (V_step) (Fig. 1d). The program voltage is incrementally increased by V_step until all the cells in the WL reach the target current. The cells that reach the target current first have an inhibitory voltage applied to stabilize their state and prevent further changes. Moreover, the mean (μ) is fine-tuned using a program-and-verify method, ensuring that cells are programmed until the cell current reaches the target current value (Fig. 1e). Therefore, the mean and distribution dispersion of multiple cells sharing a WL can be finely adjusted using only an ISPP program-and-verify operation without multiple programming and erasing iterations of a single cell. The final weight distribution is determined by subtracting G⁺ and G⁻, reflecting the weight’s inherent distribution. This allows for the representation of a wide range of distributions, from narrow to broad, by adjusting V_step. To validate this approach, we fabricated a 24×12 HfZrO₂ (HZO)-based FeNAND array. Experimental results confirmed that both the weight distribution of σ and μ can be precisely controlled by varying V_step and the target current. Additionally, to verify the compatibility with the 3D NAND structure, the weight distributions trained in the software were mapped onto the FeNAND arrays. Inference operations were then performed using the proposed NAND-based BNN framework and evaluated using SPICE simulations, thereby confirming both the feasibility and practicality of the implementation. The proposed system offers the most compatible and reliable solution for hardware BNNs currently available in the memory industry.

Results

Device characteristics

Figure 2a shows a schematic of the method for emulating biological synapses using FeFETs. In the biological brain, the strength of synaptic connections between presynaptic and postsynaptic neurons is updated based on the degree of interaction between neurons in response to external stimuli. This process can be replicated by utilizing the partial polarization of the FeFETs. The extent of polarization in the ferroelectric layer domains varies according to the magnitude of the external programming pulse applied to the gate of the FeFET, thereby enabling the simulation of the synaptic connection strength. Figure 2(b-1) and (b-2) show the top optical image of the fabricated 24×12 HZO-based FeNAND array. The array structure consists of 24 WLs intersecting 12-bit lines (BLs) and source lines (SLs), forming a total of 288 memory cells. Rows can be selected via WLs, whereas BLs connect to multiple cell drains, allowing selective read and write operations through column selection. Figure 2c shows the cross-sectional TEM image of a single FeNAND cell, where a 1.18 nm SiO₂ gate oxide layer is deposited on the Si channel, and a 6.63 nm HZO layer serves as the ferroelectric layer. The WL consists of a TiN layer with a thickness of 100 nm (see the Methods section and Supplementary Fig. 1 for the detailed fabrication process). Supplementary Fig. 3 shows the energy-dispersive X-ray spectroscopy (EDS) results of the FeFET gate stack. Figure 2d shows the current (I) measured through the FeFET gate as a function of the applied voltage (V ) based on the PUND measurements. Figure 2e shows the leakage-removed polarization-switching current density ( J) versus voltage (left axis) and its integration (polarization: P) over time (right axis). The HZO layer shows remnant polarization (2P_r) exceeding 50 μC/cm², confirming the robust polarization switching capability of the fabricated FeNAND cells. Unclosed loop is due to the partial domain switching caused by domain pinning centers, such as oxygen vacancies or dopants, which may render some domains slower to flip. However, from the perspective of BNN performance, these minor non-idealities do not undermine the key advantage of our system: it reliably supports the statistical weight distributions needed for Bayesian inference.

**Fig. 2: Electrical characteristics of the FeNAND array.**

Figure 2f shows the pristine-state DC IV double-sweep characteristics of the 24×12 FeNAND array. FeFETs with NAND architecture exhibit variations in both erase and program operations owing to process parameters, including dielectric properties, ferroelectric layer thickness, and doping concentration. The step-like signal observed when the order changes is attributed to the B1500A measurement setup. Supplementary Fig. 4a shows the pristine-state DC IV single-sweep characteristics of the 24×12 FeNAND array. Supplementary Fig. 4b shows current maps corresponding to each WL and BL. Similar to the double-sweep results, device-to-device variations are observed. Although such variations typically degrade the memory system accuracy, they can be beneficially exploited as weight uncertainty representations in BNNs through appropriate operation schemes. Fig. 2g shows the I–V curves of 30 randomly selected cells after the program-and-erase operations. The program state is induced by applying a 5.5 V program voltage (V_PGM) for 2 μs, while the erase state is achieved using a −5 V erase voltage (V_ERS) for 5 μs, with both BLs and SLs grounded. During the read operation, the voltage applied to the unselected cells (V_pass) was fixed at 2 V. The rationale behind this choice is reported in Supplementary Note 1 (Determination of V_pass value for read operations). Additionally, the inhibition operation during the program and erase processes is discussed in Supplementary Note 2 (Inhibition process of the FeNAND arrays during program and erase operations). In Fig. 2g, the threshold voltage (V_th) decreases in the program state and increases in the erase state compared to the initial state, confirming that the memory state of the FeNAND cell can be effectively stored based on the polarization state of the HZO layer. Figure 2h shows the programming and erasing speeds of the fabricated FeNAND device. Under grounded BL and SL conditions, the threshold voltage shift (ΔV_th) is measured based on the magnitude of V_PGM and V_ERS and the applied pulse width. The FeNAND device shows low-voltage programming (<5 V) and fast switching speeds (<10 μs), attributed to the excellent switching characteristics of the HZO layer.

Figure 2i shows the retention characteristics of four distinct states programmed by applying specific pulses to the WL (State 1: 5 V, 2 μs; State 2: 3.8 V, 2 μs; State 3: 3.2 V, 2 μs; and State 4: −5 V, 5 μs). Figure 2i shows the I_D-V_GS characteristics over 10⁶seconds at ambient temperature (25 °C), with negligible V_th drift. Figure 2j plots the stability of I_D over time, further confirming that the fabricated FeNAND device possesses retention properties suitable for use as synaptic memory cells in hardware-based neuromorphic systems. Figure 2k presents the endurance characteristics of the fabricated FeNAND device under repeated programming and erasing conditions. A 5.5 V (2 μs) program pulse and a −5 V (5 μs) erase pulse are repeatedly cycled. Even after 10⁷ endurance cycles, minimal V_th degradation is observed, confirming the robust endurance and reliability characteristic of devices. Figure 2l shows the I_D-V_GS curves obtained when programming the FeNAND cell using the ISPP scheme. In the ISPP scheme, the programming start voltage (V_start) is set to 3.00 V, the voltage increment per pulse (V_step) is 0.15 V, and the final programming voltage (V_end) is 5.25 V. According to the ISPP scheme, V_th of the FeNAND cell gradually decreases, demonstrating that successful conductance modulation is achievable through the ISPP scheme. Supplementary Fig. 9 shows the process of achieving different target current values (a: 50 nA and b: 80 nA) using the ISPP scheme. Figure 2m shows the I_D changes in the FeNAND cells under repeated ISPP and erase pulses, demonstrating stable and consistent I_D shifts. These results verify that the FeNAND cells can be effectively used for memory and synaptic applications in hardware-based neuromorphic systems.

Control µ and σ in FeNAND array

In this section, we propose a scheme that simultaneously controls the conductance and variation of FeNAND cells by adjusting V_step and V_end. Figure 3a shows the I_D-V_end graph with V_start fixed at 3.00 V and varying V_step values. As V_end increases, the magnitude of the final applied voltage increases, resulting in an increase in I_D. In addition, for the same V_end, a smaller V_step leads to a larger I_D owing to the application of a greater number of pulses. Therefore, by selecting an appropriate V_end corresponding to the applied V_Step, the desired I_D level can be achieved. Figure 3b demonstrates that different target I_Ds (10, 50, and 100 nA) were successfully attained for 100 FeNAND cells under varying V_step conditions ((b-1): 0.05 V, (b-2): 0.10 V, and (b-3): 0.20 V). Read operations were carried out at 0.4 V in the subthreshold region. This minimizes disturbances from pass-resistor effects during on-state reads and ensures a high on/off ratio. This indicates precise control over the mean current. Interestingly, even for the same target cell, the variability among the cells increased as the magnitude of V_step increased. As V_step increases, more cells experience large shifts in a single pulse, and excessive overshoots in some cells result in a wider overall distribution. Conversely, setting up a smaller V_step can suppress these shifts and narrow the distribution, but it requires significantly longer programming times. Moreover, previous studies on charge-trap flash-based 3D NAND architectures have shown that the expected number of over-programmed cells increases with the applied ISPP voltage. They mathematically modeled this behavior to demonstrate that as V_step increases and the voltage increment per programming step becomes larger, leading to more frequent trapping events and a higher probability of over-programming, the expected value of over-programming occurrences also increases^64,65,66. Supplementary Fig. 10 illustrates how cells reach a given target current as V_PGM increases. It also shows that once certain cells surpass the target current, applying an inhibit voltage (2.7 V) prevents any further changes in their state. Supplementary Fig. 11 shows the retention characteristics measured at intermediate current levels. These characteristics were evaluated over 10⁴seconds in 10 devices, with programming performed in 10 nA increments from 10 nA to 100 nA. Reliable retention was observed for the entire duration at each programmed current level, indicating that the weight state remains stable. Supplementary Fig. 12 shows the I_D histograms of 100 cells programmed with different V_step values (0.05, 0.1, 0.15, and 0.2 V) for all target I_Ds. Despite consistent mean I_D values close to the target, increasing V_step (from 0.05 to 0.20 V) results in broader standard deviations (from 7.64×10^-10 to 2.59×10^-9), validating the modulation of σ through V_step adjustments. Since the ISPP scheme applies V_inhibit solely to cells that exceed the target current and performs no further trimming, the resulting distribution is right-skewed around the intended μ. Figure 3c shows current variations of 100 FeNAND cells programmed with varying target currents (10, 20, 50, 80, and 100 nA) and V_step values, confirming increased dispersion with larger V_step. Fig. 3d shows the dependence of I_D standard deviation on both V_step and target current, indicating consistent increases in σ with larger V_step values, regardless of the target current level. To investigate the effect of programming pulse width, Supplementary Fig. 13 shows the relationship between V_step, μ_ID, and σ_ID for pulse widths of 20 and 100µs, respectively. Regardless of the pulse width magnitude, an increasing trend in σ_ID is observed with increasing V_step. Therefore, by proposing a scheme that adjusts V_step and V_end, we demonstrate that μ_ID and σ_ID can be independently controlled within a FeNAND cell array. Importantly, this ISPP scheme has already been utilized in commercial NAND cells, offering the significant advantage of not requiring additional circuitry or changes in operational modes. Figure 3e shows the difference between the target μ_ID and the measured μ_ID for five different FeNAND arrays (each array consists of 288 cells) through the control of V_end in the proposed ISPP scheme. It is confirmed that programming could be successfully achieved over a wide μ_ID range (10–130 nA). Figure 3f shows the difference between the target σ_ID and the measured σ_ID for five different FeNAND arrays (each array consists of 288 cells) through the control of V_step in the proposed ISPP scheme. In off-chip BNN implementations, verifying cycle-to-cycle consistency is essential. Supplementary Fig. 14 shows the weight distribution on the same 100 FeNAND devices over five programming cycles at target currents of 50 nA and 100 nA. Both the μ and standard σ remained highly stable, varying by less than 10% across all five cycles, and identical trends were observed in different FeNAND arrays, confirming the robustness of the proposed scheme. If the distribution of devices produced via ISPP changes over the endurance period, it will affect the network’s performance. Supplementary Fig. 15 shows the retention characteristics for three states (50, 80, 100 nA), measured over 10⁴ seconds after applying 10³ cycling damage. Experimentally, it was verified that all three states exhibited distinguished retention, with less than a 10% loss over 10⁴seconds. Supplementary Figs. 16 and 17 show retention measurements at 55 °C and 85 °C, respectively; see Supplementary Note 4 for further details. The mean error in Supplementary Fig. 11 is caused by the inherent right‑skew of ISPP, not retention; retention‑induced drift is <5 % over 10⁴seconds and therefore negligible during the sub‑millisecond ISPP window. Figure 3g presents the controllable range of μ_ID and σ_ID achieved by adjusting V_end and V_step in the proposed ISPP scheme. The ability to adjust over a wide range demonstrates the potential for enhancing the expressive power of BNN models.

Fig. 3: Control of μID and σID in the FeNAND array via the ISPP scheme. — **Fig. 3: Control of μ_ID and σ_ID in the FeNAND array via the ISPP scheme.**

Validation of 3D FeNAND Design Feasibility through SPICE Simulation

SPICE simulations were performed using pretrained weights mapped onto the fabricated array to validate the implementation of a BNN on the 3D FeNAND structure. The ability of the 3D NAND structure to program cells simultaneously enhances time efficiency. Unlike planar NAND structures, the 3D NAND architecture enables VMM operations through a BL. Figure 4a shows a BNN implemented on the 3D FeNAND architecture trained to classify the two types of patterns, with the dataset comprising 20 training examples with 4×2 pixel dimensions (see Supplementary Fig. 19 for the entire dataset). The input data were binarized into 0 and 1, and the classification was determined based on whether a shaded pattern appeared in the upper or lower portions of the image. The parameters used in more detailed software simulations are shown in Supplementary Table 2.

**Fig. 4: Image classification using 3D FeNAND with SPICE simulation.**

Depending on the input, the string select line (SSL) is configured to apply either a pulse or ground (GND), while a read voltage (0.4 V, 100 µs) is applied to the WL containing the transferred weights. V_pass (2 V) is applied to all the other unselected WLs. The current flowing through each string is summed along the BL, making VMM operations essential for BNN inference. The detailed VMM operation in NAND structure is reported in Supplementary Note 3 (Vector Matrix Multiplication operation in 3D FeNAND architecture). Each cell serves as a synaptic device to store the individual weight values. The cumulative currents from multiple BLs collectively represent the weight distribution required for the BNN, ensuring the robust execution of its probabilistic computations. For negative weight distributions, weights were divided into G⁺ and G^-, and the final output was calculated using a subtractor. This output passes through the neuron and activation circuits, serving as the input for the subsequent layers.

In a 3D NAND flash structure, the cells in each string are connected in series, allowing only one WL to be read simultaneously. Previous studies have mapped one network layer per WL, which led to significant inefficiency when the network size was small. To address this, multiple layers were mapped onto a single WL depending on the network size (Fig. 4b). Figure 4c illustrates the overall system process of the BNN implemented using 3D FeNAND arrays. The architecture stored weight distribution parameters (µ and σ) in each layer of WL. The combined currents were processed through subtractor circuits and ADCs, followed by activation and neuron circuits before being propagated to the next layer. To provide input to the subsequent layer, the output is converted using a DAC. This optimization reduces the hardware overhead and improves the efficiency of the FeNAND array, making it more suitable for larger and more complex neural network architectures.

The synaptic weights for pattern classification were first trained using a Python-based neural network simulation. In BNNs, synaptic weights are modeled as probability distributions defined by the posterior distribution p(W | D). Using the VI, the posterior was approximated as a Gaussian distribution. During training, the μ and σ of the weights were optimized through backpropagation (see the Methods section for BNN training). The training was conducted over 100 epochs, ensuring alignment with the measured μ and σ of the device. As training progressed, the loss gradually decreased, and test accuracy reached 100% (Supplementary Fig. 23a). During inference, sampling is performed 50 times (M = 50) to obtain a predictive distribution of the output.

The μ and σ of the trained synaptic weights were transferred to the fabricated FeNAND array (see the Methods section for mapping synaptic weight distributions to the FeNAND array). Sixteen synaptic distributions were mapped to G⁺ and G^- based on the weight sign. To match the σ of the target synaptic distribution, the ISPP voltage step is adjusted, whereas the cells representing μ were precisely tuned to the target conductance using a program-and-verify method (Fig. 4d). The tuning process was terminated when the conductance reached an error margin of 5×10^-10 of the target value. Figure 4e, f and Supplementary Fig. 21 display the overall 16 weight distributions obtained from the software and their counterparts implemented on the FeNAND array. Figure 4e shows the G⁺ and G^- distributions for weight 1, derived from the FeNAND array. G⁺ and G^- values were determined based on the sign of each weight. Figure 4f compares the weight distribution calculated from the BL current differences (G⁺ – G^-) with the values sampled using the Monte Carlo software. This comparison demonstrated that the distributions obtained from the FeNAND array closely matched the software-generated distributions, validating the accuracy and reliability of the proposed implementation.

The inference process was validated using SPICE simulations to confirm the consistency of the 3D FeNAND structures. To ensure proper connectivity with the fabricated FeNAND array, each string consisted of 24 cells, with inputs provided through eight SSLs and 50 BLs allocated per output node, resulting in 100 BLs. The measured transfer characteristics were modeled using the BSIM model in SPICE to verify the reliability of the simulation (see the Methods section for the detailed fitting process). As illustrated in Fig. 4g, both states were accurately fitted, demonstrating the precision of the SPICE model. The weight distributions were transferred to the 3D FeNAND array using MATLAB for SPICE fitting (see Methods for detailed MATLAB fitting procedures). After successfully transferring all weight distributions, inference was conducted on the test images. Figure 4h shows the predictive distribution results obtained from the software and SPICE simulations, and Fig. 4i shows the average values derived from the sampled predictive distributions. Supplementary Fig. 22 shows the inference processes for both simulation and measurement. In the simulation, inference was conducted by sampling from a learned ideal Gaussian weight distribution. In the measurement, a dispersion was generated using the σ value obtained from the software as the target. Subsequently, an additional BL was applied to fine-tune the μ. The adjusted parameters were then fitted to the SPICE simulation to perform the inference. The predictive distributions from the software closely matched those produced through SPICE inference, exhibiting consistent trends across all test images (see Supplementary Fig. 23b–d for additional test images. b and c present the predictive distributions for different images, and d illustrates the average values of the downward patterns). The agreement between the measured and simulated data validates the accuracy of the weight-transfer process and highlights the feasibility of the proposed hardware-based classification framework, demonstrating its efficiency and reliability.

Medical data analysis using 3D FeNAND-based bayesian neural networks

BNNs possess a distinctive capability to quantify the uncertainty in their predictions, making them particularly advantageous for medical data analysis, where diagnostic precision is critical. Biomedical data obtained using medical devices are inherently vulnerable to various sources of external noise, including limited instrument resolution, calibration inaccuracies, and patient movement during data collection. These factors introduce significant challenges to accurate data analysis and obscure critical diagnostic information. Therefore, evaluating the uncertainty in noisy data is essential for making reliable clinical decisions. By effectively assessing the uncertainty level, medical professionals can account for potential inaccuracies and provide more precise diagnoses, ultimately improving patient outcomes through informed clinical judgment.

In this study, the performance of the 3D FeNAND-based BNN system was evaluated using a chest X-ray dataset (Fig. 5a). Chest X-ray data are a critical tool in medical diagnostics, providing a noninvasive method to assess lung health by detecting structural abnormalities and signs of disease⁶⁷. This plays a vital role in identifying various respiratory conditions, including lung opacities, viral infections, and COVID-19-related complications (lower panel of Fig. 5a). In particular, the diagnosis of pneumonia relies heavily on chest X-ray images, as they can reveal fluid-filled alveoli and blocked bronchioles caused by inflammation and mucus buildup (right panel of Fig. 5a). In a healthy lung, the alveoli remain air-filled and facilitate efficient gas exchange, whereas the bronchioles remain clear, ensuring smooth airflow (left panel of Fig. 5a). However, when pneumonia or similar conditions occur, these structures are compromised, resulting in breathing difficulties and reduced oxygen supply. The ability of chest X-rays to detect such critical features makes them indispensable for diagnosing and managing respiratory diseases. Figure 5b shows a schematic representation of the disease diagnosis system utilizing a 3D FeNAND-based BNN trained for medical data analysis. An explanation of the chest X-ray dataset, along with the training and inference processes of the BNN, is reported in the methods section.

**Fig. 5: Medical data analysis using the 3D FeNAND-based BNN system.**

Figure 5c illustrates the classification accuracy and loss of the deterministic (standard ResNet) and probabilistic (BNN) prediction methods within the same ResNet-101 architecture (see Supplementary Fig. 25 for details). In both approaches, the classification accuracy increased as training progressed, ultimately reaching approximately 96%. Supplementary Fig. 26 presents the confusion matrix of the BNN, which validates the reliable and accurate predictive performance of the network. Although both networks demonstrated comparable performance under noise-free conditions, the performance gap became significant as the external noise levels increased (Fig. 5d). The classification accuracy of the standard ResNet degrades significantly owing to its deterministic predictions and inherent inability to account for uncertainties. In contrast, BNNs employing probabilistic predictions exhibit robust resilience against external noise, maintaining higher classification accuracy through uncertainty-aware inference. This resilience arises from the ability of the BNN to model uncertainty, enabling adaptive decision-making under noisy conditions by considering multiple possible representations of the input data, unlike the fixed-point estimates used in standard ResNet models. This highlights the intrinsic advantages of BNNs in handling noisy data environments. Figure 5e depicts the prediction counts for a specific class when randomly selected noisy chest X-ray data were fed into both the standard ResNet and BNN. Despite receiving chest X-ray data from healthy individuals, the standard ResNet frequently misdiagnoses the data as lung opacity, owing to its susceptibility to external noise. This limitation is further exacerbated by unwarranted confidence in these incorrect predictions. In contrast, the BNN, which employs probabilistic predictions, significantly increases the frequency of correct diagnoses. This behavior highlights the reduced overconfidence of the network and prediction uncertainty in the presence of external noise.

Beyond classification accuracy, various evaluation metrics, such as receiver operating characteristic (ROC), precision–recall (PR), and calibration curves, provide deeper insights into model behavior, particularly under uncertain and noisy conditions. These metrics capture different aspects of predictive performance, including discrimination ability, class imbalance handling, and confidence calibration, enabling a more comprehensive assessment of the reliability and robustness of the networks (see the Methods section for more details). Figure 5f shows the ROC curves for the standard ResNet and BNN under an external noise level (σ/μ) of 0.1. The ROC curve offers a comprehensive evaluation of the model performance by plotting the true positive rate (TPR) against the false positive rate (FPR) across various classification thresholds. The curve (AUC) quantifies the overall discriminative ability of the model. A higher AUC indicates better performance, reflecting the capacity of the model to distinguish between different classes, even when subjected to noisy input data. The BNN demonstrates a larger AUC than the standard ResNet, confirming its superior robustness and uncertainty-aware predictive capability. Figure 5g represents the PR curves for the standard ResNet and BNN under an external noise level (σ/μ) of 0.1. The PR curve is particularly valuable in imbalanced datasets, where positive cases are rare, because it evaluates the tradeoff between precision (positive predictive value) and recall (sensitivity). A higher AUC indicates more reliable detection of relevant cases with fewer false positives. The BNN consistently achieved a higher AUC, reflecting its better handling of uncertain data by reducing the false-positive predictions. Figure 5h depicts the calibration curves for the standard ResNet and BNN under an external noise level (σ/μ) of 0.1. The calibration curve compares the predicted probabilities with the actual observed frequencies, indicating how well the model’s predicted confidence aligns with reality. A perfectly calibrated model aligns with a diagonal line where the predicted probabilities match the actual likelihoods. The standard ResNet demonstrates signs of overconfidence, predicting high probabilities even when incorrect, whereas the BNN demonstrates a more reliable calibration by producing less overconfident and better-calibrated predictions, reflecting its ability to appropriately express uncertainty. Furthermore, the distribution of the mean and standard deviation values of the synaptic weights in the trained BNN was within the range of the measured characteristics of FeNAND (Supplementary Fig. 27). This alignment indicates that the 3D FeNAND-based BNN system can be effectively trained to approximate an ideal model, enabling optimal performance by leveraging the inherent device characteristics. The integration of 3D FeNAND technology with the probabilistic inference capabilities of BNNs enables scalable and energy-efficient medical data analysis with enhanced reliability, even in the presence of external noise. In off-chip trained BNNs, retention loss can affect classification accuracy. To evaluate this impact, we incorporated the retention characteristics (see Supplementary Fig. 11) into our simulations. After 10⁴ seconds, an average retention loss of 5 % was observed across all states; we then extrapolated these data to 10⁵ seconds to cover a broader timescale. Supplementary Fig. 28 shows the performance degradation of the BNN system due to the retention loss, confirming that strong retention characteristics yield only a ∼3 % accuracy drop after 10⁵ seconds. Should further retention loss lead to greater accuracy degradation, the excellent cycle-to-cycle stability of our FeNAND allows for a simple refresh operation to restore the programmed weight distributions.

Discussion

In this study, we demonstrated the hardware implementation of a BNN using a 3D FeNAND architecture based on HZO. By leveraging the ISPP technology, a widely adopted technique in conventional NAND flash memory, we achieved precise and efficient weight distribution control without requiring additional circuitry. The intrinsic device-to-device variations of FeNAND, combined with its page-level programming capability, enabled the generation of Gaussian-distributed weights in a single operation, eliminating the need for iterative program-erase cycles. Furthermore, by modulating the V_step of the ISPP technology, we established a tunable weight distribution spanning a broad range, surpassing previous studies in flexibility and applicability (see Supplementary Table 1). Furthermore, our FeNAND-based BNN implementation achieves better power efficiency compared to SRAM- or other CTF-based NAND flash approaches (see Supplementary Note 6 for detailed power calculations).

To validate the feasibility of our approach in the 3D FeNAND, pre-trained weight distributions were accurately mapped onto the FeNAND array using a custom algorithm and verified through SPICE simulations. The integration of 3D FeNAND technology with probabilistic inference capabilities of BNNs enabled scalable and energy-efficient medical data analysis, enhancing robustness and reliability under noisy conditions. Notably, compared to deterministic models, our FeNAND-based BNN system exhibited superior resilience to external noise, maintaining high classification accuracy through uncertainty-aware inference. These findings establish 3D FeNAND as a promising platform for hardware-based BNNs, bridging the gap between probabilistic computing and non-volatile memory technology while paving the way for next-generation AI hardware optimized for uncertainty-aware deep learning applications.

Method

Fabrication process of HZO based FeNAND

A non-volatile HZO-based MFIS FeNAND device was fabricated on a 100 nm-thick silicon-on-insulator (SOI) wafer. First, Si etching was performed to create BLs with a width of 1 µm through patterning. Subsequently, a 1 nm-thick SiO₂ layer and a 6 nm-thick HZO layer (Hf:Zr = 2:1) were deposited via atomic layer deposition (ALD). A 100 nm-thick TiN layer was then deposited using sputtering, followed by patterning to form WLs with a length of 5 µm (line spacing = 500 nm). Source and drain implantation (P + , dose: 1×10¹⁵) was conducted using a gate patterning mask. To crystallize the HZO layer and activate the dopant, rapid thermal annealing (RTA) was performed at 650 °C for 30 s. The device fabrication was completed using back-end-of-line processing. For further details, please refer to Supplementary Fig. 1.

Electrical measurements

The ferroelectric properties of the fabricated HZO-based FeFETs were investigated using a probe station and Keithley 4200-SCS semiconductor parameter analyzer (Keithley 4200-SCS). DC I_D-V_GS measurements were conducted using a semiconductor parameter analyzer (Keysight B1500A) equipped with a source measurement unit (SMU). A B1500A equipped with a waveform generator/fast measurement unit (B1530A, WGFMU) module and a high-voltage semiconductor pulse generator unit (B1525A, SPGU) were used for the pulse measurements. To characterize the fabricated array, a custom-made probe card (B1500A) and switching matrix (Keysight E5250A) were used. Each input was distributed across multiple channels via the switching matrix and controlled using LABVIEW. The overall experimental setup for the device measurements is shown in Supplementary Fig. 2. All the electrical measurements and characterizations were performed in ambient air at room temperature.

BNN training and inference process

The training of a BNN is based on Bayes’ theorem, in which the posterior distribution p(W | D) combines the prior belief p(W) with the training dataset D to identify the most probable models. Mathematically, Bayes’ theorem is expressed by Eq. (1):

$$\,p\left(W|D\right)=\frac{p\left(D|W\right)p\left(W\right)}{p\left(D\right)},$$

(1)

where $p\left(D|W\right)\,$ represents the likelihood, $p\left(W\right)$ the prior, and $p\left(D\right)$ the evidence. The posterior distribution determines the weight of the network and captures the uncertainty inherent in the training process. In BNNs, calculating the exact posterior distribution is often intractable. Therefore, methods such as VI or Markov chain MCMC sampling were used to approximate the posterior distribution. VI approximates the posterior $p\left(W,|D\right)$ with a simpler, computationally tractable distribution $q\left(W|\theta \right)$, often assumed to follow a normal distribution. Typically, the variational distribution is modeled as a gaussian distribution, where the variational parameters $\theta=(\,\mu,\sigma )$ represent the mean and standard deviation. Unlike traditional neural networks that optimize single-weight values, these parameters are optimized to model a distribution over weights and capture the uncertainty in the network. The optimization of parameters is performed by minimizing the Kullback–Leibler (KL) divergence, which measures the similarity between the distributions p(W | D) and q(W|θ).

$${\theta }^{*}={argmin}\left({KL} \left(p(W\left|D\right.)\left| \right| q(\left.W\right| \theta)\right)\right.$$

(2)

During training, the parameters were optimized using the Bayes by Backprop algorithm, a form of backpropagation adapted for BNNs. Inference in Bayesian models involves multiple steps, in which weights are sampled from the posterior distribution for each step. The network output was then obtained by averaging the predictions across these samples, typically using Monte Carlo sampling with N iterations, as shown in Eq. (3).

$$p(\,\hat{y}{|x},D)\approx \frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}p({\,y|x},{W}_{i})$$

(3)

where x and y denote the test input and output, respectively, N is the number of Monte Carlo samples used to approximate the posterior predictive distribution, D represents the training data, and ${W}_{i}\,$ denotes the weight sampled from the posterior distribution.

Mapping synaptic weight distribution to FeNAND arrays

Supplementary Fig. 8 illustrates the overall off-chip training process of the BNN. After training the software, the synaptic weight distributions were transferred to an FeNAND array for off-chip processing. Upon completion of training, each weight was represented by its µ and σ. To map the software-trained weight range to the device’s range, a scaling factor was applied to both the µ and σ.

$${{\rm{\mu }}}^{{hardware}}=\alpha \times \,{{\rm{\mu }}}^{{software}}\,,\,{\sigma }^{{hardware}}=\alpha \times \,{\sigma }^{{software}}\,$$

(4)

In addition, the number of programmed BLs was adjusted based on the number of samples used for inference. To represent negative weights, the devices for σ and µ were separated, and weight distribution was expressed as the difference between them. The process of transferring the weight distribution to FeNAND is as follows: First, all cells connected to the selected WL are erased by applying −5.5 V for 100 µs. After erasing, the current at the read voltage (0.4 V) was recorded. V_PGM was initially set at 3 V and increased incrementally in predefined steps. For σ transfer, the process continued until the conductance of all cells connected to the WL exceeded the target conductance value or reached the maximum V_PGM (6 V). Owing to the NAND structure, cells on the same WL may change their state because of V_PGM. To prevent further changes in their state, V_inhibit was applied to the BL of cells that exceeded the target conductance (Supplementary Fig. 10).

For μ tuning, precise control was achieved by recalculating the error after each application of the programming voltage, with programming and erasing repeated iteratively until the conductance falls within the specified margin (5e^-10). If the value is smaller than the target value, the program voltage gradually increases. Conversely, if the target is exceeded, the erase voltage is incrementally increased to adjust the value within the error margin. This approach ensures that the entire synaptic weight distribution is transferred to the FeNAND array simultaneously rather than transferring individually sampled weights.

SPICE Fitting

The measured I–V transfer curve was fitted using the SPICE Berkeley Short-Channel IGFET Model (BSIM4) (Level = 54). Various parameters of the BSIM model, such as the threshold voltage, subthreshold swing, and gate-induced drain leakage, were adjusted to accurately fit the program and erase states (Fig. 4g). The SPICE-based 3D FeNAND simulation accurately reflects the actual weight distribution transferred from the software-trained weights. To properly match the µ and σ of each weight distribution obtained from FeNAND, a MATLAB-based SPICE fitting algorithm was developed.

The algorithm runs SPICE in MATLAB and dynamically adjusts the threshold voltage of the SPICE model. To ensure precise and efficient current fitting, the threshold voltage was adjusted based on the calculated error magnitude. When the difference between the read voltage current of the model and experimentally measured current was large, a larger voltage adjustment was applied. Conversely, when the difference was small, the adjustment was reduced. The fitting process was terminated when the error fell below a predefined threshold (Algorithm 1). Empirical, skewed conductance histograms were embedded unchanged in the SPICE model to quantify their effect on network accuracy.

Algorithm 1

Measured current distribution for SPICE simulation

1: Margin: Allowed current error

2: G_M : Measured FeNAND current

3: V_s: Applied Voltage

4: i_max: Maximum number of iterations

5: G: Fitted FeNAND current

6: G ← RESET

7: i ← 0

8: V_s ← V_init

9: while i < i_max do

10: i ← i + 1

11: if |G − G_M | ≤ 5 × 10⁻¹⁰ then

12: END

13: else if |G − G_M | > 0.1 × |G_M | then

14: V_s ← V_s ∓ 0.2

15: else if 0.05 × |G_M | < |G − G_M | < 0.1 × |G_M | then

16: V_s ← V_s ∓ 0.015

17: else

18: V_s ← V_s ∓ 0.005

19: end if

20: end while

Chest X-ray dataset

The chest X-ray dataset consisted of 16,932 training and 2116 validation images, each represented as grayscale images with a resolution of 224 × 224 pixels. The dataset is organized into four distinct classes: one representing healthy individuals (“Normal”) and three corresponding to specific respiratory conditions (“COVID,” “Lung opacity,” and “Viral pneumonia”). This multiclass structure allowed for a comprehensive evaluation of the ability of the proposed BNN system to distinguish between normal and diseased lungs.

Evaluation metrics

The 3D FeNAND-based BNN system was evaluated using three performance metrics: ROC, PR, and calibration curves. These metrics offer a comprehensive assessment of the predictive capabilities of a network by capturing different aspects of performance, particularly in the context of uncertain and noisy medical data.

1) ROC curve: The ROC curve visualizes the tradeoff between TPR and FPR across varying classification thresholds. The TPR represents the proportion of correctly identified positive cases, whereas the FPR indicates the proportion of incorrectly identified negative cases. A perfect balance model maximizes the TPR while minimizing the FPR. The ROC AUC quantifies the overall discriminative ability of the model. An AUC approaching 1 indicates superior performance in distinguishing between classes, even in the presence of external noise. Because the chest X-ray dataset used in this study included four classes, individual ROC curves were generated for each class, and a single representative curve was obtained using macro-averaging, averaging the AUC scores across all classes.

2) PR curve: The PR curve evaluates the relationship between the precision (the proportion of correctly predicted positive samples among all predicted positives) and recall (the proportion of correctly predicted positives among all actual positive samples). This metric is particularly useful in datasets with class imbalance because it focuses on the positive class and avoids distortion by a large number of negative samples, which is a common limitation of ROC curves. A higher PR AUC value indicates superior positive class detection and fewer false positives. Similar to the ROC analysis, individual PR curves were generated for each of the four dataset classes, and macro-averaging was applied to aggregate the results into a single PR curve.

3) Calibration curve: The calibration curve evaluates the alignment between the predicted probabilities and actual observed frequencies, providing insight into the reliability of the model in representing uncertainty. A perfectly calibrated model aligns along the diagonal line (y = x) where the predicted and actual probabilities match exactly. Deviations from this diagonal indicate calibration errors. Overconfidence occurs when the model predicts probabilities higher than the actual likelihood, resulting in points below the diagonal. In contrast, underconfidence occurs when the model predicts probabilities lower than the actual likelihood, placing points above the diagonal.

Uncertainty estimation in MNIST images

Unlike conventional neural networks, where weights are fixed values, a BNN represents weights as distributions. Through multiple Monte Carlo samplings, BNNs provide predictive distributions for the output values. In BNNs, total uncertainty, or predictive uncertainty, is quantified by combining the aleatoric uncertainty inherent in the data with the epistemic uncertainty, which reflects the model’s uncertainty regarding its parameters. The total uncertainty expressed in Eq. (5) can be applied to classification or entropy calculations using the SoftMax values of the predictive distributions⁶³.

$${U}_{{total}}=\,-\mathop{\sum }\limits_{l=1}^{N}\left(\frac{1}{M}\mathop{\sum }\limits_{k=1}^{M}p\left(y=l|X,{W}_{k}\right)\right)\log \left(\frac{1}{M}\mathop{\sum }\limits_{k=1}^{M}p\left(y=l|X,{W}_{k}\right)\right)\,(5)$$

where M denotes the number of sampling iterations, N represents the number of output classes, and y refers to the output values from the SoftMax function. The total uncertainty is composed of the sum of the aleatoric and epistemic uncertainties.

$${U}_{{aleatoric}}=\,-\frac{1}{M}\mathop{\sum }\limits_{l=1}^{N}\mathop{\sum }\limits_{k=1}^{M}p\left( y=l\left|x,{W}_{k}\right)\right)\log \left(\,p\left( y=l|x,{W}_{k}\right)\right)\,(6)$$

Aleatoric and epistemic uncertainties represent distinct types of information, making it essential to distinguish them. Aleatoric uncertainty arises from the inherent properties or ambiguity of the data itself and occurs in situations where a specific input can simultaneously support multiple outcomes. By contrast, epistemic uncertainty originates from unknown situations and is primarily caused by data that the model has not encountered during training.

Supplementary Fig. 16 presents the uncertainty density distribution results obtained using the Modified National Institute of Standards and Technology (MNIST) images. The network consists of two convolutional layers followed by two fully connected layers (256–256–10), reflecting the range of µ and σ of the hardware. Each convolutional layer uses a 5×5 kernel with stride 1 and is followed by average pooling. The first convolutional layer produces 6 feature maps, and the second takes those 6 channels as input and outputs 16 feature maps.The network was trained using VI, and predictions were made with 50 sampling iterations. The uncertainty was calculated for three distinct cases: Correct, Unseen, and Incorrect. To compute the epistemic uncertainty, Fashion MNIST data were incorporated into the test dataset. Supplementary Fig. 24 presents the calculated aleatoric and epistemic uncertainties for the correct predictions (dark cyan), incorrect predictions (red), and unseen data (purple). In both cases, the uncertainty values for the correct images were mostly below 0.4, indicating low uncertainty. In contrast, for the unseen images, the uncertainty values were generally above 0.5, indicating significantly higher uncertainty. These results demonstrate that BNN can assess uncertainty and recognize novel, previously unseen images.

Data availability

All relevant data are available within the article and in the Supplementary Information. The data generated in this study are provided in Source Data files. All other data are available from the corresponding author upon request. Source data are provided with this paper.

Code availability

All relevant codes used for the simulation are available in the article and Supplementary Information. All other codes are available from the corresponding authors on request.

Change history

14 August 2025
In the acknowledgments section of this paper funding sources were omitted. The original article has been corrected.

References

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Article ADS CAS PubMed Google Scholar
Shah, A. et al. A comprehensive study on skin cancer detection using artificial neural network (ANN) and convolutional neural network (CNN). Clin. eHealth https://www.keaipublishing.com/en/journals/clinical-ehealth/ (2023).
Mall, P. K. et al. A comprehensive review of deep neural networks for medical image processing: Recent developments and future opportunities. Healthc. Anal., 100216 https://doi.org/10.1016/j.health.2023.100216 (2023).
Kurani, A., Doshi, P., Vakharia, A. & Shah, M. A comprehensive comparative study of artificial neural network (ANN) and support vector machines (SVM) on stock forecasting. Ann. Data Sci. 10, 183–208 (2023).
Article Google Scholar
Chouard, T. & Venema, L. Machine intelligence. Nature 521, 435 (2015).
Article ADS CAS PubMed Google Scholar
Atakishiyev, S., Salameh, M., Yao, H. & Goebel, R. Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions. IEEE Access https://doi.org/10.48550/arXiv.2112.11561 (2024).
Ielmini, D. & Wong, H.-S. P. In-memory computing with resistive switching devices. Nat. Electron. 1, 333–343 (2018).
Article Google Scholar
Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544 (2020).
Article ADS CAS PubMed Google Scholar
Misra, J. & Saha, I. Artificial neural networks in hardware: A survey of two decades of progress. Neurocomputing 74, 239–255 (2010).
Article Google Scholar
Aguirre, F. et al. Hardware implementation of memristor-based artificial neural networks. Nat. Commun. 15, 1974 (2024).
Article ADS CAS PubMed PubMed Central Google Scholar
Prezioso, M. et al. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521, 61–64 (2015).
Article ADS CAS PubMed Google Scholar
Burr, G. W. et al. Neuromorphic computing using non-volatile memory. Adv. Phys. X 2, 89–124 (2017).
Google Scholar
Kim, K., Song, M. S., Hwang, H., Hwang, S. & Kim, H. A comprehensive review of advanced trends: from artificial synapses to neuromorphic systems with consideration of non-ideal effects. Front. Neurosci. 18, 1279708 (2024).
Article PubMed PubMed Central Google Scholar
Rao, M. et al. Thousands of conductance levels in memristors integrated on CMOS. Nature 615, 823–829 (2023).
Article ADS CAS PubMed Google Scholar
Zhao, M., Gao, B., Tang, J., Qian, H. & Wu, H. Reliability of analog resistive switching memory for neuromorphic computing. Appl. Phys. Rev. 7, https://doi.org/10.1063/1.5124915 (2020).
Shin, W. et al. Self‐curable synaptic ferroelectric FET arrays for neuromorphic convolutional neural network. Adv. Sci. 10, 2207661 (2023).
Article CAS Google Scholar
Mikolajick, T., Park, M. H., Begon‐Lours, L. & Slesazeck, S. From ferroelectric material optimization to neuromorphic devices. Adv. Mater. 35, 2206042 (2023).
Article CAS Google Scholar
Schroeder, U., Park, M. H., Mikolajick, T. & Hwang, C. S. The fundamentals and applications of ferroelectric HfO2. Nat. Rev. Mater. 7, 653–669 (2022).
Article ADS Google Scholar
Jerry, M. et al. Ferroelectric FET analog synapse for acceleration of deep neural network training. 2017 IEEE International Electron Devices Meeting (IEDM). IEEE 10.1109/IEDM.2017.8268338 (2017).
Koo, R. H. et al. Polarization pruning: reliability enhancement of hafnia‐based ferroelectric devices for memory and neuromorphic computing. Adv. Sci. 11, 2407729 (2024).
Article CAS Google Scholar
Mulaosmanovic, H. et al. Ferroelectric field-effect transistors based on HfO2: a review. Nanotechnology 32, 502002 (2021).
Article CAS Google Scholar
Ali, T. et al. High endurance ferroelectric hafnium oxide-based FeFET memory without retention penalty. IEEE Trans. Electron Devices 65, 3769–3774 (2018).
Article ADS CAS Google Scholar
Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).
Article ADS CAS PubMed Google Scholar
Kim, S. et al. Memristive architectures exploiting self-compliance multilevel implementation on 1 KB crossbar arrays for online and offline learning neuromorphic applications. ACS nano 18, 25128–25143 (2024).
Article CAS PubMed Google Scholar
Duan, X. et al. Memristor‐based neuromorphic chips. Adv. Mater. 36, 2310704 (2024).
Article CAS Google Scholar
Lu, C. et al. Self-rectifying all-optical modulated optoelectronic multistates memristor crossbar array for neuromorphic computing. Nano Lett 24, 1667–1672 (2024).
Article ADS CAS PubMed Google Scholar
Zhong, Y. et al. Dynamic memristor-based reservoir computing for high-efficiency temporal signal processing. Nat. Commun. 12, 408 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Kim, J. et al. Analog reservoir computing via ferroelectric mixed phase boundary transistors. Nat. Commun. 15, 9147 (2024).
Article CAS PubMed PubMed Central Google Scholar
Li, Q. et al. High-performance ferroelectric field-effect transistors with ultra-thin indium tin oxide channels for flexible and transparent electronics. Nat. Commun. 15, 2686 (2024).
Article ADS CAS PubMed PubMed Central Google Scholar
Gong, T. et al. First Demonstration of a Bayesian Machine based on Unified Memory and Random Source Achieved by 16-layer Stacking 3D Fe-Diode with High Noise Density and High Area Efficiency. 2023 International Electron Devices Meeting (IEDM). IEEE (2023).
Chen, L. et al. Ultra-low power Hf 0.5 Zr 0.5 O 2 based ferroelectric tunnel junction synapses for hardware neural network applications. Nanoscale 10, 15826–15833 (2018).
Article CAS PubMed Google Scholar
Garcia, V. & Bibes, M. Ferroelectric tunnel junctions for information storage and processing. Nat. Commun. 5, 4289 (2014).
Article ADS CAS PubMed Google Scholar
Kim, I.-J., Kim, M.-K. & Lee, J.-S. Highly-scaled and fully-integrated 3-dimensional ferroelectric transistor array for hardware implementation of neural networks. Nat. Commun. 14, 504 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Kim, M.-K., Kim, I.-J. & Lee, J.-S. CMOS-compatible ferroelectric NAND flash memory for high-density, low-power, and high-speed three-dimensional memory. Sci. Adv. 7, eabe1341 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Lee, G. H. et al. Ferroelectric field-effect transistors for binary neural network with 3-D NAND architecture. IEEE Trans. Electron Devices 69, 6438–6445 (2022).
Article ADS Google Scholar
Bhatti, S. et al. Spintronics based random access memory: a review. Mater. Today 20, 530–548 (2017).
Article Google Scholar
Xu, N. et al. STT-MRAM design technology co-optimization for hardware neural networks. 2018 IEEE International Electron Devices Meeting (IEDM). IEEE (2018).
Jung, S. et al. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 601, 211–216 (2022).
Article ADS CAS PubMed Google Scholar
Kim, J. et al. Demonstration of in‐memory biosignal analysis: novel high‐density and low‐power 3D flash memory array for arrhythmia detection. Adv. Sci. 26, 2308460 (2024).
Lee, S.-T. & Lee, J.-H. Neuromorphic computing using NAND flash memory architecture with pulse width modulation scheme. Front. Neurosci. 14, 571292 (2020).
Article PubMed PubMed Central Google Scholar
Guo, X. et al. Fast, energy-efficient, robust, and reproducible mixed-signal neuromorphic classifier based on embedded NOR flash memory technology. 2017 IEEE International Electron Devices Meeting (IEDM). IEEE (2017).
Wang, P. et al. Three-dimensional NAND flash for vector–matrix multiplication. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 27, 988–991 (2018).
Article Google Scholar
Song, M. S. et al. Kernel mapping methods of convolutional neural network in 3D nand flash architecture. Electronics 12, 4796 (2023).
Article CAS Google Scholar
Kabir, H. D., Khosravi, A., Hosen, M. A. & Nahavandi, S. Neural network-based uncertainty quantification: A survey of methodologies and applications. IEEE Access 6, 36218–36234 (2018).
Article Google Scholar
Gawlikowski, J. et al. A survey of uncertainty in deep neural networks. Artif. Intell. Rev. 56, 1513–1589 (2023).
Article Google Scholar
Kristiadi, A., Hein, M. & Hennig, P. Being bayesian, even just a bit, fixes overconfidence in relu networks. International Conference on Machine Learning. 5436-5446 (PMLR, 2020).
Abdar, M. et al. A review of uncertainty quantification in deep learning: techniques, applications and challenges. Inf. Fusion 76, 243–297 (2021).
Article Google Scholar
Boniardi, M. et al. Statistics of resistance drift due to structural relaxation in phase-change memory arrays. IEEE Trans. Electron Devices 57, 2690–2696 (2010).
Article ADS Google Scholar
Park, J. et al. Intrinsic variation effect in memristive neural network with weight quantization. Nanotechnology 33, 375203 (2022).
Article CAS Google Scholar
Blundell, C., Cornebise, J., Kavukcuoglu, K. & Wierstra, D. Weight uncertainty in neural network. International Conference on Machine Learning, 1613–1622 (PMLR, 2015).
Neal, R. M. Bayesian learning for neural networks. Vol. 118 (Springer Science & Business Media, 2012).
MacKay, D. J. A practical Bayesian framework for backpropagation networks. Neural Comput 4, 448–472 (1992).
Article Google Scholar
Jospin, L. V., Laga, H., Boussaid, F., Buntine, W. & Bennamoun, M. Hands-on Bayesian neural networks—A tutorial for deep learning users. IEEE Comput. Intell. Mag. 17, 29–48 (2022).
Article Google Scholar
Shridhar, K., Laumann, F. & Liwicki, M. A comprehensive guide to bayesian convolutional neural network with variational inference. ArXiv abs/1901.02731 (2019).
Tzikas, D. G., Likas, A. C. & Galatsanos, N. P. The variational approximation for Bayesian inference. IEEE Signal Process. Mag. 25, 131–146 (2008).
Article ADS Google Scholar
Andrieu, C., De Freitas, N., Doucet, A. & Jordan, M. I. An introduction to MCMC for machine learning. Mach. Learn. 50, 5–43 (2003).
Article Google Scholar
Hoffman, M. D. & Gelman, A. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1593–1623 (2014).
MathSciNet Google Scholar
Sebastian, A. et al. Two-dimensional materials-based probabilistic synapses and reconfigurable neurons for measuring inference uncertainty using Bayesian neural networks. Nat. Commun. 13, 6139 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Lin, Y. et al. Uncertainty quantification via a memristor Bayesian deep neural network for risk-sensitive reinforcement learning. Nat. Mach. Intell. 5, 714–723 (2023).
Article Google Scholar
Dalgaty, T., Esmanhotto, E., Castellani, N., Querlioz, D. & Vianello, E. Ex situ transfer of bayesian neural networks to resistive memory‐based inference hardware. Adv. Intell. Syst. 3, 2000103 (2021).
Article Google Scholar
Oh, S. et al. Bayesian neural network implemented by dynamically programmable noise in vanadium oxide. 2023 international electron devices meeting (IEDM). IEEE (2023).
Lin, Y. et al. Bayesian neural network realization by exploiting inherent stochastic characteristics of analog RRAM. 2019 IEEE International Electron Devices Meeting (IEDM). IEEE (2019).
Bonnet, D. et al. Bringing uncertainty quantification to the extreme-edge with memristor-based Bayesian neural networks. Nat. Commun. 14, 7530 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Yang, Giho et al. Improved ISPP scheme for narrow threshold voltage distribution in 3-D NAND flash memory.”. Solid-State Electron 202, 108607 (2023).
Article CAS Google Scholar
Park, Sung-Ho et al. Reliability improvement in vertical NAND flash cells using adaptive incremental step pulse programming (A-ISPP) and incremental step pulse erasing (ISPE). IEEE Trans. Electron Devices 71, 1834–1838 (2024).
Article ADS Google Scholar
Park, Chanyang et al. Investigation of program efficiency overshoot in 3D vertical channel NAND flash with randomly distributed traps. Nanomaterials 13, 1451 (2023).
Article CAS PubMed PubMed Central Google Scholar
Wang, G. et al. A deep-learning pipeline for the diagnosis and discrimination of viral, non-viral and COVID-19 pneumonia from chest X-ray images. Nat. Biomed. Eng 5, 509–521 (2021).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea Government Ministry of Science and ICT (MSIT) under Grant RS-2023-00260527 (MS Song, D.K.), in part by the National Research Foundation of Korea(NRF) grant funded by the Korea government (MSIT) (RS-2024-00441473, D. K.), in part by the Nano & Material Technology Development Program through the National Research Foundation of Korea(NRF) funded by Ministry of Science and ICT(RS-2024-00444182, MS Song, D. K.), and by the Samsung Advanced Institute of Technology (Grant No. IO240408-09521-01, D. K.).

Author information

These authors contributed equally: Minsuk Song, Ryun-Han Koo, Jangsaeng Kim.

Authors and Affiliations

Department of Nanoscale Semiconductor Engineering, Hanyang University, Seoul, 04763, Republic of Korea
Minsuk Song & Daewoong Kwon
Department of Electrical and Computer Engineering and Inter-university Semiconductor Research Center, Seoul National University, Seoul, 08826, Republic of Korea
Ryun-Han Koo & Jonghyun Ko
Department of Electronic Engineering, Sogang University, Seoul, 04107, Republic of Korea
Jangsaeng Kim
Department of System Semiconductor Engineering, Sogang University, Seoul, 04107, Republic of Korea
Jangsaeng Kim
Department of Electrical Engineering, Hanyang University, Seoul, 04763, Republic of Korea
Chang-Hyeon Han & Jiyong Yim
Thin Film Technical Unit, Device Research Center (DRC), Samsung Advanced Institute of Technology, Suwon, 16678, Republic of Korea
Sijung Yoo, Duk-hyun Choe & Sangwook Kim
Department of Semiconductor Convergence Engineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea
Wonjun Shin

Authors

Minsuk Song
View author publications
Search author on:PubMed Google Scholar
Ryun-Han Koo
View author publications
Search author on:PubMed Google Scholar
Jangsaeng Kim
View author publications
Search author on:PubMed Google Scholar
Chang-Hyeon Han
View author publications
Search author on:PubMed Google Scholar
Jiyong Yim
View author publications
Search author on:PubMed Google Scholar
Jonghyun Ko
View author publications
Search author on:PubMed Google Scholar
Sijung Yoo
View author publications
Search author on:PubMed Google Scholar
Duk-hyun Choe
View author publications
Search author on:PubMed Google Scholar
Sangwook Kim
View author publications
Search author on:PubMed Google Scholar
Wonjun Shin
View author publications
Search author on:PubMed Google Scholar
Daewoong Kwon
View author publications
Search author on:PubMed Google Scholar

Contributions

MS Song conducted research and performed device fabrication, experiments, electrical measurements, and simulations. He wrote the manuscript. R.H.K. performed electrical measurements and analyses and wrote the manuscript. J.K. performed the simulations and wrote the manuscript. C.H.H. fabricated the devices. J.Y. supported the electrical measurements. J.H.K. supported power estimation simulations. S.Y., D.H.C., and S.K. supervised the analysis. W.S. supported the simulations. D.K. directed the study and drafted the manuscript.

Corresponding authors

Correspondence to Jangsaeng Kim or Daewoong Kwon.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Saptarshi Das, Amritanand Sebastian, Huaqiang Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.”

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (download PDF )

Transparent Peer Review file (download PDF )

Source data

Source Data (download XLSX )

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Song, M., Koo, RH., Kim, J. et al. Ferroelectric NAND for efficient hardware bayesian neural networks. Nat Commun 16, 6879 (2025). https://doi.org/10.1038/s41467-025-61980-y

Download citation

Received: 01 February 2025
Accepted: 07 July 2025
Published: 25 July 2025
Version of record: 25 July 2025
DOI: https://doi.org/10.1038/s41467-025-61980-y

This article is cited by

Recent advances in ferroelectric materials, devices, and in-memory computing applications
- Hwiho Hwang
- Sangwook Youn
- Hyungjin Kim
Nano Convergence (2025)