Pruning random resistive memory for optimizing analog AI

Li, Yi; Wang, Songqi; Zhao, Yaping; Wang, Shaocong; Wang, Bo; Zhang, Woyu; He, Yangu; Lin, Ning; Cui, Binbin; Chen, Xi; Zhang, Shiming; Jiang, Hao; Lin, Peng; Zhang, Xumeng; Zhang, Feng; Qi, Xiaojuan; Wang, Zhongrui; Xu, Xiaoxin; Shang, Dashan; Liu, Qi; Wang, Han; Cheng, Kwang-Ting; Liu, Ming

doi:10.1038/s41467-025-67960-6

Article
Open access
Published: 10 January 2026

Nature Communications volume 17, Article number: 1190 (2026)

Abstract

The rapid expansion of AI models has intensified concerns over energy consumption. Analog in-memory computing with resistive memory offers a promising, energy-efficient alternative, yet its practical deployment is hindered by programming challenges and device non-idealities. Here, we propose a software-hardware co-design that trains randomly weighted resistive-memory neural networks via edge-pruning topology optimization. Software-wise, we tailor the network topology to extract high-performing sub-networks without precise weight tuning, enhancing robustness to device variations and reducing programming overhead. Hardware-wise, we harness the intrinsic stochasticity of resistive-memory electroforming to generate large-scale, low-cost random weights. Implemented on a 40 nm resistive memory chip, our co-design yields accuracy improvements of 17.3% and 19.9% on Fashion-MNIST and Spoken Digit, respectively, and a 9.8% precision-recall AUC improvement on DRIVE, while reducing energy consumption by 78.3%, 67.9%, and 99.7%. We further demonstrate broad applicability across analog memory technologies and scalability to ResNet-50 on ImageNet-100.

Introduction

Recent advances in artificial intelligence (AI), particularly deep learning, have revolutionized natural language and image processing, enabling capabilities that increasingly rival human intelligence^1,2. However, the trajectory toward larger, more sophisticated models necessitates immense computational resources, raising significant concerns regarding energy consumption and environmental sustainability^3,4.

Revisiting analog computing, a technique that predates digital architectures, presents a compelling solution⁵. By harnessing emerging analog devices such as resistive memory^6,7,8,9,10, analog computing processes informative analog signals directly, enhancing energy efficiency in three fundamental ways^{11,12,13,14,15,16,17}. First, traditional digital systems physically separate memory and processing units, incurring substantial latency and energy overheads due to frequent data shuttling, the so-called von Neumann bottleneck^{18,19,20,21,22,23}. In contrast, analog resistive memory collocates storage and processing within the same physical device^24,25,26,27. Second, as complementary metal-oxide-semiconductor (CMOS) transistors approach physical scaling limits, the pace of Moore’s law is slowing^28,29,30,31. Unlike CMOS, analog resistive memory offers high scalability and stackability^{12,32,33,34,35,36}. Third, while standard digital memory (e.g., dynamic random access memory, DRAM) is volatile, non-volatile analog resistive memory retains data without a continuous power supply^{37,38,39,40,41,42,43,44,45}.

Despite these advantages, emerging analog computing systems face persistent hurdles: programming non-idealities and high programming costs. Analog resistive memory exhibits inherent stochasticity and nonlinearity during programming^{46,47,48,49,50,51,52,53}. Furthermore, the energy and time overheads associated with programming these devices are substantially higher than those of their digital counterparts^54,55,56. Consequently, harnessing the efficiency of analog computing while mitigating these programming drawbacks remains a major open challenge for the AI hardware and electronics communities.

To address these challenges, we introduce a software-hardware co-design framework based on edge-pruning topology optimization for randomly weighted resistive memory neural networks. Inspired by clipped Hebbian-rule-based structural plasticity⁵⁷, this approach emulates the brain’s postnatal development: synaptic overproduction, consolidation of functional synapses, and the elimination of redundant ones following extended learning^58,59. In contrast to conventional weight optimization methods that rely on precise tuning of resistive memory conductance^{12,14,15,42,60,61,62}, our strategy directly engineers the topology of a randomly initialized network by selectively “turning off” insignificant weights while preserving the rest. Moreover, we leverage the intrinsic electroforming stochasticity of resistive memory to generate large-scale, low-cost hardware random weights, thereby transforming programming variability into a functional asset. Building on the theory by Ramanujan et al.⁶³, which posits that a sub-network pruned from a sufficiently large, randomly weighted network can match the accuracy of a fully optimized one, we physically reset (or set) resistive memory to prune (or reinstate) network edges. By avoiding precise programming, our approach is inherently robust to device non-idealities and eliminates the laborious conductance tuning and verification required in traditional optimization.

We validate our co-design on three representative tasks (image classification, audio classification, and image segmentation) using a hybrid analog-digital system featuring a 40 nm, 256 K resistive memory-based in-memory computing core. With identical network architectures and parameter counts, our approach achieves accuracy improvements of 17.3% and 19.9% on the Fashion-MNIST and Spoken Digit datasets, respectively, while reducing programming operations by 99.94% and 99.93% compared to hardware-in-the-loop weight optimization. The corresponding scaled inference energy per sample is reduced by 78.3% and 67.9% relative to state-of-the-art GPUs. Additionally, simulations of U-Net segmentation on the DRIVE dataset yield an area under the precision-recall curve (PR-AUC) of 0.91 and a receiver operating characteristic (ROC) AUC of 0.97 (improvements of 9.8% and 2.0% over simulated weight optimization), accompanied by a 99.7% reduction in inference energy relative to GPUs, enabled by in-memory computing and sparsity. To further demonstrate scalability, we simulate our co-design with ResNet-50 on ImageNet-100, achieving an average Top-1 accuracy of 87.6% alongside 99.3% energy savings. This work offers a universal solution for AI leveraging analog computing with emerging resistive memory (see Supplementary Figs. 13,14 and Supplementary Tables 11,12 for Llama 3^64,65 LoRA⁶⁶ fine-tuning using edge-pruning topology optimization).

Results

Software-hardware co-design: edge-pruning topology optimization for randomly weighted resistive memory neural networks

Figure 1 schematically illustrates the proposed software-hardware co-design, featuring edge-pruning topology optimization applied to a randomly weighted resistive memory neural network.

**Fig. 1: Software-hardware co-designed edge-pruning topology optimization.**

From a software perspective, our bio-inspired edge-pruning topology optimization emulates the human brain’s postnatal development, characterized by synaptic overproduction, the consolidation of functional synapses, and the elimination of redundant ones following extended learning (Fig. 1, upper left). In contrast to conventional weight optimization methods that fine-tune weights to minimize loss, this approach engineers the network architecture to uncover an effective sub-network without altering weight values. This methodology is grounded in a corollary to the Lottery Ticket Hypothesis⁶³, which posits that pruned sub-networks derived from overparameterized neural networks can achieve accuracy competitive with that of the original, fully optimized networks. As depicted in the second panel of Fig. 1, the process begins with the random initialization of the resistive memory network via electroforming stochasticity. Each edge is assigned a fixed random weight and a corresponding importance score. During the forward pass, edges with low scores in each layer, identified as redundant, are pruned to define the sub-network structure; if a neuron’s connections are heavily pruned, the neuron itself may be subsequently eliminated (see Supplementary Table 8). In the backward pass, scores are updated, and selected edges are reinstated (replaced) to refine the sub-network topology and minimize training error (see “Methods” and Supplementary Table 1 for algorithmic details).

In terms of hardware implementation, the proposed optimization is realized on a hybrid analog-digital computing system comprising two main components: an analog core based on a 40 nm, 256 K resistive memory in-memory computing macro, which generates random weights, accelerates computation-intensive matrix multiplications, and executes edge pruning; and a digital core utilizing a Xilinx System-on-Chip (SoC) (see “Methods”). By collocating weight storage and computation, the analog core significantly mitigates data movement between memory and processing units, thereby offering superior energy efficiency and parallelism compared to conventional digital architectures (see Supplementary Fig. 15 for details on the fully integrated resistive memory chip). Prior to training, the resistive memory array is partitioned into positive and negative conductance sub-arrays (G⁺ and G⁻), which encode the weight matrix via differential conductance (Fig. 1, right). Initially insulating, the as-deposited cells display a narrow weight distribution centered near zero. Subsequent electroforming induces random analog weights within the G⁺ and G⁻ arrays, resulting in differential-pair conductances that follow a mixture of two quasi-normal distributions. This process leverages electroforming stochasticity as an intrinsic entropy source, providing large-scale, low-cost true random weights for the overparameterized network (see Supplementary Table 3 for randomness analysis). During training, unnecessary edges are physically pruned by resetting the corresponding resistive memory differential pair, a process that ruptures the conductive filaments and zeros the conductance. Conversely, to reinstate a previously pruned but critical connection, the pair is set back to a conducting state, restoring the filament (see Methods for reset/set operations). 
Upon completion of training, the high-performing sub-network is frozen, yielding a conductance distribution characterized by a mixture of three quasi-normal distributions. Unlike conventional pruning strategies that primarily seek to sparsify networks to reduce computational complexity, our topology optimization introduces a training scheme specifically tailored for online learning in CIM edge systems, effectively circumventing weight optimization. This approach successfully mitigates key challenges associated with resistive memory, notably programming non-idealities and high programming costs (see Supplementary Fig. 10 for the impact of programming non-idealities).

Physical origin of resistive memory array programming stochasticity

Resistive memory cells were fabricated at the nanoscale and electroformed under identical conditions. The programmed cells were then sliced using a focused ion beam (FIB) before being examined by a high-resolution transmission electron microscope (TEM) to elucidate the microstructural origin of the programming stochasticity.

Figure 2a shows an optical photograph of the resistive memory in-memory computing macro. This monolithically integrated 256 K macro adopts a crossbar structure, with resistive memory cells in each row sharing bottom electrodes and those in each column sharing top electrodes. The cells are integrated with CMOS on a 40 nm standard logic platform between the metal 4 and metal 5 layers via a back-end-of-line process, as revealed by high-angle annular dark-field (HAADF) scanning transmission electron microscopy (STEM) (Fig. 2b). Figure 2c highlights clear differences in cross-sectional HAADF-STEM images between pristine and electroformed cells. In the pristine cell (left), the tantalum and tantalum oxide (Ta/TaO_x) resistive layer shows a uniform structure (e.g., green-boxed area), whereas electroformed cells (middle and right) display brighter contrast regions (e.g., red-boxed areas) between electrodes, indicating structural variations likely due to conducting channels. To analyze composition, energy-dispersive X-ray spectroscopy (EDS) line scans were performed, revealing that brighter regions (red arrow) have a lower oxygen-to-tantalum ratio than darker regions (green arrow), confirming the presence of conducting channels^46,47,67. The red line profiles of the two cells also differ markedly, aligning with the electroforming stochasticity observed in Fig. 2h, i. Complementary low-energy electron energy loss spectroscopy (EELS) plane scans and center-point spectra (Fig. 2d, e) of electroformed cells further demonstrate variations in oxygen-vacancy concentrations and valence between dark (Area1) and bright (Area2) regions. Losses in Area1 (25.2 and 25.1 eV) resemble the plasmon peak of insulating TaO₂ (25.7 eV), while those in Area2 (24.6 and 24.4 eV) match metallic TaO (24.5 eV), indicating oxygen-vacancy migration during electroforming^67,68, consistent with the EDS findings.
Additionally, EELS low-loss peak maps visualize the conductive channels, showing oxygen-vacancy-rich TaO_x clusters spanning nearly all of Area2, with no complete channels in Area1 (Fig. 2f). The varied patterns of these channels and corresponding EELS-peak distributions (Fig. 2g) underscore the inherent stochasticity of electroforming, driven by thin-film inhomogeneity and random oxygen-ion motion, enabling low-cost, scalable random conductance for physically implementing randomly weighted neural networks.

**Fig. 2: Physical and electrical characterization of resistive memory (RM) electroforming stochasticity.**

The intrinsic stochasticity of electroforming is harnessed to physically implement randomly weighted neural networks for the proposed edge pruning topology optimization. Figure 2h illustrates the random weight generation process using the resistive memory array. Initially, the electroforming voltage of each cell, defined as the minimum voltage that switches the cell resistance from ~30 MΩ to below 300 kΩ, is sampled on a small-scale sub-array. Based on the resulting cumulative probability, a uniform 3.4 V pulse (10 ms width) is applied to the G⁺ (G⁻) array at 120 °C, yielding a random conductance matrix with approximately half the cells electroformed and the rest insulating (sparsity of 0.5). High-temperature forming enhances cell retention and Ron/Roff uniformity (see Supplementary Fig. 3). Finally, each cell in the G⁻ array is electroformed to a complementary state relative to G⁺ (see Supplementary Table 5 for impacts of random weight distributions). During training, weights are pruned by resetting corresponding resistive memory differential pairs. As shown in Fig. 2i, analog hardware weights follow a mixture of three quasi-normal distributions, with pruned pairs exhibiting a mean conductance of ~0.07 μS and low variance (see Supplementary Figs. 5–7 and Notes for robustness studies). Figure 2j demonstrates data retention in resistive memory cells, with only small conductance fluctuations over 10,000 read cycles (0.1 V amplitude, 2 s width); this residual read noise helps mitigate overfitting during training, as discussed later (see Supplementary Fig. 2 for 150 °C baking retention tests).
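The weight generation and pruning steps described above can be mimicked in a few lines of NumPy. This is only an illustrative sketch, not the measured device behavior: the Gaussian model for formed cells, the 27 μS mean, the 3 μS spread, the 0.05 μS off-state value, and the function names `electroform_pairs` and `reset_pairs` are all assumptions chosen to resemble the reported distributions.

```python
import numpy as np

def electroform_pairs(shape, rng, g_on_mean=27.0, g_on_std=3.0, g_off=0.05):
    """Simulate electroformed differential pairs (conductances in microsiemens).

    All numeric defaults are illustrative assumptions, not measured values.
    """
    # Roughly half of the G+ cells form; each G- cell takes the complementary state.
    pos_formed = rng.random(shape) < 0.5
    g_on_pos = np.clip(rng.normal(g_on_mean, g_on_std, shape), 0.0, None)
    g_on_neg = np.clip(rng.normal(g_on_mean, g_on_std, shape), 0.0, None)
    g_pos = np.where(pos_formed, g_on_pos, g_off)
    g_neg = np.where(pos_formed, g_off, g_on_neg)
    return g_pos, g_neg

def reset_pairs(g_pos, g_neg, prune_mask, g_off=0.05):
    """Prune edges by driving both cells of the selected pairs to the off state."""
    return np.where(prune_mask, g_off, g_pos), np.where(prune_mask, g_off, g_neg)

rng = np.random.default_rng(0)
g_pos, g_neg = electroform_pairs((64, 64), rng)
weights = g_pos - g_neg                       # two lobes near +/-27 uS

# A random mask stands in for the learned pruning decision (sparsity of 0.5).
prune_mask = rng.random(weights.shape) < 0.5
g_pos_p, g_neg_p = reset_pairs(g_pos, g_neg, prune_mask)
pruned_weights = g_pos_p - g_neg_p            # adds a third lobe at zero
```

In this idealized model the pruned pairs sit exactly at zero differential conductance; on hardware they form a narrow distribution around a small residual value.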

Image classification of FashionMNIST using the co-design

The co-design was evaluated on a 4-layer convolutional neural network (CNN)⁶⁹, a standard vision model, for classifying the FashionMNIST dataset (simulations were also performed with ConvMixer⁷⁰ on CIFAR-10 and ResNet-50⁷¹ on ImageNet-100 to further demonstrate applicability; see Supplementary Figs. 11–12 and Supplementary Table 7).

Figure 3a illustrates example feature maps in garment classification using edge pruning topology optimization on the hybrid analog-digital system. The FashionMNIST dataset comprises 70,000 frontal images across 10 garment categories. Test images are down-sampled to 14 × 14, quantized to 4 bits, and input to a randomly weighted 4-layer CNN with two convolutional and two fully connected layers (see “Methods” for CNN details). Initialized with 62 K random weights (124 K resistive memory cells), the model shows differential conductance heatmaps after electroforming and topology optimization in the upper schematic. Figure 3b depicts the corresponding distributions: post-electroforming weights follow a mixture of two quasi-normal distributions with means of −27.1 μS and 27.2 μS; after optimization, half the pairs are pruned (sparsity of 0.5), adding a third distribution with a mean of −0.05 μS (see Supplementary Fig. 6 for CNN hyperparameter studies). Figure 3c visualizes 3D principal component analysis (PCA) of embedded features for the classification head before and after edge pruning topology optimization, with points color-coded by garment category. Topology optimization transforms overlapping embeddings into distinct clusters, yielding discriminative features.

**Fig. 3: Experimental image classification on the FashionMNIST dataset using the co-design.**

As shown in Fig. 3d, the experimental edge pruning topology optimization (hardware TO) achieves a classification accuracy of 87.4% compared with 79.7% of hardware weight optimization (hardware WO) with free updates, as the latter is affected by programming noise. This corresponds to a 0.4% accuracy difference compared to the software weight optimization (software WO) baseline on GPU. The edge pruning topology optimization also exhibits higher learning efficiency, showing a 17.3% accuracy margin over weight optimization with gradient threshold under the same budget of weight updates (see Methods for weight optimization details). Additionally, hardware-in-the-loop TO improves accuracy by 0.9% over software TO on GPU, mainly by mitigating overfitting via RM read noise (see Supplementary Fig. 9 for read noise impacts). This accuracy is confirmed by the predominantly diagonal confusion matrix in Fig. 3e. Figure 3f compares training complexity: TO reduces hardware weight updates in convolutional and fully connected layers by 99.74% and 99.98% on average (for fair comparison, protocols are initialized with similar parameter sizes and terminated at comparable accuracy; see Methods and Supplementary Data 1). Figure 3g contrasts single-image inference energy between the hybrid system (~3.67 μJ) and GPU (~5.76 μJ). Scaling the 40 nm design to 5 nm (matching GPU node) reduces hybrid energy to ~1.25 μJ, yielding 78.3% savings. The right panel highlights TO’s sparsity benefit, cutting RM forward-pass energy to ~12.94 nJ versus ~18.65 nJ for WO, demonstrating 30.6% energy saving (see Supplementary Tables 4, 9–10, Supplementary Note 1, and Supplementary Data 4 for energy estimation details).
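The energy savings quoted above follow directly from the per-inference energies. The short check below recomputes them from the rounded values given in the text (which are themselves approximate); the helper name `saving_percent` is hypothetical.

```python
def saving_percent(baseline, proposed):
    """Relative energy saving of `proposed` versus `baseline`, in percent."""
    return 100.0 * (1.0 - proposed / baseline)

# Single-image inference energy in microjoules (rounded values from the text).
gpu_uj, hybrid_40nm_uj, hybrid_5nm_uj = 5.76, 3.67, 1.25
# Resistive-memory forward-pass energy in nanojoules.
wo_nj, to_nj = 18.65, 12.94

unscaled = round(saving_percent(gpu_uj, hybrid_40nm_uj), 1)   # 36.3
scaled = round(saving_percent(gpu_uj, hybrid_5nm_uj), 1)      # 78.3
sparsity = round(saving_percent(wo_nj, to_nj), 1)             # 30.6
print(unscaled, scaled, sparsity)
```

Even without node scaling, the 40 nm hybrid system uses about 36% less energy than the GPU under these figures; the 78.3% value applies only after scaling to 5 nm.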

Audio classification for Spoken Digit with the co-design

In the second experiment, the co-design was applied to audio classification using a convolutional recurrent neural network (CRNN)⁷²—a standard model for extracting spatial and temporal audio features via convolutional and recurrent layers. The Spoken Digit dataset⁷³ was used, which includes 3000 recordings from 6 speakers sampled at 8 kHz.

Figure 4a illustrates experimental forward-pass feature maps on the hybrid analog-digital system. Spoken digits are transformed to the frequency domain and converted into 23 × 15 acoustic feature maps, then fed into a randomly weighted 5-layer CRNN with 2 convolutional layers, 1 recurrent layer, and 2 fully connected layers (see “Methods” for CRNN details). The model incorporates 68.5 K stochastic weights, realized via 137 K randomly initialized resistive memory cells. Hardware weight heatmaps after electroforming and edge pruning are shown in the upper schematic. After electroforming, weights in convolutional, recurrent, and fully connected layers follow a mixture of two quasi-normal distributions with means of −27.3 μS and 27.1 μS (Fig. 4b, left). During topology optimization, half the differential pairs (sparsity of 0.5) are reset to near-zero conductance, with remaining cells fixed, resulting in three quasi-normal distributions (Fig. 4b, right). Figure 4c visualizes 3D PCA of classification-head embeddings after electroforming and topology optimization, with points color-coded by digit category. Similar to image classification, initially overlapping embeddings form distinct clusters post-optimization, indicating discriminative features.

**Fig. 4: Experimental audio classification on the Spoken Digit dataset using the co-design.**

As shown in Fig. 4d, hardware edge pruning topology optimization achieves 98.1% accuracy, versus 90.8% for hardware-in-the-loop weight optimization, which is affected by programming stochasticity; this leaves a 0.2% difference between hardware topology optimization and the software baseline. Like the image task, hardware topology optimization mitigates overfitting using inherent RM read noise (see Supplementary Fig. 7 for CRNN hyperparameter studies). Performance is confirmed by the confusion matrix in Fig. 4e, featuring prominent diagonals. Figure 4f shows that topology optimization reduces hardware weight updates versus unrestricted weight optimization, with average reductions of 99.64%, 99.97%, and 99.96% in convolutional, recurrent, and fully connected layers, respectively. In Fig. 4g, the energy per spoken-digit inference is ~2.02 μJ on the hybrid system (scaling to ~0.68 μJ at 5 nm) versus ~2.15 μJ on a GPU, corresponding to a 67.9% reduction. The right panel further highlights sparsity benefits, with topology optimization consuming ~4.79 nJ compared to ~6.46 nJ for weight optimization (a 25.9% reduction).

Image segmentation of DRIVE with the co-design

In addition to classification, edge pruning was evaluated on biomedical image segmentation using a U-Net^74,75 for the DRIVE⁷⁶ dataset. DRIVE contains 40 retinal fundus images (565 × 584) for blood vessel segmentation, including 7 images with pathological cases. As illustrated in Fig. 5a, each 565 × 584 image is first partitioned into 96 × 96 patches, which are then processed by a U-Net comprising 2 convolutional layers (input and output), 4 contracting layers (D1-D4), and 4 expansive layers (U1-U4). The network outputs segmented patches of the same size, which are concatenated to form the final segmentation. Figure 5b shows the 768 × 768 simulated weights of the D4 layer after electroforming and pruning. Initial weights are sampled from the measured conductance distribution of electroformed resistive-memory differential pairs. Following topology optimization, 50% of the weights are pruned according to a predefined sparsity of 0.5 (white pixels), while the remaining weights are drawn from the conducting-state distribution. Representative segmentation results are shown in Fig. 5c, where the simulated probability maps, binary predictions, and ground truth (from left to right) exhibit close agreement, delineating major vessels and fine capillaries (see Supplementary Fig. 8 for U-Net hyperparameter studies). Figure 5d, e compares the precision-recall (PR) and ROC curves for software and simulated WO versus simulated edge-pruning TO. The simulated TO achieves PR-AUC and ROC-AUC of 0.91 and 0.97, respectively, only 0.01 below the software WO baseline, while yielding 9.8% and 2.0% improvements over simulated WO, owing to its robustness to programming stochasticity. The corresponding F1-score (F1) and AUC evolution during training are plotted in Fig. 5f, where simulated TO gradually converges toward the software WO baselines. The confusion matrix in Fig. 5g further validates performance: the simulated U-Net with the co-design attains 97% background and 83% vessel pixel accuracy, closely matching the software WO baseline of 98% and 80% (see Methods for PR, AUC, F1, and accuracy definitions). Figure 5h shows the estimated energy consumption of the GPU computing unit and the RM core for inferring a single image. The energy consumption for the RM core is approximately 3.5 μJ while that of the GPU computing unit is approximately 1339.5 μJ, resulting in a 99.7% energy saving enabled by the co-design.

**Fig. 5: Simulated image segmentation on the DRIVE dataset using the co-design.**

Discussion

In this work, a software-hardware co-designed edge pruning topology optimization for randomly weighted resistive memory neural networks was developed to tackle challenges in implementing AI via analog computing with emerging resistive memory. Hardware-wise, the intrinsic stochasticity of resistive memory electroforming is leveraged to generate large-scale, low-cost random weights and optimize network topology directly through reset operations. This approach avoids precise conductance tuning and provides a time- and energy-efficient, robust method to harness analog in-memory computing benefits for AI. Software-wise, edge pruning topology optimization utilizes true random weights from resistive memory arrays to initialize overparameterized networks and reduces programming overhead by eliminating redundant connections. Integrated with sparse weight mapping⁷⁷, this method further minimizes memory footprint (see Supplementary Table 2). This co-design addresses the primary obstacles of programming stochasticity and cost in analog computing, laying the foundation for the next generation of AI hardware with high energy efficiency (see Supplementary Table 6 for edge pruning topology optimization works for other commonly used analog computing devices).

Methods

Fabrication of resistive memory chip

Under the 40 nm technology node, the fabricated resistive memory chip integrates a 512 × 512 crossbar array, with cells formed between the metal-4 and metal-5 layers using a backend-of-line process. Each cell consists of bottom and top electrodes (BE and TE) and a transition-metal-oxide dielectric layer. The BE via (60 nm diameter) is defined by photolithography and etching, filled with TaN by physical vapor deposition, and capped with a 10 nm TaN buffer layer. A 5 nm Ta layer is then deposited and oxidized to form an 8 nm TaO_x dielectric layer. The TE is realized by sequential PVD deposition of 3 nm Ta and 40 nm TiN. After cell formation, the remaining interconnect metals are completed using a standard logic process. Cells in the same row share a common BE line, whereas cells in the same column share a common TE line. Following a 30-min post-anneal at 400 °C in vacuum, the 40 nm resistive memory chip exhibits excellent properties, including high yield and robust endurance (see Supplementary Figs. 1,2 for detailed device characteristics).

Hybrid analog-digital hardware system

The hybrid analog-digital hardware system (see Supplementary Fig. 4) comprises a 40 nm resistive memory chip and a Xilinx ZYNQ SoC, which integrates a field-programmable gate array (FPGA) and an ARM processor on a printed circuit board (PCB). The resistive memory chip operates in three modes under the edge-pruning topology optimization scheme: an electroform mode for generating random conductance weights, a reset mode for pruning selected weights, and a multiplication mode for vector-matrix operations. In electroform mode, dielectric breakdown is induced in the resistive memory array to form random conductance matrices: all source lines (SLs) are biased to a fixed programming voltage supplied by an eight-channel 16-bit digital-to-analog converter (DAC, DAC80508, Texas Instruments), while bit lines (BLs) are grounded and word lines (WLs) are biased by the DAC to enforce a compliance current and prevent hard breakdown. The SL voltage amplitude and pulse width tune the post-breakdown conductance distribution and sparsity. In reset mode, selected cells are returned to the off state: the target BL is biased by the DAC, the corresponding SL is grounded, and the remaining SLs are left floating. In multiplication mode, a 4-channel analog multiplexer (CD4051B, Texas Instruments) controlled by an 8-bit shift register (SN74HC595, Texas Instruments) applies DC voltages to the BLs. During each training step, the resistive memory chip is read, and the vector-matrix products, encoded in the SL currents, are converted to voltages by transimpedance amplifiers (OPA4322-Q1, Texas Instruments) and digitized by 14-bit ADCs (ADS8324, Texas Instruments). The resulting data are transferred to the Xilinx SoC for further processing. The FPGA implements the control logic for driving the resistive memory and exchanges data with the ARM processor via direct memory access to DRAM. It also accelerates selected neural network operations in hardware, such as activation and pooling.

Multi-bit vector-matrix multiplications

To perform vector-matrix multiplication, the analog input vector is first converted into an m-bit binary vector. In this process, each input element is encoded as a binary number with m bits, where m equals 4 for CNN, 3 for CRNN, and 6 for U-Net. The analog multiplication is thus approximated by m successive multiplications using binary input vectors at different significance levels. In each step, a row is biased to a small fixed voltage (e.g., 0.1 V) if the corresponding bit is “1”, and is grounded if the bit is “0”. The resulting column currents are sequentially read out through a column multiplexer. Finally, these currents are scaled by their respective bit significance and summed in the digital domain.
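The bit-serial scheme above can be sketched numerically as follows. This is a simplified model under stated assumptions: ideal (noise-free) conductances, differential G⁺/G⁻ readout, inputs already quantized to unsigned m-bit integers, and a hypothetical function name `bit_serial_vmm`.

```python
import numpy as np

def bit_serial_vmm(x_int, g_pos, g_neg, m=4, v_read=0.1):
    """Approximate x @ (G+ - G-) with m binary read steps.

    x_int : unsigned integers in [0, 2**m - 1], shape (rows,)
    g_pos, g_neg : conductance matrices of shape (rows, cols), in microsiemens
    Returns digitally accumulated column currents in microamperes.
    """
    x_int = np.asarray(x_int, dtype=np.int64)
    acc = np.zeros(g_pos.shape[1])
    for b in range(m):
        bits = (x_int >> b) & 1                    # bit plane of significance 2**b
        v_rows = bits * v_read                     # "1" -> 0.1 V, "0" -> grounded
        i_cols = v_rows @ g_pos - v_rows @ g_neg   # differential column currents
        acc += (2 ** b) * i_cols                   # scale by significance and sum
    return acc

rng = np.random.default_rng(1)
g_pos = rng.uniform(0.0, 30.0, size=(8, 3))
g_neg = rng.uniform(0.0, 30.0, size=(8, 3))
x_int = rng.integers(0, 16, size=8)                # 4-bit inputs, as for the CNN

serial = bit_serial_vmm(x_int, g_pos, g_neg, m=4)
direct = 0.1 * x_int @ (g_pos - g_neg)             # reference: one multi-level read
```

With ideal devices the m binary steps reproduce the direct product exactly; the practical benefit is that each row only ever sees one fixed read voltage.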

Reset and set operations

Edge pruning topology optimization

The weight pruning process is physically implemented by driving the corresponding resistive-memory differential pairs into the off state via reset operations, while reinstatement is achieved by returning these cells to their conducting states via set operations. The set operation is realized by applying identical pulses with a 3.3 V amplitude and 300 ns width to the BLs of the resistive-memory array, thereby switching pruned cells back to the conducting state and reinserting them into the sub-network. The reset operation is implemented by applying identical pulses with a 2.6 V amplitude and 400 ns width to the SLs, annihilating the conductive filaments and driving the cells into the off state. Importantly, there is a large margin between the off state and the conducting state, so precise conductance programming is unnecessary.
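One way to picture how a topology change translates into programming operations is to compare the sub-network mask before and after a training step: edges that leave the sub-network need a reset pulse, and edges that re-enter it need a set pulse. The sketch below uses the pulse parameters quoted above; the function name `plan_pulses` and the dictionary layout are hypothetical and do not describe the actual control firmware.

```python
import numpy as np

# Pulse parameters as stated in the text.
SET_PULSE = {"amplitude_v": 3.3, "width_ns": 300, "line": "BL"}
RESET_PULSE = {"amplitude_v": 2.6, "width_ns": 400, "line": "SL"}

def plan_pulses(old_mask, new_mask):
    """Return (pairs to reset, pairs to set) as arrays of (row, col) indices.

    A True entry in a mask means the edge belongs to the sub-network.
    """
    old_mask = np.asarray(old_mask, dtype=bool)
    new_mask = np.asarray(new_mask, dtype=bool)
    to_reset = np.argwhere(old_mask & ~new_mask)   # newly pruned edges
    to_set = np.argwhere(~old_mask & new_mask)     # reinstated edges
    return to_reset, to_set

old_mask = np.array([[True, True], [False, False]])
new_mask = np.array([[True, False], [True, False]])
to_reset, to_set = plan_pulses(old_mask, new_mask)
n_operations = len(to_reset) + len(to_set)         # unchanged edges cost nothing
```

Only edges whose membership changes are programmed, which is why the programming count stays small once the topology begins to stabilize.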

Weight optimization

For weight optimization, identical pulses with a 1.5 V amplitude and 500 ns width are used to program the resistive-memory cells. The programming current is limited by the transistor gate voltage on the WL to control the conductance distribution. Each cell in a differential pair is tuned to its target conductance using a closed-loop write scheme⁷⁸, followed by write-verify steps that ensure an approximate 10% conductance error margin.

Edge pruning topology optimization algorithm

The edge-pruning topology optimization algorithm consists of forward and backward steps. First, a randomly weighted neural network is initialized on the analog resistive-memory chip, where each weight value also serves as the score of the corresponding edge. In the forward pass, a sub-network is selected by pruning hardware weights whose scores fall below a threshold determined by a predefined sparsity (e.g., a sparsity of 0.5 corresponds to pruning 50% of the weights). The inputs are then fed through this sub-network for forward propagation and loss evaluation. In the backward pass, a general-purpose digital processor computes the gradients of the loss function to update the edge scores while keeping the weight values fixed. The scores are updated using a straight-through gradient estimator^63,79:

$${\widetilde{s}}_{ij}={s}_{ij}-\eta \,\frac{\partial L}{\partial {I}_{j}}\,{w}_{ij}{Z}_{i},$$

(1)

where $s_{ij}$ and ${\widetilde{s}}_{ij}$ denote the score of the edge between the hidden nodes i and j before and after the update, respectively, η is the learning rate, $\frac{\partial L}{\partial {I}_{j}}$ is the partial derivative of the loss L with respect to the input $I_j$ of node j, and $w_{ij}Z_i$ is the weighted output of node i. The forward and backward steps are repeated until a well-performing sub-network is selected from the randomly initialized neural network.
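The forward selection and the score update of Eq. (1) can be illustrated on a tiny layer. This is a sketch under assumptions rather than the authors' implementation: the helper names are hypothetical, weights are indexed as `weights[i][j]` from node i to node j, and ties at the pruning threshold are simply pruned.

```python
def topk_mask(scores, sparsity):
    """Return a 0/1 mask that prunes the lowest-scoring fraction of edges."""
    flat = sorted(s for row in scores for s in row)
    n_prune = int(len(flat) * sparsity)
    if n_prune == 0:
        return [[1 for _ in row] for row in scores]
    threshold = flat[n_prune - 1]  # largest score that is still pruned
    return [[1 if s > threshold else 0 for s in row] for row in scores]


def forward(weights, mask, z):
    """Input of node j: I_j = sum_i mask_ij * w_ij * Z_i."""
    n_out = len(weights[0])
    return [sum(mask[i][j] * weights[i][j] * z[i] for i in range(len(z)))
            for j in range(n_out)]


def update_scores(scores, weights, z, grad_in, lr):
    """Eq. (1): s_ij <- s_ij - lr * dL/dI_j * w_ij * Z_i (weights stay fixed)."""
    return [[scores[i][j] - lr * grad_in[j] * weights[i][j] * z[i]
             for j in range(len(grad_in))]
            for i in range(len(z))]


weights = [[0.5, -0.2], [0.1, 0.8]]    # fixed random weights (toy values)
scores = [row[:] for row in weights]   # scores start from the weight values
mask = topk_mask(scores, sparsity=0.5)
z = [1.0, 2.0]                         # outputs of the previous layer
inputs = forward(weights, mask, z)
new_scores = update_scores(scores, weights, z, grad_in=[0.1, -0.2], lr=1.0)
```

Note that the update touches the scores of all edges, including pruned ones, so a pruned edge can re-enter the sub-network at a later step.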

Threshold learning rule

To improve energy and time efficiency during training, we adopt the threshold learning rule of Yao et al.⁸⁰ to avoid unnecessary programming of the resistive memory.

Edge pruning topology optimization

$$\Delta s=\left\{\begin{array}{ll}\Delta s,& \,{{\rm{if}}}\,\,\left|\Delta s\right|\ge {T}_{s}(t),\\ 0,& \,{{\rm{otherwise}}}\,,\end{array}\right.$$

(2)

where $\Delta s$ is the edge-score update, and ${T}_{s}\left(t\right)$ is the dynamic threshold updated by the following rule:

$${T}_{s}(t)=\left\{\begin{array}{ll}{T}_{s}^{{{\rm{init}}}},\hfill& \,{{\rm{if}}}\,\,t=0,\\ {T}_{s}(t-1)-\frac{{T}_{s}^{{{\rm{init}}}}-{T}_{s}^{{{\rm{end}}}}}{\alpha },& \,{{\rm{otherwise}}}\,,\end{array}\right.$$

(3)

where ${T}_{s}^{{{\rm{init}}}}$, ${T}_{s}^{{{\rm{end}}}}$, and α represent the pre-designed initial threshold, end threshold, and update step, respectively. When a new best accuracy is achieved, the value of t is increased by 1. This decaying threshold speeds up model convergence and is inspired by the learning-rate decay technique⁸¹.
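Eqs. (2) and (3) can be read as a gate on each score update plus a threshold that steps down whenever a new best accuracy is reached. The sketch below uses hypothetical names and toy values; clamping the threshold at the end value is an added assumption, since Eq. (3) itself does not state a floor.

```python
def gate_update(delta, threshold):
    """Eq. (2): keep an update only if its magnitude reaches the threshold."""
    return delta if abs(delta) >= threshold else 0.0


class DecayingThreshold:
    """Eq. (3): T_s(0) = T_init, T_s(t) = T_s(t-1) - (T_init - T_end) / alpha."""

    def __init__(self, t_init, t_end, alpha):
        self.value = t_init
        self.t_end = t_end
        self.step = (t_init - t_end) / alpha

    def on_new_best_accuracy(self):
        # t increases by 1 each time a new best accuracy is achieved.
        # Assumption: the threshold is not allowed to drop below T_end.
        self.value = max(self.t_end, self.value - self.step)
        return self.value


threshold = DecayingThreshold(t_init=0.5, t_end=0.1, alpha=4)
kept = gate_update(-0.2, threshold.value)     # too small at T_s = 0.5
history = [threshold.on_new_best_accuracy() for _ in range(5)]
kept_later = gate_update(-0.2, threshold.value)
```

With these toy values the same update is suppressed early in training and applied once the threshold has decayed.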

Weight optimization

$$\Delta w=\left\{\begin{array}{ll}\Delta w,& \,{{\rm{if}}}\,\,\left|\Delta w\right|\ge {T}_{w},\\ 0,& \,{{\rm{otherwise}}}\,,\end{array}\right.$$

(4)

where $\Delta w$ represents the weight update, and $T_w$ denotes the pre-designed threshold constant that determines whether a particular cell requires programming. When $T_w$ equals zero, every cell is programmed at every training step, which corresponds to weight optimization with free updates. Increasing $T_w$ lowers the frequency of cell programming; weight optimization with limited updates is obtained when $T_w$ is chosen such that the programming count is similar to that of edge-pruning topology optimization.
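How the choice of the threshold in Eq. (4) sets the programming count can be shown with a small example. The update values, the candidate thresholds, and the `match_budget` helper are hypothetical; the text only states that the limited-update baseline uses a threshold giving a programming count similar to that of edge pruning.

```python
def count_programmed(updates, t_w):
    """Number of cells whose update passes Eq. (4), i.e. |dw| >= T_w."""
    return sum(1 for dw in updates if abs(dw) >= t_w)


def match_budget(updates, candidates, budget):
    """Smallest candidate threshold whose programming count fits the budget."""
    for t_w in sorted(candidates):
        if count_programmed(updates, t_w) <= budget:
            return t_w
    return None  # no candidate is strict enough


updates = [0.02, -0.15, 0.30, -0.01, 0.08]   # toy weight updates
free = count_programmed(updates, 0.0)         # free updates: every cell
limited_threshold = match_budget(updates, [0.0, 0.05, 0.1, 0.2], budget=2)
```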

Details of the experimental neural networks

When running randomly weighted neural networks, the analog resistive-memory in-memory computing chip accelerates the most computationally intensive vector-matrix multiplications, while the remaining max-pooling, normalization, and activation operations are executed on the Xilinx SoC. Training hyperparameters are summarized in Supplementary Data 1–3.

CNN backbone

As shown in Fig. 3a, the 4-layer CNN consists of 2 convolutional (C) layers and 2 fully connected (F) layers. The C1 layer uses kernels of size 64 × 1 × 3 × 3 (out channels × in channels × width × height) and produces 64 × 12 × 12 feature maps. The C2 layer uses kernels of size 16 × 64 × 3 × 3 and generates 16 × 10 × 10 feature maps. These maps are then down-sampled by a max-pooling layer with a 2 × 2 kernel and stride 2. The resulting 16 × 5 × 5 feature maps are flattened into a 400-element vector and fed into the F1 layer. F1 has a weight matrix of size 128 × 400 and reduces the input to a 128-element vector. Finally, this vector is passed to the F2 layer with 10 × 128 weights to produce 10 output probabilities corresponding to the image labels.
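The layer sizes quoted above can be checked with the usual output-size arithmetic for convolution and pooling. The sketch starts from the stated 12 × 12 output of C1 and assumes unit stride and no padding in C2; the weight count ignores biases, which the text does not mention.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max-pooling along one dimension."""
    return (size - kernel) // stride + 1


c1_side = 12                              # C1 output side, as stated
c2_side = conv_out(c1_side, kernel=3)     # C2 with 3 x 3 kernels
pooled_side = pool_out(c2_side)           # 2 x 2 max-pooling, stride 2
flat = 16 * pooled_side * pooled_side     # flattened input of F1

layer_weights = {
    "C1": 64 * 1 * 3 * 3,
    "C2": 16 * 64 * 3 * 3,
    "F1": 128 * flat,
    "F2": 10 * 128,
}
total_weights = sum(layer_weights.values())
```

The fully connected F1 layer holds most of the weights, so it dominates the number of resistive-memory cells needed for this backbone.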

CRNN backbone

Spoken digits are first transformed into the frequency domain using mel-frequency cepstral coefficients (MFCCs)⁷², yielding cepstral feature maps suitable for recognition and classification. As shown in Fig. 4a, these 23 × 15 feature maps are then fed into a 5-layer CRNN comprising 2 convolutional layers, 1 recurrent (R) layer, and 2 fully connected layers. The C1 layer uses kernels of size 64 × 1 × 3 × 2, followed by a max-pooling layer that produces 64 × 11 × 14 feature maps. The C2 layer uses kernels of size 32 × 64 × 3 × 2, followed by a max-pooling layer that outputs 32 × 2 × 8 feature maps. These maps are then passed to the R1 layer:

$$h(t)=\left\{\begin{array}{ll}0,\hfill& \,{{\rm{if}}}\,\,t=0,\\ \tanh \left({w}_{ih}{x}_{t}+{w}_{hh}h(t-1)\right),& \,{{\rm{otherwise}}}\,,\end{array}\right.$$

(5)

where t, h(t), w_ih, and w_hh denote the time step, hidden state, input-hidden weights (128 × 32), and hidden-hidden recurrent weights (128 × 128), respectively. The time step increases as the model iterates, and when it reaches 4, the final output of the R1 layer is obtained by

$$\,{{\rm{recurrent\; output}}}\,=\frac{{\sum }_{t=1}^{4}h(t)}{4}.$$

(6)

Subsequently, the recurrent output with 128 elements is fed into the F1 (256 × 128) and F2 (10 × 256) layers to obtain the 10 probability outputs that determine the label of the input spoken digit.
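Eqs. (5) and (6) amount to running a tanh recurrence from a zero initial state and averaging the hidden states over the time steps. The sketch below uses a hypothetical two-unit layer instead of the 128 × 32 and 128 × 128 weights, and it does not model how the 32 × 2 × 8 feature maps are arranged into the per-step inputs, which is not described here.

```python
import math


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def recurrent_output(xs, w_ih, w_hh):
    """Average of h(1..T), with h(0) = 0 and h(t) = tanh(w_ih x_t + w_hh h(t-1))."""
    hidden = len(w_ih)
    h = [0.0] * hidden
    total = [0.0] * hidden
    for x in xs:
        pre = [a + b for a, b in zip(matvec(w_ih, x), matvec(w_hh, h))]
        h = [math.tanh(p) for p in pre]
        total = [s + hi for s, hi in zip(total, h)]
    return [s / len(xs) for s in total]


# Toy weights: identity input weights and no recurrence, four time steps.
w_ih = [[1.0, 0.0], [0.0, 1.0]]
w_hh = [[0.0, 0.0], [0.0, 0.0]]
xs = [[0.5, -0.5]] * 4
out = recurrent_output(xs, w_ih, w_hh)
```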

U-Net backbone

As shown in Fig. 5a, the U-Net architecture consists of two main parts: a contracting path (down-sampling layers, D) and an expansive path (up-sampling layers, U). Each down-sampling layer applies two 3 × 3 convolutions, each followed by a rectified linear unit (ReLU) activation, and a 2 × 2 max-pooling operation. This progressively extracts higher-level features from the input, enabling the network to capture critical structures and patterns. Each up-sampling layer consists of a bilinear interpolation step, concatenation with the corresponding cropped feature maps from the contracting path, and two subsequent 3 × 3 convolutions, each followed by ReLU. This design facilitates precise localization and accurate segmentation of objects within the image.

During simulations, retinal blood vessel images of size 584 × 565 pixels are first split into 26,400 overlapping 96 × 96 patches. From these, 6400 patches are randomly selected for training, and the remaining 20,000 patches are used for testing. Each patch is fed into an input convolutional layer with kernel size 32 × 1 × 3 × 3 for initial feature extraction, and the resulting feature maps are processed by four down-sampling layers (D1–D4) and four up-sampling layers (U1–U4) for vessel segmentation. The input channel numbers for D1–D4 are 32, 64, 128, and 256, respectively, and for U1–U4 are 256, 128, 64, and 32. An output convolutional layer with kernel size 2 × 32 × 1 × 1 finally maps the U4 feature maps to segmentation probability maps for each patch. The full segmented image is then reconstructed by concatenating all processed patches.
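The patch preparation can be sketched as a grid of overlapping patch origins followed by a random train/test split. The stride of 48 pixels, the seed, and both helper names are hypothetical; the text gives only the image size, the patch size, and the patch counts, and it does not say how image borders are handled.

```python
import random


def patch_origins(height, width, size, stride):
    """Top-left corners of all size x size patches on a regular grid."""
    rows = range(0, height - size + 1, stride)
    cols = range(0, width - size + 1, stride)
    return [(r, c) for r in rows for c in cols]


def split_indices(n_total, n_train, seed=0):
    """Randomly pick n_train patch indices for training; the rest are for testing."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# Overlapping 96 x 96 patches of one 584 x 565 image (hypothetical stride).
origins = patch_origins(584, 565, size=96, stride=48)

# Split quoted in the text: 6400 training and 20,000 test patches.
train_idx, test_idx = split_indices(26400, 6400)
```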

Image segmentation indicators

To demonstrate the performance of U-Net in the DRIVE image segmentation task, several indicators are evaluated, including true positive rate (TPR), false positive rate (FPR), recall, precision, F1 score (F1), and accuracy, as follows:

$${{\rm{TPR}}}={{\rm{Recall}}}=\frac{{{\rm{TP}}}}{{{\rm{TP}}}+{{\rm{FN}}}},$$

(7)

$${{\rm{FPR}}}=\frac{{{\rm{FP}}}}{{{\rm{TN}}}+{{\rm{FP}}}},$$

(8)

$${{\rm{Precision}}}=\frac{{{\rm{TP}}}}{{{\rm{TP}}}+{{\rm{FP}}}},$$

(9)

$${{\rm{F1}}}=2\times \frac{{{\rm{Precision}}}\times {{\rm{Recall}}}}{{{\rm{Precision}}}+{{\rm{Recall}}}},$$

(10)

$${{\rm{Accuracy}}}=\frac{{{\rm{TP}}}+{{\rm{TN}}}}{{{\rm{TP}}}+{{\rm{FP}}}+{{\rm{TN}}}+{{\rm{FN}}}},$$

(11)

Here, TP, FP, TN, and FN denote the numbers of true-positive, false-positive, true-negative, and false-negative predictions, respectively. Recall, Precision, F1, and Accuracy are widely used performance metrics for evaluating image segmentation tasks. Recall evaluates the U-Net’s ability to identify all relevant instances, Precision measures the correctness of positive predictions, F1 combines Precision and Recall into a single balanced score ranging from 0 to 1, with higher values representing better performance, and Accuracy indicates the proportion of correct predictions among all predictions.
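Eqs. (7) to (11) translate directly into a few lines of code. The function name and the confusion counts below are hypothetical, and the sketch does not guard against empty classes, where a denominator would be zero.

```python
def segmentation_metrics(tp, fp, tn, fn):
    """Compute the indicators of Eqs. (7)-(11) from confusion counts."""
    recall = tp / (tp + fn)                    # Eq. (7), equal to TPR
    fpr = fp / (tn + fp)                       # Eq. (8)
    precision = tp / (tp + fp)                 # Eq. (9)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (10)
    accuracy = (tp + tn) / (tp + fp + tn + fn)            # Eq. (11)
    return {
        "recall": recall,
        "fpr": fpr,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Toy counts for a patch in which vessels are a small minority class.
metrics = segmentation_metrics(tp=80, fp=20, tn=880, fn=20)
```

In this toy case Accuracy (0.96) is much higher than F1 (0.8) because background dominates, which illustrates why precision and recall are reported alongside accuracy for vessel segmentation.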

Data availability

All data supporting the findings of this study are provided in the main text and the Supplementary Information. Processed datasets are available on GitHub⁸².

Code availability

The code supporting the findings of this study is available on GitHub⁸².

References

Vaswani, A. et al. Attention is all you need. Preprint at https://doi.org/10.48550/arXiv.1706.03762 (2017).
Wolf, T. et al. Huggingface’s transformers: state-of-the-art natural language processing. Preprint at https://doi.org/10.48550/arXiv.1910.03771 (2019).
Strubell, E., Ganesh, A. & McCallum, A. Energy and policy considerations for deep learning in NLP. Preprint at https://doi.org/10.48550/arXiv.1906.02243 (2019).
Henderson, P. et al. Towards the systematic reporting of the energy and carbon footprints of machine learning. J. Mach. Learn. Res. 21, 1–43 (2020).
Copeland, J., Bowen, J., Sprevak, M. & Wilson, R. The Turing Guide (Oxford University Press, 2017).
Chua, L. Memristor-the missing circuit element. IEEE Trans. Circuit Theory 18, 507–519 (1971).
Strukov, D. B., Snider, G. S., Stewart, D. R. & Williams, R. S. The missing memristor found. Nature 453, 80–83 (2008).
Huh, W., Lee, D. & Lee, C.-H. Memristors based on 2d materials as an artificial synapse for neuromorphic electronics. Adv. Mater. 32, 2002092 (2020).
Lu, Y. & Yang, Y. Memory augmented factorization for holographic representation. Nat. Nanotechnol. 18, 442–443 (2023).
Zhang, W. et al. Edge learning using a fully integrated neuro-inspired memristor chip. Science 381, 1205–1211 (2023).
Chen, W.-H. et al. Cmos-integrated memristive non-volatile computing-in-memory for ai edge processors. Nat. Electron. 2, 420–428 (2019).
Joshi, V. et al. Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 11, 2473 (2020).
Karunaratne, G. et al. Robust high-dimensional memory-augmented neural networks. Nat. Commun. 12, 2468 (2021).
Li, C. et al. Long short-term memory networks in memristor crossbar arrays. Nat. Mach. Intell. 1, 49–57 (2019).
Li, H. et al. Sapiens: A 64-kb rram-based non-volatile associative memory for one-shot learning and inference at the edge. IEEE Trans. Electron Devices 68, 6637–6643 (2021).
Liu, Z. et al. Neural signal analysis with memristor arrays towards high-efficiency brain-machine interfaces. Nat. Commun. 11, 4234 (2020).
Milano, G. et al. In materia reservoir computing with a fully memristive architecture based on self-organizing nanowire networks. Nat. Mater. 21, 195–202 (2021).
Wang, Z. et al. Memristors with diffusive dynamics as synaptic emulators for neuromorphic computing. Nat. Mater. 16, 101–108 (2016).
Waser, R., Dittmann, R., Staikov, G. & Szot, K. Redox-based resistive switching memories-nanoionic mechanisms, prospects, and challenges. Adv. Mater. 21, 2632–2663 (2009).
Yang, J. J., Strukov, D. B. & Stewart, D. R. Memristive devices for computing. Nat. Nanotechnol. 8, 13–24 (2013).
Zidan, M. A., Strachan, J. P. & Lu, W. D. The future of electronics based on memristive systems. Nat. Electron. 1, 22–29 (2018).
Kuzum, D., Yu, S. & Wong, H. P. Synaptic electronics: materials, devices and applications. Nanotechnology 24, 382001 (2013).
Hu, M. et al. Memristor-based analog computation and neural network classification with a dot product engine. Adv. Mater. 30, 1705914 (2018).
Xi, Y. et al. In-memory learning with analog resistive switching memory: a review and perspective. Proc. IEEE 109, 14–42 (2020).
McKee, S. A. Reflections on the memory wall. In Proc. 1st Conference on Computing Frontiers 162 (ACM, 2004).
Kuroda, T. CMOS design challenges to power wall. In Proc. Digest of Papers. Microprocesses and Nanotechnology 2001. 2001 International Microprocesses and Nanotechnology Conference (IEEE Cat. No. 01EX468) 6–7 (IEEE, 2001).
Horowitz, M. 1.1 Computing’s energy problem (and what we can do about it). In Proc. International Solid-state Circuits Conference Digest Of Technical Papers (ISSCC) 10–14 (IEEE, 2014).
Theis, T. N. & Wong, H.-S. P. The end of Moore’s law: a new beginning for information technology. Comput. Sci. Eng. 19, 41–50 (2017).
Schaller, R. R. Moore’s law: past, present and future. IEEE Spectrum 34, 52–59 (1997).
Shalf, J. M. & Leland, R. Computing beyond Moore’s law. Computer 48, 14–23 (2015).
Shalf, J. The future of computing beyond Moore’s law. Philos. Trans. R. Soc. A 378, 20190061 (2020).
Wong, H.-S. P. et al. Phase change memory. Proc. IEEE 98, 2201–2227 (2010).
Koelmans, W. W. et al. Projected phase-change memory devices. Nat. Commun. 6, 8181 (2015).
Soni, R. et al. Giant electrode effect on tunnelling electroresistance in ferroelectric tunnel junctions. Nat. Commun. 5, 5414 (2014).
Xi, Z. et al. Giant tunnelling electroresistance in metal/ferroelectric/semiconductor tunnel junctions by engineering the Schottky barrier. Nat. Commun. 8, 15217 (2017).
Wen, Z., Li, C., Wu, D., Li, A. & Ming, N. Ferroelectric-field-effect-enhanced electroresistance in metal/ferroelectric/semiconductor tunnel junctions. Nat. Mater. 12, 617–621 (2013).
Sheridan, P. M. et al. Sparse coding with memristor networks. Nat. Nanotechnol. 12, 784–789 (2017).
Shi, Y. et al. Neuroinspired unsupervised learning and pruning with subquantum cbram arrays. Nat. Commun. 9, 5312 (2018).
Song, L., Zhuo, Y., Qian, X., Li, H. & Chen, Y. Graphr: accelerating graph processing using ReRAM. In Proc. IEEE Symposium on High-Performance Computer Architecture (IEEE, 2018).
Tsai, H. et al. Inference of long-short term memory networks at software-equivalent accuracy using 2.5m analog phase change memory devices. In Proc. Symposium on VLSI Technology (IEEE, 2019).
Wan, W. et al. 33.1 a 74 TMACS/W CMOS-RRAM neurosynaptic core with dynamically reconfigurable dataflow and in-situ transposable weights for probabilistic graphical models. In Proc. International Conference on Solid-State Circuits (ISSCC) (IEEE, 2020).
Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512 (2022).
Li, Y. et al. Mixed-precision continual learning based on computational resistance random access memory. Adv. Intell. Syst. 4, 2200026 (2022).
Zhang, W. et al. Few-shot graph learning with robust and energy-efficient memory-augmented graph neural network (magnn) based on homogeneous computing-in-memory. In Proc. Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits) 224–225 (IEEE, 2022).
Yuan, R. et al. A neuromorphic physiological signal processing system based on VO2 memristor for next-generation human-machine interface. Nat. Commun. 14, 3695 (2023).
Sun, W. et al. Understanding memristive switching via in situ characterization and device modeling. Nat. Commun. 10, 3453 (2019).
Yang, Y. et al. Observation of conducting filament growth in nanoscale resistive memories. Nat. Commun. 3, 732 (2012).
Ambrogio, S. et al. Statistical fluctuations in HfO_x resistive-switching memory: part i-set/reset variability. IEEE Trans. Electron Devices 61, 2912–2919 (2014).
Dalgaty, T. et al. In situ learning using intrinsic memristor variability via Markov chain Monte Carlo sampling. Nat. Electron. 4, 151–161 (2021).
Burr, G. W. et al. Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element. IEEE Trans. Electron Devices 62, 3498–3507 (2015).
Wang, S. et al. Echo state graph neural networks with analogue random resistive memory arrays. Nat. Mach. Intell. 5, 104–113 (2023).
Li, Y. et al. An ADC-less RRAM-based computing-in-memory macro with binary CNN for efficient edge AI. In Proc. Transactions on Circuits and Systems II: Express Briefs (IEEE, 2023).
Wang, Z. et al. Fully memristive neural networks for pattern classification with unsupervised learning. Nat. Electron. 1, 137–145 (2018).
Chih, Y.-D. et al. 16.4 an 89TOPS/W and 16.3TOPS/mm² all-digital SRAM-based full-precision compute-in memory macro in 22 nm for machine-learning edge applications. In Proc. International Solid-State Circuits Conference (ISSCC) Vol. 64, 252–254 (IEEE, 2021).
Wong, H.-S. P. et al. Metal–oxide RRAM. Proc. IEEE 100, 1951–1970 (2012).
Lu, Y. et al. Accelerated local training of CNNs by optimized direct feedback alignment based on stochasticity of 4 mb c-doped Ge₂Sb₂Te₅ PCM chip in 40 nm node. In Proc. International Electron Devices Meeting (IEDM) (IEEE, 2020).
Marcus, C. & Westervelt, R. Stability of analog neural networks with delay. Phys. Rev. A 39, 347 (1989).
Sakai, J. How synaptic pruning shapes neural wiring during development and, possibly, in disease. Proc. Natl. Acad. Sci. USA 117, 16096–16099 (2020).
Sretavan, D. & Shatz, C. J. Prenatal development of individual retinogeniculate axons during the period of segregation. Nature 308, 845–848 (1984).
Hung, J.-M. et al. An 8-mb dc-current-free binary-to-8b precision reram nonvolatile computing-in-memory macro using time-space-readout with 1286.4-21.6 TOPS/W for edge-ai devices. In Proc. International Conference on Solid-State Circuits (ISSCC) (IEEE, 2022).
Prezioso, M. et al. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521, 61–64 (2015).
Tang, J. et al. Bridging biological and artificial neural networks with emerging neuromorphic devices: fundamentals, progress, and challenges. Adv. Mater. 31, 1902761 (2019).
Ramanujan, V., Wortsman, M., Kembhavi, A., Farhadi, A. & Rastegari, M. What’s hidden in a randomly weighted neural network? In Proc. Conference on Computer Vision And Pattern Recognition 11893–11902 (IEEE, 2020).
Dubey, A. et al. The llama 3 herd of models. Preprint at https://doi.org/10.48550/arXiv.2407.21783 (2024).
Bai, J. et al. Qwen technical report. Preprint at https://doi.org/10.48550/arXiv.2309.16609 (2023).
Hu, E. J. et al. Lora: low-rank adaptation of large language models. Preprint at https://doi.org/10.48550/arXiv.2106.09685 (2021).
Park, G.-S. et al. In situ observation of filamentary conducting channels in an asymmetric Ta₂O_5−x/TaO_2−x bilayer structure. Nat. Commun. 4, 2382 (2013).
Li, C. et al. Direct observations of nanofilament evolution in switching processes in HfO₂-based resistive random access memory by in situ TEM studies. Adv. Mater. 29, 1602976 (2017).
Kadam, S. S., Adamuthe, A. C. & Patil, A. B. CNN model for image classification on MNIST and fashion-MNIST dataset. J. Sci. Res. 64, 374–384 (2020).
Trockman, A. & Kolter, J. Z. Patches are all you need? Preprint at https://doi.org/10.48550/arXiv.2201.09792 (2022).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
Choi, K., Fazekas, G., Sandler, M. & Cho, K. Convolutional recurrent neural networks for music classification. In Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2392–2396 (IEEE, 2017).
Sun, W. et al. High area efficiency (6 TOPS/mm²) multimodal neuromorphic computing system implemented by 3d multifunctional RAM array. In Proc. International Electron Devices Meeting (IEDM) 1–4 (IEEE, 2023).
Huang, H. et al. Unet 3+: A full-scale connected UNet for medical image segmentation. In Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP) 1055–1059 (IEEE, 2020).
Zhuang, J. Laddernet: multi-path networks based on U-Net for medical image segmentation. Preprint at https://doi.org/10.48550/arXiv.1810.07810 (2018).
Staal, J., Abràmoff, M. D., Niemeijer, M., Viergever, M. A. & Van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 23, 501–509 (2004).
Lin, J., Zhu, Z., Wang, Y. & Xie, Y. Learning the sparsity for RRAM: mapping and pruning sparse neural network for RRAM-based accelerator. In Proc. 24th Asia and South Pacific Design Automation Conference 639–644 (ACM, 2019).
Yao, P. et al. Face classification using electronic synapses. Nat. Commun. 8, 15199 (2017).
Bengio, Y., Léonard, N. & Courville, A. Estimating or propagating gradients through stochastic neurons for conditional computation. Preprint at https://doi.org/10.48550/arXiv.1308.3432 (2013).
Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).
You, K., Long, M., Jordan, M. I. & Wang, J. Learning stages: phenomenon, root cause, mechanism hypothesis, and implications. Preprint at https://doi.org/10.48550/arXiv.1908.01878 (2019).
Yi, L. Code for “pruning random resistive memory for optimizing analog AI”. https://github.com/lyd126/Pruning_random_resistive_memory_for_optimizing_analogue_AI (2024).

Acknowledgements

This work was supported in part by the Innovation 2030 for Science and Technology (Grant No. 2021ZD0201203), the National Natural Science Foundation of China (Grant Nos. 62374181, U2341218, 92464201, 62488101, and 62322412), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA0330100), the Hong Kong Research Grants Council (Grant Nos. 17212923, C1009-22G, C7003-24Y, and AOE/E-101/23-N), and the Shenzhen Science and Technology Innovation Commission (Grant No. SGDX20220530111405040).

Author information

These authors contributed equally: Yi Li, Songqi Wang.

Authors and Affiliations

Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
Yi Li, Songqi Wang, Yaping Zhao, Shaocong Wang, Bo Wang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Xiaojuan Qi & Han Wang
Research Center of Microelectronic Device and Integration Technology, Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China
Yi Li, Songqi Wang, Shaocong Wang, Woyu Zhang, Ning Lin, Feng Zhang, Xiaoxin Xu, Dashan Shang, Qi Liu & Ming Liu
School of Microelectronics, Southern University of Science and Technology, Shenzhen, China
Yi Li, Songqi Wang & Zhongrui Wang
State Key Laboratory of Fabrication Technologies for Integrated Circuits, Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China
Yi Li, Woyu Zhang, Feng Zhang, Xiaoxin Xu & Dashan Shang
Center for Advanced Semiconductor and Integrated Circuit, The University of Hong Kong, Hong Kong, China
Yi Li, Songqi Wang, Yangu He & Han Wang
University of Chinese Academy of Sciences, Beijing, China
Woyu Zhang, Feng Zhang, Xiaoxin Xu & Dashan Shang
Frontier Institute of Chip and System, Fudan University, Shanghai, China
Hao Jiang, Xumeng Zhang, Qi Liu & Ming Liu
College of Computer Science and Technology, Zhejiang University, Zhejiang, China
Peng Lin
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Kwang-Ting Cheng

Contributions

Z.W. and Y.L. conceived the work. Y.L., So.W., Y.Z., Sh.W., B.W., W.Z., and Y.H. contributed to the design and development of the models, software, and hardware experiments. Y.L., So.W., Y.Z., N.L., B.C., X.C., and Z.W. interpreted, analyzed, and presented the experimental results. Y.L., So.W., and Z.W. wrote the manuscript. Z.W., X.X., and D.S. supervised the project. All authors discussed the results and implications and commented on the manuscript at all stages.

Corresponding authors

Correspondence to Zhongrui Wang, Xiaoxin Xu or Dashan Shang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, Y., Wang, S., Zhao, Y. et al. Pruning random resistive memory for optimizing analog AI. Nat Commun 17, 1190 (2026). https://doi.org/10.1038/s41467-025-67960-6

Received: 24 March 2024
Accepted: 12 December 2025
Published: 10 January 2026
Version of record: 30 January 2026
DOI: https://doi.org/10.1038/s41467-025-67960-6
