A lossless and fully parallel spintronic compute-in-memory macro for artificial intelligence chips

Abstract

Non-volatile compute-in-memory macros can reduce data transfer between processing and memory units, providing fast and energy-efficient artificial intelligence computations. However, the non-volatile compute-in-memory architecture typically relies on analogue computing, which is limited in terms of accuracy, scalability and robustness. Here we report a 64-kb non-volatile digital compute-in-memory macro based on 40-nm spin-transfer torque magnetic random-access memory technology. Our macro features in situ multiplication and digitization at the bitcell level, precision-reconfigurable digital addition and accumulation at the macro level and a toggle-rate-aware training scheme at the algorithm level. The macro supports lossless matrix–vector multiplications with flexible input and weight precisions (4, 8, 12 and 16 bits), and can achieve a software-equivalent inference accuracy for a residual network at 8-bit precision and physics-informed neural networks at 16-bit precision. Our non-volatile compute-in-memory macro has computation latencies of 7.4–29.6 ns and energy efficiencies of 7.02–112.3 tera-operations per second per watt for fully parallel matrix–vector multiplications across precision configurations ranging from 4 to 16 bits.

Fig. 1: Motivation and overview of the nvDCIM macro.
Fig. 2: Overview of the IBMD bitcell.
Fig. 3: System architecture of the nvDCIM macro.
Fig. 4: Toggle-rate-aware training scheme.

Data availability

The data that support the plots presented in this Article, as well as other findings derived from this study, are available from the corresponding authors upon reasonable request.

Code availability

Computer codes are available from the corresponding authors upon reasonable request.

References

  1. Sze, V., Chen, Y.-H., Yang, T.-J. & Emer, J. S. Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105, 2295–2329 (2017).

  2. Xu, X. et al. Scaling for edge inference of deep neural networks. Nat. Electron. 1, 216–222 (2018).

  3. Di Ventra, M. & Pershin, Y. V. The parallel approach. Nat. Phys. 9, 200–202 (2013).

  4. Horowitz, M. 1.1 Computing’s energy problem (and what we can do about it). In Proc. IEEE International Solid-State Circuits Conference (ISSCC) 10–14 (IEEE, 2014).

  5. Yu, E., K, G. K., Saxena, U. & Roy, K. Ferroelectric capacitors and field-effect transistors as in-memory computing elements for machine learning workloads. Sci. Rep. 14, 9426 (2024).

  6. Luo, Y.-C. et al. Experimental demonstration of non-volatile capacitive crossbar array for in-memory computing. In Proc. IEEE International Electron Devices Meeting (IEDM) 21.4.1–21.4.4 (IEEE, 2021).

  7. Slesazeck, S. et al. A 2TnC ferroelectric memory gain cell suitable for compute-in-memory and neuromorphic application. In Proc. IEEE International Electron Devices Meeting (IEDM) 38.6.1–38.6.4 (IEEE, 2019).

  8. Chen, W.-H. et al. CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors. Nat. Electron. 2, 420–428 (2019).

  9. Cai, F. et al. A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations. Nat. Electron. 2, 290–299 (2019).

  10. Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).

  11. Lin, P. et al. Three-dimensional memristor circuits as complex neural networks. Nat. Electron. 3, 225–232 (2020).

  12. Hung, J.-M. et al. A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices. Nat. Electron. 4, 921–930 (2021).

  13. Xue, C.-X. et al. A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices. Nat. Electron. 4, 81–90 (2021).

  14. Huo, Q. et al. A computing-in-memory macro based on three-dimensional resistive random-access memory. Nat. Electron. 5, 469–477 (2022).

  15. Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512 (2022).

  16. Wen, T.-H. et al. Fusion of memristor and digital compute-in-memory processing for energy-efficient edge computing. Science 384, 325–332 (2024).

  17. Cai, H. et al. A 28 nm 2 Mb STT-MRAM computing-in-memory macro with a refined bit-cell and 22.4–41.5 TOPS/W for AI inference. In Proc. IEEE International Solid-State Circuits Conference (ISSCC) 500–502 (IEEE, 2023).

  18. Jung, S. et al. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 601, 211–216 (2022).

  19. Xie, W. et al. A 709.3 TOPS/W event-driven smart vision SoC with high-linearity and reconfigurable MRAM PIM. In Proc. IEEE Symposium on VLSI Technology 1–2 (IEEE, 2023).

  20. Deaville, P., Zhang, B. & Verma, N. A 22 nm 128-kb MRAM row/column-parallel in-memory computing macro with memory-resistance boosting and multi-column ADC readout. In Proc. IEEE Symposium on VLSI Technology 268–269 (IEEE, 2022).

  21. Joshi, V. et al. Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 11, 2473 (2020).

  22. Le Gallo, M. et al. A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference. Nat. Electron. 6, 680–693 (2023).

  23. Khaddam-Aljameh, R. et al. HERMES-core—a 1.59-TOPS/mm2 PCM on 14-nm CMOS in-memory compute core using 300-ps/LSB linearized CCO-based ADCs. IEEE J. Solid-State Circuits 57, 1027–1038 (2022).

  24. Narayanan, P. et al. Fully on-chip MAC at 14 nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration-format. IEEE Trans. Electron Devices 68, 6629–6636 (2021).

  25. Khwa, W.-S. et al. A 40-nm, 2M-cell, 8b-precision, hybrid SLC-MLC PCM computing-in-memory macro with 20.5–65.0 TOPS/W for tiny-AI edge devices. In Proc. IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).

  26. Sun, Z. et al. A full spectrum of computing-in-memory technologies. Nat. Electron. 6, 823–835 (2023).

  27. Kim, H., Yoo, T., Kim, T. T.-H. & Kim, B. Colonnade: a reconfigurable SRAM-based digital bit-serial compute-in-memory macro for processing neural networks. IEEE J. Solid-State Circuits 56, 2221–2233 (2021).

  28. Murmann, B. Mixed-signal computing for deep neural network inference. IEEE Trans. Very Large Scale Integr. VLSI Syst. 29, 3–13 (2021).

  29. Murmann, B., Verhelst, M. & Manoli, Y. Analog-to-information conversion. In NANO-CHIPS 2030: On-Chip AI for an Efficient Data-Driven World 275–292 (Springer International Publishing, 2020).

  30. Murmann, B. A/D converter trends: power dissipation, scaling and digitally assisted architectures. In Proc. IEEE Custom Integrated Circuits Conference (CICC) 105–112 (IEEE, 2008).

  31. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  32. Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).

  33. Chih, Y.-D. et al. An 89 TOPS/W and 16.3 TOPS/mm2 all-digital SRAM-based full-precision compute-in-memory macro in 22 nm for machine-learning edge applications. In Proc. IEEE International Solid-State Circuits Conference (ISSCC) 252–254 (IEEE, 2021).

  34. Mori, H. et al. A 4 nm 6163 TOPS/W/b 4790 TOPS/mm2/b SRAM-based digital-computing-in-memory macro supporting bit-width flexibility and simultaneous MAC and weight update. In Proc. IEEE International Solid-State Circuits Conference (ISSCC) 132–134 (IEEE, 2023).

  35. Fujiwara, H. et al. A 3 nm, 32.5 TOPS/W, 55.0 TOPS/mm2 and 3.78 Mb/mm2 fully-digital compute-in-memory macro supporting INT12 × INT12 with a parallel-MAC architecture and foundry 6T-SRAM bit cell. In Proc. IEEE International Solid-State Circuits Conference (ISSCC) 572–574 (IEEE, 2024).

  36. Shih, M.-E. et al. NVE: a 3 nm 23.2 TOPS/W 12b-digital-CIM-based neural engine for high-resolution visual-quality enhancement on smart devices. In Proc. IEEE International Solid-State Circuits Conference (ISSCC) 360–362 (IEEE, 2024).

  37. Wang, J. et al. A 22 nm 29.3 TOPS/W end-to-end CIM-utilization-aware accelerator with reconfigurable 4D-CIM mapping and adaptive feature reuse for diverse CNNs and transformers. In Proc. IEEE Custom Integrated Circuits Conference (CICC) 1–3 (IEEE, 2025).

  38. Lou, M. et al. Area-efficient and low-power 8T compute-SRAM bitcell design for digital compute-in-memory macros in 22 nm CMOS. IEEE Trans. Circuits Syst. II Express Briefs 72, 1459–1463 (2025).

  39. Lu, A. et al. High-speed emerging memories for AI hardware accelerators. Nat. Rev. Electr. Eng. 1, 24–34 (2024).

  40. Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544 (2020).

  41. Chiu, Y.-C. et al. A CMOS-integrated spintronic compute-in-memory macro for secure AI edge devices. Nat. Electron. 6, 534–543 (2023).

  42. Yang, Z. et al. A novel computing-in-memory platform based on hybrid spintronic/CMOS memory. IEEE Trans. Electron Devices 69, 1698–1705 (2022).

  43. Sayadi, L., Amirany, A., Moaiyeri, M. H. & Timarchi, S. Balancing precision and efficiency: an approximate multiplier with built-in error compensation for error-resilient applications. J. Supercomput. 81, 109 (2025).

  44. Rezaei, M., Amirany, A., Moaiyeri, M. H. & Jafari, K. A reliable non-volatile in-memory computing associative memory based on spintronic neurons and synapses. Eng. Rep. 6, e12902 (2024).

  45. Angizi, S., He, Z., Chen, A. & Fan, D. Hybrid spin-CMOS polymorphic logic gate with application in in-memory computing. IEEE Trans. Magn. 56, 3400215 (2020).

  46. Tong, Z. et al. BSTCIM: a balanced symmetry ternary fully digital in-mram computing macro for energy efficiency neural network. IEEE Trans. Circuits Syst. Regul. Pap. 71, 6114–6127 (2024).

  47. Mazaheri, M. M., Amirany, A. & Moaiyeri, M. H. TPCSA-MRAM: ternary precharge sense amplifier-based MRAM. IEEE Access 12, 132817–132824 (2024).

  48. Rajaei, R. & Amirany, A. Nonvolatile low-cost approximate spintronic full adders for computing in memory architectures. IEEE Trans. Magn. 56, 3400308 (2020).

  49. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 (IEEE, 2016).

  50. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images (University of Toronto, 2009).

  51. Xu, S. et al. A practical approach to flow field reconstruction with sparse or incomplete data through physics informed neural network. Acta Mech. Sin. 39, 322302 (2023).

  52. Chandrakasan, A. P. & Brodersen, R. W. Minimizing power consumption in digital CMOS circuits. Proc. IEEE 83, 498–523 (1995).

  53. Natarajarathinam, A., Zhu, R., Visscher, P. B. & Gupta, S. Perpendicular magnetic tunnel junctions based on thin CoFeB free layer and Co-based multilayer synthetic antiferromagnet pinned layers. J. Appl. Phys. 111, 07C918 (2012).

  54. Song, J., Dixit, H., Behin-Aein, B., Kim, C. H. & Taylor, W. Impact of process variability on write error rate and read disturbance in STT-MRAM devices. IEEE Trans. Magn. 56, 3400411 (2020).

  55. Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

  56. Deng, L. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process Mag. 29, 141–142 (2012).

  57. Yang, J. et al. TIMAQ: a time-domain computing-in-memory-based processor using predictable decomposed convolution for arbitrary quantized DNNs. IEEE J. Solid-State Circuits 56, 3021–3038 (2021).

  58. Jain, S., Lin, L. & Alioto, M. ±CIM SRAM for signed in-memory broad-purpose computing from DSP to neural processing. IEEE J. Solid-State Circuits 56, 2981–2992 (2021).

  59. Yoshioka, K. A 818–4,094 TOPS/W capacitor-reconfigured CIM macro for unified acceleration of CNNs and transformers. In Proc. IEEE International Solid-State Circuits Conference (ISSCC) 574–576 (IEEE, 2024).

Acknowledgements

This work was supported by the National Key R&D Program of China (grant number 2022YFB4400200 to T.M.), the National Natural Science Foundation of China (grant numbers 62274081 to L.L. and 12327806 to Z.C.), Zhujiang Young Talent Program (grant number 2023QN10X177 to L.L.) and Shenzhen Stable Support Plan Program for Higher Education Institutions Research Program (grant number 20231121110457002 to L.L.). We acknowledge the SUSTech SME-Pixelcore Neuromorphic In-sensor Computing Joint Laboratory and the SUSTech SME-CIMCube Joint Laboratory for experimental support in this work.

Author information

Authors and Affiliations

Contributions

L.L. and T.M. conceived and supervised the project. H. Li, R.P. and L.L. designed the circuits for the nvDCIM macro and test chip. H. Li, W.D. and J.H. performed the training, quantization and inference of the NNs and implemented the toggle-rate-aware training algorithm. H. Li, Z.C., S.L., Z.K., X.Y., X.W., Z.Y., H. Lyu, H.Y. and X.Z. performed the experiments, including device characterization, building the chip-testing platform and chip testing. H. Li, Z.C., J.L., F.Z., Y.L., Z.X., T.M. and L.L. analysed the data. H. Li, Z.C., T.M. and L.L. wrote the paper. All authors reviewed and approved the paper.

Corresponding authors

Correspondence to Tai Min or Longyang Lin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Electronics thanks Abdolah Amirany, Esteban Garzón and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 STT-MRAM device.

(a) Vertical structure of MTJ. RL: reference layer, which has a fixed magnetization. FL: free layer, whose magnetization can switch between parallel (P) and antiparallel (AP) orientations relative to the reference layer. HL: hard layer, which possesses strong perpendicular magnetic anisotropy (PMA). (b) Measured I-V curve of the MTJ’s magnetic resistive switching. The SEM image shows the MTJ’s critical diameter of 78 nm. (c) Measured R-V curve of the MTJ’s magnetic resistive switching, showing a clear resistance change between high-resistance (anti-parallel, AP) and low-resistance (parallel, P) states. The TMR ratio is approximately 170% at 0.1 V. TMR: Tunnel Magneto-Resistance, TMR = (RAP – RP) / RP * 100%. (d) Measured distribution of state switching voltages (VAP→P, VP→AP), with a mean VAP→P of 0.461 V (standard deviation σ = 0.029 V, CV = 6.3%) and a mean VP→AP of –0.299 V (σ = 0.020 V, CV = 6.7%). (e) Resistance distributions of the RP and RAP, where the mean RAP is 9199.2 Ω (σ = 480.7 Ω, CV = 5.2%) and the mean RP is 3363.7 Ω (σ = 171.8 Ω, CV = 5.1%). (f) Measured TMR distribution, with a mean of 173.5% (σ = 3.9%, CV = 2.2%). CV: Coefficient of Variation, CV = standard deviation (σ) / mean (μ) * 100%.
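
The TMR and CV figures quoted in this caption can be cross-checked directly from the reported means and standard deviations. A minimal sketch, using only the values stated above:

```python
# Cross-check of the TMR and CV values reported for the MTJ above.
R_AP = 9199.2        # mean antiparallel-state resistance (ohm)
R_P = 3363.7         # mean parallel-state resistance (ohm)
SIGMA_R_AP = 480.7   # standard deviation of R_AP (ohm)

def tmr_percent(r_ap, r_p):
    """Tunnel magnetoresistance ratio: TMR = (R_AP - R_P) / R_P * 100%."""
    return (r_ap - r_p) / r_p * 100.0

def cv_percent(sigma, mean):
    """Coefficient of variation: CV = sigma / mean * 100%."""
    return sigma / mean * 100.0

print(round(tmr_percent(R_AP, R_P), 1))        # 173.5, matching the measured mean TMR
print(round(cv_percent(SIGMA_R_AP, R_AP), 1))  # 5.2, the reported CV of R_AP
```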

Extended Data Fig. 2 MVM timing diagram, test chip architecture, and test flow chart.

(a) The nvDCIM test chip architecture featuring the nvDCIM macro, integrated on-chip buffers, a clock generator, and SPI interfaces for data transfer. (b) nvDCIM chip test flow chart. (c) MVM timing diagram illustrating bit-serial processing for two examples: 4-bit unsigned weight with 4-bit signed input and 8-bit signed weight with 4-bit unsigned input.
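
The bit-serial processing illustrated in (c) can be sketched in software. This is an illustrative decomposition, not the macro's circuit: only the input is serialized into bit planes (weights are applied at full precision for brevity), with the most significant plane carrying a negative weight so that two's-complement inputs are handled losslessly:

```python
import numpy as np

def bitserial_mvm(W, x_enc, x_bits=4):
    """MVM with a signed bit-serial input: one input bit plane per cycle;
    the MSB plane is weighted by -2^(x_bits - 1) (two's complement)."""
    acc = np.zeros(W.shape[0], dtype=np.int64)
    for b in range(x_bits):
        plane = (x_enc >> b) & 1                 # current input bit plane
        sign = -1 if b == x_bits - 1 else 1      # MSB carries negative weight
        acc += sign * (1 << b) * (W @ plane)
    return acc

rng = np.random.default_rng(0)
W = rng.integers(-128, 128, size=(4, 16))        # signed 8-bit weights
x = rng.integers(-8, 8, size=16)                 # signed 4-bit inputs
x_enc = x & 0xF                                  # 4-bit two's-complement encoding
assert np.array_equal(bitserial_mvm(W, x_enc), W @ x)  # lossless vs. direct MVM
```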

Extended Data Fig. 3 Experimental and measurement platform for evaluating the nvDCIM chips.

(a) The experimental platform consists of an nvDCIM test chip, a PCB test board, and a National Instruments (NI) PXIe system, including the PXIe-6570 and PXIe-8881 modules, which handle chip control, intermediate data processing, and result visualization. Additionally, two source/measurement units (SMUs) are included in the platform for power measurements. (b) A flowchart illustrating the inference process conducted on the experimental platform. During the execution, the 64-kb nvDCIM macro performs parallel and lossless MVM operations across 4-, 8-, 12-, and 16-bit precisions for convolutional and fully connected layers. Input vectors and matrix data are supplied to the nvDCIM macro via the NI PXIe-6570 controlled through LabVIEW, which also retrieves the MVM results. Beyond this, the PXIe-6570 implements ReLU activations, pooling, Tanh activations, and batch normalization. The PXIe-8881 processes and displays the final inference results. The system supports a range of computational tasks, such as low computational precision tasks (for example, image classification with CNNs) and high computational precision tasks (for example, flow field reconstruction using PINNs).

Extended Data Fig. 4 Power and area breakdown.

(a) Area breakdown of the main macro components. (b) Power consumption breakdown of the nvDCIM macro components measured during the inference of the ResNet-20 model on the CIFAR-10 dataset.

Extended Data Fig. 5 Measured shmoo plot, energy efficiency, and test results across 24 nvDCIM chips.

(a) Measured shmoo plot of the nvDCIM macro showing the relationship between supply voltage (VDD) and maximum clock frequency (fCLK) while operating in 4-bit-input, 4-bit-weight, and 16-bit-output mode (F, fail; P, pass). (b) Measured energy efficiency of the nvDCIM macro versus VDD in 4-bit-input, 4-bit-weight, and 16-bit-output mode, when the weight sparsity is 50% and the input toggle rate ranges from 50% to 6.25%. (c) Wafer map showing 12 selected shots (highlighted in blue), with the Z-pattern sampling. (d) Photograph of the fabricated 12-inch wafer, showing the positions of the selected shots, corresponding to the Z-pattern used in (c). (e) Measured throughput (TOPS) distribution at VDD = 1.20 V, with a mean of 4.44 TOPS (standard deviation σ = 0.10 TOPS, CV = 2.3%). (f) Measured throughput (TOPS) distribution at VDD = 0.65 V, with a mean of 0.64 TOPS (σ = 0.03 TOPS, CV = 4.7%). (g) Measured energy efficiency (TOPS/W) distribution at VDD = 1.20 V, with a mean of 40.1 TOPS/W (σ = 1.97 TOPS/W, CV = 4.9%). (h) Measured energy efficiency (TOPS/W) distribution at VDD = 0.65 V, with a mean of 86.0 TOPS/W (σ = 8.82 TOPS/W, CV = 10.3%). CV: Coefficient of Variation, CV = standard deviation (σ) / mean (μ) * 100%.

Extended Data Fig. 6 PINN model quantization and performance.

Flow field reconstruction with PINN: a comparison of predictions for u (streamwise velocity), v (spanwise velocity) and p (pressure) at varying computational precision levels, benchmarked against computational fluid dynamics (CFD) data (ref. 32). (a) CFD benchmark data. (b) Predictions from the FP32 PINN model. (c) Predictions from the INT16 PINN model. (d) Predictions from the INT12 PINN model. (e) Predictions from the INT8 PINN model. (f) Predictions from the INT4 PINN model. (g) Relative L2 norm (RL2): temporal RL2 for u with different bit precisions and the overall RL2 for u. (h) Temporal RL2 for v with different bit precisions and the overall RL2 for v. (i) Temporal RL2 for p with different bit precisions and the overall RL2 for p.

Extended Data Fig. 7 Energy model construction.

In neural network workloads, convolution layers are mapped onto the nvDCIM hardware to perform efficient MVM operation, with feature maps and kernels assigned to input drivers and memory banks. Dynamic energy consumption is calculated by aggregating the energy costs of IBMD-bitcells, input drivers, and adders based on precomputed energy look-up tables. This model enables an evaluation of both model accuracy and energy consumption across convolutional and fully connected layers.
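
The look-up-table aggregation described above can be sketched as follows. The component names mirror the text, but the per-operation energies are placeholder values, not the paper's calibrated tables:

```python
# Illustrative LUT-based dynamic-energy model. Per-operation energies (fJ)
# are assumed placeholder values, not measured figures from the paper.
ENERGY_LUT_FJ = {
    "ibmd_bitcell": 0.5,   # one bitcell multiply-and-digitize
    "input_driver": 1.2,   # one input-line toggle
    "adder": 2.0,          # one addition in the adder tree
}

def layer_energy_fj(n_bitcell_ops, n_driver_toggles, n_adder_ops,
                    lut=ENERGY_LUT_FJ):
    """Dynamic energy of one mapped layer: sum of count x per-op energy."""
    return (n_bitcell_ops * lut["ibmd_bitcell"]
            + n_driver_toggles * lut["input_driver"]
            + n_adder_ops * lut["adder"])

# Counts would be derived from a conv layer's MVM workload after mapping.
print(layer_energy_fj(64_000, 8_000, 16_000))  # 73600.0
```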

Extended Data Fig. 8 Toggle-rate-aware training results.

(a) Relationship between input toggle rate and accuracy for LeNet-5 INT4 model (dataset: MNIST). Increasing the regularization factor λ reduces the toggle rate while slightly impacting accuracy, demonstrating a tradeoff. (b) Energy efficiency improvement for LeNet-5. Reduction in toggle rate leads to notable energy efficiency gains, as illustrated by energy model estimations and chip-level measurements. (c) Relationship between input toggle rate and accuracy for ResNet-20 INT8 model (dataset: CIFAR-100). Similar to LeNet-5, increasing λ reduces the toggle rate with minimal accuracy degradation. (d) Energy efficiency improvement for ResNet-20 (dataset: CIFAR-100). Larger λ values result in more energy efficiency improvements.
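
The λ tradeoff shown in (a) and (c) corresponds to adding a toggle-rate penalty to the task loss during training. A minimal sketch of such a regularizer (my own formulation for illustration; the paper's exact definition may differ):

```python
import numpy as np

def toggle_rate(bit_planes):
    """Fraction of input bits that flip between consecutive bit planes."""
    flips = np.abs(np.diff(bit_planes, axis=0))
    return float(flips.mean())

def total_loss(task_loss, bit_planes, lam):
    """Task loss plus a lambda-weighted toggle-rate penalty; larger lam
    pushes training towards low-toggle input encodings."""
    return task_loss + lam * toggle_rate(bit_planes)

planes = np.array([[0, 1, 1, 0],
                   [1, 1, 0, 0],
                   [1, 0, 0, 0]])  # three consecutive bit planes on 4 input lines
print(toggle_rate(planes))                 # 3 flips / 8 transitions = 0.375
print(total_loss(0.30, planes, lam=0.1))   # ~0.3375
```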

Extended Data Table 1 Chip summary
Extended Data Table 2 Comparison of nvDCIM with other nvCIM chips

Supplementary information

Supplementary Information

Supplementary Figs. 1–4, Notes 1–5, Table 1 and References.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Chai, Z., Dong, W. et al. A lossless and fully parallel spintronic compute-in-memory macro for artificial intelligence chips. Nat Electron (2025). https://doi.org/10.1038/s41928-025-01479-y

