A CMOS-integrated spintronic compute-in-memory macro for secure AI edge devices

Abstract

Artificial intelligence edge devices should offer high inference accuracy and rapid response times, as well as high energy efficiency. Ensuring the security of such devices against malicious attacks and illegal access requires data-protection mechanisms and secure access control. Here we report a spintronic non-volatile compute-in-memory (nvCIM) macro for efficient dot-product edge computing with secure access control for activation, key and data protection against both power-on and power-off probing. The approach relies on spintronic physically unclonable functions (PUFs) and two-dimensional half-complement physical encryption, together with a snoop-proof self-decryption burst-read scheme and a sparsity-and-rectified-linear-unit-aware early-termination compute-in-memory engine. The 6.6 megabit complementary metal–oxide–semiconductor (CMOS)-integrated macro is fabricated in 22 nm spin-transfer torque magnetic random-access memory (STT-MRAM) technology. It achieves high randomness (inter-Hamming distance of 0.4999) and high reliability (intra-Hamming distance of 0) for the physically unclonable function, as well as high energy efficiency for dot-product computation (30.1 to 68.0 tera-operations per second per watt).
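The two PUF figures of merit quoted above can be made concrete with a minimal sketch (this is illustrative, not the paper's measurement code; the 8-bit IDs below are hypothetical, whereas the actual macro produces far longer PUF responses). Inter-Hamming distance compares IDs from different chips and is ideally 0.5; intra-Hamming distance compares repeated reads of the same chip and is ideally 0.

```python
# Illustrative sketch of the two PUF quality metrics:
# - inter-HD: fractional Hamming distance between IDs of *different*
#   chips; ideal value 0.5 (the macro reports 0.4999).
# - intra-HD: fractional Hamming distance between repeated reads of
#   the *same* chip; ideal value 0 (the macro reports 0).

def frac_hamming(a: list[int], b: list[int]) -> float:
    """Fractional Hamming distance between two equal-length bit vectors."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical 8-bit chip IDs for illustration only.
chip_a_read1 = [1, 0, 1, 1, 0, 0, 1, 0]
chip_a_read2 = [1, 0, 1, 1, 0, 0, 1, 0]   # same chip, re-read
chip_b_read1 = [0, 1, 1, 0, 0, 1, 1, 1]   # a different chip

intra_hd = frac_hamming(chip_a_read1, chip_a_read2)  # 0.0 -> reliable
inter_hd = frac_hamming(chip_a_read1, chip_b_read1)  # near 0.5 -> random
print(intra_hd, inter_hd)  # 0.0 0.625
```

In practice both metrics are averaged over many chips, challenges and read repetitions; the near-ideal values reported in the abstract indicate that the spintronic entropy source is both stable and statistically unbiased.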


Fig. 1: Overview of the 6.6 Mb STT-MRAM nvCIM macro.
Fig. 2: SRR-PUF.
Fig. 3: 2DHC-PE and SP-SDBR schemes.
Fig. 4: SRET-CIME.


Data availability

The data that support the plots presented in this paper as well as other findings derived from this study are available from the corresponding author upon reasonable request.

Code availability

The code that supports the experiment platforms is available from the corresponding author upon reasonable request.


Acknowledgements

We express our appreciation for support from NTHU, TSMC and NSTC, Taiwan.

Author information

Authors and Affiliations

Authors

Contributions

Y.-C.C. designed the circuits for the nvCIM macro and test chip. Y.-C.C., W.-S.K., C.-S.Y., S.-H.T., H.-Y.H., F.-C.C., Y.W., Y.-A.C., F.-L.H., C.-Y.L., G.-Y.L., P.-J.C., T.-H.P., C.-C.L., R.-S.L., C.-C.H., K.-T.T., M.-S.H. and M.-F.C. contributed ideas. C.-S.Y., S.-H.T., H.-Y.H., F.-C.C., Y.W., Y.-A.C., F.-L.H., C.-Y.L., G.-Y.L., P.-J.C. and T.-H.P. built the test measurement system and testing flow for the STT-MRAM nvCIM macro. F.-C.C., Y.W., G.-Y.L. and C.-Y.L. conducted the security experiment and AI demonstration system. Y.-C.C. performed the analysis and measurements of the nvCIM macro. C.-P.L., Y.-D.C., T.-Y.J.C. and M.-F.C. managed the project. Y.-C.C., M.-S.H. and M.-F.C. wrote the paper.

Corresponding author

Correspondence to Meng-Fan Chang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Electronics thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Experiment platform used to assess AI inference operations of the nvCIM macro with the ResNet-20 model.

a, Experiment platform comprising the nvCIM test chip, an FPGA board acting as system controller with an intermediate data processor, and a PC to display classification results on an LCD screen. The nvCIM test chip avoided the drop in inference accuracy relative to software-based inference across the SVHN, CIFAR-10, BraTS and CIFAR-100 datasets. b, Flow chart showing the inference process implemented on the experiment platform. The nvCIM macro performed full-channel dot-product operations (convolutions), while the FPGA fed the input to the nvCIM macro, collected the full-channel dot-product values (Dot-PVFC) from the nvCIM macro, performed pooling operations and executed the first-layer convolution operation. The PC presented intermediate data generated during inference and the final inference results on an LCD screen.
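The sparsity-and-ReLU-aware early-termination idea behind the macro's dot-product engine can be illustrated with a small sketch. The bit-serial formulation below is an assumption for illustration and does not reproduce the circuit-level SRET-CIME: input bit planes are processed MSB-first, all-zero planes are skipped (sparsity), and the computation stops as soon as the running sum can no longer become positive, because ReLU would clamp the result to zero anyway.

```python
# Illustrative sketch (not the paper's circuit): ReLU-aware early
# termination for a multibit dot product with unsigned inputs and
# signed weights, processed bit-serially MSB-first.

def relu_dot(inputs: list[int], weights: list[int], bits: int = 4) -> int:
    """ReLU(dot(inputs, weights)) with MSB-first early termination."""
    pos_bound = sum(w for w in weights if w > 0)  # max positive contribution per unit input
    partial = 0
    for k in range(bits - 1, -1, -1):             # MSB-first bit planes
        plane = [(x >> k) & 1 for x in inputs]
        if any(plane):                            # sparsity: skip all-zero planes
            partial += (1 << k) * sum(w for b, w in zip(plane, weights) if b)
        remaining_max = ((1 << k) - 1) * pos_bound  # best case for unprocessed planes
        if partial + remaining_max <= 0:          # sum can never exceed 0
            return 0                              # terminate early; ReLU output is 0
    return max(partial, 0)                        # final ReLU

print(relu_dot([3, 0, 5], [-2, 4, -1], bits=3))  # terminates early -> 0
print(relu_dot([2, 3, 1], [1, 2, -1], bits=2))   # runs to completion -> 7
```

The energy saving comes from the skipped bit planes and aborted accumulations: activations feeding a ReLU are frequently negative, so many dot products never need to be computed to full precision.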

Extended Data Fig. 2 Experiment involving chip-ID identification using the nvCIM macro.

a, This experiment was implemented using an nvCIM test chip, an FPGA as system controller and a PC to display chip-ID comparison results. b, Flow chart of the chip-ID identification operation. The 6.6 Mb nvCIM macro generated a chip-ID, while the FPGA fed the challenge to the nvCIM macro and collected the chip-ID. The nvCIM macro then verified the chip-ID, while the FPGA supplied the challenge and the external chip-ID (chip-ID_E) and stored the comparison results. The PC served as the command interface, presenting the comparison results and chip-IDs on the LCD screen.
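The challenge-response identification flow above can be sketched at a high level. Note that the PUF model here is purely hypothetical: a keyed hash stands in for the physical entropy source, whereas in the real macro the per-chip response arises from uncontrollable device variation and cannot be expressed as a function of a stored secret.

```python
# Illustrative sketch of chip-ID identification: the host sends a
# challenge, the chip returns its PUF-derived ID, and identification
# compares it against an enrolled external chip-ID (chip-ID_E).
import hashlib

def puf_response(chip_secret: bytes, challenge: bytes) -> bytes:
    """Stand-in for a physical PUF: deterministic per chip, challenge-dependent."""
    return hashlib.sha256(chip_secret + challenge).digest()[:16]

def identify(chip_secret: bytes, challenge: bytes, chip_id_e: bytes) -> bool:
    """True if the chip's response matches the enrolled chip-ID_E."""
    return puf_response(chip_secret, challenge) == chip_id_e

challenge = b"\x01\x02\x03\x04"
enrolled = puf_response(b"chip-A-secret", challenge)    # enrollment phase
print(identify(b"chip-A-secret", challenge, enrolled))  # True: same chip
print(identify(b"chip-B-secret", challenge, enrolled))  # False: a different chip
```

Because a genuine PUF response is never stored as digital data, there is no key to extract from a powered-off device, which is what makes the scheme robust against power-off probing.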

Extended Data Fig. 3 Experiment on accessing secured data using the nvCIM macro.

a, Fabricated 22 nm 6.6 Mb nvCIM test-chip used with an FPGA as system controller and PC to display the comparison results of accessing the secured data on an LCD screen. b, Flow chart showing the process by which secured data was accessed, wherein the FPGA fed the challenge, write command, and owner-KEY to the nvCIM macro, which subsequently performed a secure data-write operation. The data-read operation involved the FPGA feeding the challenge, read command, and user-KEY to the nvCIM macro to perform the secure data-read operation. Finally, the FPGA passed output data from the nvCIM macro to the PC, which compared decrypted data from the nvCIM macro with target plaintext data, the results of which were displayed on an LCD screen.
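The owner-KEY/user-KEY access pattern described above can be sketched with a simple keystream cipher. This is a deliberate simplification: the XOR keystream below is a stand-in for, and does not reproduce, the macro's two-dimensional half-complement physical encryption; the point is only that the array stores ciphertext, so a read with a non-matching user-KEY cannot recover the plaintext.

```python
# Illustrative sketch of secure data access: writes encrypt under the
# owner-KEY before data reaches the array; reads decrypt under the
# supplied user-KEY, so only a matching key yields plaintext.
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive an n-byte keystream from a key (illustrative, via SHA-256)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def secure_write(plaintext: bytes, owner_key: bytes) -> bytes:
    """Encrypt before storing: the array only ever holds ciphertext."""
    ks = keystream(owner_key, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

def secure_read(stored: bytes, user_key: bytes) -> bytes:
    """Decrypt on read: a wrong user-KEY yields garbage, not plaintext."""
    ks = keystream(user_key, len(stored))
    return bytes(c ^ k for c, k in zip(stored, ks))

stored = secure_write(b"weights", b"owner-KEY")
print(secure_read(stored, b"owner-KEY"))                # b'weights'
print(secure_read(stored, b"wrong-KEY") == b"weights")  # False
```

Storing only ciphertext in the non-volatile array is what protects the weights against power-off probing: physically reading out the MRAM cells yields encrypted data.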

Supplementary information

Supplementary Information (PDF)

Supplementary Figs. 1–6 and Notes 1–4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Chiu, YC., Khwa, WS., Yang, CS. et al. A CMOS-integrated spintronic compute-in-memory macro for secure AI edge devices. Nat Electron 6, 534–543 (2023). https://doi.org/10.1038/s41928-023-00994-0

