A CMOS-integrated spintronic compute-in-memory macro for secure AI edge devices

Abstract

Artificial intelligence edge devices should offer high inference accuracy and rapid response times, as well as high energy efficiency. Ensuring the security of such devices against malicious attacks and illegal access requires data-protection mechanisms and secure access control. Here we report a spintronic non-volatile compute-in-memory (nvCIM) macro for efficient dot-product edge computing with secure access control for activation, key and data protection against both power-on and power-off probing. The approach relies on spintronic physically unclonable functions (PUFs) and two-dimensional half-complement physical encryption, together with a snoop-proof self-decryption burst-read scheme and a sparsity-and-rectified-linear-unit-aware early-termination compute-in-memory engine. The 6.6 megabit complementary metal–oxide–semiconductor (CMOS)-integrated macro is fabricated in 22 nm spin-transfer torque magnetic random-access memory (STT-MRAM) technology. It achieves high randomness (inter-Hamming distance of 0.4999) and high reliability (intra-Hamming distance of 0) for the physically unclonable function, as well as high energy efficiency for dot-product computation (30.1 to 68.0 tera-operations per second per watt).
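The two PUF figures of merit quoted above can be made concrete with a minimal sketch (this is illustrative, not the paper's measurement code; the 8-bit IDs below are hypothetical, whereas the actual macro produces far longer PUF responses). Inter-Hamming distance compares IDs from different chips and is ideally 0.5; intra-Hamming distance compares repeated reads of the same chip and is ideally 0.

```python
# Illustrative sketch of the two PUF quality metrics:
# - inter-HD: fractional Hamming distance between IDs of *different*
#   chips; ideal value 0.5 (the macro reports 0.4999).
# - intra-HD: fractional Hamming distance between repeated reads of
#   the *same* chip; ideal value 0 (the macro reports 0).

def frac_hamming(a: list[int], b: list[int]) -> float:
    """Fractional Hamming distance between two equal-length bit vectors."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical 8-bit chip IDs for illustration only.
chip_a_read1 = [1, 0, 1, 1, 0, 0, 1, 0]
chip_a_read2 = [1, 0, 1, 1, 0, 0, 1, 0]   # same chip, re-read
chip_b_read1 = [0, 1, 1, 0, 0, 1, 1, 1]   # a different chip

intra_hd = frac_hamming(chip_a_read1, chip_a_read2)  # 0.0 -> reliable
inter_hd = frac_hamming(chip_a_read1, chip_b_read1)  # near 0.5 -> random
print(intra_hd, inter_hd)  # 0.0 0.625
```

In practice both metrics are averaged over many chips, challenges and read repetitions; the near-ideal values reported in the abstract indicate that the spintronic entropy source is both stable and statistically unbiased.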


Fig. 1: Overview of the 6.6 Mb STT-MRAM nvCIM macro.
Fig. 2: SRR-PUF.
Fig. 3: 2DHC-PE and SP-SDBR schemes.
Fig. 4: SRET-CIME.


Data availability

The data that support the plots presented in this paper as well as other findings derived from this study are available from the corresponding author upon reasonable request.

Code availability

The code that supports the experiment platforms is available from the corresponding author upon reasonable request.


Acknowledgements

We express our appreciation for support from NTHU, TSMC and NSTC, Taiwan.

Author information

Authors and Affiliations

Authors

Contributions

Y.-C.C. designed the circuits for the nvCIM macro and test chip. Y.-C.C., W.-S.K., C.-S.Y., S.-H.T., H.-Y.H., F.-C.C., Y.W., Y.-A.C., F.-L.H., C.-Y.L., G.-Y.L., P.-J.C., T.-H.P., C.-C.L., R.-S.L., C.-C.H., K.-T.T., M.-S.H. and M.-F.C. contributed ideas. C.-S.Y., S.-H.T., H.-Y.H., F.-C.C., Y.W., Y.-A.C., F.-L.H., C.-Y.L., G.-Y.L., P.-J.C. and T.-H.P. built the test measurement system and testing flow for the STT-MRAM nvCIM macro. F.-C.C., Y.W., G.-Y.L. and C.-Y.L. conducted the security experiment and AI demonstration system. Y.-C.C. performed the analysis and measurements of the nvCIM macro. C.-P.L., Y.-D.C., T.-Y.J.C. and M.-F.C. managed the project. Y.-C.C., M.-S.H. and M.-F.C. wrote the paper.

Corresponding author

Correspondence to Meng-Fan Chang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Electronics thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Experiment platform used to assess AI inference operations of the nvCIM macro with the ResNet-20 model.

a, Experiment platform comprising the nvCIM test chip, an FPGA board acting as system controller with an intermediate data processor, and a PC to display classification results on an LCD screen. The nvCIM test chip avoided the drop in inference accuracy relative to software-based inference across the SVHN, CIFAR-10, BraTS and CIFAR-100 datasets. b, Flow chart showing the inference process implemented on the experiment platform. The nvCIM macro performed full-channel dot-product operations (convolutions), while the FPGA fed the input to the nvCIM macro, collected the full-channel dot-product values (Dot-PVFC) from the nvCIM macro, performed pooling operations and executed the first-layer convolution operation. The PC presented intermediate data generated during inference and the final inference results on an LCD screen.
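The sparsity-and-ReLU-aware early-termination idea behind the macro's dot-product engine can be illustrated with a small sketch. The bit-serial formulation below is an assumption for illustration and does not reproduce the circuit-level SRET-CIME: input bit planes are processed MSB-first, all-zero planes are skipped (sparsity), and the computation stops as soon as the running sum can no longer become positive, because ReLU would clamp the result to zero anyway.

```python
# Illustrative sketch (not the paper's circuit): ReLU-aware early
# termination for a multibit dot product with unsigned inputs and
# signed weights, processed bit-serially MSB-first.

def relu_dot(inputs: list[int], weights: list[int], bits: int = 4) -> int:
    """ReLU(dot(inputs, weights)) with MSB-first early termination."""
    pos_bound = sum(w for w in weights if w > 0)  # max positive contribution per unit input
    partial = 0
    for k in range(bits - 1, -1, -1):             # MSB-first bit planes
        plane = [(x >> k) & 1 for x in inputs]
        if any(plane):                            # sparsity: skip all-zero planes
            partial += (1 << k) * sum(w for b, w in zip(plane, weights) if b)
        remaining_max = ((1 << k) - 1) * pos_bound  # best case for unprocessed planes
        if partial + remaining_max <= 0:          # sum can never exceed 0
            return 0                              # terminate early; ReLU output is 0
    return max(partial, 0)                        # final ReLU

print(relu_dot([3, 0, 5], [-2, 4, -1], bits=3))  # terminates early -> 0
print(relu_dot([2, 3, 1], [1, 2, -1], bits=2))   # runs to completion -> 7
```

The energy saving comes from the skipped bit planes and aborted accumulations: activations feeding a ReLU are frequently negative, so many dot products never need to be computed to full precision.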

Extended Data Fig. 2 Experiment involving chip-ID identification using the nvCIM macro.

a, This experiment was implemented using an nvCIM test chip, an FPGA as system controller and a PC to display chip-ID comparison results. b, Flow chart of the chip-ID identification operation. The 6.6 Mb nvCIM macro generated a chip-ID, while the FPGA fed the challenge to the nvCIM macro and collected the chip-ID. The nvCIM macro then verified the chip-ID, while the FPGA supplied the challenge and the external chip-ID (chip-ID_E) and stored the comparison results. The PC served as the command interface, presenting the comparison results and chip-IDs on the LCD screen.
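The challenge-response identification flow above can be sketched at a high level. Note that the PUF model here is purely hypothetical: a keyed hash stands in for the physical entropy source, whereas in the real macro the per-chip response arises from uncontrollable device variation and cannot be expressed as a function of a stored secret.

```python
# Illustrative sketch of chip-ID identification: the host sends a
# challenge, the chip returns its PUF-derived ID, and identification
# compares it against an enrolled external chip-ID (chip-ID_E).
import hashlib

def puf_response(chip_secret: bytes, challenge: bytes) -> bytes:
    """Stand-in for a physical PUF: deterministic per chip, challenge-dependent."""
    return hashlib.sha256(chip_secret + challenge).digest()[:16]

def identify(chip_secret: bytes, challenge: bytes, chip_id_e: bytes) -> bool:
    """True if the chip's response matches the enrolled chip-ID_E."""
    return puf_response(chip_secret, challenge) == chip_id_e

challenge = b"\x01\x02\x03\x04"
enrolled = puf_response(b"chip-A-secret", challenge)    # enrollment phase
print(identify(b"chip-A-secret", challenge, enrolled))  # True: same chip
print(identify(b"chip-B-secret", challenge, enrolled))  # False: a different chip
```

Because a genuine PUF response is never stored as digital data, there is no key to extract from a powered-off device, which is what makes the scheme robust against power-off probing.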

Extended Data Fig. 3 Experiment on accessing secured data using the nvCIM macro.

a, Fabricated 22 nm 6.6 Mb nvCIM test-chip used with an FPGA as system controller and PC to display the comparison results of accessing the secured data on an LCD screen. b, Flow chart showing the process by which secured data was accessed, wherein the FPGA fed the challenge, write command, and owner-KEY to the nvCIM macro, which subsequently performed a secure data-write operation. The data-read operation involved the FPGA feeding the challenge, read command, and user-KEY to the nvCIM macro to perform the secure data-read operation. Finally, the FPGA passed output data from the nvCIM macro to the PC, which compared decrypted data from the nvCIM macro with target plaintext data, the results of which were displayed on an LCD screen.
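The owner-KEY/user-KEY access pattern described above can be sketched with a simple keystream cipher. This is a deliberate simplification: the XOR keystream below is a stand-in for, and does not reproduce, the macro's two-dimensional half-complement physical encryption; the point is only that the array stores ciphertext, so a read with a non-matching user-KEY cannot recover the plaintext.

```python
# Illustrative sketch of secure data access: writes encrypt under the
# owner-KEY before data reaches the array; reads decrypt under the
# supplied user-KEY, so only a matching key yields plaintext.
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive an n-byte keystream from a key (illustrative, via SHA-256)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def secure_write(plaintext: bytes, owner_key: bytes) -> bytes:
    """Encrypt before storing: the array only ever holds ciphertext."""
    ks = keystream(owner_key, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

def secure_read(stored: bytes, user_key: bytes) -> bytes:
    """Decrypt on read: a wrong user-KEY yields garbage, not plaintext."""
    ks = keystream(user_key, len(stored))
    return bytes(c ^ k for c, k in zip(stored, ks))

stored = secure_write(b"weights", b"owner-KEY")
print(secure_read(stored, b"owner-KEY"))                # b'weights'
print(secure_read(stored, b"wrong-KEY") == b"weights")  # False
```

Storing only ciphertext in the non-volatile array is what protects the weights against power-off probing: physically reading out the MRAM cells yields encrypted data.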

Supplementary information

Supplementary Information (PDF)

Supplementary Figs. 1–6 and Notes 1–4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Chiu, YC., Khwa, WS., Yang, CS. et al. A CMOS-integrated spintronic compute-in-memory macro for secure AI edge devices. Nat Electron 6, 534–543 (2023). https://doi.org/10.1038/s41928-023-00994-0

