Abstract
Applications of artificial intelligence (AI) require hardware accelerators that can efficiently process data-intensive and computation-intensive AI workloads. AI accelerators need two types of memory: weight memory, which stores the parameters of the AI models, and buffer memory, which stores intermediate input or output data while a portion of an AI model is being computed. In this Review, we present recent progress in emerging high-speed memories for AI hardware accelerators and survey the technologies that enable the global buffer memory in digital systolic-array architectures. Beyond conventional static random-access memory (SRAM), we highlight the following device candidates: capacitorless gain-cell-based embedded dynamic random-access memory (eDRAM), ferroelectric memories, spin-transfer torque magnetic random-access memory (STT-MRAM) and spin-orbit torque magnetic random-access memory (SOT-MRAM). We then summarize advances in industrial development and the technological challenges for buffer memory applications. Finally, we present a systematic benchmarking analysis of a tensor processing unit (TPU)-like AI accelerator at the edge and in the cloud and evaluate the use of these emerging memories.
Key points
- The global buffer in artificial intelligence (AI) hardware (for example, the tensor processing unit (TPU)) is traditionally based on static random-access memory (SRAM), which is costly in silicon footprint and suffers from high stand-by leakage power. Emerging memories with high speed and high endurance could replace SRAM as global buffers.
- A capacitorless two-transistor (2T) gain cell, an implementation of embedded dynamic random-access memory (eDRAM), uses amorphous oxide semiconductors as the channel material, allowing a long data retention time.
- Ferroelectric memories such as the ferroelectric field-effect transistor (FeFET) and magnetic memories such as spin-transfer torque magnetic random-access memory (STT-MRAM) or spin-orbit torque magnetic random-access memory (SOT-MRAM) could be tailored to improve their cycling endurance, making them viable as global buffer candidates.
- Three-dimensional integration that stacks emerging memories and their access transistors together at the back end of line (BEOL) paves the way for high-density global buffer solutions that are even denser than leading-edge node SRAMs.
- Leading-edge node SRAM is still a competitive high-performance technology for AI hardware in the cloud, whereas emerging memories exhibit more advantages in AI hardware at the edge, where minimizing the stand-by leakage power is critical.
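The cloud-versus-edge point above can be illustrated with a toy model in which the average power of a buffer is its access rate times the energy per access, plus its stand-by leakage. The sketch below is not taken from the Review's benchmarking: the `BufferTech` type, both function names and all numbers are hypothetical, normalized placeholders, and it assumes (for illustration only) that the emerging memory leaks less but costs more energy per access than SRAM.

```python
"""Toy energy model for a global buffer (all numbers are hypothetical placeholders)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferTech:
    name: str
    access_energy: float   # energy per access, arbitrary units (assumed)
    leakage_power: float   # stand-by leakage, arbitrary units per unit time (assumed)


def average_power(tech: BufferTech, access_rate: float) -> float:
    """Average power = dynamic part (rate * energy per access) + stand-by leakage."""
    return access_rate * tech.access_energy + tech.leakage_power


def break_even_rate(sram: BufferTech, emerging: BufferTech) -> float:
    """Access rate below which the lower-leakage technology draws less average power.

    Only meaningful when `emerging` leaks less but costs more energy per access.
    """
    d_energy = emerging.access_energy - sram.access_energy
    d_leak = sram.leakage_power - emerging.leakage_power
    if d_energy <= 0 or d_leak <= 0:
        raise ValueError("no crossover: one technology dominates at every rate")
    return d_leak / d_energy


# Hypothetical, normalized values chosen only to make the crossover visible.
SRAM = BufferTech("SRAM", access_energy=1.0, leakage_power=1.0)
EMERGING = BufferTech("emerging", access_energy=2.0, leakage_power=0.1)

if __name__ == "__main__":
    scenarios = [("mostly idle (edge-like)", 0.1), ("heavily used (cloud-like)", 10.0)]
    for label, rate in scenarios:
        p_sram = average_power(SRAM, rate)
        p_new = average_power(EMERGING, rate)
        winner = "emerging" if p_new < p_sram else "SRAM"
        print(f"{label}: SRAM={p_sram:.2f}, emerging={p_new:.2f} -> lower power: {winner}")
```

Under these assumed values, the lower-leakage technology wins whenever the buffer is accessed less often than the break-even rate, which is the qualitative situation the key point describes for edge hardware; real comparisons depend on measured access energy, leakage, refresh and area figures for each technology.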
Acknowledgements
This work is supported in part by PRISM, one of the SRC/DARPA JUMP 2.0 Centers.
Author information
Contributions
S.Y., A.L., J.L. and T.-H.K. researched data for the article and contributed to discussion of content and writing. All authors reviewed and edited the manuscript before submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Electrical Engineering thanks Can Li, Zhefan Li, Giacomo Predetti and the other anonymous reviewer for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, A., Lee, J., Kim, TH. et al. High-speed emerging memories for AI hardware accelerators. Nat Rev Electr Eng 1, 24–34 (2024). https://doi.org/10.1038/s44287-023-00002-9
DOI: https://doi.org/10.1038/s44287-023-00002-9