Abstract
Artificial intelligence (AI) edge devices (refs. 1–12) demand high-precision, energy-efficient computation, large on-chip model storage, rapid wakeup-to-response times and cost-effective, foundry-ready solutions. Floating-point (FP) computation provides precision exceeding that of integer (INT) formats, at the cost of higher power and storage overhead. Multi-level-cell (MLC) memristor compute-in-memory (CIM) (refs. 13–15) provides compact non-volatile storage and energy-efficient computation but is prone to accuracy loss owing to process variation. Digital static random-access memory (SRAM) CIM (refs. 16–22) enables lossless computation; however, storage capacity is low as a result of the large bit-cell area, and the model must be loaded during inference. Conventional approaches based on homogeneous CIM architectures and computation formats therefore impose a trade-off between efficiency, storage, wakeup latency and inference accuracy. Here we present a mixed-precision heterogeneous CIM AI edge processor that supports layer-granular/kernel-granular partitioning of network layers among on-chip CIM architectures (that is, memristor-CIM, SRAM-CIM and tiny-digital units) and computation number formats (INT and FP) according to each layer's sensitivity to error. This layer-granular/kernel-granular flexibility allows simultaneous optimization within the two-dimensional design space at the hardware level. The proposed hardware achieved high energy efficiency (40.91 TFLOPS W−1 for ResNet-20 with CIFAR-100 and 28.63 TFLOPS W−1 for MobileNet-v2 with ImageNet), low accuracy degradation (<0.45% for ResNet-20 with CIFAR-100 and for MobileNet-v2 with ImageNet) and rapid wakeup-to-response time (373.52 μs).
Code availability
The code that supports the findings of this study is not publicly available, but exceptions for non-commercial use may be made on request, subject to TSMC management approval.
References
Prabhu, K. et al. CHIMERA: a 0.92-TOPS, 2.2-TOPS/W edge AI accelerator with 2-Mbyte on-chip foundry resistive RAM for efficient training and inference. IEEE J. Solid-State Circuits 57, 1013–1026 (2022).
Jain, V. et al. TinyVers: A 0.8-17 TOPS/W, 1.7 μW-20 mW, tiny versatile system-on-chip with state-retentive eMRAM for machine learning inference at the extreme edge. In Proc. 2022 IEEE Symposium on VLSI Technology and Circuits 20–21 (IEEE, 2022).
Rossi, D. et al. 4.4 A 1.3TOPS/W @ 32GOPS fully integrated 10-core SoC for IoT end-nodes with 1.7μW cognitive wake-up from MRAM-based state-retentive sleep mode. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 60–62 (IEEE, 2021).
Ueyoshi, K. et al. DIANA: an end-to-end energy-efficient digital and analog hybrid neural network SoC. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Yue, J. et al. A 28nm 16.9-300TOPS/W computing-in-memory processor supporting floating-point NN inference/training with intensive-CIM sparse-digital architecture. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2023).
Yue, J. et al. 15.2 A 2.75-to-75.9TOPS/W computing-in-memory NN processor supporting set-associate block-wise zero skipping and ping-pong CIM with simultaneous computation and weight updating. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 238–240 (IEEE, 2021).
Liu, S. et al. 16.2 A 28nm 53.8TOPS/W 8b sparse transformer accelerator with in-memory butterfly zero skipper for unstructured-pruned NN and CIM-based local-attention-reusable engine. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 250–252 (IEEE, 2023).
Tu, F. et al. 16.1 MuITCIM: a 28nm 2.24μJ/token attention-token-bit hybrid sparse digital CIM-based accelerator for multimodal transformers. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 248–250 (IEEE, 2023).
Huang, W.-H. et al. A nonvolatile AI-edge processor with 4MB SLC-MLC hybrid-mode ReRAM compute-in-memory macro and 51.4-251 TOPS/W. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 15–17 (IEEE, 2023).
Wen, T.-H. et al. A 28nm nonvolatile AI edge processor using 4Mb analog-based near-memory-compute ReRAM with 27.2 TOPS/W for tiny AI edge devices. In Proc. 2023 IEEE Symposium on VLSI Technology and Circuits 1–2 (IEEE, 2023).
Wen, T.-H. et al. Fusion of memristor and digital compute-in-memory processing for energy-efficient edge computing. Science 384, 325–332 (2024).
Lele, A. S. et al. A heterogeneous RRAM in-memory and SRAM near-memory SoC for fused frame and event-based target identification and tracking. IEEE J. Solid-State Circuits 59, 52–64 (2024).
Ambrogio, S. et al. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558, 60–67 (2018).
Khwa, W.-S. et al. A 40-nm, 2M-cell, 8b-precision, hybrid SLC-MLC PCM computing-in-memory macro with 20.5-65.0TOPS/W for tiny-AI edge devices. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).
Wu, P.-C. et al. A 22nm 832Kb hybrid-domain floating-point SRAM in-memory-compute macro with 16.2-70.2TFLOPS/W for high-accuracy AI-edge devices. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 126–128 (IEEE, 2023).
Guo, A. et al. A 28-nm 64-kb 31.6-TFLOPS/W digital-domain floating-point-computing-unit and double-bit 6T-SRAM computing-in-memory macro for floating-point CNNs. IEEE J. Solid-State Circuits 59, 3032–3044 (2024).
Sinangil, M. E. et al. A 7-nm compute-in-memory SRAM macro supporting multi-bit input, weight and output and achieving 351 TOPS/W and 372.4 GOPS. IEEE J. Solid-State Circuits 56, 188–198 (2021).
Mori, H. et al. A 4nm 6163-TOPS/W/b 4790-TOPS/mm2/b SRAM based digital-computing-in-memory macro supporting bit-width flexibility and simultaneous MAC and weight update. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 132–134 (IEEE, 2023).
Chih, Y.-D. et al. 16.4 An 89TOPS/W and 16.3TOPS/mm2 all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 252–254 (IEEE, 2021).
Fujiwara, H. et al. A 5-nm 254-TOPS/W 221-TOPS/mm2 fully-digital computing-in-memory macro supporting wide-range dynamic-voltage-frequency scaling and simultaneous MAC and write operations. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Lee, C.-F. et al. A 12nm 121-TOPS/W 41.6-TOPS/mm2 all digital full precision SRAM-based compute-in-memory with configurable bit-width for AI edge applications. In Proc. 2022 IEEE Symposium on VLSI Technology and Circuits 24–25 (IEEE, 2022).
Chiu, Y. C. et al. A CMOS-integrated spintronic compute-in-memory macro for secure AI edge devices. Nat. Electron. 6, 534–543 (2023).
Jung, S. et al. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 601, 211–216 (2022).
Wen, T.-H. et al. 34.8 A 22nm 16Mb floating-point ReRAM compute-in-memory macro with 31.2TFLOPS/W for AI edge devices. In Proc. 2024 IEEE International Solid-State Circuits Conference (ISSCC) 580–582 (IEEE, 2024).
Chang, M. et al. A 40nm 60.64TOPS/W ECC-capable compute-in-memory/digital 2.25MB/768KB RRAM/SRAM system with embedded cortex M3 microprocessor for edge recommendation systems. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512 (2022).
Hung, J. M. et al. A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices. Nat. Electron. 4, 921–930 (2021).
Hung, J.-M. et al. 8-b precision 8-Mb ReRAM compute-in-memory macro using direct-current-free time-domain readout scheme for AI edge devices. IEEE J. Solid-State Circuits 58, 303–315 (2022).
Chiu, Y.-C. et al. A 22nm 4Mb STT-MRAM data-encrypted near-memory computation macro with a 192GB/s read-and-decryption bandwidth and 25.1-55.1TOPS/W 8b MAC for AI operations. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 178–180 (IEEE, 2022).
Spetalnick, S. D. et al. A 40nm 64kb 26.56TOPS/W 2.37Mb/mm2 RRAM binary/compute-in-memory macro with 4.23x improvement in density and >75% use of sensing dynamic range. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Yoon, J.-H. et al. A 40nm 100Kb 118.44TOPS/W ternary-weight compute-in-memory RRAM macro with voltage-sensing read and write verification for reliable multi-bit RRAM operation. In Proc. 2021 IEEE Custom Integrated Circuits Conference (CICC) 1–2 (IEEE, 2021).
Khwa, W. S. et al. MLC PCM techniques to improve neural network inference retention time by 105X and reduce accuracy degradation by 10.8X. In Proc. 2021 Symposium on VLSI Technology 1–2 (IEEE, 2021).
Xue, C.-X. et al. 15.4 A 22nm 2Mb ReRAM compute-in-memory macro with 121-28TOPS/W for multibit MAC computing for tiny AI edge devices. In Proc. 2020 IEEE International Solid-State Circuits Conference (ISSCC) 244–246 (IEEE, 2020).
Xue, C.-X. et al. 24.1 A 1Mb multibit ReRAM computing-in-memory macro with 14.6ns parallel MAC computing time for CNN based AI edge processors. In Proc. 2019 IEEE International Solid-State Circuits Conference (ISSCC) 388–390 (IEEE, 2019).
Xue, C.-X. et al. A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices. Nat. Electron. 4, 81–90 (2020).
Chen, W.-H. et al. A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors. In Proc. 2018 IEEE International Solid-State Circuits Conference (ISSCC) 494–496 (IEEE, 2018).
Chen, W. H. et al. CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors. Nat. Electron. 2, 420–428 (2019).
Mochida, R. et al. A 4M synapses integrated analog ReRAM based 66.5 TOPS/W neural-network processor with cell current controlled writing and flexible network architecture. In Proc. 2018 IEEE Symposium on VLSI Technology 175–176 (IEEE, 2018).
Wan, W. et al. 33.1 A 74 TMACS/W CMOS-RRAM neurosynaptic core with dynamically reconfigurable dataflow and in-situ transposable weights for probabilistic graphical models. In Proc. 2020 IEEE International Solid-State Circuits Conference (ISSCC) 498–500 (IEEE, 2020).
Liu, Q. et al. 33.2 A fully integrated analog ReRAM based 78.4TOPS/W compute-in-memory chip with fully parallel MAC computing. In Proc. 2020 IEEE International Solid-State Circuits Conference (ISSCC) 500–502 (IEEE, 2020).
Cai, F. et al. A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations. Nat. Electron. 2, 290–299 (2019).
Li, C. et al. Analogue signal and image processing with large memristor crossbars. Nat. Electron. 1, 52–59 (2018).
Ielmini, D. & Wong, H. S. P. In-memory computing with resistive switching devices. Nat. Electron. 1, 333–343 (2018).
Chou, C.-C. et al. A 22nm 96KX144 RRAM macro with a self-tracking reference and a low ripple charge pump to achieve a configurable read window and a wide operating voltage range. In Proc. 2020 IEEE Symposium on VLSI Circuits 1–2 (IEEE, 2020).
Boybat, I. et al. Neuromorphic computing with multi-memristive synapses. Nat. Commun. 9, 2514 (2018).
Le Gallo, M. et al. Mixed-precision in-memory computing. Nat. Electron. 1, 246–253 (2018).
Qu, Z., Zhou, Z., Cheng, Y. & Thiele, L. Adaptive loss-aware quantization for multi-bit networks. In Proc. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 7985–7994 (Computer Vision Foundation, 2020).
Mishra, A. & Marr, D. Apprentice: using knowledge distillation techniques to improve low-precision network accuracy. In Proc. Sixth International Conference on Learning Representations (ICLR, 2018).
Chu, T. et al. Mixed-precision quantized neural networks with progressively decreasing bitwidth. Pattern Recognit. 111, 107647 (2021).
Lecun, Y. et al. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Master’s thesis, Univ. Toronto (2009).
Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
Houshmand, P. et al. Opportunities and limitations of emerging analog in-memory compute DNN architectures. In Proc. 2020 IEEE International Electron Devices Meeting (IEDM) 29.1.1–29.1.4 (IEEE, 2020).
Shamsoshoara, A. et al. The FLAME dataset: Aerial Imagery Pile burn detection using drones (UAVs). IEEE DataPort (2020).
Warden, P. Speech commands: a dataset for limited-vocabulary speech recognition. Preprint at https://arxiv.org/abs/1804.03209 (2018).
Chowdhery, A. et al. Visual wake words dataset. Preprint at https://arxiv.org/abs/1906.05721 (2019).
Nagel, M. et al. A white paper on neural network quantization. Preprint at https://arxiv.org/abs/2106.08295 (2021).
Jacob, B. et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR) 2704–2713 (Computer Vision Foundation, 2018).
Sun, S., Bai, J., Shi, Z., Zhao, W. & Kang, W. CIM2PQ: an arraywise and hardware-friendly mixed precision quantization method for analog computing-in-memory. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 43, 2084–2097 (2024).
Chen, Y.-W. et al. SUN: dynamic hybrid-precision SRAM-based CIM accelerator with high macro utilization using structured pruning mixed-precision networks. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 43, 2163–2176 (2024).
Agrawal, A. et al. A 7nm 4-core AI chip with 25.6TFLOPS hybrid FP8 training, 102.4TOPS INT4 inference and workload-aware throttling. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 144–145 (IEEE, 2021).
Wang, J. et al. A compute SRAM with bit-serial integer/floating-point operations for programmable in-memory vector acceleration. In Proc. 2019 IEEE International Solid-State Circuits Conference (ISSCC) 224–225 (IEEE, 2019).
Acknowledgements
We acknowledge support from National Tsing Hua University (NTHU), TSMC Corporate Research (TSMC-CR), TSMC Design Technology Platform (TSMC-DTP), TSMC More-than-Moore Technologies (TSMC-MtM), TSMC-NTHU Joint Developed Project (JDP) and the National Science and Technology Council (NSTC) of Taiwan. We also acknowledge contributions from TSMC colleagues H.-S. P. Wong, H. Chuang, W. T. Chu and K. C. Huang.
Author information
Authors and Affiliations
Contributions
W.-S.K., T.-H.W., H.-H.H., W.-H.H., Y.-C.C., T.-C.C., Z.-E.K., Y.-H.C., H.-J.W., W.-T.H. and M.-F.C. designed the hybrid-mode mix-CIM AI edge processor and test chip. W.-S.K., T.-H.W., H.-H.H., W.-H.H., Y.-C.C., T.-C.C., Z.-E.K., Y.-H.C., H.-J.W., W.-T.H., C.-C.L., R.-S.L., C.-C.H., K.-T.T., M.-S.H., A.S.L. and M.-F.C. contributed ideas. W.-S.K., T.-H.W., H.-H.H., W.-H.H., Y.-C.C., T.-C.C., Z.-E.K., Y.-H.C., H.-J.W., W.-T.H. and S.-H.T. built the test measurement system and testing flow for the hybrid-mode mix-CIM AI edge processor. W.-S.K., T.-H.W., H.-H.H., W.-H.H., Y.-C.C., T.-C.C., Z.-E.K., Y.-H.C., H.-J.W. and W.-T.H. performed analysis and measurements of the hybrid-mode mix-CIM AI edge processor. C.-C.C., Y.-D.C., T.-Y.J.C. and M.-F.C. managed the project.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Yiyu Shi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Experiment platform and timing-extraction circuit.
a, Experiment platform used to assess the proposed memristor-SRAM CIM-fusion processor, including data generator, logic analyser, oscilloscope, power supply and analogue voltage supply. b, On-chip timing-extraction circuit and experimental methods used to measure wakeup-to-response latency. c, Illustration of the signals involved in measuring the wakeup-to-response latency. d, Measured waveform indicating wakeup-to-response latency using a ResNet-20 model trained on the FLAME dataset.
Extended Data Fig. 2 Trade-off between accuracy and energy efficiency under various input and weight formats.
a, Measured energy efficiency under various input and weight formats using a ResNet-20 model trained for CIFAR-100. b, Inference accuracy using a ResNet-20 model trained for CIFAR-100 versus software baseline under FP16. c, FoM performance as a function of input and weight formats. FoM = energy efficiency/inference accuracy degradation.
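The FoM defined in panel c can be sketched in a few lines. The candidate formats and their (efficiency, degradation) pairs below are illustrative assumptions for the sake of the example, not measured values from this work:

```python
def figure_of_merit(energy_eff_tflops_per_w, accuracy_drop_pct):
    """FoM = energy efficiency / inference-accuracy degradation.

    Higher is better: the metric rewards configurations that are
    efficient while losing little accuracy versus the FP16 baseline.
    """
    return energy_eff_tflops_per_w / accuracy_drop_pct

# Hypothetical input/weight-format candidates (assumed numbers):
candidates = {
    "FP16/FP16": (12.0, 0.10),   # precise but power-hungry
    "FP8/FP8":   (35.0, 0.20),   # balanced
    "INT8/INT8": (55.0, 2.00),   # efficient but lossy
}
best = max(candidates, key=lambda k: figure_of_merit(*candidates[k]))
```

With these assumed numbers the FP8/FP8 configuration maximizes the FoM, illustrating why neither the most precise nor the most efficient format wins outright.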
Extended Data Fig. 3 Trade-off between accuracy and energy efficiency under various mix-CIM configurations.
a, Breakdown of CIM architectures (memristor-CIM, SRAM-CIM and tiny-digital unit). b, Measured energy efficiency and inference accuracy degradation of various computing architectures using a ResNet-20 model trained for CIFAR-100. c, Normalized deviation in the output value of each layer in a ResNet-20 model trained for the CIFAR-100 dataset. d, FoM as a function of mix-CIM configuration. FoM = energy efficiency/inference accuracy degradation.
Extended Data Fig. 4 Implementation of memristor-CIM and SRAM-CIM.
a, The memristor-CIM performs partial dot-product operations in the cell array by inducing cell current in the memristor device and accumulating cell current on the BL. The 5-bit-resolution ADC converts BL current from the analogue domain to the digital domain and the accumulator combines the results from the weight MSB column to the LSB column across input cycles to generate DPVs. b, Implementation of SRAM-CIM using a mux-based compute unit for dot-product operations with no accuracy loss.
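The MSB-to-LSB shift-and-accumulate in panel a can be modelled functionally. The sketch below ignores ADC quantization and device non-idealities, so it reproduces the exact integer dot product; it is a behavioural model, not the hardware datapath:

```python
import numpy as np

def cim_dot_product(inputs, weights, in_bits=8, w_bits=8):
    """Functional (noise-free) model of bit-serial memristor-CIM.

    Each cycle applies one input bit-plane to one weight bit-column;
    the per-column bit-line (BL) sum -- the quantity the 5-bit ADC
    digitizes on real hardware -- is shifted by the combined place
    value and accumulated into the dot-product value (DPV).
    """
    inputs = np.asarray(inputs, dtype=np.int64)
    weights = np.asarray(weights, dtype=np.int64)
    dpv = 0
    for i in range(in_bits):                 # input bit-planes, LSB first
        in_plane = (inputs >> i) & 1
        for j in range(w_bits):              # weight bit-columns, LSB first
            w_col = (weights >> j) & 1
            bl_sum = int(np.dot(in_plane, w_col))  # BL accumulation
            dpv += bl_sum << (i + j)         # shift by combined place value
    return dpv
```

For example, `cim_dot_product([3, 5, 7], [2, 4, 6])` returns 68, matching the ordinary integer dot product.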
Extended Data Fig. 5 Illustration of the proposed 2D-PVA-DA scheme.
a, Conventional memristor-CIM using a consistent number of accumulations (the number of WLs that are turned on in a single cycle) across various input place values and weight place values. b, The proposed 2D-PVA-DA adjusts the number of accumulations according to the place value of inputs and weights to increase the overall number of accumulations. c, This scheme reduced the operation cycle counts by 42% in memristor-CIM.
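The cycle-count saving in panel c can be illustrated with a toy model. The word-line (WL) activation counts in the policy below are assumptions chosen for illustration, not the paper's calibrated values; the point is only that letting low-significance bit combinations accumulate more WLs per cycle reduces total cycles:

```python
import math

def cycles(total_rows, wl_policy, in_bits=8, w_bits=8):
    """Total cycles = sum over (input place value, weight place value)
    pairs of ceil(rows / WLs activated per cycle for that pair)."""
    return sum(math.ceil(total_rows / wl_policy(i, j))
               for i in range(in_bits) for j in range(w_bits))

# Conventional scheme: fixed accumulation count for every place value.
conv = cycles(256, lambda i, j: 16)

# Illustrative 2D-PVA-DA policy (assumed numbers): high-significance
# pairs keep few WLs to protect accuracy within the ADC range, while
# low-significance pairs tolerate more quantization error and can
# accumulate many WLs per cycle.
pva = cycles(256, lambda i, j: 16 if i + j >= 10
                               else 64 if i + j <= 4
                               else 32)
```

Under these assumed activation counts the place-value-dependent policy needs roughly 44% fewer cycles than the fixed policy, qualitatively matching the 42% reduction reported for the real scheme.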
Extended Data Fig. 6 Layer-wise configurations for various applications and measurement results demonstrating the efficacy of the proposed AI edge processor when applied to keyword spotting and visual wake word detection.
a, Layer-wise configuration of the ResNet-20 model trained for CIFAR-100 using hybrid mode and mix-CIM. b, Layer-wise configuration of the MobileNet-v2 model trained for ImageNet using hybrid mode and mix-CIM. c, Breakdown of computation types (INT and FP) on the left axis and CIM architectures (memristor-CIM, SRAM-CIM and tiny-digital unit) on the right axis. d, Inference accuracy and energy efficiency across use scenarios.
Extended Data Fig. 7 Comparison of proposed 2DQA scheme versus previous works.
a, Processing of the pretrained model by the proposed 2DQA scheme. b, Error injection was used to obtain concise guidelines by which to determine the sensitivity of each layer for use in deriving a usable configuration without having to assess all possible combinations iteratively. c, 2DQA scheme outperformed all previous mixed-precision architectures in terms of inference accuracy when applied to the complex ImageNet dataset. Data from refs. 11,47,60,61.
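The error-injection idea in panel b can be sketched on a toy network: perturb one layer at a time and measure how much the output deviates. Everything below (the tiny ReLU network, the noise level, the deviation metric) is an assumed stand-in for the paper's actual sensitivity criteria:

```python
import numpy as np

def layer_sensitivity(weights, x, noise_std=0.1, trials=20, seed=0):
    """Inject Gaussian error into one layer's weights at a time and
    record the mean output deviation. High-deviation layers are
    'sensitive' and are candidates for lossless SRAM-CIM / FP mapping;
    tolerant layers can be mapped to memristor-CIM / INT."""
    rng = np.random.default_rng(seed)

    def forward(ws):
        h = x
        for w in ws:
            h = np.maximum(w @ h, 0.0)       # linear layer + ReLU
        return h

    clean = forward(weights)
    scores = []
    for k in range(len(weights)):
        dev = 0.0
        for _ in range(trials):
            noisy = [w + (noise_std * rng.standard_normal(w.shape)
                          if i == k else 0.0)
                     for i, w in enumerate(weights)]
            dev += np.abs(forward(noisy) - clean).mean()
        scores.append(dev / trials)          # one score per layer
    return scores

rng = np.random.default_rng(1)
ws = [rng.standard_normal((8, 8)) * 0.5 for _ in range(3)]
x = rng.standard_normal(8)
scores = layer_sensitivity(ws, x)
```

Ranking layers by these scores gives a usable partitioning guideline without iterating over every possible architecture/format combination.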
Extended Data Fig. 8 Illustration of pre-alignment process for FP inputs and weights and comparison of alignment methods for FP operations.
a, During the pre-alignment process, the FP format is converted to the fixed-point format to facilitate subsequent processing. b, Comparison of different alignment methods: product alignment, input alignment, input-wise and layer-wise weight separate alignment and the proposed input-wise and kernel-wise weight separate alignment. c, Simulated energy consumption by memristor-CIM macro under various alignment methods. d, Inference accuracy as a function of bit width under various alignment methods.
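The pre-alignment of panel a can be sketched as a shared-exponent conversion: every value in a group (for example, one kernel's weights) is shifted to the group's largest exponent, yielding fixed-point integers plus one common scale. This is an assumption-level model of the idea, not the exact hardware datapath, and the 8-bit mantissa width below is arbitrary:

```python
import math

def prealign(values, mant_bits=8):
    """Kernel-wise pre-alignment sketch: extract the largest binary
    exponent in the group, then express every value as an integer
    mantissa at that shared exponent, so subsequent dot products can
    run in the fixed-point (integer) domain."""
    exps = [math.frexp(v)[1] if v != 0.0 else -1000 for v in values]
    shared = max(exps)                      # group's largest exponent
    scale = 2.0 ** (shared - mant_bits)     # one common scale factor
    fixed = [int(round(v / scale)) for v in values]  # aligned mantissas
    return fixed, scale

fixed, scale = prealign([0.75, -0.125, 0.03125])
recon = [f * scale for f in fixed]          # exact for these values
```

Values far below the shared exponent lose low-order bits in this conversion, which is why the granularity of alignment (layer-wise versus the proposed input-wise/kernel-wise separate alignment) affects accuracy in panel d.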
Extended Data Fig. 9 Characteristics and specifications of the proposed memristor device.
The characteristics of the proposed memristor, including the underlying technology, memristor cell size, set/reset voltages, read voltage, normalized resistance state of weights, normalized conductance state of weights and dimensions of memristor banks.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Khwa, W.-S., Wen, T.-H., Hsu, H.-H. et al. A mixed-precision memristor and SRAM compute-in-memory AI processor. Nature 639, 617–623 (2025). https://doi.org/10.1038/s41586-025-08639-2