Abstract
Artificial intelligence (AI) edge devices (refs. 1–12) demand high-precision, energy-efficient computation, large on-chip model storage, rapid wakeup-to-response times and cost-effective, foundry-ready solutions. Floating-point (FP) computation provides precision exceeding that of integer (INT) formats, at the cost of higher power and storage overhead. Multi-level-cell (MLC) memristor compute-in-memory (CIM) (refs. 13–15) provides compact non-volatile storage and energy-efficient computation but is prone to accuracy loss owing to process variation. Digital static random-access memory (SRAM) CIM (refs. 16–22) enables lossless computation; however, storage capacity is low as a result of the large bit-cell area, and the model must be loaded during inference. Conventional approaches based on homogeneous CIM architectures and computation formats therefore impose a trade-off between efficiency, storage, wakeup latency and inference accuracy. Here we present a mixed-precision heterogeneous CIM AI edge processor that supports layer-granular/kernel-granular partitioning of network layers among on-chip CIM architectures (that is, memristor-CIM, SRAM-CIM and tiny-digital units) and computation number formats (INT and FP) according to each layer's sensitivity to error. This layer-granular/kernel-granular flexibility allows simultaneous optimization within the two-dimensional design space at the hardware level. The proposed hardware achieved high energy efficiency (40.91 TFLOPS W−1 for ResNet-20 with CIFAR-100 and 28.63 TFLOPS W−1 for MobileNet-v2 with ImageNet), low accuracy degradation (<0.45% for ResNet-20 with CIFAR-100 and for MobileNet-v2 with ImageNet) and rapid wakeup-to-response time (373.52 μs).
Code availability
The code that supports the findings of this study is not publicly available, but exceptions for non-commercial use may be made on request, subject to TSMC management approval.
References
Prabhu, K. et al. CHIMERA: a 0.92-TOPS, 2.2-TOPS/W edge AI accelerator with 2-Mbyte on-chip foundry resistive RAM for efficient training and inference. IEEE J. Solid-State Circuits 57, 1013–1026 (2022).
Jain, V. et al. TinyVers: A 0.8-17 TOPS/W, 1.7 μW-20 mW, tiny versatile system-on-chip with state-retentive eMRAM for machine learning inference at the extreme edge. In Proc. 2022 IEEE Symposium on VLSI Technology and Circuits 20–21 (IEEE, 2022).
Rossi, D. et al. 4.4 A 1.3TOPS/W @ 32GOPS fully integrated 10-core SoC for IoT end-nodes with 1.7μW cognitive wake-up from MRAM-based state-retentive sleep mode. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 60–62 (IEEE, 2021).
Ueyoshi, K. et al. DIANA: an end-to-end energy-efficient digital and analog hybrid neural network SoC. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Yue, J. et al. A 28nm 16.9-300TOPS/W computing-in-memory processor supporting floating-point NN inference/training with intensive-CIM sparse-digital architecture. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2023).
Yue, J. et al. 15.2 A 2.75-to-75.9TOPS/W computing-in-memory NN processor supporting set-associate block-wise zero skipping and ping-pong CIM with simultaneous computation and weight updating. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 238–240 (IEEE, 2021).
Liu, S. et al. 16.2 A 28nm 53.8TOPS/W 8b sparse transformer accelerator with in-memory butterfly zero skipper for unstructured-pruned NN and CIM-based local-attention-reusable engine. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 250–252 (IEEE, 2023).
Tu, F. et al. 16.1 MuITCIM: a 28nm 2.24μJ/token attention-token-bit hybrid sparse digital CIM-based accelerator for multimodal transformers. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 248–250 (IEEE, 2023).
Huang, W.-H. et al. A nonvolatile AI-edge processor with 4MB SLC-MLC hybrid-mode ReRAM compute-in-memory macro and 51.4-251 TOPS/W. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 15–17 (IEEE, 2023).
Wen, T.-H. et al. A 28nm nonvolatile AI edge processor using 4Mb analog-based near-memory-compute ReRAM with 27.2 TOPS/W for tiny AI edge devices. In Proc. 2023 IEEE Symposium on VLSI Technology and Circuits 1–2 (IEEE, 2023).
Wen, T.-H. et al. Fusion of memristor and digital compute-in-memory processing for energy-efficient edge computing. Science 384, 325–332 (2024).
Lele, A. S. et al. A heterogeneous RRAM in-memory and SRAM near-memory SoC for fused frame and event-based target identification and tracking. IEEE J. Solid-State Circuits 59, 52–64 (2024).
Ambrogio, S. et al. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558, 60–67 (2018).
Khwa, W.-S. et al. A 40-nm, 2M-cell, 8b-precision, hybrid SLC-MLC PCM computing-in-memory macro with 20.5-65.0TOPS/W for tiny-AI edge devices. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).
Wu, P.-C. et al. A 22nm 832Kb hybrid-domain floating-point SRAM in-memory-compute macro with 16.2-70.2TFLOPS/W for high-accuracy AI-edge devices. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 126–128 (IEEE, 2023).
Guo, A. et al. A 28-nm 64-kb 31.6-TFLOPS/W digital-domain floating-point-computing-unit and double-bit 6T-SRAM computing-in-memory macro for floating-point CNNs. IEEE J. Solid-State Circuits 59, 3032–3044 (2024).
Sinangil, M. E. et al. A 7-nm compute-in-memory SRAM macro supporting multi-bit input, weight and output and achieving 351 TOPS/W and 372.4 GOPS. IEEE J. Solid-State Circuits 56, 188–198 (2021).
Mori, H. et al. A 4nm 6163-TOPS/W/b 4790-TOPS/mm2/b SRAM based digital-computing-in-memory macro supporting bit-width flexibility and simultaneous MAC and weight update. In Proc. 2023 IEEE International Solid-State Circuits Conference (ISSCC) 132–134 (IEEE, 2023).
Chih, Y.-D. et al. 16.4 An 89TOPS/W and 16.3TOPS/mm2 all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 252–254 (IEEE, 2021).
Fujiwara, H. et al. A 5-nm 254-TOPS/W 221-TOPS/mm2 fully-digital computing-in-memory macro supporting wide-range dynamic-voltage-frequency scaling and simultaneous MAC and write operations. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Lee, C.-F. et al. A 12nm 121-TOPS/W 41.6-TOPS/mm2 all digital full precision SRAM-based compute-in-memory with configurable bit-width for AI edge applications. In Proc. 2022 IEEE Symposium on VLSI Technology and Circuits 24–25 (IEEE, 2022).
Chiu, Y. C. et al. A CMOS-integrated spintronic compute-in-memory macro for secure AI edge devices. Nat. Electron. 6, 534–543 (2023).
Jung, S. et al. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 601, 211–216 (2022).
Wen, T.-H. et al. 34.8 A 22nm 16Mb floating-point ReRAM compute-in-memory macro with 31.2TFLOPS/W for AI edge devices. In Proc. 2024 IEEE International Solid-State Circuits Conference (ISSCC) 580–582 (IEEE, 2024).
Chang, M. et al. A 40nm 60.64TOPS/W ECC-capable compute-in-memory/digital 2.25MB/768KB RRAM/SRAM system with embedded cortex M3 microprocessor for edge recommendation systems. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512 (2022).
Hung, J. M. et al. A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices. Nat. Electron. 4, 921–930 (2021).
Hung, J.-M. et al. 8-b precision 8-Mb ReRAM compute-in-memory macro using direct-current-free time-domain readout scheme for AI edge devices. IEEE J. Solid-State Circuits 58, 303–315 (2022).
Chiu, Y.-C. et al. A 22nm 4Mb STT-MRAM data-encrypted near-memory computation macro with a 192GB/s read-and-decryption bandwidth and 25.1-55.1TOPS/W 8b MAC for AI operations. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 178–180 (IEEE, 2022).
Spetalnick, S. D. et al. A 40nm 64kb 26.56TOPS/W 2.37Mb/mm2 RRAM binary/compute-in-memory macro with 4.23x improvement in density and >75% use of sensing dynamic range. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).
Yoon, J.-H. et al. A 40nm 100Kb 118.44TOPS/W ternary-weight compute-in-memory RRAM macro with voltage-sensing read and write verification for reliable multi-bit RRAM operation. In Proc. 2021 IEEE Custom Integrated Circuits Conference (CICC) 1–2 (IEEE, 2021).
Khwa, W. S. et al. MLC PCM techniques to improve neural network inference retention time by 105X and reduce accuracy degradation by 10.8X. In Proc. 2021 Symposium on VLSI Technology 1–2 (IEEE, 2021).
Xue, C.-X. et al. 15.4 A 22nm 2Mb ReRAM compute-in-memory macro with 121-28TOPS/W for multibit MAC computing for tiny AI edge devices. In Proc. 2020 IEEE International Solid-State Circuits Conference (ISSCC) 244–246 (IEEE, 2020).
Xue, C.-X. et al. 24.1 A 1Mb multibit ReRAM computing-in-memory macro with 14.6ns parallel MAC computing time for CNN based AI edge processors. In Proc. 2019 IEEE International Solid-State Circuits Conference (ISSCC) 388–390 (IEEE, 2019).
Xue, C.-X. et al. A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices. Nat. Electron. 4, 81–90 (2020).
Chen, W.-H. et al. A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors. In Proc. 2018 IEEE International Solid-State Circuits Conference (ISSCC) 494–496 (IEEE, 2018).
Chen, W. H. et al. CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors. Nat. Electron. 2, 420–428 (2019).
Mochida, R. et al. A 4M synapses integrated analog ReRAM based 66.5 TOPS/W neural-network processor with cell current controlled writing and flexible network architecture. In Proc. 2018 IEEE Symposium on VLSI Technology 175–176 (IEEE, 2018).
Wan, W. et al. 33.1 A 74 TMACS/W CMOS-RRAM neurosynaptic core with dynamically reconfigurable dataflow and in-situ transposable weights for probabilistic graphical models. In Proc. 2020 IEEE International Solid-State Circuits Conference (ISSCC) 498–500 (IEEE, 2020).
Liu, Q. et al. 33.2 A fully integrated analog ReRAM based 78.4TOPS/W compute-in-memory chip with fully parallel MAC computing. In Proc. 2020 IEEE International Solid-State Circuits Conference (ISSCC) 500–502 (IEEE, 2020).
Cai, F. et al. A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations. Nat. Electron. 2, 290–299 (2019).
Li, C. et al. Analogue signal and image processing with large memristor crossbars. Nat. Electron. 1, 52–59 (2018).
Ielmini, D. & Wong, H. S. P. In-memory computing with resistive switching devices. Nat. Electron. 1, 333–343 (2018).
Chou, C.-C. et al. A 22nm 96KX144 RRAM macro with a self-tracking reference and a low ripple charge pump to achieve a configurable read window and a wide operating voltage range. In Proc. 2020 IEEE Symposium on VLSI Circuits 1–2 (IEEE, 2020).
Boybat, I. et al. Neuromorphic computing with multi-memristive synapses. Nat. Commun. 9, 2514 (2018).
Le Gallo, M. et al. Mixed-precision in-memory computing. Nat. Electron. 1, 246–253 (2018).
Qu, Z., Zhou, Z., Cheng, Y. & Thiele, L. Adaptive loss-aware quantization for multi-bit networks. In Proc. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 7985–7994 (Computer Vision Foundation, 2020).
Mishra, A. & Marr, D. Apprentice: using knowledge distillation techniques to improve low-precision network accuracy. In Proc. Sixth International Conference on Learning Representations (ICLR, 2018).
Chu, T. et al. Mixed-precision quantized neural networks with progressively decreasing bitwidth. Pattern Recognit. 111, 107647 (2021).
Lecun, Y. et al. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Master’s thesis, Univ. Toronto (2009).
Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
Houshmand, P. et al. Opportunities and limitations of emerging analog in-memory compute DNN architectures. In Proc. 2020 IEEE International Electron Devices Meeting (IEDM) 29.1.1–29.1.4 (IEEE, 2020).
Shamsoshoara, A. et al. The FLAME dataset: Aerial Imagery Pile burn detection using drones (UAVs). IEEE DataPort (2020).
Warden, P. Speech commands: a dataset for limited-vocabulary speech recognition. Preprint at https://arxiv.org/abs/1804.03209 (2018).
Chowdhery, A. et al. Visual wake words dataset. Preprint at https://arxiv.org/abs/1906.05721 (2019).
Nagel, M. et al. A white paper on neural network quantization. Preprint at https://arxiv.org/abs/2106.08295 (2021).
Jacob, B. et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proc. Conference on Computer Vision and Pattern Recognition (CVPR) 2704–2713 (Computer Vision Foundation, 2018).
Sun, S., Bai, J., Shi, Z., Zhao, W. & Kang, W. CIM2PQ: an arraywise and hardware-friendly mixed precision quantization method for analog computing-in-memory. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 43, 2084–2097 (2024).
Chen, Y.-W. et al. SUN: dynamic hybrid-precision SRAM-based CIM accelerator with high macro utilization using structured pruning mixed-precision networks. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 43, 2163–2176 (2024).
Agrawal, A. et al. A 7nm 4-core AI chip with 25.6TFLOPS hybrid FP8 training, 102.4TOPS INT4 inference and workload-aware throttling. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) 144–145 (IEEE, 2021).
Wang, J. et al. A compute SRAM with bit-serial integer/floating-point operations for programmable in-memory vector acceleration. In Proc. 2019 IEEE International Solid-State Circuits Conference (ISSCC) 224–225 (IEEE, 2019).
Acknowledgements
We acknowledge support from National Tsing Hua University (NTHU), TSMC Corporate Research (TSMC-CR), TSMC Design Technology Platform (TSMC-DTP), TSMC More-than-Moore Technologies (TSMC-MtM), TSMC-NTHU Joint Developed Project (JDP) and the National Science and Technology Council (NSTC) of Taiwan. We also acknowledge contributions from TSMC colleagues H.-S. P. Wong, H. Chuang, W. T. Chu and K. C. Huang.
Author information
Authors and Affiliations
Contributions
W.-S.K., T.-H.W., H.-H.H., W.-H.H., Y.-C.C., T.-C.C., Z.-E.K., Y.-H.C., H.-J.W., W.-T.H. and M.-F.C. designed the hybrid-mode mix-CIM AI edge processor and test chip. W.-S.K., T.-H.W., H.-H.H., W.-H.H., Y.-C.C., T.-C.C., Z.-E.K., Y.-H.C., H.-J.W., W.-T.H., C.-C.L., R.-S.L., C.-C.H., K.-T.T., M.-S.H., A.S.L. and M.-F.C. contributed ideas. W.-S.K., T.-H.W., H.-H.H., W.-H.H., Y.-C.C., T.-C.C., Z.-E.K., Y.-H.C., H.-J.W., W.-T.H. and S.-H.T. built the test measurement system and testing flow for the hybrid-mode mix-CIM AI edge processor. W.-S.K., T.-H.W., H.-H.H., W.-H.H., Y.-C.C., T.-C.C., Z.-E.K., Y.-H.C., H.-J.W. and W.-T.H. performed analysis and measurements of the hybrid-mode mix-CIM AI edge processor. C.-C.C., Y.-D.C., T.-Y.J.C. and M.-F.C. managed the project.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Yiyu Shi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Experiment platform and timing-extraction circuit.
a, Experiment platform used to assess the proposed memristor-SRAM CIM-fusion processor, including data generator, logic analyser, oscilloscope, power supply and analogue voltage supply. b, On-chip timing-extraction circuit and experimental methods used to measure wakeup-to-response latency. c, Illustration of the signals involved in measuring the wakeup-to-response latency. d, Measured waveform indicating wakeup-to-response latency using a ResNet-20 model trained on the FLAME dataset.
Extended Data Fig. 2 Trade-off between accuracy and energy efficiency under various input and weight formats.
a, Measured energy efficiency under various input and weight formats using a ResNet-20 model trained for CIFAR-100. b, Inference accuracy using a ResNet-20 model trained for CIFAR-100 versus software baseline under FP16. c, FoM performance as a function of input and weight formats. FoM = energy efficiency/inference accuracy degradation.
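The FoM defined in panel c can be sketched in a few lines. The candidate formats and their (efficiency, degradation) pairs below are illustrative assumptions for the sake of the example, not measured values from this work:

```python
def figure_of_merit(energy_eff_tflops_per_w, accuracy_drop_pct):
    """FoM = energy efficiency / inference-accuracy degradation.

    Higher is better: the metric rewards configurations that are
    efficient while losing little accuracy versus the FP16 baseline.
    """
    return energy_eff_tflops_per_w / accuracy_drop_pct

# Hypothetical input/weight-format candidates (assumed numbers):
candidates = {
    "FP16/FP16": (12.0, 0.10),   # precise but power-hungry
    "FP8/FP8":   (35.0, 0.20),   # balanced
    "INT8/INT8": (55.0, 2.00),   # efficient but lossy
}
best = max(candidates, key=lambda k: figure_of_merit(*candidates[k]))
```

With these assumed numbers the FP8/FP8 configuration maximizes the FoM, illustrating why neither the most precise nor the most efficient format wins outright.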
Extended Data Fig. 3 Trade-off between accuracy and energy efficiency under various mix-CIM configurations.
a, Breakdown of CIM architectures (memristor-CIM, SRAM-CIM and tiny-digital unit). b, Measured energy efficiency and inference accuracy degradation of various computing architectures using a ResNet-20 model trained for CIFAR-100. c, Normalized deviation in the output value of each layer in a ResNet-20 model trained for the CIFAR-100 dataset. d, FoM as a function of mix-CIM configuration. FoM = energy efficiency/inference accuracy degradation.
Extended Data Fig. 4 Implementation of memristor-CIM and SRAM-CIM.
a, The memristor-CIM performs partial dot-product operations in the cell array by inducing cell current in the memristor device and accumulating cell current on the BL. The 5-bit-resolution ADC converts BL current from the analogue domain to the digital domain and the accumulator combines the results from the weight MSB column to the LSB column across input cycles to generate DPVs. b, Implementation of SRAM-CIM using a mux-based compute unit for dot-product operations with no accuracy loss.
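The MSB-to-LSB shift-and-accumulate in panel a can be modelled functionally. The sketch below ignores ADC quantization and device non-idealities, so it reproduces the exact integer dot product; it is a behavioural model, not the hardware datapath:

```python
import numpy as np

def cim_dot_product(inputs, weights, in_bits=8, w_bits=8):
    """Functional (noise-free) model of bit-serial memristor-CIM.

    Each cycle applies one input bit-plane to one weight bit-column;
    the per-column bit-line (BL) sum -- the quantity the 5-bit ADC
    digitizes on real hardware -- is shifted by the combined place
    value and accumulated into the dot-product value (DPV).
    """
    inputs = np.asarray(inputs, dtype=np.int64)
    weights = np.asarray(weights, dtype=np.int64)
    dpv = 0
    for i in range(in_bits):                 # input bit-planes, LSB first
        in_plane = (inputs >> i) & 1
        for j in range(w_bits):              # weight bit-columns, LSB first
            w_col = (weights >> j) & 1
            bl_sum = int(np.dot(in_plane, w_col))  # BL accumulation
            dpv += bl_sum << (i + j)         # shift by combined place value
    return dpv
```

For example, `cim_dot_product([3, 5, 7], [2, 4, 6])` returns 68, matching the ordinary integer dot product.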
Extended Data Fig. 5 Illustration of the proposed 2D-PVA-DA scheme.
a, Conventional memristor-CIM using a consistent number of accumulations (the number of WLs that are turned on in a single cycle) across various input place values and weight place values. b, The proposed 2D-PVA-DA adjusts the number of accumulations according to the place value of inputs and weights to increase the overall number of accumulations. c, This scheme reduced the operation cycle counts by 42% in memristor-CIM.
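The cycle-count saving in panel c can be illustrated with a toy model. The word-line (WL) activation counts in the policy below are assumptions chosen for illustration, not the paper's calibrated values; the point is only that letting low-significance bit combinations accumulate more WLs per cycle reduces total cycles:

```python
import math

def cycles(total_rows, wl_policy, in_bits=8, w_bits=8):
    """Total cycles = sum over (input place value, weight place value)
    pairs of ceil(rows / WLs activated per cycle for that pair)."""
    return sum(math.ceil(total_rows / wl_policy(i, j))
               for i in range(in_bits) for j in range(w_bits))

# Conventional scheme: fixed accumulation count for every place value.
conv = cycles(256, lambda i, j: 16)

# Illustrative 2D-PVA-DA policy (assumed numbers): high-significance
# pairs keep few WLs to protect accuracy within the ADC range, while
# low-significance pairs tolerate more quantization error and can
# accumulate many WLs per cycle.
pva = cycles(256, lambda i, j: 16 if i + j >= 10
                               else 64 if i + j <= 4
                               else 32)
```

Under these assumed activation counts the place-value-dependent policy needs roughly 44% fewer cycles than the fixed policy, qualitatively matching the 42% reduction reported for the real scheme.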
Extended Data Fig. 6 Layer-wise configurations for various applications and measurement results demonstrating the efficacy of the proposed AI edge processor when applied to keyword spotting and visual wake word detection.
a, Layer-wise configuration of the ResNet-20 model trained for CIFAR-100 using hybrid mode and mix-CIM. b, Layer-wise configuration of the MobileNet-v2 model trained for ImageNet using hybrid mode and mix-CIM. c, Breakdown of computation types (INT and FP) on the left axis and CIM architectures (memristor-CIM, SRAM-CIM and tiny-digital unit) on the right axis. d, Inference accuracy and energy efficiency across use scenarios.
Extended Data Fig. 7 Comparison of proposed 2DQA scheme versus previous works.
a, Processing of the pretrained model by the proposed 2DQA scheme. b, Error injection was used to obtain concise guidelines by which to determine the sensitivity of each layer for use in deriving a usable configuration without having to assess all possible combinations iteratively. c, 2DQA scheme outperformed all previous mixed-precision architectures in terms of inference accuracy when applied to the complex ImageNet dataset. Data from refs. 11,47,60,61.
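The error-injection idea in panel b can be sketched on a toy network: perturb one layer at a time and measure how much the output deviates. Everything below (the tiny ReLU network, the noise level, the deviation metric) is an assumed stand-in for the paper's actual sensitivity criteria:

```python
import numpy as np

def layer_sensitivity(weights, x, noise_std=0.1, trials=20, seed=0):
    """Inject Gaussian error into one layer's weights at a time and
    record the mean output deviation. High-deviation layers are
    'sensitive' and are candidates for lossless SRAM-CIM / FP mapping;
    tolerant layers can be mapped to memristor-CIM / INT."""
    rng = np.random.default_rng(seed)

    def forward(ws):
        h = x
        for w in ws:
            h = np.maximum(w @ h, 0.0)       # linear layer + ReLU
        return h

    clean = forward(weights)
    scores = []
    for k in range(len(weights)):
        dev = 0.0
        for _ in range(trials):
            noisy = [w + (noise_std * rng.standard_normal(w.shape)
                          if i == k else 0.0)
                     for i, w in enumerate(weights)]
            dev += np.abs(forward(noisy) - clean).mean()
        scores.append(dev / trials)          # one score per layer
    return scores

rng = np.random.default_rng(1)
ws = [rng.standard_normal((8, 8)) * 0.5 for _ in range(3)]
x = rng.standard_normal(8)
scores = layer_sensitivity(ws, x)
```

Ranking layers by these scores gives a usable partitioning guideline without iterating over every possible architecture/format combination.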
Extended Data Fig. 8 Illustration of pre-alignment process for FP inputs and weights and comparison of alignment methods for FP operations.
a, During the pre-alignment process, the FP format is converted to the fixed-point format to facilitate subsequent processing. b, Comparison of different alignment methods: product alignment, input alignment, input-wise and layer-wise weight separate alignment and the proposed input-wise and kernel-wise weight separate alignment. c, Simulated energy consumption by memristor-CIM macro under various alignment methods. d, Inference accuracy as a function of bit width under various alignment methods.
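The pre-alignment of panel a can be sketched as a shared-exponent conversion: every value in a group (for example, one kernel's weights) is shifted to the group's largest exponent, yielding fixed-point integers plus one common scale. This is an assumption-level model of the idea, not the exact hardware datapath, and the 8-bit mantissa width below is arbitrary:

```python
import math

def prealign(values, mant_bits=8):
    """Kernel-wise pre-alignment sketch: extract the largest binary
    exponent in the group, then express every value as an integer
    mantissa at that shared exponent, so subsequent dot products can
    run in the fixed-point (integer) domain."""
    exps = [math.frexp(v)[1] if v != 0.0 else -1000 for v in values]
    shared = max(exps)                      # group's largest exponent
    scale = 2.0 ** (shared - mant_bits)     # one common scale factor
    fixed = [int(round(v / scale)) for v in values]  # aligned mantissas
    return fixed, scale

fixed, scale = prealign([0.75, -0.125, 0.03125])
recon = [f * scale for f in fixed]          # exact for these values
```

Values far below the shared exponent lose low-order bits in this conversion, which is why the granularity of alignment (layer-wise versus the proposed input-wise/kernel-wise separate alignment) affects accuracy in panel d.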
Extended Data Fig. 9 Characteristics and specifications of the proposed memristor device.
The characteristics of the proposed memristor, including the underlying technology, memristor cell size, set/reset voltages, read voltage, normalized resistance state of weights, normalized conductance state of weights and dimensions of memristor banks.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Khwa, W.-S., Wen, T.-H., Hsu, H.-H. et al. A mixed-precision memristor and SRAM compute-in-memory AI processor. Nature 639, 617–623 (2025). https://doi.org/10.1038/s41586-025-08639-2