Abstract
The rapid proliferation of Artificial Intelligence applications necessitates scalable solutions that perform efficiently under real-world constraints. Heterogeneous accelerators combining specialized analog and digital units offer localized, energy-efficient neural network computation. However, achieving optimal performance on these platforms requires balancing energy efficiency and model accuracy through optimized mapping of neural network layers. To this end, we introduce Mixed-Precision Supernetwork, a unified framework for training mixed-precision supernetworks that seamlessly integrate quantized digital layers with noise-sensitive analog layers. The framework incorporates a mapping-aware adaptation strategy that dynamically optimizes layer assignments while refining the neural network via hardware-aware architecture search. This dual innovation makes Mixed-Precision Supernetwork an effective approach for deploying deep learning models efficiently on heterogeneous accelerators. On average, Mixed-Precision Supernetwork produces mappings ~2.2× faster and achieves a ~3.4% increase in model accuracy over a fully analog approach, and it improves energy efficiency by mapping up to 80% of the model's weights to analog hardware while maintaining full-precision accuracy.
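To illustrate the kind of layer-wise mapping decision described above, the following minimal PyTorch sketch shows a supernetwork layer that learns, via a per-layer architecture parameter, whether it should be executed on a quantized digital path or a noise-injected analog path. It is a hypothetical illustration only, not the authors' MPS implementation (see the repository linked under Code availability); the noise model, bit width, and energy penalty term are assumptions chosen for clarity.

```python
# Hypothetical sketch of a mixed-precision supernetwork layer: each layer learns
# whether it maps to a quantized digital path or a noisy analog path.
# Not the authors' MPS code; see https://github.com/IBM/analog-nas/tree/main/MPS.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    scale = x.abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
    q = torch.round(x / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return x + (q - x).detach()  # forward uses q, backward passes gradients through


class MixedPrecisionLinear(nn.Module):
    """One supernetwork layer with two candidate mappings: digital (quantized)
    and analog (additive weight noise as a simple non-ideality model)."""

    def __init__(self, in_features, out_features, bits=8, analog_noise=0.02):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.bits = bits
        self.analog_noise = analog_noise
        # Architecture logits over {digital, analog} for this layer.
        self.alpha = nn.Parameter(torch.zeros(2))

    def forward(self, x):
        w = self.linear.weight
        digital_out = F.linear(x, fake_quantize(w, self.bits), self.linear.bias)
        noisy_w = w + self.analog_noise * w.abs().max() * torch.randn_like(w)
        analog_out = F.linear(x, noisy_w, self.linear.bias)
        probs = F.softmax(self.alpha, dim=0)  # soft mapping decision during search
        return probs[0] * digital_out + probs[1] * analog_out

    def analog_fraction(self):
        return F.softmax(self.alpha, dim=0)[1]


# Toy search step: task loss plus a penalty rewarding layers mapped to analog
# tiles (a stand-in for an energy/latency objective in hardware-aware search).
model = nn.Sequential(MixedPrecisionLinear(16, 32), nn.ReLU(), MixedPrecisionLinear(32, 4))
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
logits = model(x)
analog_frac = torch.stack(
    [m.analog_fraction() for m in model if isinstance(m, MixedPrecisionLinear)]
).mean()
loss = F.cross_entropy(logits, y) + 0.1 * (1.0 - analog_frac)
loss.backward()
opt.step()
```

After the search converges, each layer's architecture logits can be hardened into a discrete digital-or-analog assignment, which is the mapping deployed on the heterogeneous accelerator.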
Data availability
The raw data for all figures and tables can be found at https://github.com/IBM/analog-nas/tree/main/MPS.
Code availability
The code used to perform the simulations included in this study is available at https://github.com/IBM/analog-nas/tree/main/MPS.
Acknowledgements
We thank R. Haas, J. Burns, and M. Khare for management support.
Author information
Authors and Affiliations
Contributions
H.B. and A.S. initiated the project. H.B. designed and planned the project. C.L., I.B., A.V. and M.R. set up the infrastructure and tools required. M.L.G., H.T., and G.W.B. contributed to the discussion of the project and assisted with revisions of the manuscript. H.B. wrote the manuscript with input from all authors. V.N., K.E.M., and A.S. supervised the work.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Huaqiang Wu and Daniele Jahier Pagliari, who co-reviewed with Beatrice Alessandra Motetti, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Benmeziane, H., Lammie, C., Boybat, I. et al. Supernetwork-based efficient mapping of deep learning applications to mixed-precision hardware using model adaptation. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71071-1
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-026-71071-1


