

Diffractive tensorized unit for million-TOPS general-purpose computing

Abstract

Photonic computing has emerged as a promising next-generation technology for processors, with diffraction-based architectures showing particular potential for large-scale parallel processing. Unfortunately, the lack of on-chip reconfigurability poses significant obstacles to realizing general-purpose computing, restricting the adaptability of these architectures to diverse advanced applications. Here we propose a diffractive tensorized unit (DTU), which is a fully reconfigurable photonic processor supporting million-TOPS general-purpose computing. The DTU leverages a tensor factorization approach to perform complex matrix multiplication through clustered diffractive tensor cores, while each diffractive tensor core employs a near-core modulation mechanism to activate dynamic temporal diffractive connections. Experiments confirm that the DTU overcomes the long-standing generality and scalability constraints of diffractive computing, realizing general computing with a 10⁻⁶ mean absolute error for arbitrary 1,024-size matrix multiplications. Compared with state-of-the-art solutions, the DTU not only achieves competitive accuracy on various challenging tasks, such as natural language generation and cross-modal recognition, but also delivers a 1,000× improvement in computing throughput over conventional electronic processors. The proposed DTU represents a leap forward in general-purpose photonic computing, paving the way for further advancements in large-scale artificial intelligence.
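To make the factorized computing scheme concrete, the following NumPy sketch (our illustration, not the authors' code or the photonic implementation) emulates how a 1,024 × 1,024 matrix product can be decomposed into many small-core products and re-accumulated. The 32 × 32 core size and the simple block partition are illustrative assumptions rather than the DTU's actual mapping, and the printed mean absolute error reflects only floating-point accumulation order, not photonic noise.

```python
import numpy as np

# Illustrative sketch: split a 1,024 x 1,024 matrix multiplication into
# many small products, each the size a single "tensor core" might handle,
# then accumulate the partial results. Core size 32 is an assumption.
N, core = 1024, 32
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N)).astype(np.float32)
B = rng.standard_normal((N, N)).astype(np.float32)

C_blocked = np.zeros((N, N), dtype=np.float32)
for i in range(0, N, core):
    for j in range(0, N, core):
        for k in range(0, N, core):
            # Each small product fits within one core-sized operation.
            C_blocked[i:i + core, j:j + core] += (
                A[i:i + core, k:k + core] @ B[k:k + core, j:j + core]
            )

# Accumulating the small products reproduces the full result; the error
# here comes only from floating-point summation order.
C_ref = A @ B
print("mean absolute error:", np.mean(np.abs(C_blocked - C_ref)))
```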


Fig. 1: DTU consisting of dynamic DTCs for million-TOPS general-purpose computing.
Fig. 2: Systemic construction and evaluation of DTU networks.
Fig. 3: Large-scale DTU implementation of NLG.
Fig. 4: Large-scale DTU implementation of cross-modal recognition.
Fig. 5: Experimental on-chip validation of the DTU for flexible AI applications.

Data availability

All data required to evaluate the conclusions of this study are presented in the Article or its Supplementary Information. The data repository is further available via Dryad at https://doi.org/10.5061/dryad.7d7wm387c (ref. 67). The DTU testing sample can be provided upon signing a material transfer agreement.

Code availability

All relevant code is available from the corresponding author upon reasonable request.

References

1. Jaeger, H., Noheda, B. & van der Wiel, W. G. Toward a formal theory for computing machines made out of whatever physics offers. Nat. Commun. 14, 4911 (2023).
2. Brunner, D. & Psaltis, D. Competitive photonic neural networks. Nat. Photon. 15, 323–324 (2021).
3. Huang, C. et al. Prospects and applications of photonic neural networks. Adv. Phys. X 7, 1981155 (2022).
4. Fang, L. et al. Engram-driven videography. Engineering 25, 101–109 (2023).
5. McMahon, P. L. The physics of optical computing. Nat. Rev. Phys. 5, 717–734 (2023).
6. Shastri, B. J. et al. Photonics for artificial intelligence and neuromorphic computing. Nat. Photon. 15, 102–114 (2021).
7. Xue, Z. et al. Fully forward mode training for optical neural networks. Nature 632, 280–286 (2024).
8. Shen, Y. et al. Deep learning with coherent nanophotonic circuits. Nat. Photon. 11, 441–446 (2017).
9. Meng, X. et al. Compact optical convolution processing unit based on multimode interference. Nat. Commun. 14, 3000 (2023).
10. Feldmann, J. et al. Parallel convolutional processing using an integrated photonic tensor core. Nature 589, 52–58 (2021).
11. Ashtiani, F., Geers, A. J. & Aflatouni, F. An on-chip photonic deep neural network for image classification. Nature 606, 501–506 (2022).
12. Fyrillas, A., Faure, O., Maring, N., Senellart, J. & Belabas, N. Scalable machine learning-assisted clear-box characterization for optimally controlled photonic circuits. Optica 11, 427 (2024).
13. Wetzstein, G. et al. Inference in artificial intelligence with deep optics and photonics. Nature 588, 39–47 (2020).
14. Zhou, H. et al. Photonic matrix multiplication lights up photonic accelerator and beyond. Light Sci. Appl. 11, 30 (2022).
15. Lin, X. et al. All-optical machine learning using diffractive deep neural networks. Science 361, 1004–1008 (2018).
16. Zhou, T. et al. Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit. Nat. Photon. 15, 367–373 (2021).
17. Ambrogio, S. et al. An analog-AI chip for energy-efficient speech recognition and transcription. Nature 620, 768–775 (2023).
18. Liu, C. et al. A programmable diffractive deep neural network based on a digital-coding metasurface array. Nat. Electron. 5, 113–122 (2022).
19. Wu, T., Menarini, M., Gao, Z. & Feng, L. Lithography-free reconfigurable integrated photonic processor. Nat. Photon. 17, 710–716 (2023).
20. Zuo, C. & Chen, Q. Exploiting optical degrees of freedom for information multiplexing in diffractive neural networks. Light Sci. Appl. 11, 208 (2022).
21. Zhang, Z. et al. Space–time projection enabled ultrafast all-optical diffractive neural network. Laser Photon. Rev. 18, 2301367 (2024).
22. Luo, Y. et al. Design of task-specific optical systems using broadband diffractive neural networks. Light Sci. Appl. 8, 112 (2019).
23. Luo, X. et al. Metasurface-enabled on-chip multiplexed diffractive neural networks in the visible. Light Sci. Appl. 11, 158 (2022).
24. Kulce, O., Mengu, D., Rivenson, Y. & Ozcan, A. All-optical information-processing capacity of diffractive surfaces. Light Sci. Appl. 10, 25 (2021).
25. Hu, J. et al. Diffractive optical computing in free space. Nat. Commun. 15, 1525 (2024).
26. Rahman, M. S. S., Yang, X., Li, J., Bai, B. & Ozcan, A. Universal linear intensity transformations using spatially incoherent diffractive processors. Light Sci. Appl. 12, 195 (2023).
27. Kulce, O., Mengu, D., Rivenson, Y. & Ozcan, A. All-optical synthesis of an arbitrary linear transformation using diffractive surfaces. Light Sci. Appl. 10, 196 (2021).
28. Cheng, Y. et al. Photonic neuromorphic architecture for tens-of-task lifelong learning. Light Sci. Appl. 13, 56 (2024).
29. Xu, Z. et al. Large-scale photonic chiplet Taichi empowers 160-TOPS/W artificial general intelligence. Science 384, 202–209 (2024).
30. Gu, T., Kim, H. J., Rivero-Baleine, C. & Hu, J. Reconfigurable metasurfaces towards commercial success. Nat. Photon. 17, 48–58 (2023).
31. Yao, Y., Wei, Y., Dong, J., Li, M. & Zhang, X. Large-scale reconfigurable integrated circuits for wideband analog photonic computing. Photonics 10, 300 (2023).
32. Nemati, A., Wang, Q., Hong, M. H. & Teng, J. H. Tunable and reconfigurable metasurfaces and metadevices. Opto-Electron. Adv. 1, 1–25 (2018).
33. Qu, Y., Lian, H., Ding, C., Liu, H. & Liu, L. High-frame-rate reconfigurable diffractive neural network based on superpixels. Opt. Lett. 48, 1–4 (2023).
34. Yang, G. et al. Nonlocal phase-change metaoptics for reconfigurable nonvolatile image processing. Light Sci. Appl. 14, 182 (2025).
35. Dinsdale, N. J. et al. Deep learning enabled design of complex transmission matrices for universal optical components. ACS Photonics 8, 283–295 (2021).
36. Li, Q., Sun, Y. & Zhang, X. Single-layer universal optical computing. Phys. Rev. A 109, 053527 (2024).
37. Giamougiannis, G. et al. A coherent photonic crossbar for scalable universal linear optics. J. Light. Technol. 41, 2425–2442 (2023).
38. Yang, Y., Krompass, D. & Tresp, V. Tensor-train recurrent neural networks for video classification. In Proc. 34th International Conference on Machine Learning https://proceedings.mlr.press/v70/yang17e/yang17e.pdf (PMLR, 2017).
39. Cheng, Y., Li, G., Wong, N., Chen, H. & Yu, H. DEEPEYE: a deeply tensor-compressed neural network for video comprehension on terminal devices. ACM Trans. Embed. Comput. Syst. 19, 1–25 (2020).
40. Miscuglio, M. & Sorger, V. J. Photonic tensor cores for machine learning. Appl. Phys. Rev. 7, 031404 (2020).
41. Wang, Y. et al. An energy-efficient nonvolatile in-memory computing architecture for extreme learning machine by domain-wall nanowire devices. IEEE Trans. Nanotechnol. 14, 998–1012 (2015).
42. Cheng, Y., Wang, C., Chen, H.-B. & Yu, H. A large-scale in-memory computing for deep neural network with trained quantization. Integration 69, 345–355 (2019).
43. Krizhevsky, A. et al. Learning multiple layers of features from tiny images. University of Toronto https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (2009).
44. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009); https://doi.org/10.1109/CVPR.2009.5206848
45. Oseledets, I. V. Tensor-train decomposition. SIAM J. Sci. Comput. 33, 2295–2317 (2011).
46. Cheng, Y., Yang, Y., Chen, H.-B., Wong, N. & Yu, H. S3-Net: a fast scene understanding network by single-shot segmentation for autonomous driving. ACM Trans. Intell. Syst. Technol. 12, 1–19 (2021).
47. de Saint-Exupéry, A. The Little Prince and Letter to a Hostage (Penguin UK, 2021).
48. Rong, X. word2vec parameter learning explained. Preprint at https://arxiv.org/abs/1411.2738 (2014).
49. Graves, A., Jaitly, N. & Mohamed, A. Hybrid speech recognition with Deep Bidirectional LSTM. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding 273–278 (IEEE, 2013); https://doi.org/10.1109/ASRU.2013.6707742
50. Gesmundo, A. & Dean, J. An evolutionary approach to dynamic introduction of tasks in large-scale multitask learning systems. Preprint at https://arxiv.org/abs/2205.12755 (2022).
51. Plath, J., Sinclair, G. & Curnutt, K. The 100 Greatest Literary Characters (Bloomsbury, 2019).
52. Carroll, L. Alice's Adventures in Wonderland (Broadview Press, 2011).
53. Baum, L. F. The Wonderful Wizard of Oz (Broadview Press, 2024).
54. Abdi, H. & Williams, L. J. Principal component analysis. WIREs Comput. Stat. 2, 433–459 (2010).
55. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).
56. Wang, B. Dataset for couplets. GitHub https://github.com/wb14123/couplet-dataset (2018).
57. michaelarman. Poems Dataset (NLP). Kaggle https://www.kaggle.com/datasets/michaelarman/poemsdataset (2020).
58. Karvelis, P., Gavrilis, D., Georgoulas, G. & Stylios, C. Topic recommendation using Doc2Vec. In 2018 International Joint Conference on Neural Networks (IJCNN) 1–6 (IEEE, 2018); https://doi.org/10.1109/IJCNN.2018.8489513
59. Chen, D. & Dolan, W. B. Collecting highly parallel data for paraphrase evaluation. In Proc. 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (eds Lin, D. et al.) 190–200 (Association for Computational Linguistics, 2011).
60. Abu-El-Haija, S. et al. YouTube-8M: a large-scale video classification benchmark. Preprint at https://arxiv.org/abs/1609.08675 (2016).
61. Yang, A. et al. Vid2Seq: large-scale pretraining of a visual language model for dense video captioning. Preprint at https://arxiv.org/abs/2302.14115 (2023).
62. Liang, Y., Zhu, L., Wang, X. & Yang, Y. IcoCap: improving video captioning by compounding images. IEEE Trans. Multimed. 26, 4389–4400 (2024).
63. Xu, J., Mei, T., Yao, T. & Rui, Y. MSR-VTT: a large video description dataset for bridging video and language. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 5288–5296 (IEEE, 2016); https://doi.org/10.1109/CVPR.2016.571
64. Schuldt, C., Laptev, I. & Caputo, B. Recognizing human actions: a local SVM approach. In Proc. 17th International Conference on Pattern Recognition, ICPR 2004 https://doi.org/10.1109/ICPR.2004.1334462 (IEEE, 2004).
65. Srivastava, N., Mansimov, E. & Salakhutdinov, R. Unsupervised learning of video representations using LSTMs. Preprint at https://arxiv.org/abs/1502.04681 (2015).
66. Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
67. Wang, C. et al. Diffractive tensorized unit for million-TOPS general-purpose computing. Dryad https://doi.org/10.5061/dryad.7d7wm387c (2025).


Acknowledgements

This work is supported in part by the National Science and Technology Major Project (contract no. 2021ZD0109903), in part by the Natural Science Foundation of China (NSFC) (contract nos. 62125106, 62407026 and 62205176), in part by the Beijing Outstanding Young Scientist Program (contract no. JWZQ20240101009), and in part by the XPLORER PRIZE.

Author information

Authors and Affiliations

Authors

Contributions

L.F. initiated the project. L.F. and Q.D. supervised the project. L.F. and C.W. conceived the idea. C.W. designed the photonic integrated circuit. Y.C. performed the simulations. Z.X., C.W. and Y.C. constructed the experimental system and conducted the on-chip task validations. All the authors analysed the results. C.W., Y.C., Z.X. and L.F. prepared the manuscript with input from all the authors.

Corresponding authors

Correspondence to Qionghai Dai or Lu Fang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Photonics thanks Shengxi Huang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Modeling analysis and physical architecture of a DTC.

a, Schematic structure and b, mathematical expression of on-chip modulation technologies. c, Physical architecture of the proposed DTC, with the analysis in the inset representing two classical applications of optical devices, as preliminarily validated by the numerical simulations in d and e.

Extended Data Fig. 2 Chip layout and CMOS-compatible fabrication.

a, Designed chip with commercially available processes (including 29 layout layers). b, Layout of the CMOS-compatible chip on the SOI platform. c, Passive process layers, including the designs of optical devices, and d, active process layers, including the designs of electrical devices, with the fabricated chip partially characterized by optical photos.

Extended Data Fig. 3 Integrated optoelectronic chip testing platform.

a, Constructed multifunctional experimental platform, which supports both calibration and validation of the chip. b, Example collection of the infrared spots from the OMA through the spy GCs. c, Example collection of the converted voltages from the PDA, obtained through the signal collector. d, Developed software for setup integration and chip testing. e, Key parts of the highly integrated experimental setup. (T-laser, tunable laser; EDFA, erbium-doped fiber amplifier; OPC, optical polarization controller; FA, fiber array; EC, edge coupler; OSP, optical splitter; GCA, grating coupler array; OMA, optical modulator array; DONN, diffractive optical neural network; PDA, photon detector array; PIC, photonic integrated circuit; NIR, near-infrared spectroscopy; IVC, current–voltage converter; TIA, transimpedance amplifier; MVS, multichannel voltage signal source.)

Extended Data Fig. 4 Detailed computational framework of benchmark models with the best settings for the DTU.

Computing network structures for the applied tasks of a, word prediction, b, image classification and c, video captioning. Generally, the structure contains multimode computational blocks for diffractive propagation, including a 1-dimensional tensor core, 2-dimensional tensor chain, 3-dimensional tensor array and 4-dimensional tensor cluster. Each basic block performs near-core modulation with recurrent flows and tensor connections, which are configured with flexible input and output parameters.
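As a loose software analogue of such a chain of small cores, the sketch below builds a 256 × 256 linear layer from four tensor-train cores in the sense of ref. 45 and contrasts its parameter count with the dense equivalent. The mode sizes, ranks and dense reconstruction are illustrative assumptions, not the DTU's actual configuration or control flow.

```python
import numpy as np

# Illustrative tensor-train (TT) factorization of a linear layer:
# a chain of small cores jointly defines one large weight matrix.
modes_out = [4, 4, 4, 4]      # output dimension 4**4 = 256
modes_in = [4, 4, 4, 4]       # input dimension 4**4 = 256
ranks = [1, 3, 3, 3, 1]       # TT ranks bounding each small core

rng = np.random.default_rng(0)
cores = [
    rng.standard_normal((ranks[k], modes_out[k], modes_in[k], ranks[k + 1]))
    for k in range(len(modes_in))
]

# Contract the chain of small cores into the full 256 x 256 weight matrix.
W = np.ones((1, 1, 1))        # (output-so-far, input-so-far, rank)
for G in cores:
    W = np.einsum('oir,rmns->omins', W, G)
    W = W.reshape(W.shape[0] * W.shape[1],
                  W.shape[2] * W.shape[3],
                  W.shape[4])
W = W[:, :, 0]

x = rng.standard_normal(256)
y = W @ x                     # equivalent dense matrix-vector product

tt_params = sum(G.size for G in cores)
print("dense parameters:", W.size, "| TT parameters:", tt_params)
```

With these assumed ranks, the four small cores store only 384 parameters while defining the same 256 × 256 linear map that would otherwise need 65,536 entries, which is the kind of compression that lets small physical cores express large transformations.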

Supplementary information

Supplementary Information

Supplementary Notes 1–20, Supplementary Figs. 1–17 and Supplementary Tables 1–4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, C., Cheng, Y., Xu, Z. et al. Diffractive tensorized unit for million-TOPS general-purpose computing. Nat. Photon. 19, 1078–1087 (2025). https://doi.org/10.1038/s41566-025-01749-3


  • Received:

  • Accepted:

  • Published:

  • Issue date:

  • DOI: https://doi.org/10.1038/s41566-025-01749-3
