Temporal-Spatial Fusion Vision Hardware Enables Streamlined In-Sensor Computing for Dynamic Scenes

Wu, Yi; Deng, Wenjie; Liu, Ruihao; Xiao, Chutian; Guo, Jianmiao; Zhu, Chaoyi; Ren, Qinqi; Li, Zehao; Wu, Yushan; Li, Kexin; Ma, Xueliang; Wang, Xiaoting; Xu, Zhangyang; Zhao, Zikang; Chen, Zhijie; Chai, Yang; Zhang, Yongzhe

doi:10.1038/s41467-026-71907-w

Article
Open access
Published: 15 April 2026

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Time and space constitute fundamental dimensions of physical reality, making their integrated processing crucial for advanced vision perception systems. Current visual information processing faces two limitations: the von Neumann architecture creates data-transfer bottlenecks, and spatial-feature processors often disregard temporal dynamics while temporal analyzers oversimplify spatial complexity. Here we propose artificial vision hardware that enables intrinsic temporal-spatial fusion through voltage-tunable temporal differentiation with microsecond-scale resolution and photoresponse-weighted spatial compression via pixel binning. Through in-sensor spatiotemporal fusion, the architecture achieves millisecond-level latency from sensing to decision in autonomous driving scenarios, eliminating external computing dependencies. Experimental validation demonstrates 95% recognition accuracy on a human action database while requiring only 1/10 of the operations of conventional convolutional processing. This work facilitates physical-level spatiotemporal fusion through the co-optimization of photodetector arrays and weighted control circuits, which could fundamentally reshape machine vision architectures, with potential extensions to real-time decision systems.
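The two operations named in the abstract, temporal differentiation and photoresponse-weighted pixel binning, can be illustrated in software with a minimal sketch. This is not the authors' hardware or code: the function names (temporal_difference, weighted_binning, fuse), the gain parameter standing in for the voltage tuning, and the toy 4 x 4 frames are all assumptions made for illustration, and the sketch does not attempt to reproduce the reported accuracy, latency, or operation counts.

```python
from typing import List

Frame = List[List[float]]


def temporal_difference(prev: Frame, curr: Frame, gain: float = 1.0) -> Frame:
    """Per-pixel difference between two frames, scaled by a tunable gain.

    The gain is a hypothetical software stand-in for the voltage tuning
    described in the abstract.
    """
    return [
        [gain * (c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def weighted_binning(frame: Frame, weights: Frame) -> Frame:
    """Compress each non-overlapping k x k block into one weighted sum."""
    k = len(weights)
    rows, cols = len(frame), len(frame[0])
    binned = []
    for r in range(0, rows - k + 1, k):
        out_row = []
        for c in range(0, cols - k + 1, k):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += weights[i][j] * frame[r + i][c + j]
            out_row.append(acc)
        binned.append(out_row)
    return binned


def fuse(prev: Frame, curr: Frame, weights: Frame, gain: float = 1.0) -> Frame:
    """Temporal differentiation followed by weighted spatial binning."""
    return weighted_binning(temporal_difference(prev, curr, gain), weights)


# Toy example: a 4 x 4 scene in which two pixels change between frames.
prev_frame = [[0.0] * 4 for _ in range(4)]
curr_frame = [[0.0] * 4 for _ in range(4)]
curr_frame[0][0] = 1.0
curr_frame[3][3] = 2.0
uniform_weights = [[0.25, 0.25], [0.25, 0.25]]

fused = fuse(prev_frame, curr_frame, uniform_weights)
print(fused)  # [[0.25, 0.0], [0.0, 0.5]]
```

In this sketch, static regions cancel in the difference step, so only the changed pixels contribute to the compressed 2 x 2 output; non-uniform weights would emphasize particular positions within each bin.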

Data availability

The data that support the findings of this study are presented in the paper and the Supplementary Information. Source data are provided with this paper.

Code availability

The codes that support the findings of this study are available from the corresponding authors on request.

Acknowledgements

This work was supported by the National Key Research and Development Project of China (2023YFB2806701, W.D.), the National Natural Science Foundation of China (Grants U23A20357, Y.Z.; 62334001, Y.Z.; 62305013, W.D.; and 62574019, W.D.), and the China National Postdoctoral Program for Innovative Talents (No. BX20230033, W.D.).

Author information

These authors contributed equally: Yi Wu, Wenjie Deng.

Authors and Affiliations

State Key Laboratory of Materials Low-Carbon Recycling, College of Materials Science and Engineering, Beijing University of Technology, Beijing, 100124, China
Yi Wu & Yongzhe Zhang
Key Laboratory of Optoelectronics Technology of Education Ministry of China, School of Integrated Circuits, Beijing University of Technology, Beijing, 100124, China
Yi Wu, Wenjie Deng, Ruihao Liu, Chutian Xiao, Zehao Li, Yushan Wu, Kexin Li, Xueliang Ma, Xiaoting Wang, Zhangyang Xu, Zikang Zhao, Zhijie Chen & Yongzhe Zhang
Department of Applied Physics, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
Yi Wu, Jianmiao Guo, Chaoyi Zhu, Qinqi Ren & Yang Chai

Contributions

Y.Z., W.D., Y.W., and Y.C. conceived the concept and designed the experiments. W.D., Y.C., and Y.Z. supervised the project. Y.W. fabricated the devices. Z.C., Y.W., R.L., C.X., Z.L., and Y.S.W. designed the weight control circuits. Y.W., J.G., C.Z., Q.R., X.M., Z.X., and Z.Z. performed the optoelectronic measurements. Y.W., D.W., K.L., X.W., Z.C., Y.C., and Y.Z. analyzed the data. Y.W. and W.D. wrote the paper. All the authors discussed the results and implications and reviewed the paper.

Corresponding authors

Correspondence to Wenjie Deng, Zhijie Chen, Yang Chai or Yongzhe Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Hyeok Kim, Chengkuo Lee, and Haotong Wei for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wu, Y., Deng, W., Liu, R. et al. Temporal-Spatial Fusion Vision Hardware Enables Streamlined In-Sensor Computing for Dynamic Scenes. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71907-w

Received: 19 June 2025
Accepted: 01 April 2026