Abstract
Time and space constitute fundamental dimensions of physical reality, making their integrated processing crucial for advanced vision perception systems. Current visual information processing faces two limitations: von Neumann architectures impose data-transfer bottlenecks, and spatial-feature processors often disregard temporal dynamics while temporal analyzers oversimplify spatial complexity. Here we propose artificial vision hardware that enables intrinsic temporal-spatial fusion through voltage-tunable temporal differentiation with microsecond-scale resolution and photoresponse-weighted spatial compression via pixel binning. Through in-sensor spatiotemporal fusion, the architecture achieves millisecond-level latency from sensing to decision in autonomous driving scenarios, eliminating the dependence on external computing. Experimental validation demonstrates 95% recognition accuracy on a human-action database while requiring only one-tenth of the operations of conventional convolutional processing. This work facilitates physical-level spatiotemporal fusion through the co-optimization of photodetector arrays and weighted control circuits, which could fundamentally reshape machine vision architectures, with potential extensions to real-time decision systems.
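The two operations named in the abstract can be illustrated in software. The following is a minimal sketch only, not the authors' implementation: the hardware performs these steps in the analog domain inside the sensor, whereas here a frame-to-frame difference stands in for temporal differentiation and a weighted sum over non-overlapping pixel blocks stands in for photoresponse-weighted binning. The function names, the square weight kernel, and the frame representation are all illustrative assumptions.

```python
from typing import List

Frame = List[List[float]]  # assumed representation: rows of pixel intensities


def temporal_difference(prev: Frame, curr: Frame) -> Frame:
    """Per-pixel difference between two consecutive frames.

    A discrete stand-in for temporal differentiation (assumption).
    """
    return [[c - p for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def weighted_binning(frame: Frame, weights: Frame) -> Frame:
    """Compress each non-overlapping k x k block into one weighted sum.

    `weights` is a hypothetical square k x k kernel playing the role of
    per-pixel photoresponse weights.
    """
    k = len(weights)
    rows, cols = len(frame), len(frame[0])
    out: Frame = []
    for r in range(0, rows - k + 1, k):
        out_row = []
        for c in range(0, cols - k + 1, k):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += weights[i][j] * frame[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


def fuse(prev: Frame, curr: Frame, weights: Frame) -> Frame:
    """Temporal differencing followed by weighted spatial compression."""
    return weighted_binning(temporal_difference(prev, curr), weights)
```

With a 4 x 4 frame and a 2 x 2 kernel, `fuse` returns a 2 x 2 map in which only the blocks containing changed pixels are non-zero, so static background is suppressed while the output holds a quarter as many values as the input.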
Data availability
The data that support the findings of this study are presented in the paper and the Supplementary Information. Source data are provided with this paper.
Code availability
The codes that support the findings of this study are available from the corresponding authors on request.
Acknowledgements
This work was supported by the National Key Research and Development Project of China (2023YFB2806701, W.D.), the National Natural Science Foundation of China (Grants U23A20357, Y.Z.; 62334001, Y.Z.; 62305013, W.D.; and 62574019, W.D.), and the China National Postdoctoral Program for Innovative Talents (No. BX20230033, W.D.).
Author information
Contributions
Y.Z., W.D., Y.W., and Y.C. conceived the concept and designed the experiments. W.D., Y.C., and Y.Z. supervised the project. Y.W. fabricated the devices. Z.C., Y.W., R.L., C.X., Z.L., and Y.S.W. designed the weight control circuits. Y.W., J.G., C.Z., Q.R., X.M., Z.X., and Z.Z. performed the optoelectronic measurements. Y.W., D.W., K.L., X.W., Z.C., Y.C., and Y.Z. analyzed the data. Y.W. and W.D. wrote the paper. All the authors discussed the results and implications and reviewed the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Hyeok Kim, Chengkuo Lee, and Haotong Wei for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wu, Y., Deng, W., Liu, R. et al. Temporal-Spatial Fusion Vision Hardware Enables Streamlined In-Sensor Computing for Dynamic Scenes. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71907-w
DOI: https://doi.org/10.1038/s41467-026-71907-w


