Introduction

Modern sensor technology now surpasses human perceptual capabilities in sensitivity, range, and specificity across diverse modalities, including vision, chemical, and tactile sensing1,2. These technological advancements have enabled transformative applications in robotics, Internet of Things (IoT) networks, and biomedical systems3,4,5,6. However, despite the substantial volume of data generated by multimodal sensors, conventional computing architectures remain predominantly centralized, with primary computation performed on central processing units (CPUs) and graphics processing units (GPUs), often necessitating cloud-based resources for complex tasks (Fig. 1a)7,8,9,10,11,12,13. This centralized approach induces substantial bottlenecks as data are repeatedly converted and transmitted between sensors, memories, and computing units14,15,16,17.

Fig. 1: Overview of conventional computing and sensor edge computing architectures.

a Conventional von Neumann computing architecture in which sensor outputs are digitized, transferred to discrete memory banks, and subsequently processed by centralized computing units. This arrangement exhibits inherent data-transfer bottlenecks between the sensor, memory, and computing units. b Sensor edge computing system in which computation is relocated to or immediately adjacent to the sensing element. In-sensor computing embeds lightweight processing directly within each pixel, while near-sensor architectures co-locate computing units adjacent to the sensor array. These strategies perform computation tasks at the source, thereby minimizing data transfer and dramatically reducing power consumption.

Biological sensory systems, in contrast, process information directly at the sensory interface, utilizing synapses and dedicated neural circuits to encode and filter stimuli before conveying information to central processing regions18,19. For instance, retinal ganglion cells in the human visual system are adept at detecting edges and contrasts, efficiently pre-processing visual input prior to signal transmission to the brain16,20,21,22. Inspired by such biological processing, in- and near-sensor computing has emerged as a paradigm to overcome the inherent inefficiencies of conventional centralized computing architectures23,24,25,26,27. By integrating processing capabilities directly within or near the sensors, this decentralized approach enables task-specific computation at the data source, reducing energy consumption, latency, and bandwidth requirements. Moreover, it enhances data privacy and enables real-time decision-making in sensor-rich applications such as IoT networks, biomedical interfaces, and autonomous systems7,9,10,11.

Edge processing at the sensor level can be broadly categorized into in-sensor computing and near-sensor computing (Fig. 1b)23,24,28. In-sensor computing embeds computation capabilities directly within sensor pixels, either by integrating analog computational units or by incorporating multifunctional materials that simultaneously sense and process signals16,26,29. Near-sensor computing employs dedicated analog or digital computing architectures positioned in close proximity to the sensors, thereby minimizing data transfer across physical off-chip interfaces between sensor, memory, and computing units26,30,31,32,33,34. Achieving these paradigms requires interdisciplinary advances across materials science, device engineering, circuit design, system architecture, and hardware-software co-designed algorithms.

In this perspective, we provide a comprehensive overview of in- and near-sensor computing, encompassing advancements in materials, devices, circuit architectures, and algorithmic frameworks. We then present applications where computing at the sensor edge demonstrates significant practical benefits, including biomedical monitoring, autonomous systems, and artificial intelligence (AI)-driven IoT platforms. Finally, we discuss the key challenges associated with material integration, large-scale deployment, and real-world implementation, providing insights into future research directions.

Overview of in-sensor and near-sensor computing

In-sensor computing integrates computational functionality directly into the sensor, merging sensing and processing into a single unit. This approach enables simultaneous data access and computation at the point of data acquisition. In-sensor computing primarily employs analog computational methods that process raw sensor output with minimal or no analog-to-digital conversion (ADC)25. Recent advances in analog compute-in-memory (CIM) technologies, particularly those employing non-volatile memories such as memristors and field-effect transistor (FET)-based memory architectures, have provided an effective platform for in-sensor computing17,29,35,36,37,38,39. These components offer programmable channel conductance capable of emulating synaptic plasticity, thereby constructing artificial neurons or synapses that process sensory information in a manner analogous to biological systems. The circuit implementations typically include analog processing arrays supporting parallel multiply-accumulate (MAC) operations essential for neural network computations. On the algorithmic front, in-sensor computing employs event-driven processing methods and specialized neural network architectures designed for real-time feature extraction and preliminary inference within sensor pixels26,40,41. Consequently, in-sensor computing is generally deployed for pre-processing, or task-specific computations that are subsequently refined through post-processing.

Near-sensor computing employs dedicated computing units in proximity to sensors, preserving a clear separation between the sensing and computing functions while enabling immediate local data processing. Such systems are designed with computational structures and integrated cache memories that minimize external memory access while efficiently executing data processing or matrix–vector operations fundamental to neural network processing42,43. However, the inherent limitation of on-chip memory in near-sensor architectures necessitates rigorous algorithm optimization. Techniques such as sparse coding, quantization, event-driven processing, weight compression, and pruning are employed to facilitate the deployment of complex models within these constrained environments. These strategies enable the efficient execution of advanced algorithms on hardware with limited resources, thereby enhancing performance and reducing energy consumption. Additionally, federated learning approaches, which distribute training across multiple sensor nodes, offer a promising route to preserve data privacy and mitigate communication overhead between sensors and central processors7.
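To make two of the compression techniques mentioned above concrete, the following Python sketch shows symmetric 8-bit weight quantization and magnitude pruning as they might be applied before deploying a model to a memory-constrained near-sensor processor. The array shapes, sparsity target, and random weights are illustrative only.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor 8-bit quantization: scale weights so the
    largest magnitude maps to 127, round to integers, and keep the
    scale factor needed to dequantize."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w), axis=None)[k]
    return np.where(np.abs(w) >= thresh, w, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)

q, s = quantize_int8(w)
w_hat = q.astype(np.float32) * s      # dequantized approximation
w_sparse = magnitude_prune(w, 0.5)    # at least half the weights zeroed
```

Storing `q` instead of `w` cuts weight memory by 4x relative to float32, and the zeros in `w_sparse` can be skipped entirely by sparse-aware hardware, which is why these techniques are staples of edge deployment.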

In-sensor computing

As sensor arrays scale in resolution and complexity, the energy and bandwidth required to transfer raw data to external processors become significant constraints. In-sensor computing addresses these limitations by integrating both sensing and processing functions on a single platform, thereby enabling energy-efficient and real-time data processing. Recent approaches include integrating distinct sensing and computing components within the pixels of a sensor array, and developing monolithic computational sensors in which multifunctional materials simultaneously perform both sensing and computing within a single physical device (Fig. 2). These integrative devices utilize various materials, including metal oxides, organics, perovskites, and 2D materials, to support modalities such as vision, tactile sensing, olfaction, and acoustics. In the following sections, we present two architectural approaches for in-sensor computing: heterogeneously integrated sensors with computing units and monolithic computational sensors, along with appropriate material selections for each implementation.

Fig. 2: Overview of materials and device architectures for near-/in-sensor computing.

Functional material platforms, including 2D semiconductors, organic materials, perovskites, and metal oxides, enable sensing, computing, and memory storing at the sensor interface. In heterogeneously integrated pixels, discrete sensing and computing elements are collocated within each array unit, enabling the direct routing of stimulus-dependent signals to local processing nodes. Monolithic computational sensors employ multifunctional materials that provide simultaneous sensing and computing functions within a single device. Specific materials are chosen based on the target modality, such as pressure, light, or chemical signals, to optimize sensitivity, dynamic range, and operating bandwidth for each application domain.

Heterogeneously integrated sensors with computing units

Implementation of in-sensor computing usually involves the heterogeneous integration of a memory component for CIM directly with a sensor, which allows the sensing output (e.g., photocurrent or piezoelectric charge) to be used to program or modulate the memory device. Memory devices for this application are generally two-terminal or three-terminal devices, with their selection tailored to meet specific computational and functional requirements.

Two-terminal non-volatile resistive-memory devices, based on transition metal oxides (e.g., HfOx, CuOx, TiOx, and TaOx)17,44,45,46,47,48,49,50,51,52 and 2D materials (e.g., MoS2 and graphene)53,54,55, retain their programmed states by converting transient electrical signals into persistent resistance states, enabled by material-level structural changes (Fig. 3a). This resistance-switching process involves various mechanisms, including the reversible formation and dissolution of conductive filaments, typically composed of oxygen vacancies or metal ions, or stoichiometric changes within the material layer. An applied bias drives local ion migration to form low-resistance paths, while polarity reversal of the applied voltage dissolves these conductive filaments, restoring the high-resistance state. Owing to this reversible non-volatile switching within a simple two-terminal structure, these devices can form dense memristive crossbar arrays that enable massive in situ MAC operations according to Ohm’s and Kirchhoff’s laws.
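The crossbar MAC operation can be sketched numerically: each cell passes a current I = G·V (Ohm’s law), and the currents summed along a shared column wire (Kirchhoff’s current law) yield one element of a vector-matrix product. The conductance and voltage values below are illustrative, not taken from any reported device.

```python
import numpy as np

def crossbar_mac(voltages, conductances):
    """Analog multiply-accumulate in a memristive crossbar.

    Each cell contributes a current I = G * V (Ohm's law); currents on
    a shared column wire sum (Kirchhoff's current law), so the vector
    of column currents equals the vector-matrix product V @ G.
    """
    return voltages @ conductances

# Illustrative 2x3 crossbar: two word lines, three bit lines
G = np.array([[1e-6, 2e-6, 0.5e-6],
              [3e-6, 1e-6, 2.0e-6]])  # programmed cell conductances [S]
V = np.array([0.2, 0.1])              # read voltages on word lines [V]
I = crossbar_mac(V, G)                # summed column currents [A]
```

Because the multiply and accumulate both happen in the physics of the array, the entire vector-matrix product completes in one read step, independent of matrix size, which is the source of the energy advantage over digital MAC units.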

Fig. 3: Resistance switching mechanisms and emerging three-terminal architectures for in-sensor computing.

a Schematic illustration of the four primary switching mechanisms employed in memristive devices: filamentary switching, via the formation and dissolution of conductive filaments driven by oxygen vacancy or metal ion migration; interfacial switching, through modulation of Schottky or tunneling barriers at material interfaces; ferroelectric switching, where polarization reversal in a ferroelectric layer modulates channel conductance; and phase-change switching, involving electric-field-induced transitions between crystalline and amorphous phases with distinct resistivities. b Representative three-terminal memristive device architectures. A ferroelectric field-effect transistor (FeFET), in which a ferroelectric gate insulator enables non-volatile control of channel conductance, and a Mott field-effect transistor (Mott FET), which utilizes electric-field-driven insulator-to-metal phase transitions, yielding hysteretic and tunable resistance states.

Volatile two-terminal resistive memories, in contrast, exhibit transient conductance changes56 and are typically fabricated from metal oxides (e.g., SiO2, VO2, TiOx, and NbOx)57,58,59,60,61,62 and 2D materials (e.g., h-BN, WSe2, and MoS2)54,63,64. Their volatile characteristics arise from various mechanisms, including Mott, diffusive, and capacitive transitions, depending on material composition and structure56. For example, temporary conductive filaments can form via migration of oxygen vacancies toward the top electrode under an applied electrical bias, enabling rapid electron injection and transient conduction. During filament formation, Joule heating generated by current flow enhances this migration process by increasing ionic mobility65, while localized stress at the electrode-oxide interface also promotes oxygen vacancy migration into the filament66. Upon removal of the external bias, thermal dissipation weakens the conductive filaments. At the same time, stress relaxation at the interface accelerates oxygen vacancy redistribution, causing these filaments to decay spontaneously and returning the device to a high-resistance state67. This volatility is governed by thermal, mechanical, and electrochemical relaxation processes, with partial filament remnants possibly leaving residual conductance. Such volatile resistive memories provide a physical basis for spatiotemporal computing paradigms.

For in-sensor computing, these two-terminal resistive-memory elements enable highly compact integration and efficient current-mode analog computing directly at the sensor interface68. When the non-volatile resistive memories are incorporated at the pixel level, the stimulus-induced output from the sensor biases the memories, thereby modulating their conductance without the need for ancillary peripheral circuits. For instance, in the one-photodiode–one-resistive-memory (1P-1R) architecture, the photocurrent from the photodiode immediately programs the conductance state of its paired resistive memory, corresponding to the illumination intensity. This configuration unifies sensing and analog programming, eliminating the need for ADCs or external memory modules and enabling in-array computation within resistive-memory crossbar networks. Such in-sensor computing arrays support massively parallel vector-matrix operations, enabling direct image encoding, associative memory functions, and bio-stimulus domain reduction, all within the analog domain, with energy consumption on the order of millijoules per inference16,69.
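A minimal model of the 1P-1R programming step can be written in a few lines, under the simplifying assumption of a linear mapping from illumination intensity to conductance; real devices follow nonlinear, history-dependent programming curves, and the conductance bounds and read voltage below are assumed values.

```python
import numpy as np

def program_1p1r(intensity, g_min=1e-7, g_max=1e-5):
    """Map normalized illumination (0..1) at a photodiode to the
    conductance state of its paired resistive memory (1P-1R pixel).
    The linear mapping is an illustrative assumption only."""
    x = np.clip(intensity, 0.0, 1.0)
    return g_min + x * (g_max - g_min)

# A 2x2 "image" is encoded directly as an array of conductance states,
# with no ADC or external memory in the loop.
image = np.array([[0.00, 0.50],
                  [1.00, 0.25]])
G = program_1p1r(image)

# Reading the array with a common bias performs in-array weighting.
V_READ = 0.1                 # read voltage [V] (assumed)
I = G * V_READ               # per-pixel read currents [A]
```

The key point the sketch captures is that the stored state `G` *is* the processed representation of the stimulus: subsequent crossbar operations act on it directly in the analog domain.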

Volatile resistive memories can also be integrated with sensory interfaces to perform temporal feature extraction or dynamic encoding of stimulus patterns, leveraging the intrinsic short-term memory and nonlinear dynamics of these devices. For instance, the photocurrent generated by a photodiode under an optical stimulus directly drives the formation of a transient conductive filament (CF) in a volatile resistive memory, thereby creating an analog memory trace of light intensity or motion that decays over time. The inherent short-term memory characteristics of the device allow the encoding of stimulus duration or frequency directly within the sensor array, obviating the need for digital conversion or external storage. In the context of reservoir computing (RC), volatile memristors serve as physical reservoirs, transforming time-varying inputs from the sensors into high-dimensional representations via stimulus-dependent conductance relaxation70,71. Within spiking neural network (SNN) architectures, these devices emulate dynamic synaptic elements or leaky-integrate-and-fire neurons in the sensory neurons by exhibiting short-term plasticity and transient conductance decay, supporting spike-timing-dependent encoding and event-driven processing for tasks such as motion detection and audio recognition72. This in-sensor neuromorphic paradigm enables functions such as event-driven vision, adaptive temporal filtering, and motion recognition at the device level; therefore, it achieves substantial energy saving by eliminating the overhead of analog-to-digital conversion and external memory access41.
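The short-term memory behavior described above can be captured by a toy first-order relaxation model: each stimulus pulse partially potentiates the conductance, which then decays spontaneously toward the resting state. The time constants, conductance bounds, and potentiation fraction below are illustrative, not fitted to any reported device.

```python
import numpy as np

def volatile_trace(pulses, dt=1e-3, tau=5e-3,
                   g_rest=1e-7, g_max=1e-5, eta=0.3):
    """Toy first-order model of a volatile (diffusive) memristor.

    Each input pulse potentiates the conductance toward g_max by a
    fraction eta (partial filament formation); between samples the
    filament relaxes exponentially toward the resting high-resistance
    state with time constant tau. All parameters are illustrative.
    """
    g = g_rest
    trace = []
    for p in pulses:
        if p:
            g += eta * (g_max - g)                     # potentiation
        g = g_rest + (g - g_rest) * np.exp(-dt / tau)  # spontaneous decay
        trace.append(g)
    return np.array(trace)

# A pulse burst leaves a larger residual conductance than a lone pulse,
# so the read-out state encodes stimulus frequency and history -- the
# property exploited in reservoir computing and short-term plasticity.
burst = volatile_trace([1, 1, 1, 1, 0, 0])
lone = volatile_trace([1, 0, 0, 0, 0, 0])
```

In a reservoir-computing setting, many such fading traces driven by different input streams form the high-dimensional state that a simple trained readout then classifies.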

Three-terminal memory devices decouple sensing and programming pathways, thereby avoiding state-drift and reliability issues inherent to two-terminal architectures (Fig. 3b). These devices are typically implemented as FET structures with tunable materials integrated at the gate or drain/source terminals. In ferroelectric FETs (FeFETs)73,74, a ferroelectric layer (e.g., HZO, PVDF, AlXN, and 2D materials)37,75,76,77,78,79,80,81 is incorporated into the gate stack, enabling non-volatile polarization switching that shifts the threshold voltage of the transistor. This results in persistent modulation of the channel conductance without the need for auxiliary memory or logic units. Alternatively, three-terminal Mott FETs incorporate Mott materials such as VO2 at the drain, wherein bias-induced insulator-to-metal transitions produce hysteretic resistance changes while the gate independently controls channel conductance82,83,84. Such Mott devices exploit the disparate threshold voltages for the on-to-off and off-to-on transitions, resulting in a hysteresis window that functions as an intrinsic memory element. By decoupling the memory element from the sensing terminal, these three-terminal architectures allow sensor-derived signals to program the memory state directly while preserving a separate read-out path, thereby avoiding signal degradation and improving endurance. This offers a versatile platform for in-sensor computing where sensing, memory, and processing units are collocated in a single device footprint.

These three-terminal devices also support direct coupling with the sensing layer to achieve compact and adaptive in-sensor learning platforms. For instance, FeFETs can incorporate photosensitive layers wherein photogenerated carriers modulate ferroelectric polarization in the gate stack, allowing light-induced threshold shifts and channel conductance tuning without separate control circuitry85,86,87. Moreover, recent 2D material-based integration techniques enable hybrid 2D perovskite-ferroelectric structures, which show improved stability and compositional tunability compared with 3D halide perovskites. These structures combine 2D Ruddlesden–Popper (RP) perovskites (such as Cs2SnI2Cl2) with 2D ferroelectric materials (such as α-In2Se3) in a layered configuration, inducing unique optoelectronic interfaces with engineered band alignment. In these heterostructures, the RP perovskite acts as a light absorber that modulates the polarization states of the ferroelectric layer, thereby enabling direct optical gating of the transistor88. These heterogeneously integrated FeFET architectures combine the high responsivity of perovskites with polarization control, in which environmental stimuli directly program memory states and enable stimulus-driven processing at the sensor edge.

Emerging multifunctional materials for monolithic in-sensor computing

Recent advances in materials engineering have enabled monolithic computational sensors, in which sensing and processing co-occur within the same device, or even a single functional layer, rather than relying on discrete sensor and compute elements. To realize such unified sensor-compute devices, the constituent materials must combine high sensitivity to the targeted stimulus, electronically tunable properties (for instance, via band-structure or defect-state engineering) that permit on-site signal modulation, and the capability to store non-volatile or metastable states so as to encode memory directly in situ. By eliminating the interconnects between sensing and computing units, monolithic architectures avoid the spatial overhead and fabrication complexity of heterogeneously integrated systems, converting external inputs directly into processed information without requiring separate computing elements.

Organic materials have been employed for in-sensor computing due to their synthetically tunable molecular structure that allows precise engineering of energy levels, bandgaps, and charge transport pathways89,90,91. This structural flexibility allows direct modulation of electronic properties in response to optical, chemical, or mechanical stimuli through mechanisms including charge generation and transport modulation92,93,94,95,96. For instance, organic mixed ionic–electronic conductors (OMIEC) with π-conjugated backbones facilitate transduction between ionic and electronic signals to be used for chemical sensing applications89,97,98,99. Thus, OMIEC-based organic electrochemical transistors (OECTs) are capable of multimodal sensing, memory, and computational tasks within a single device100,101. By controlling ionic doping within their crystalline-amorphous microstructures, OECTs switch between transient sensing mode and in-memory computing mode, thereby unifying receptor and synaptic functionalities. Multifunctional organic frameworks further extend this paradigm by offering tunable responses conducive to integrated sensing-computing operation. Such advances have enabled monolithic organic sensor-compute layers that execute on-device pattern recognition. For example, ionogels with gas-solvating abilities can efficiently capture chemical species, while conducting polymers transduce these interactions into measurable electrical signals102. By combining such materials, monolithic systems can integrate chemical sensing with signal processing, enabling chemosensory computing. However, limitations such as environmental instability and inherently low charge mobility restrict the long-term stability and scalability of organic systems relative to their inorganic counterparts. Nonetheless, the vast compositional diversity and solution-processable nature of organic materials provide a compelling route toward compact and low-power in-sensor computing platforms.

Perovskite materials, defined by ABX₃ stoichiometry, constitute a highly tunable class of compounds whose physical and electronic properties can be systematically engineered through targeted substitution at the A, B, or X lattice sites. Among these positions, the X-site anion not only determines the perovskite family, yielding oxides when X = O and halides when X = Cl, Br, or I, but also governs critical properties such as bandgap, charge carrier mobility, ion diffusivity, and optical absorption, thereby setting the responsiveness of the material to external stimuli. Oxide perovskites are characterized by ferroelectricity, resistive switching behavior, and high environmental stability103. Moreover, their wide bandgaps make them inherently responsive to ultraviolet (UV) illumination104, which facilitates optical signal modulation via polarization-induced photoconduction and vacancy-modulated conduction pathways105,106. Halide perovskites, in contrast, combine strong photoresponsivity with mixed ionic–electronic conduction, supported by their soft lattice and defect-tolerant structure107. Upon illumination, they generate photocarriers and facilitate halide ion migration, leading to dynamic and history-dependent conductance modulation. These properties allow direct mapping of optical input into electrical states, making halide perovskites highly suitable for optoelectronic in-sensor computing108,109,110,111,112,113. Despite this promise, perovskite materials face several challenges that hinder practical implementation, including environmental instability, limited compatibility with conventional photolithographic and solvent-based processing, and device-to-device variability. In particular, halide perovskites are highly sensitive to moisture and light, which compromises long-term stability114. Addressing these limitations requires advances in material design, encapsulation strategies, and processing techniques to ensure reliable performance under real-world operating conditions.

Two-dimensional (2D) materials, whether as monolayers or few-layer van der Waals stacks, offer unique advantages for in-sensor computing115,116. Their dangling bond-free surfaces and atomic-scale thickness enable strong coupling with external stimuli such as light, ions, or molecules117,118,119,120. These inputs directly modulate charge transport across the entire channel, rather than being confined to interface regions as in bulk materials. Consequently, field-driven mechanisms such as photogating, ionic gating, and electrochemical doping can be seamlessly integrated, unifying sensing and processing within a single device118,121,122. Beyond electrostatic modulation, many 2D semiconductors possess direct bandgaps within the visible range and exhibit strong light–matter interactions and high exciton binding energies, enabling efficient photocarrier generation even under low illumination123,124. While these properties support sensitive optical detection, monolithic 2D systems usually lack intrinsic computational functionality, motivating vertical or lateral 2D heterostructures that spatially separate receptor and processing functions within stacked van der Waals architectures117,120,125,126. Despite their promise, practical deployment of 2D materials in in-sensor computing is hindered by material-level imperfections such as grain boundaries, interface disorder, and defect fluctuations, which induce device-to-device performance variation. Moreover, the intrinsically low optical absorption of monolayer materials limits responsivity, necessitating photonic enhancement strategies such as plasmonic coupling, optical cavities, or multilayer stacking to ensure sufficient signal strength. Addressing these challenges will be critical to unlocking the full potential of 2D materials for compact, high-performance in-sensor computing platforms.

The implementation of 2D materials in computing systems has been constrained by the need for high-quality single-crystal films and the challenges associated with their large-scale synthesis and integration. Recent advances, however, have enabled wafer-scale, high-throughput growth of single-crystal 2D materials, overcoming previous manufacturing bottlenecks127. While transfer methods still require refinement, the ability to fabricate continuous and high-quality films at wafer scale has significantly accelerated research into device architectures based on 2D materials. In parallel, metal halide perovskites have emerged as attractive candidates due to their compatibility with large-area, low-cost manufacturing, although they suffer from environmental instability. Perovskites now benefit from advanced passivation and encapsulation techniques, enabling devices to retain their performance after extended operation, with demonstrated stability exceeding several years128. Additionally, their successful monolithic integration with CMOS technologies further broadens their potential applications129.

Despite these promising developments, critical challenges remain, including device-to-device performance variability and the complexity of integrating multimodal control within individual pixel elements. Such complexity can adversely affect energy efficiency; thus, it requires continued innovation in materials engineering and system-level design. To advance toward scalable and reliable in-sensor computing platforms, future efforts must focus on improving intrinsic material stability, enhancing interface and defect control, and developing fabrication processes compatible with industry standards. Through such multidisciplinary refinements, in-sensor computing can fulfill its promise of real-time, energy-efficient intelligence at the edge.

Near-sensor processing for energy-efficient edge processors

Near-sensor computing architectures place dedicated post-processing and artificial intelligence (AI)-inference modules in close proximity to the sensor33, thereby minimizing data movement and enabling local feature extraction and real-time decision-making without dependence on a cloud server. These systems have evolved from simple low-level pre-processing to fully realized on-device AI computation, supported by optimized solid-state circuits and architectures that balance computational efficiency, power consumption, and integration feasibility. Central to this platform are hardware accelerators, such as neural processing units and CIM arrays, that execute matrix–vector operations with minimal latency and power overhead, further enhancing energy efficiency at the edge. Moreover, embedding on-chip learning capabilities endows the sensing system with adaptive behaviors, allowing models to be updated in situ based on local stimuli without the need to transmit raw data to cloud servers.

Computing architectures and solid-state circuits for machine learning

One of the primary motivations for near-sensor computing is the reduction of power and latency costs associated with continuous data transfer between sensor and central computing units (Fig. 4). As sensor resolutions and data rates increase, traditional von Neumann architectures suffer from a memory wall, where energy-expensive data movement dominates system budgets. By relocating computing elements adjacent to the sensor, either within the memory hierarchy (near-memory) or inside the memory devices themselves (in-memory), near-sensor approaches dramatically enhance energy efficiency.

Fig. 4: Energy efficiency across computing architectures.

Relative energy and computational efficiency of three architectural classes are illustrated with a focus on memory-bound applications such as neural networks. The bottom section shows the traditional von Neumann architecture, where significant energy is consumed on transferring data between computing and memory units. The middle section highlights emerging near- and in-memory computing architectures that collocate storage and logic to reduce off-chip data transfer overheads. The top section introduces brain-inspired, neuromorphic computing, which exploits event-driven spike-based data processing at the sensor, capable of dramatically minimizing data communication.

In near-memory computing, computational and storage units are partitioned into compact modules and collocated to minimize the energy overhead associated with transferring data between source and sink. This approach retains the same cells employed in traditional von Neumann architectures, allowing compatibility with existing dataflow designs and digital circuit design tools. A classic example of near-memory computing is the systolic array, which interleaves input/output storage with computation units to significantly reduce the energy cost of data movement. This architecture excels at repetitive operations with regular dataflow patterns and in matrix-centric applications, such as matrix multiplication130 and deep convolutional neural networks131. Moreover, by adopting weight-stationary or pipelined configurations, systolic arrays further reduce data transfer132 and elevate throughput, surpassing a traditional von Neumann machine. However, because systolic arrays operate in the digital domain, an inherent separation between analog input and digital processing imposes an energy penalty through signal transformation using analog-to-digital/digital-to-analog converters (ADC/DAC) and data transfers.
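The column-wise accumulation of a weight-stationary systolic array can be emulated at a high level as follows; this sketch models only the MAC dataflow (one accumulation per processing-element row), not the cycle-accurate skewed input timing or pipeline registers of a real array.

```python
import numpy as np

def systolic_matmul(A, W):
    """High-level emulation of a weight-stationary systolic array
    computing A @ W. PE (k, n) permanently holds weight W[k, n]; a
    partial sum enters the top of each column and accumulates one MAC
    per PE row as it flows downward toward the output."""
    M, K = A.shape
    K2, N = W.shape
    assert K == K2
    out = np.zeros((M, N))
    for i in range(M):                    # activation rows stream through
        psum = np.zeros(N)                # partial sums entering column tops
        for k in range(K):                # one PE row traversed per step
            psum = psum + A[i, k] * W[k]  # MAC at stationary PE row k
        out[i] = psum
    return out

A = np.arange(6, dtype=float).reshape(2, 3)
W = np.arange(12, dtype=float).reshape(3, 4)
out = systolic_matmul(A, W)
```

Because the weights never move, the only data in flight are activations and partial sums, each touched exactly once per PE, which is what makes the weight-stationary configuration attractive for reducing data-transfer energy.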

In contrast, in-memory computing architectures embed computation directly within memory, such as dynamic random-access memory (DRAM), static random-access memory (SRAM), and emerging analog memory technologies, to perform vector-matrix multiplications in situ, thereby eliminating interconnect energy costs. SRAM-based designs leverage multi-bit pulse-width modulation133 or gate modulation techniques to execute analog MAC operations entirely within the memory array134, and state-of-the-art 3 nm process nodes have shown significant improvements in tera operations per second per watt (TOPS/W) efficiency for such systems135. Additionally, non-volatile memory devices, including floating-gate transistors, resistive random-access memory (ReRAM) crossbars, and FeFET arrays, have been explored to further increase computational density by storing multi-bit weights in situ while delivering low-current analog readout that further reduces energy consumption136,137,138. These advances confirm the transformative potential of coupling memory and logic at the physical level to achieve ultra-low-power and high-throughput AI acceleration at the edge139,140.
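An in-memory vector-matrix multiply with analog readout can be sketched as below, including the quantization imposed by a shared ADC on the bitline currents. The read voltage, conductances, and ADC resolution are assumed values for illustration; the uniform full-scale ADC model simply shows how readout resolution bounds output precision.

```python
import numpy as np

def cim_vmm(x, G, v_read=0.2, adc_bits=8):
    """Sketch of an analog in-memory vector-matrix multiply.

    Inputs scale the read voltages, weights are stored as cell
    conductances G, and the resulting bitline currents (summed by
    Kirchhoff's current law) are digitized by a shared ADC whose
    step size depends on its bit resolution.
    """
    i_analog = (x * v_read) @ G                  # bitline currents [A]
    full_scale = np.max(np.abs(i_analog))
    if full_scale == 0:
        return i_analog
    lsb = full_scale / (2 ** adc_bits - 1)       # ADC step size
    return np.round(i_analog / lsb) * lsb        # quantized readout

x = np.array([0.5, 1.0])                         # normalized inputs
G = np.array([[1e-6, 2e-6],
              [2e-6, 1e-6]])                     # conductances [S]
i_digital = cim_vmm(x, G)
i_ideal = (x * 0.2) @ G                          # ideal analog result
```

The ADC step dominates the error budget here, which is why ADC energy and precision are recurring design constraints in CIM macros.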

Neuromorphic computing architectures are inspired by the event-driven nature of biological neural systems and offer a promising pathway toward energy-efficient, real-time information processing. By emulating the operational principles of neurons and synapses, where spikes trigger state changes and information propagation, these architectures achieve adaptability, sparsity, and real-time learning capability that closely mirror their biological counterparts. These processors remain quiescent until an input event occurs, at which point computation is activated, thereby minimizing static power dissipation40. Neuromorphic systems span both fully digital and mixed-signal implementations. Digital neuromorphic systems usually employ asynchronous design techniques, eliminating the need for global clocking. In mixed-signal neuromorphic architectures, analog neuron circuits are integrated with non-volatile memory devices to perform local computation directly at the memory cell level, delivering high density and reduced area footprints. These neuromorphic cores interface seamlessly with sensors, such as dynamic vision cameras that emit spikes only upon pixel brightness changes40 or silicon cochleae that transduce acoustic inputs into asynchronous spike trains, eliminating the need for energy-costly analog-to-digital conversion. This event-driven design paradigm allows real-time processing with minimal power consumption.

At the device level, the artificial neuron constitutes the fundamental processing element. Implementations based on CMOS analog circuits, digital circuits, and non-Si devices141 map key neural behaviors, such as integration, thresholding, and spiking, into hardware primitives. Analog CMOS neurons are preferred for their superior area and energy efficiency, as well as their ability to naturally capture the temporal dynamics of neuronal activity. Spiking neuron models commonly realized in hardware include the integrate-and-fire, leaky integrate-and-fire142, and Hodgkin-Huxley models. Additionally, the integration of CMOS-compatible non-volatile memories such as floating-gate transistors and ReRAM will facilitate tight co-placement of synaptic weight storage with neuronal circuits for fully integrated and adaptive neuromorphic systems that encode both synaptic and neuronal states on chip142,143.
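The leaky integrate-and-fire behavior mentioned above can be captured in a few lines of discrete-time simulation. This is a minimal numerical sketch, not a circuit model; the time constant, threshold, and drive current are illustrative values.

```python
import numpy as np

def lif_spikes(I, dt=1e-3, tau=20e-3, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire sketch: the membrane potential leaks toward
    zero with time constant tau, integrates the input drive I, and emits a
    spike (then resets) whenever it crosses the threshold v_th."""
    v, spikes = 0.0, []
    for t, i_t in enumerate(I):
        v += dt / tau * (-v + i_t)   # leaky integration (forward-Euler step)
        if v >= v_th:
            spikes.append(t)         # record spike time index
            v = v_reset
    return spikes

# A constant supra-threshold drive yields a regular spike train,
# while a sub-threshold drive produces no spikes at all.
spikes = lif_spikes([1.5] * 200)
```

The sub-threshold case (for example, a drive of 0.5 with these parameters) never fires, which is exactly the source of the sparsity that makes event-driven hardware efficient.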

On-chip learning

In-memory computing and neuromorphic systems-on-chip (SoCs) based on emerging devices have demonstrated exceptional energy efficiency, making them attractive for edge and IoT applications. However, these architectures often face challenges stemming from device-to-device mismatch, process-induced variation, and analog non-idealities inherent to non-volatile memories144,145. Analog devices inherently introduce various noise sources, including thermal fluctuations, shot noise, and external electromagnetic interference, which cause computational inaccuracies and degrade signal fidelity during processing. In addition, device drift poses a significant challenge, arising from temporal variations in sensor characteristics induced by temperature fluctuations, material aging, environmental instability, and stress-induced degradation. These issues degrade computational accuracy and often necessitate extensive per-chip calibration or iterative tuning to achieve reliable operation across varying conditions144,145.

On-chip learning, where synaptic weights and circuit parameters are adaptively updated in real time on the device, offers a promising route to mitigate these error sources by compensating for mismatch and drift during normal operation. By embedding learning directly within the hardware loop, this approach enhances robustness and accuracy even in the presence of significant process variation, eliminating the need for expensive off-chip retraining or per-device tuning144,145,146.
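The compensation mechanism can be illustrated with a toy model. Below, a hypothetical analog weight cell realizes `w_eff = w * gain + offset`, standing in for device-to-device gain mismatch and offset error; a simple delta-rule (LMS) loop, used here as a generic stand-in for on-chip learning rules, observes the actual hardware output and therefore learns the stored weights that cancel the mismatch. All parameters and data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical analog cells: each realizes w_eff = w*gain + offset.
gain = 1.0 + 0.2 * rng.uniform(-1, 1, size=4)   # per-cell gain mismatch
offset = 0.1 * rng.normal(size=4)               # per-cell offset error

def hw_output(w, x):
    return (w * gain + offset) @ x              # the dot product the chip computes

X = rng.normal(size=(200, 4))
w_target = np.array([1.0, -2.0, 0.5, 1.5])      # ideal (off-chip-trained) weights
y = X @ w_target                                # desired outputs

# Loading ideal weights directly leaves a residual error due to mismatch.
err_ideal = np.mean((y - X @ (w_target * gain + offset)) ** 2)

# On-chip adaptation: a delta-rule loop driven by the *actual* hardware
# output, so gain and offset errors are compensated in place.
w = w_target.copy()
for _ in range(10):                             # a few passes over local data
    for x_i, y_i in zip(X, y):
        w += 0.05 * (y_i - hw_output(w, x_i)) * x_i

err_adapted = np.mean((y - np.array([hw_output(w, x) for x in X])) ** 2)
```

The adapted error drops far below the error obtained by naively loading the off-chip weights, mirroring how in-loop learning absorbs per-device variation without any explicit calibration step.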

Beyond addressing mismatch and variation, on-chip learning offers a critical advantage in enhancing privacy for IoT and edge devices. By confining data processing and model updates to the local device, it minimizes the need to transmit sensitive user data to remote servers for processing. This local learning capability reduces the exposure of user data to potential breaches during transmission or storage, aligning well with the growing demand for privacy-preserving machine learning solutions. Furthermore, on-chip learning enables personalized adaptation without compromising data security, a key consideration for applications such as wearable health monitoring, smart home devices, and autonomous systems.

Despite these advances, scaling near-sensor circuits for widespread deployment presents several challenges. Memory bandwidth bottlenecks, thermal management, and the complexity of heterogeneous integration remain key issues that impede large-scale implementation147. To advance near-sensor computing, future research should focus on heterogeneous 3D integration148, where sensor arrays, AI accelerators, and memory units are vertically stacked to reduce footprint and improve computational efficiency137. Additionally, adaptive circuit architectures, capable of modulating their power consumption and computational precision based on workload requirements, will play a crucial role in enabling near-sensor intelligence at scale144,145. As on-chip learning techniques mature and integration challenges are surmounted, these systems are poised to redefine edge computing paradigms, delivering low-latency, high-efficiency inference for applications including biomedical monitoring and industrial automation without reliance on cloud infrastructure.

Algorithmic frameworks for in-sensor and near-sensor computing

In the previous sections, we discussed in- and near-sensor computing hardware paradigms that decentralize computation by embedding processing directly within or adjacent to sensing units. This shift transforms conventional sensing architectures, in which data are sensed, stored, and then processed centrally, into a unified framework in which computation occurs at the sensor interface itself. Unlike traditional centralized systems capable of executing general-purpose algorithms, these distributed and resource-constrained architectures demand customized algorithmic solutions that operate within strict hardware limitations, such as limited reconfigurability. Because computation occurs on compact and constrained hardware platforms, algorithms must be reconceived to align with decentralized hardware architectures, maximizing computational efficiency and enabling robust performance in compact sensing environments.

Recent hardware advances for sensor edge computation have enabled integrated device and circuit architectures that support parallel processing for data-driven applications and minimize communication overhead between computing components. The algorithmic frameworks supporting these architectures must specifically address the constraints of edge environments. We present the current algorithmic landscape with a particular emphasis on artificial intelligence (AI) and machine learning (ML) techniques that accommodate the unique capabilities and limitations of sensor-integrated computing.

In-sensor computing algorithms

In-sensor computing embeds processing directly within the sensor, enabling preliminary data reduction before any off-chip transfer. However, the on-pixel circuitry usually has limited computational bandwidth; it is thus critical for the algorithms to be efficient. Lightweight signal processing algorithms embedded within sensors often perform edge detection, principal component analysis, and simple filtering to efficiently preprocess raw sensing data149,150. Moreover, recent AI-based approaches, such as dimensionality reduction methods using autoencoders, have been adapted for in-sensor implementations151,152,153. By encoding input data into a compressed latent representation and subsequently reconstructing it, autoencoders reduce the volume of data that needs to be transmitted or stored while preserving critical features, which is particularly beneficial in resource-constrained environments.
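The autoencoder-style compression described above can be sketched without iterative training: for a purely linear autoencoder, the optimal encoder and decoder are given by the top principal components, so an SVD yields them directly. The synthetic "sensor frames" and the latent dimension below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy sensor frames: 16-dim readings that lie near a 3-dim subspace
latent = rng.normal(size=(500, 3))
basis = rng.normal(size=(3, 16))
X = latent @ basis + 0.01 * rng.normal(size=(500, 16))

# For a *linear* autoencoder the optimal encoder/decoder coincide with the
# top principal components, so we obtain them by SVD instead of training.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W_enc = Vt[:3].T          # 16 -> 3 : runs inside/near the sensor
W_dec = Vt[:3]            # 3 -> 16 : runs at the receiver

Z = Xc @ W_enc            # only 3 numbers per frame leave the node
X_rec = Z @ W_dec         # reconstruction at the receiver
compression = X.shape[1] / Z.shape[1]       # > 5x fewer values transmitted
mse = np.mean((X_rec - Xc) ** 2)            # small: critical features preserved
```

A nonlinear autoencoder follows the same encode-transmit-decode pattern but replaces the two matrix products with small trained networks, at correspondingly higher on-pixel cost.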

A complementary strategy exploits event-driven operation to minimize needless computation and data transfer. Here, sensors remain quiescent until they detect a change in the environment, at which point processing is triggered only when significant events occur. This approach enhances energy efficiency and reduces data redundancy by focusing on pertinent information. In electronic skin applications, for instance, event-driven in-sensor computing has been employed to compress inactive intervals, leading to more efficient data handling154. Likewise, neuromorphic sensors, such as dynamic vision sensors, emit spikes only upon pixel-level brightness shifts, coupled with spike-based algorithms that emulate biological processes155. These event-driven algorithms can be integrated with AI-inspired latent-space models to achieve both sparsity and expressiveness152.
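The event-generation rule of a dynamic vision sensor can be captured in a toy model: each pixel keeps a log-intensity reference and emits a signed event only when the change exceeds a contrast threshold. The threshold value and the frame sequence below are illustrative.

```python
import numpy as np

def dvs_events(frames, theta=0.2):
    """DVS-style sketch: each pixel emits an event only when its
    log-intensity changes by more than the contrast threshold theta.
    Returns a list of (t, y, x, polarity) tuples."""
    ref = np.log(frames[0] + 1e-6)          # per-pixel reference level
    events = []
    for t, frame in enumerate(frames[1:], start=1):
        logI = np.log(frame + 1e-6)
        diff = logI - ref
        ys, xs = np.where(np.abs(diff) > theta)
        for y, x in zip(ys, xs):
            events.append((t, y, x, int(np.sign(diff[y, x]))))
            ref[y, x] = logI[y, x]          # reset reference after the event
    return events

# A static scene generates no events; one pixel that brightens and then
# returns produces exactly one ON and one OFF event.
frames = [np.ones((4, 4)) for _ in range(5)]
frames[3][2, 2] = 2.0                        # pixel (2, 2) doubles at t = 3
events = dvs_events(frames)
```

The output is inherently sparse: unchanged pixels contribute nothing, which is exactly what allows downstream spike-based algorithms to stay quiescent most of the time.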

To process large amounts of sensing data under tight resource budgets, on-node data compression and optimized programming at the sensors become integral techniques for viable edge computing. Methods such as compressive sensing156 and sparse coding157 have been tailored for sensors to reduce data dimensionality while preserving critical information, which is pivotal in resource-constrained environments. For example, in vehicular sensor networks, compressive sensing-based data harvesting has demonstrated a substantial reduction in the information volume that sensor nodes must transmit to the fusion center, where data analytics occur. This approach compresses the sensing data by exploiting two principles, sparsity and incoherence: sparsity concentrates the essential information in a small subset of the original signal, while incoherence ensures low correlation between measurement samples so that each measurement contributes unique information. Together, these properties permit far fewer data acquisitions than conventional sampling while ensuring accurate recovery, even in the presence of missing measurements158. Similarly, sparse coding represents high-dimensional data as a sparse linear combination of basic elements from a dictionary, a collection of fundamental components that capture the essential features of the original data. Recently, the Hierarchical Riemannian Pursuit159 has demonstrated improved speed and accuracy in the recovery process by employing coarse and fine learning of the dictionary in a wireless sensor network. These approaches allow the sensor to perform feature extraction and reduce data dimensionality, while the data can be restored and analyzed in a near-sensor or downstream computing system.
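A minimal end-to-end compressive sensing sketch makes the sparsity/incoherence roles concrete: a random Gaussian matrix (incoherent by construction) compresses a sparse signal to far fewer measurements, and orthogonal matching pursuit, one standard recovery algorithm, reconstructs it at the receiver. The signal dimensions, sparsity level, and support positions below are illustrative.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit sketch: greedily pick the column of Phi
    most correlated with the current residual, then re-fit the selected
    support by least squares, for k iterations."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
n, m, k = 64, 20, 3                     # 64-dim signal, 20 measurements, 3-sparse
x = np.zeros(n)
x[[5, 17, 40]] = [1.0, -2.0, 1.5]       # sparse "sensor" signal
Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # incoherent sensing matrix
y = Phi @ x                             # only 20 numbers leave the node
x_rec = omp(Phi, y, k)                  # reconstruction at the fusion center
```

The node transmits 20 values instead of 64, and with high probability (essentially certain in the noiseless case here) the fusion center recovers the signal exactly; real deployments trade the sparsity level, measurement count, and noise tolerance against one another.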

Near-sensor computing algorithms

Near-sensor computing collocates computational resources with sensors, often at the edge of the network. This paradigm leverages AI and ML to enable advanced analytics, pattern recognition, and decision-making without offloading data to remote servers. Yet the deployment of state-of-the-art AI/ML models at the edge is hampered by the limited computational and memory resources of edge devices. To address this, algorithms for quantizing neural network weights and activations have been extensively developed160. By reducing the precision of these parameters from 32-bit floating point to lower-bit representations such as 4-bit integers, quantization significantly decreases the memory footprint and computational requirements of models, enabling efficient data processing at the sensor edge without substantial loss in accuracy161. Additionally, post-training quantization (PTQ) and quantization-aware training (QAT) have been introduced to maintain low-bit weights and activations while suppressing the impact of quantization noise162. These studies underscore the critical role of quantization in optimizing neural network deployment within sensor networks, balancing the trade-off between computational efficiency and model performance.
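The simplest form of post-training quantization, symmetric per-tensor quantization, can be written in a few lines: scale the weights so the largest magnitude maps to the integer range, round, and keep the scale for dequantization. The weight matrix and bit widths below are illustrative.

```python
import numpy as np

def quantize(w, bits=4):
    """Symmetric per-tensor PTQ sketch: map float weights to signed
    integers with a single shared scale factor."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)    # toy layer weights
x = rng.normal(size=8).astype(np.float32)

q, scale = quantize(W, bits=4)
y_fp = W @ x                                      # full-precision reference
y_q = (q.astype(np.float32) * scale) @ x          # dequantize-then-multiply
rel_err = np.linalg.norm(y_q - y_fp) / np.linalg.norm(y_fp)
```

An 8x memory reduction (32-bit to 4-bit) costs only a modest relative output error here; QAT goes further by simulating this rounding during training so the network learns weights that are robust to it.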

Another major advancement is federated learning (FL), which enables a decentralized learning paradigm across edge nodes to collaboratively train ML models without transferring raw data to a central server163. Traditional centralized approaches face challenges related to energy consumption, bandwidth limitations, and privacy risks in sensor networks. To address these, recent frameworks integrate energy-harvesting capabilities into FL, allowing resource-constrained sensor nodes to participate in training only when they have sufficient energy164. However, this framework introduces unique challenges, such as time-varying device availability and its impact on model convergence. A novel convergence analysis shows that maintaining a uniform client scheduling strategy can mitigate the adverse effects of unpredictable energy-harvesting conditions, ensuring optimal learning performance. Another key innovation combines FL with split learning (SL) to simultaneously reduce communication overhead and the computational burden on sensor nodes (Fig. 5). A recent demonstration introduces an auxiliary network on the client side, allowing local model updates and significantly reducing communication costs between clients and the central server165. The framework maintains only a single server-side model, making it highly scalable for large-scale sensor deployments. These advances highlight how federated learning is being adapted to overcome the unique constraints of sensor networks, making real-time, privacy-preserving, and resource-efficient learning feasible.

Fig. 5: Illustration of federated split learning for in- and near-sensor computing.
Full size image

Both training and inference phases are shown. During the training phase, the complete model is divided into server-side and client-side sub-models. The Edge Server coordinates with the sensors to perform client-side model updates through a SplitFed training mechanism165, and the Fed Server performs server-side model aggregation. During the inference phase, sensors directly perform a forward pass on the measured data and send the smashed data to the Edge Server, which completes the inference.
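The federated-averaging step that underlies such decentralized training can be sketched in a few lines. This is a generic FedAvg sketch on a toy least-squares problem, not the SplitFed scheme of ref. 165; the model, client data, and hyperparameters are all illustrative.

```python
import numpy as np

def local_step(w, X, y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on a
    least-squares model, using only that client's private data."""
    for _ in range(epochs):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(X)
    return w

def fedavg(clients, w, rounds=20):
    """FedAvg sketch: each round, every client trains locally and the
    server averages the returned weights. Raw data never leaves a node."""
    for _ in range(rounds):
        updates = [local_step(w.copy(), X, y) for X, y in clients]
        w = np.mean(updates, axis=0)      # equal weighting for simplicity
    return w

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(4):                         # 4 sensor nodes with private data
    X = rng.normal(size=(50, 3))
    clients.append((X, X @ w_true))
w = fedavg(clients, np.zeros(3))           # converges to the shared solution
```

Only model parameters cross the network, never measurements; split learning additionally moves part of the forward pass to the server, so each client transmits intermediate activations ("smashed data") instead of full model updates.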

In- and near-sensor computing have emerged as critical enablers of intelligent and resource-efficient systems. However, hardware-algorithm co-design in near-sensor computing faces fundamental constraints that limit the complexity and adaptability of deployed algorithms. A primary limitation is the restricted memory capacity of edge devices, which typically ranges from several kilobytes to a few megabytes, substantially less than the gigabyte-scale requirements of complex neural networks166. In addition, analog computing hardware requires dedicated low-level programming approaches that diverge from conventional digital implementations, complicating software integration167. Power constraints in battery-powered systems pose an additional challenge, requiring a careful balance between computational performance and energy efficiency. To address these issues, a variety of mitigation strategies have been explored. Quantization techniques, for instance, reduce numerical precision from 32-bit floating point to 4-bit representations, significantly decreasing both memory usage and computational load160,161. Methods such as post-training quantization and quantization-aware training help maintain model accuracy while suppressing quantization-induced noise162. Further optimization techniques, including weight compression, pruning, and sparse coding, facilitate efficient model deployment in stringently resource-constrained edge environments156,157.

Practical deployment of in- and near-sensor computing systems encounters additional challenges arising from device-level non-idealities, including sensor drift, analog noise, and fabrication-induced process variations. Although conventional software-based techniques provide customized solutions for specific computing architectures, practical integration requires comprehensive strategies to compensate for these intrinsic limitations across diverse operational conditions. Recent advances in drift-aware feature learning have demonstrated the effectiveness of autoencoder-based pre-processing in compensating for signal degradation induced by sensor drift. Complementary approaches, such as CorrectNet, address device-level variability in analog computing systems by implementing targeted error suppression and compensation techniques168,169. These techniques are broadly applicable across the range of analog device platforms used in in-sensor and near-sensor computing, where drift, noise, and other analog imperfections are prevalent. Importantly, the integration of such algorithmic approaches through hardware-software co-design frameworks has enabled emerging on-chip learning capabilities, paving the way for more robust and adaptive edge intelligence systems.

The synergy between AI/ML algorithms and innovative hardware architecture continues to drive advancements in this domain. By addressing existing challenges, these paradigms hold the potential to revolutionize edge intelligence, paving the way to real-time, low-power intelligence in IoT, autonomous systems, and beyond.

In-sensor and near-sensor applications

By integrating computation directly within or adjacent to sensing modules, in- and near-sensor computing architectures enable immediate signal processing at the point of data acquisition. This paradigm addresses critical challenges in data-intensive domains by reducing interface bottlenecks, minimizing data movement, and enabling real-time processing. Such architectures have demonstrated significant potential across various sensor-rich applications such as biomedical systems, human–machine interfaces (HMI), and IoT–based environmental monitoring (Fig. 6a). By minimizing latency and energy consumption, these computing approaches represent a transformative advancement in the design of intelligent sensing systems. Here, we present applications of in- and near-sensor computing and provide insights into emerging research directions and future applications.

Fig. 6: Application of edge intelligence and its roadmap.
Full size image

a A diagram illustrating sensor edge computing systems across different modalities and their applications. Various sensor types (optical, tactile, chemical, thermal, and multimodal) combined with in- and near-sensor computing units support diverse applications, including healthcare monitoring, robotics, and more. b A radar chart comparing three computing paradigms: conventional systems with separated sensor, processor, and computing units; in-sensor computing; and near-sensor computing. The evaluation covers six key metrics: power efficiency, sensor-compute integration level, computational versatility, bandwidth efficiency, task complexity support, and responsiveness (i.e., low latency). The chart highlights the trade-offs and advantages associated with each paradigm in the context of edge intelligence. c A computational-complexity roadmap for edge intelligence showing the applicable computational tier of in- and near-sensor computing in the system. At the lowest tier, in-sensor computing performs elementary pre-processing of raw sensing data directly within the pixel or sensor element; at the intermediate tier, near-sensor computing systems execute compact artificial-neural-network (ANN) inference on preprocessed data; and at the highest tier, full-scale ANN models require advanced hardware and system-level solutions, such as 3D integration and federated split learning protocols, to manage the dataflow and training requirements of comprehensive deep learning workloads.

On-site medical diagnostics

In biomedical applications, sensors capture a variety of physiological signals, such as electrical, mechanical, and chemical, that are indispensable for disease diagnosis, patient monitoring, and therapeutic interventions. However, the sensitive nature of these bio-signals raises significant privacy concerns when processed via a conventional cloud-based system, exposing personal health data to potential breaches and unauthorized access. By contrast, in- and near-sensor computing architectures perform data analysis directly at or adjacent to the data acquisition site, thereby obviating the need to transmit raw data to remote servers. This local processing not only preserves patient confidentiality but also enables real-time interpretation of vital signs. Therefore, these paradigms have given rise to a wide range of on-site diagnostic platforms, wearable health-monitoring tools, electronic skin interfaces, and advanced prosthetic systems, all of which benefit from secure, low-latency, and context-aware signal processing.

The integration of edge computing systems embedded in diagnostic platforms has dramatically accelerated analysis speed and lowered power requirements, particularly for point-of-care and epidemic-control applications. One example employs an indium gallium zinc oxide (IGZO) field-effect transistor coupled to a microfluidic sampling module, with an on-chip artificial-neural-network accelerator for near-sensor inference31. This system simultaneously detected both viral spike proteins and host antibodies within a single assay cycle of less than 20 min, achieving detection limits on the order of 1 pg/mL and classification accuracy exceeding 93%. By performing all critical signal processing and pattern recognition at the sensor periphery, the platform circumvents the latency and privacy concerns inherent to cloud-based workflows, while reducing energy consumption by orders of magnitude compared to conventional lab-based assays. Such advances exemplify the transformative potential of edge-embedded AI for rapid, sensitive, and secure biosensing in the context of emerging infectious diseases.

Another notable implementation in the biomedical domain is a photonic system that demonstrates multimodal in-sensor computing for biomolecule classification, addressing the spectral overlap and thermal sensitivity challenges of conventional biomedical sensing techniques170. In the demonstrated system, a photonic multimodal spectroscopic sensor extracts refractive index (n and k) spectral signatures and feeds them into a convolutional neural network embedded in a silicon photonics processor. The system achieves real-time classification of proteins into 45 distinct classes across different temperatures with an accuracy of 97.58%. This integrated photonic in-sensor computing approach not only minimizes data transfer and the associated energy costs but also enables rapid, edge-resident biomolecular diagnostics with performance comparable to that of centralized laboratory platforms.

Wearable health monitoring

Electronic skin is an emerging approach that integrates soft, multimodal sensors with embedded AI algorithms that closely emulate biological tactile sensing, both in terms of spatial resolution and adaptive response behavior. For instance, a nanowire-based piezoelectric memory sensor has achieved a spatial resolution of 60 nm, enabling on-chip force-image pre-processing such as contrast enhancement that accelerates downstream recognition tasks by 34.6%171. In addition, the combination of piezoresistive and piezoelectric sensors has been leveraged to implement synaptic-learning mechanisms directly on the sensor, converting tactile inputs to neural spike patterns that mimic biological mechanoreception. In one demonstration, an artificial finger classified 20 distinct textile textures with 99.1% accuracy using deep learning techniques172.

Moreover, by localizing signal processing at or immediately adjacent to the sensor, these systems minimize latency in motion classification and feedback generation, essential for naturalistic prosthetic control. A distributed edge neural network, for instance, has been developed to fuse surface electromyography, strain, and inertial signals in situ, running on ultra-low-power chips (~20 µW) ideal for energy-constrained wearable or rehabilitative prosthetic devices173. Similarly, electrolyte-gated transistor-based neuromorphic systems have been shown to distinguish Parkinsonian gait from normative walking patterns at the sensor edge, offering promise for early diagnosis and adaptive assistance in movement disorders174.

Human–machine interfaces

HMIs enable seamless interaction and bidirectional communication between users and machines through multimodal input and output channels. By embedding in- and near-sensor computing, these systems can interpret user commands and environmental cues in real time at the point of acquisition. This localized processing dramatically reduces latency and enhances responsiveness, which are crucial for applications such as motion tracking, touch recognition, gesture control, and wearable interface devices. These capabilities are essential for providing instantaneous feedback, thereby forming the basis of intuitive and efficient human–machine collaboration.

Physical human interaction

Motion-driven human–machine interfaces demand precise acquisition and interpretation of dynamic physical signals such as body movement, gestures, and temporal sequences to support applications such as VR/AR control, wearable tracking, and pattern recognition. For instance, a full-body sensing suit employing topographic MXene-based piezoresistive sensors has demonstrated embedded unsupervised learning via k-means clustering175. This enables precise posture reconstruction across all joint deformations with minimal sensor count, thereby achieving low-latency avatar control in virtual environments. Yet real-world motion encompasses more than mechanical deformation alone. To address this, floating-gate phototransistor arrays have been demonstrated that fuse visual, tactile, and auditory inputs directly on-chip176. Their tunable spectral responsivity and adjustable threshold voltage allow the system to learn and associate spatiotemporal events, such as synchronizing music with dance, without requiring timestamping references. Building further on temporal processing, analog reservoir computing systems utilizing rotating-neuron circuits demonstrated motion-based time-series prediction and recognition177. By employing differential pairs and cross-coupled amplifiers, these systems capture intricate time-series patterns such as handwriting strokes and gestures, enabling ultra-low-power operation at sub-microwatt levels.
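The embedded unsupervised learning mentioned above can be sketched with plain k-means on synthetic "posture" readings; this is a generic illustration, not the actual pipeline of ref. 175. The farthest-first initialization, the 6-dimensional toy sensor vectors, and the cluster separation are all assumptions chosen so the example is deterministic.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Lightweight k-means sketch, the kind of routine that can run
    on-node. Farthest-first initialization keeps it deterministic."""
    centers = [X[0]]
    for _ in range(k - 1):
        # pick the point farthest from all current centers
        d = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):                       # Lloyd iterations
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return labels, centers

rng = np.random.default_rng(3)
# Toy piezoresistive readings: three well-separated "postures" in 6-D
postures = rng.normal(size=(3, 6)) * 5
X = np.vstack([p + 0.2 * rng.normal(size=(40, 6)) for p in postures])
labels, centers = kmeans(X, k=3)                 # recovers the three postures
```

Because the routine needs only distance computations and means, it fits comfortably in the compute budget of an on-garment microcontroller, which is what makes unsupervised posture discovery feasible at the sensor edge.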

Touch-based recognition interfaces require precise transduction and interpretation of subtle surface interactions, such as fingerprint ridges, texture gradients, or contact pressures, with minimal latency. A notable example is an in-sensor reservoir computing system that integrates deep-ultraviolet GaOx-based optical synapses with a back-end memristor array for latent fingerprint recognition178. In this design, the GaOx optical synapse layer encodes the temporal dynamics of incoming tactile signals into analog photonic information streams, which are then projected nonlinearly across the memristor array to perform spatial classification. The resulting monolithic platform performs in situ inference, achieving classification accuracy above 90% while operating with ultra-low energy consumption, demonstrating the feasibility of highly integrated, energy-efficient touch-based recognition platforms.

Robotic tactile intelligence

In adaptive robotic systems, rapid integration of multimodal sensory inputs with motor responses is essential for real-time interaction in dynamic environments. To this end, neuromorphic edge architectures mimic biological sensorimotor circuits by transducing tactile, proprioceptive, or inertial signals directly into actuation commands. For instance, organic electrochemical transistor-based artificial nerve circuits have been devised to couple tactile sensors with dendritic spiking processing179. This system enables closed-loop slip detection and low-voltage reflex feedback that closely emulates mechanoreceptor functionality. A similar approach uses a Nafion-based ionic memristor sensor integrated with piezoresistive elements to replicate synaptic plasticity in a robotic epidermis180, enabling the recognition of tactile patterns and supporting memory-driven grasp adjustments in soft robotic hands. Beyond the tactile domain, orientation-aware neuromorphic systems have been implemented for aerial platforms such as drones. SnS2-based memtransistor arrays configured as analog Kalman filters provided trajectory estimation through low-power sensor fusion181. By performing hardware-based noise filtering directly and integrating complementary data from gyroscope and accelerometer sensors, this system reduced power consumption to 25% of that of conventional software implementations. These integrated approaches, spanning organic electrochemical circuits, ionic memristive skins, and solid-state analog filters, demonstrate a convergent strategy for embedding adaptive, low-latency sensorimotor intelligence directly within robotic hardware.
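The gyroscope-accelerometer fusion performed by such analog Kalman filters can be sketched as a scalar Kalman filter for tilt: the gyro rate drives the prediction step, and the noisy accelerometer-derived angle drives the update. The process and measurement noise covariances, sensor noise levels, and trajectory below are assumed values for illustration.

```python
import numpy as np

def kalman_tilt(gyro, accel_angle, dt=0.01, q=1e-4, r=1e-2):
    """1-D Kalman filter sketch for orientation. Predict: integrate the
    gyro rate. Update: blend in the accelerometer angle with gain K.
    q = process noise, r = measurement noise (assumed covariances)."""
    angle, P, out = 0.0, 1.0, []
    for w, z in zip(gyro, accel_angle):
        angle += w * dt            # predict from gyro rate
        P += q
        K = P / (P + r)            # Kalman gain
        angle += K * (z - angle)   # correct with accelerometer angle
        P *= (1 - K)
        out.append(angle)
    return np.array(out)

rng = np.random.default_rng(0)
t = np.arange(0, 5, 0.01)
true_angle = 0.5 * np.sin(t)                               # ground-truth tilt
gyro = 0.5 * np.cos(t) + 0.05 * rng.normal(size=t.size)    # rate + noise
accel = true_angle + 0.3 * rng.normal(size=t.size)         # noisy absolute angle
est = kalman_tilt(gyro, accel)
rmse_raw = np.sqrt(np.mean((accel - true_angle) ** 2))     # accel alone
rmse_kf = np.sqrt(np.mean((est - true_angle) ** 2))        # fused estimate
```

The fusion exploits the complementary error profiles of the two sensors: the gyro is accurate over short horizons but drifts, while the accelerometer is drift-free but noisy; the filter's gain continuously arbitrates between them.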

Interactive wearable devices

Wearable technologies impose stringent requirements on energy efficiency, thermal dissipation, and form-factor miniaturization to support advanced on-device intelligence. For instance, AI-augmented smart glasses leverage an ultra-compact object detection model integrated within a system-on-chip architecture182. Operating at 18 frames per second while consuming less than 100 mW, the system supports real-time face and object recognition entirely on the local device through compressed and quantized CNNs, enabling extended battery life and always-on vision. For more computationally intensive AR/VR applications, an adaptive near-sensor architecture built around a modular deep learning processor outperforms the widely used NVIDIA Deep Learning Accelerator under sub-mm² area constraints, achieving 42% lower energy and 84% smaller area in dynamic workload testing32. Its lightweight CNN core operates near the sensor array and supports dynamic power scaling in response to varying computational demands, enabling efficient real-time multimodal sensor fusion for power-efficient mixed-reality experiences.

IoT & environmental sensing

In IoT and environmental sensing applications, the deployment of edge computing is pivotal for enabling real-time analysis of physical and chemical signals, such as gas concentration, ambient light intensity, or visual scenes, directly at the sensor node. By processing data locally, edge computing enables low-latency environmental analysis in resource-constrained settings, reducing dependence on cloud connectivity and extending operational lifetime.

Gas detection with AI

Rapid and spatially resolved detection of hazardous chemicals such as toxic gases and pollutants is imperative for effective environmental monitoring. To meet this demand, in- and near-sensor computing architectures perform localized analysis of chemical signatures, thereby reducing response latency. Such edge-enabled systems support continuous tracking of analyte concentrations and precise localization of emission sources using compact and energy-efficient sensor nodes. For instance, an artificial olfactory system employs AlGaN/GaN high electron mobility transistors with a graphene gate electrode and Pd nano-islands to achieve highly sensitive nitrogen dioxide detection30. A dedicated near-sensor microprocessor then executes a neural network model, augmented by Bayesian optimization, to reconstruct spatiotemporal gas distributions and pinpoint leak origins in real time. Another approach integrates a silicon nanowire FET (Si-NW FET) sensor with an SNN for in-sensor gas classification183. In this design, the Si-NW FET integrated with catalytic metal nanoparticles serves as both a gas sensor and a spiking neuron unit, transducing analyte interactions into discrete spike signals for downstream SNN processing. This unified sensor-neuron design supports low-power detection of gases such as H2 and NH3 without external computing resources, facilitating its use in miniaturized safety nodes in IoT deployments.

Image processing at the edge

In vision systems, the growing reliance on AI-driven image and video analysis has elevated the importance of low-latency and energy-efficient processing at the sensor periphery. To this end, in- and near-sensor computing architectures integrate convolutional and inference engines within or directly adjacent to the photodetector array, allowing vision tasks ranging from object detection and tracking to feature extraction and classification to execute locally under stringent power and bandwidth constraints. A representative example is the stereoscopic artificial compound eye system, which directly demonstrated both in-sensor memory encoding and near-sensor neural processing for 3D object tracking26. This system emulates the visual system of a praying mantis using two hemispherical focal plane arrays (FPAs) composed of 16 × 16 pixels, each containing an InGaAs photodiode integrated with a HfO2-based ReRAM (1P-1R). The 1P-1R array architecture supports optical programming and one-shot readout, enabling in-sensor data compression and spatiotemporal memory. The encoded data are directly input into a federated split neural network (FSNN), which performs near-sensor regression to estimate 3D position and velocity vectors of moving objects. Another example is an electrostatically doped silicon photodiode configured as a 3 × 3 array, which has been shown to perform programmable convolutional filtering, including edge detection and spatial filtering, in-sensor, demonstrating a scalable, CMOS-compatible route to pre-processing raw image data184. In addition, a network-embedded inference framework such as NetPixel repurposes programmable switches to carry out image classification in-sensor, offloading computation from cloud servers and reducing inference latency185. Monolithic 3D integration of IGZO-FET photodiodes, RRAM-based analog CIM, and CMOS logic represents another paradigm, achieving ultra-low-power keyframe extraction directly on the vision chip186.
Additionally, smart image sensors that combine in-pixel frame differencing with on-chip object localization enable real-time motion detection at extremely low power187. Collectively, these approaches confirm the viability of embedding increasingly sophisticated AI vision pipelines at the edge, thereby supporting autonomous decision-making, privacy preservation, and broad deployment across surveillance, AR/VR, and mobile robotics applications.
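In-pixel frame differencing with on-chip localization can be sketched in a few lines: subtract consecutive frames, threshold the change map, and report a bounding box around the changed pixels. The threshold value below is illustrative and not taken from the cited sensor:

```python
import numpy as np

def detect_motion(prev, curr, thresh=0.2):
    """Frame differencing with simple on-chip-style localization.

    Returns a bounding box (y0, y1, x0, x1) around pixels whose intensity
    changed by more than `thresh`, or None for a static scene. The
    threshold is an illustrative assumption.
    """
    moving = np.abs(curr - prev) > thresh
    if not moving.any():
        return None                    # static scene: nothing to transmit
    ys, xs = np.nonzero(moving)
    return ys.min(), ys.max(), xs.min(), xs.max()

prev = np.zeros((16, 16))
curr = prev.copy()
curr[5:8, 9:12] = 1.0                  # a small object appears
box = detect_motion(prev, curr)        # -> (5, 7, 9, 11)
```

The power advantage follows from the data reduction: for static scenes nothing leaves the chip, and for moving scenes only a few box coordinates are transmitted instead of full frames.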

Summary and outlook

In- and near-sensor computing paradigms have the potential to address fundamental inefficiencies in conventional computing architectures by relocating data processing to or immediately adjacent to the point of acquisition, thereby eliminating redundant data transmission, reducing latency, and lowering energy consumption. Both paradigms enhance computing efficiency, providing benefits at different hierarchies of the overall computing system. In-sensor computing delivers these advantages through heterogeneous integration of compact processing units within each sensing pixel, or through multifunctional materials that simultaneously transduce and process signals, enabling immediate pre-processing such as network-in-memory operations, neuromorphic designs with CMOS-based artificial neurons, event-driven computation, and analog feature extraction without analog-to-digital conversion. However, this approach requires purpose-built hardware architectures, which limits its applicability to specific computational tasks and sensing modalities. Near-sensor computing approaches co-locate AI accelerators, such as quantized neural-network inference engines and lightweight matrix-vector cores, in close proximity to sensor arrays, supporting on-device real-time inference while minimizing the energy cost of parameter transfers. Unlike in-sensor computing, near-sensor computing combines standard sensor hardware with a proximate processing unit, thereby delivering computational versatility with sufficient resources for complex neural network applications. On-chip learning capabilities further enhance system robustness and privacy by enabling local adaptation without transmitting sensitive data to external servers.

Complementary algorithmic frameworks have been designed in parallel to harness these hardware advances within resource-constrained sensing environments. Quantization techniques such as PTQ and QAT utilize low-bit weights and activations, reducing memory footprints and compute demands with minimal degradation in accuracy. Compressed sensing and sparse coding further diminish the data volume by exploiting signal sparsity, enabling reconstruction from far fewer measurements than traditional sampling methods. FL approaches distribute model training across local sensor nodes, preserving user privacy by retaining raw data locally while sharing only model updates; split variants further reduce both communication and computation overhead at the node by offloading partial model segments. Additionally, drift-aware feature learning techniques employ autoencoder-based pre-processing to compensate for sensor signal degradation over time, while error suppression and compensation methods like CorrectNet address device variations and noise in analog computing platforms. Together, these approaches ensure robust operation of in- and near-sensor systems despite inherent hardware non-idealities.
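The core of PTQ is compact enough to state directly: choose a scale from the trained weight range, round to integers, and retain the scale for dequantization at inference time. A minimal sketch of symmetric per-tensor int8 quantization (one of several common PTQ schemes, shown here only to illustrate the memory and error trade-off):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization to int8.

    The scale maps the largest-magnitude weight to 127; dequantized
    values then differ from the originals by at most ~scale/2.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)   # stand-in for a weight tensor
q, s = quantize_int8(w)

# 4x memory saving (float32 -> int8), with rounding error bounded by scale/2.
err = np.abs(q.astype(np.float32) * s - w).max()
```

QAT follows the same arithmetic but simulates the rounding during training so the network learns to absorb the quantization error, which is why it typically recovers more accuracy at very low bit widths.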

Despite these advances, critical integration gaps persist since hardware and software innovations have often progressed in isolation, leaving unexploited synergies between them. Future architectures will require co-design approaches that align pixel-level pre-processing with adjacent AI accelerators (Fig. 6c). This includes the adoption of monolithic 3D integration technologies for direct data transmission without intermediary interfaces, and the development of hardware-aware quantization schemes designed to mitigate bandwidth constraints. To reconcile the mismatch between software demands and hardware capabilities, emerging strategies such as hierarchical compression and adaptive precision scaling are poised to balance the trade-off between computational efficiency and inference accuracy. Looking forward, the rapid progress in model compression and hardware-aware neural architecture design suggests that compact variants of foundation models will soon be deployable at the edge188,189,190,191,192. These highly compressed language and multimodal models could enable context-aware reasoning and multimodal decision-making directly within sensor networks. Realizing this vision will demand convergence of materials science, device engineering, circuit design, and algorithmic innovation, pointing toward edge intelligent systems that mimic the energy efficiency and adaptability of biological sensory processing.