Introduction

The energy consumption of traditional digital circuits is linked to advancements in scaling technology and the manipulation of supply voltages1. As the progression of Moore’s law begins to stagnate, the search for novel computing methodologies has intensified. Among these, analog computing has re-emerged as a significant area of interest. The interest in analog computing is driven by the need for energy-efficient solutions in continuous-time signal processing, a domain where analog techniques excel due to their lower power requirements and inherent parallelism. These attributes position analog computing as a transformative technology, particularly in applications constrained by power availability, such as edge computing devices and biomedical instrumentation2.

Historically, analog computing was primarily utilized in circuits designed for specific applications, such as front-end circuits interfacing with sensors3. Examples include neural interfaces4, biosensing readouts5, and low-power readouts for MEMS6. Recent work has expanded the use of analog circuits for computation, demonstrating their applicability to basic arithmetic operations such as addition and multiplication, nonlinear computations, and complex operations such as solving differential equations7,8. This study employs translinear circuits9 to showcase various analog operations while consuming very little power. Field-Effect-Transistor-based translinear circuits operating in the subthreshold domain have shown high energy efficiency10.

Although analog computing is noted for its energy efficiency, it suffers from mismatch and variation issues. Mismatch refers to the time-independent random variations in physical properties of identically designed devices11. Various techniques are employed to mitigate mismatches, including using larger devices, common centroid layouts, and symmetrical designs. From a circuit design perspective, strategies such as chopping, correlated double sampling, and auto zeroing are used to reduce mismatch12.

In this study, we utilize programmable translinear circuits to mitigate the variations typically associated with device fabrication. Specifically, we deploy Floating Gate (FG)-based Multiple Input Translinear Element (MITE) devices, enabling a broad spectrum of analog operations13. This research demonstrates essential analog functions such as multiplication, division, and squaring operations. We also demonstrate complex analog computations such as neural decoding. Traditionally, decoding neural data in neural implants has required multiple ADCs in the front-end or a multiplexed ADC system to digitize multiple neural channels, leading to significant power consumption, particularly in devices that employ several hundred electrodes14. Our approach, in contrast, uses MITE-based adaptive circuits to decode broadband intracortical neural data from non-human primates performing reach-to-grasp tasks. By integrating analog computations directly into the front end, we eliminate the need for an ADC for each neural channel, significantly enhancing processing efficiency and reducing system complexity. Figure 1 shows the overall concept of neural decoding and the use of analog computation for mapping neural data to kinematics. The analog adaptive filtering technique presented in this paper requires neither an ADC per electrode nor a DSP during inference. The mixed-signal system presented here could therefore remove, or substantially reduce, the number of ADCs needed to infer kinematics from neural data.

Fig. 1: Overview of the mixed signal system presented in this paper with a comparison to the digital processing chain.
figure 1

The mixed signal system only requires the digital processor to be active at the time of weight updates in the analog adaptive filter, minimizing the overall power consumption. Further, the proposed system uses fewer ADCs than a fully digital processor. The brain is adapted from “Human Brain.png” by Injurymap, Wikimedia Commons, CC BY 4.0. Modifications were made.

Neural data exhibits significant day-to-day variability, which presents a substantial challenge for maintaining consistent decoding accuracy15. To address this, we have developed an on-chip non-linear learning algorithm that dynamically adjusts the weights on the MITE devices. This capability enables quick calibration against the daily variations in neural data, significantly improving the resilience and accuracy of neural decoding. Such adaptive learning is crucial for the long-term, reliable operation of neural implants, as it compensates for both variations in neural data and mismatches inherent in analog circuits. All circuits were prototyped and evaluated using a Field Programmable Analog Array (FPAA)16, fabricated using a 350 nm CMOS process, which demonstrates the feasibility and effectiveness of our approach.

The paper is organized as follows. Section “Floating Gate Multiple Input Trans-linear Element (FG MITE)” describes the Floating Gate MITE and shows how this element is used in analog computations such as addition and multiplication. Section “Analog Adaptive Filtering” presents an adaptive filtering system using the MITE.

Floating Gate Multiple Input Trans-linear Element (FG MITE)

Floating-gate-based Multiple Input Translinear Elements (MITEs) are Field-Effect Transistors (FETs) whose polysilicon gate is surrounded by an insulator such as silicon dioxide. Because the gate is electrically isolated, no DC path exists from any input terminal to the actual gate of the transistor. Multiple inputs are coupled into the floating node via capacitors, as seen in Fig. 2a. Each of these inputs linearly affects the transconductance of the device. In this study, two-input MITEs are used, with inputs coupled via capacitors C1 and C2, while the third input V3 is held constant.

Fig. 2: Overview of floating Gate MITE device and I-V relationships.
figure 2

a Symbol for a p-channel N-input Floating Gate Multiple Input Trans-linear Element (FG MITE). b Id vs Vg relationship of the FG device with different charges stored on the device.

To capture the non-linear behavior of the FG-based MITE devices, this study uses the EKV model17, which accurately models all regions of operation, both subthreshold and above threshold, and provides a smooth transition between the two. The general form of the EKV model for a floating-gate MITE device with n inputs is given by (2), where VFG is given by (1).

The model allows designing algorithms in software while accounting for the non-linearity observed in the MITE elements. Further, it enables comparing the hardware results with the ideal results obtained in software:

$${V}_{FG}\propto {V}_{F{G}_{prog}}+{C}_{1}{V}_{1}/{C}_{T}+{C}_{2}{V}_{2}/{C}_{T}+\ldots +{C}_{n}{V}_{n}/{C}_{T}$$
(1)
$$\begin{array}{rcl}{I}_{sd}&=&{I}_{th}W/L\left({\ln }^{2}\left(1+{e}^{(\kappa ({V}_{b}-{V}_{FG}-{V}_{TP})-({V}_{b}-{V}_{s})+\sigma ({V}_{b}-{V}_{d}))/2{U}_{T}}\right)\right.\\ &&\left.-{\ln }^{2}\left(1+{e}^{(\kappa ({V}_{b}-{V}_{FG}-{V}_{TP})-({V}_{b}-{V}_{d})+\sigma ({V}_{b}-{V}_{s}))/2{U}_{T}}\right)\right)\end{array}$$
(2)

Figure 2b shows measurements performed on a MITE device and their fit to equation (2) for different programming voltages (\({V}_{F{G}_{prog}}\)). In these equations, VTP represents the threshold voltage of a pMOS transistor, Ith denotes a device-specific current parameter, and W and L are the width and length of the pMOS device, respectively. The thermal voltage is represented by UT, while CT denotes the total capacitance at the floating node, as illustrated in Fig. 2a. The input voltages V1, V2, and V3 are coupled through capacitors C1, C2, and C3, respectively. The stored charge on the floating node is represented by \({V}_{F{G}_{prog}}\). A detailed description of each device parameter can be found in refs. 13,18,19.
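As a behavioral reference, Eqs. (1) and (2) can be transcribed into software; every parameter value below (κ, UT, VTP, Ith, σ, capacitances) is an illustrative assumption rather than an extracted device parameter:

```python
import numpy as np

# Sketch of the FG MITE EKV model from Eqs. (1)-(2); placeholder parameters.
KAPPA = 0.7      # gate coupling coefficient (assumed)
U_T = 0.0258     # thermal voltage at room temperature [V]
V_TP = 0.7       # pMOS threshold voltage magnitude [V] (assumed)
I_TH = 1e-7      # device-specific current parameter [A] (assumed)
SIGMA = 0.01     # drain coupling coefficient sigma (assumed)

def v_fg(v_prog, v_inputs, c_inputs, c_total):
    """Floating-gate voltage, Eq. (1): programmed charge plus capacitive coupling."""
    return v_prog + sum(c * v for c, v in zip(c_inputs, v_inputs)) / c_total

def i_sd(v_fg_val, v_s, v_d, v_b, w_over_l=1.0):
    """Source-drain current of a pMOS FG device, Eq. (2) of the EKV model."""
    def half(v_x, v_y):
        arg = (KAPPA * (v_b - v_fg_val - V_TP) - (v_b - v_x) + SIGMA * (v_b - v_y)) / (2 * U_T)
        # log1p(exp(.)) is the numerically safe form of ln(1 + e^arg)
        return np.log1p(np.exp(np.clip(arg, -500, 500))) ** 2
    return I_TH * w_over_l * (half(v_s, v_d) - half(v_d, v_s))
```

A fitting routine can sweep `v_fg_val` against measured Id-Vg data (as in Fig. 2b) to extract the device parameters.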

Programming FG MITE

The stored charge on the floating node can be precisely controlled by adding electrons to the floating node via hot-electron injection and removing trapped electrons using Fowler-Nordheim tunneling, as described by ref. 20. This process affects \({V}_{F{G}_{prog}}\).

In Fowler-Nordheim tunneling, the voltage across the tunneling capacitance Ctun is increased. This raises the electric field across the oxide, allowing trapped electrons to escape through the barrier. As a result, the net charge stored on the floating gate becomes more positive, which for a pMOS device reduces its conductance.

Hot-electron injection in MOSFETs occurs when a sufficiently large source-to-drain voltage is applied while an adequate current flows through the device21. This process adds electrons to the floating gate, making the net stored charge more negative and thereby increasing the conductance of the pMOS device. The silicon dioxide insulator acts as an energy barrier that traps the electrons on the polysilicon floating gate, ensuring long-term, non-volatile storage: the trapped electrons remain on the floating node without external power.

Hot-electron injection is widely used for fine programming due to its lower variability and higher resolution compared to Fowler-Nordheim tunneling. While tunneling is used to erase the FPAA device globally, hot-electron injection is utilized for data storage on the FG MITE device, which can be employed in analog circuits.

In this work, the weights are fine-programmed using hot-electron injection, chosen for its efficiency and the ability to program individual floating gate devices effectively.

Basic arithmetic operation

Addition

Current-mode addition can be performed by simply connecting the drain terminals of the PMOS devices together, as shown in Fig. 3a. Each PMOS device should have Vsd > 0.2 V to remain in subthreshold saturation, where the drain current becomes independent of Vsd. If an application needs to source the summation current, an n-channel current mirror can be added at the output, with the summed current driving the bias branch of the n-mirror.

Fig. 3: Overview of current domain addition and subtraction circuits.
figure 3

a Addition circuit using FG PMOS; the number of additions can be increased by adding more FG PMOS devices to the same node. b Subtraction utilizes an n-mirror and FG PMOS devices.

Subtraction

Similar to the addition circuit, multiple current sources can be connected at the bias and output terminals of the n-channel current mirror, as shown in Fig. 3b. The resulting output current equals the difference between the bias current and the current fed into the output terminal.

Multiplication in current domain

Multiplication is a crucial operation in analog circuits, particularly in signal processing and machine learning applications. FG MITE devices enable multiplication in the form I ∝ I1 × I2, which is essential for the weighted multiplications often required in neural network computations.

The general form of the EKV model for an FG MITE, given in (2), can be simplified for a two-input FG MITE under the conditions VFG < VTP and Vsd > 4UT, i.e., the subthreshold saturation region. In this model, α denotes the proportionality constant, Vprog corresponds to the stored charge, VDD represents the source voltage, and \({w}_{i}=\frac{{C}_{i}}{{C}_{T}}\), where Ci is the input capacitance of the i-th input.

$${I}_{sd}={I}_{th}\frac{W}{L}{e}^{\frac{\kappa ({V}_{DD}-\alpha ({w}_{1}{v}_{1}+{w}_{2}{v}_{2}+{V}_{prog}))}{{U}_{t}}}$$
(3)

Since the parameters α and VDD are constant for a device, (3) can be rewritten by combining καwi as Wi and \({I}_{th}\frac{W}{L}{e}^{\frac{\kappa ({V}_{DD}-\alpha {V}_{prog})}{{U}_{t}}}\) as Io(Vprog):

$${I}_{1}={I}_{o}({V}_{pro{g}_{1}}){e}^{\frac{-\kappa ({W}_{1}{v}_{1}+{W}_{x}{v}_{x})}{{U}_{t}}}$$
(4)
$${I}_{1}={I}_{o}({V}_{pro{g}_{1}},{v}_{1}){e}^{\frac{-\kappa ({W}_{x}{v}_{x})}{{U}_{t}}}$$
(5)

Three FG MITEs are connected as shown in Fig. 4a: two MITEs serve as inputs for the multiplication, and the output is produced by a third MITE whose gates are connected to the input FG MITE blocks. The current through the two input MITEs is given by (5), and the output current by (6). Substituting the current equations from (5) into (6) yields the multiplication form Iout ∝ I1 × I2, as shown in (7). This output current can be used directly in multiplication calculations and, together with the addition circuit, to perform multiply-accumulate operations.

Fig. 4: Overview of multiplication circuit and its results.
figure 4

a Multiplication circuit realization using two input floating gates. b Experimental results from multiplication circuit.

Figure 4b presents results from the hardware, showing the linearity of the multiplication for input currents I1 and I2 in the range from 10 nA to 50 nA. The precision of the multiplication results is limited by the driving and measuring circuits.

$${I}_{out}={I}_{o}({V}_{pro{g}_{out}}){e}^{\frac{-\kappa ({W}_{x}{v}_{x}+{W}_{y}{v}_{y})}{{U}_{t}}}$$
(6)
$${I}_{out}=\frac{{I}_{o}({V}_{pro{g}_{out}})}{{I}_{o}({V}_{pro{g}_{1}},{v}_{1})\times {I}_{o}({V}_{pro{g}_{2}},{v}_{2})}\times ({I}_{1}\times {I}_{2})$$
(7)
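The translinear relation above can be checked numerically with the simplified exponential model of Eqs. (4)-(6); the constants below (κ, UT, W, Io) are illustrative assumptions, not measured values:

```python
import numpy as np

# Numerical check of the translinear multiplication of Eqs. (5)-(7): the
# output MITE couples both input gate voltages, so its exponent is the sum
# of the input exponents and its current tracks I1 * I2.
KAPPA, U_T, W = 0.7, 0.0258, 0.1
I_O = 1e-8  # assumed pre-exponential current I_o(V_prog)

def mite_current(*gate_voltages):
    """Subthreshold FG MITE current for the coupled gate voltages, cf. Eq. (4)."""
    return I_O * np.exp(-KAPPA * W * sum(gate_voltages) / U_T)

v_x, v_y = 0.3, 0.5             # gate voltages produced by the two input MITEs
i_1 = mite_current(v_x)         # Eq. (5): first input MITE
i_2 = mite_current(v_y)         # Eq. (5): second input MITE
i_out = mite_current(v_x, v_y)  # Eq. (6): output MITE couples both gates
```

For identical devices, `i_out` equals `i_1 * i_2 / I_O`, the scaled product of Eq. (7).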

Division in current domain

Division is the inverse operation of multiplication, performed to find the quotient of two numbers. Analog current-domain division can be achieved with the circuit topology shown in Fig. 5a. The circuit can be analyzed similarly to the multiplication circuit; the simplified relationship between the input currents I1 and I2 and the output current is given in (11). Figure 5b shows the measurement results obtained from the circuit in Fig. 5a for I1 ∈ [20, 30] and I2 ∈ [50, 70]. These results show that linear division is achievable with MITE devices.

$${I}_{1}={I}_{o}{({V}_{pro{g}_{1}})}^{{\prime} }{e}^{\frac{-\kappa (W{v}_{x}+W{v}_{y})}{{U}_{t}}}$$
(8)
$${I}_{2}={I}_{o}{({V}_{pro{g}_{2}},{v}_{2})}^{{\prime} }{e}^{\frac{-\kappa (W{v}_{x})}{{U}_{t}}}$$
(9)
$${I}_{out}={I}_{o}{({V}_{pro{g}_{out},{v}_{1}})}^{{\prime} }{e}^{\frac{-\kappa (W{v}_{y})}{{U}_{t}}}$$
(10)
$${I}_{out}=\frac{{I}_{o}{({V}_{pro{g}_{2}},{v}_{2})}^{{\prime} }\times {I}_{o}{({V}_{pro{g}_{out}},{v}_{1})}^{{\prime} }}{{I}_{o}{({V}_{pro{g}_{1}})}^{{\prime} }}\frac{{I}_{1}}{{I}_{2}}$$
(11)
Fig. 5: Overview of division circuit and its results.
figure 5

a Division circuit realization with two input FG MITE elements. b Experimental results of the current division operation.
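The division relation of Eqs. (8)-(11) can be checked with the same simplified exponential model; again, all constants are illustrative assumptions:

```python
import numpy as np

# Numerical check of current-domain division, Eqs. (8)-(11): the output MITE
# couples only v_y, so its current tracks I1/I2 up to a device constant.
KAPPA, U_T, W = 0.7, 0.0258, 0.1
I_O = 1e-8  # assumed pre-exponential current

def mite_current(*gate_voltages):
    return I_O * np.exp(-KAPPA * W * sum(gate_voltages) / U_T)

v_x, v_y = 0.4, 0.6
i_1 = mite_current(v_x, v_y)  # Eq. (8): couples both v_x and v_y
i_2 = mite_current(v_x)       # Eq. (9): couples only v_x
i_out = mite_current(v_y)     # Eq. (10): couples only v_y
```

For identical devices, `i_out` equals `I_O * i_1 / i_2`, the scaled quotient of Eq. (11).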

Square

Square and square-root operations are other useful computations that can be readily approximated with analog MITE devices. A digital counterpart would require floating-point operations to produce reasonably accurate results, demanding higher power consumption and complex dedicated circuits. A straightforward way to compute squares is through multiplication; however, to reduce the circuit count, the power-computation circuit can instead be realized by connecting FG MITEs as shown in Fig. 6a.

$${I}_{1}={I}_{o}{({V}_{pro{g}_{1}})}^{{\prime} }{e}^{\frac{-\kappa (W{v}_{1}+W{v}_{y})}{{U}_{t}}}$$
(12)
$${I}_{out}={I}_{o}{({V}_{pro{g}_{out}})}^{{\prime} }{e}^{\frac{-\kappa 2\times (W{v}_{y})}{{U}_{t}}}$$
(13)
$${I}_{out}=a\times {I}_{o}^{2}$$
(14)

The measured square values from the hardware implementation are shown in Fig. 6b, together with a curve fitted to a second-order polynomial of the form a(x − b)2 − c, with coefficients a ≈ 1.089 × 10−9, b ≈ 6.49A, and c ≈ 1.523 × 10−7A. With this circuit configuration, a square-root operation can be performed by simply interchanging the inputs and outputs. Further, using n-input MITEs instead of the two-input MITEs used in this work enables the computation of various powers rather than being limited to squares.

Fig. 6: Overview of the square circuit and its results.
figure 6

a Circuit realization of FG MITES based current square. b Experimental results of current squaring and curve fit on the experimental data.
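The squaring relation of Eqs. (12)-(14) follows from the doubled gate coupling and can be checked numerically; constants are illustrative assumptions:

```python
import numpy as np

# Numerical check of the square law, Eqs. (12)-(14): the output MITE couples
# v_y through both of its inputs, doubling the exponent, so I_out follows
# I_1 squared up to a device-dependent constant.
KAPPA, U_T, W = 0.7, 0.0258, 0.1
I_O = 1e-8  # assumed pre-exponential current

v_y = np.linspace(0.1, 0.5, 20)                   # sweep of the input gate voltage
i_1 = I_O * np.exp(-KAPPA * W * v_y / U_T)        # Eq. (12), with v_1 held fixed
i_out = I_O * np.exp(-KAPPA * 2 * W * v_y / U_T)  # Eq. (13): doubled coupling
```

Across the sweep, `i_out` equals `i_1**2 / I_O`, the scaled square of Eq. (14); swapping input and output in the same model yields the square root.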

One of the advantages of FG MITE devices is that they perform general computations without requiring a different specialized circuit for each of the computations shown. Further, these devices can be individually programmed by changing the charge on the floating node, which reduces mismatch and allows them to function as memory.

Analog adaptive filtering

Adaptive filters are essential in signal processing and control systems due to their ability to adjust their response to changes in signal characteristics. These systems are particularly useful when signal properties are non-stationary or unknown. Unlike fixed filters, which have constant parameters, adaptive filters can dynamically update their coefficients to optimize an error criterion in real-time. Figure 7a presents a high-level block diagram of a general adaptive filter.

Fig. 7: Overview of the analog adaptive filter and its components.
figure 7

a High-level block diagram of an adaptive filter. b Multi-input adaptive filter with m inputs and n time-delayed taps for each of the inputs. c Hardware realization of a multi-input time-delayed adaptive filter.

In this study, we focus on the adaptive filter system illustrated in Fig. 7b, where the output is given by linear combinations of convolution kernels as described by the following equation:

$$y(t)=\sum _{i=1}^{m}\sum _{j=1}^{n}{W}_{i,j}{x}_{i,j}(t)$$
(15)

During the adaptive filter learning phase, the objective function used to update the coefficients is given by:

$${\mathbb{E}}[e(t)]={\mathbb{E}}\left[{\left(d(t)-\sum _{i=1}^{m}\sum _{j=1}^{n}{W}_{i,j}{x}_{i,j}(t)\right)}^{2}\right]$$
(16)

We utilize the Least Mean Square (LMS) algorithm, which iteratively updates Wi,j with a learning rate μ to minimize the error function in (16):

$${W}_{k+1}={W}_{k}+2\mu {e}_{k}X$$
(17)
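The LMS recursion of Eq. (17) can be sketched as a behavioral software model; the single-channel tapped-delay-line below is an illustrative simplification of the multi-input filter of Fig. 7b, with weights as plain floats rather than FG charges:

```python
import numpy as np

# Behavioral sketch of the LMS update in Eqs. (15)-(17) for a
# single-channel tapped-delay-line filter.
def lms_identify(x, d, n_taps=3, mu=0.01, n_epochs=1):
    w = np.zeros(n_taps)
    for _ in range(n_epochs):
        for k in range(n_taps - 1, len(x)):
            x_k = x[k - n_taps + 1:k + 1][::-1]  # current and delayed taps
            e_k = d[k] - w @ x_k                 # instantaneous error, cf. Eq. (16)
            w = w + 2 * mu * e_k * x_k           # LMS weight update, Eq. (17)
    return w

# Usage check: identify a known 3-tap FIR system from its input/output.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
true_w = np.array([0.5, -0.3, 0.1])
d = np.convolve(x, true_w)[:len(x)]  # desired signal d(t)
w_hat = lms_identify(x, d, n_taps=3, mu=0.01, n_epochs=3)
```

With a noise-free desired signal, the recovered weights converge to the true FIR coefficients, illustrating the error-descent behavior the hardware loop emulates.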

The charge storage capability of FG-based MITEs makes them particularly suitable as both weight storage elements and multiplicative components in an analog adaptive filter. In this study, we implement a hardware adaptive filter, as shown in Fig. 7c, utilizing an FG MITE at each tap. Currently, system inputs are provided via built-in Digital-to-Analog Converters (DACs); in the final design, these will be replaced by analog front-end circuits that preprocess sensor inputs. Each input channel is passed through several low-pass filters, which introduce time delays into the input signal. The delayed inputs and the outputs from each low-pass filter are then fed into an array of weight storage cells. Each weight storage cell comprises two FG MITEs and an n-channel current mirror. This configuration allows precise adjustment of the weight of each input by selectively increasing the current through the FG MITE connected to the appropriate n-mirror branch via hot-electron injection. The accumulated output currents from all weight storage cells are connected to a RAMP ADC, which charges a capacitor and triggers when the voltage across the capacitor reaches a specific value. The RAMP ADC has 10 bits of resolution and a measurement range of 18-38 nA. The resulting signal is digitized by an MSP430-based microcontroller, which processes the current output from the adaptive filter.

The weight adaptation process begins by measuring the output of the adaptive filter and evaluating the error relative to the expected output. Based on the inputs to each FG MITE, an error value is generated for each device. The weight adaptation algorithm, presented in Algorithm 1, describes the steps used to update the weights using the hot-electron injection process described in Section “Programming FG MITE”. During the hot-electron injection fine-tuning, each FG MITE device is selected in turn; the tunneling terminal is held at 5 V, the input terminals at 3.5 V, and 1 μs pulses of 5 V VDS are applied according to the number of pulses determined by Algorithm 1.

For additional details on the supporting analog circuits used in this work, including the OTA-C (Supplementary Fig. 1) and the RAMP ADC (Supplementary Fig. 2), see the Supplementary information.

Algorithm 1

Analog Adaptive Filter Coefficient Updating Algorithm

i ← N ▷ Number of Samples

while i > 0 do

    Y[i] ← Measure ADC

    i ← i − 1

end while

i ← C × T ▷ Number of Channels (C), Number of Taps (T)

e ← 2 × μ × (D − Y) × X ▷ Desired Signal (D), Learning Rate (μ), Input Vector (X)

while i > 0 do

    if e[i] < 0 then Select switches connected to (−)

    else Select switches connected to (+)

    end if

    while \(\left\vert e[i]\right\vert\, > \,0\) do ▷ Inject switch with a 1 μs pulse

        \(e[i]\leftarrow \left\vert e[i]\right\vert -1\)

    end while

    i ← i − 1

end while
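Algorithm 1 can be transcribed into a runnable sketch; `measure_adc` and `inject_pulse` are hypothetical stand-ins for the RAMP ADC read and the 1 μs injection pulse, and the scalar error term below is a simplification of the sample-wise comparison in the hardware loop:

```python
import numpy as np

# Runnable sketch of Algorithm 1. The pulse count per weight is the integer
# magnitude of the scaled error, mirroring the inner loop that decrements
# |e[i]| once per injection pulse.
def update_weights(measure_adc, inject_pulse, x, d, mu, n_samples):
    y = np.array([measure_adc() for _ in range(n_samples)])  # first while-loop
    e = 2 * mu * (d - y.mean()) * x                          # per-weight error term
    for i, e_i in enumerate(e):
        polarity = '-' if e_i < 0 else '+'                   # switch selection
        for _ in range(int(abs(e_i))):                       # inject until |e[i]| reaches 0
            inject_pulse(i, polarity)
    return e

# Usage check with stub hardware functions.
pulses = []
e = update_weights(lambda: 1.0, lambda i, p: pulses.append((i, p)),
                   x=np.array([3.0, -2.0]), d=2.0, mu=0.5, n_samples=4)
```

With these stub values the error vector is [3.0, −2.0], so the sketch fires three positive pulses on weight 0 and two negative pulses on weight 1.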

Neural decoding using adaptive filter technique

This study employs an analog adaptive filter composed of FG MITEs to perform neural decoding. Neural decoding is the process of inferring kinematic variables (such as velocity or displacement) from the information contained in neural signals, such as action potentials. In this work, we use time-domain adaptive filtering to decode neural signals recorded from the motor cortex of a macaque monkey during an instructed delayed reach-to-grasp task, as described in ref. 22.

The dataset includes recordings from two non-human primates, a female and a male. For this study, we focus on a single trial involving the male primate and decode the y component of displacement. The neural data was acquired using a Utah array with 10 × 10 electrodes. Given the significant overlap of information between multiple electrodes, reducing the input data to a lower-dimensional space can substantially decrease the complexity and cost of processing. To achieve this, we apply Principal Component Analysis (PCA), an unsupervised algorithm that transforms the original input vectors into a new coordinate system based on their principal components. PCA identifies the directions (principal components) in which the data varies the most and projects the data onto these axes, effectively reducing the dimensionality while preserving as much variance as possible.

This study uses PCA to transform the input neural signals into their principal components, thereby reducing the number of inputs to the adaptive filter. By focusing on the most significant components, PCA simplifies the data and helps retain the most relevant information for decoding. This reduction in input dimensionality leads to more efficient processing, enabling the adaptive filter to perform neural decoding more effectively. Figure 8 illustrates the signal processing chain used for decoding neural data into kinematic information. This study performs PCA offline using a software platform while the adaptive filter is implemented on analog hardware. Figure 7c shows the overall block diagram of the adaptive filter implemented on the hardware.
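The offline PCA projection step can be sketched with a plain SVD; the data below is synthetic, and the channel count is only an assumption for illustration:

```python
import numpy as np

# Sketch of the offline PCA step: project multi-channel neural data onto its
# leading principal components before it reaches the adaptive filter.
def pca_reduce(data, n_components=4):
    """data: (samples, channels) array; returns its top-n_components projection."""
    centered = data - data.mean(axis=0)
    # SVD of the centered data yields the principal axes in the rows of vt,
    # ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(1)
lfp = rng.standard_normal((500, 96))       # synthetic stand-in for Utah-array channels
reduced = pca_reduce(lfp, n_components=4)  # keep four principal components
```

The four retained component streams then play the role of the filter inputs delivered through the on-chip DACs.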

Fig. 8: Overall signal flow of the neural decoding system: electrophysiological (ECoG) signal acquisition from macaque and inference of hand kinematics using an adaptive filter.
figure 8

The brain is adapted from “Human Brain.png” by Injurymap, Wikimedia Commons, CC BY 4.0. Modifications were made. a Local Field Potential (LFP) obtained from the Utah array. b Most significant principal components extracted from the LFP dataset. c The dataset after low-pass filtering with a 6th-order Butterworth filter (25 Hz cutoff) and normalization by z-score. d Results from the proposed adaptive filter.

To determine the optimal number of principal components that would provide high accuracy while minimizing the number of channels, this study modeled the non-linear components, such as the FG-based MITE, in Matlab using the EKV equation (Eq. (2)). Figure 9a shows the Pearson correlation coefficient between the predicted and reference kinematics (y-displacement) for different numbers of principal components. The results indicate that accuracy improves with an increasing number of principal components up to a point, specifically around four components. Beyond this point, the accuracy gains become marginal while the complexity and power consumption of the system increase significantly. Therefore, selecting four principal components strikes a balance between achieving high accuracy and keeping the number of analog components and the power budget manageable. The four principal components were passed through low-pass filters to add time delays; the weighted sum of these delayed signals forms a temporal convolution filter. Figure 9a also presents the Pearson correlation coefficient between the predicted and reference kinematics as the size of the convolution kernel increases. However, a larger convolution kernel raises the power consumption and the complexity of the analog circuit. Simulation results suggest that four principal components with three time-delayed taps offer a good correlation coefficient for the current experiment while keeping power consumption and complexity minimal. Figure 9b further illustrates the predicted and reference y-displacement over time, demonstrating the effectiveness of this approach.

Fig. 9: Overall result comparison of proposed hardware adaptive filter with software estimations.
figure 9

a Prediction correlation from adaptive filtering versus the number of principal components included, and correlation versus the number of time delays added (while keeping the PCA count at 4). b Predicted displacement from adaptive filtering with different numbers of principal components. c Results from the hardware filter and the Matlab model against the reference signal. The hardware filter achieves ρ = 0.69.

These selected principal components were inputs to the adaptive filter via on-chip DACs. The adaptive filter was configured with four inputs, each having three taps, as depicted in Fig. 7c. A programmable transconductance amplifier implemented the delay elements between each tap. The weights of the adaptive filter were realized using FG-based MITE blocks, which allowed for precise control and non-volatile storage of the weights. The output currents from all MITE blocks across the four channels were summed onto a single wire, and this aggregated current was sensed using a RAMP ADC. The weights were then adapted based on the weight update algorithm described in Algorithm 1.

Figure 9c presents the output of the hardware filter, comparing it with the adaptive filter implemented in Matlab and the reference displacement. Initially, with a random set of weights, the adaptive filter achieved a Pearson correlation coefficient of 0.07. The correlation improved significantly as the weights were iteratively updated using the algorithm implemented on the MSP430 microprocessor. The adaptation algorithm took 232 iterations in the hardware to reach maximum accuracy. The final Pearson correlation coefficient reached 0.69, indicating a strong alignment between the predicted and actual kinematics, thereby validating the effectiveness of the adaptive filtering approach.

The power consumption of each FG MITE and LPF was around 3 nW and 13 nW, respectively. The 4-channel, 3-tap version of the adaptive circuit shown in Fig. 7c consumes ~140 nW of static power from a 2.5 V supply. The RAMP ADC used in this work consumes 10 μW. The power of the interfacing circuits was not included in these measurements. Table 1 compares systems that perform neural decoding; it shows that digital systems tend to use significant power, which makes them harder to implant given thermal and power-delivery constraints.

Table 1 Comparison of multiple neural decoding systems

Of the works presented in Table 1, the analog neural decoding of head direction presented in ref. 23 performs very similar processing with a different circuit realization; that approach consumes ~300 μW per channel in SPICE simulations.

Discussion

This study primarily focused on computations with CMOS-compatible, non-volatile FG MITE elements and showed an adaptive filter implementation that uses the computations discussed as building blocks. The adaptive filter presented is beneficial for applications that require low-power processing of non-stationary signals and for applications where the filter coefficients are difficult to predetermine. Further, on-chip learning provides robustness against the mismatches and variations of analog systems. Without it, the exponential current-voltage relationship turns slight voltage mismatches into significant current variations across devices, making such solutions harder to scale.

The proposed adaptive filtering approach can be extended by increasing the number of channels and time-delayed taps in the computation. More channels could support systems with higher electrode counts, such as Neuropixels 2.0, and more time-delayed elements could capture more complex temporal relationships.

Further, on-chip principal component computation could be realized with an FG crossbar array scaled to a large number of electrodes, similar to the memristor implementation presented in ref. 24. The PCA components currently used in this work could also be replaced by the input signals directly, or by an analog vector-matrix multiplier providing a weighted sum of the input signals.