Introduction

Neural interfacing electrode arrays are key devices for understanding the dynamics of the Central Nervous System (CNS) and for the development of neural prostheses for rehabilitation in cases of severe paralysis1,2,3,4,5,6,7,8,9. Studies have shown that the decoding accuracy of neural activity increases with the number of electrodes or cells recorded1,5,8,10. Therefore, strong efforts have been made to develop neural probes with high numbers of electrodes11,12,13,14,15 that produce large amounts of data. A key problem has become how to exploit the richness of these recordings, which is difficult to access using conventional methods. In particular, whether specific spatiotemporal patterns exist in multichannel neural data may not be obvious to determine, and the patterns underlying specific neural functions may be difficult to apprehend. To this end, fully unsupervised approaches that could extract patterns of neural dynamics from the large flow of data produced by neural implants would bring an invaluable perspective to better understand the brain dynamics that underlie behavior and to identify non-obvious behaviorally relevant neural features. This is all the more important because studying the activity of multiple individual neurons matters not only for the advancement of neuroprostheses but also for understanding how information is encoded by the CNS16,17. In this context, spike-sorting, which consists of isolating individual neuron activity from raw neural recordings, is a critical preprocessing step in analyzing neural data18,19,20. Several algorithms have been developed for this purpose but, due to their power consumption, they cannot be envisioned to be directly embedded into brain implants.
Therefore, the long-term objective of seamlessly integrating this neural data preprocessing step directly into implantable devices calls for new algorithms that are both efficient and online-compatible, while also being low in power consumption.

Over the past decade, formal deep neural networks (DNNs) have attracted unprecedented interest for learning patterns within large amounts of data21,22,23,24,25. This second generation of artificial neural networks (ANNs) is now extensively used and ubiquitous across many applications. Yet, despite their capabilities, they face two main drawbacks for embedding in neural interfaces. Firstly, their reliance on mainly supervised learning techniques such as backpropagation26 requires labeled datasets, posing a significant obstacle in applications where such labeled data is either limited or difficult to obtain, as in the case of large-scale neural recordings. Secondly, the computational demands of these networks often call for specialized hardware such as GPUs or TPUs to optimize their numerous parameters, which must be maintained in memory and learned through the minimization of a global loss function. Additionally, although state-of-the-art DNN architectures are constructed to handle sequences of information25,27,28, they inherently lack a true notion of time. Moreover, although self-supervised learning has been proposed to let DNNs encode input features automatically, further supervised decoding steps are required to match these embeddings to specific patterns. As a consequence, beyond their lack of energy efficiency, DNNs are not suitable candidates for fully unsupervised pattern extraction from ongoing multichannel temporal data in which ground truths are unknown. Therefore, alternative approaches are needed to eventually embed automated, very-low-power neural processing algorithms into future intelligent neural implants for real-time identification and extraction of complex features from large-scale neural recordings.

In this respect, spiking neural networks (SNNs) are neuromorphic ANNs that model the membrane potential of their neural elements29,30,31. SNN neurons communicate through discrete action potentials (spikes) and are connected by dynamic synapses. These spikes are sparse in time and thus provide a sparse computing framework for learning. SNNs typically rely on smaller numbers of parameters and integrate biomimetic plasticity rules found in living neural networks, such as spike-timing-dependent plasticity (STDP)32,33 or other post-synaptic rules. This third generation of ANNs is thus radically different from DNNs, as learning becomes local to each synapse and neural element, based on the dynamics of pre- and post-synaptic neurons. This removes the need for large global memory storage and energy-consuming global minimization. Based on their local learning rules, SNNs can self-configure in a fully unsupervised way, solely based on their inputs, to automatically recognize patterns hidden in the data34,35,36; and while they are also capable of supervised learning through surrogate gradient backpropagation37,38,39,40, they require less data for training than traditional DNNs41,42,43. Finally, SNNs are compatible with very-low-power neuromorphic hardware emulating spiking neurons44,45,46,47,48 and with novel materials integrating resistive memories that are well suited to emulate artificial plastic synapses at ultra-low power49,50,51,52,53.

A number of previous studies have shown that STDP-based SNN architectures could be used for pattern or object recognition within static images in supervised54,55,56, semi-supervised57, or fully unsupervised34,58,59,60 ways. Other architectures have been developed for supervised learning of patterns within a temporal signal such as speech61. Such a paradigm has further been extended to the supervised recognition or decoding of dynamic patterns within time-varying multivariate data such as EMG62, olfactory signals63, respiratory signals64, tactile braille reading65, EEG66, or intracortical data67,68. Toward unsupervised learning and classification of multivariate patterns, hybrid approaches have been developed, consisting of a self-organizing SNN trained in an unsupervised way to represent an audio input, followed by a supervised step to classify each representation into a sound class69,70. These networks classically have a feed-forward structure with several fully-connected layers, resulting in large numbers of parameters65, which can be reduced to some extent using convolutional layers55,69. As for the task of spike-sorting, although it has seen a significant rise in interest in recent years71,72,73, methods typically begin with a detection step, followed by often computationally expensive feature extraction and clustering steps. This approach may introduce latencies that limit real-time applicability but, more importantly, remains highly limiting from the perspective of embedding these methods at very low power in neural implants, particularly in high-channel-count recording devices. Given their unique features, SNNs thus constitute promising candidates for future very-low-power, fully unsupervised neural signal processing embedded within cortical implants. However, SNN architectures are typically dependent on the application for which they are designed and lack the versatility to answer the needs of a wide range of different applications.
In this context, fully unsupervised extraction and classification of time-varying patterns in multichannel neural recordings using frugal SNNs remains an unsolved problem.

In a previous study, we proposed an attention-based SNN to extract and automatically classify action potential shapes from single-channel extracellular neural signals74. The same SNN was later implemented in hardware using low-power FPGAs, demonstrating the network’s online-classification capabilities75. The next challenge, which we address in the present study, is to automatically process multiple neural signals simultaneously, where the activity of individual neurons is captured by multiple nearby electrodes, a situation corresponding to the problem of extracting multivariate temporal patterns in a fully unsupervised way76. In this respect, previous SNN works have tackled the specific case of time-varying visual scenes. In particular, efficient, fully unsupervised learning of spatio-temporal patterns corresponding to moving objects has been demonstrated using SNNs77,78. When searching for spatio-temporal patterns, the temporal information must somehow be captured by the network. In one of these studies, data from AER cameras could be automatically processed to count cars passing in each lane of a highway77. In this case, spatiotemporal patterns typically had similar dynamics, such as their duration in time and how fast they moved across pixels. Moreover, this strategy was based on the order of spikes within the input, so that the output neurons fired after only a few spikes of a pattern had been emitted35,36, without waiting for the whole pattern to end. As a consequence, such an approach typically fails when different spatiotemporal patterns are nested, for example when one pattern is exactly the beginning (along space, time, or both) of another, longer one, so that the two differ only by their endings. In another study78, feedforward connections between neurons with different membrane dynamics were used to achieve memory at different time-scales in order to learn temporal patterns with different dynamics.
An alternative strategy, which we employed in a previous study to automatically classify temporal patterns corresponding to different action potential shapes in an extracellular neural signal, was to use several synapses with different delays between two neurons74. The drawback of these approaches is the multiplication of the number of neurons and/or synapses. An alternative could be to segment the flow of temporal data into fixed-size frames and process each frame like an image; this, however, prevents fully online processing of the temporal data without any prior on the length of the patterns to search for. Another recently proposed approach is to learn synaptic delays79, but this requires supervised learning. To overcome these limitations, there is thus a need for very frugal architectures able to recognize multivariate temporal patterns in a fully unsupervised way and compatible with online processing of neural data streams.

Toward this goal, we propose here an SNN architecture dedicated to automatic multivariate temporal pattern extraction that contrasts sharply with classical architectures based on Leaky-Integrate-and-Fire (LIF) neurons80. Among the wide range of existing models of spiking neurons80,81,82,83,84, we used a variant of Low-Threshold Spiking (LTS) neurons84, whose dynamics automatically adapt to the temporal durations of input patterns without the need to multiply the number of synapses. A single layer of such neurons is connected to input spike trains by synaptic weights, and the network learns in a fully unsupervised manner through biological learning rules, namely STDP and Intrinsic Plasticity (IP). We show that, with only a handful of neurons, this strategy is efficient at recognizing highly overlapping multivariate temporal patterns, first on simulated data, then on Mel cepstral representations of speech sounds and on multichannel neural data, and finally that it can be used to perform spike sorting on multichannel synthetic and real single-unit neural data. These results are thus a step toward highly frugal SNNs for unsupervised learning of complex multivariate temporal patterns in multichannel neural data.

Results

Ideally, an SNN developed for unsupervised pattern identification and recognition should process incoming data fed sequentially and emit one output spike each time a specific pattern occurs in the data, with each different pattern eliciting spikes from a different neuron so as to allow direct inference without any further supervised step. The initial step of this procedure is to encode the continuous input data into spike trains, enabling SNNs to leverage the inherent event-driven nature of neuronal computation and capture temporal dependencies within the data. To this end, we considered two types of encoding methods. In the first one, the original data was quantized using a column of sensory receptive fields that generated spikes when the signal fell within the fields (Fig. 1a), a strategy we previously employed to encode extracellular signals for spike sorting using SNNs74. Five spikes were generated for each input value in order to increase the robustness of the encoding with respect to small fluctuations of the input. Using this approach, the resulting spike trains directly reflected the shape of the original data (Fig. 1b). Original audio data was decomposed into 24 continuous Mel cepstral signals (Fig. 1c, see Methods for details). Similarly, the multiunit activity of the neural data was binned and smoothed for each electrode, resulting in 30 continuous signals (Fig. 1d). In both cases, these input data were normalized between 0 and 1 and encoded into 24 spike trains using receptive fields (Fig. 1e,f). This initial encoding resulted in spike trains that encoded both background noise and relevant patterns. In order to mitigate the influence of noise on the STDP learning process, a short-term plasticity (STP) rule from our previous work was introduced for each input spike train, a mechanism that weakens input synaptic weights all the more as the presynaptic activity is high (see Methods)74,85.
With this strategy, only those spike trains encoding the peaks and troughs in the original data were retained (Fig. 1g,h). These final encoding spike trains were then passed as input into the network to be processed.
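As an illustration, the receptive-field encoding above can be sketched in a few lines of Python. This is a minimal sketch under the stated figures (20 quantization fields plus two extra spikes on each side, giving 5 spikes per timestep and 24 spike trains); the function and parameter names are ours, not from the authors' code.

```python
import numpy as np

def receptive_field_encode(signal, n_fields=20, halo=2):
    """Encode a 0-1 normalized signal into spike trains using
    quantization receptive fields (illustrative sketch).

    Each timestep activates the field containing the value plus
    `halo` fields on each side, i.e. 5 spikes per timestep and
    n_fields + 2 * halo = 24 spike trains with these defaults.
    Returns a binary matrix of shape (n_trains, len(signal)).
    """
    n_trains = n_fields + 2 * halo
    spikes = np.zeros((n_trains, len(signal)), dtype=np.uint8)
    for t, v in enumerate(signal):
        # Index of the central receptive field, shifted by `halo`
        # so the side spikes never fall outside the matrix.
        center = min(int(v * n_fields), n_fields - 1) + halo
        spikes[center - halo:center + halo + 1, t] = 1
    return spikes
```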

Fig. 1: Receptive-field-based encoding pipeline.

a Each signal was normalized and converted into 24 spike trains using quantization receptive fields. At each timestep, depending on the value of the signal, a spike was generated by one of 20 receptive fields equally spanning the 0–1 range of input values. Two additional spikes were generated both above and below the central spike, making a total of five spikes per timestep (receptive fields in red) and 24 spike trains per signal. b Example of the normalized Mel 1 corresponding to eleven French vowels and encoded into 24 spike trains. c Audio data decomposed into 24 Mel cepstral coefficients. d Multiunit neural data (middle) from embryonic mouse hindbrain spinal cord on the microelectrode array (MEA, left) was binned and smoothed to extract spike envelopes on each channel (right). e, f Initial encoding spike trains for audio and spike envelope of neural data, respectively. g, h Final encoding spike trains after short-term plasticity (STP) for audio and spike envelope of neural data, respectively. STP eliminated spikes corresponding to noise and only retained spikes indicative of a pattern. Some residual spikes corresponding to noise can be seen across some neural data channels, as these channels were very noisy.

In the case of spike sorting data, we used a different encoding technique that was more robust to noise and avoided the use of STP. Moving forward, in the context of spike-sorting, we shall use the term ‘simulated action potentials’ for action potentials simulated in the spike-sorting data, ‘real action potentials’ for action potentials of real retina neurons in the real spike-sorting data, and ‘artificial spikes’ for artificial action potentials produced by the LTS neurons. As illustrated in Fig. 2a,b, a delta encoding technique inspired by the AER technique86 was employed, which proved to be more robust for noisier data (see Methods). We tested two types of spike-sorting datasets: simulated data, and real data made available to test spike sorting algorithms. For the simulated data, an 8×8 MEA grid was placed in the vicinity of 6 neurons and artificial activity was simulated (Fig. 2c,d). The real spike-sorting datasets were taken from an open-source database87 of retinal ganglion cell recordings in mice, recorded with a 16×16 MEA (Fig. 2f,g). Each real dataset includes a juxtacellular recording of one retinal neuron, which provides the ground truth for the time occurrences of the action potentials of that particular neuron. For each dataset, we considered a subset of 8×8 electrodes surrounding this neuron. In all cases, data was encoded using the same delta-encoding technique to obtain the final encoding spike trains (Fig. 2e,h).
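The delta encoding can be sketched as follows, assuming a fixed threshold δ and k lagged comparisons per timestep; the names and default values here are illustrative, and the exact procedure is given in the Methods.

```python
import numpy as np

def delta_encode(signal, k=5, delta=0.05):
    """Delta (AER-inspired) encoding sketch: for each lag j in 1..k,
    emit a spike on train j-1 when signal[t] - signal[t-j] > delta,
    and on train k+j-1 when the change is below -delta.
    Returns a binary matrix of shape (2*k, len(signal))."""
    n = len(signal)
    spikes = np.zeros((2 * k, n), dtype=np.uint8)
    for t in range(1, n):
        for j in range(1, min(k, t) + 1):
            change = signal[t] - signal[t - j]
            if change > delta:
                spikes[j - 1, t] = 1        # positive-change trains
            elif change < -delta:
                spikes[k + j - 1, t] = 1    # negative-change trains
    return spikes
```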

Fig. 2: Spike-sorting data encoding pipeline.

a A sample sine wave is encoded. Spike trains 1–5 encode the positive delta changes between the current timestep \(t\) and previous timesteps \(t-1\) through \(t-k\), where \(k=5\). Similarly, spike trains 6–10 encode the negative delta changes between \(t\) and \(t-1\) through \(t-k\). A spike is generated in the respective spike train at time \(t\) if the signal change exceeds the defined threshold \(\delta\). b An example of a real action potential encoded with the same technique but with a \(k\) value of 10, resulting in 20 spike trains. c Visualization of the multielectrode array (MEA) grid and neuron positions. The filled blue circles represent the somata (cell bodies) of the six generated neurons. The red circles correspond to the electrodes of the 8×8 MEA grid. d Simulated MEA spike-sorting data with a zoomed-in region to illustrate simulated action potentials. e Examples of encoding spike trains for each of the six ground truths; each channel of the filtered data was encoded using the delta technique to obtain 1280 spike trains for the 64 channels. f Illustration of the 16×16 microelectrode array (MEA) grid used to record retinal ganglion cells in mice87, from which we considered an 8×8 subgrid around the electrode closest to the ground truth (GT) neuron. g Real MEA spike-sorting data with a zoomed-in region illustrating real action potential waveforms corresponding to the ground truth, spreading across multiple electrodes. h Each channel was encoded in the same way as the simulated data, giving 1280 final spike trains.

Shown in Fig. 3a is the final network architecture, which consisted of a single layer of LTS neurons connected to the input spike trains by negative synaptic weights (between -1 and 0) initialized randomly according to a uniform distribution. The LTS neurons processed the spikes from the input spike trains (presynaptic spikes) and sporadically produced output spikes (postsynaptic spikes) according to the dynamics of their membrane potentials. A Winner-Take-All (WTA) mechanism implemented across the LTS neurons ensured that at most one neuron emitted a spike at any given timestep. Learning in the network happened through biological learning rules, namely STDP and Intrinsic Plasticity (IP). For each postsynaptic spike emitted by a given neuron in the network, the STDP rule strengthened the synaptic weights between this neuron and all the spike trains that had a presynaptic spike within a certain coincidence time window, chosen based on the maximum length of the patterns to be searched for in the data. Another, lateral STDP rule governing lateral inhibition weakened the synaptic weights between all the other neurons and the same spike trains, thereby inhibiting other neurons from learning the same pattern (see Methods). The lateral STDP update was much weaker than the principal STDP update, considering that patterns may share common spiking activity. Additionally, for each postsynaptic spike output by a given neuron, the IP rule adapted the threshold of the neuron based on the size of the pattern learnt, giving the network the ability to differentiate between nested patterns. Figure 3b demonstrates the evolution of the membrane potential of an LTS neuron in the presence of multiple input spike trains. Each input spike inhibited the potential of the LTS neuron for a duration determined by the neuron’s membrane time constant.
Once the stimulus ended, the neuron, by the nature of its model, generated a rebound. At the network level, the neuron with the highest inhibition produced the steepest rebound and was accordingly chosen to emit an output spike (see Methods). The network thus took in multiple spike trains representative of the original data and processed these spikes to produce sporadic output spike trains that corresponded to patterns in the input data, all while learning through biological learning rules in a fully unsupervised manner.
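To make the inhibition-then-rebound behavior concrete, the following toy Python model caricatures a single LTS unit: inhibitory input hyperpolarizes the membrane, and once input ceases, the hyperpolarization is converted into a depolarizing rebound that can cross the firing threshold. The constants and the rebound rule are illustrative simplifications of ours; the actual neuron model is defined in the Methods.

```python
class LTSNeuron:
    """Toy low-threshold-spiking neuron (illustrative caricature).

    Negative input currents (from the negative synaptic weights)
    hyperpolarize the leaky membrane; when input stops, part of the
    hyperpolarization is flipped into a depolarizing rebound, so a
    spike is emitted only after the driving pattern has ended.
    """

    def __init__(self, tau=20.0, rebound_gain=1.5, threshold=5.0):
        self.tau = tau                    # membrane time constant (timesteps)
        self.rebound_gain = rebound_gain  # fraction of inhibition turned into rebound
        self.threshold = threshold        # firing threshold (adapted by IP in the paper)
        self.v = 0.0                      # membrane potential

    def step(self, input_current):
        # Leaky integration of the (non-positive) input current.
        self.v += (-self.v / self.tau) + input_current
        if input_current == 0.0 and self.v < 0.0:
            # No input this timestep: convert the accumulated
            # hyperpolarization into a depolarizing rebound.
            self.v = -self.v * self.rebound_gain
        if self.v >= self.threshold:
            self.v = 0.0  # reset after the output spike
            return 1
        return 0
```

Driving the neuron with a constant inhibitory current and then releasing it produces exactly one rebound spike, mirroring the single-output-spike-per-pattern behavior described above.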

Fig. 3: Network architecture.

a Inputs encoded as spike trains were passed into the network, consisting of only a handful of Low-Threshold Spiking (LTS) neurons. Learning happened through Spike-Timing-Dependent Plasticity (STDP) and Intrinsic Plasticity (IP) rules that enabled the LTS neurons to modulate their input synaptic weights and thresholds, respectively. A Winner-Take-All (WTA) mechanism in the LTS layer chose the neuron with the steepest rebound among the neurons that had crossed their respective thresholds to produce an output spike (see Methods), thereby ensuring that at most one neuron emitted a spike at any given timestep. b Illustration of the working mechanism of an LTS neuron in a simple case of 250 input spike trains (top: black = 0 or ‘no spike’; white = 1 or ‘spike’) feeding a single LTS neuron. The voltage of the LTS neuron (middle) is inhibited by the incoming example stimulus for a duration determined by the time constant of the neuron. Upon the end of the incoming stimulus, the neuron generates a rebound potential and emits a spike (bottom) when it crosses a threshold. With every postsynaptic spike, the voltages of all neurons were reset to 0. This behavior is ideal, as the neurons waited for a pattern to end and produced only a single output spike per pattern.
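The plasticity rules described above can be sketched as weight and threshold updates applied on each postsynaptic spike. Here we assume that "strengthening" means driving the winner's (negative) weights toward -1, i.e. stronger inhibition, which under the steepest-rebound WTA favors that neuron for the learned pattern; the sign convention, learning rates, and names are our illustrative assumptions, not the paper's exact rules.

```python
import numpy as np

def stdp_update(weights, recent_input, winner, lr=0.05, lateral_lr=0.005):
    """Sketch of the unsupervised update on a postsynaptic spike.

    weights: (n_neurons, n_inputs) array of values in [-1, 0].
    recent_input: indices of inputs that spiked within the
    coincidence window. winner: index of the neuron that fired.
    """
    w = weights.copy()
    # Principal STDP: pull the winner's active synapses toward -1
    # (assumed to mean stronger inhibition, hence a steeper rebound).
    w[winner, recent_input] += lr * (-1.0 - w[winner, recent_input])
    # Lateral STDP: weakly push the same synapses of all other
    # neurons toward 0 so they do not learn the same pattern.
    others = np.arange(w.shape[0]) != winner
    w[np.ix_(others, recent_input)] += lateral_lr * (0.0 - w[np.ix_(others, recent_input)])
    return np.clip(w, -1.0, 0.0)

def ip_update(threshold, n_active_inputs, gain=0.1):
    """Toy Intrinsic Plasticity: nudge the winner's threshold toward
    a value proportional to the size of the learned pattern."""
    return threshold + gain * (n_active_inputs - threshold)
```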

In order to establish a baseline for the network’s classification capabilities, the network was initially tested with artificial patterns that mimicked spike trains obtained from encoding spectrograms (Fig. 4). In this approach, artificial patterns mimicking unique frequency characteristics across 240 spike trains were repeatedly shown to the network. Starting with basic patterns and increasing their complexity iteratively helped us better troubleshoot the classification performance of the network. The most basic example of artificial patterns included four non-overlapping patterns (Fig. 4a). The network was trained on 50 repetitions ( = epochs) of these four patterns; it identified the four unique patterns, and each pattern was learned by a unique LTS neuron right from the beginning (Fig. 4b). Once all the input spikes were passed through the network, the output spike trains produced by the network were matched with the truth spike trains to obtain truth-output pairs and compute an evolving f-score along the learning process. For these simplest patterns, the network had a perfect f-score of 1 from the beginning, which can be attributed to the random initialization of the input weights to the LTS neurons and the absence of common spike trains between patterns.
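The truth-output matching and f-score can be computed, for instance, with a greedy coincidence-window matching like the sketch below; the paper's exact matching procedure is described in its Methods and may differ.

```python
def spike_fscore(truth_times, output_times, window=5):
    """Match output spikes to ground-truth spikes within a
    coincidence window (in timesteps) and compute the f-score.
    Greedy one-to-one matching; an illustrative sketch."""
    unmatched = list(truth_times)
    tp = 0
    for t_out in output_times:
        hit = next((t for t in unmatched if abs(t - t_out) <= window), None)
        if hit is not None:
            unmatched.remove(hit)  # each truth spike matches at most once
            tp += 1
    fp = len(output_times) - tp
    fn = len(unmatched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```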

Fig. 4: Classification performance on artificial patterns.

a Case of four patterns that do not overlap in terms of frequency characteristics across 240 spike trains. b Four of the ten Low-Threshold Spiking (LTS) neurons in the network were able to identify and learn each of the patterns. The f-score was 1 throughout the training. c Case of patterns with two being nested inside two others. d Two LTS neurons each quickly encoded two overlapping patterns. Then, after about 40 presentations of the patterns, Intrinsic Plasticity helped the neurons adapt their thresholds so as to differentiate between the small and big patterns, and the f-score eventually reached 1.

In both neural and vocal datasets, it is not uncommon to encounter patterns that are embedded within each other. Therefore, we tested the network with another example of four patterns (also repeated 50 times), where two were subsets of two others in terms of frequency characteristics (Fig. 4c). At the beginning of the training, two LTS neurons each learned two input patterns, one nested inside the other (between epochs 7 and 38, see Fig. 4d). However, as learning continued, IP helped the neurons adapt their thresholds, thereby preventing them from spiking for the smaller patterns, which could then be learnt by two other neurons (the f-score eventually evolved to 1 as learning progressed).

After having established a baseline for the network’s classification capabilities with the artificial patterns, we then tested the network on real speech data consisting of eleven French vowels repeated 50 times by a native French speaker (Fig. 5a, top). The network, with eleven LTS neurons, was trained on 20 epochs of spike trains encoded from this data, and at the end of the training, the output spike trains and the truth spike trains were matched to obtain the best truth-output pairs. After a learning period of as few as 20 epochs, the output spikes produced by the network became coherent with respect to the ground truth sequence of produced vowels (Fig. 5a, bottom). The f-score obtained on the classification performance of the network on the final epoch was 0.92 (see also the corresponding confusion matrix shown in Fig. 5b). The performance remained stable after the 20th epoch if the data continued to be fed to the network. At the end of training, each LTS neuron had learned a unique vowel, as reflected by the final weights of the neuron, which were strong for the Mels corresponding to this vowel (Fig. 5c). Figure 5d illustrates the evolution of the synaptic weights of neuron 1 (see cyan spikes in the bottom raster of Fig. 5a and the first column of Fig. 5c) from random initial values to final values.

Fig. 5: Unsupervised recognition of vowels.

a Eleven French vowels were repeated several times and the encoding spike trains corresponding to one repetition are visualized. At the beginning of training, the output spikes produced by the network were random with respect to ground truth, whereas, at the end of training, each pattern was learned by a unique Low-Threshold Spiking (LTS) neuron. b Confusion matrix computed on the final epoch of training. c Final weights of the eleven LTS neurons after training. d Evolution of the weights of neuron 1 through time.

In a next step, we tested whether the network could also automatically identify simple spatiotemporal patterns in multielectrode array neural data. Neural activity was recorded in a whole embryonic hindbrain and spinal cord preparation, previously shown to exhibit rhythmic propagating waves of activity88,89. The multiunit spiking activity envelopes of all channels were encoded through the encoding mechanism illustrated in Fig. 1h. This preparation exhibited a short and a long spiking pattern that repeated in time. The short pattern had spiking activity propagating caudo-rostrally across the lower channels, covering the lower thoracic and lumbar/sacral region of the spinal cord, whereas the long pattern had spiking activity propagating rostro-caudally across all channels. The network was trained on 10 epochs of the encoded spike trains with five output neurons. After 5 epochs, two unique output neurons had learnt the short and long patterns (Fig. 6a), producing consistent and coherent output spikes with respect to the ground truth (f-score = 1). After completion of the 10 epochs, the final weights of the LTS neurons (Fig. 6b) confirmed the learning of the short pattern by neuron 3 and the long pattern by neuron 4, with, in both cases, strong weights reflecting the corresponding patterns.

Fig. 6: Unsupervised recognition of rhythmic activity patterns in multichannel neural data.

a Neural data consisted of 3 short and 9 long spiking patterns over a period of 1 h. The network was trained on 10 repetitions of this recording. As training progressed, output neurons 3 and 4 learnt to identify the short and long patterns, respectively. The f-score reached 1 after 5 epochs. b Final weights for each Low-Threshold Spiking (LTS) neuron confirming the learning, with neurons 3 and 4 having strong weights for the inputs corresponding to the short and long patterns, respectively.

In a final step, we evaluated the extent to which the proposed SNN could be used to perform fully unsupervised spike sorting. First, we considered simulated spiking data over 64 channels (see Methods). These spike trains encoded 6 action potential shapes from 6 different simulated neurons firing randomly in time for a total duration of 20 s. The network was trained on 10 epochs of the data but proved very quick to learn, with an f-score of ~0.9 after processing only 4 seconds of data in the first epoch (Fig. 7a). The f-score of the network plateaued around 0.92 from the second epoch onward. Figure 7b further illustrates each of the six ground truth spike trains and their matched output neuron spike trains. The trains of simulated action potentials were reconstructed with an f-score above 0.95 (0.97 on average) and a precision above 0.98 (0.99 on average) for the five ground truths with the highest SNR (mean SNR across channels between 1.67 and 2.33; best SNR on the channel with the highest amplitude between 14.5 and 32.2). For the 6th simulated neuron, which had the lowest SNR (mean SNR of 1.28 and best SNR of 11.0), the classification f-score was 0.66, but the precision remained high (0.94), indicating that although several action potentials were missed, when the occurrence of an action potential was predicted, this prediction was reliable. In a second step, we considered 7 real retina datasets made available to evaluate spike sorting algorithms (see Methods)87. Each dataset contains 5 minutes of data; we trained the network on one epoch of each of these recordings and evaluated its performance with an f-score computed over the last minute of the recordings. The SNN managed to perform the spike sorting task for the 6 out of 7 datasets with a high mean SNR for the ground truth action potential (Fig. 7c). For these 6 recordings, the ground truth and reconstructed spike trains are illustrated in Fig. 7d.
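For reference, one common way to compute such a per-channel SNR is the peak absolute amplitude of the unit's mean waveform over a robust (MAD-based) estimate of the channel noise, as sketched below; the paper's exact SNR definition is given in its Methods and may differ.

```python
import numpy as np

def waveform_snr(mean_waveform, noise_trace):
    """One plausible per-channel SNR (illustrative, not necessarily
    the paper's definition): peak absolute amplitude of the unit's
    mean spike waveform divided by a robust noise estimate.

    The median absolute deviation divided by 0.6745 approximates the
    standard deviation of Gaussian noise while being insensitive to
    the spikes themselves.
    """
    mad = np.median(np.abs(noise_trace - np.median(noise_trace)))
    noise_std = mad / 0.6745
    return np.max(np.abs(mean_waveform)) / noise_std
```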

Fig. 7: Unsupervised spike-sorting on simulated and real neural data.

a Evolution of f-scores with epochs of training on the simulated dataset. The inset shows the evolution of the f-score within the first epoch. b The six ground truth (GT) simulated action potential trains (blue) are shown together with the spike trains of their matched output Low-Threshold Spiking (LTS) neurons (orange) for the final epoch of the training. c F-score as a function of the mean Signal-to-Noise Ratio (SNR) computed across 64 electrodes for each ground truth waveform of the 7 real spike-sorting datasets. d The ground truth real action potential trains (blue) are shown together with the spike trains of their matched output LTS neurons (orange) for the last 5 s of each of the 6 datasets for which the SNN managed to perform spike-sorting with an f-score above 0.8.

These results were further compared to several spike-sorting algorithms gathered in the SpikeForest framework71. Out of the 10 classifiers presented in the SpikeForest framework, only 6 had reported classifications for all the recordings of the real spike-sorting dataset: HerdingSpikes2, IronClust, Kilosort, MountainSort4, SpykingCircus and Tridesclous. We thus compared our network with the worst and the best performances of these 6 classifiers, using the accuracy, precision and recall metrics as defined by SpikeForest. As illustrated in Fig. 8a, the classification accuracy and recall obtained with our SNN-based approach were overall inferior to those of the best spike-sorter but superior to the worst one for two datasets. The precision of the SNN classification was, however, generally comparable to that of the best spike-sorter and even higher in one case. To understand why the SNN failed for one dataset, we varied the δ threshold used in the encoding step (see Methods) and found that classification could be improved with values tuned appropriately for each dataset (Fig. 8b), indicating that the encoding approach plays an important role in overall SNN performance.

Fig. 8: Comparison of the SNN classification performance with respect to 6 spike-sorting algorithms.

For all methods, three metrics are considered: accuracy, precision and recall. For each recording and metric, the present SNN-based approach (red) is compared to the worst (blue) and best (green) performances across the 6 classifiers from SpikeForest. a Original data encoding for the SNN approach identical for all datasets. b Data encoding optimized for each dataset.

Discussion

The primary objective of this study was to propose and explore the capabilities of a minimal SNN architecture for unsupervised pattern classification in multichannel temporal data. In this work, we presented a very frugal single-layer SNN containing a very limited number of artificial neurons, yet capable of learning patterns in continuous streams of data through fully unsupervised biological learning rules. The same network architecture with few LTS neurons could classify multivariate patterns in four different types of data, namely simulated artificial patterns, audio data containing eleven different vowels, propagating waves of multiunit activity in an embryonic spinal cord, and simulated and real spike-sorting datasets. Although simplistic, the artificial data served as a baseline for the classification capabilities of our SNN, and especially to validate that the SNN could distinguish between two patterns, one lying fully inside the other, a separation that online unsupervised STDP-based SNN classifiers could not achieve when the size of the expected patterns remained unknown. Here, our algorithm achieved this separation thanks to the use of intrinsic plasticity. Moreover, the use of LTS neurons allowed the network to automatically wait for the end of each pattern before emitting a spike, without the need for computationally expensive delay synapses. The analysis of vowel classification revealed that the network remained robust across multiple instances of identical vowels, despite possible variability in their pronunciation. Classification on the spike envelopes of multiunit neural data was a proof of concept of fully unsupervised pattern recognition within multielectrode array data. Finally, the same network architecture was also able to extract action potential waveforms in a fully unsupervised and online-compatible mode in simulated and real spike-sorting datasets, with only a handful of neurons for several electrodes.

Throughout the presented simulations, we found that the quality of the data encoding was crucial for the classification performance of the SNN. We used two different types of encoding. The receptive field encoding method followed by STP ensured that the input spike trains contained spikes only during the presence of a pattern for the audio and multiunit neural data. In the case of single-unit neural data (spike-sorting data), this method yielded too few spikes given the rapid change in signal value of action potential waveforms, and the delta encoding method was found to be more robust. We evaluated the SNN performance on the spike-sorting task with a rule setting the δ threshold at each delay as a multiple of the median absolute deviation (MAD) of the signal changes at that delay. The same multiplier was used for all experimental recordings, which we found was not optimal, since better decoding accuracies could be obtained after optimizing the multiplier for each dataset. Developing a more robust encoding method could help the network further improve its classification capabilities.

The choice of LTS neurons as the spiking neuron model made it possible to use a single layer of only a few neurons, because each neuron waits for the pattern to end and produces one postsynaptic spike after the end of the pattern. As a result, one neuron produced one output spike per pattern, allowing easy inference. We showed that the combination of the LTS neurons and several biological plasticity rules resulted in each pattern requiring just one neuron to be learned. Existing research in the domain of unsupervised or even supervised classification with SNNs often involves multiple convolutional layers of LIF neurons to extract features from input data, thus typically involving several hundred thousand parameters that need to be learned77,78, precluding highly energy-efficient neuromorphic pattern learning. Such networks sometimes also involve an additional post-hoc classifier at the readout layer to classify the aforementioned features69,70. The learning paradigm also differs, as training is usually done non-sequentially by splitting the continuous data into frames of fixed duration, as opposed to the continuous learning in our case. Finally, handling temporal patterns has typically been solved using multiple synapses with different delays53,74,79. Here, thanks to the dynamics of LTS neurons, we propose a simple solution that avoids multiple synaptic delays and works sequentially, in a fully unsupervised way, on continuously incoming temporal data over multiple channels simultaneously.

A significant feature of the proposed algorithm and SNN architecture is its ability to discriminate between highly overlapping patterns, and in particular between patterns where one is completely included within another. This was possible thanks to the use of an IP rule that allowed the LTS neurons to adapt their thresholds to the size of the patterns. The network further depends on only a few critical parameters. The first is the LTS neuron time constant, which determines the inertia of the neuron before generating a spiking rebound and should thus be chosen according to the inter-spike intervals within the input spike trains and the expected interval between patterns. The second important parameter is the STDP lookback window, which determines the maximum duration of the patterns being searched for. If too short, it might prevent learning long patterns; if too long, exceeding the minimum interval between patterns, it might aggregate patterns. A limitation of the current implementation, however, is that patterns that are identical except for their duration cannot be learned separately. This could be addressed by implementing learnable time constants, whereby the neurons adjust not only their threshold but also the duration for which they remain inhibited.

With the advent of very dense neural probes generating very large amounts of neural data, there is currently no technological solution to perform spike-sorting directly within the probe, as this would require algorithms compatible with very low-power hardware. In this context, our results on the spike-sorting datasets aim at bridging this gap. The frugal SNN proposed in this study is a first proof of concept of fully unsupervised and online-compatible recognition of spatiotemporal neural patterns within multielectrode array data that could eventually be implemented in low-power hardware. In an earlier study74, we proposed a more complex SNN with two processing layers connected by many synapses and an attention mechanism, which could only process data from a single channel. Our current study proposes a significantly simpler and much more frugal single-layer architecture, yet one capable of processing multiple channels simultaneously. Beyond spike-sorting, the proposed SNN architecture could also be used to extract other types of spatiotemporal neural patterns. Indeed, neural data consist of two types of signals: spiking activity, reflecting the emission of action potentials, and local field potentials (LFPs), slower signals reflecting all other types of transmembrane neural currents and in particular synaptic activity. Here, we showed the network's capability to also handle slowly propagating neural data with the example of the propagating waves of activity in the developing embryonic spinal cord. This may open perspectives for the automatic detection and classification of other types of slow neural patterns collected by brain implants, a key topic for instance in epilepsy48. Eventually, such a frugal SNN processing scheme could benefit intelligent brain implants embedding fully unsupervised extraction of LFP and spiking activity for versatile applications.

Methods

Network architecture

LTS neurons

As illustrated in Fig. 3a, the network consisted of a single layer of a few LTS neurons connected to the input spike trains through negative synaptic weights, initialized randomly according to a uniform distribution and clipped to [-1, 0] at all times. LTS neurons, which are a type of Integrate-and-Fire (IF) neuron, have the property of being inhibited during the presence of a stimulus and of generating a rebound after the end of the stimulus (Fig. 3b). Therefore, as the spike trains were passed through the network, incoming currents (spikes × weights) due to the presence of a pattern hyperpolarized all the LTS neurons. Once the incoming currents stopped at the end of the pattern, the LTS neurons generated a potential rebound. A Winner-Take-All (WTA) mechanism then selected, among the neurons that had crossed their respective thresholds, the one with the steepest rebound to generate a postsynaptic spike. The LTS neurons were modeled by the following equations:

$${\tau }_{m}\frac{{dV}}{{dt}}=-V+q+g{I}_{{stim}}$$
(1)
$$\frac{{\tau }_{m}}{\varepsilon }\frac{{dq}}{{dt}}=-q+f\left(V\right)$$
(2)
$$\text{with }f\left(V\right)=\left\{\begin{array}{ll}{\alpha }_{n}V & \text{if }V < 0\\ {\alpha }_{p} & \text{if }V\ge 0\end{array}\right.$$
(3)

where \(V\) is the LTS neuron potential, \(q\) is an adaptation variable that triggers the rebound after inhibition, \({\tau }_{m}\) is the membrane time constant, chosen depending on the type of data, \(\varepsilon\) is a constant that makes \(q\) vary more slowly than \(V\), \({I}_{{stim}}\) is the stimulus current (spikes × weights) at the timestep, and \(g\) is a constant. Whenever the network produced a postsynaptic spike, both \(V\) and \(q\) were reset to 0 for all neurons. Table 1 lists the LTS neuron parameters used for the different types of data the network was tested on.

Table 1 Core parameters of the LTS neurons for the different types of data tested

The membrane time constant \({\tau }_{m}\) was chosen according to the size of the patterns expected in the input data. The artificial patterns and vowels data contained patterns that lasted about 500 ms on average. Neural data, on the other hand, contained patterns that lasted several seconds. The parameter \(\varepsilon\) was chosen according to the inter-pattern interval in the data.
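For illustration, the dynamics of Eqs. (1)-(3) can be sketched with a simple forward-Euler integration. The parameter values below, including a negative \({\alpha }_{n}\) so that inhibition charges the adaptation variable and produces a rebound, are assumptions of this sketch and not the values of Table 1:

```python
def simulate_lts(stim, tau_m=30.0, eps=0.2, g=1.0,
                 alpha_n=-2.0, alpha_p=60.0, dt=1.0, threshold=20.0):
    """Forward-Euler integration of Eqs. (1)-(3) for one LTS neuron.

    stim: list of input currents I_stim per timestep (negative while a
    pattern is present, since synaptic weights are negative).
    Returns the membrane potential trace and the output spike times.
    """
    V, q = 0.0, 0.0
    trace, spikes = [], []
    for t, I in enumerate(stim):
        f = alpha_n * V if V < 0 else alpha_p          # Eq. (3)
        V += dt / tau_m * (-V + q + g * I)             # Eq. (1)
        q += dt * eps / tau_m * (-q + f)               # Eq. (2)
        trace.append(V)
        if V >= threshold:                             # rebound reaches threshold
            spikes.append(t)
            V, q = 0.0, 0.0                            # reset after the spike
    return trace, spikes
```

Driving the neuron with a constant negative current followed by silence keeps the potential below zero during the stimulus and lets the rebound cross the threshold only after the stimulus ends.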

Plasticity rules

Learning took place whenever a postsynaptic spike was output by the network. At the occurrence of every postsynaptic spike, the following plasticity rules enabled the network to learn:

Classical STDP strengthened the synapses connecting the neuron that generated a postsynaptic spike to the input spike trains that exhibited spiking activity within a certain pre-time window, thereby implementing Long-Term Potentiation (LTP). It also weakened the synapses connecting the same postsynaptic neuron to the input spike trains that did not exhibit any spiking activity within that window, thereby implementing Long-Term Depression (LTD). We chose to implement a simple version of this rule, defined as follows:

$$\varDelta {w}_{{ij}}=\left\{\begin{array}{ll}{w}_{{LTP}}, & \text{if }\exists \,{t}_{i}\in {S}_{i}\text{ such that }{t}_{j}-{T}_{{STDP}} < {t}_{i}\le {t}_{j}\\ {w}_{{LTD}}, & \text{if }\nexists \,{t}_{i}\in {S}_{i}\text{ satisfying }{t}_{j}-{T}_{{STDP}} < {t}_{i}\le {t}_{j}\end{array}\right.$$
(4)

where \({w}_{{ij}}\) is the synapse connecting input spike train \(i\) and the LTS neuron \(j\) that spiked, \({t}_{i}\) is the time of occurrence of the presynaptic spike and \({t}_{j}\) is the time of occurrence of the postsynaptic spike, \({S}_{i}\) is the set of presynaptic spike times for the input spike train i, \({T}_{{STDP}}\) is the duration of the window preceding \({t}_{j}\) that determines the relevant temporal context for STDP, \({w}_{{LTP}}\) = -0.1 as we use negative weights, and \({w}_{{LTD}}\) = 0.06. \({T}_{{STDP}}\) was set to 500 ms for the artificial patterns and the vowel data, 8 seconds for the neural data, and 3.5 ms for the spike-sorting data.
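As a minimal sketch, the update of Eq. (4) with the above \({w}_{{LTP}}\), \({w}_{{LTD}}\) values and the [-1, 0] weight clipping can be written as follows (the dictionary bookkeeping, and the spike-sorting value of \({T}_{{STDP}}\) as default, are choices of this sketch):

```python
def classical_stdp(weights, pre_spikes, t_post, T_STDP=3.5,
                   w_ltp=-0.1, w_ltd=0.06):
    """Eq. (4): synapses from inputs that spiked in the lookback window
    (t_post - T_STDP, t_post] are potentiated (pushed more negative,
    since weights are inhibitory); all others are depressed.
    `weights` maps input index -> weight, `pre_spikes` maps input
    index -> list of presynaptic spike times. Weights stay in [-1, 0]."""
    for i in weights:
        active = any(t_post - T_STDP < t <= t_post
                     for t in pre_spikes.get(i, []))
        w = weights[i] + (w_ltp if active else w_ltd)
        weights[i] = min(0.0, max(-1.0, w))
    return weights
```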

Lateral STDP, a second STDP rule, governed lateral inhibition between LTS neurons. It weakened the synapses connecting all neurons other than the postsynaptic neuron that spiked to the input spike trains that exhibited spiking activity within the same pre-time window. This prevented multiple neurons from learning the same pattern. The update was, however, much weaker than that of the classical STDP rule, keeping in mind that patterns might share common spiking activity. For the postsynaptic neuron j that spiked, the lateral STDP rule is defined as follows:

$$\varDelta {w}_{{ij}}={w}_{{potentiation}},\,\text{if }\exists \,{t}_{i}\in {S}_{i}\text{ such that }{t}_{j}-{T}_{{STDP}} < {t}_{i}\le {t}_{j}$$
(5)

For all other postsynaptic neurons \(k\ne j\), the lateral STDP rule is defined as:

$$\varDelta {w}_{{ik}}={w}_{{inhibition}},\,\forall k\in N,k\ne j,\,\text{if }\exists \,{t}_{i}\in {S}_{i}\text{ such that }{t}_{j}-{T}_{{STDP}} < {t}_{i}\le {t}_{j}$$
(6)

where \({w}_{{ik}}\) is the synapse connecting input spike train \(i\) and each non-spiking postsynaptic neuron \(k\), \(N\) is the set of all postsynaptic neurons, \({w}_{{inhibition}}\,\)= 0.0002 and \({w}_{{potentiation}}\,\)= −0.001. This formulation drove the network towards a more selective and refined connectivity pattern based on the temporal spiking relationships.
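The two lateral updates of Eqs. (5) and (6) can be sketched together with the stated constants; the data layout (one weight list per neuron) is an assumption of this sketch:

```python
def lateral_stdp(weights, pre_spikes, j_winner, t_post, T_STDP=3.5,
                 w_potentiation=-0.001, w_inhibition=0.0002):
    """Eqs. (5)-(6): for every input active in the lookback window,
    slightly potentiate the winner j's synapse (Eq. 5) and slightly
    depress (make less negative) the same synapse onto every other
    neuron (Eq. 6). `weights[k]` is the weight list of neuron k."""
    n_inputs = len(next(iter(weights.values())))
    for i in range(n_inputs):
        if not any(t_post - T_STDP < t <= t_post
                   for t in pre_spikes.get(i, [])):
            continue                                   # input i inactive
        for k, w_k in weights.items():
            dw = w_potentiation if k == j_winner else w_inhibition
            w_k[i] = min(0.0, max(-1.0, w_k[i] + dw))  # clip to [-1, 0]
    return weights
```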

Intrinsic Plasticity, unlike STDP, is a form of plasticity implemented on the neurons rather than on the synapses connecting the spike trains to the neurons. It helped neurons adapt their thresholds to the size of the pattern learned. The thresholds of all output neurons were initialized at a low value to promote learning at the beginning of training; as training progressed, each neuron increased its threshold Th according to the size of the pattern learned, until reaching an equilibrium threshold indicative of that size. Every time a postsynaptic LTS neuron j emitted a spike, its threshold Thj was decreased as follows:

$${{Th}}_{j}\leftarrow {{Th}}_{j}-{F}^{\Delta {Th}}*{{Th}}_{j}$$
(7)

where \({F}^{\Delta {Th}}\) is a factor by which the threshold is decreased (see Table 2).

Table 2 Threshold update parameters of the LTS neurons for the different types of data tested

For each pre-synaptic spike received from the input spike train i within a coincidence time window before the post-synaptic spike of the LTS neuron j, the threshold of the LTS neuron j was increased by a value that was obtained by multiplying the synaptic weight \({w}_{{ij}}\) between the input train i and the LTS neuron j by the factor \({\Delta {Th}}_{{pair}}\) :

$${{Th}}_{j}\leftarrow {{Th}}_{j}+{\sum}_{i\in S}{{w}_{{ij}}*\Delta {Th}}_{{pair}}$$
(8)

where S is the set of spike trains with pre-synaptic spikes occurring within the coincidence window. The thresholds of all neurons were initialized at 20 and then were clipped between [20,3500] at all times.
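Combining Eqs. (7) and (8), the threshold update on a postsynaptic spike can be sketched as follows. The values of \({F}^{\Delta {Th}}\) and \({\Delta {Th}}_{{pair}}\) below are placeholders rather than the Table 2 values, and \({\Delta {Th}}_{{pair}}\) is taken negative so that multiplying by the negative synaptic weights yields the threshold increase described above (an assumed sign convention):

```python
def update_threshold(th_j, weights_j, active_inputs,
                     f_dth=0.01, dth_pair=-0.05,
                     th_min=20.0, th_max=3500.0):
    """Eqs. (7)-(8): on a postsynaptic spike of neuron j, the threshold
    is first decreased multiplicatively (Eq. 7), then increased once per
    coincident presynaptic input by w_ij * dTh_pair (Eq. 8), and kept
    within the clipping bounds [20, 3500] stated in the text."""
    th_j -= f_dth * th_j                                         # Eq. (7)
    th_j += sum(weights_j[i] * dth_pair for i in active_inputs)  # Eq. (8)
    return min(th_max, max(th_min, th_j))
```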

Encoding

The process of transforming multichannel data into spike trains is a pivotal step in training SNNs for learning tasks. The effectiveness of this encoding directly influences the network's ability to classify and interpret the data, since the encoding method determines how well the temporal and spatial dynamics of the data are captured and represented as spikes. Here, we encoded each channel of the data into a collection of spike trains while retaining the original geometry of the data. As shown in Fig. 1a, each channel was normalized and discretized into 20 receptive fields: the continuous signal, ranging from 0 to 1, was divided into 20 equal intervals representing the sensitivity of each receptive field. At each timestep, depending on the value of the signal, a spike was encoded by the field corresponding to the signal value. Two additional spikes were encoded on each side of the central spike (two above and two below), making a total of five spikes per timestep. There were therefore 24 spike trains representing each channel of the data. The artificial patterns did not require an encoding step as they already represented the final spike trains ready to be passed into the network.
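The receptive-field scheme above can be sketched for one normalized channel as follows (the binary-list layout and function name are choices of this sketch):

```python
def encode_receptive_fields(signal, n_fields=20, halo=2):
    """Receptive-field encoding of one normalized channel (values in
    [0, 1]). Each sample activates its receptive field plus `halo`
    neighbors on either side, i.e. 2*halo + 1 = 5 spikes per timestep
    across n_fields + 2*halo = 24 spike trains."""
    n_trains = n_fields + 2 * halo
    trains = [[0] * len(signal) for _ in range(n_trains)]
    for t, v in enumerate(signal):
        field = min(int(v * n_fields), n_fields - 1)   # interval index 0..19
        for k in range(2 * halo + 1):                  # central field +/- halo
            trains[field + k][t] = 1
    return trains
```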

Short-term plasticity

To ensure that learning by the LTS neurons was not driven by the background noise of all channels, we implemented a mechanism called Short-Term Plasticity (STP) that quickly suppressed all the spike trains corresponding to noise or silence. After the unwanted spike trains were suppressed, the retained spike trains were the ones encoding relevant signal information. To implement STP, we assigned a weight \({w}_{{STP}}\) to every spike train. This weight, initialized to 1 for all spike trains, represents the probability that the spike train encodes a signal. The input spike trains were subjected to STP before training and, as they were processed through time, the weights of the spike trains encoding noise quickly decreased. Once the weight of any spike train fell below 0.75, we stopped STP, mapped the weights of all spike trains below a certain threshold to 0, and set the others to 1. This threshold was 0.92 for the vowels and 1 for the neural data. Furthermore, for each group of 24 spike trains corresponding to a given signal, we checked whether at least 60% of the spikes were retained after STP; if not, the remaining spikes were also mapped to 0 in order to clean up residual spikes potentially corresponding to noise. STP is governed by the following equations:

$$\frac{d{w}_{{STP}}}{{dt}}=\frac{1}{{\tau }_{{stp}}}\left(1-{w}_{{STP}}\right)$$
(9)
$$\frac{d{w}_{{STP}}}{{dt}}=\frac{1}{{\tau }_{{\mbox{stp}}}}\left(1-{w}_{{STP}}\right)-{w}_{{STP}}*{f}_{d}$$
(10)

where \({\tau }_{{stp}}\) = 2000 ms is the STP time constant and \({f}_{d}\) = 0.003 is the depression factor. The first of these two equations is the weight update rule for spike trains that do not spike in the current timestep, and the second is the rule for spike trains that do. After STP, only the spikes belonging to the spike trains encoding relevant data beyond noise were retained (see Fig. 1g,h).
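A minimal sketch of this STP mechanism with the stated constants; the per-timestep discretization and the return format are choices of this sketch:

```python
def run_stp(spike_trains, tau_stp=2000.0, f_d=0.003, dt=1.0,
            stop_below=0.75, keep_above=0.92):
    """Eqs. (9)-(10): every spike train carries a weight w_STP relaxing
    toward 1 (Eq. 9) and depressed whenever the train spikes (Eq. 10).
    STP stops once any weight drops below `stop_below`; trains whose
    final weight is below `keep_above` are then suppressed."""
    w = [1.0] * len(spike_trains)
    for t in range(len(spike_trains[0])):
        for i, train in enumerate(spike_trains):
            dw = (1.0 - w[i]) / tau_stp                # recovery, Eq. (9)
            if train[t]:
                dw -= w[i] * f_d                       # depression, Eq. (10)
            w[i] += dt * dw
        if min(w) < stop_below:
            break                                      # stop STP
    keep = [wi >= keep_above for wi in w]
    return keep, w
```

A train that spikes on every timestep (noise) is rapidly depressed, while a sparsely spiking train keeps its weight near 1 and is retained.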

Vowel data

The vowels were recorded with a microphone (SHURE Beta 58 A) and the Audacity software at a sampling rate of 44.1 kHz. A native French male speaker was asked to repeat eleven French vowels 50 times. The recorded audio was subjected to a frequency transform using the SPTK library to obtain 25 Mel cepstral coefficients. The first Mel coefficient, which mostly reflects the amplitude of the sound and is thus not specific to the vowel pronounced, was discarded. The other 24 Mels were normalized between 0 and 1, smoothed, quantized and encoded as spike trains (see Fig. 1c,e,g) into an array of binary values. Prior to encoding, we chose to smooth the Mels with a sliding 2nd-order Butterworth filter below 5 Hz to make the network more robust to different occurrences of the same pattern. Unlike the artificial patterns, which had spikes only during the duration of each pattern and none before or after, the vowels' spike trains contained spikes corresponding to noise/silence across all Mels. Therefore, the encoded spike trains were first subjected to STP to eliminate spikes corresponding to noise and retain only those corresponding to peaks and troughs. These spike trains were then passed into the network as input.

Multiunit neural data

Neural data were reused from a previously published study89. They corresponded to rhythmic activity waves propagating across a whole embryonic OF1 mouse hindbrain-spinal cord preparation at stage E13, laid down on a 60-channel microelectrode array (Ayanda Biosystems, Lausanne, Switzerland) arranged as 4 columns of 15 microelectrodes (Fig. 1d, left). The detailed procedure to acquire these data has been described previously89 and was in accordance with protocols approved by the European Community Council and conformed to National Institutes of Health guidelines for the care and use of laboratory animals. In short, after dissection and removal of the meninges, the neural tissue was maintained on the electrode array with a custom net and continuously superfused with aCSF (in mM: 113 NaCl, 4.5 KCl, 2 CaCl2·2H2O, 1 MgCl2·6H2O, 25 NaHCO3, 1 NaH2PO4·H2O, and 11 D-glucose) at a rate of 2 ml/min. Neural data were acquired at 10 kHz using a MEA1060 amplifier from Multi Channel Systems (MCS), with ×1200 gain and 1-3000 Hz bandpass filters, connected to two synchronized Power 1401 acquisition systems (Cambridge Electronic Design Ltd, Cambridge, UK). Each channel was then bandpass filtered between 200 Hz and 2 kHz to retain high-frequency components (Fig. 1d, middle). Once filtered, we extracted multiunit activity by computing the mean and standard deviation of each channel and considering as spikes those datapoints at least 3 standard deviations above or below the mean. Each channel was then downsampled by a binning factor of 100, with each bin replaced by the total number of spikes it contained. Finally, a Gaussian kernel (n = 501, σ = 51 time bins) was convolved with each channel to obtain smoothed spike envelopes of the original neural data (Fig. 1d, right). These spike envelopes were then normalized between 0 and 1, encoded as spike trains and subjected to STP in a manner similar to the vowels.
The final spike trains obtained after STP were then passed into the network for learning.
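The multiunit preprocessing chain above (3-SD thresholding, binning by 100, Gaussian smoothing, normalization) can be sketched as follows; the function name and the final min-max normalization are choices of this sketch:

```python
import numpy as np

def spike_envelope(channel, bin_size=100, kernel_n=501, sigma=51.0):
    """Sketch of the multiunit pipeline: threshold the filtered channel
    at mean +/- 3 SD, bin the detected spikes, smooth the counts with a
    Gaussian kernel and normalize the envelope to [0, 1]."""
    x = np.asarray(channel, dtype=float)
    spikes = (np.abs(x - x.mean()) > 3.0 * x.std()).astype(float)
    n_bins = len(spikes) // bin_size                   # binning factor 100
    counts = spikes[:n_bins * bin_size].reshape(n_bins, bin_size).sum(axis=1)
    t = np.arange(kernel_n) - kernel_n // 2
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))      # Gaussian kernel
    env = np.convolve(counts, kernel / kernel.sum(), mode="same")
    rng = env.max() - env.min()
    return (env - env.min()) / rng if rng > 0 else env
```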

Spike-sorting data

For the simulated spike-sorting data, we used a script to simulate neural activity from a fixed set of neurons that were positioned randomly within a 3D volume that represented a layer of neural tissue overlaid on a multielectrode array (MEA) (see Fig. 2c). The electrical activity of each neuron was modeled using random cosine-Gaussian waveforms, which are commonly employed to capture the combined effects of periodic oscillations and spatially localized activation90. These waveforms were defined by the equation:

$$w(t)=A\cdot \cos \left(\frac{2\pi t}{{t}_{s}}+\phi \right)\cdot \exp \left(-{\left(\frac{2.3548t}{{t}_{g}}\right)}^{2}\right)$$

where \(A\) represents the amplitude of the waveform, \({t}_{s}\) is the time period of the cosine component, \(\phi\) is the phase shift, and \({t}_{g}\) is the width of the Gaussian envelope. This equation describes a cosine signal modulated by a Gaussian function, where the exponential term governs the rate of decay of the waveform over time.
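The waveform equation translates directly to code; the default parameter values below are arbitrary illustrative choices:

```python
import numpy as np

def cosine_gaussian(t, A=1.0, t_s=1.0, phi=0.0, t_g=1.0):
    """Cosine-Gaussian waveform w(t): a cosine of period t_s and phase
    phi, modulated by a Gaussian envelope of width t_g
    (2.3548 = 2*sqrt(2*ln 2), the usual FWHM conversion factor)."""
    return A * np.cos(2.0 * np.pi * t / t_s + phi) \
             * np.exp(-(2.3548 * t / t_g) ** 2)
```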

The real spike-sorting dataset was a publicly available MEA dataset dedicated to the validation of spike-sorting algorithms87. The dataset was obtained from simultaneous loose-patch and multielectrode array (MEA) recordings of mouse retinal ganglion cells. The MEA consists of a 16 × 16 grid of microelectrodes with an inter-electrode spacing of 30 μm, offering high-density extracellular recordings in which individual action potentials were typically detected by several neighboring electrodes. We considered an 8 × 8 subgrid of electrodes surrounding the juxtacellular neuron for which the ground truth was known (see Fig. 2f). Both the simulated and real spike-sorting data were sampled at 20 kHz and then filtered between 300 Hz and 3 kHz. Once filtered, the spike-sorting datasets were encoded by a delta encoding technique inspired by the previously proposed Address-Event Representation (AER) technique86. At each timestep \(t\), each signal is compared to its past values at \(t-i\) for \(i\) from \(1\) to \(k\). The signal is encoded into \(2k\) spike trains: \(k\) for positive and \(k\) for negative deltas. If the signal increased by at least a threshold \(\delta\) between \(t-i\) and \(t\) (i.e., signal\([t]\) − signal\([t-i]\ge \delta\)), a spike was generated at time \(t\) in the corresponding positive delta spike train. If the signal decreased by at least \(\delta\) between \(t-i\) and \(t\) (i.e., signal\([t-i]\) − signal\([t]\ge \delta\)), a spike was generated at time \(t\) in the corresponding negative delta spike train (see Fig. 2a,b). The threshold \(\delta\) was different for each channel and each \(i\) value. For each channel, the differences between all timesteps and their \({i}^{{th}}\) previous timestep were calculated. Then, the median of the absolute values of these differences was computed and multiplied by a global multiplier to finally obtain the \(\delta\) thresholds for each channel and each \(i\) value.
The value of this multiplier was 10 for the simulated dataset and 25 for the real dataset, indicative of the average SNR of each dataset. This encoding technique did not require the STP step as it produced spikes only in the presence of patterns. The delta-encoded spike trains were then passed into the network for learning.
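The delta encoding with per-delay median-based thresholds can be sketched for one channel as follows (the NumPy layout and function name are choices of this sketch):

```python
import numpy as np

def delta_encode(signal, k=3, multiplier=10.0):
    """Delta encoding sketch: each sample is compared to its i-th
    previous sample (i = 1..k); the threshold for delay i is the median
    absolute i-step difference scaled by a global multiplier. Returns
    2k binary trains: k positive-delta then k negative-delta trains."""
    x = np.asarray(signal, dtype=float)
    trains = np.zeros((2 * k, len(x)), dtype=int)
    for i in range(1, k + 1):
        diff = x[i:] - x[:-i]                      # signal[t] - signal[t-i]
        delta = multiplier * np.median(np.abs(diff))
        trains[i - 1, i:] = diff >= delta          # positive deltas
        trains[k + i - 1, i:] = -diff >= delta     # negative deltas
    return trains
```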

Inference and evaluation

To assess the classification performance of the network, we first matched the truth spike trains with the output spike trains to obtain truth-output pairs. To perform this matching, we convolved all truth and output spike trains with a Gaussian kernel (n = 31, σ = 3 time steps) and then computed the cross-correlation between each truth spike train and each output spike train. In the case of spike-sorting data, we performed the matching between the ground truth spike trains and the output neurons' spike trains by maximizing the number of correct hits.
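The Gaussian-smoothing and cross-correlation matching can be sketched as follows; the greedy per-truth-train argmax is a simplification of this sketch, not necessarily the exact assignment used:

```python
import numpy as np

def match_pairs(truth_trains, output_trains, kernel_n=31, sigma=3.0):
    """Matching sketch: smooth every truth and output spike train with a
    Gaussian kernel (n = 31, sigma = 3 time steps), then pair each truth
    train with the output train whose cross-correlation peak is highest."""
    t = np.arange(kernel_n) - kernel_n // 2
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))

    def smooth(s):
        return np.convolve(np.asarray(s, dtype=float), kernel, mode="same")

    outs = [smooth(o) for o in output_trains]
    return {i: int(np.argmax([np.correlate(smooth(tr), o, mode="full").max()
                              for o in outs]))
            for i, tr in enumerate(truth_trains)}
```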

For each pair of truth-output, the f-score was computed as:

$${F}_{{ij}}=\frac{2*{H}_{{ij}}}{{T}_{i}+{O}_{j}}$$

where \({T}_{i}\) was the number of spikes of the \(i\)th truth spike train, \({O}_{j}\) was the number of spikes emitted by the \(j\)th output neuron, and \({H}_{{ij}}\) was the number of output spikes coinciding with a truth spike within a coincidence window. The coincidence window was 400 ms for the artificial patterns and vowel data, 2.5 s for the multiunit neural data, and 4 ms for the spike-sorting data. These values corresponded to the time needed by an LTS neuron to generate its rebound and cross its threshold. We also computed a global f-score across all truth neurons and all output neurons as:

$$F=\frac{2*H}{T+O}$$

where \(T\) was the total number of truth spikes, \(O\) was the total number of output spikes and \(H\) was the total number of hits. In the case of the vowels, a confusion matrix was also computed to evaluate the classification performance of the model.
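The pairwise f-score with a coincidence window can be sketched as follows; the greedy one-to-one matching of hits is an assumption of this sketch:

```python
def f_score(truth_spikes, output_spikes, window=4.0):
    """Pairwise f-score F_ij = 2*H_ij / (T_i + O_j): H_ij counts output
    spikes falling within the coincidence window of a still-unmatched
    truth spike (each truth spike counted at most once)."""
    truth = sorted(truth_spikes)
    used = [False] * len(truth)
    hits = 0
    for o in sorted(output_spikes):
        for i, t in enumerate(truth):
            if not used[i] and abs(o - t) <= window:
                used[i] = True                 # each truth spike matches once
                hits += 1
                break
    denom = len(truth) + len(output_spikes)
    return 2.0 * hits / denom if denom else 0.0
```

With the 4 ms window used for spike-sorting data, a perfect one-to-one correspondence yields an f-score of 1.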

When comparing our SNN to other spike-sorting algorithms of the SpikeForest framework, we used the metrics used on the SpikeForest platform71 (https://spikeforest.flatironinstitute.org/metrics).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.