Abstract
Biological and artificial systems encode information through complex nonlinear operations across multiple timescales. A clear understanding of the interplay between this multiscale structure and the nature of the nonlinearities at play is, however, missing. Here, we study a general model where the input signal is propagated to an output unit through a processing layer via nonlinear activation functions. We focus on two widely implemented paradigms: nonlinear summation, where signals are first nonlinearly transformed and then combined; and nonlinear integration, where they are combined first and then transformed. We find that fast-processing capabilities systematically enhance input-output mutual information, and nonlinear integration outperforms summation in large systems. Conversely, a nontrivial interplay between the two strategies emerges in lower dimensions as a function of interaction strength, heterogeneity, and sparsity of connections between the units. Finally, we reveal a tradeoff between input and processing sizes in strong-coupling regimes. Our results shed light on relevant features of nonlinear information processing with implications for both biological and artificial systems.
Introduction
The ability to encode and process information from the external world is essential to maintain robust functioning in biological systems1. These goals are usually achieved through complex internal machinery that involves nonlinear operations. For example, multi-molecular reactions drive sensing and adaptation in chemical networks2,3,4, gene regulatory dynamics is controlled by protein-mediated interactions leading to multi-stable phases corresponding to different cell fates5,6, phase coexistence phenomena sustain noise reduction and functional organization in cellular environments7, and complex interaction networks underlie the computational capabilities of neural populations8,9. As such, extracting information from a given input to generate a desired output is a fundamental problem that spans several fields, from signal processing in biochemical systems10,11 to designing and training artificial neural networks12. Many of these systems share the idea that inputs need to be processed via different types of nonlinear activation functions to enable non-trivial learning tasks. Despite remarkable results, understanding the key determinants of how the type of nonlinearity shapes information processing is an active area of theoretical research13,14,15. Recent works have investigated the performance of computation instantiated by biological media, making an effort to bridge artificial and biochemical processing16 and highlighting the pivotal role of nonlinear encoding17,18,19 and multiple timescales20,21,22.
Information theory provides us with tools to quantitatively study information-processing capabilities of various systems ranging from stochastic processes23,24 to biological scenarios25,26,27. While the impact of timescales on information propagation has been understood in general frameworks28,29, the role of internal nonlinear mechanisms remains unclear without focusing on specific models. One of the main difficulties resides in the lack of general analytical approaches, with results obtained for large systems and phenomenological models often relying on Gaussian approximations of various forms30,31.
In terms of nonlinear processing, two paradigmatic schemes have been extensively implemented in a variety of different contexts32,33,34,35,36: nonlinear summation, in which incoming signals are first nonlinearly transformed and then summed before affecting the target; and nonlinear integration, in which signals are integrated before the nonlinear processing step. Although they reflect different underlying physical processes, summation and integration have often been used interchangeably to describe neural systems37,38,39,40,41,42,43,44,45,46, especially when there is no clear biological reason to choose one over the other. A similar dichotomy is present in gene networks, where, given the absence of microscopic models substantiating either one of these two schemes, there is no consensus on the use of summation or integration, with various works reporting opposite approaches47,48,49,50. In these scenarios, despite their ability to describe the systems’ dynamics, summation and integration may present striking differences when considering processing performances. These differences may be especially relevant in the presence of multiple timescales28. Even in those cases in which the implementation of nonlinear summation or integration is biologically motivated, it would be crucial to understand their effect on information processing. For instance, dendrites are believed to perform integration in neural circuits35; in gene networks, gene-protein interactions appear in the form of a nonlinear summation when derived from first principles49; many adaptive dynamical systems are usually modeled via nonlinear integration schemes due to the presence of slow feedback dynamics32,34; and controlled stochastic chemical networks naturally implement summation along multiple reaction channels51,52.
In this work, we study the information-theoretic features associated with nonlinear summation and integration in a generic multiscale information-processing system. To this end, we consider a possibly high-dimensional signal generated by an input unit, which is then processed by a processing unit and finally encoded into an output unit. Interactions among the units form a general multilayer network structure that supports the propagation of the input information28,53,54. Crucially, each unit may operate on a different timescale and is composed of an arbitrary number of individual degrees of freedom (dofs), such as neurons in neural networks, chemical species in a signaling architecture, or genes in a genetic network. Operations between the units are implemented by an activation function that can perform nonlinear summation or integration of the incoming signals from the connected unit. We first find the exact expression for the probability density function (pdf) of the system in different timescale regimes. Then, we employ this result to characterize the mutual information between input and output. We find that, in the absence of a processing unit, there exists a crossover from a region in which nonlinear summation is favored to one in which integration leads to higher mutual information. Crucially, we also show that the presence of an intermediate processing unit enhances encoding performances when acting on a faster timescale than the output unit. Further, we study the effect of the system’s dimensionality, finding that fast nonlinear integration schemes are associated with higher input-output mutual information in large multiscale systems, emerging as the backbone of accurate processing in this scenario. On the other hand, nonlinear summation is beneficial in small systems with highly heterogeneous couplings. Finally, we show that nonlinear integration may lead to the spontaneous emergence of bimodality in the output layer even for Gaussian inputs, underpinning its role in implementing dynamical input discrimination that can be tuned by tinkering with internal parameters.
Results
Multiscale information-processing systems
We consider a general information-processing system with three different stochastic units: input I, processing P, and output O. Each unit is composed of Mμ degrees of freedom (dofs) with a shared timescale τμ and whose activity is denoted by \({{{\bf{x}}}}_{\mu }\in {{\mathbb{R}}}^{{M}_{\mu }}\), with μ = I, P, O. All dofs within the same unit are linearly coupled with an interaction matrix \({\hat{A}}_{\mu }\in {{\mathbb{R}}}^{{M}_{\mu }\times {M}_{\mu }}\); conversely, the coupling from unit ν to μ is implemented via a nonlinear activation function \({{{\boldsymbol{\phi }}}}_{\mu \nu }\in {{\mathbb{R}}}^{{M}_{\mu }}\) that depends, in principle, on all dofs within ν and an interaction matrix \({\hat{A}}_{\mu \nu }\in {{\mathbb{R}}}^{{M}_{\mu }\times {M}_{\nu }}\). The system’s dynamics is described by the following set of Langevin equations:
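Schematically, and up to prefactor conventions,
\[
\tau_{\mu}\,\dot{{\bf{x}}}_{\mu} = -\hat{A}_{\mu}{\bf{x}}_{\mu} + \sum_{\nu \neq \mu} g_{\mu\nu}\,{\boldsymbol{\phi}}_{\mu\nu}({\bf{x}}_{\nu}) + \sqrt{2\tau_{\mu}}\,\hat{\sigma}_{\mu}\,{\boldsymbol{\xi}}_{\mu}(t)\,, \qquad \mu = I, P, O\,, \qquad (1)
\]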
where gμν is the interaction strength between unit ν and μ, ξμ is a vector of Gaussian white noises, and \({\hat{D}}_{\mu }={\hat{\sigma }}_{\mu }{\hat{\sigma }}_{\mu }^{T}\in {{\mathbb{R}}}^{{M}_{\mu }\times {M}_{\mu }}\) defines a diagonal diffusion matrix. For simplicity, we take \({\hat{D}}_{\mu }\) to be the identity matrix. We also assume that the input evolves independently, as this is the case for a relevant class of biophysical scenarios31,55,56,57. Then, the input is passed to the processing unit through a directional coupling, i.e., gPI ≠ 0 and gIP = 0. After the processing step, the signal arrives at the output unit, again through a directional coupling, i.e., gOP ≠ 0 and gPO = 0.
To investigate how the mechanisms implementing internal nonlinear processing affect the information content of the system, we study the mutual information between input and output units,
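\[
I_{IO} = \int {\rm d}{\bf{x}}_{I}\,{\rm d}{\bf{x}}_{O}\; p_{IO}({\bf{x}}_{I},{\bf{x}}_{O})\,\log_{2}\frac{p_{IO}({\bf{x}}_{I},{\bf{x}}_{O})}{p_{I}({\bf{x}}_{I})\,p_{O}({\bf{x}}_{O})} = H_{O} - {\langle h_{O|I} \rangle}_{I}\,, \qquad (2)
\]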
where \({h}_{O| I}=-{\langle {\log }_{2}{p}_{O| I}\rangle }_{O}\) is the entropy of the output conditioned on a given input configuration. Here, pIO(xI, xO) is the joint pdf of input and output dofs, pI(xI) and pO(xO) are their respective marginal pdfs, and HO is the Shannon entropy58 of the output dofs distribution computed in bits. IIO quantifies the information shared between I and O, therefore acting as an unbiased proxy for processing accuracy in this paradigmatic setting58.
As demonstrated in ref. 28, if the dynamics of the input unit are the fastest at play (τI ≪ τP, τO), no mutual information can be generated between I and O. Conversely, a slow input is a necessary condition to have a non-zero IIO. We still have the freedom to set the timescales of the processing and output units, distinguishing two relevant cases: a fast-processing system (τP ≪ τO) and a slow-processing one (τP ≫ τO). However, a crucial role is also played by the specific type of nonlinearity at hand, encoded in the vector ϕμν. As discussed in the introduction, we distinguish two widely used but distinct cases, corresponding to different processing schemes: nonlinear summation (ns)36,39,40,45,46,59 and integration (int)32,34,35,37,38,60,61,62. By setting a hyperbolic tangent as the activation function, a customary modeling choice for recurrent neural networks36,45,59, we have the following forms for the i-th component of the interaction terms between units:
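\[
\phi_{\mu\nu}^{{\rm ns},\,i}({\bf{x}}_{\nu}) = \sum_{j=1}^{M_{\nu}} A_{\mu\nu}^{ij}\,\tanh\!\left(x_{\nu}^{j}\right), \qquad
\phi_{\mu\nu}^{{\rm int},\,i}({\bf{x}}_{\nu}) = \tanh\!\left(\sum_{j=1}^{M_{\nu}} A_{\mu\nu}^{ij}\,x_{\nu}^{j}\right), \qquad (3)
\]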
where all nodes in unit ν contribute to the dynamics of node i in unit μ through the nonlinear activation function and the set of weights \({A}_{\mu \nu }^{ij}\), with j = 1, …, Mν, mediating the coupling. These two cases represent different physical processes. For a nonlinear summation, the signals generated by each dof in unit ν are first nonlinearly transformed, and then linearly projected by means of the interaction matrix \({\hat{A}}_{\mu \nu }\). In contrast, for nonlinear integration, the signals from unit ν are first linearly combined via the weight matrix \({\hat{A}}_{\mu \nu }\), and then the resulting integrated signal is nonlinearly transformed by the activation function and passed to the i-th dof of unit μ.
Exact solution for fast and slow processing units
The first contribution of this work is to provide an analytical solution for the joint distribution of the whole system, pIPO, that can be exploited to evaluate the input-output mutual information IIO, and the output pdf pO. While IIO informs us on the processing performance of the system, pO contains information on the ability to perform input discrimination. pIPO satisfies the following Fokker-Planck equation63:
where \({{{\mathcal{L}}}}_{\mu }\) is the Fokker-Planck operator associated with the unit μ = I, P, O, as detailed in the Supplementary Notes 1 and 2. Although general exact expressions are out of reach without approximations, the limits of fast and slow processing can provide useful insights into system operations, provided the input unit is slow. From refs. 28,29, we know that in these two limiting regimes the joint pdf of input, processing, and output dofs is the product of conditional distributions. As we show in the “Methods” and the Supplementary Notes 3 and 4, at steady state (i.e., when ∂tpIPO = 0) we have the stationary distributions:
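\[
p_{IPO}^{{\rm fp}} = p_{I}^{{\rm st}}\; p_{P|I}^{{\rm st}}\; p_{O|I}^{{\rm eff},{\rm st}}\,, \qquad (5)
\]
\[
p_{IPO}^{{\rm sp}} = p_{I}^{{\rm st}}\; p_{P|I}^{{\rm st}}\; p_{O|P}^{{\rm st}}\,, \qquad (6)
\]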
where the superscript “st” (omitted on the l.h.s.) stands for stationary, and “eff” indicates a pdf that solves an effective operator obtained from the ensemble average over dofs faster than its corresponding unit. We use the superscripts “fp” and “sp” to indicate that these quantities are evaluated for fast and slow processing, respectively. Let us inspect all these terms one by one. \({p}_{I}^{{{\rm{st}}}}\) is the multivariate Gaussian distribution of the input dofs with mean mI and covariance matrix \({\hat{\Sigma }}_{I}\) that solves the Lyapunov equation \({\hat{A}}_{I}{\hat{\Sigma }}_{I}+{\hat{\Sigma }}_{I}{\hat{A}}_{I}^{T}=2{\hat{D}}_{I}\). By exploiting the fact that intra-unit interactions are linear, all the conditional distributions may be written as:
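\[
p_{\mu|\nu}^{{\rm st}}({\bf{x}}_{\mu}|{\bf{x}}_{\nu}) = {\mathcal{N}}_{\mu}\!\left({\bf{m}}_{\mu|\nu}({\bf{x}}_{\nu}),\,\hat{\Sigma}_{\mu}\right), \qquad (7)
\]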
with \({{{\mathcal{N}}}}_{\mu }\) a Gaussian distribution over xμ, \({\hat{\Sigma }}_{\mu }\) satisfying its corresponding Lyapunov equation, and the average containing the dependence on the conditional variable as follows:
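\[
{\bf{m}}_{\mu|\nu}({\bf{x}}_{\nu}) = g_{\mu\nu}\,\hat{A}_{\mu}^{-1}\,{\boldsymbol{\phi}}_{\mu\nu}({\bf{x}}_{\nu})\,, \qquad (8)
\]
i.e., the point at which the linear intra-unit relaxation balances the frozen nonlinear drive from the slower unit.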
Notice that the functional form of Eq. (8) depends on the nonlinear processing mechanism considered in Eq. (3). However, when an effective operator is involved, calculations become harder. By using a convergent expansion of the hyperbolic tangent, we show that:
with again \({\hat{A}}_{O}{\hat{\Sigma }}_{O}+{\hat{\Sigma }}_{O}{\hat{A}}_{O}^{T}=2{\hat{D}}_{O}\) and
where we employed the shorthand notation \({{{\mathcal{F}}}}^{i}({{\bf{x}}},{{\bf{y}}})={{\mathcal{F}}}({x}^{i},{y}^{i})\). In particular, \({{\mathcal{F}}}\) is a nontrivial nonlinear function defined in the “Methods” and in the Supplementary Note 4, and we introduced the following integrated quantities:
From Eq. (10), we notice that the dependence on xI enters solely through mO∣P, defined in Eq. (8). The main difference resides in the fact that, in the case of summation, the nonlinear function \({{\boldsymbol{{{\mathcal{F}}}}}}\) has to be averaged with processing weights \({\hat{A}}_{OP}\), while in the case of integration, \({{\boldsymbol{{{\mathcal{F}}}}}}\) must be directly evaluated on integrated quantities.
Putting all these results together, we obtain an analytical expression for the joint pdf of the whole system, pIPO. We stress that pIPO is a highly nonlinear distribution. However, our factorization into conditional Gaussian distributions incorporates the nonlinearities only into their means, allowing in particular for efficient sampling (see “Methods”). Furthermore, the structure of the resulting conditional dependencies is crucially different between fast and slow processing units, with fundamental implications for the mutual information between the input and the output. To obtain general results, we focus on the case of random interactions, an approach that has provided fundamental insights in several fields32,33,36,59,64,65,66. We take interactions within the same unit μ to be distributed as \({A}_{\mu }^{ij} \sim {{\mathcal{N}}}(0,{\sigma }_{\mu }/\sqrt{{M}_{\mu }})\) with diagonal elements \({A}_{\mu }^{ii}=1\) for all i = 1, …, Mμ, so that each unit remains linearly stable as its dimension increases, provided σμ < 1 (see ref. 67 and Supplementary Note 3). Interactions from unit ν to μ are instead distributed as \({A}_{\mu \nu }^{ij} \sim {{\mathcal{N}}}(0,{\sigma }_{\mu \nu })\), and all results are obtained by averaging over realizations of these random matrices. Intuitively, while gμν describes the overall interaction strength from ν to μ, σμν models the intrinsic coupling heterogeneity.
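As an illustration, this ensemble and the corresponding stationary covariances can be generated numerically. The following is a minimal sketch (variable names are ours, with SciPy's Lyapunov solver standing in for the derivations of Supplementary Note 3):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
M_I, sigma_I = 50, 0.9

# intra-unit couplings: off-diagonal entries N(0, sigma_mu / sqrt(M_mu)), unit diagonal
A_I = rng.normal(0.0, sigma_I / np.sqrt(M_I), size=(M_I, M_I))
np.fill_diagonal(A_I, 1.0)

# linear stability of the drift -A_I: all eigenvalues of A_I need a positive real part
assert np.linalg.eigvals(A_I).real.min() > 0

# stationary covariance from the Lyapunov equation A_I S + S A_I^T = 2 D_I (D_I = identity)
Sigma_I = solve_continuous_lyapunov(A_I, 2.0 * np.eye(M_I))
```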
In Fig. 1, we show stochastic trajectories and probability distributions at the steady state of the output degree of freedom for slow and fast processing, both in the case of summation and integration. For simplicity of computation and visualization, we will consider a one-dimensional output unit throughout this manuscript. While there is no striking difference between slow and fast processing at the dynamical level, nonlinear summation and integration lead to two very different distributions in the output node. Integrating incoming signals from one unit to the other favors the spontaneous emergence of a pronounced switching behavior that is reflected in a bimodal distribution, a signature of input discrimination. The last part of this manuscript will be dedicated to quantitatively substantiating this observation.
a Scheme of the model for a slow processing unit. From left to right, units are ordered with decreasing timescales. I indicates the input, P the processing unit, and O the output. Links with empty dots denote interactions from a slow to a fast unit. b Output distribution for a slow processing unit in the presence of nonlinear summation, where the activity of the components of each unit is first integrated and then summed. The dashed line is obtained by exact sampling, while the shaded area represents the histograms obtained from Langevin trajectories with a timescale ratio Δτ = 10^{−2} between the units. c The stochastic trajectory of the one-dimensional output unit in this regime. d Same as (b) in the presence of nonlinear integration. e Stochastic trajectory of the one-dimensional output in this case. f Scheme of the model for a fast processing unit, with the same ordering as in (a). The link with a filled dot denotes interactions from a faster to a slower unit. g Output distribution for a fast processing unit in the presence of nonlinear summation, where the activity of the components of each unit is first summed and then integrated. h Stochastic trajectory for the one-dimensional output in this case. i Same as panel (g) in the presence of nonlinear integration. j Stochastic trajectory for the one-dimensional output in this case. In this figure, the unit dimensions are MI = 5, MP = 3, and MO = 1. Interactions between units are distributed as \({{\mathcal{N}}}(0,1)\), whereas intra-unit interactions follow \({{\mathcal{N}}}(0,0.9/\sqrt{{M}_{\mu }})\).
Enhanced information by fast processing units
We can now exploit the exact factorization of the joint pdf of the system to evaluate the accuracy of processing the stochastic input and encoding it into the one-dimensional output, by means of the mutual information IIO in Eq. (2). To establish a baseline for the full processing scheme described in the previous section, we first consider the simpler scenario of an input signal xI that is directly passed to a one-dimensional output unit xO. Once again, we focus on the limiting case of a slow input (τI ≫ τO) in which information can be transferred from the input to the output unit28,29. The joint steady-state distribution of input and output dofs reads \({p}_{IO}^{{{\rm{np}}}}={p}_{I}^{{{\rm{st}}}}{p}_{O| I}^{{{\rm{np,st}}}}\), where the superscript “np” stands for “no processing”. Here, \({p}_{O| I}^{{{\rm{np,st}}}}\) is a Gaussian distribution whose variance is independent of xI (see Eq. (7) and “Methods” for details). Thus, \({h}_{O| I}^{{{\rm{np}}}}\) does not depend on xI and is equal to
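\[
h_{O|I}^{{\rm np}} = \frac{1}{2}\log_{2}\left(2\pi e\,\Sigma_{O}\right), \qquad (12)
\]
with \(\Sigma_{O}\) the stationary variance of the one-dimensional output,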
so that the mutual information simply reads
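\[
I_{IO}^{{\rm np}} = H_{O} - h_{O|I}^{{\rm np}} = H_{O} - \frac{1}{2}\log_{2}\left(2\pi e\,\Sigma_{O}\right). \qquad (13)
\]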
Therefore, evaluating \({I}_{IO}^{{{\rm{np}}}}\) amounts to computing the Shannon entropy of the output distribution, which can be done using standard estimators (see “Methods”)68.
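Ref. 68 corresponds to a classical spacing-based estimator of differential entropy; a minimal sketch in Python (the window choice m ≈ √n is a common heuristic, not prescribed here):

```python
import numpy as np

def vasicek_entropy_bits(samples, m=None):
    """Spacing estimator of differential entropy (Vasicek, 1976), in bits."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    if m is None:
        m = max(1, int(np.sqrt(n)))  # window size: a common heuristic choice
    lo = np.clip(np.arange(n) - m, 0, n - 1)  # boundary order statistics are repeated
    hi = np.clip(np.arange(n) + m, 0, n - 1)
    return np.mean(np.log2(n * (x[hi] - x[lo]) / (2 * m)))
```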
In Fig. 2a–c, we compare the behavior of \({I}_{IO}^{{{\rm{np}}}}\) with nonlinear summation and nonlinear integration, as a function of the input-output coupling strength, gOI, and the standard deviation of their interactions, σOI, that accounts for coupling heterogeneity. As expected, in both cases information increases with gOI. Crucially, we also find that, while nonlinear integration performs better at small σOI, nonlinear summation becomes dominant at large σOI. This nontrivial switch signals the fact that, in the presence of large elements in the interaction matrix, \({I}_{IO}^{{{\rm{np}}}}\) is favored by nonlinear summation. Additionally, as shown in Fig. 2d, e, the output distribution with nonlinear integration becomes bimodal for large σOI due to the saturating effect of the hyperbolic tangent—a phenomenon much more pronounced when all the inputs are summed together in the argument of the activation function. We also find (Fig. 2f, g) that mutual information saturates to a finite value as the input gets closer to linear instability, a feature already observed in models of neural populations19. In particular, an input closer to linear instability appears to be always beneficial for nonlinear summation, further suggesting that large input values—either from xI itself or due to specific large couplings—are better represented in the output by summing separate activation functions. With nonlinear integration, instead, linear instability and large values of the input may decrease \({I}_{IO}^{{{\rm{np}}}}\), as they may push the activation function to saturation.
a–c Mutual information between input and output IIO in a system without a processing unit, as a function of the coupling strength gOI and the standard deviation σOI of the interaction matrix \({\hat{A}}_{OI}\) for nonlinear summation (superscript “ns”, teal), nonlinear integration (superscript “int”, orange), and their difference in (c). At small σOI, information is higher in the presence of nonlinear integration, whereas at large σOI, nonlinear summation dominates. The pentagon and triangle represent specific parameter values analyzed in (d, e). The sketches on top of the panels represent different computational schemes. d, e Probability distribution of the one-dimensional output, pO, for the specific values highlighted in (a–c). At large σOI, the output distribution may become bimodal with nonlinear integration, due to the saturation of the activation function for large arguments. f, g Mutual information IIO as a function of the input stability for two different choices of parameters. As the input gets closer to linear instability, σI → 1, the mutual information grows for nonlinear summation. For nonlinear integration, instead, it saturates at possibly lower values at large σOI. h–j Mutual information IIO with the addition of a one-dimensional processing unit evolving on a fast timescale for different values of interaction heterogeneity σOP. For sufficiently large processing-output couplings gOP, the information (solid lines) is larger than in the corresponding input-output system, where the processing is removed (dashed lines). Here, gPI = 2. k–m Same, with a processing unit slower than the output. Now, the mutual information is always smaller than in the input-output system. In this figure, unless otherwise specified, the input dimension is MI = 50 and its interaction heterogeneity is σI = 0.9.
Then, we add back a processing unit to understand its effect on the mutual information between input and output units. We note that the systems with and without processing are fundamentally different, as the processing unit evolves on its own timescale and therefore alters both the form of the Fokker-Planck operators and the structure of the associated joint probability distributions. In the case of fast processing, the joint steady-state distribution of input and output dofs is obtained from Eq. (5) by integrating over xP, i.e., \({p}_{IO}^{{{\rm{fp}}}}={p}_{I}^{{{\rm{st}}}}{p}_{O| I}^{{{\rm{eff,st}}}}\). Thus, as before, the variance of the Gaussian distribution \({p}_{O| I}^{{{\rm{eff,st}}}}\) is independent of xI (Eq. (9)), and the mutual information can be written following Eqs. (12) and (13) (using fp as a superscript). In the presence of a slow processing unit, instead, from Eq. (6) we have:
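\[
p_{O|I}^{{\rm sp},{\rm st}}({\bf{x}}_{O}|{\bf{x}}_{I}) = \int {\rm d}{\bf{x}}_{P}\; p_{P|I}^{{\rm st}}({\bf{x}}_{P}|{\bf{x}}_{I})\; p_{O|P}^{{\rm st}}({\bf{x}}_{O}|{\bf{x}}_{P})\,. \qquad (14)
\]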
Although an expression for \({h}_{O| I}^{{{\rm{sp}}}}\) cannot be easily obtained in this case, we can efficiently sample \({p}_{O| I}^{{{\rm{sp}}},{{\rm{st}}}}\) by using Eq. (14) to compute the mutual information \({I}_{IO}^{{{\rm{sp}}}}\) from Eq. (2), as detailed in the “Methods”. In order to compare the results with the ones in the absence of a processing unit, we replace the previous output with a one-dimensional processing unit, i.e., we take gOI → gPI and σOI → σPI. Then, we add an additional one-dimensional layer that now constitutes the output of the system. In this way, by removing the processing unit, we recover the original input-output system, allowing a direct comparison between the two. We study the behavior of the mutual information as we change the processing-output coupling, gOP, and interaction heterogeneity, σOP.
In Fig. 2h–j we show that, with a fast processing unit, for sufficiently large gOP the mutual information \({I}_{IO}^{{{\rm{fp}}}}\) is larger than that of the corresponding input-output system. The coupling value for which such crossing takes place decreases with σOP, hinting at the fact that a system with either large gOP or σOP can outperform its counterpart without processing. On the other hand, this is not possible with a slow processing unit, for which the presence of a one-dimensional processing layer seems to be detrimental, or immaterial at best (Fig. 2k–m). Indeed, we find that \({I}_{IO}^{{{\rm{sp}}}}\le {I}_{IO}^{{{\rm{np}}}}\), approaching this value with nonlinear summation at large gOP. Once more, whether nonlinear integration or summation leads to more accurate input encoding primarily depends on gOP and σOP.
Enhanced information by nonlinear integration
While a one-dimensional processing unit can be advantageous or detrimental depending on its timescale, increasing its dimensionality can modify this picture. We now explore this direction, starting with the case in which processing and input units have the same large dimension MI = MP = 50. In Fig. 3, we compare the mutual information between input and output for the case of nonlinear summation, \({I}_{IO}^{{{\rm{ns}}}}\), and integration, \({I}_{IO}^{{{\rm{int}}}}\), for both slow and fast processing. We first take σPI = σOP = 1 to study the effects of the couplings gPI and gOP. Figure 3a–c (slow processing) and d–f (fast processing) show that, independently of the internal timescale ordering, nonlinear integration leads to higher mutual information than summation. Notice that, as in the previous one-dimensional case, for both nonlinear schemes we find that a fast processing unit systematically outperforms a slow one (see Fig. 3g–h). Therefore, from now on, we will only consider the fast-processing scenario. Furthermore, in Fig. 3i–j, we show that \({I}_{IO}^{{{\rm{int}}}}\) displays a nontrivial peak as a function of gPI, whereas \({I}_{IO}^{{{\rm{ns}}}}\) saturates to lower values. This observation signals the existence of an optimal value of the input-processing coupling that helps maximize the encoding performance with (fast) high-dimensional processing units and nonlinear integration.
a–c Mutual information between input and output IIO in a system with a slow processing unit as a function of the coupling strengths gPI and gOP for nonlinear integration (superscript “int”, orange), nonlinear summation (superscript “ns”, teal), and their difference in (c). Nonlinear integration produces higher information than nonlinear summation. The sketch on the left of (a) indicates a system with a slow processing unit, as explained in Fig. 1a. The sketches on top of the panels represent different computational schemes. d–f Same, but for a fast processing unit, with the sketch on the left of (d) indicating a system with a fast processing unit (see Fig. 1b). g–h Difference of mutual information between the fast and slow processing scenarios for nonlinear summation and integration. IIO is systematically higher with a fast processing unit. This effect is particularly relevant for an activation function implementing nonlinear integration. The sketch on top graphically represents the difference between the two schemes shown here. i, j Mutual information for nonlinear summation and integration at fixed σOP = 1. Integration always outperforms nonlinear summation, and displays a nontrivial peak of IIO at intermediate values of gPI. k, l Same as (i–j) but for σOP = 10. The situation is reversed in this case, with nonlinear summation leading to a larger mutual information. In this figure, unless otherwise specified, the standard deviations of the interaction matrices are σPI = σOP = 1, σI = σP = 0.9, and the input and processing dimensions are MI = MP = 50. Results are obtained by averaging over 10^3 realizations of the random interaction matrices.
So far, we have focused on the case of small heterogeneity of processing-output interactions, where nonlinear integration displays a computational advantage. However, if we keep both MP and MI fixed, the situation is reversed at larger σOP and large gPI (Fig. 3k, l). In this regime, \({I}_{IO}^{{{\rm{ns}}}}\) saturates at values that are larger than the peak of \({I}_{IO}^{{{\rm{int}}}}\). This suggests once more the presence of a nontrivial interplay between the two schemes as a function of interaction heterogeneity and coupling strengths.
Crucially, even in these strong-coupling and large-heterogeneity regimes, the computational advantage of nonlinear integration is restored at sufficiently large processing sizes (Fig. 4a–d). Intuitively, for small MP, this may be due to large elements of the interaction matrices affecting only a few terms in the nonlinear summation scheme. On the other hand, the same elements may push nonlinear integration into the saturation regime if not balanced by any opposite signals, therefore masking the other interactions. Overall, we find that nonlinear summation provides higher mutual information at small MP, and with large couplings and heterogeneity, but this effect becomes less and less prominent and eventually disappears as MP increases (Fig. 4e). Importantly, this effect depends on the sparsity of the connections between the units. In this case, we rewrite Eq. (3) as
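\[
\phi_{\mu\nu}^{{\rm ns},\,i}({\bf{x}}_{\nu}) = \frac{1}{C_{\mu\nu}^{i}}\sum_{j=1}^{M_{\nu}} c_{\mu\nu}^{ij}\,A_{\mu\nu}^{ij}\,\tanh\!\left(x_{\nu}^{j}\right), \qquad
\phi_{\mu\nu}^{{\rm int},\,i}({\bf{x}}_{\nu}) = \tanh\!\left(\frac{1}{C_{\mu\nu}^{i}}\sum_{j=1}^{M_{\nu}} c_{\mu\nu}^{ij}\,A_{\mu\nu}^{ij}\,x_{\nu}^{j}\right),
\]
with \(c_{\mu\nu}^{ij}\in\{0,1\}\) indicating the presence of a link (the placement of the 1/C normalization is a convention adopted here),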
where \({C}_{\mu \nu }^{i}\) is the number of connections from unit ν to the i-th node of unit μ. In Fig. 4e, we show that increasing this sparsity by reducing the probability of connection between the units, punit, is qualitatively equivalent to effectively reducing the processing unit dimension, as expected. Hence, for fixed MP, a system with sparser inter-unit connections will favor nonlinear summation, in line with our previous considerations. Furthermore, our results are qualitatively robust with respect to changes in the processing unit topology. In particular, in Fig. 4f, we consider a Barabási–Albert topology with Gaussian weights for \({\hat{A}}_{P}\), to introduce a processing structure with hubs and a nontrivial degree distribution. Although this sparser intra-unit topology tends to favor nonlinear integration, we find in general the same interplay between the two nonlinear schemes as in the fully connected case. In the Supplementary Note 4, we show that these results remain robust in the presence of other topologies of the processing unit, as well as in the case where punit is not constant but varies across nodes. While our results remain limited to a set of paradigmatic topologies, the observed robustness seems to indicate the fundamental role of key topological parameters, such as the degree of the processing layer and the number of inter-layer connections, in determining the processing performance of complex nonlinear systems.
a, b Difference between the mutual information IIO for nonlinear integration and for nonlinear summation as a function of the input-processing coupling gPI for two different choices of parameters. In a fast-processing system, the advantage of nonlinear integration depends on the interplay between the dimensionality of the processing unit, MP, and the heterogeneity of \({\hat{A}}_{OP}\), quantified by σOP. For small values (σOP = 1), we always find \({I}_{IO}^{{{\rm{int}}}} > {I}_{IO}^{{{\rm{ns}}}}\). c, d At large heterogeneity (σOP = 10), instead, such an advantage is achieved only for large enough MP, and depends on both couplings gPI and gOP. The sketches next to (b) and (d) represent nonlinear integration (orange) and nonlinear summation (teal), and the intensity of the colorbar quantifies the advantage of one over the other. e Difference of mutual information between nonlinear integration and summation as a function of coupling strengths gPI and gOP, for different processing dimensions and sparsity of inter-unit couplings. Here, we consider a fully connected processing unit. In general, the processing dimension MP determines whether nonlinear integration or nonlinear summation leads to a higher mutual information between the input and the output. If MP is small, \({I}_{IO}^{{{\rm{ns}}}}\) is typically higher in a strong-coupling regime, whereas \({I}_{IO}^{{{\rm{int}}}} > {I}_{IO}^{{{\rm{ns}}}}\) at large MP. Increasing the sparsity of the connections between units effectively reduces the processing dimension (here, we set the probability of a connection between nodes of different units to punit = 0.5). f Same as (e), but changing the topology of the processing unit. We observe that this modification does not qualitatively affect the results, although sparser processing networks favor nonlinear integration. In this figure, the standard deviations of the interaction matrices are σPI = 1, σI = σP = 0.9, the input dimension is MI = 50, and the interaction heterogeneity is σOP = 10 in (e, f). Results are obtained by averaging over 10^3 realizations of the random interaction matrices.
Interplay between input and processing dimensionality
The results in Fig. 4 suggest that the size of the processing unit deeply affects the information between input and output dofs. To further explore this effect, we now focus on fast nonlinear integration and study the interplay between input and processing dimensionalities. In Fig. 5a, we show that, at a given MI, there exists an optimal value of \({M}_{P}={M}_{P}^{* }\) that maximizes \({I}_{IO}^{{{\rm{int}}}}\). \({M}_{P}^{* }\) decreases as the input size increases, so that smaller inputs are optimally processed by larger processing units. This effect emerges in sufficiently strong-coupling regimes, as we show in Fig. 5b and in the Supplementary Note 4. We also find that the optimal processing dimension \({M}_{P}^{* }\) increases with σOP (Fig. 5c).
a Mutual information between input and output IIO for nonlinear integration and a fast processing unit. For a given input dimension MI, there exists an optimal processing dimensionality \({M}_{P}^{* }\) that maximizes IIO (dashed lines). The coupling strength between processing and output is gOP = 10. b, c Optimal processing dimensionality as a function of the input dimension MI for two different values of interaction heterogeneity σOP. Error bars represent one standard deviation over realizations of the random interaction matrices. The emergence of an optimal processing dimension is more evident at sufficiently strong couplings and is characterized by a decrease of \({M}_{P}^{* }\) as MI increases. The optimal processing dimension \({M}_{P}^{* }\) also increases with σOP. d, e Same as (a) for smaller gOP and different interaction heterogeneity σOP. At intermediate values (gOP = 5) and low heterogeneity (σOP = 1), \({I}_{IO}^{{{\rm{int}}}}\) is higher for smaller input dimensions and \({M}_{P}^{* }\) stays almost unchanged independently of MI. Increasing σOP to 2, large processing units become favored at small input dimensions and vice versa. In this figure, the standard deviations of the interaction matrices are σPI = 1, σI = σP = 0.9, and the input-processing coupling is gPI = 10. Results are obtained by averaging over 2 × 10^4 realizations of the random interaction matrices.
In particular, for small gOP and σOP, information is typically higher for smaller input sizes (Fig. 5d), and the optimal processing dimension \({M}_{P}^{* }\) remains small at any MI. Slightly increasing the interaction heterogeneity σOP while keeping gOP fixed drastically alters \({I}_{IO}^{{{\rm{int}}}}\) (Fig. 5e), revealing the nontrivial interplay between MI and \({M}_{P}^{* }\). Heuristically, this provides evidence that a nonlinear embedding of a low-dimensional input in a higher-dimensional processing space favors information encoding. On the contrary, \({I}_{IO}^{{{\rm{int}}}}\) is maximal at small MP for large MI, so that information processing of high-dimensional inputs is favored by a nonlinear compression of the input into a lower-dimensional processing space. This behavior may reveal quantitative insights into optimal operation regimes and diverse strategies to encode information.
Emergent output bimodality
Nonlinear integration may also be advantageous from a dynamical perspective. In Fig. 6a, we consider the case of a fast processing unit and compare the bimodality of the output pdf obtained for nonlinear summation and integration by means of Sarle’s bimodality coefficient (see “Methods”), respectively denoted by \({b}_{O}^{{{\rm{ns}}}}\) and \({b}_{O}^{{{\rm{int}}}}\). We show that integration is associated with higher bimodality coefficients, i.e., more pronounced bimodality, thus enabling more accurate input discrimination in the output distribution. We note, however, that this effect is purely dynamic, as higher bimodality does not always imply larger input-output mutual information (see for instance Fig. 2d, e and Supplementary Note 5). The presence of a stochastic Gaussian input and random interaction matrices makes it particularly hard to pinpoint the input features that the system is discriminating. Crucially, however, this emergent bimodality may be tuned by introducing suitable processing biases in how the signal of certain nodes is encoded. We can add this ingredient in Eq. (3) by applying the substitution \({x}_{\nu }^{j}\to {x}_{\nu }^{j}-{\theta }_{\nu }^{j}\), where the bias is introduced as:
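\[
\phi_{\mu\nu}^{{\rm b},{\rm ns},\,i}({\bf{x}}_{\nu}) = \sum_{j=1}^{M_{\nu}} A_{\mu\nu}^{ij}\,\tanh\!\left(x_{\nu}^{j}-\theta_{\nu}^{j}\right), \qquad
\phi_{\mu\nu}^{{\rm b},{\rm int},\,i}({\bf{x}}_{\nu}) = \tanh\!\left(\sum_{j=1}^{M_{\nu}} A_{\mu\nu}^{ij}\left(x_{\nu}^{j}-\theta_{\nu}^{j}\right)\right),
\]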
where the superscript “b” indicates the presence of the bias. Note that θν can, in principle, be different for each unit. In Fig. 6b–e, we consider a system in the presence of a fast processing unit and include the presence of a random bias θν whose elements are drawn from \({{\mathcal{N}}}(0,1)\) for each ν. By comparing the same realization of random interaction matrices \({\hat{A}}_{\mu }\) and \({\hat{A}}_{\mu \nu }\) (as discussed for Fig. 3) without (Fig. 6b, c) and with (Fig. 6d, e) θν, we find that the presence of a bias triggers an imbalance in the output bimodality. Notice that this emergent imbalance can be different between summation and integration due to the intrinsic randomness of all the system’s components, as shown in Fig. 6. Thus, nonlinear integration enables a more pronounced, tunable output bimodality, allowing the system to statistically select one of the two emerging modes.
a Bimodality coefficient of the output probability distribution, bO, as a function of input and processing dimensions, respectively MI and MP, in a system with a fast processing unit, as represented by the sketch next to the panel. We indicate the case of nonlinear summation in teal and with the superscript “ns”, and the one of nonlinear integration in orange with the superscript “int”. Here, gPI = gOP = 10, σOP = σPI = 1, and σI = σP = 0.9. Results are averaged over 10^3 realizations of random matrices. Both activation functions, implementing nonlinear summation or nonlinear integration, may lead to a bimodal output distribution, particularly for smaller dimensions. Nevertheless, on average, nonlinear integration enhances bimodality in all parameter ranges explored. b, c One-dimensional output distributions for MI = 5 and MP = 10 in the case of nonlinear summation (teal) and nonlinear integration (orange). d, e Same random matrix realization as in (b, c), but after introducing a random bias in the processing nonlinearity. The sketch next to (b, c) represents the two computational schemes considered. The bias allows for tuning the emergent output bimodality, allowing the system to select one of the modes. Importantly, due to its randomness, the bias may have opposite effects depending on the type of activation function.
Discussion
In this work, we studied nonlinear processing through different activation functions in a paradigmatic information-processing system. By leveraging the presence of multiple timescales, we analytically obtained the joint distribution of the system and computed the input-output mutual information. We compared two nonlinear processing schemes, summation and integration, which have been employed in several contexts. In systems implementing nonlinear summation, inputs are first nonlinearly transformed and then averaged, while inputs are first averaged and then transformed in systems supporting nonlinear integration. We showed that fast processing units outperform slow-processing ones, leading to higher input-output mutual information. Furthermore, we found that coupling strengths, interaction heterogeneity, and processing dimensionality—modulated by the number of connections between the units—determine whether nonlinear integration or summation is beneficial to the information shared between input and output units. Finally, we highlighted a nontrivial tradeoff between input and processing dimensions emerging in strong-coupling regimes.
Overall, our paradigmatic approach allowed us to investigate the emergence of accurate encoding in information-processing architectures. In particular, we highlighted the advantages of nonlinear integration in large multiscale systems and nonlinear summation in smaller ones. The information-theoretic differences between these two schemes may help in understanding optimal coding strategies and why certain biological systems implement integration or summation. The fact that nonlinear summation performs better with fewer degrees of freedom and more heterogeneous couplings may be especially relevant for those biochemical systems where a limited number of chemical species support signal propagation. Conversely, large neuronal networks may perform better by integrating incoming signals, with the specific topology of the interaction network determining the paths along which signals are dynamically propagated8,9,69. In this direction, a deeper understanding of the interplay between the nonlinearities and specific topologies is needed and has not been explored in this work, which has solely focused on the case of random interactions. A fundamental extension would be to use real-world networks as backbone structures to build processing layers. On the one hand, these investigations might shed light on the ability of biological systems to process information; on the other hand, they might be informative for designing bio-inspired networks with optimal processing abilities. However, the design of information-processing systems faces the challenge of optimizing both nonlinear functions and network structure to implement given target functions. This goal will necessarily require the development of specific training algorithms that work for large stochastic systems. Similarly, studying scenarios with several output nodes is currently out of reach, since estimating the output entropy and mutual information in high dimensions is not feasible with the available numerical estimators.
Future works will need to consider other types of activation functions, evaluating their performances in terms of input-output information and the relative timescales between the units. Furthermore, it will be important to consider systems where different units implement different activation functions, allowing for more heterogeneity in terms of computational capabilities. In these scenarios, it will be interesting to systematically investigate how input, processing, and output dimensions shape information in specific real-world systems. Along this line, architectures with several processing units, possibly acting on a diverse range of timescales, may be necessary to deal with bio-inspired models and more structured inputs. This setting will also enable a natural implementation of multiple tasks whose presence might change the definition of processing performance, framing it in the context of decision-making and possibly connecting it with recent advances at the interface of information processing, decisions, and large language models across different scales70. Our work will stand as a fundamental step for these explorations, unraveling how different types of dynamical nonlinearities underlie information and computation in real-world systems.
Methods
Exact solution for fast processing units
For a system with a fast processing unit, the timescale separation τI ≫ τO ≫ τP leads to the steady-state or stationary joint probability distribution (i.e., the solution obeying \({\partial }_{t}{p}_{IPO}^{{{\rm{st}}}}=0\))
where we omitted the superscript “st” on the l.h.s. for brevity, and
as we show explicitly in the Supplementary Note 4. The covariance matrices obey their respective Lyapunov equations, e.g., \({\hat{A}}_{I}{\hat{\Sigma }}_{I}+{\hat{\Sigma }}_{I}{\hat{A}}_{I}^{T}=2{\hat{D}}_{I}\), with \({\hat{D}}_{I}={{\rm{diag}}}\left({D}_{I}^{1},\ldots ,{D}_{I}^{{M}_{I}}\right)\), and similarly for the processing and output unit. Importantly, provided that the deterministic system is stable—i.e., that the eigenvalues of \({\hat{A}}_{\mu }\) all have positive real parts—these solutions exist and are well-defined63. Since we take \({A}_{\mu }^{ii}=1\), when interactions are randomly distributed as \({{\mathcal{N}}}(0,{\sigma }_{\mu }/\sqrt{{M}_{\mu }})\), in the large Mμ limit, the μ-th unit is stable if σμ < 1 (ref. 67). The input-dependent mean of the processing, mP∣I(xI), is defined as in Eq. (8). Instead, the output distribution obeys an effective operator whose shape depends on whether nonlinear integration or nonlinear summation is employed. We find that, for nonlinear summation, the effective mean is given by
with
where we introduced the functions
For nonlinear integration, the calculations are more involved. As reported in the main text,
where
and
We present the detailed derivation of all these expressions in Supplementary Note 4.
Exact solution for slow processing units
For a system with a slow processing unit, the timescale separation τI ≫ τP ≫ τO leads to the steady-state joint probability distribution
again omitting the superscript “st” on the l.h.s, with
as we show explicitly in the Supplementary Note 4. With respect to the previous case, the crucial difference is that now \({p}_{O| P}^{{{\rm{st}}}}\) is conditioned on the processing rather than on the input, due to the timescale structure. Furthermore, the input-dependent mean of the processing, mP∣I(xI) (see Eq. (8)), has the same form as the processing-dependent mean of the output, mO∣P(xP). We present the detailed derivation of all these expressions in the Supplementary Note 4.
Direct input–output connections
We consider here the case of a system with only an input unit, xI, and an output unit, xO, with MO nodes. In the presence of a slow input, τI ≫ τO, the steady-state or stationary solution of the Fokker-Planck equation for the joint probability distribution, \({\partial }_{t}{p}_{IO}^{{{\rm{st}}}}=0\), reads
omitting “st” on the l.h.s., where
and
for all i = 1, …, MO, as we explicitly show in the Supplementary Note 3.
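Consistently with Eqs. (3), (7), and (8), these expressions presumably read
\[
p_{IO}^{{\rm st}} = p_{I}^{{\rm st}}({\bf{x}}_{I})\;{\mathcal{N}}_{O}\!\left({\bf{m}}_{O|I}({\bf{x}}_{I}),\,\hat{\Sigma}_{O}\right), \qquad {\bf{m}}_{O|I}({\bf{x}}_{I}) = g_{OI}\,\hat{A}_{O}^{-1}\,{\boldsymbol{\phi}}_{OI}({\bf{x}}_{I})\,,
\]
with components
\[
m_{O|I}^{{\rm ns},\,i} = g_{OI}\sum_{j,k}\left(\hat{A}_{O}^{-1}\right)^{ij} A_{OI}^{jk}\,\tanh\!\left(x_{I}^{k}\right), \qquad
m_{O|I}^{{\rm int},\,i} = g_{OI}\sum_{j}\left(\hat{A}_{O}^{-1}\right)^{ij}\tanh\!\left(\sum_{k} A_{OI}^{jk}\,x_{I}^{k}\right).
\]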
Exact sampling scheme for fast processing
In general, although our approach allows us to factorize the joint distribution \({p}_{IPO}^{{{\rm{st}}}}\) into a product of Gaussian distributions for all timescale orderings, the input-output distribution, i.e., the one obtained from the marginalization over the processing dofs, remains highly nonlinear due to the nontrivial dependencies in the means of each Gaussian factor. However, our factorization allows for its efficient sampling, as all the nonlinearities appear as conditional dependencies. For a fast processing unit, for instance, we have \({p}_{IO}^{{{\rm{fp}}}}={p}_{I}^{{{\rm{st}}}}{p}_{O| I}^{{{\rm{eff,st}}}}\), so that the conditional entropy
\[
h_{O|I} = \frac{1}{2}\log_{2}\left[(2\pi e)^{M_{O}}\det\hat{\Sigma}_{O}\right]
\]
is easy to compute, as it does not depend on \({{{\bf{x}}}}_{I}\). Thus, we only need to evaluate HO numerically to estimate IIO = HO − hO∣I. The procedure for Nsam samples is as follows (a code sketch is given after the list):
1. sample \({\{{{{\bf{x}}}}_{I}\}}_{i = 1}^{{N}_{{{\rm{sam}}}}}\) from the independent Gaussian distribution of the input;
2. compute the means \({{{\bf{m}}}}_{O| I}({\{{{{\bf{x}}}}_{I}\}}_{i})\) for each sample i;
3. for all i, sample xO from the multivariate Gaussian with covariance \({\hat{\Sigma }}_{O}\) and mean \({{{\bf{m}}}}_{O| I}({\{{{{\bf{x}}}}_{I}\}}_{i})\).
Then, the entropy HO of the output distribution can be estimated from the samples \({\{{{{\bf{x}}}}_{O}\}}_{i}\) (see also Supplementary Note 4).
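A minimal sketch of this procedure for a one-dimensional output (here mean_OI is a hypothetical stand-in for the effective conditional mean of Eq. (10), whose explicit form depends on the chosen nonlinearity, and SciPy's spacing-based entropy estimator replaces a hand-rolled one):

```python
import numpy as np
from scipy.stats import differential_entropy

def mutual_info_fast_processing(mean_OI, Sigma_I, Sigma_O, n_sam, rng):
    """I_IO = H_O - h_{O|I} for a fast processing unit and a 1D output."""
    # 1. sample inputs from their stationary Gaussian distribution
    x_I = rng.multivariate_normal(np.zeros(Sigma_I.shape[0]), Sigma_I, size=n_sam)
    # 2. effective conditional means of the output, one per input sample
    m = np.array([mean_OI(x) for x in x_I])
    # 3. sample the one-dimensional output around each conditional mean
    x_O = rng.normal(m, np.sqrt(Sigma_O))
    # h_{O|I} is the entropy of a Gaussian with input-independent variance Sigma_O
    h_cond = 0.5 * np.log2(2 * np.pi * np.e * Sigma_O)
    # H_O is estimated directly from the output samples, in bits
    return differential_entropy(x_O, base=2) - h_cond
```

For instance, in the no-processing case with nonlinear summation one may take mean_OI = lambda x: g_OI * A_OI @ np.tanh(x), in the spirit of Eqs. (3) and (8).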
Exact sampling scheme for slow processing
In the case of a slow processing unit, due to the timescale ordering, the conditional dependencies are crucially different. The input-output distribution,
\[
p_{IO}^{{\rm sp}} = p_{I}^{{\rm st}}\int {\rm d}{\bf{x}}_{P}\; p_{P|I}^{{\rm st}}\, p_{O|P}^{{\rm st}}\,,
\]
cannot be easily computed. Thus, the entropy of the conditional distribution hO∣I(xI) is not known analytically. To address this issue, we exploit the fact that we can efficiently sample \({p}_{O| I}^{{{\rm{st}}}}\), allowing us to estimate directly the conditional entropy \({H}_{O| I}={\langle {h}_{O| I}\rangle }_{I}\) with importance sampling. We detail below the sampling steps for a fixed number of input samples Nsam,I and Nsam output samples per input (a code sketch follows the list):
1. sample a fixed input \({{{\bf{x}}}}_{I}^{(i)} \sim {{{\mathcal{N}}}}_{I}(0,{\hat{\Sigma }}_{I})\) for i = 1, …, Nsam,I;
2. for each input sample \({{{\bf{x}}}}_{I}^{(i)}\), compute \({{{\bf{m}}}}_{P| I}\left({{{\bf{x}}}}_{I}^{(i)}\right)\), and extract the samples \({{{\bf{x}}}}_{P}^{(i,j)}\) from \({{{\mathcal{N}}}}_{P}\left({{{\bf{m}}}}_{P| I}\left({{{\bf{x}}}}_{I}^{(i)}\right),{\hat{\Sigma }}_{P}\right)\) for j = 1, …, Nsam;
3. for each processing sample \({{{\bf{x}}}}_{P}^{(i,j)}\), compute the mean \({{{\bf{m}}}}_{O| P}\left({{{\bf{x}}}}_{P}^{(i,j)}\right)\) and extract the corresponding output \({{{\bf{x}}}}_{O}^{(i,j)}\) from \({{{\mathcal{N}}}}_{O}\left({{{\bf{m}}}}_{O| P}\left({{{\bf{x}}}}_{P}^{(i,j)}\right),{\hat{\Sigma }}_{O}\right)\);
4. for each input sample \({{{\bf{x}}}}_{I}^{(i)}\), estimate the entropy \({h}_{O| I}\left({{{\bf{x}}}}_{I}^{(i)}\right)\) of the conditional distribution \({p}_{O| I}^{{{\rm{st}}}}\) from the output samples \({\{{{{\bf{x}}}}_{O}\}}_{i,j}\);
5. estimate the conditional entropy HO∣I via importance sampling.
Then, as in the case of fast processing, we can estimate the entropy HO from the output samples, and then the mutual information IIO = HO − HO∣I (see also the Supplementary Note 2).
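A minimal sketch of the nested loop (mean_PI and mean_OP are hypothetical stand-ins for the conditional means of Eq. (8), and the plain Monte Carlo average in the last step stands in for the importance-sampling estimate):

```python
import numpy as np
from scipy.stats import differential_entropy

def mutual_info_slow_processing(mean_PI, mean_OP, Sigma_I, Sigma_P, Sigma_O,
                                n_sam_I, n_sam, rng):
    """I_IO = H_O - H_{O|I} for a slow processing unit and a 1D output."""
    outputs, h_cond = [], []
    for _ in range(n_sam_I):
        # 1. a fixed input sample
        x_I = rng.multivariate_normal(np.zeros(Sigma_I.shape[0]), Sigma_I)
        # 2. processing samples conditioned on this input
        x_P = rng.multivariate_normal(mean_PI(x_I), Sigma_P, size=n_sam)
        # 3. one output sample per processing sample
        x_O = rng.normal(np.array([mean_OP(xp) for xp in x_P]), np.sqrt(Sigma_O))
        # 4. per-input conditional entropy h_{O|I}(x_I), in bits
        h_cond.append(differential_entropy(x_O, base=2))
        outputs.append(x_O)
    # 5. average over inputs, then I_IO = H_O - H_{O|I}
    H_O = differential_entropy(np.concatenate(outputs), base=2)
    return H_O - np.mean(h_cond)
```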
Measures of bimodality
We measure the bimodality of the output distribution \({p}_{O}^{{{\rm{st}}}}\) by computing Sarle’s bimodality coefficient, defined as:
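\[
b_{O} = \frac{s^{2}+1}{\kappa + q(n)}\,,
\]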
where n is the number of samples at hand, q(n) = 3(n − 1)²/[(n − 2)(n − 3)], s is the sample skewness, and κ is the excess kurtosis.
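A compact sketch using SciPy's moment estimators (the Fisher convention of kurtosis directly returns the excess kurtosis):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def bimodality_coefficient(samples):
    """Sarle's sample bimodality coefficient b_O = (s**2 + 1) / (kappa + q(n))."""
    x = np.asarray(samples)
    n = len(x)
    q = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))  # finite-sample correction q(n)
    s = skew(x)          # sample skewness
    kappa = kurtosis(x)  # excess kurtosis (Fisher convention)
    return (s ** 2 + 1) / (kappa + q)
```

Values above the uniform-distribution benchmark 5/9 are commonly taken as a hint of bimodality.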
Code availability
The code employed to perform simulations and compute the input-output mutual information is available on Zenodo (https://doi.org/10.5281/zenodo.15324149).
References
Tkačik, G. & Bialek, W. Information processing in living systems. Annu. Rev. Condens. Matter Phys. 7, 89–117 (2016).
Barkai, N. & Leibler, S. Robustness in simple biochemical networks. Nature 387, 913–917 (1997).
Aoki, S. K. et al. A universal biomolecular integral feedback controller for robust perfect adaptation. Nature 570, 533–537 (2019).
Flatt, S., Busiello, D. M., Zamuner, S. & De Los Rios, P. Abc transporters are billion-year-old Maxwell demons. Commun. Phys. 6, 205 (2023).
Mochizuki, A. An analytical study of the number of steady states in gene regulatory networks. J. Theor. Biol. 236, 291–310 (2005).
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
Klosin, A. et al. Phase separation provides a mechanism to reduce noise in cells. Science 367, 464–468 (2020).
Vyas, S., Golub, M. D., Sussillo, D. & Shenoy, K. V. Computation through neural population dynamics. Annu. Rev. Neurosci. 43, 249–275 (2020).
Dubreuil, A., Valente, A., Beiran, M., Mastrogiuseppe, F. & Ostojic, S. The role of population structure in computations through neural dynamics. Nat. Neurosci. 25, 783–794 (2022).
Jordan, J. D., Landau, E. M. & Iyengar, R. Signaling networks: the origins of cellular multitasking. Cell 103, 193–200 (2000).
Cheong, R., Rhee, A., Wang, C. J., Nemenman, I. & Levchenko, A. Information transduction capacity of noisy biochemical signaling networks. Science 334, 354–358 (2011).
Szandała, T. Review and comparison of commonly used activation functions for deep neural networks. Bio-Inspired Neurocomputing 203–224 (Springer, 2021).
Karlik, B. & Olgac, A. V. Performance analysis of various activation functions in generalized MLP architectures of neural networks. Int. J. Artif. Intell. Expert Syst. 1, 111–122 (2011).
Apicella, A., Donnarumma, F., Isgrò, F. & Prevete, R. A survey on modern trainable activation functions. Neural Netw. 138, 14–32 (2021).
Nwankpa, C., Ijomah, W., Gachagan, A. & Marshall, S. Activation functions: comparison of trends in practice and research for deep learning. Preprint at https://doi.org/10.48550/arXiv.1811.03378 (2018).
Dack, A., Qureshi, B., Ouldridge, T. E. & Plesa, T. Recurrent neural chemical reaction networks that approximate arbitrary dynamics. Preprint at https://doi.org/10.48550/arXiv.2406.03456 (2024).
Hayou, S., Doucet, A. & Rousseau, J. On the impact of the activation function on deep neural networks training. In Chaudhuri, K. & Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, vol. 97 of Proceedings of Machine Learning Research, 2672–2680 (PMLR, 2019). https://proceedings.mlr.press/v97/hayou19a.html
Floyd, C., Dinner, A. R., Murugan, A. & Vaikuntanathan, S. Limits on the computational expressivity of non-equilibrium biophysical processes. Nat. Commun. 16, 7184 (2025).
Barzon, G., Busiello, D. M. & Nicoletti, G. Excitation-inhibition balance controls information encoding in neural populations. Phys. Rev. Lett. 134, 068403 (2025).
Cavanagh, S. E., Hunt, L. T. & Kennerley, S. W. A diversity of intrinsic timescales underlie neural computations. Front. Neural Circuits 14, 615626 (2020).
Golesorkhi, M. et al. The brain and its time: intrinsic neural timescales are key for input processing. Commun. Biol. 4, 970 (2021).
Mariani, B. et al. Disentangling the critical signatures of neural activity. Sci. Rep. 12, 10770 (2022).
Parrondo, J. M., Horowitz, J. M. & Sagawa, T. Thermodynamics of information. Nat. Phys. 11, 131–139 (2015).
Nicoletti, G. & Busiello, D. M. Mutual information disentangles interactions from changing environments. Phys. Rev. Lett. 127, 228301 (2021).
Graf, I. R. & Machta, B. B. A bifurcation integrates information from many noisy ion channels and allows for millikelvin thermal sensitivity in the snake pit organ. Proc. Natl Acad. Sci. 121, e2308215121 (2024).
Mattingly, H. H., Kamino, K., Machta, B. B. & Emonet, T. Escherichia coli chemotaxis is information limited. Nat. Phys. 17, 1426–1431 (2021).
Bauer, M. & Bialek, W. Information bottleneck in molecular sensing. PRX Life 1, 023005 (2023).
Nicoletti, G. & Busiello, D. M. Information propagation in multilayer systems with higher-order interactions across timescales. Phys. Rev. X 14, 021007 (2024).
Nicoletti, G. & Busiello, D. M. Information propagation in Gaussian processes on multilayer networks. J. Phys. Complex. 5, 045004 (2024).
Tostevin, F. & Ten Wolde, P. R. Mutual information between input and output trajectories of biochemical networks. Phys. Rev. Lett. 102, 218101 (2009).
Nicoletti, G. & Busiello, D. M. Tuning transduction from hidden observables to optimize information harvesting. Phys. Rev. Lett. 133, 158401 (2024).
Pham, T. M. & Kaneko, K. Dynamical theory for adaptive systems. J. Stat. Mech. Theory Exp. 2024, 113501 (2024).
Moran, J. & Tikhonov, M. Defining coarse-grainability in a model of structured microbial ecosystems. Phys. Rev. X 12, 021038 (2022).
Herron, L., Sartori, P. & Xue, B. Robust retrieval of dynamic sequences through interaction modulation. PRX Life 1, 023012 (2023).
Breuer, D., Timme, M. & Memmesheimer, R.-M. Statistical physics of neural systems with nonadditive dendritic coupling. Phys. Rev. X 4, 011053 (2014).
Clark, D. G. & Abbott, L. Theory of coupled neuronal-synaptic dynamics. Phys. Rev. X 14, 021001 (2024).
Maheswaranathan, N., Williams, A., Golub, M., Ganguli, S. & Sussillo, D. Universality and individuality in neural dynamics across large populations of recurrent networks. Adv. Neural Inf. Process. Syst. 32 (2019).
Driscoll, L. N., Shenoy, K. & Sussillo, D. Flexible multitask computation in recurrent networks utilizes shared dynamical motifs. Nat. Neurosci. 27, 1349–1363 (2024).
Kadmon, J. & Sompolinsky, H. Transition to chaos in random neuronal networks. Phys. Rev. X 5, 041030 (2015).
Engelken, R., Wolf, F. & Abbott, L. F. Lyapunov spectra of chaotic recurrent neural networks. Phys. Rev. Res. 5, 043044 (2023).
Sanzeni, A., Histed, M. H. & Brunel, N. Response nonlinearities in networks of spiking neurons. PLoS Comput. Biol. 16, e1008165 (2020).
Hennequin, G., Ahmadian, Y., Rubin, D. B., Lengyel, M. & Miller, K. D. The dynamical regime of sensory cortex: stable dynamics around a single stimulus-tuned attractor account for patterns of noise variability. Neuron 98, 846–860 (2018).
Beiran, M. & Ostojic, S. Contrasting the effects of adaptation and synaptic filtering on the timescales of dynamics in recurrent networks. PLoS Comput. Biol. 15, e1006893 (2019).
Muscinelli, S. P., Gerstner, W. & Schwalger, T. How single neuron properties shape chaotic dynamics and signal transmission in random neural networks. PLoS Comput. Biol. 15, e1007122 (2019).
Hadjiabadi, D. et al. Maximally selective single-cell target for circuit control in epilepsy models. Neuron 109, 2556–2572 (2021).
Rajan, K., Harvey, C. D. & Tank, D. W. Recurrent network models of sequence generation and memory. Neuron 90, 128–142 (2016).
Inoue, M. & Kaneko, K. Entangled gene regulatory networks with cooperative expression endow robust adaptive responses to unforeseen environmental changes. Phys. Rev. Res. 3, 033183 (2021).
Matsushita, Y. & Kaneko, K. Homeorhesis in Waddington’s landscape by epigenetic feedback regulation. Phys. Rev. Res. 2, 023083 (2020).
Coussement, L. et al. A transcriptional clock of the human pluripotency transition. Preprint at bioRxiv (2025).
Miyamoto, T., Furusawa, C. & Kaneko, K. Pluripotency, differentiation, and reprogramming: a gene expression dynamics model with epigenetic feedback regulation. PLoS Comput. Biol. 11, e1004476 (2015).
Yan, J. et al. Kinetic uncertainty relations for the control of stochastic reaction networks. Phys. Rev. Lett. 123, 108101 (2019).
De Los Rios, P. & Barducci, A. Hsp70 chaperones are non-equilibrium machines that achieve ultra-affinity by energy consumption. Elife 3, e02218 (2014).
De Domenico, M. et al. Mathematical formulation of multilayer networks. Phys. Rev. X 3, 041022 (2013).
Ghavasieh, A., Nicolini, C. & De Domenico, M. Statistical physics of complex information dynamics. Phys. Rev. E 102, 052304 (2020).
Ma, W., Trusina, A., El-Samad, H., Lim, W. A. & Tang, C. Defining network topologies that can achieve biochemical adaptation. Cell 138, 760–773 (2009).
Rahi, S. J. et al. Oscillatory stimuli differentiate adapting circuit topologies. Nat. methods 14, 1010–1016 (2017).
Yi, T.-M., Huang, Y., Simon, M. I. & Doyle, J. Robust perfect adaptation in bacterial chemotaxis through integral feedback control. Proc. Natl Acad. Sci. 97, 4649–4653 (2000).
Cover, T. M. Elements of Information Theory (John Wiley & Sons, 1999).
Sompolinsky, H., Crisanti, A. & Sommers, H.-J. Chaos in random neural networks. Phys. Rev. Lett. 61, 259 (1988).
Lukoševičius, M. & Jaeger, H. Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 3, 127–149 (2009).
Tanaka, G. et al. Exploiting heterogeneous units for reservoir computing with simple architecture. In Proc. 23rd International Conference on Neural Information Processing, ICONIP 2016, Kyoto, Japan, 16–21 October 2016, Proceedings, Part I 23, 187–194 (Springer, 2016).
Malik, Z. K., Hussain, A. & Wu, Q. J. Multilayered echo state machine: a novel architecture and algorithm. IEEE Trans. Cybern. 47, 946–959 (2016).
Risken, H. Fokker-Planck Equation (Springer, 1996).
Tubiana, J. & Monasson, R. Emergence of compositional representations in restricted Boltzmann machines. Phys. Rev. Lett. 118, 138301 (2017).
Barbier, J., Krzakala, F., Macris, N., Miolane, L. & Zdeborová, L. Optimal errors and phase transitions in high-dimensional generalized linear models. Proc. Natl Acad. Sci. 116, 5451–5460 (2019).
Mézard, M. Mean-field message-passing equations in the Hopfield model and its generalizations. Phys. Rev. E 95, 022117 (2017).
May, R. M. Will a large complex system be stable? Nature 238, 413–414 (1972).
Vasicek, O. A test for normality based on sample entropy. J. R. Stat. Soc. Ser. B Stat. Methodol. 38, 54–59 (1976).
Ji, P. et al. Signal propagation in complex networks. Phys. Rep. 1017, 1–96 (2023).
Capraro, V., Di Paolo, R., Perc, M. & Pizziol, V. Language-based game theory in the age of artificial intelligence. J. R. Soc. Interface 21, 20230720 (2024).
Acknowledgements
G.N. acknowledges funding provided by the Swiss National Science Foundation through its Grant CRSII5_186422. The authors acknowledge the support of the Munich Institute for Astro-, Particle and BioPhysics (MIAPbP), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC-2094 - 390783311, where this work was first conceived during the MOLINFO workshop. D.M.B. is funded by the program STARS@UNIPD with the project “ActiveInfo.”
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
G.N. and D.M.B. designed the study, performed calculations and numerical simulations, interpreted the results, and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Physics thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nicoletti, G., Busiello, D.M. Fast nonlinear integration drives accurate encoding of input information in large multiscale systems. Commun Phys 8, 437 (2025). https://doi.org/10.1038/s42005-025-02339-z