Introduction

Decision-making is a fundamental process underlying the dynamics of many complex systems in science and technology. At the macroscopic level, the emergent collective behaviour of these systems is often shaped by decisions made by individual agents at the microscopic scale. Examples span from animal groups coordinating their movements1,2, to autonomous vehicles navigating traffic3, to robotic swarms performing distributed tasks4,5 and decision processes in economics6. Often these processes involve a predefined control goal. Understanding how such microscopic decisions translate into macroscopic behaviour is crucial for the analysis, design and control of such systems7. A versatile, and in many contexts, highly efficient tool to study emergent behaviours on large length and time scales are continuum equations comprising a set of coupled partial different equations (PDEs). These not only allow to elucidate the type of instabilities and emerging dynamical states but can also help to identify relevant mechanisms and parameters ranges in dynamical systems with complex microscopic interaction rules. Yet, the precise translation of microscopic decision rules into continuum descriptions remains a significant challenge in physics and control theory.

Traditional field theories for complex systems often focus on collective behaviours arising from simple, typically reciprocal pairwise interaction rules. Major examples from the physics community are critical phenomena and phase separation in fluid and magnetic systems8,9 and, more recently, active10 and bio-matter11. In the last years, much attention has been devoted to nonreciprocal generalizations12,13 characterized by asymmetrical couplings occurring, e.g., in heterogeneous bacterial systems14,15,16, synthetic colloidal mixtures in nonequilibrium environments17,18, macroscopic predator-prey systems19, systems with vision cones20,21, but also neural22 and social23,24 networks, and in quantum optics25,26,27,28,29. These theories have demonstrated intriguing collective dynamics12,13,30,31,32,33,34,35, and unusual material properties36; however, they do not take into account crucial ingredients of decision-making.

Incorporating decision-making with predefined control objectives introduces distinct challenges: agents must make decisions based on local observations to achieve some desired control goal, a fundamental characteristic of distributed feedback control strategies7. Here, we demonstrate that the decision-making process induces inherent nonreciprocity and non-pairwise couplings into the system dynamics by explicitly incorporating control objectives, e.g. considering the positions of agents relative to their distance from the goal region. This presents challenges for conventional field-theoretic approaches. Here, we propose a framework to address these challenges, using the paradigmatic example of the shepherding control problem, where a group of agents, the herders, must coordinate themselves to control the collective dynamics of a second group of agents, the targets, in a desired way37.

The shepherding task considered here exemplifies key features of distributed control problems: (i) reliance on local observations and (ii) real-time decision-making to achieve global objectives38,39. In our setup, we analyze a representative shepherding task where NH herders must collect and contain NT targets within a pre-assigned goal region, which we assume to be a circle centered at the origin (see Fig. 1). To focus on the fundamental aspects of decision-making, we assume that targets exhibit no intrinsic collective properties, such as cohesion, velocity alignment, or coherent response40, which would otherwise simplify the task41. We model the targets as Brownian particles that experience a physical (and, thus, reciprocal) volume exclusion between themselves (and with the herders) through soft repulsion within a distance σ, and are repelled by herders within a distance λ (see Methods).

Fig. 1: Schematic illustration of the shepherding task.
figure 1

a Initial random configuration of herders (dogs) and targets (sheep), with herders and targets randomly distributed in space and the goal region marked by a green star. b Final configuration showing successful containment of targets around the goal region (green star) by the coordinated action of the herders.

The key challenge in solving the shepherding control problem lies in the distributed decision-making that determines the dynamics of the herders that must achieve a collective task through local observations. Each herder faces two fundamental decisions that mirror challenges in many real-world control scenarios7,42: what target(s) to influence, and how to influence them. We incorporate these decision-making elements into the dynamics of each herder i through a control input, denoted ui (see Methods). This control input takes the form of a feedback term that determines the herder’s action based on the relative positions of the observed targets with respect to the goal region.

Following established approaches37,38,39, each herder i selects one target with position \({{{{\bf{T}}}}}_{i}^{*}\) at each time; specifically, it chooses the target furthest from the origin (i.e., the goal region) within its circular sensing region of radius ξ. To facilitate the later derivation of a continuum description that explicitly incorporates decision-making, we employ an approximation of this selection mechanism that was first proposed in previous work by some of the authors37. Specifically, we express the position of the selected target \({{{{\bf{T}}}}}_{i}^{*}\) as a weighted average of the position of the targets within the sensing region of the herder i. The approximation reads

$${{{{\bf{T}}}}}_{i}^{*}=\frac{{\sum }_{a\in {N}_{i,\xi }}{e}^{\gamma (| {{{{\bf{T}}}}}_{a}| -| {{{{\bf{H}}}}}_{i}| )}{{{{\bf{T}}}}}_{a}}{{\sum }_{a\in {N}_{i,\xi }}{e}^{\gamma (| {{{{\bf{T}}}}}_{a}| -| {{{{\bf{H}}}}}_{i}| )}},$$
(1)

where Ni,ξ represents the set of targets within the sensing region of the i-th herder, Ta and Hi are the two-dimensional (2D) Cartesian coordinates of targets and herders respectively, and γ≥0 is a parameter that controls the selection specificity.

We wish to highlight that the selection rule approximation given by Eq. (1) for continuum descriptions was first introduced by Lama et al.37 but its application was limited to stationary target distributions, with no explicit form derived for the resulting continuum dynamics. Our work significantly extends the results presented therein by developing a complete and explicit theoretical framework for dynamic multi-agent systems with decision-making capabilities.

Also, notice that both the selection rule in Eq. (1) and the resulting control input, given in Eq. (2) below, incorporate not only pairwise distances between agents, but also each agent’s position relative to the goal. The resulting control has, therefore, a three-body character. The specific expression in Eq. (1), known as a soft-max function and widely used in neural networks for output selection43, enables continuous tuning of the selection rule: as γ → , it recovers the selection of the furthest target, while for γ → 0, \({{{{\bf{T}}}}}_{i}^{*}\) approaches the center of mass of all observed targets.

Once a target is selected, the herder positions itself at a distance δ behind it to drive it towards the goal. While more sophisticated approaches exist44,45, in our minimal shepherding model the dynamics of herder i is subject to an instantaneous feedback control input that can be expressed as:

$${{{{\bf{u}}}}}_{i}=-\left({{{{\bf{H}}}}}_{i}-{{{{\bf{T}}}}}_{i}^{*}-\delta {\widehat{{{{{\bf{T}}}}}}}^{*}_{i}\right)$$
(2)

where \({\widehat{{{{{\bf{T}}}}}}}^{*}_{i}={{{{\bf{T}}}}}_{i}^{*}/| {{{{\bf{T}}}}}_{i}^{*}|\) is the unit vector pointing from the origin to \({{{{\bf{T}}}}}_{i}^{*}\). If no targets are within the sensing radius of herder i, then ui = 0. This feedback law linearly attracts the herder to the position \({{{{\bf{T}}}}}_{i}^{*}+\delta {\widehat{{{{{\bf{T}}}}}}}^{*}_{i}\), enabling it to push the target towards the origin when δ is smaller than the target’s repulsion range λ. Similar to how γ controls target selection, δ provides continuous control over trajectory planning: when δ = 0, the herder approaches \({{{{\bf{T}}}}}_{i}^{*}\) from an arbitrary direction, generally failing to push it towards the origin, while for δ > 0, the herder systematically drives the target towards the origin. Notice that also the trajectory planning, like the selection rule, has a three-body character, depending on the positions of the herder, the selected target, and the goal region.

These two parameters, γ and δ, thus enable us to continuously tune both aspects of decision-making in our system. As demonstrated later in Fig. 3, for sufficiently large γ and appropriate values of δ, the herders successfully confine the targets around the origin, achieving the shepherding goal. Conversely, when γ = 0 and δ = 0, the system evolves to a homogeneous configuration (on average). As detailed in the Methods, our model includes white uncorrelated Gaussian noise for both herders and targets, as well as a reciprocal soft repulsion term between all agents within a distance σ, representing the agents’ diameter, which we assume equal for herders and targets.

A key feature of our system, which will be fundamental in the derivation of the field equations, is its nonreciprocal nature. The dynamics violates the action-reaction principle of Newtonian mechanics, which is permissible since this principle only needs to hold at the microscopic level, not at the coarse-grained level of the agents17. While nonreciprocal effects have been extensively studied in physics12,46, our model reveals a profound connection between nonreciprocity and decision-making: the very act of making decisions introduces fundamental asymmetries into the dynamics. Not only are targets repelled by herders while herders are attracted to targets, but herders also exhibit decision-making capabilities - selecting targets (γ) and choosing their positions with respect to a goal region (δ) - that have no counterpart in the targets’ dynamics. This intrinsic connection between decision-making and nonreciprocity suggests that nonreciprocal field theories are a natural framework for describing systems with distributed decision-making.

In this work, we show how the continuous tuning of decision-making rules through parameters γ and δ enables the derivation of field equations that capture the macroscopic properties of the shepherding problem. Our analysis reveals that different decision-making strategies, from random to highly selective target choice and from undirected to goal-oriented pushing, emerge in the continuum description. The resulting nonreciprocal field theory is distinctly different from other nonreciprocal scalar theories such as the nonreciprocal Cahn-Hillard (NRCH) theory for phase-separation in asymmetric systems12,30,33 and predator-prey models19. Our approach does not only successfully describes the shepherding dynamics, including the transition from homogeneous to configurations where targets are confined, but also provides a general framework for incorporating decision-making in the presence of a pre-defined control goal into continuum theories of collective behaviour, allowing us to reproduce a variety of collective states. In our system, the pre-defined goal corresponds to a spatially fixed region at the center of the system. This results in an external field that breaks translational invariance: an aspect that, as shown in the paper, is of key importance for the structure of the resulting field equations. The presence of a predefined goal region is indeed a common feature in a variety of distributed control problems47, from bacterial systems to robotic swarm coordination48, crowd management and autonomous transportation systems. As such, our framework opens new perspectives for analyzing and designing distributed control strategies across diverse fields of applications.

Results

A field theory for decision-driven shepherding

Our main result is a first-principles continuum theory that captures the essential physics of shepherding, derived from microscopic rules of agents’ behavior to predict the emergence of macroscopic patterns. Specifically, we derive a mean field framework for shepherding dynamics based on two coupled partial differential equations (PDEs) that describe the spatio-temporal dynamics of the two species of herders and targets. Each species is represented by scalar, conserved density fields ρA, where A = H denotes herders and A = T denotes targets. The dynamics incorporate diffusion, conservative interactions, and, most importantly, decision-making elements. For simplicity, we focus on one-dimensional (1D) motion along the x-direction, though generalization to 2D is straightforward. To lowest order, the equations take the form (see Methods)

$${\partial }_{t}{\rho }^{{{{\rm{T}}}}}=\nabla \cdot \left[{D}^{{{{\rm{T}}}}}({\rho }^{{{{\rm{T}}}}})\nabla {\rho }^{{{{\rm{T}}}}}+{\tilde{k}}^{{{\small{\rm{T}}}}}{\rho }^{{{{\rm{T}}}}}\nabla {\rho }^{{{{\rm{H}}}}}\right]$$
(3)
$${\partial }_{t}{\rho }^{{{{\rm{H}}}}}=\nabla \cdot \left[{D}^{{{{\rm{H}}}}}({\rho }^{{{{\rm{H}}}}})\nabla {\rho }^{{{{\rm{H}}}}}-{v}_{1}(x){\rho }^{{{{\rm{H}}}}}{\rho }^{{{{\rm{T}}}}}-{v}_{2}(x){\rho }^{{{{\rm{H}}}}}\nabla {\rho }^{{{{\rm{T}}}}}\right]$$
(4)

The intraspecies coupling in both equations is described by the renormalized diffusivities DA(ρA) which arise from noise and short-range agent-agent repulsion (see Methods). Additionally, the cross-coupling term with coefficient \({\tilde{k}}^{\small {{\mbox{T}}}} > 0 \) in Eq. (3) combines the effects of both the reciprocal short-range repulsion and the nonreciprocal long-range repulsion exerted by the herders on the targets. This coupling results in the gradient of ρH generating a current for ρT.

The decision-making capability of the herders is captured by the last two terms in Eq. (4) that involve the space-dependent functions v1(xγδ) and v2(xγδ). Examples of these functions are plotted in Fig. 2(a), while their analytical expressions are derived in the Methods. The role of the corresponding terms is sketched in Fig. 2(e–g). The function v2, which multiplies the term ρH ρT, combines two contributions: the short-range reciprocal repulsion between herders and targets and the long-range attraction exerted on a herder by targets at position x. In our parameter regime, v2 remains strictly positive, causing herders to move preferentially towards regions of higher target concentration [as shown moving rightward in Fig. 2(f)].

Fig. 2: Field theory description of shepherding dynamics.
figure 2

a The coupling functions v1 and v2 generating shepherding dynamics; v2 maintains a constant sign reflecting consistent attraction between species, whereas v1 changes sign depending on the position x of the herders with respect to the goal region x = 0, encoding goal-directed behaviour. b Spatiotemporal evolution of the density difference ρT − ρH: starting from a homogeneous distribution, the density profiles evolve and saturate to a shepherding configuration where ρH effectively confines ρT in a bounded region around the origin. (c, d) Steady state values of ρT and ρH for γ = 0, δ = 0. c showing homogeneous distribution and for δ > 0 and γ > 0. d Showing confined configuration. (eg) Representative one-dimensional configurations of targets (sheep) and herders (dog) illustrating the corresponding nonreciprocal couplings in Eq. (4). e The herder observes a symmetric distribution of targets ( ρT = 0), which generates no motion for the herder. f The herder observes an asymmetric distribution of targets ( ρT ≠ 0), and moves towards higher targets concentration; this is captured by the coupling  − v2ρH ρT. In (g), the target distribution is symmetric again ( ρT = 0); however, the herder also possesses the additional information of the position of the goal region (fence) where to collect the targets. The herder will then move so as to complete the task; this is captured by the coupling term  − v1(x)ρTρH in Eq. (4).

In contrast, function v1, which multiplies the bilinear term ρHρT, changes its sign at x = 0, the center of the goal region [indicated by a fence in Fig. 2(g)]. In particular, v1 is negative for x < 0 and positive for x > 0. This change of sign reflects how the direction of chasing depends on the target’s location relative to the goal. We may interpret v1 as a task-dependent speed of herders when a non-zero concentration of targets ρT is observed. The consequence is illustrated in Fig. 2(g). Despite observing a symmetric (homogeneous) distribution of targets, the herder preferentially chases the rightmost target due to its greater distance from the fence, representing the goal region. While our agent-based model yields explicit expressions for v1 and v2 (see Methods), we expect the overall dynamics to remain robust under small changes of these functions, provided their essential properties – particularly their signs and symmetries – are preserved.

Mathematically, the structure of the decision-making terms in Eq. (4), particularly the coupling term v1ρTρH, emerges from the core properties of our agent-based model. These reflect how herders select which target to pursue (controlled by γ) and how they choose their chasing position (determined by δ). In both decisions, herders consider the positions of the targets relative to their own positions and, most importantly, to the location of the goal region. The equations are derived by expanding the soft-max function in Eq. (1) for \({{{{\bf{T}}}}}_{i}^{*}\) in the small γ regime, where γ 1/ξ (see Methods).

In the absence of decision-making (γ = δ = 0), Eq. (3) remains unchanged, while Eq. (4) reduces to

$${\partial }_{t}{\rho }^{{{{\rm{H}}}}}=\nabla \cdot \left[{D}^{{{{\rm{H}}}}}({\rho }^{{{{\rm{H}}}}})\nabla {\rho }^{{{{\rm{H}}}}}-{v}_{2}^{0}{\rho }^{{{{\rm{H}}}}}\nabla {\rho }^{{{{\rm{T}}}}}\right]$$
(5)

where \(v_1(x)\vert_{\gamma,\delta=0}=0\), \(v_2(x)\vert_{\gamma,\delta=0}=v_2^0\) being a positive constant, and the negative sign in front of the last term reflects the attractive nature of the force acting on a herder by a target (see Methods and Supplementary Information Section II.A for further details on the derivation). Even in this simplest case, the PDE system (3)–(5) is nonreciprocal in the sense that the cross-couplings between targets and herders have different signs (or values).

While similar scalar nonreciprocal theories, such as the nonreciprocal Cahn-Hilliard models, have been studied extensively12,30,33 in relation to parity-time symmetry breaking and traveling states, our model exhibits a distinctive feature: for nonzero γ and δ, the current experienced by herders emerges from the control-oriented, three-body coupling that depends on the goal region’s position. This three-body coupling gives rise to the final terms in Eq. (4). These terms have not only space-dependent prefactors, they also involve the bilinear nonreciprocal term v1(x)ρTρH, which is absent in previous theories without goal-oriented decision-making. The asymmetry between these terms and those in the simpler target dynamics in Eq. (3) constitutes a fundamental, source of nonreciprocity in the system, which remains unexplored at the field level.

Field theory captures agent-based behaviour

Our field equations capture the complex spatial patterns and dynamics observed in agent-based simulations, validating the theoretical framework. To provide a microscopic perspective of the emerging shepherding dynamics, we present simulation snapshots from the underlying agent-based model (see Methods) in Fig. 3. We explore several representative values of the two control parameters: γ, which determines the selectivity, and δ, which controls the trajectory planning, focusing on cases where the number of herders equals the number of targets (NH = NT) (for other cases, see Supplementary Information Section I.C). The insets of Fig. 3 show the corresponding time-averaged density profiles relative to the origin, \({\hat{\rho }}^{{{{\rm{T}}}}}(r)\) and \({\hat{\rho }}^{{{{\rm{H}}}}}(r)\), where r is the radial distance, obtained by averaging over the polar angle θ.

Fig. 3: Long-time configurations and density profiles of the agent-based system under varying control parameters.
figure 3

ad System states for different combinations of selectivity (γ) and trajectory planning (δ) parameters, with insets showing time-averaged and angle-averaged density histograms of targets \({\hat{\rho }}^{{{{\rm{T}}}}}(r)\) and herders \({\hat{\rho }}^{{{{\rm{H}}}}}(r)\) as a function of the radial distance from the origin, r. At γ = δ = 0 we observe a disordered state with (on average) homogeneous density distribution (c). The other panels show that, as soon as δ or γ are non zero, inhomogeneous configurations are reached, where herders tend to surround targets. The inset of (b) shows the fit of the target profile to a hyperbolic tangent (black dashed line) and indicates with a shaded grey rectangle the target-herder interface of width Δ, see Methods for further details.

In the absence of decision-making (γ = δ = 0) the system reaches a disordered state [Fig. 3(c)] with homogeneous densities characterized by constant profiles. This behaviour is also observed at the continuum level (see Fig. 2(c)). Note that even in this uncontrolled case, the underlying equations of motion remain nonreciprocal. However, for the system parameters considered here, this nonreciprocity does not lead to overall symmetry breaking (see Supplementary Information Section II.B), but only induces microstructural changes (clustering) on the agent-based level, which are not the focus here and will be the subject of future investigation.

Fig. 4: Measures of shepherding performance at the microscopic and the continuum level.
figure 4

Comparison of the values of χ (fraction of contained targets) and σ/Δ (sharpness of the herder-target interface) in the (γδ) plane for (a, b) the agent-based model and (c, d) the field theory. For small values of γ, where the linearization made to derive the field equations is justified, the level curves show that the results of the agent-based model and of the field theories are consistent. Some differences appear around γ ~ 1/ξ (red dashed line) where the non-linearity of the selection rule (1) becomes non negligible, hence where the linear approximation performed to derive the field equations starts to fail. A notable difference is that the combined maximum value of the two order parameters is reached for large γ and δ for the agent-based model, and for large γ and small δ on a field level.

The situation changes when one or both of the control parameters γ, δ are nonzero: the system develops inhomogeneities, as illustrated by the density profiles in Fig. 3(a), (b), (d). This behaviour reflects a radial symmetry-breaking (relative to the case γ = δ = 0), where the target density \({\hat{\rho }}^{{{{\rm{T}}}}}(r)\) reaches its maximum at the center and continuously decreases to zero radially outward. The most pronounced effect, which is key to obtain in real life scenarios such as in swarm robotics applications, occurs when γ is large compared to the sensing radius and δ is non-zero, enabling herders to effectively guide their targets towards the desired direction. We then observe a concentrated disk of targets around the center, bounded by a sharp agglomeration of herders [cf. Fig. 3(b)].

Interestingly, even when only one control parameter is nonzero, similar but less pronounced inhomogeneities emerge [Fig. 3(a, d)]. When γ = 0 but δ > 0, herders are attracted to the local center of mass of targets (rather than to a specific target), but approach them with  a controlled distance strategy; this leads the herders to position themselves at the back of the barycenter of the observed targets, eventually creating an accumulation of herders around the targets. Conversely, when δ = 0 but γ > 0, herders approach targets from random directions but intelligently select optimal targets; this leads the herders to move towards the observed target with the largest distance from the origin, eventually generating a spatial inhomogeneity.

The continuum theory captures the transition from homogeneous to spatially structured states when decision-making is present. Two exemplary long-time sets of density profiles (from the numerical solution of Eq. (3) and (4)) are shown in Fig. 2(c, d), revealing the emergence of containment as we observed in their agent-based counterparts, despite differences in geometry (1D versus 2D) and specific values of γ and δ. Agent-based simulations in Supplementary Information Section I.D, where we consider a rectangular rather than circular goal region, further substantiate our claim that 1D field theories can capture the emergence of containment of the 2D agent-based dynamics. However, it is important to note that there are significant quantitative differences (for details, see the discussion in the next Paragraph). Indeed, in the absence of decision-making [Fig. 2(c)], the field theory produces a stable homogeneous state (note that temporal instabilities are absent at the parameters considered, see Supplementary Information Section II.B). With decision-making, the spatiotemporal evolution of the density difference ρT(xt) − ρH(xt) in Fig. 2(b) clearly demonstrates the emergence of inhomogeneities and finally the saturation into a steady state with density profiles shown in Fig. 2(d). In this state, the targets are concentrated within the goal region around x = 0, confined by herders accumulating at larger distances, creating a sharp interface between these two spatial domains. This configuration matches the expected outcome of successful shepherding, see Fig. 1.

Importantly, the appearance of the inhomogeneous steady state is a direct consequence of the nonreciprocal coupling arising from decision-making in our field equations. A stable homogeneous steady state solution exists only in the absence of decision-making (i.e., γ = δ = 0, v1 = 0) (see Supplementary Information Section II.B). This solution ceases to be a steady state solution for any nonzero value of γ, δ. Indeed, evaluated at the homogeneous solutions \({\rho }^{{\mbox{A}}}={\rho }_{0}^{{\mbox{A}}}(t)\), Eq. (4) reads

$${\partial }_{t}{\rho }_{0}^{{{{\rm{H}}}}}(t)=-\frac{d}{dx}{v}_{1}(x;\gamma,\delta ){\rho }_{0}^{{{{\rm{T}}}}}{\rho }_{0}^{{{{\rm{H}}}}}.$$
(6)

Thus, when v1 becomes space-dependent (which occurs for nonzero γ, δ), no homogeneous steady state exists. This holds at any order in the gradient expansion of the densities (see Supplementary Information Section II.A).

Phase diagrams of shepherding states

Our analysis clearly reveals the emergence of inhomogeneous steady states whose details depend on two decision-making parameters. To quantify similarities and differences within these steady states, we analyze two key measures. First, we calculate the fraction χ of targets confined within a circle of radius R around the origin, a metric commonly used to evaluate shepherding performance37,45. Here, R corresponds to the position of the interface shown in Figs. 3(b), 2(d) for the continuum case. Second, we evaluate the width of the interfacial region, Δ, where the density \({\hat{\rho }}^{{{{\rm{T}}}}}\) transitions from a non-zero value near the origin to zero at large distances. For details on the definition and calculation of R and Δ in both agent-based and continuum descriptions, see Methods.

Results for both the levels of description for χ and the inverse width (sharpness) σ/Δ (where σ is the repulsion distance representing the agents’ physical size) as functions of the control parameters γ and δ are shown in Fig. 4. Starting from the agent-based results [Fig. 4(a, b)], we see that both quantities approach their minimum value in the limits γ → 0, δ → 0, as expected. The roles of γ and δ are comparable for relatively small values, particularly when γ 1/ξ (red dashed line). Only beyond this value the nonlinearity of the selection rule given in Eq. (1) becomes relevant. The level-sets of χ, represented by δ(γ), show non-monotonic behaviour in γ around this value. The largest values of χ and σ/Δ occur when both γ and δ are large, confirming their roles as order parameters for the shepherding state37.

For comparison, continuum results for χ and σ/Δ are shown in Fig. 4(c–d). As in the agent-based case, χ and σ/Δ reach their minimum values at γ → 0, δ → 0. As γ and δ increase from small values, χ increases monotonically with both parameters. This differs from the agent-based case [Fig. 4(a)], where the level sets of χ, δ(γ), exhibit non-monotonic behaviour around γ = 1/ξ. This difference arises because the continuum equations employ a linearization with respect to γ (see Methods), valid for γ 1/ξ, a constraint absent in the agent-based case.

Notably, the sharpness reaches its maximum values at small values of δ, a counterintuitive result explained by the competing roles of v1 and v2 at the continuum level (see Supplementary Information Section II.C). Hence, it is important to note that while our field theory predicts the absence of homogeneous steady states for nonzero γ and/or δ, and thereby the key qualitative feature of the shepherding dynamics, it does not fully capture all aspects of the spatial distributions observed in agent-based simulations. These discrepancies highlight limitations in our current framework that could be addressed in future refinements, possibly by incorporating higher-order terms in the expansion of the selection rule or modifying the mean-field assumptions.

Design principles for decision-driven collective behaviour in natural and artificial systems

Our field theory framework reveals fundamental design principles for the emergence of decision-driven collective behaviour in multi-agent systems that extend beyond the basic containment task that we focused on so far. Through appropriate choice of the functions v1 and v2 in Eq. (4), we can generate a variety of controlled collective states that could serve as building blocks to capture behaviour observed in both natural and artificial engineered systems. These principles emerge more clearly when we examine how different choices of these coupling functions lead to distinct collective behaviours at long times, as illustrated in Fig. 5.

Fig. 5: Diverse collective state on the field level and at the agent-based level upon varying the control input and, thereby, the decision-making rules.
figure 5

ad Possible choices of the function v1(x), which we set as a square wave with different periods (ac) or a constant (d). eh The corresponding steady state behaviours of the field equations. (i-l) The corresponding steady state behaviours of the agent-based model; see Supplementary Information Section III.B for details on the agent-based and field equations. By suitable choosing v1(x), starting from a homogeneous state, we can achieve (i, e) confinement of the targets, as discussed above, (f, j) expulsion of the targets, e.g. the herders expel the targets out from a region around the origin, g, k static patterns, and (h, l) traveling patterns (see Supplementary Information Video 3 and 4.); in (h) we also plot in shaded dashed lines the densities at a previous time to show that the pattern is moving coherently in time.

The function v1(x), which multiplies the product of density fields ρTρH in Eq. (4), plays a particularly crucial role as it encodes the task-dependent speed of herders. For simplicity, we set v1(x) to a square wave with different periods or phase shifts, while v2(x) is set to a positive constant (that, in this setting, represents an effective attraction of herders by targets). By choosing different spatial profiles for v1(x), we can generate (see Supplementary Videos):

  • Containment: When v1(x) changes sign from positive to negative at x = 0, the targets remain confined to the goal area centered around the origin [Fig. 5(e)];

  • Expulsion: Reversing the sign pattern, targets are expelled from a predefined region [Fig. 5(f)];

  • Static patterns: When v1(x) alternates between positive and negative values periodically in space, stable, stationary stripe-like arrangements of targets are induced, [Fig. 5(g)];

  • Traveling patterns: When v1(x) is set to a constant, non-zero value throughout space \({v}_{1}(x)={\tilde{v}}_{1}\), targets and herders create traveling waves that propagate through the system [Fig. 5(h)].

These examples indicate that already simple manipulations of the control coupling functions that encode basic rules for how agents make decisions, can induce a range of behaviours. For example, changes of sign of v1(x) and the location of the corresponding zeros can be used to realize desired static patterns without changing other coupling parameters. Such patterns are of interest, e.g., in swarm robotics49 and in engineered bacterial populations50,51. Moreover, already the simplest choice for v1(x), namely a constant value, generates a traveling pattern (see Supplementary Information Section III.C for further details). We note that stripe-like and travelling patterns also emerge from other PDEs well established in the physics literature, such as reaction-diffusion systems52 and variants of the nonreciprocal Cahn-Hilliard model studied recently12,30,33.

The key insight presented here is that, in decision-making systems, such diverse behaviours can emerge solely from varying decision-making rules of the herders. For example, we can generate travelling patterns in a parameter regime where such patterns are absent without decision-making (see Supplementary Information Section III.C). This perspective is further substantiated by agent-based simulation results (in the case of a rectangular goal region) shown in the bottom  row of Fig. 5. To generate these results we did not change the microscopic interactions between the two types of agents, but only adapted the decision-making rules of the herders (see Supplementary Information Section III.B), using the same two-step strategy as in the shepherding problem: a proper selection rule, and a proper trajectory planning that finally determines v1(x) (see Supplementary Information Section III.B for further details). The fact that we can create such different collective behaviours by only modifying v1(x) suggests a unifying principle that may be common to both natural and engineered systems. Most notably, these behaviours emerge naturally from our field equations without requiring changes to their fundamental structure, indicating that our framework captures essential features of decision-driven collective behaviour that extend beyond the specific task we considered.

Discussion

Our analysis uncovers a direct connection between microscopic decision-making and collective behaviour: agents’ decision-making capabilities inherently generate both nonreciprocity and, in presence of a pre-defined goal, specific types of coupling in their continuum description. This led us to discover a class of nonreciprocal interactions that has remained unexplored in the related physics literature12,30.

The coupling we derive, which originates from feedback control at the microscopic level (2) and involves a goal that, in essence, breaks translational invariance, manifests as shepherding behaviour at the continuum scale [Fig. 2(b)]. These results provide a mathematical framework connecting individual decision-making to collective behaviour, unifying concepts from physics and control theory. In the field of shepherding, that we have chosen as a paradigmatic example, our continuum framework can be systematically extended and refined: For example, it would be very easy to incorporate other types of microscopic interactions such as, e.g., cohesion or alignment, that would lead to additional couplings of the form [ρA ρB] familiar from Cahn-Hillard-like theories. Furthermore, applications in higher dimensions could reveal additional instabilities. Indeed, given recent experiments in human coordination38, we expect a polar instability induced by oscillatory motion of herders around targets. Another conceptually interesting question concerns changes imposed by considering nonlinear terms in the expansion of the selection rule.

The mathematical framework we developed here, where agents select whom to influence and how based on some control goal, can have implications beyond the specific shepherding task considered. Similar decision-making processes might play a role in biological systems, from marine predators53 and killer whales54 containing fish schools for hunting, to terrestrial cases like chimpanzee packs defending territories55 and ant colonies coordinating excavation56. Similar principles might be present in social systems as well, particularly in opinion dynamics,57 where influencers shape followers’ opinion consensus by targeting certain individuals58, and extend to human coordination38,59, or engineered systems like swarm robotics37,60, where artificial agents can be designed according to this paradigm. However, experimental validation would be needed to determine whether these natural systems actually employ similar mechanisms, an interesting open problem left for future study.

From a more conceptual perspective, our framework addresses an open challenge at the frontier between control theory and complex systems: steering collective behaviour in large-scale multi-agent systems to solve distributed control tasks. While advances have been made in controlling complex networks7, distributed control of mobile agents faces unique challenges. Recent approaches in the field of active matter span optimal control61,62, interaction design20, feedback mechanisms63, and machine learning64,65. Technically, at the agent level, two fundamental obstacles persist: the scaling of the number of dynamical equations with agent numbers and the time-varying nature of interaction matrices66,67. These challenges are particularly acute in shepherding, where the control of the targets is achieved indirectly through herder dynamics68.

Here, we propose to tackle these challenges in complex, multi-agent control systems by deriving a continuum macroscopic description where the functions v1(x) and v2(x) emerge as natural control inputs for shaping collective behaviour. This aligns with growing interest from both physics69 and control-theory communities to derive suitable partial differential equations that capture collective dynamics while incorporating precise control goals70. The resulting PDEs provide insights that hold across different system sizes and are insensitive to specific microscopic initial conditions - which are often impossible to prescribe in real-world applications anyway.

The generality of our framework, is necessarily bounded by the specific hypotheses made in our microscopic model and derivation. Clearly, there are a number of problems in which decision-making occurs without broken translational invariance, such as in consensus finding71, pattern formation in swarms49, and various socio- and economic problems that have been very recently analyzed through the lens of a field theory72,73. Still, our framework, which incorporates decision-making on the continuum level through an appropriately defined feedback current, may have relevance also for these more general decision-making problems.

In the present work we have provided some first examples of how deliberate manipulation of the control input v1 and v2 leads to a variety of collective behaviours. Future research could focus on application of machine learning techniques to actually learn the functions v1 and v2 directly from experimental data of natural or artificial systems exhibiting decision-driven collective behaviour. Similar strategies have been recently used to predict suitable PDEs for active and biological matter69. A related intriguing question is to which extent such learned functions could provide new insight into microscopic decision rules and interactions. The framework could also be extended to incorporate more sophisticated decision-making mechanisms, such as those including memory effects or adaptive learning. Furthermore, investigating the interplay between noise, decision-making, and collective dynamics could reveal dynamical phases and transition mechanisms. Ultimately, such PDEs should predict not only steady-state behaviour but also spatiotemporal patterns and even hierarchical self-organisation as it was recently found in biological communicating systems74. The connection between nonreciprocity and decision-making uncovered here also suggests new ways to design and control collective behaviour in synthetic systems, promising both deeper understanding of decision-driven collective phenomena and new possibilities for engineering complex systems with desired behaviours.

Methods

Agent-based model

The agent-based model underlying the field equations (3) and (4) was chosen as a modified version of the minimal shepherding model considered in earlier work by some of the authors37. It involves all of the essential ingredients for shepherding dynamics and is, at the same time, particularly suitable for the derivation of field equations.

Specifically, we consider a binary mixture of mobile, disk-like agents of type T or H (with 2D Cartesian position vectors Ta and Hi, respectively) moving in a square box of side length L with periodic boundary conditions. We discuss here the case of a circular goal region around the origin; the case with a rectangular goal region, essentially representing a 1D problem, is discussed in Supplementary Information Section I.D.

The positions evolve in time according to overdamped Langevin dynamics, that is, both species are subject to Gaussian white noises representing the effects of (e.g., environmental or heterogeneity-induced) disorder on the coarse-grained scale (similar approaches were taken, e.g., in swarm robotics75,76, collective behaviours in biology77,78, and in crowd dynamics systems79). Further reasoning particularly for herder’s noise is outlined in Supplementary Information Section I.A. We neglect inertial effects as it is often done in the robotics and control literature on shepherding39,80. The targets’ and herders’ dynamics are given by

$${\dot{{{{\bf{T}}}}}}_{a}=\, {k}^{{{{\rm{SR}}}}}\left({\sum}_{b\ne a}^{{N}_{{{{\rm{T}}}}}}{{{{\bf{F}}}}}_{{{{\rm{SR}}}}}^{{{{\rm{rep}}}}}({{{{\bf{T}}}}}_{a},{{{{\bf{T}}}}}_{b})+{\sum}_{i=1}^{{N}_{{{{\rm{H}}}}}}{{{{\bf{F}}}}}_{{{{\rm{SR}}}}}^{{{{\rm{rep}}}}}({{{{\bf{T}}}}}_{a},{{{{\bf{H}}}}}_{i})\right) \\ +{k}^{T}{\sum}_{i=1}^{{N}_{{{{\rm{H}}}}}}{{{{\bf{F}}}}}_{{{{\rm{LR}}}}}^{{{{\rm{rep}}}}}({{{{\bf{T}}}}}_{a},{{{{\bf{H}}}}}_{i})+\sqrt{2D}{{{{\mathcal{N}}}}}_{a}$$
(7)

and

$${\dot{{{{\bf{H}}}}}}_{i}={k}^{{{{\rm{SR}}}}}\left({\sum}_{j\ne i}^{{N}_{{{{\rm{H}}}}}}{{{{\bf{F}}}}}_{{{{\rm{SR}}}}}^{{{{\rm{rep}}}}}({{{{\bf{H}}}}}_{i},{{{{\bf{H}}}}}_{j})+{\sum}_{a=1}^{{N}_{{{{\rm{T}}}}}}{{{{\bf{F}}}}}_{{{{\rm{SR}}}}}^{{{{\rm{rep}}}}}({{{{\bf{H}}}}}_{i},{{{{\bf{T}}}}}_{a})\right)+{k}^{{{{\rm{H}}}}}{{{{\bf{u}}}}}_{i}+\sqrt{2D}{{{{\mathcal{N}}}}}_{i}$$
(8)

respectively, where \({{{{\mathcal{N}}}}}_{a,i}\) are zero-mean white (i.e., delta-correlated) noises with unit variances, and the diffusion constants D are assumed to be equal. All agents interact through symmetric (reciprocal) short-range (SR) repulsive forces \({{{{\bf{F}}}}}_{{{{\rm{SR}}}}}^{{{{\rm{rep}}}}}\) of amplitudes kSR > 0 and range σ, the latter representing the (uniform) particle diameter. In addition, the targets [see (7)] are subject to a long-range (LR) repulsive force \({{{{\bf{F}}}}}_{{{{\rm{LR}}}}}^{{{{\rm{rep}}}}}\) from the herders, with amplitude kT > 0 and range λ > σ. The LR repulsion reflects the tendency of a target to move away from an approaching herder. For both, SR and LR types of repulsion, as is common in the Literature (e.g.,81,82), we assume a linearly decreasing dependence on the distance, such that, e.g.,

$${{{{\bf{F}}}}}_{{{{\rm{LR}}}}}^{{{{\rm{rep}}}}}({{{{\bf{T}}}}}_{a},{{{{\bf{H}}}}}_{i})=\frac{{{{{\bf{d}}}}}_{ai}}{{d}_{ai}}(\lambda -{d}_{ai})\Theta (\lambda -{d}_{ai})$$
(9)

where dai = dai = Ta − Hi, and Θ(x) = 1 if x > 0, 0 otherwise. This linear ansatz is particularly useful for our later derivation of field equations12. For the SR repulsion we have the same functional form of the interaction, but the range λ is replaced by the particle diameter σ.

Beyond SR repulsion, the equation of the herders (8) contains the essential ingredient to realize shepherding, that is, the feedback control input ui defined in Eq. (2). As described in the Introduction, the feedback term represents the (linear) attraction of a herder towards a strategic position behind its selected target. Specifically, this position is given by \({{{{\bf{T}}}}}_{i}^{*}+\delta {\widehat{{{{{\bf{T}}}}}}}^{*}_{i}\), where \({{{{\bf{T}}}}}_{i}^{*}\) [see Eq. (1)] is the position of the selected target (within a sensing radius ξ and with selectivity γ), and \(\delta {\widehat{{{{{\bf{T}}}}}}}^{*}_{i}\) represents a displacement along the target’s position vector (\({\widehat{{{{{\bf{T}}}}}}}^{*}_{i}\) being the unit vector along \({{{{\bf{T}}}}}_{i}^{*}\)). The parameter δ determines how far behind the target the herder should position itself to effectively steer the target’s dynamics towards the origin. (As long as 0 < δ < λ the herder can successfully navigate the selected target towards the origin.)

Equations (7), (8), combined with Eqs. (1) and (2), constitute the fundamental components of our agent-based model. A distinctive feature of these equations is that the coupling between targets and herders cannot be derived from a Hamiltonian and exhibits strong nonreciprocity in two key aspects: first, there is an inherent asymmetry in the interaction forces – targets experience repulsion from herders while herders are attracted to targets, which appears to violate Newton’s third law (though this apparent contradiction is resolved when considering the coarse-grained nature of our description). Second, the selective nature of the herders’ attraction introduces another form of nonreciprocity – herders interact only with specific targets chosen according to their selection rule and guide them toward the goal, whereas targets respond uniformly to all herders without any such selectivity.

To achieve effective confinement of targets (see Fig. 1), we assume that the sensing radius of the herders exceeds that of target-herder repulsion (ξ > λ), consistent with earlier literature37,39,45. Details of the simulations are provided next.

Numerical calculations based on the agent-based model

Technical details

All simulations were carried out in MATLAB using an Euler-Maruyama integration scheme. The simulations are performed in a square periodic box of size L × L with L = 80σ, where σ = 1 defines the particle diameter. The diffusion constant is set to D = 1, which characterizes the strength of noise in the system. Using the characteristic time scale τ = σ2/D, we employ a time step of dt = 1 10−3τ. For initial conditions, we randomly and uniformly distribute NH = 400 herders and NT = 400 targets in the simulation box, resulting in equal initial densities of \({\rho }_{0}^{{{{\rm{T}}}}}={\rho }_{0}^{{{{\rm{H}}}}}=0.0625/{\sigma }^{2}\) for both agent types (see Supplementary Information Section I.B for further details). The system typically reaches stationary states after a transient period of ttran = 200τ, and we run the simulations for a total duration of t = 300τ.

Results for different values of the ratio NH/NT are reported in Supplementary Information Section I.C.

The agent-agent interactions are characterized by three distinct force contributions, each with specific parameters. For the short-range repulsive forces, which act between all agents regardless of type, we choose the particle diameter as σ = 1 and amplitude kSR = 100/τ. The long-range target-herder repulsion is characterized by an interaction radius λ = 2.5σ and amplitude kT = 3/τ. The long-range herder attraction has a sensing radius ξ = 5σ and force amplitude kH = 3/τ. The snapshots in Fig. 5(i–l) are obtained with the same numerical values, and with γ = 5/σ and δ = λ/2; the dynamical equations are presented and discussed in Supplementary Information Section III.A.

Density profiles and order parameters

The radial density profiles \({\hat{\rho }}^{{{{\rm{T}}}}}(r)\) and \({\hat{\rho }}^{{{{\rm{H}}}}}(r)\) are computed by constructing histograms of the spatial distribution of targets and herders, respectively, as a function of their radial distance r from the origin. These profiles are obtained by time-averaging over an interval of tavg = 100τ after the system has reached a steady state.

The radial densities are computed using an array of equispaced points from the origin to L/2 (half the box length), with a step size of Δr = ξ/4. The normalization of these density profiles follows the condition \(\int_{0}^{L/2}\hat{\rho }(r)r\,dr=\chi (L/2)\), where χ(L/2) represents the fraction of targets contained within a circle of radius L/2 centered at the origin. The profiles \({\hat{\rho }}^{{{{\rm{T}}}}}(r)\) and \({\hat{\rho }}^{{{{\rm{H}}}}}(r)\) are obtained by averaging over the polar angle θ, which is appropriate given our focus on the radial structure of the confined state. Final results are obtained by averaging over an ensemble of 48 independent simulations.

We now define the two order parameters χ and σ/Δ considered in Fig. 4. First, to retrieve information on the width of the target-herder interface, Δ, we fit the calculated density profile of the targets to a hyperbolic profile ρf(r). This choice is inspired by theoretical studies on the vapor-liquid interface in phase-separating fluids83. The explicit expression for ρf(r) is given by

$${\rho }_{f}(r)={\rho }_{f}(r;c,R,\Delta )=\frac{c}{2}\left(1-\tanh \left(\frac{x-R}{\Delta }\right)\right)$$
(10)

where c, R, Δ are the fitting parameters representing, respectively, the maximum values of the profile, the position of the interface, and its width. We use the inverse of the width, σ/Δ, as an estimate of the sharpness of the target-herder interface.

The confinement order parameter χ(R) measures the fraction of targets located within a distance R from the origin. In Fig. 4, the reference radius R=R* is chosen as the fitting parameter R of the density profile obtained at maximum values of γ and δ, where the spatial segregation between targets and herders is most distinct. The quantity χ is then calculated by averaging the instantaneous fraction of targets inside a circle of radius R, both over time (tavg = 100τ) and over an ensemble of 48 independent simulations. In Supplementary Information Section I.E we show that different choices of R* do not impact the qualitative behaviour of our results.

Derivation of the field equations

Starting from the agent-based stochastic equations (7), (8) we now derive corresponding partial differential equations (PDEs) describing the spatial-temporal dynamics of the two density fields ρA, with A = H,T. The final structure of these equations is given in Eqs. (3) and (4). As in the Results Section we consider 1-dimensional dynamics along the x-direction; the derivation can be easily generalized to higher dimensions.

Due to the conservation of the number of particles, both density fields, defined as \({\rho }^{{{{\rm{T}}}}({{{\rm{H}}}})}(x,t)=\mathop{\sum }_{a(i)=1}^{{N}^{{{{\rm{T}}}}({{{\rm{H}}}})}}\delta \left(x-{x}_{a(i)}^{{{{\rm{T}}}}({{{\rm{H}}}})}(t)\right)\) where \({x}_{a}^{{\rm{T}}} ={T}_{a} ({x}_{i}^{{\rm{H}}}={H}_{i})\) are the 1D Cartesian coordinates of target agent a (herder agent i), obey a continuity equation, that is,

$${\partial }_{t}{\rho }^{{{{\rm{T}}}}({{{\rm{H}}}})}(x,t)=\nabla \cdot {j}^{{{{\rm{T}}}}({{{\rm{H}}}})}(x,t)+D{\nabla }^{2}{\rho }^{{{{\rm{T}}}}({{{\rm{H}}}})}(x,t)$$
(11)

where = ∂/∂x and the currents jT(H)(xt) take into account the various force terms in Eqs. (7), (8).

The (piecewise linear) pair forces comprising the (SR or LR) repulsion in Eqs. (7) and (8) can be handled straightforwardly12 (see Supplementary Information Section 2.A for details). Here we sketch the main idea. Each of the pair terms is of the form \({F}_{i}=\mathop{\sum }_{j=1}^{M}{F}_{ij}\), where M is the number of interacting neighbors. Under a mean field assumption, Fi gives rise to an average force that, when neglecting nontrivial spatio-temporal correlations, has the form 〈F(xt)〉 = ∫F(xy)ρ(yt) dy where the integral extends over the interaction zone around x. For both the SR and LR forces, F(xy) represents a translationally invariant kernel, i.e., F(xy) = F(x − y). The resulting contribution to the current then has the form  − 〈F〉(xt)ρ(xt). To compute the integrals in 〈F〉, we perform a gradient expansion of ρ(yt) around x, i.e., \(\rho (y)=\rho (x)+\nabla \rho (x)(y-x)+{{{\mathcal{O}}}}(\nabla {\rho }^{2})\). Zeroth-order terms vanish due to translational invariance of the kernel. Keeping only linear terms in the gradients, we obtain the following currents stemming from repulsive forces

$${j}_{{{{\rm{rep}}}}}^{{{{\rm{T}}}}}(x,t)=\, {\rho }^{{{{\rm{T}}}}}(x,t)\left({\alpha }^{{{{\rm{SR}}}}}\nabla {\rho }^{{{{\rm{tot}}}}}(x,t)+{\alpha }^{{{{\rm{LR}}}}}\nabla {\rho }^{{{{\rm{H}}}}}(x,t)\right)\\ {j}_{{{{\rm{rep}}}}}^{{{{\rm{H}}}}}(x,t)=\, {\rho }^{{{{\rm{H}}}}}(x,t){\alpha }^{{{{\rm{SR}}}}}\nabla {\rho }^{{{{\rm{tot}}}}}(x,t) $$
(12)

where ρtot(xt) = ρT(xt) + ρH(xt) is the combined density field, and αSR = kSRσ3/3 and αLR = kTλ3/3 are positive constants related to the respective interaction radii and coupling constants in Eqs. (7), (8).

With careful consideration, the third term on the right-hand side of Eq. (8), which describes herders’ decision-making abilities, can also be handled using a mean-field-like approach and a density gradient expansion. This term depends on two key components defined in previous equations: the feedback control input ui [Eq. (2)] and the position of the selected target \({{{{\bf{T}}}}}_{i}^{*}\) [Eq. (1)]. Unlike typical classical systems where forces have a pairwise character, the resulting force here represents a three-body coupling between the herder’s position, the selected target’s position, and the origin. This leads to a mean-field force 〈F(xt)〉 = ∫F(xy)ρ(yt) dy with a kernel that lacks translational invariance and depends nonlinearly on both coordinates x and y. To proceed, we (i) linearize the (exponential) weight function appearing in the numerator of Eq. (2) (valid for γ 1/ξ), and (ii) neglect the denominator. The kernel can then be expressed as

$$F(x,y)=-{k}^{{{{\rm{H}}}}}(1+\gamma (| y| -| x| ))(x-(y+{{{\rm{sign}}}}(y)\delta ))$$
(13)

This simplified form where the selection rule has been linearized in x and y, still lacks translational invariance. A notable consequence is that even the zeroth-order term in the gradient expansion of ρT(y) around ρT(x) contributes to the average force. Combining this with the first-order (linear) gradient term, we obtain the following expression for finite values of the decision-making parameters γ and δ:

$${j}_{{{{\rm{dm}}}}}^{{{{\rm{H}}}}}(x,t)=-{g}_{1}(x;\gamma,\delta ){\rho }^{{{{\rm{H}}}}}(x,t){\rho }^{{{{\rm{T}}}}}(x,t)-{g}_{2}(x;\gamma,\delta ){\rho }^{{{{\rm{H}}}}}(x,t)\nabla {\rho }^{{{{\rm{T}}}}}(x,t).$$
(14)

Here, g1/2(xγδ) are space-dependent functions whose explicit expression is given by

$${g}_{1}(\gamma,\delta,x)=\\ {{{\rm{sign}}}}(x){k}^{{{{\rm{H}}}}}\left\{\begin{array}{ll}2\delta \xi+\frac{2}{3}\gamma {\xi }^{3},\hfill &\xi \le | x| \le L/2-\xi \hfill \\ \left[\delta \left(1-2\gamma | x| \right)(| x| -\xi )+\delta (| x|+\xi )+\gamma x({\xi }^{2}-{x}^{2})+\frac{2}{3}\gamma | x{| }^{3}\right],&| x| < \xi \\ \left[\delta \left(1-2\gamma (L/2-| x| )\right)((L/2-| x| )-\xi )+\delta ((L/2-| x| )+\xi )\right.&\\ \left.+\gamma (L/2-| x| )({\xi }^{2}-{(L/2-| x| )}^{2})+\frac{2}{3}\gamma {(L/2-| x| )}^{3}\right],&| x| > L/2-\xi \end{array}\right.$$
(15)
$${g}_{2}(\gamma,\delta,x)=\\ {k}^{{{{\rm{H}}}}}\left\{\begin{array}{ll}\frac{2}{3}(\delta \gamma+1){\xi }^{3},\hfill &\xi \le | x| \le L/2-\xi \\ \left[\delta (1-\gamma | x| )({\xi }^{2}-{x}^{2})+\frac{2{\xi }^{3}}{3}(\delta \gamma+1)-\frac{2}{3}(\gamma | x| )({\xi }^{3}-| x{| }^{3})+(\frac{\gamma }{2})({\xi }^{4}-{x}^{4})\right],&| x| < \xi \\ \left[\delta (1-\gamma (L/2-| x| ))({\xi }^{2}-{(L/2-| x| )}^{2})+\frac{2{\xi }^{3}}{3}(\delta \gamma+1)\right.\hfill &\\ \left.-\frac{2}{3}(\gamma (L/2-| x| ))({\xi }^{3}-{(L/2-| x| )}^{3})+(\frac{\gamma }{2})({\xi }^{4}-{(L/2-| x| )}^{4})\right],&| x| > L/2-\xi \hfill \end{array}\right.$$
(16)

Note that the functions v1 and v2 in (4) are scaled versions of g1 and g2; furthermore, to obtain v2, we have to subtract from g2 the (constant) contribution arising from the SR repulsion between herders and targets αSR (see below). For further details on the derivation of \({j}_{{{{\rm{dm}}}}}^{{{{\rm{H}}}}}(x,t)\) see Supplementary Information Section II.A. Figure 2(a) illustrates the behaviour of v1 and v2. The consistently positive values of v2 across all positions demonstrate the attractive force that targets exert on the herders. In contrast to v2, function v1 changes its sign at x = 0, with positive (negative) values for positive (negative) values of x. The resulting current acting on the herders thus depends on where the targets are relative to the goal region, consistent with our decision-making strategy. We stress that such a term is absent in more conventional nonreciprocal field theories of mixtures, e.g., the Cahn-Hillard model12.

In the special case γ = δ = 0 (no target selection and no trajectory planning) these functions reduce to the constants g1 = 0 and \({g}_{2}={g}_{2}^{0}\) with \({g}_{2}^{0}=2{k}^{{\mbox{H}}}{\xi }^{3}/3 > 0\), yielding the current

$${j}_{{{{\rm{dm}}}}}^{{{{\rm{H}}}}}(x,t){| }_{\gamma=\delta=0}=-{g}_{2}^{0}{\rho }^{{{{\rm{H}}}}}(x,t)\nabla {\rho }^{{{{\rm{T}}}}}(x,t)$$
(17)

Note that, even in this uncontrolled case, where herders are unable to make decisions, the resulting field equations (11) for the two density field are nonreciprocal due to the different signs and values of the cross coupling terms, as are the corresponding Langevin equations (7), (8) in this limit.

Finally, inserting the currents (12) and (14) into the continuity equations (11) we obtain the PDEs

$${\partial }_{t}{\rho }^{{{{\rm{T}}}}}=\nabla \cdot \left[{\rho }^{{{{\rm{T}}}}}\left({\alpha }^{{{{\rm{SR}}}}}\nabla {\rho }^{{{{\rm{tot}}}}}+{\alpha }^{{{{\rm{LR}}}}}\nabla {\rho }^{{{{\rm{H}}}}}\right)\right]+D{\nabla }^{2}{\rho }^{{{{\rm{T}}}}}$$
(18)

and

$${\partial }_{t}{\rho }^{{{{\rm{H}}}}}=\nabla \cdot \left[{\rho }^{{{{\rm{H}}}}}{\alpha }^{{{{\rm{SR}}}}}\nabla {\rho }^{{{{\rm{tot}}}}}-{g}_{1}(x;\gamma,\delta ){\rho }^{{{{\rm{T}}}}}{\rho }^{{{{\rm{H}}}}}-{g}_{2}(x;\gamma,\delta ){\rho }^{{{{\rm{H}}}}}\nabla {\rho }^{{{{\rm{T}}}}}\right]+D{\nabla }^{2}{\rho }^{{{{\rm{H}}}}}$$
(19)

where we have omitted the arguments of the density fields. We non-dimensionalize these equations by choosing the Brownian time, τ ≡ σ2/D, as characteristic time scale and the particle diameter, σ, as characteristic length scale. Densities are scaled with the overall density ρ0 = (NT + NH)/L. We further introduce dimensionless and renormalized diffusion coefficients \({D}^{{{{\rm{A}}}}}({\rho }^{{{{\rm{A}}}}})=D\frac{\tau }{{\sigma }^{2}}+{\alpha }^{{{{\rm{SR}}}}}\frac{\tau {\rho }_{0}}{{\sigma }^{2}}{\rho }^{{{{\rm{A}}}}}\) with A = {T, H}, which describe both, the effect of the noise and the intra-species short-range (SR) repulsion. With \({\tilde{k}}^{{{\small{\rm{T}}}}}\equiv ({\alpha }^{{{{\rm{LR}}}}}+{\alpha }^{{{{\rm{SR}}}}})\frac{\tau {\rho }_{0}}{{\sigma }^{2}}\), \({v}_{1}\equiv {g}_{1}\frac{\tau {\rho }_{0}}{\sigma }\), and \({v}_{2}=({g}_{2}-{\alpha }^{{{{\rm{SR}}}}})\frac{\tau {\rho }_{0}}{{\sigma }^{2}}\) (and \({v}_{2}^{0}=({g}_{2}^{0}-{\alpha }^{{{{\rm{SR}}}}})\frac{\tau {\rho }_{0}}{{\sigma }^{2}}\)), we then arrive at the field equations given in (3), (4) and (5).

Numerical calculations in the continuum

All simulations were carried out in MATLAB. The partial differential equations are integrated using a pseudospectral code combined with an operator splitting technique84, which allows to accurately treat the linear part of the spatial operator (the linear diffusion). The non-linear parts of the spatial operator (non reciprocal interactions and reciprocal SR repulsion) are treated as source terms: at every time-step, they are evaluated in real space using the values from the previous step. Then they are transformed into the Fourier space, where they are considered as source terms and integrated using a fourth-order Runge-Kutta time integration scheme.

In the actual calculations, the periodically repeated, one-dimensional segment \([-L/2,L/2]\subset {\mathbb{R}}\) with L = 80σ is divided into N = 201 equispaced grid points. The time step is chosen as dt = 1 10−4τ. The simulations are run for a total duration of t = 150τ, which is sufficient to reach a steady state. For Fig. 5(h), dt = 1 10−3τ, t = 750τ. The initial conditions are assumed to be slightly perturbed disordered states with \({\rho }_{0}^{{{{\rm{T}}}}}={\rho }_{0}^{{{{\rm{H}}}}}={\rho }_{0}/2\), where \({\rho }_{0}={0.5}/\sigma\).

In our continuum model, some parameters can be directly adopted from the considered agent-based simulation parameters. Exceptions are, first, the sensing radius ξ which we set to ξ = 2.5σ to mitigate the effects of neglecting the denominator in Eq. (1). Second, the diffusion constant D is set to D = 5 to increase numerical stability (notice that this changes the time scale τ). Third, the amplitude of the short-range repulsion is set to kSR = 15/τ so that \({v}_{2}^{0} > 0\). To reproduce Fig. 2(b, d) we set γ = 2.5/σ, δ = λ/2. To reproduce Fig. 5(i–l), we set \({v}_{2}(x)={\tilde{v}}_{2}=const\), while v1(x) is described by a square wave with different periods and amplitude \({\tilde{v}}_{1}\) (details are given in the Supplementary Information Section III.A). Specifically, we set \({\tilde{v}}_{1}={k}^{{{{\rm{H}}}}}(2\delta \xi+(2/3)\gamma ({\xi }^{3}))(\sigma /D){\rho }_{0}\) and \({\tilde{v}}_{2}=[{k}^{{{{\rm{H}}}}}(2/3)(\delta \gamma+1)({\xi }^{3})-{\alpha }^{{{{\rm{SR}}}}}]({\rho }_{0}/D)\), which would be the (constant) absolute values of v1(x) and v2(x) in the range ξx≤ L/2 − ξ; within these expressions, we set γ = 2.5/σ, δ = λ/2. Only to reproduce Fig. 5(h) we use kT = 4/τ, while for remaining part of the manuscript, kT = kH = 0.6/τ.

The order parameters are calculated in analogy to the corresponding agent-based calculations, as discussed previously in this Section. For the quantity χ(R*) in Fig. 4(c), we use the definition \(\chi (R^*)=\frac{1}{{{{\mathcal{C}}}}}\int_{-{R}^{*}}^{{R}^{*}}{\rho }^{{{{\rm{T}}}}}(x)\,dx\), with \({{{\mathcal{C}}}}\) being a proper normalization factor, and R* being the value of the fitting parameter R obtained from the steady state target density profile for the maximum value of γ and δ.