Introduction

The brain is a multiscale system that dynamically evolves over time to produce behavior. Dynamical systems modeling, which formalizes how neural activity changes over time using differential equations, has played a central role in neuroscience1,2. Such models have uncovered numerous biophysical mechanisms of the brain, from the early work of Hodgkin and Huxley3 who described how ion currents generate action potentials, to Wilson and Cowan4 who described how excitatory-inhibitory neuronal interactions shape population-level dynamics.

More recent work has extended this approach beyond single cells and neuronal populations to characterize whole-brain dynamics5,6,7,8,9,10. From a dynamical systems perspective, large-scale brain activity unfolds over time as a trajectory within a high-dimensional state space, where each dimension typically represents the activity of a brain region. This state space is shaped by an attractor landscape, or hills and valleys that guide the trajectory of neural activity (Fig. 1). Research has shown that trajectories visit a small number of recurring brain states, each defined by a unique pattern of regional activity and interactions11,12,13,14,15. These brain states function as attractors, or deep valleys in the landscape, where neural activity is most likely to converge16,17,18,19. In turn, neural dynamics are characterized by transitions between or perturbations around these attractors.

Fig. 1: Schematics of the geometry of neural dynamics on the attractor landscape.
Fig. 1: Schematics of the geometry of neural dynamics on the attractor landscape.The alternative text for this image may have been generated using AI.
Full size image

A state space is defined where each dimension represents the activity of a brain region spanning the cortex. The hills and valleys represent the attractor landscape with valleys indicating the attractors. Each attractor corresponds to a recurring brain state that is identified from large-scale patterns of regional activity and interaction. The circles represent the neural activity at a specific moment. The trajectory of neural activity (indicated with black arrow) is largely determined by the landscape but can also be affected by external perturbations. For example, the red circle is more likely to fall toward the attractor based on the intrinsic landscape but may move away from the attractor when perturbed by external forces, such as stimuli, task demands, or behavior. The speed and direction of the movement on this landscape defines the geometry of neural dynamics. Example brain state figures are adapted from Song et al.24.

While attractor landscapes govern neural dynamics, it is important to understand how they relate to our internal states, including physiological (e.g., hunger, stress, arousal) and cognitive states (e.g., attention, emotion, motivation)20. Prior studies have provided initial evidence of this connection, showing that certain brain states are more or less likely to occur when a person is focused versus unfocused21, performing better or worse on a task22,23, or engaged versus disengaged in a movie24. However, the computational mechanism of this connection remains unknown25. One way to characterize such mechanisms is through the geometry of neural dynamics—the structure and flow of trajectories that describe how brain activity evolves over time along the attractor landscape (Fig. 1). This geometry can be quantified by the speed and direction with which brain activity moves toward or away from attractors. We hypothesized that the geometry of neural dynamics changes over time in relation to internal state fluctuations. Supporting this idea, Munn et al.26 showed evidence of flattened attractor landscape during phasic bursts in noradrenergic locus coeruleus activity (related to arousal)27 and steepened attractor landscape during phasic bursts in cholinergic basal forebrain activity (related to vigilance or attentional focus)28. This suggests a possibility that one’s internal state at a given moment may correspond to whether the brain occupies a flatter or steeper region of the attractor landscape, which in turn shapes the trajectory of neural dynamics.

Here, we propose that the geometry of neural dynamics on the attractor landscape characterizes moment-to-moment and context-to-context variations in internal states. In this study, we specifically test this in relation to measures of sustained attention. Dynamical systems models were fit to whole-brain fMRI data collected during rest, tasks, and movie-watching. Our model separates neural activity into two components: one that is intrinsic, driven internally by regional activity and interactions, and the other extrinsic, driven externally from the stimulus. By simulating neural trajectories from the model, we identify attractors toward which the neural activity converges. At each moment, we estimate the speed and direction of these simulated trajectories, using them to infer the steepness of the local attractor landscape. Importantly, we relate the change in geometries to behavioral measures of attention which were collected with the fMRI data. These measures include participants’ continuous ratings of engagement while watching comedy sitcoms and button response times during controlled sustained attention tasks.

Prior work has typically fit a set of static model parameters from neural activity dynamics, meaning variations across time have often been reduced to a single model that is agnostic to temporal change. Here, we reproduce evolving neural geometry from a model of fixed parameters and relate these dynamics to fluctuations in internal states. With this approach, we test a hypothesis that the geometry of large-scale cortical dynamics along the attractor landscape systematically reflects changes in attentional states during task and movie-watching contexts.

A dynamical systems model of large-scale cortical activity

We applied a large-scale parametric dynamical systems model, developed and validated by Singh et al.9 and Chen et al.19, to fit the time series of BOLD activity measured in human cortex with fMRI. The model defines the rate of change in neural activity \(\begin{array}{c}{\dot{x}}_{t}\end{array}\), as a function of the neural activity \({x}_{t}\) (\(\begin{array}{c}{x}_{t}\in {{\mathbb{R}}}^{n}\end{array},{n}=\) number of neural units) and time-aligned experimental variables \({u}_{t}\) (\(\begin{array}{c}{u}_{t}\in {{\mathbb{R}}}^{p}\end{array},{p}=\) number of experimental variables), such as task, stimulus, or behavior. In our model, \(X\) corresponds to the BOLD activity time series of 200 parcels covering the cortex29. \(U\) corresponds to audiovisual and semantic features extracted from the stimuli that participants watched and heard inside the scanner, which were reduced to 100 principal component dimensions. Mathematically, the model can be described as \(\begin{array}{c}{\dot{x}}_{t}=F\left({x}_{t}\right)+G({u}_{t})\end{array}\), where \(F\) and \(G\) are deterministic functions that represent, respectively, the impact of intrinsic activity and input-driven perturbations on the evolution of neural activity.

The followings are the specifics of our model, where \(F\left({x}_{t}\right)={W}{\psi }_{\alpha }({x}_{t})-D\odot {x}_{t}\) and \(G\left({u}_{t}\right)=\, \beta {u}_{t}\) with \(W\), \(D\), \(\alpha\), and \(\beta\) being model parameters (Fig. 2).

$${\dot{x}}_{t}\approx \frac{x\left(t+\varDelta t\right)-x\left(t\right)}{\varDelta t}=W{\psi }_{\alpha }\left({x}_{t}\right)-D\odot {x}_{t}+\beta {u}_{t}$$
(1)
$${\psi }_{\alpha }\left({x}_{t}\right)=\sqrt{{\alpha }^{2}+{(b{x}_{t}+0.5)}^{2}}-\sqrt{{\alpha }^{2}+{(b{x}_{t}-0.5)}^{2}}$$
(2)

Given that \(\begin{array}{c}\varDelta t\end{array}\) equals to 1 TR in our data, the equation can be simplified as follows.

$${\hat{x}}_{t+1}={x}_{t}+W{\psi }_{\alpha }\left({x}_{t}\right)-D\odot {x}_{t}+\beta {u}_{t}$$
(3)
Fig. 2: Model of large-scale cortical dynamics.
Fig. 2: Model of large-scale cortical dynamics.The alternative text for this image may have been generated using AI.
Full size image

A Model schematics. x represents the activity time series of a cortical parcel and \(u\) represents input from the stimulus. Model parameters include directional interactions between neural units (\(W\)), self-decay that determines autocorrelation (\(D\)), and the stimulus-to-brain relationship (\(\beta\)). Although only two units are visualized for simplicity, the model was fit on the time series of 200 cortical parcels. B Model optimization. The model was trained to minimize the difference between the observed and predicted neural activity patterns of consecutive time steps. Green denotes parameters that are estimated during training, black lines denote observed neural activity pattern at a time step (\({x}_{t}\)), with orange indicating sigmoidal bound from −1 to 1 given the nonlinear transfer function, and blue denotes stimulus embeddings at the corresponding time step (\({u}_{t}\)).

The goal of the model is to predict neural activity pattern of the consecutive time step \(({\hat{x}}_{t+1})\), by minimizing the prediction error (i.e., difference between the predicted \({\hat{x}}_{t+1}\) and observed \({x}_{t+1}\)) using algorithmic optimization (here, stochastic gradient descent). The prediction of the next time step is based on the neural activity \({x}_{t}\) plus the signals received by connections from other parcels \((W{\psi }_{\alpha }\left({x}_{t}\right))\) minus the self-decay \((D\odot {x}_{t})\) plus the neural activity driven by the external inputs \((\beta {u}_{t})\). In turn, fitting this model corresponds to decomposing intrinsic and extrinsic (i.e., input-driven) neural dynamics.

The weight matrix (\(\begin{array}{c}W\in {{\mathbb{R}}}^{n\times n}\end{array}\)) represents directional interaction between neural units, namely the effective connectivity. Nonlinearity is introduced by a parametrized sigmoidal transfer function (\(\begin{array}{c}{\psi }_{a}\end{array}\)), which maps neural activity to a bounded output at a range from −1 to 1. The slope of the transfer function differs for every parcel, parameterized by \(\alpha\) \((\alpha \in {{\mathbb{R}}}^{n})\). \(b\) is fixed at 20/3 following prior studies9,19. A self-decay (\(\begin{array}{c}D\in {{\mathbb{R}}}^{n}\end{array}\)) captures a return to baseline in absence of external inputs or interactions, with \(\odot\) denoting element-wise product. Having high decay rate corresponds to having low temporal autocorrelation, indicating less persistence in neural activity. \(\beta\) represents linear transformation from the stimulus embedding space to neural activity pattern space. Note that this is an individualized model where a set of parameters are estimated for each fMRI run of each participant.

We analyzed two openly available fMRI datasets, the SONG dataset24 (N = 27) and the HCP dataset30,31,32 (N = 119). In both datasets, each participant underwent multiple sessions of rest, task, and movie-watching conditions. We extracted time series of stimulus features for the movie-watching runs, including low-level visual and visuo-semantic features of video frames, low-level audio of the sounds, and audio-semantic features of the speech and dialogue in the movies. These features were projected onto 100 principal components that explained the largest variance across runs. Only low-level visual and visuo-semantic features were extracted from the SONG dataset task runs because the stimuli were purely visual. Given that no stimulus was provided for resting-state runs in either dataset, we fit a simplified model that only considers intrinsic but not extrinsic neural dynamics: \(\begin{array}{c}{\dot{x}}_{t}=W{\psi }_{\alpha }({x}_{t})-D\odot {x}_{t}\end{array}\) (removing the input term \(\beta {u}_{t}\) from the equation).

Model parameters recapitulate functional brain connectivity and stimulus encoding

We first tested whether our model, optimized to predict neural activity of successive time steps, captured parameters sensitive to individual and cognitive state differences. These properties are critical, as their emergence indicates the model’s biological plausibility. We analyzed the movie-watching runs of the SONG and HCP datasets because they were fit on the full model that includes the input term (see Supplementary Fig. S1 for the range of model parameter estimates).

Model performance was estimated based on how well the model predicted neural activity of the next time steps, specifically \({r}^{2}({\hat{x}}_{2:T},\,{x}_{2:T})\). The total explained variance was, on average, \({r}^{2}\) = 0.452 ± 0.043 across a total of 80 runs in the SONG dataset and \({r}^{2}\) = 0.561 ± 0.053 across 476 runs in the HCP dataset (Fig. 3A). This indicates that our model explained approximately half of the variance in neural activity. We then decomposed the explained variance into three components. The interareal interaction explained \({r}^{2}\) = 0.041 ± 0.009 (SONG), 0.055 ± 0.015 (HCP), local recurrence explained \({r}^{2}\) = 0.284 ± 0.051, 0.237 ± 0.060, and external input-driven activity explained \({r}^{2}\) = 0.017 ± 0.003, 0.015 ± 0.004 of the variances.

Fig. 3: Model validation with the HCP dataset.
Fig. 3: Model validation with the HCP dataset.The alternative text for this image may have been generated using AI.
Full size image

A Model performance was primarily assessed based on the explained variance of how well the model predicted neural activity of the next time step given the current time step (the explicit training objective). The histogram includes \({r}^{2}\) estimates from all movie-watching runs of 119 participants included in the HCP dataset. B, C Model parameters were compared to descriptive statistics, specifically \(W\) to functional connectivity (FC) estimates and \(\beta\) to coefficients estimated from the stimulus-to-brain encoding models. These aspects of the model were not explicitly optimized. D Example participant’s \(W\) estimate compared to their FC matrix. A representative participant’s data was selected for visualization (whose cosine similarity between \(W\) and FC was closest to the mean in (B)). E Individual differences were assessed by comparing parameter similarities between different movie-watching runs of the same participant (black-purple) to those of different participants (black-grey). Cognitive state differences were assessed by comparing parameter similarities of different participants’ same movie-watching runs (black-green) to those of different movie-watching runs (black-grey). F Density functions comparing similarities in parameter estimates between the same vs. different individuals (top) and the same vs. different cognitive states elicited by the same vs. different movies (bottom). Lines on top of the density functions connect the means of the two distributions, with asterisks indicating p < 0.0001. Supplementary Fig. S3 shows the same model validation results, analyzed with the SONG dataset.

We validated model performance across runs. Model parameters estimated from one run of a participant were better at predicting that same participant’s brain activity in other runs than predicting other participants’ brain activity (SONG: within: \({r}^{2}\) = 0.418 ± 0.037, across: \({r}^{2}\) = 0.402 ± 0.011; paired t-test t(79) = 4.049, p = 0.0001; HCP: within: \({r}^{2}\) = 0.526 ± 0.052, across: \({r}^{2}\) = 0.508 ± 0.015; t(475) = 9.375, p = 2.8e-19; Supplementary Fig. S2). This indicates that the model captured individual-specific patterns of brain dynamics and explained more variance within individuals than across individuals.

We further validated the model by comparing model parameters, \(W\) and \(\beta\), to their analogue descriptive statistics that are commonly used in the field. \(W\) was compared to an undirected functional connectivity (FC), estimated as the parcel-by-parcel Fisher’s transformed Pearson’s correlation coefficients. \(\beta\) was compared to regression coefficients estimated from a linear encoding model, that predicts neural activity from stimulus time series: \(\begin{array}{c}{x}_{t}=\left({{\rm{encoding\; coefficient}}}\right)\times \,{u}_{t}+({{\rm{residual}}})\end{array}\) \(({{\rm{encoding\; coefficient}}}\in {{\mathbb{R}}}^{n\times p},\,{{\rm{residual}}}\in {{\mathbb{R}}}^{n})\). We found that the estimated \(W\) was highly comparable to FC for both the SONG (cosine similarity = 0.929 ± 0.009; z = 1178.36, p < 0.0001 compared to shuffled chance distribution) and HCP datasets (cosine similarity = 0.913 ± 0.014; z = 2830.05, p < 0.0001) (Fig. 3B, D). Likewise, \(\beta\) was highly comparable to encoding coefficients for both the SONG (cosine similarity = 0.324 ± 0.100; z = 448.58, p < 0.0001) and HCP datasets (cosine similarity = 0.243 ± 0.036; z = 1022.18, p < 0.0001) (Fig. 3C).

Do these parameters capture differences between individuals as well as differences in cognitive states? Models were considered sensitive to individual differences if the parameters estimated from runs of the same person were more similar compared to parameters of different individuals (Fig. 3E). Both \(W\) and \(\beta\) were sensitive to individual differences, and \(W\) more strongly reflected individual differences than \(\beta\) (Wilcoxon rank-sum test between same vs. different individual pairs; \(W\): z = 14.957, p = 1.4e-50 for SONG, z = 44.667, p = 0.0 for HCP; \(\beta\): z = 2.714, p = 0.007 for SONG, z = 13.828, p = 1.7e-43 for HCP) (Fig. 3F). This aligns with prior findings that functional connectivity is stable within an individual and distinctive across individuals, thus driven more by trait than state differences33,34.

Models were considered sensitive to cognitive state differences if parameters estimated from runs of the same movie stimulus were more similar compared to runs of different stimuli (Fig. 3E). Both \(W\) and \(\beta\) were sensitive to cognitive state differences, and \(\beta\) more strongly reflected cognitive state differences than \(W\) (\(W\): z = 21.414, p = 9.9e-102 for SONG, z = 75.509, p = 0.0 for HCP; \(\beta\): z = 45.189, p = 0.0 for SONG, z = 242.537, p = 0.0 for HCP) (Fig. 3F). The corresponding descriptive statistics—the FC and encoding coefficients—closely followed this trend. These results indicate that the parameters estimated from our dynamical systems model are comparable to validated descriptive statistics and hold biological plausibility.

The model reveals stable attractors organized along the cortical hierarchy

In dynamical systems, the differential equation determines and predicts how the system’s trajectory would evolve over time from a given initial state, assuming that no internal or external perturbation exists beyond what is parameterized. Depending on the nature of the system, the trajectory may eventually converge to a set of stable fixed point attractors, settle onto stable periodic orbits known as limit cycles, bounce between saddle points when attractors are absent, or exhibit chaotic behavior19.

We hypothesized that our data-driven models would reveal a finite set of attractors, or stable patterns of neural activity toward which neural trajectories are likely to converge. To find attractors, we ran forward simulations of the model starting from multiple initial states, each corresponding to the observed 200-parcel neural activity pattern at every time step of the fMRI time series (Table 1). The observed neural activity at each time step was simulated 5000 timesteps forward in time, which we considered was sufficient for convergence. This method simulates the intrinsic drift, or the likely path that neural activity would follow in the absence of external perturbations beyond those parameterized by the model. If the forward simulation reached a state where no further change occurred over the last ten iteration steps, we considered it to have converged to a fixed point, representing the bottom of a valley in the inferred attractor landscape (Fig. 1). Fixed points estimated across all time steps were grouped as the same attractor if their pairwise Euclidean distance was less than 0.1 (Fig. 4A).

Fig. 4: Attractor landscapes of large-scale cortical activity.
Fig. 4: Attractor landscapes of large-scale cortical activity.The alternative text for this image may have been generated using AI.
Full size image

A Four types of forward simulation results visualized from example fMRI runs. Black dots indicate 100 randomly sampled initial states (i.e., 100 time steps sampled from the observed neural activity). Each line indicates a trajectory taken over the course of a forward simulation (5000 iterations). Red dots indicate the end states of the simulations, corresponding to the attractors. Principal component analysis was conducted per run to visualize the results in a 3D principal component (PC) space. B Attractors identified from all runs of the HCP dataset, color-coded based on the outcome of k-means clustering into 4 clusters and projected onto the shared PC space for visualization. Dots correspond to attractors estimated from the 1666 HCP runs. Lines connect the pairs of cluster centroids that have anticorrelated patterns. C Neural patterns of the two identified attractor clusters. D Neural patterns of the top two gradients identified from Margulies et al.36. \(\rho\) values in between (C and D) indicate rank correlation values of the top and bottom neural patterns. Colors indicate loadings on the first two axes of the PC and gradient analyses. DMN default mode network, UNI unimodal network, VIS visual network, SM somatosensory-motor network.

Table 1 Forward simulation

We focused this analysis on the HCP dataset, given that it contained a larger number of total runs (1666 runs) compared to the SONG dataset (188 runs). We included rest, task, and movie-watching runs in the analysis. When conducting forward simulations on all runs respectively, a majority of runs converged onto a set of point attractors: many converged onto 2 (41.06%) or 4 attractors (51.08%), and some converged onto 6 (7.02%) or 8 attractors (0.66%) (Fig. 4A). The stability of the attractors in all of these runs was confirmed by computing the Jacobian matrix of the model and verifying that all eigenvalues occupy the unit circle, indicating that small perturbations around an attractor would return the system to that attractor. Only 3 out of 1666 runs (0.18%) did not converge to fixed point attractors but exhibited oscillatory limit cycles. This indicates that when large-scale cortical activity evolves according to the dynamics specified by the model equations, it is most likely to fall into a set of attractors. Note that the model algorithm is designed to identify an even number of attractors, each representing opposing patterns of neural activity (Supplementary Text S1).

We predicted that these attractors would tile the core gradients of cortical organization that span between the transmodal default mode network (DMN) areas and the unimodal sensory and motor areas35,36. Furthermore, we expected that these attractors would correspond to canonical brain states—distinct and recurring patterns of neural activity—that have been replicated in multiple studies24,37. To test these hypotheses, we combined the activity patterns of all fixed point attractors estimated in every run. We applied a k-means clustering to find 4 attractor clusters, because a majority of runs exhibited either four or less attractors (Fig. 4B).

The mean activity patterns of these attractor clusters (Fig. 4C) were compared to the known cortical gradients estimated by Margulies et al.36 (Fig. 4D). This previous work applied a nonlinear dimensionality reduction algorithm on hundreds of participants’ resting-state functional connectivity data to find top gradients that explained the largest variances. The primary gradient distinguished unimodal from transmodal areas, and the secondary gradient distinguished sensory from motor areas, which were argued to be an “intrinsic coordinate system” of the human brain38 and replicated by multiple research groups39.

Our two attractor clusters were highly comparable to the top cortical gradients (\(\rho\) values = 0.850 and 0.896, p values < 0.0001). Specifically, the attractor clusters served as axes that separated (i) the DMN from the unimodal (UNI) sensory and motor areas and (ii) the visual network (VIS) from the sensorimotor network (SM). We did not find meaningful clusters of attractors nor resemblance to gradients when the attractors were estimated from the randomly circular-shifted fMRI data (Supplementary Fig. S4). We also found no significant participant-level differences in attractor positions, indicating that the attractor locations were largely consistent across participants (Supplementary Text S2). These results provide a dynamical systems explanation for the recurrence of a small number of brain states over time: attractors correspond to brain states, the positions of these attractors are constrained by the brain’s functional network organization, and neural dynamics unfold along this attractor landscape.

Geometry of neural dynamics differs across fluctuating attentional states during task and movie-watching

We characterized the attractor landscapes of large-scale neural dynamics, or the likely path that neural activity will take when following the model’s equations (Fig. 4A). Would the geometry of these paths vary depending on a person’s attentional state at a given moment? In other words, would one’s attentional state—from a state of being focused on a task versus zoning out to being engaged in a movie versus being bored—relate to where the brain state is positioned within the hills and valleys?

Revisiting the forward simulation in Table 1, we identified two vectors at each time step based on the observed neural activity \({x}_{t}\): one defining the intrinsic drift, or the flow governed by regional connections and self-decays (called the “intrinsic vector” defined by \(W{\psi }_{\alpha }\left({x}_{t}\right)-D\odot {x}_{t}\), shortened as \(\vec{WD}\)), and the other defining the flow nudged by the external inputs at the corresponding moment (called the “extrinsic vector” defined by \(\beta {u}_{t}\), shortened as \(\vec{B}\)). To quantify this, for every time step, we extracted the intrinsic and extrinsic vectors and calculated their angles and magnitudes (Fig. 5A). The angle indicates the direction of the vector with respect to the position of the attractor it eventually converged, with high angle indicating the vector directing away from the attractor. The magnitude represents the degree of change from the initial state, with high magnitude indicating a fast-moving vector. Because these measures were estimated at each time step, we were able to extract their time series over the course of the fMRI run. In essence, they provide a geometric characterization of neural dynamics on the attractor landscape.

Fig. 5: Neural dynamics toward attractors and their relationship with attention.
Fig. 5: Neural dynamics toward attractors and their relationship with attention.The alternative text for this image may have been generated using AI.
Full size image

A Geometric measures were estimated from neural activity at every time step at the first instance of forward simulation. In \({x}_{t+1}={x}_{t}+W{\psi }_{\alpha }\left({x}_{t}\right)-D\odot {x}_{t}+\beta {u}_{t}\), we decomposed a vector representing the intrinsic drift (\(W{\psi }_{\alpha }\left({x}_{t}\right)-D\odot {x}_{t}\)) and a vector driven by external inputs (\(\beta {u}_{t}\)). The magnitude of the vector was calculated using the Euclidean norm. The direction of the vector was calculated using the cosine angle with respect to the position of the attractor. Because these four measures were calculated at every time step, we were able to extract their respective time series for each fMRI run. B Correlations between attention measures and the angle (blue) and magnitude (purple) of the intrinsic vectors across movie-watching and attention task runs, compared to the respective chance distribution (two runs in each condition; N = 27). Light grey areas indicate permuted chance distributions, black dots indicate correlation values between attention measures and the estimated angle or magnitude, and colored lines indicate the mean of these correlation values. Asterisks denote statistical significance shown in Table 2. Mag: magnitude. C Proportions of identified attractor clusters. The identified attractor at each time step was categorized to either one of the two ends of the primary gradient (default mode network [DMN] or unimodal network [UNI]) or the two ends of the secondary gradient (visual network [VIS] or somatosensory-motor network [SM]). The proportion of the attractor cluster was calculated at each run (black dot), which was averaged across runs to be summarized as a bar graph. Error bar indicates the standard error of the mean. D Correlations between individuals’ attention measures and the angle and magnitude of the intrinsic vector, categorized based on the positions of the attractors. Black dots indicate estimates from every run of every participant. Colored bars indicate the mean of correlation values that are significantly different from the permuted chance distribution (corrected for false discovery rate, q < 0.05), whereas empty bars indicate non-significance. Because VIS and SM attractors were less likely to occur during tasks, they were excluded from the analyses. Supplementary Fig. S6 shows separate results for the two runs in each condition.

We asked whether neural trajectories toward attractors—characterized by the angle and magnitude of the intrinsic and extrinsic activity—systematically varied based on the attentional state of a participant. We focused our analysis on the SONG dataset, which contained both naturalistic movie-watching (i.e., two runs of comedy sitcom episode watching) and controlled attention tasks (i.e., two runs of gradual-onset continuous performance task; gradCPT)24. Importantly, these runs included time-varying behavioral measures of attention. During sitcom episodes, participants continuously rated how engaging they found the episode by adjusting the scale bar40. This measure was collected after the fMRI scan, as participants re-watched the same episode. (Narrative engagement has been characterized as a state of heightened emotional arousal and attentional focus40,41,42,43. Because narrative engagement accompanies changes in both arousal and attentional states, denoting them as “attentional state” is a simplification made in this article.) During gradCPT, participants pressed a button at every second whenever target images appeared inside the scanner. The inverse of response time variability served as a proxy of sustained attention, with moments of stable response times indicating high attention and variable response times indicating low attention44. We correlated the behavioral time series of each participant with the four geometric measures’ time series, which was compared to a respective chance distribution where each geometric measure was correlated with circular-shifted behavioral time courses (Table 2 and Fig. 5B).

Table 2 Correlations between attention measures and the angle (representing direction) and magnitude (representing speed) of the neural trajectory across different conditions, compared to the respective chance distribution

Both the angle and magnitude of intrinsic activity (\(\vec{WD}\)) were significantly correlated with attention dynamics (Table 2 and Fig. 5B). The relationships were comparable between repeated runs of the movie-watching and attention tasks, highlighting the reliability of our results. Interestingly, an opposite relationship to attention dynamics was found between the two contexts. The angle of the intrinsic vector was large when participants reported high engagement toward episodes, whereas the angle was small when participants performed stably in gradCPT. The magnitude of the intrinsic vector was small when participants reported high engagement toward episodes, whereas the magnitude was large when participants performed stably in gradCPT. This means that neural dynamics toward attractors—largely determined by the landscape—not only varied across attentional state dynamics but in a manner different across contexts.

On the other hand, the angle and magnitude of the extrinsic activity (\(\vec{B}\)) did not meaningfully relate to attention (Table 2). There were weak trends of correlations, which were seemingly derived from emergent correlations amongst the four geometric measures (Supplementary Fig. S5). This highlights that attention relates to changes in intrinsic neural dynamics, but not stimulus-driven neural dynamics.

In a study by Song et al.24 in which this original data was collected, different brain states were associated with high attention in movie-watching and attention task contexts. Participants reported high engagement to sitcom episodes during the “base” state, a state where no functional network exhibited dominant activity and was positioned at the center of the latent manifold. On the contrary, participants exhibited stable task performance to gradCPT during the DMN state, a state defined by high activity in the DMN (consistent with findings by refs. 45,46,47). Motivated by these results, we categorized attractors into either of the four ends of gradients 1 (DMN and UNI) and 2 (VIS and SM), based on the neural pattern similarity between attractors and the gradients. We assessed the relationships between the geometric measures and attention dynamics, separately for these four attractors. The goal was to see if the relationship, shown in Fig. 5B and Table 2, differs depending on where the neural trajectories converge.

We primarily found that the likelihood of attractors differed across the two contexts (Fig. 5C). The neural activity was nearly equally likely to fall into one of the four attractors during movie-watching, whereas the ends of gradient 1, the DMN and UNI, were much more likely to serve as attractors during sustained attention tasks, compared to the ends of gradient 2. This finding showing different likelihood of attractors across contexts indicates that the attractor landscapes differed across contexts.

During movie-watching, we found that the main relationship between the geometric measures and attention dynamics remained consistent, irrespective of which attractors the neural activity fell into. When participants reported high engagement to movies, the intrinsic drift directed away from the attractors with decreased magnitude (Fig. 5D). This implies that the brain was in a flattened or shallow region of the landscape when people were attentive, such that the brain activity was more likely to lie at the center of the manifold and gravitated less toward the attractors (Fig. 6A). On the contrary, during attention tasks, the main effect we found in Table 2 was specific to when the neural activity converged onto the DMN attractor (Fig. 5D). When attentive, neural activity approached the DMN attractor faster and more directly. This implies that the brain was on a steeper region of the attractor landscape near the DMN attractor when attentive (Fig. 6B).

Fig. 6: Neural geometry across attentional states and contexts.
Fig. 6: Neural geometry across attentional states and contexts.The alternative text for this image may have been generated using AI.
Full size image

Schematic illustration of the results in Fig. 5D for A movie-watching and B attention task conditions. (Top) The angle and magnitude of the intrinsic vector during engaged vs. not engaged and attentive vs. not attentive states in movie-watching and attention task contexts. The black node illustrates the initial state and the four colored nodes illustrate attractors, with the vertical axis indicating gradient 1 and the horizontal axis indicating gradient 2 as in Fig. 4B. The illustration is based on the angle and magnitude of the vectors from the initial state to the attractors, not the positions of the initial states with respect to the attractors nor the distance amongst attractors. Black dashed lines illustrate intrinsic drifts of neural activity toward attractors. Black solid lines illustrate a one-time-step vector of the drift. Shaded areas indicate positions of brain states that were reported to be associated with high attention in Song et al.24. The VIS and SM attractors are dashed in attention task context because they are less likely to occur in this context (Fig. 5C). (Bottom) The 3D schematic illustrations of the attractor landscapes in the two contexts. White circles indicate brain states in attentive states and black circles indicate brain states in inattentive states. DMN default mode network, UNI unimodal network, VIS visual network, SM somatosensory-motor network.

Together, the results suggest that the neural dynamics toward attractors differed across high and low attentional states, in a context-dependent manner. This indicates that the attractor landscape of large-scale cortical dynamics systematically varies depending on attentional states and task demands.

Discussion

In this study, we fit a dynamical systems model to large-scale fMRI data to investigate how the geometry of neural dynamics differs across changes in attentional states in different contexts. By simulating brain dynamics over time, we identified stable attractors that aligned with cortical gradients separating transmodal from sensorimotor regions as well as sensory from motor regions. Neural trajectories toward these attractors systematically varied with moment-to-moment attentional state fluctuations, in a manner different across situational contexts. These results indicate that the geometry of neural dynamics along the attractor landscape reflects changing attentional states and different task demands across situations.

The dynamical systems model used in this study is a simplified neural mass model that is tailored to simulate large-scale regional activities and their interactions, rather than local neuronal activities within a region. We found that the model parameters effectively reproduced descriptive statistics such as functional connectivity and stimulus encoding and were sensitive to trait- and state-level differences. We compared the model parameters with these metrics because, although they are imperfect measures, they are the most widely used in the field and therefore provide an interpretable benchmark. Beyond reproducing descriptive statistics, the model’s strength comes from decomposing components of neural activity that are driven by interactions between brain regions, autocorrelation within each region, and external inputs—which together explained nearly half of the variance in the observed neural activity. These parameter estimates became the basis for estimating attractors and trajectories along the landscape. Moreover, the model was not tailored specifically to fit fMRI data, meaning the model can be used with other data modalities. These together suggest that our model provides a mathematical description of neural dynamics that are generative, biologically plausible, and generalizable to other research domains.

Forward simulation of the model revealed a set of attractors to which neural activity was more likely to converge. While the model identified an attractor landscape that governs cortical dynamics, this does not imply that cortical activity must converge to those attractors. Instead, the landscape provides the underlying structure over which activity evolves, with trajectories continuously influenced by stimuli, task demands, and behavior. The attractors identified from this analysis recapitulated canonical brain states that were identified in previous studies, which tiled the known gradients of cortical hierarchy. Specifically, the attractors were marked by high activities in the DMN, VIS, and SM—an emergent property of the model rather than a feature imposed by the model design. This conceptually replicates many studies in human systems neuroscience that have revealed the existence of a low-dimensional manifold of macroscale neural activity36,48,49,50 that is conserved across evolution51,52, stable across contexts53,54, and confined by structural architecture and genetic makeup of the brain55,56,57,58. This indicates that the positions of the attractors are largely fixed, confined by the brain’s canonical functional architecture.

Although the positions of the attractors are largely determined, it does not mean that the attractor landscape is fixed. Rather, the attractor landscape as well as the traversal along its hills and valleys can be flexible, which motivated us to study neural dynamics along the attractor landscape in relation to attention dynamics. We found that not only did the attractor landscapes differ across situational contexts, but even within a context, regions occupied within the landscape varied depending on a person’s attentional state. When participants were engaged in sitcom episodes, the intrinsic drift directed away from the attractors with decreased magnitude. This was a depiction of neural activity being less prone to fall into attractors as it situated on a shallow landscape at moments of engagement. In contrast, when participants were paying attention to an effortful psychological task, the intrinsic drift directed specifically toward the DMN attractor with increased magnitude. This indicates that the neural activity more easily fell into the DMN attractor because it was in a steeper landscape when attentive. This is in line with studies that reported high activity in DMN at moments of optimal performance during this task45,46,47. These results highlight the flexible geometry of neural dynamics on the large-scale attractor landscape—it systematically changes across attentional state fluctuations.

The findings that brain activity tended to lie on a shallow attractor landscape when engaged in sitcoms and a steep local attractor when attentive to tasks resemble results reported by Munn et al.26, which characterized cortical dynamics within an energy landscape framework. Energy landscape formulations conceptualize fluctuations in brain activity in terms of the energy required to move between brain states and have been the basis for influential hypotheses regarding functions such as predictive coding, learning, and inference59. Munn et al.26, specifically, found evidence of flattened cortical landscape (i.e., decreased energy) upon activation of the noradrenergic arousal system, which projects broadly across the cortex, and deepened cortical landscape (i.e., increased energy) upon activation of the cholinergic vigilance system, which projects to relatively local functional networks. This suggests a hypothesis that different neuromodulatory circuits may underlie internal states of being immersed in engaging narratives versus being attentive to effortful and controlled tasks. In line with previous findings that engagement correlates with perceived emotional arousal during narratives40,41,42,43, our results hint that the attractor landscape of being engaged may resemble a state of heightened arousal, more so than heightened vigilance. Future work can address this hypothesis by administering pharmacological agents that modulate noradrenergic and cholinergic activity during similar task and movie-watching conditions.

In sum, the attractor landscape is flexible across situational contexts, and cortical dynamics along this landscape reflect changes in attentional states. By modeling neural dynamics, we offer a geometric framework that can explain how internal states arise from large-scale brain activity.

Methods

Model description

The dynamical systems model used in this study is adopted from the neural mass model called the mesoscale individualized neurodynamic (MINDy) model9,19. The model is designed to fit the neural activity time series—whichever units or scales the neural activities are sampled from—and the time-aligned experimental variables, such as task, stimulus, or behavior. For our use, the neural activity corresponded to the BOLD activity of 200 cortical parcels collected from human fMRI. The experimental variable corresponded to audiovisual and semantic feature embeddings of the movies that participants watched inside the scanner and visual feature embeddings of the images that were presented as task stimuli. However, the choice of neural units and experimental variables can vary depending on the study and research question. No tailoring specific to the fMRI data (e.g., deconvolution of the hemodynamic response function) was made.

To reiterate, the model is defined as the following equation, with the neural activity at time \(t\) represented as \({x}_{t}\) (\(\begin{array}{c}{x}_{t}\in {{\mathbb{R}}}^{n}\end{array},{n}=\) 200 parcels) and stimuli at time \(t\) represented as \({u}_{t}\) (\(\begin{array}{c}{u}_{t}\in {{\mathbb{R}}}^{p}\end{array},{p}=\) 100 PCs of the feature embeddings), with \(\begin{array}{c}\varDelta t\end{array}\) set to 1 TR.

$${\hat{x}}_{t+1}={x}_{t}+W{\psi }_{\alpha }\left({x}_{t}\right)-D\odot {x}_{t}+\beta {u}_{t}$$
(4)
$$\begin{array}{c}{\psi }_{\alpha }\left({x}_{t}\right)=\sqrt{{\alpha }^{2}+{(b{x}_{t}+0.5)}^{2}}-\sqrt{{\alpha }^{2}+{(b{x}_{t}-0.5)}^{2}}\end{array}$$
(5)

The weight matrix (\(\begin{array}{c}W\in {{\mathbb{R}}}^{n\times n}\end{array}\)) represents directional interaction between neural units, which conceptually corresponds to effective connectivity. \(W{\psi }_{\alpha }({x}_{t})\) represents the weighted sum of the neural unit’s nonlinearly transformed activity at time \(t\) multiplied by its directed connections from every other neural unit. To constrain the estimation of \(W\) as the sum of sparse random component and a low-rank structured component60, we decomposed the parameter into \(W=\,{W}_{S}+\,{W}_{L}\) where \({W}_{S}\) (\(\begin{array}{c}{W}_{S}\in {{\mathbb{R}}}^{n\times n}\end{array}\)) represents a sparse matrix after \({L}_{1}\) regularization and \({W}_{L}=\,{W}_{1}{W}_{2}^{T}\) (\(\begin{array}{c}{W}_{1,\,2}\in {{\mathbb{R}}}^{n\times r}\end{array}\)) is given by low-rank approximation (\(r=n/3\)) with sparsity also given to \({W}_{1}\) and \({W}_{2}\) with \({L}_{1}\) regularization. This formalization allows the model to capture global low-dimensional motifs of connectivity superimposed on a sparse network, which gives rise to the known low-dimensional and modular structure of large-scale functional organization.

Nonlinearity follows a parametrized sigmoidal transfer function (\(\begin{array}{c}{\psi }_{a}\end{array}\)), which maps neural activity to a bounded output at a range of −1 to 1. The slope of the transfer function is determined by the estimated \(\alpha\). \(b\) is fixed at 20/3 in our model.

\(\begin{array}{c}\beta \in {{\mathbb{R}}}^{n\times p}\end{array}\) represents a linear coupling between neural activity and incoming stimuli that are time-aligned with one another. The time was aligned by convolving the stimulus time series with the canonical hemodynamic response function. \(\beta\) conceptually corresponds to a linear mapping from the stimulus space to the neural space.

A decay term (\(\begin{array}{c}D\in {{\mathbb{R}}}^{n}\end{array}\)) represents convergence to baseline activity at the absence of external inputs or interactions, with \(\odot\) denoting element-wise product. \(D\) is an algorithmically important parameter, because \(D\) is initially set to a value significantly higher than 1, which flips the right-hand side of the equation toward a large negative factor of \({x}_{t}\). This accentuates the difference in neural activity between units, thus allowing effective estimation of the directed connectivity \(W\). From our empirical tests, if \(D\) is set to a biologically plausible value of <1, the estimated \(W\) does not recapitulate the descriptive functional connectivity measure.

Model fitting

We fit the neural activity time series acquired at each run, in batches of size 300 TRs. Specifically, we predicted the neural activity of consecutive time steps \({\hat{x}}_{2:T}\) based on the observed neural activity \({x}_{1:T-1}\) (where \(T\) = 300). Model parameters were optimized across 2500 iterations using stochastic gradient descent, specifically the Nesterov-accelerated Adaptive Moment Estimation optimizer as chosen in previous studies9,19. The loss \({{\mathscr{L}}}\) was calculated as follows. \(\varLambda\) represents the regularization term, which enforces sparsity in connections. Regularization terms were fixed to \({\lambda }_{1}=0.075\), \({\lambda }_{2}=0.2\), \({\lambda }_{3}=0.05\).

$$\begin{array}{c}{{\mathscr{L}}}=\frac{1}{2}{{||}{\hat{x}}_{2:T}-{x}_{2:T}{||}}_{2}^{2}+\varLambda \end{array}$$
(6)
$$\varLambda={\lambda }_{1}{{||}{W}_{S}{||}}_{1}+{\lambda }_{2}{{\rm{Tr}}}(|{W}_{S}|)+{\lambda }_{3}\left({{||}{W}_{1}{||}}_{1}+{{||}{W}_{2}{||}}_{1}\right)$$
(7)

Because \(D\) was initialized at a value higher than 1, the prediction accuracy (i.e., rank correlation between the predicted and observed neural activity patterns) started at a value near −1 at the start of the training. The prediction accuracy increased gradually across iterations. In contrast, \(W\) quickly became comparable to a functional connectivity matrix in the initial phase of training, but across more iterations, it gradually approached a diagonally dominant matrix, which is conceptually similar to the first-order autoregressive model. To prioritize biologically meaningful parameter optimization rather than brain activity prediction, we stopped the training at 2500 iterations (in batches of 300 consecutive TRs) at which point \(W\) was similar to functional connectivity, but the model’s accuracy still remained negative. To boost prediction accuracy, we fit an ordinary least squares linear regression model to estimate \({pW}\), \({pD}\), and \({pB}\), which are scalar values (\({pW}\), \({pD}\), \({pB}\) \({\mathbb{\in }}{\mathbb{R}}\)). Parameters \(W\), \(D\), and \(\beta\) were scaled by these value estimates, respectively.

$$\begin{array}{c}\begin{array}{c}{\dot{x}}_{t}={pW}*W{\psi }_{\alpha }\left({x}_{t}\right)-{pD}*D\odot {x}_{t}+{pB}*\beta {u}_{t}\end{array}\end{array}$$
(8)

Model fitting and ensuing analyses were conducted in Python (v3.12.2) and Pytorch (v2.4.1).

Hyperparameters

Hyperparameter selection largely followed the original implementation of the MINDy model9, but with some simplifications made (Table 3).

Table 3 List of hyperparameters and parameter initializations

FMRI datasets

Two openly available fMRI datasets were analyzed: the SONG dataset (participants recruited in South Korea; N = 27) and the HCP dataset (participants recruited in the USA; N = 119). Both datasets were approved by the institutional review boards and de-identified for sharing. No information on participants’ age, sex, or gender was collected in the present study. Preprocessing steps followed Song et al.24. We applied a 200-parcel cortical atlas by Schaefer et al.29 where the BOLD activities of voxels corresponding to each parcel were averaged to represent the parcel activity9,19. The SONG dataset includes two runs of resting-state, two runs of controlled sustained attention task called the gradual-onset continuous performance task (gradCPT), two runs of comedy sitcom-watching, and one run of educational documentary-watching (3 T scans with TR = 1 s). The HCP dataset includes four runs of resting-state in 3 T (TR = 0.72 s) and four in 7 T (TR = 1 s), two runs of working memory tasks in 3 T (6 different types of cognitive task runs were excluded from analyses because the total TRs were less than 350 TRs), and four runs of movie-watching in 7 T. These runs varied in total duration, ranging from 405 to 1486 TRs.

Input features

Low-level visual features were characterized by hue, saturation, and pixel intensity, which were estimated per frame and averaged across frames within an event (rgb2hsv function in MATLAB R2024a). In the gradCPT run with face images, hue and saturation were excluded from the analyses because the images were presented in grayscale. Low-level audio features were represented with amplitude and pitch of left and right stereos, which were estimated per frame and averaged across frames within an event (audioread and pitch functions in MATLAB). Visuo-semantic features were represented by 512-dimensional embeddings of OpenAI’s pretrained Contrastive Language-Image Pre-training model61 (huggingface; clip-vit-base-patch32). For audio-semantic features, we first applied OpenAI’s WhisperX model to the audio file to transcribe speech and dialogues, along with their time stamps62. Transcripts were sampled at a TR resolution and were transformed using 512-dimensional embeddings of Google’s Universal Sentence Encoder63 (tensorflow v2.19.0; https://tfhub.dev/google/universal-sentence-encoder-multilingual/3). A mean value of the respective dimension was assigned to moments when speech or dialogue did not exist.

Embedding time series were convolved with a canonical hemodynamic response function and normalized across time per dimension. Principal component analyses were conducted on the concatenated embedding time series of the 3 movie-watching runs in SONG and 4 runs in HCP, respectively. We selected the top 100 principal component time series, which explained 74.97% and 77.69% of variance, respectively. Principal component analysis was conducted on the gradCPT face run in isolation, given that the same images were presented in the same sequence for all participants (100.00% of explained variance for 100 principal components). For gradCPT scene runs, because presented images and sequences differed for all participants, participant-specific embeddings were concatenated for a principal component analysis (85.34% of explained variance). Resulting principal component time series were again normalized across time per dimension to be used as stimulus input \(u\) in the model.

Validation of model parameters

Parameters were estimated through training the model on the data collected from an individual fMRI run (2500 training iterations). After parameters were fixed, we predicted the next-time-step neural activity from the observed neural activity time series to calculate explained variance (\({r}^{2}\)). Note that the data used for training and testing were the same. When comparing parameter \(W\) with an undirected functional connectivity matrix (parcel-by-parcel Fisher’s transformed Pearson’s correlation coefficients), we took the average of the upper and lower triangles of \(W\) (region \(i\)\(j\) and region \(j\)\(i\) in a directed graph) and took the cosine similarity with the edge strengths of the functional connectivity matrix. Due to the low temporal resolution of the fMRI data, our \(W\) estimates are largely symmetrical. Encoding coefficients were estimated using an ordinary least squares regression with a residual term. Cosine similarity between all values in \(\beta\) and values in encoding coefficients were computed. For both metrics, values in descriptive statistics were randomly shuffled 10,000 times, which served as the respective chance distribution. Z statistics and two-tailed p values were calculated with respect to chance distributions. Cosine similarities were calculated between parameter estimates of run pairs. The cosine similarity values were grouped into whether they correspond to the same vs. different individuals or the same vs. different movie stimuli, which were compared using Wilcoxon rank sum tests.

Attractor clusters and gradients

Attractors were estimated by forward simulating the observed neural activity pattern at each time step based on the estimated model parameters (Table 1). For each time step, we performed 5000 simulations. If there was no longer a change in the last 10 iterations of the 5000th forward simulations \(({{||}\dot{x}{||}}_{\infty } < 1{{\rm{e}}}-6)\), we considered the system to have converged to a fixed point. If the Euclidean distance between the pair of estimated fixed points obtained from different time steps was less than 0.1, those were grouped as the same attractor. Using this approach, we found that most fMRI runs converged to 2 or 4 attractors. Principal component analysis was applied to each individual fMRI run only for visualization purpose in a 3D space (Fig. 4A).

We then aggregated attractors (each defined as a neural activity pattern across 200 parcels) across all fMRI runs and applied k-means clustering to identify 4 attractor clusters. The choice of k = 4 was motivated by the observation that most runs exhibited either 2 or 4 attractors, with higher numbers occurring less frequently. The mean activity pattern of each cluster represented the attractor cluster (Fig. 4B, C). Two pairs of clusters exhibited a pattern correlation of −1. Principal component analysis was also applied to the aggregated attractor patterns, only for visualization purpose (Fig. 4B).

Cortical voxel gradients estimated by Margulies et al.36 were obtained from NeuroVault (https://identifiers.org/neurovault.collection:1598). Gradient values were averaged within each parcel to generate gradient estimates of the 200 parcels (Fig. 4D). The neural activity patterns of the attractor clusters were then compared to the parcel-level gradient values using Spearman’s rank correlation.

Relating speed and direction of neural dynamics toward attractors with measures of attention

With the neural activity pattern at each time step as an initial state (\({x}_{t}\)), we considered one-step forward simulation of \({\hat{x}}_{t+1}={x}_{t}+W{\psi }_{\alpha }\left({x}_{t}\right)-D\odot {x}_{t}+\beta {u}_{t}\) and separated vectors representing internally-driven change (\(W{\psi }_{\alpha }\left({x}_{t}\right)-D\odot {x}_{t}\), shortened to \(\vec{WD}\)) and externally-driven change (\(\beta {u}_{t}\), shortened to \(\vec{B}\)). The angle between the respective vector and the eventual end state was calculated with the following equation,

$$\theta=\arccos \left(\frac{v1\cdot v2}{{||}v1{||}\cdot {||}v2{||}}\right)$$
(9)

where \(v1\cdot v2\) represents the dot product of the two vectors, and \({||v}1{||}\) and \({||v}2{||}\) are the Euclidean norms of the vectors. Before applying the inverse cosine, the cosine similarity was clipped to the range of [−1, 1]. The magnitude of the vector was calculated as the Euclidean norm. Because these measures were estimated at each time step, repeating this across all time steps within a run resulted in their time series.

To probe attentional state changes during movie-watching, Song et al.24 asked participants to continuously rate their engagement, on a scale of 1 to 9, as they re-watched the movies after the fMRI scan. Each participant’s engagement rating was normalized across time and convolved with a hemodynamic response function. To probe attentional state changes during gradCPT, we analyzed participants’ button response times as in the previous study24. After linear interpolation of no button response trials and regressing out the linear trend, we calculated response time variability by taking the deviance from the mean response time at every TR. Because studies showed that moments of low response time variability correspond to high sustained attention or optimal task performance, the inverse response time variability time course was used as a proxy for attention dynamics44. Again, the inverse response time variability was normalized across time and convolved with a hemodynamic response function. See Song et al.24 for details of behavioral experiments and data analyses.

Angle and magnitude time series were correlated with attention measures sampled at a TR resolution using Spearman’s rank correlation, for each individual run. The mean of all participants’ Fisher’s r-to-z transformed \(\rho\) values were compared to permuted null distributions to calculate z statistics, where behavioral time series were circular-shifted across time with a random multiplication of either 1 or −1 (10,000 iterations). FDR correction was applied across 16 significance tests (Table 2).

The relationship between geometric measures and attention was analyzed separately depending on the position of the attractor at each time step, or where the eventual end state lies at the final round of forward simulation. Because we found that attractor clusters lie at the ends of the known primary and secondary gradients (Fig. 4), we categorized the attractor of each time step to one of the four ends of the two gradients. The attractor was labeled as either the DMN, UNI, VIS, or SM, based on the highest correlation coefficient. Correlation between the geometric measure and attention was calculated from a subset of time series corresponding to the respective attractor cluster category. In a majority of participants’ gradCPT runs, VIS and SM attractors did not appear. Because fewer than 10 fMRI participants’ gradCPT face or gradCPT scene runs included VIS and SM attractors, statistical analysis was not conducted for these cases. Significance was again tested by comparing the mean of Fisher’s r-to-z transformed \(\rho\) values to null distributions created from circular-shifted behavioral time series. Twelve comparisons were corrected for in Fig. 5D, and 24 comparisons were corrected for in Supplementary Fig. S6B using FDR correction.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.