Abstract
Complex behavior entails a balance between taking in sensory information from the environment and utilizing previously learned internal information. Studies on mice show that the brain continually alternates between outward and inward cognitive modes every few seconds, accompanied by stereotyped cascades of neuronal spiking. Our analysis of large fMRI datasets revealed a similar mechanism in humans. Human brain activity was punctuated every several seconds by coherent, propagating waves emerging in the exteroceptive sensorimotor regions and terminating in the interoceptive default mode network. As in mice, these waves in human fMRI are accompanied by phase-specific enhancements in sensory information encoding and memory retrieval. These findings suggest a conserved feature of mammalian brain physiology that bears directly on the integration of sensory and mnemonic information during everyday behavior.
Introduction
The human brain undergoes slow, spontaneous fMRI fluctuations during rest, in the absence of external stimulation and task engagement1,2. While this activity has been used widely to characterize the functional connectivity between brain regions2,3,4,5, its contribution to the normal operation of the brain has remained elusive. Two curious features of this activity that have drawn attention in recent years are its manifestation as discrete, quasi-periodic events and its spatiotemporal propagation across the brain6,7,8,9. Recent work describes such propagation as moving from low-order sensory-motor (SM) regions to high-order default mode network (DMN)7,10. This traversing of the cortical hierarchy has been compared to cross-layer error back-propagation required for optimizing artificial neural networks11,12, raising the prospect that these waves may play a physiological role in learning and memory consolidation.
Analogous global brain dynamics have been observed at the single neuron level in the mouse. These widespread fluctuations in neuronal firing are often coupled to spontaneous behaviors such as pupil dilation and locomotion, and show distinct delays between two major clusters of neurons13,14. More recently, these dynamics were shown to manifest as massive spiking cascades involving ~70% of recorded neurons across the forebrain and playing out over several seconds during stationary rest15. During both spontaneous activity and periods of visual stimulation, spiking cascades were coordinated in time with hippocampal sharp-wave ripples (SPW-Rs), a neurophysiological event known to be involved in memory functions16. In the case of visual stimulation, each cascade cycle involved a transition from a phase of high-efficiency sensory encoding to a phase of lower encoding during heightened SPW-Rs17. These transitions appear to reflect switches between exteroceptive sensory sampling and internal mnemonic processes over the timescale of seconds. This type of continuous alternation between external and internal processing modes has been observed across various brain systems, particularly the hippocampus, at multiple temporal scales18,19, and is hypothesized to play a critical role in learning20.
One conceptually appealing possibility is that the fMRI waves in the human brain reflect the same or homologous neurophysiological processes as the spiking cascades in the mouse brain. Indeed, they share common features. For example, both phenomena manifest as quasi-periodic events that occur over seconds time scales, affect global forebrain activity, and couple with measurable arousal fluctuations15,21, as evidenced by their correlation with pupil size10,22,23,24. In the absence of external stimulation, fMRI waves in the human brain propagate between two sets of brain networks showing opposite responses to cognitive tasks25,26,27. Similarly, the spiking cascade sequences in mice involve the interplay between two groups of neurons with opposite activity modulations during locomotion15. The hippocampus appears to be central to this large-scale coordination of activity. When hippocampal SPW-Rs were measured together with concurrent fMRI in the monkey, they were synchronized with fMRI changes across the brain28,29. Analysis of this synchronization revealed distinct delays between sensory/motor and higher-order regions that suggest cross-hierarchy propagation28,29. Nevertheless, it remains unknown whether propagating fMRI events are the macroscopic counterpart of neural firing cascades. More importantly, it is also unclear whether fMRI waves, like neural firing cascades, continue to occur during stimulation and relate to sensory and memory functions during wakefulness.
In the present study, we analyzed multiple human fMRI and mouse neuronal recording datasets to address this topic. We found that, like the spiking cascades in the mouse, the propagating fMRI waves in the human brain persisted during the performance of a visual memory task. The fMRI wave cycle was also similarly marked by alternating phases of stronger sensory information encoding and greater efficiency in memory retrieval. The cascades and fMRI waves were similarly synchronized to pupil dilations in humans and mice, suggesting a shared neuromodulatory basis. These findings thus demonstrate that similar, internally generated physiological cycles are coordinated with switches between exteroceptive sensory sampling and internal mnemonic processes in the human and mouse brain, suggesting an evolutionarily conserved principle of mammalian forebrain function.
Results
Fluctuating arousal entrains brain-wide events across the mouse and human forebrain
Pupil diameter is a surrogate signal for fluctuating arousal that is readily measured in both human and mouse subjects during rest30,31. In both species, we observed that the dynamic changes in pupil diameter were matched to the occurrence of brain-wide events, thus providing a means to compare spiking cascades and fMRI waves.
In mice, pupil size fluctuations, indicative of changes in the arousal state, were prominent during periods of immobility, with or without visual stimulation, as evidenced by data from the Allen Institute Visual Coding project32 (Fig. 1A, B). Across the brain, we found that pupil dilations coincided with periods of widespread spiking events, in which neurons fired sequentially in reproducible patterns (Fig. 1B, C, red arrows). Similar dynamics were derived previously without using pupil data and described as brain-wide spiking cascades15. We repeated the same analysis on a two-photon calcium imaging dataset33 and another large-scale Neuropixels dataset with broader coverage of the mouse brain34, revealing that these pupil-associated cascades were distributed broadly across the brain and involved multiple neuron subtypes (Figs. S1 and S2).
A Locations of neuropixel probes from all mice with major recording sites color-coded: visual cortex (blue), hippocampus (green), and thalamus (pink). Top: 3D illustration. Bottom: 2D projection on a middle brain slice. B Example pupil and spiking data from a representative mouse during stationary visual stimulation. Top: spontaneous fluctuation of pupil diameter with alternating dilation (red) and constriction (blue) phases, with dilation onsets marked by red dashed lines and triangles. Bottom: normalized spiking activity of recorded neurons sorted by the principal delay profile, revealing spiking cascades of sequential activations from negative-delay neurons (blue-symbolic-neurons) to positive-delay neurons (red-symbolic-neurons). C The normalized pupil diameter (top, mean ± SEM) and spiking activity (bottom) from this representative mouse were averaged based on pupil dilation onsets over an 8-s time window. D Schematic of a resting-state fMRI scan from HCP 7-Tesla dataset. E Example pupil and fMRI data from a representative human HCP subject at rest. Top: pupil diameter fluctuations. Bottom: concurrent fMRI signals of various brain regions sorted by principal gradient values36. F The normalized pupil diameter (top, mean ± SEM) and fMRI signals (bottom) were averaged at pupil dilation onsets over an 18-s time window, summarized from all 184 subjects. G The pupil-dilation-associated fMRI changes mapped onto a brain surface, shown at 7 time lags (0–12 s post-dilation onset). Rows display cortical surface maps (first two rows) and thalamic volume maps (third row). Directional arrows denote dorsal (D), anterior (A), and anatomical left (L) directions.
In humans at rest, we similarly found that pupil size changes were associated with highly structured fMRI changes across the brain (Fig. 1D, E). For this analysis, we used the Human Connectome Project (HCP) 7T dataset35. Alignment of fMRI time courses across the cortex to the onset of pupil dilation revealed a spatiotemporal sequence progressing along a principal gradient (PG) direction (Fig. 1F and Fig. S3; Supplementary Movie 1), which approximates the cortical hierarchy gradient36. These events manifested as infra-slow (multi-second) waves moving gradually from SM to DMN regions. These SM-to-DMN waves have been derived previously without using pupil data7,10. The cortical changes were accompanied by corresponding thalamic progression from posterior/lateral sensorimotor nuclei to anterior/medial limbic nuclei (Fig. 1G and Fig. S3F). Therefore, spontaneous pupil dilation events during immobile rest are synchronized with sequential brain dynamics of global involvement, observed as spiking cascades in mice and propagating fMRI waves in humans. Simultaneous EEG-fMRI recordings also revealed associated changes in delta-band (1–4 Hz) activity across the fMRI wave cycle, similar to those observed across the cascade cycle (Fig. S4). The duration of the waves appears to be constant (~11 s) and independent of brain size (Fig. S5).
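The alignment of brain signals to pupil dilation onsets described above is a form of event-triggered averaging. The following is a minimal sketch of that idea, not the pipeline used in this study. It assumes that a dilation onset can be approximated as a local minimum of the pupil trace and that the pupil and brain signals share one sampling grid; the function names `find_dilation_onsets` and `event_triggered_average` are hypothetical.

```python
from statistics import mean


def find_dilation_onsets(pupil):
    """Return sample indices where the pupil trace stops shrinking and starts growing.

    Assumption: a dilation onset is approximated by a local minimum of the trace.
    """
    onsets = []
    for t in range(1, len(pupil) - 1):
        if pupil[t] <= pupil[t - 1] and pupil[t + 1] > pupil[t]:
            onsets.append(t)
    return onsets


def event_triggered_average(signal, onsets, pre, post):
    """Average segments of `signal` spanning [onset - pre, onset + post] samples.

    Onsets whose window would run past either end of the recording are skipped.
    """
    segments = [
        signal[t - pre:t + post + 1]
        for t in onsets
        if t - pre >= 0 and t + post < len(signal)
    ]
    if not segments:
        return []
    # zip(*segments) groups samples that share the same lag relative to onset.
    return [mean(lag_samples) for lag_samples in zip(*segments)]


pupil = [3, 2, 1, 2, 3, 2, 1, 2, 3]
signal = [0, 0, 1, 5, 0, 0, 1, 7, 0]
onsets = find_dilation_onsets(pupil)
average = event_triggered_average(signal, onsets, pre=1, post=1)
```

Applying the same averaging to every region or neuron, and sorting the rows by principal gradient value or by delay, would give a lag-by-unit map comparable in layout to Fig. 1C and F.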
While the function of these brain-wide events is poorly understood, evidence in mice suggests that the spiking cascades mediate alternation between periods of external stimulus encoding and internal memory processing17. Might the fMRI waves in humans likewise be linked to the switch between exteroceptive and interoceptive modes of brain function? To address this question, we investigated the occurrence of spontaneous propagating fMRI waves as human subjects performed a cognitive task involving memory. Specifically, we assessed whether the sensory encoding of stimuli and successful memory retrieval performance varied as a function of these spontaneous events.
Visual stimulus encoding predicts subsequent memory function
To systematically investigate the role of propagating SM-to-DMN waves in human cognition, we first needed to establish a reliable method for evaluating the encoding of visual stimuli from fMRI responses across the brain. We developed a method to do this using the Natural Scenes Dataset (NSD)37, in which a series of 10,000 captioned natural images were shown, in the form of 4-s trials, to each of 8 subjects with each image being presented three times over 40 scan sessions on different days. For each trial, the subjects needed to indicate whether they had seen the stimulus before (Fig. 2E).
A Framework for training and evaluating a CLIP-based semantic decoder. An fMRI encoder was trained to map stimulus-evoked fMRI responses into CLIP embedding space via contrastive learning. For evaluation, the trained fMRI encoder encodes fMRI responses to fMRI embeddings in CLIP space, which are decoded by pre-trained caption decoder38 to generate text descriptions, which were then compared to ground truth captions for evaluation. B Representational dissimilarity matrices based on cosine similarity for semantic CLIP embeddings (left) and fMRI embeddings (right) showed highly similar (p = 0; the permutation test) structures. C Visualization of trained fMRI embeddings using t-SNE reveals categorical distinctions, with categories color-coded. D Examples of correctly and incorrectly decoded samples, from top to bottom: image stimuli IDs in the NSD dataset, ground truth captions, and captions decoded from fMRI response. E Schematic of task design in the NSD dataset. Eight participants viewed 10,000 unique images, each presented three times over 30–40 scans over a year, using a 4-s event-related design. F Box plot comparing the encoding accuracy, i.e., the proportion of correctly decoded samples, between the trained and untrained fMRI encoders (p = 2.8 × 10−7, N = 8, two-sided paired t-test). Dots represent participants, and the dotted line indicates 5% chance level. G Influence of encoding accuracy on subsequent memory task performance. Stimuli are binned by initial encoding accuracy (20% increments). Memory performance at the second presentation showed significant differences across bins (p = 2.8 × 10−7, one-way ANOVA, N = 195,000 trials, 5 groups). Error bars represent the standard error of the mean (SEM). H Box plot showing significant difference (p = 0.004, N = 8, two-sided one-sample t-test) in subsequent memory performance between trials with correct and incorrect initial encoding. Dots represent participants. 
Box plots show the interquartile range (boxes), minima and maxima (whiskers), and the median (center line). Source data are provided as a Source Data file.
To quantify the level of sensory stimulus encoding, we developed a deep learning model to decode semantic information of each image stimulus based on fMRI responses it evoked (Fig. 2A and Fig. S6A). The model comprised an fMRI encoder, which extracted latent representations from the fMRI responses, i.e., the fMRI embeddings, and a caption decoder38, which translated the fMRI embeddings into descriptive text captions. The fMRI encoder was trained to align the fMRI embeddings with the contrastive language-image pre-training (CLIP) embedding space39 through contrastive learning40. We then derived a composite semantic similarity (CSS) score to quantify the semantic similarity between the fMRI-decoded caption and the original caption. An image trial is considered correctly decoded if the generated caption shows significant (p < 0.05) similarity compared to a null distribution created from randomly sampled captions (see “Methods” for details). The performance of the decoding model thus serves as a measure of the accuracy of brain encoding of semantic information.
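The decision rule in the preceding paragraph, in which a trial counts as correctly decoded when its caption similarity is significant against a null built from randomly sampled captions, can be sketched as an empirical p-value. This sketch makes two loud assumptions: a simple token-overlap (Jaccard) score stands in for the CSS score, whose actual definition is given in the Methods, and the null is formed by comparing randomly drawn captions with the ground-truth caption. All function names are hypothetical.

```python
import random


def token_jaccard(caption_a, caption_b):
    """Stand-in similarity (assumption): word-set overlap, NOT the paper's CSS score."""
    words_a = set(caption_a.lower().split())
    words_b = set(caption_b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def empirical_p_value(decoded, truth, caption_pool, n_null=1000, seed=0,
                      similarity=token_jaccard):
    """Fraction of null similarities at least as large as the observed one."""
    rng = random.Random(seed)
    observed = similarity(decoded, truth)
    null_scores = [similarity(rng.choice(caption_pool), truth) for _ in range(n_null)]
    n_exceed = sum(score >= observed for score in null_scores)
    # The +1 correction keeps the estimate strictly positive.
    return (n_exceed + 1) / (n_null + 1)


def is_correctly_decoded(decoded, truth, caption_pool, alpha=0.05, **kwargs):
    """Apply the p < alpha criterion to a single trial."""
    return empirical_p_value(decoded, truth, caption_pool, **kwargs) < alpha


pool = [
    "a dog runs on grass",
    "two people eat pizza",
    "a train at the station",
    "a plane in the sky",
]
truth = "a cat sleeps on a sofa"
p_match = empirical_p_value("a cat sleeps on a sofa", truth, pool)
p_unrelated = empirical_p_value("zebra", truth, pool)
```

Under a rule of this kind, an uninformative decoder passes the threshold in roughly 5% of trials, which is consistent with the 5% chance level quoted for the untrained encoder.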
Our deep learning model successfully decoded the semantic information associated with the visual stimuli based on the fMRI responses they evoked. The representational similarity analysis confirmed the alignment between fMRI and caption embeddings (Fig. 2B, Fig. S6B, C), and the fMRI embeddings after training are organized into distinct categories in a low-dimensional space (Fig. 2C)41. The trained model generated correctly decoded captions (those significantly similar to the ground truth captions) in 33.0 ± 4.2% (mean ± SD) of trials (see Fig. 2D and Fig. S7A for examples), compared to the 5% chance-level performance of the untrained model (Fig. 2F, p = 2.8 × 10−7; and Fig. S7B). In the context of the cognitive task, the semantic encoding accuracy reliably predicted subsequent memory performance: a higher CSS score at the first appearance of an image stimulus led to a higher rate of correct recall at its second repeat (Fig. 2G and Fig. S8). This relationship held true across subjects, with subsequent memory performance significantly better for trials with correct initial encoding compared to those without (Fig. 2H). Using this trained model, we next evaluated whether the occurrence of spontaneous propagating fMRI waves might bear on the quality of stimulus encoding, subsequent memory recall, or both.
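The representational similarity analysis mentioned above compares the pairwise geometry of two embedding sets. A minimal sketch follows, assuming cosine dissimilarity for the representational dissimilarity matrices (as in Fig. 2B) and a Pearson correlation of their upper triangles as the summary statistic. The permutation test used for significance is not reproduced here, and the function names are hypothetical.

```python
import math


def cosine_dissimilarity(u, v):
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rdm_upper_triangle(embeddings):
    """Pairwise dissimilarities for all i < j, flattened in row-major order."""
    n = len(embeddings)
    return [
        cosine_dissimilarity(embeddings[i], embeddings[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rsa_score(embeddings_a, embeddings_b):
    """Correlate the representational geometry of two embedding sets (same item order)."""
    return pearson(rdm_upper_triangle(embeddings_a), rdm_upper_triangle(embeddings_b))


clip_like = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fmri_like = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]   # same geometry, different scale
shuffled = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]    # items reordered
score_same = rsa_score(clip_like, fmri_like)
score_shuffled = rsa_score(clip_like, shuffled)
```

Because cosine dissimilarity ignores vector length, the rescaled set yields a score of 1, whereas reordering the items lowers it. Shuffling item order in this way is also the usual basis of a permutation null for such a score.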
Alternating stimulus encoding versus memory recall during propagating fMRI waves
To explore the potential relevance of propagating fMRI waves for encoding and memory performance, we first established their presence during the cognitive task. These waves were identified directly from task fMRI data without using pupil data7 (Fig. 3A). As in the resting state, propagating waves remained closely tied to pupil diameter fluctuations, which were affected to a lesser extent by task events (Fig. S9A). The duration of the SM-to-DMN waves (~10–15 s) is much longer than the task trials (4 s), and their occurrence and relationship to pupil fluctuations are dissociated from the structure of the concurrent cognitive task (Fig. S9B).
A Detection of SM-to-DMN propagating waves during the task. The waves were detected using template-matching methods7, with the principal gradient (PG) map as the template. Six examples of detected waves, with similarity values showing correlations between their delay profile and the PG template. B Schematics of memory encoding and recall measurements. C Opposite modulations of semantic/memory encoding and memory recall over the cycle of the SM-to-DMN waves. The averaged pattern of the detected SM-to-DMN waves (first row) was shifted 6 s backward in time to account for the hemodynamic response delay. Time zero marks the onset of the global mean signal increase (dashed line), which appears to correspond to the cascade center (D), as indicated by the timing of the upswing in semantic and visual encoding accuracies for human and mouse data. Both pupil size (second row) and memory recall (green, bottom row) change significantly across the wave cycle and peak at the DMN-activated phase, whereas the semantic encoding (third row) and memory encoding (purple, bottom row) are modulated in an opposite manner. Shadows represent areas within one SEM (N = 8 subjects). D Opposite modulations of visual encoding accuracy (third row) and hippocampal SPW-R rate (bottom row) across the spiking cascade cycle (first row) during stationary periods of continuous natural image stimulation in mice, adapted from ref. 17. Time zero (dashed line) marks the onset of positive-delay neuron firing. Shadows represent areas within one SEM (N = 20 mice). Source data are provided as a Source Data file.
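The template-matching step in panel A scores each candidate event by correlating its delay profile with the PG template. A minimal sketch of that scoring is given below under stated assumptions: the delay of a region is taken as the time of its peak signal within the candidate segment, and the acceptance threshold of 0.5 is purely illustrative. Neither choice is taken from the cited method7, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def peak_delays(segment, tr):
    """Delay profile: time (in seconds) of the peak sample of each region.

    `segment` is a list of per-region time series; `tr` is the sampling interval.
    Assumption: peak time is used as the delay estimate.
    """
    return [tr * max(range(len(series)), key=series.__getitem__) for series in segment]


def wave_similarity(segment, pg_values, tr):
    """Correlation between the delay profile and the principal gradient template."""
    return pearson(peak_delays(segment, tr), pg_values)


def is_sm_to_dmn_wave(segment, pg_values, tr, threshold=0.5):
    """Accept a segment whose delays increase along the gradient (illustrative threshold)."""
    return wave_similarity(segment, pg_values, tr) > threshold


pg = [-1.0, 0.0, 1.0]                         # low (SM) to high (DMN) gradient values
forward = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # low-gradient region peaks first
backward = list(reversed(forward))            # high-gradient region peaks first
r_forward = wave_similarity(forward, pg, tr=1.0)
r_backward = wave_similarity(backward, pg, tr=1.0)
```

A positive correlation indicates that low-gradient (SM) regions lead high-gradient (DMN) regions, and a negative value indicates propagation in the opposite direction.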
To evaluate the potential influence of these propagating fMRI waves on sensory encoding efficiency, we used the deep learning-based fMRI decoding method described above to assess the quality of encoding for each stimulus presentation. We found that the accuracy of such encoding varied systematically across the SM-to-DMN propagation cycle (Fig. 3C, red trace). Accounting for hemodynamic delays (see “Methods”), the stimulus encoding was strongest at the SM-activated phase of the propagating wave, showing a significant increase relative to the DMN-activated phase (p = 9.8 × 10−5, N = 8, two-sided paired t-test). Similar results were reproducible across various semantic similarity metrics (Fig. S10A).
We also used human memory performance to assess how fMRI waves affected both stimulus encoding and memory recall. For encoding, accurate memory of a given stimulus during its second appearance (Repeat #2 in Fig. 3B) was taken as a measure of strong encoding during the initial presentation (Repeat #1 in Fig. 3B), during which the wave dynamics were examined. We found that the strength of encoding (i.e., subsequent memory) was closely related to the phase of the SM-to-DMN wave at the time of stimulus presentation. Stimuli presented during the SM-activated phase had a higher efficacy of encoding than those presented during the DMN-activated phase (p = 2.2 × 10−3, N = 8, two-sided paired t-test; Fig. 3C, purple trace), thus matching the fMRI deep learning-based measure of stimulus encoding described just above (Fig. 3C, red trace). By contrast, evaluation of recall performance, which was estimated during the 2nd and 3rd presentations of a stimulus along with fMRI wave dynamics, revealed a significant peak performance later in the wave cycle, when the subject’s recall coincided with the DMN-activated phase (p = 1.2 × 10−3, N = 8, two-sided paired t-test; Fig. 3C, green trace). These results were similar for both short-term (i.e., within the same day) and long-term (i.e., across days) memory types (Fig. S10B and S10C).
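Relating trial outcomes to wave phase, as in the analyses above, amounts to binning trials by their time relative to each wave after correcting for the hemodynamic delay. The sketch below assumes wave onsets are given in BOLD time and are moved 6 s earlier, matching the shift described for Fig. 3C. The window and bin width are illustrative, and the function name is hypothetical.

```python
def phase_binned_accuracy(trial_times, trial_correct, bold_wave_onsets,
                          hemodynamic_shift=6.0, t_min=-6.0, t_max=12.0,
                          bin_width=6.0):
    """Proportion of correct trials in each time bin relative to wave onset.

    Wave onsets measured on the BOLD signal are shifted earlier by
    `hemodynamic_shift` seconds to approximate the underlying neural timing.
    Bins with no trials are reported as None.
    """
    n_bins = int((t_max - t_min) / bin_width)
    hits = [0] * n_bins
    counts = [0] * n_bins
    neural_onsets = [onset - hemodynamic_shift for onset in bold_wave_onsets]
    for time, correct in zip(trial_times, trial_correct):
        for onset in neural_onsets:
            relative = time - onset
            if t_min <= relative < t_max:
                index = int((relative - t_min) // bin_width)
                hits[index] += int(correct)
                counts[index] += 1
    return [h / c if c else None for h, c in zip(hits, counts)]


# One wave whose BOLD onset is at 26 s, hence an inferred neural onset at 20 s.
times = [16.0, 22.0, 24.0, 30.0, 100.0]
correct = [True, False, True, True, False]
accuracy = phase_binned_accuracy(times, correct, bold_wave_onsets=[26.0])
```

The same binning applies to any per-trial outcome, whether the semantic decoding result or the behavioral memory response, so the traces in Fig. 3C can be read as different outcomes passed through one procedure.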
The modulation of stimulus encoding and memory recall was further supported by subcortical changes during propagating fMRI waves. The hippocampus was activated at the late wave phase but its activation level was only weakly associated with memory recall performance (Fig. S11C, D). Moreover, anterior and medial thalamic regions, which are associated with memory and limbic functions42,43, are activated more at the late phase, whereas posterior thalamic regions most involved in sensory processing are activated earlier in the wave cycle (Fig. S11E).
These cyclic modulations of stimulus encoding and memory recall in humans resembled analogous observations in mice during different phases of the spiking cascades17 (Fig. 3D). Specifically, the SM-activated phase of the fMRI wave matched a period within the cascade cycle (0–0.5 s) of improved stimulus encoding, whereas the DMN-activated phase aligned with a different period within the cascade cycle (0.5–2 s) of increasing hippocampal SPW-R rate, which was also associated with pupil dilation (Fig. 3C, D). While the hippocampal SPW-R rate and memory performance are clearly different measures, they may point to similar processes that transpire during more introspective modes of brain activity, commonly associated with activity of the DMN44,45.
Visual semantic information coding in multiple brain regions is similarly modulated by the SM-to-DMN wave cycle
Repeating the semantic decoding using only regional fMRI data suggested that the semantic information was encoded across a wide range of brain regions, with the highest encoding accuracy observed in the visual cortex (Fig. 4A, B). Importantly, the fMRI encoding accuracy in all individual regions was systematically modulated over the SM-to-DMN wave cycle (Fig. 4C and Fig. S12) in a manner similar to the whole-brain finding (Fig. 3C). Interestingly, the DMN, particularly its C division that encompasses the hippocampal complex and is adjacent to visual association areas, exhibited its peak encoding accuracy at the SM-activated phase, while its activation was relatively low, suggesting a dissociation between sensory encoding and regional activation level. These region-specific results on visual semantic encoding are consistent with those on cascade-dependent visual encoding17, further suggesting that the spiking cascades and cross-hierarchy waves represent the same underlying neurophysiological process present in mice and humans, and possibly conserved across mammals more broadly.
A Cortical surface map showing the regional significance (paired t-tests: trained vs. untrained) of semantic encoding accuracy. B A box plot showing the semantic encoding accuracy estimated for different brain regions defined in the Yeo-17 networks atlas101,113. Each dot represents an individual participant. The significance of encoding accuracy was assessed by comparing the trained model to untrained model, similar to the analysis in Fig. 2F. Asterisks denote the levels of statistical significance (two-sided paired t-test, N = 8): *, 0.01 < p < 0.05; **, 0.001 < p < 0.01; ***, p < 0.001. Exact p values are listed in Table S3. Box plots show the interquartile range (boxes), minima and maxima (whiskers), and the median (center line). C Semantic encoding accuracy (solid line) is consistently modulated over the SM-to-DMN wave cycle in six brain regions exhibiting the most significant encoding accuracy. There is no significant variation in the phase of peak encoding accuracy among these regions (p = 0.64, one way ANOVA, N = 48, 6 groups). For comparison, the average activation for each region is marked with a dashed line, which is shifted ahead of time by 6 s to account for the hemodynamic response delay. These regions are distinctly color-coded and their locations are indicated on a flattened cortical surface. Time series data are provided as the mean ± SEM for eight participants (N = 8). Source data are provided as a Source Data file.
Discussion
Here we showed that infra-slow (<0.1 Hz) fMRI waves propagating over the human cortical surface are associated with reciprocal modulation of encoding and retrieval of information conferred by visual stimuli, resembling those previously tied to spiking cascades in mice. First, we analyzed electrophysiological and fMRI measures of brain activity, demonstrating that spontaneous pupil dilations are similarly accompanied by spiking cascade dynamics in mice and SM-to-DMN propagating waves in humans, thereby unifying these two types of infra-slow (<0.1 Hz) global brain activity across different spatial scales and species. We then assessed the semantic encoding of visual stimuli using a CLIP-based deep learning model, finding that the SM-to-DMN propagating waves continued to occur during task performance and were associated with opposing modulation in both encoding and retrieval of the stimulus content. The encoding of semantic information and memory peaked at the early phase of SM-activation, whereas memory retrieval accuracy reached the maximum at the DMN-activated phase. Together with previous findings in mice, these results suggest that the highly structured infra-slow global brain activity may serve as an evolutionarily conserved mechanism by which the brain orchestrates the execution of exteroceptive sensory sampling and internal mnemonic processes on the timescale of seconds.
The brain’s response to identical sensory stimuli is known to vary over time even on the timescale of seconds. Previous studies have shown how pre-stimulus ongoing activity and arousal state may contribute to this variability13,31,46,47,48,49,50,51,52. Our findings align with and extend these previous reports. Leveraging recent advances in deep learning techniques, our study goes beyond a simple quantification of response amplitude2,49,50 and assesses the accuracy of the brain’s encoding of semantic information. Importantly, most previous studies have presumed that ongoing brain activity and changes in arousal occur spontaneously and randomly. As a result, much focus has been on the response modulation of ongoing activity that is temporally locked (prior to the stimulus) and spatially restricted (confined to the same local brain region). In contrast, we consider the effects of internal fluctuation in the context of highly structured brain dynamics (i.e., the spiking cascade or propagating wave) involving the large-scale coordination of activity. The initiation of these recurring global brain events is independent of visual stimulation and memory tasks, and they are associated with sensory processing quasi-periodically in a continuous and persistent way.
We further found that memory retrieval was modulated over the SM-to-DMN wave cycle in a manner opposite to that of stimulus encoding, matching our previously observed reciprocal modulation of hippocampal SPW-R rate and visual encoding17. This previous study did not, however, identify specific memory functions or other cognitive operations associated with SPW-Rs during the task, since SPW-Rs are usually observed during rest and sleep and often linked to offline memory consolidation29,53,54. By comparing our human study results with these prior findings in mice, we found a correspondence between the cascade phase of high SPW-R rate and the wave phase of fMRI DMN activation, which is associated with a better performance in memory retrieval. This observation largely agrees with a series of recent studies on different species that linked SPW-Rs during tasks to memory retrieval55,56,57, as well as the marked fMRI DMN activations44,45,58,59.
Notably, we applied a global 6-s shift to the fMRI BOLD signals to infer underlying network activity while accounting for hemodynamic delays (Figs. 3C and 4C). A 1–2 s deviation of this modeled delay from the true value would be minimal relative to the duration of fMRI waves and, therefore, is not expected to affect our interpretation of their temporal relationship with modulations in sensory and memory functions. Without this time shift, however, the periods of stronger semantic encoding and poorer memory retrieval would correspond to the baseline preceding the SM-to-DMN waves, rather than to the SM-activated phase.
The observed modulation of sensory and memory functions over the cascade/wave cycle may be associated with changes in the direction of information flow, particularly between the cortex and hippocampus. Memory retrieval during tasks and memory consolidation during rest and sleep likely require information flow from the hippocampus to the cortex, whereas the encoding of sensory information and memory would be facilitated by reversed flow29,60,61. Thus, the cascade/wave phases optimized for sensory encoding and memory retrieval may be dominated by opposite directions of information transmission, which may rely on distinct spatial gradients in activation levels. In fMRI, such activation gradients are obvious for SM-to-DMN waves with dominant SM or DMN activation at different phases. This is less clear for cascades, since the negative- and positive-delay neurons were found in all recorded brain regions15. However, the hippocampal regions, especially CA1 and the dentate gyrus (DG), contain a much higher number of negative-delay neurons compared to any other areas, including all visual areas, whereas the thalamus contains the fewest. The fMRI signals may arise from the summation of activity of these two types of neurons, and the apparent wave propagation may relate to their distinct composition across different brain regions and across cortical hierarchies. Thus, the activation gradient between these two neuronal groups can be translated into spatial gradients between the hippocampus, cortex, and thalamus. We hypothesize that these gradients, alternating on the multi-second timescale, determine the dominant direction of information flow, which itself occurs on much faster (millisecond) timescales. This hypothesis remains to be tested by future studies. 
It is worth noting that artificial neural networks also feature alternating forward/backward information flows across hierarchical layers during training11,12, which may thus represent a mechanism essential to the learning of all connection-based intelligence systems.
The cascade and wave dynamics reported here might represent a fundamental mechanism by which the brain coordinates the opposing operations of exteroceptive sensory sampling and internal mnemonic processes. The switching between these processes over seconds may establish a dynamic balance essential for optimizing cognitive/mental processes of the same multi-second timescale62,63,64,65, likely achieved under states of intermediate arousal66,67. Highly aroused states could break this balance by terminating this infra-slow global dynamic. Locomotion, presumably associated with heightened arousal, has been found to replace cascade dynamics with sustained firing of the positive-delay neurons17 that are expected to enhance sensory and memory encoding but impede memory retrieval68,69. Toward the other end of this spectrum, during drowsiness, the infra-slow global dynamic may prolong the memory consolidation phase while hindering encoding. The SM-to-DMN waves have been found to occur more frequently during various sleep stages and to be associated with learning-related features (i.e., the rapid eye movements and possibly Ponto-Geniculo-Occipital waves) during rapid eye movement (REM) sleep70. Although not directly focused on the cascades and waves, recent studies have converged on an essential role of infra-slow neural dynamics in learning and memory. In addition to hippocampal SPW-Rs, infra-slow dynamics have been found to simultaneously coordinate the density of sleep spindles, an electrophysiological feature that has relevance for learning and memory71,72. Importantly, the amplitude of infra-slow dynamics during sleep, defined through spindle density and cardiac rate, is not only correlated with memory performance on the subsequent day73, but optogenetically enhancing it also leads to improved memory74.
Similar to the modulation of brain activity during cascades and waves described here, this spindle-based infra-slow dynamic alternates between an offline phase, characterized by higher spindle and hippocampal SPW-R rates with low arousal, and an online phase, marked by lower spindle and ripple rates with higher arousal and susceptibility to external stimulation73.
The neural mechanisms underlying these global brain dynamics remain unclear. However, they are unlikely to be mediated purely through corticocortical axon conduction because of the highly mismatched speeds. Instead, the neuromodulatory systems, particularly the cholinergic system, may play a crucial role in the generation of these global brain dynamics. Consistent with the associated arousal modulation, the fMRI waves are accompanied by subcortical de-activation specifically at the basal forebrain nucleus basalis (NB) and brainstem arousal-related nuclei, including the locus coeruleus75. Pharmacological deactivation of the NB on one side of monkey brains suppressed the global mean BOLD (gBOLD), whose peaks often represent the SM-to-DMN waves, on the ipsilateral side76. The global brain dynamics might originate from interactions among neurons in this subcortical region, which are then broadcast to the cortex and other brain regions through widespread cholinergic projections77,78,79 to affect the dynamics of the two distinct groups of neurons, i.e., the negative- and positive-delay neurons. These neuron-level dynamics, i.e., the spiking cascades, are then translated into the apparent fMRI waves across hierarchies due to the systematic modulation of their composition along this same direction. However, the validation of this hypothesis would require future studies with neural recordings at distinct scales.
The SM-to-DMN propagating wave and its effect on sensory and memory functions may offer explanations for some previous task fMRI observations. Graph-theory metrics based on fMRI connectivity/correlations, such as cartography and network flexibility, have been used to quantify brain dynamics and have been associated with various cognitive components, particularly learning64,80,81. Most of these metrics focus on assessing the integration and segregation of the large-scale networks, which are expected to be profoundly affected by the presence of the global SM-to-DMN waves. Thus, the waves could be an important contributor to these metrics of network dynamics. Another related phenomenon is the so-called encoding/retrieval flip, in which the de-activation and activation of the posteromedial cortex, a key component of the DMN, are preferentially associated with successful memory encoding and retrieval, respectively82,83,84. This phenomenon can be explained by our finding that memory encoding and recall were oppositely modulated over the wave cycle with distinct DMN activations. Importantly, the present study extends this earlier research by incorporating the previous findings into the framework of highly structured cross-hierarchy propagating waves, which persist under various brain conditions beyond tasks.
Finally, the SM-to-DMN waves may also relate to memory dysfunction in Alzheimer’s disease (AD). The gBOLD signal has been repeatedly linked to various AD pathologies85,86,87. The gBOLD peaks (also SM-to-DMN waves21) have been found to be coupled to strong cerebrospinal fluid (CSF) movements, known to be essential for peri-vascular waste clearance88,89,90. The strength of this gBOLD-CSF coupling is indeed associated with the accumulation of amyloid-beta and tau86,87. In particular, the failure of the SM-to-DMN waves to reach the DMN appeared to account for preferential amyloid-beta accumulation at these higher-order regions during early stages of AD86. The link between the fMRI waves and waste clearance may be at least partly attributed to associated non-neural physiological modulations, particularly infra-slow vasomotion and CSF dynamics21,89,91,92. Besides the toxic protein accumulation, AD also features dysfunctions in memory and subcortical neuromodulatory systems93,94,95,96, both of which are linked to the cascades and global waves10,15. Thus, changes in this infra-slow global dynamic may also relate to the dysfunction of the memory and arousal systems in AD.
Methods
Datasets
Allen Institute Visual Coding Neuropixel dataset
The dataset comprises high-density extracellular neuronal recordings from 58 mice (13 females) using Neuropixel probes32. Each mouse was implanted with up to six Neuropixel probes, which targeted the primary visual cortex and five high-order visual cortical areas. The silicon probes were inserted to a depth of up to 3.5 mm into the brain, enabling the recording of spiking activity within two visual thalamic nuclei, i.e., the lateral posterior nucleus (LP) and the lateral geniculate nucleus, as well as other regions that the probes traversed, such as the hippocampus.
Allen Institute Visual Coding two-photon calcium imaging dataset
The dataset comprises single-neuron recordings from the mouse visual cortex, obtained through 2-photon fluorescence imaging. Utilizing transgenic tools, these recordings specifically targeted the activities of distinct populations of Cre-defined neurons. The dataset includes a total of 63,251 neurons from 14 different transgenic lines, covering 6 cortical areas and 4 cortical layers. For our study, we focused on two Cre lines (Cux2-CreERT2 and Emx1-IRES-Cre) that had the highest number of neurons recorded. Further details can be found in ref. 33.
UCL mice neuropixel recording dataset
The UCL mice dataset includes recordings from ~30,000 neurons across 43 brain regions in mice, utilizing Neuropixels probes to cover the entire left hemisphere. In each mouse, two or three probes were inserted simultaneously, allowing for the concurrent recording of hundreds of neurons during each session. The study comprised 92 probe insertions across 39 sessions from 10 mice (6 females), with an average of ~747 neurons recorded per session. Additional details are available in ref. 34.
Human Connectome Project (HCP)
We utilized the WU-UMinn HCP 7T dataset, a subset of the HCP S1200 release35, comprising 7T fMRI data from 184 subjects (112 females) within the age range of 22–35. Our analysis focused on two 15-min, eyes-open resting-state fMRI sessions, with a repetition time (TR) of 1 s and 1.6 mm isotropic voxels. Simultaneous eye tracking was conducted using an EyeLink device with a sampling rate of 1000 Hz. The resting-state HCP fMRI volumetric data were preprocessed using the minimal preprocessing pipeline97, and artifacts were further removed with ICA-FIX denoising98. The fMRI data were then spatially smoothed with a 2 mm full width at half maximum (FWHM) Gaussian kernel and temporally filtered with a bandpass of 0.001–0.15 Hz. Finally, the signal from each voxel was standardized by subtracting the mean and dividing by the standard deviation.
Natural Scenes Dataset (NSD)
The NSD includes whole-brain 7T fMRI scans with a repetition time (TR) of 1.6 s and 1.8 mm isotropic voxels, conducted during a visual memory task37. This dataset involved eight human participants (6 females) who were shown between 9000 and 10,000 natural scene images. Each image was presented three times, resulting in a total of 22,000 to 30,000 trials across a span of 1 year. On every trial, as participants viewed an image stimulus, they were required to report whether they perceived the image as novel. The NSD fMRI data underwent initial preprocessing to correct for head motion, EPI distortion, and gradient nonlinearities, and to align data across scan sessions. Analyses of fMRI data for each subject were performed in the subject-native space. For the fMRI time series analysis (i.e., propagation), nuisance parameters such as motion, white matter, and CSF signals were regressed out from the volumetric fMRI data, which were then smoothed spatially with a 2 mm FWHM Gaussian kernel and filtered temporally with a bandpass of 0.001–0.15 Hz. Each voxel’s signal was then standardized to have zero mean and unit standard deviation. For analyzing the hemodynamic response of single trials (the inputs of the decoding models), GLMdenoise, a generalized linear model (GLM) approach, was used to provide estimates of the BOLD amplitude while reducing noise by integrating nuisance regressors37,99.
Simultaneous EEG-fMRI resting-state dataset
Resting-state fMRI data were collected simultaneously with electroencephalography (EEG) for 27 subjects (14 females, average age: 22.1\(\pm\)3.1 years). For our analysis, we only utilized the 10-min resting-state scan. The fMRI imaging data were acquired using a 3 T scanner, with a repetition time (TR) of 2.1 s and 3 mm isotropic voxels. The EEG data were gathered using a 32-channel MR-compatible EEG system, with a recording sampling rate of 5000 Hz. Additional details can be found in ref. 21. The resting-state fMRI BOLD data were preprocessed using scripts from the 1000 Functional Connectomes Project with slight modifications100. Nuisance parameters, including linear and quadratic trends, motion parameters, white matter, and CSF signals, were regressed from the fMRI data. The volumetric data were then smoothed spatially with a 2 mm FWHM Gaussian kernel and temporally filtered with a bandpass of 0.001–0.15 Hz.
For every fMRI dataset analyzed, we utilized the Schaefer 400 Parcellations (Yeo-17 Network version)101 to obtain cortical signals from the volumetric fMRI data. This was accomplished by averaging the standardized voxel signals located within each parcel. Similarly, signals from thalamic nuclei/regions were extracted utilizing the Morel Atlas102, and signals from the brainstem nuclei, part of the ascending arousal network, using the Harvard AAN atlas103.
Informed written consent was obtained from all participants in the human EEG-fMRI dataset. Data analysis adhered to usage agreements, and human data collection was conducted in compliance with protocols approved by the Institutional Review Board at Pennsylvania State University (protocol numbers: STUDY00005969 and STUDY00015305).
Pupil size
Pupil areas recorded for humans were obtained using the EyeLink device, with raw data sampled at 2000 Hz for the NSD dataset and 1000 Hz for the HCP dataset. We converted the pupil area into pupil diameter and then downsampled the data to 50 Hz. Missing pupil data, resulting from false detections or eye blinks, were interpolated using data from the nearest time points. Subsequently, we synchronized the pupil data with each TR of the concurrent fMRI signal. Periods with substantial missing data (more than 50%) were removed from subsequent analyses.
Pupil areas recorded for mice were captured using cameras, with a sampling rate of 100 Hz for the UCL Neuropixel dataset and 30 Hz for both the Allen Institute Neuropixel and two-photon datasets. We converted the pupil area into diameters and then resampled the data to a uniform rate of 30 Hz. Missing pupil data were then interpolated using the nearest time points to ensure continuity in the dataset.
To identify dilation events within infra-slow pupil fluctuations, we utilized a low-pass filter on the pupil size time series, setting a cut-off frequency at 0.15 Hz for human datasets and 0.3 Hz for mouse datasets. Pupil dilation events were defined as the periods where dilation lasted for at least 1 s in the filtered pupil size data.
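The dilation-event definition above can be illustrated with a short sketch. It is not the authors' implementation: the filter type is not stated in the text, so a simple FFT brick-wall low-pass is assumed here, and the function names (`lowpass_fft`, `dilation_events`) are hypothetical. A dilation event is taken to be a run of consecutive samples over which the filtered pupil size keeps increasing for at least 1 s.

```python
import numpy as np

def lowpass_fft(x, fs, cutoff_hz):
    """Brick-wall low-pass: zero all Fourier components above cutoff_hz (assumed filter type)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def dilation_events(pupil, fs, cutoff_hz, min_duration_s=1.0):
    """Return (start, end) sample indices of runs where the filtered pupil
    size increases continuously for at least min_duration_s."""
    smooth = lowpass_fft(np.asarray(pupil, dtype=float), fs, cutoff_hz)
    rising = np.diff(smooth) > 0
    # Pad with False so that every rising run has a detectable start and end.
    padded = np.concatenate(([False], rising, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first sample of each rising run
    ends = np.flatnonzero(edges == -1)    # sample at which the run peaks
    min_len = int(round(min_duration_s * fs))
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]
```

With the cut-offs given in the text, this would be called as `dilation_events(pupil, fs=50, cutoff_hz=0.15)` for human data and with `cutoff_hz=0.3` at 30 Hz sampling for mouse data.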
EEG analysis
The EEG data were preprocessed to remove the gradient and ballistocardiogram artifacts from each channel, utilizing algorithms detailed in ref. 104. Following this, the data were low-pass filtered with a cut-off frequency of 125 Hz. Pulse artifacts were removed through independent component analysis (ICA), and signals were corrected for distortions attributable to head motion105. A more comprehensive description of the EEG signal preprocessing can be found in ref. 21.
Delta-band power for each channel was computed by first applying a band-pass filter within the 1–4 Hz range and then calculating the amplitude of the Hilbert-transformed signal. The power of each channel was then individually normalized by subtracting its mean and dividing by its standard deviation. The averaged delta power was obtained by taking the mean across all recording channels.
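As an illustration of the delta-power steps just described, the sketch below band-passes each channel, takes the Hilbert envelope, z-scores per channel, and averages across channels. It is a dependency-light approximation that uses only NumPy: the band-pass is an assumed FFT brick-wall filter (the filter design is not given in the text), and the analytic signal is built directly in the frequency domain, which is what `scipy.signal.hilbert` also computes. All function names are hypothetical.

```python
import numpy as np

def bandpass_fft(x, fs, low_hz, high_hz):
    """Brick-wall band-pass along the last axis (assumed filter type)."""
    spectrum = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spectrum[..., (freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[-1], axis=-1)

def analytic_envelope(x):
    """Amplitude of the analytic (Hilbert-transformed) signal along the last axis."""
    n = x.shape[-1]
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x, axis=-1) * weights, axis=-1))

def mean_delta_power(eeg, fs):
    """eeg: channels x samples. Returns the channel-averaged, z-scored delta envelope."""
    envelope = analytic_envelope(bandpass_fft(eeg, fs, 1.0, 4.0))
    z = (envelope - envelope.mean(axis=1, keepdims=True)) / envelope.std(axis=1, keepdims=True)
    return z.mean(axis=0)
```

Because every channel is standardized before averaging, channels with large raw amplitude do not dominate the averaged trace.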
Local Field Potentials (LFPs) analysis
Delta-band Power: Delta power was computed for LFPs across all recorded channels. To calculate delta power, a band-pass filter (1–4 Hz) was applied to the LFP signal of each channel, followed by rectification and lowpass filtering (<0.72 Hz, corresponding to \(\pi\) cycles of the mean band-pass frequencies).
Hippocampal Sharp Wave Ripples (SWRs): Hippocampal SWRs are brief, high-frequency oscillations (110–200 Hz) that can be observed in the LFP recorded from hippocampal recording sites. For ripple detection in this study, we employed an offline method15,106 utilizing the LFP signal (1250 Hz) captured from the hippocampal CA1 region. Ripple events were detected individually for each CA1 recording site (channel), resulting in extensively overlapping ripple detections across the channels. Detected ripple events were considered valid only if they were identified in over 40% of the CA1 channels.
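The cross-channel validity criterion can be sketched as follows. The per-channel ripple detector of refs. 15,106 is not reproduced; the sketch assumes it has already produced a boolean channels-by-samples mask, and it applies the 40% criterion sample by sample, which is one possible reading of the rule. The function name `consensus_ripples` is hypothetical.

```python
import numpy as np

def consensus_ripples(detected, min_fraction=0.4):
    """detected: boolean array (channels x samples), True where a channel's
    detector flags a ripple. Returns (start, end) index pairs, end exclusive,
    for intervals flagged by more than min_fraction of the channels."""
    detected = np.asarray(detected, dtype=bool)
    agree = detected.mean(axis=0) > min_fraction
    # Pad with False so that every interval has a detectable start and end.
    padded = np.concatenate(([False], agree, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends)]
```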
Semantic decoding model
The semantic decoding model is designed to evaluate the semantic information contained in the stimuli-evoked BOLD responses, generating text captions that describe the image stimuli presented to the subject. The decoding model consists of two main components: an fMRI encoder and a caption decoder (Fig. 2A).
The fMRI encoder is used to extract a latent representation from the BOLD response. The detailed architecture of the fMRI encoder, shown in Fig. S6A, uses convolutional layers and residual connections to transform the high-dimensional BOLD response into 512-dimensional fMRI embeddings. To address the challenge of fMRI data scarcity, the fMRI encoder was trained to align the fMRI embeddings with the CLIP embedding space, which has been extensively pre-trained using 400 million (image, text) pairs and thus offers a rich, 512-dimensional target. We therefore trained the fMRI encoder in a contrastive learning paradigm39, aiming to maximize the alignment between the fMRI embeddings and the corresponding CLIP text embeddings while minimizing the alignment with mismatched pairs. To achieve this, we use a contrastive training loss, defined for the \(i\)-th fMRI embedding \(\mathbf{Z}_{i}\) and the \(j\)-th CLIP text embedding \(\mathbf{T}_{j}\) within a batch B as:
Here, \(\tau\) represents the temperature hyperparameter, and \(\cos \left(\cdot,\cdot \right)\) computes the cosine similarity between two vectors. In addition, we also maximize the alignment between fMRI embeddings and CLIP embeddings by incorporating the cosine loss defined as
Therefore, the total loss is defined as the summation of alignment loss and contrastive loss:
where \(\lambda_{1}\) and \(\lambda_{2}\) are tuning hyperparameters. In our training regimen, we set \(\lambda_{1}=0.35\), \(\lambda_{2}=0.65\), and \(\tau=0.45\). We also incorporate dropout before the final layers, with a dropout ratio of 0.3.
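The loss equations themselves are not reproduced in this text, so the following is only a sketch of one form consistent with the description, written as a NumPy forward pass rather than the PyTorch training code. Several points are assumptions and are labeled as such in the comments: the contrastive term is taken to be a one-directional InfoNCE (softmax over the CLIP text embeddings in the batch), the cosine term is taken to be one minus the cosine similarity of matched pairs, and \(\lambda_{1}\) is assumed to weight the cosine term while \(\lambda_{2}\) weights the contrastive term.

```python
import numpy as np

def cosine_matrix(z, t):
    """Pairwise cosine similarity between rows of z (fMRI) and rows of t (CLIP text)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    return z @ t.T

def contrastive_loss(z, t, tau=0.45):
    """Assumed InfoNCE form: matched pairs (i, i) are positives, all (i, j != i) negatives."""
    logits = cosine_matrix(z, t) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))

def cosine_loss(z, t):
    """Assumed alignment term: mean of 1 - cos(Z_i, T_i) over the batch."""
    return float(np.mean(1.0 - np.diag(cosine_matrix(z, t))))

def total_loss(z, t, lam1=0.35, lam2=0.65, tau=0.45):
    # Assumption: lam1 weights the cosine term and lam2 the contrastive term;
    # the text does not state which weight belongs to which term.
    return lam1 * cosine_loss(z, t) + lam2 * contrastive_loss(z, t, tau)
```

Under these assumptions, a batch in which every fMRI embedding coincides with its own text embedding yields a lower total loss than the same batch with the pairing shuffled, which is the behavior the training objective is meant to reward.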
We utilized a pre-trained caption decoder, DeCap38, to generate captions from fMRI embeddings. DeCap was initially trained to produce captions from CLIP text embeddings using a large text corpus. Since the fMRI embeddings were aligned with the CLIP space through the fMRI encoder, we directly employed DeCap to decode these fMRI embeddings and generate captions that describe the content of the image stimuli shown to the subjects.
In our study, unless otherwise specified, we use the decoding model to decode the response of voxels within the “nsdgeneral” ROI which includes occipital regions that are generally responsive in the NSD experiment37. For region-wise decoding analysis as shown in Fig. 4 and Fig. S12, we adopted the regions of interest (ROIs) as delineated by the Yeo-17 network101. For each specific ROI, only the voxels that are defined by the corresponding ROI mask are considered. The responses from these selected voxels are then utilized as inputs to the decoding model to generate captions.
Representation similarity analysis (RSA)
We analyzed the alignment between CLIP text embeddings and fMRI embeddings through RSA. For each validation fold, we constructed representation dissimilarity matrices (RDMs) for both the fMRI and CLIP text embeddings. To facilitate visualization and interpretation of the RDMs’ structure, we employed t-SNE41 to project CLIP text embeddings into a 2-dimensional space. Subsequently, we applied k-means clustering to group stimuli with minimal Euclidean distance in this 2-dimensional representation into the same cluster. The RDMs for both fMRI and CLIP text embeddings, as shown in Fig. 2A, are organized and sorted according to this clustering scheme.
The similarity between the fMRI and CLIP text RDMs is quantified using Pearson’s correlation coefficient. To evaluate the statistical significance of this similarity, we followed the permutation testing approach described by Kriegeskorte et al.107. Specifically, we generated a null distribution by randomly permuting stimulus labels (thereby shuffling the rows and columns of one of the RDMs) and calculating the correlation between the permuted and original matrices. This process was repeated 500 times to approximate the null hypothesis that the two RDMs are unrelated. The p value was then estimated by comparing the observed correlation to this null distribution.
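A minimal sketch of this RDM comparison and permutation test is given below. The dissimilarity measure used to build the RDMs is not stated in the text, so correlation distance (one minus the Pearson correlation between embeddings) is assumed; comparing only the upper triangles and adding one to the numerator and denominator of the p value are also choices made here for illustration. Function names are hypothetical.

```python
import numpy as np

def rdm(embeddings):
    """Stimuli x features -> stimuli x stimuli RDM (assumed: 1 - Pearson correlation)."""
    return 1.0 - np.corrcoef(embeddings)

def upper(m):
    """Off-diagonal upper-triangle entries of a square matrix."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]

def rdm_permutation_test(rdm_a, rdm_b, n_perm=500, seed=0):
    """Pearson correlation between two RDMs and a permutation p value obtained
    by shuffling the stimulus labels (rows and columns together) of rdm_a."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(upper(rdm_a), upper(rdm_b))[0, 1]
    null = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(rdm_a.shape[0])
        shuffled = rdm_a[np.ix_(perm, perm)]
        null[k] = np.corrcoef(upper(shuffled), upper(rdm_b))[0, 1]
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return float(observed), float(p_value)
```

Permuting rows and columns with the same index vector keeps each shuffled matrix a valid RDM, so the null distribution reflects only the loss of stimulus correspondence.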
Semantic similarity metrics
To evaluate the fidelity of generated captions to their semantic content, we computed semantic similarity scores between the predicted captions and the ground truth captions labeled by humans. This evaluation employs several established metrics widely used in computer vision and natural language processing research: Bleu_1, Bleu_2 (ref. 108), CIDEr (ref. 109), METEOR (ref. 110), and ROUGE_L (ref. 111), with each of these metrics offering a different perspective on the semantic alignment between generated and ground truth text. Given the limitations inherent to each metric, we additionally defined a composite semantic metric (CSS) derived by averaging the scores from the aforementioned metrics, thereby providing a more holistic evaluation of caption fidelity.
To assess the statistical significance of the model-generated captions, for each trial/image, the semantic similarity score (for example, the CSS score) is compared against a null distribution. This null distribution is constructed from the CSS scores between the ground truth caption and 1000 random captions, which are generated by randomly sampling from the CLIP embedding space39. Notably, this null distribution varies across trials, reflecting the different levels of semantic complexity of the image stimuli. This approach ensures that the evaluation accurately reflects the model’s performance in generating semantically coherent captions and is invariant to the varying degrees of semantic complexity across trials. The same evaluative framework is applied to all aforementioned metrics to assess the accuracy of the generated captions. A trial is considered correctly decoded if its metric score is significant relative to the null distribution at a confidence level of 0.95; otherwise, it is considered incorrectly decoded. Encoding accuracy is quantified as the proportion of trials that are correctly decoded.
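The trial-wise significance test can be sketched as below. The sketch assumes that the 0.95 criterion means the trial's score must exceed the 0.95 quantile of its own null distribution; this reading, and the function names, are assumptions rather than details given in the text. The caption metrics themselves are not computed here; the inputs are precomputed scores.

```python
import numpy as np

def is_correctly_decoded(score, null_scores, level=0.95):
    """True if the trial's score exceeds the `level` quantile of its own null scores."""
    return bool(score > np.quantile(null_scores, level))

def decoding_accuracy(scores, nulls, level=0.95):
    """scores: one score per trial; nulls: one array of null scores per trial.
    Returns the proportion of trials whose score is significant."""
    flags = [is_correctly_decoded(s, n, level) for s, n in zip(scores, nulls)]
    return sum(flags) / len(flags)
```

Because each trial is compared with its own null distribution, an image whose caption is easy to approximate by chance needs a higher score to count as correctly decoded.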
Memory encoding and recall
The NSD dataset includes visual memory tasks, where each image stimulus was presented to participants three times. During each presentation, participants are prompted to indicate whether they have previously seen the image. By dissecting the memory task performance, we derived two memory measurements for evaluating the memory function based on participant response: memory encoding and memory recall.
Memory encoding, as a proxy for sensory coding efficacy, evaluates how effectively the participant encodes a novel image into memory such that the participant can later correctly recognize the image as previously seen. We formally define the memory encoding accuracy for each novel image as the memory recognition accuracy at the second presentation of the image. Memory recall, on the other hand, evaluates how effectively the participant is able to retrieve and recognize a previously seen image. For each image, we can derive two recall accuracies, corresponding to the participant’s recognition performance during the second and third presentations of the image, at which point the image is no longer novel to the participant.
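The two memory measures can be made concrete with a small sketch. The data layout (a dictionary from image identifier to an ordered list of presentations) and the function name are hypothetical. One further assumption is that the encoding score is attached to the onset time of the first presentation, since it describes how well the novel image was encoded, whereas each recall score is attached to the onset of the repeat presentation at which recognition was tested.

```python
def memory_trials(presentations):
    """presentations: dict image_id -> list of (onset_s, said_seen), ordered by
    presentation. Returns (encoding, recall), each a list of (onset_s, correct)."""
    encoding, recall = [], []
    for shows in presentations.values():
        if len(shows) < 2:
            continue  # never repeated, so the encoding outcome is unknown
        first_onset = shows[0][0]
        # On a repeat, the correct answer is "previously seen".
        encoding.append((first_onset, bool(shows[1][1])))
        for onset, said_seen in shows[1:3]:
            recall.append((onset, bool(said_seen)))
    return encoding, recall
```

For example, an image shown at 0 s, 40 s, and 90 s and recognized only at the second presentation contributes an encoding score of True at 0 s and recall scores of True at 40 s and False at 90 s.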
Neural population sensory decoding analysis
To assess the sensory information encoded by neuronal populations, we analyzed spiking data obtained during natural scene image stimulation sessions from the Allen Mice Neuropixel dataset. In these sessions, mice were subjected to passive viewing of a sequence of images, each displayed for a duration of 250 milliseconds. Strictly following the approach in ref. 17, we defined the neural code as a population response to each displayed image and employed support vector machines (SVMs) based on neural code to decode which one of the 118 images was viewed by the mouse. The sensory efficacy is quantified by the decoding accuracy of the SVMs.
fMRI infra-slow propagating waves
To detect SM-to-DMN fMRI propagating waves, we adopted a template-matching approach7. The propagations, which typically involve a majority of cortical regions, are assumed to be entrained within the fMRI global signal fluctuations. Accordingly, we defined a set of candidate events based on the low-frequency (<0.15 Hz) components of the global signal, with the event boundaries defined by adjacent troughs. We derived a delay profile for each candidate event, and a candidate event was considered an SM-to-DMN propagation if its delay profile closely matched the principal gradient (PG) profile \(\mathbf{u}_{\mathrm{PG}}\in \mathbb{R}^{N}\)36. The degree of similarity was quantified using Pearson’s correlation coefficient, and we considered a candidate event an SM-to-DMN propagation if the correlation was significant (p < 0.001). See Supplementary Methods for further details.
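The template-matching idea can be sketched as follows. The exact delay-profile computation is described in the Supplementary Methods and is not reproduced here; as a stand-in, the sketch assumes that a parcel's delay is its peak time within the event relative to the mean peak time across parcels. It returns the correlation with the template and leaves the significance threshold (p < 0.001 in the text) to the caller; `scipy.stats.pearsonr` could supply the p value. Function names are hypothetical, and the global signal is assumed to be low-pass filtered already.

```python
import numpy as np

def trough_segments(global_signal):
    """(start, end) index pairs between adjacent local minima of the global signal."""
    g = np.asarray(global_signal, dtype=float)
    troughs = np.flatnonzero((g[1:-1] < g[:-2]) & (g[1:-1] <= g[2:])) + 1
    return list(zip(troughs[:-1], troughs[1:]))

def delay_profile(segment):
    """segment: parcels x time. Assumed delay: each parcel's peak time minus the mean peak time."""
    peaks = segment.argmax(axis=1).astype(float)
    return peaks - peaks.mean()

def match_template(parcel_ts, global_signal, template):
    """Return (start, end, r) for every candidate event, where r is the Pearson
    correlation between the event's delay profile and the template (e.g., the PG)."""
    events = []
    for start, end in trough_segments(global_signal):
        delays = delay_profile(parcel_ts[:, start:end + 1])
        if delays.std() == 0:
            continue  # no delay structure to correlate
        r = np.corrcoef(delays, template)[0, 1]
        events.append((int(start), int(end), float(r)))
    return events
```

A strongly positive r indicates propagation along the template direction (SM to DMN when the template is the principal gradient), and a strongly negative r indicates propagation in the opposite direction.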
Neural spiking cascades
In line with prior studies15,17, we first computed neural spike rates using 200 ms time bins and identified candidate infra-slow neural events by segmenting the spike rate data at the troughs of the filtered global mean spike rate (low-pass, 0.25 Hz). Using the delay-profile decomposition method, we then extracted a delay profile for each candidate event, together with the principal delay (PD) profile representing the predominant sequential pattern within infra-slow brain activity. A candidate event was considered a valid cascade event when its delay profile was significantly correlated with the PD profile (p < 0.001). Similar to refs. 15,17, we categorized neurons into two distinct groups based on their position within the PD profile: positive-delay neurons with significantly positive PD values and negative-delay neurons with significantly negative PD values (p < 0.001, one-sample t-test). The slow cascade features a sharp increase in the spiking activity of positive-delay neurons in the middle of the event. Following the definition in ref. 15, we identified these time points as the local peaks of the first-order temporal derivative of the mean spiking time course of the positive-delay neurons. See Supplementary Methods for further details.
Modulation across propagating wave cycles
To assess the fluctuation of the trial-based measurements, i.e., semantic encoding accuracy and memory encoding/recall accuracy, across the propagating wave cycle, as shown in Fig. 3C, we constructed a time series \(f_{\mathrm{acc}}(t)\) for each measurement during each fMRI scan session, defined as
where \(T_{k}\) is the stimulus onset time and \(C_{k}\) is the accuracy for the \(k\)-th trial, and \(\Delta T\) is the time window following the stimulus onset, which was set to 2 s throughout our study. NaN stands for “Not a Number” and is omitted from the summation in the equation.
Formulated as time series, these trial-based assessments can be analyzed in exactly the same way as other continuous measurements, such as pupil diameter and delta power. For the analysis of modulation across the propagating wave cycle, we aligned and averaged these time series relative to the propagation center (defined as the global signal peak) and normalized them to the percentage change relative to a baseline, defined as the period from −21 s to −11 s relative to the global signal peak.
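The construction of the trial-based time series and its wave-locked averaging can be sketched as below. The defining equation is not reproduced in this text, so the piecewise form used here is inferred from the symbol definitions: the series takes the value \(C_{k}\) within the window \([T_{k}, T_{k}+\Delta T)\) after each stimulus onset and NaN elsewhere, and NaN entries are skipped when averaging. The function names, the symmetric ±21 s epoch, and the handling of overlapping windows (later trials overwrite earlier ones) are assumptions.

```python
import numpy as np

def accuracy_timeseries(onsets_s, accuracies, n_samples, dt_s, window_s=2.0):
    """f_acc(t): accuracy C_k for t in [T_k, T_k + window_s), NaN elsewhere."""
    t = np.arange(n_samples) * dt_s
    f = np.full(n_samples, np.nan)
    for onset, acc in zip(onsets_s, accuracies):
        f[(t >= onset) & (t < onset + window_s)] = acc
    return f

def wave_locked_change(f, centers, dt_s, half_width_s=21.0, baseline_s=(-21.0, -11.0)):
    """Average f around each wave center (sample index of the global signal peak),
    ignoring NaNs, and express it as percent change from the baseline window."""
    half = int(round(half_width_s / dt_s))
    epochs = [f[c - half:c + half + 1] for c in centers
              if c - half >= 0 and c + half < len(f)]
    mean_curve = np.nanmean(np.vstack(epochs), axis=0)
    lags = np.arange(-half, half + 1) * dt_s
    in_baseline = (lags >= baseline_s[0]) & (lags <= baseline_s[1])
    baseline = np.nanmean(mean_curve[in_baseline])
    return lags, 100.0 * (mean_curve - baseline) / baseline
```

The same `wave_locked_change` function can be applied to any continuous measurement sampled on the fMRI time grid, which is the point of recasting the trial-based scores as time series.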
To present the temporal relationship between these trial-based measurements and the neural dynamics underlying fMRI waves, we shifted the fMRI BOLD wave backward in time by 6 s to account for the hemodynamic delay.
Predicting subsequent memory performance with semantic encoding
We conducted two analyses using the NSD dataset to study how initial image presentation encoding accuracy influences subsequent memory task performance. In the first analysis (Fig. 2G), stimuli from all eight participants were grouped into bins based on the percentile of the first encoding accuracy, incremented by 20%, and the memory task accuracy during the second presentation was averaged within each bin. In the second analysis (Fig. 2H), for each participant, trials were grouped based on whether they were correctly decoded. The subsequent memory accuracy for each group was then averaged and compared across groups.
Relationship between propagating waves and brain size
We examined the effect of brain size on SM-to-DMN propagating waves using the HCP-7T dataset. Brain size was estimated based on brain volume (BrainSegNotVent, generated in FreeSurfer112). The duration (segment length) of each propagation event was calculated and averaged for each subject. Propagation speed was determined by dividing brain volume by the average event duration for each subject. Based on brain size, the 184 subjects were divided into three groups: small (lower 1/3), medium (middle 1/3), and large (upper 1/3). Group analyses were then conducted to compare propagation speed and durations across these groups (Fig. S5).
Software and implementation
All code was developed in Python. Deep learning models were implemented using PyTorch (v1.13.1) and dalle-pytorch (v1.6.4). Data processing and analysis were performed with SciPy (v1.10.0), NumPy (v1.24.2), scikit-learn (v1.2.0), and pandas (v1.5.3). Visualization was carried out using Matplotlib (v3.6.3) and Seaborn (v0.13.2). fMRI data were processed and visualized with Nilearn (v0.10.0) and NiBabel (v5.3.2). The CLIP model was implemented based on the open-source repository from OpenAI (https://github.com/openai/CLIP).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
For mouse single-neuron analysis, we used the Visual Coding Neuropixels and two-photon calcium imaging datasets from the Allen Institute32,33, accessible at https://portal.brain-map.org/overview. For resting-state human fMRI analysis, we used the HCP-7T dataset from https://www.humanconnectome.org. We shared our EEG-fMRI dataset at https://openneuro.org/datasets/ds003768. For task human fMRI analysis, we used the NSD dataset available at https://naturalscenesdataset.org. Source data are provided with this paper. The processed data in this study are available upon request.
Code availability
The Python code that produced the major results of this paper is available at https://github.com/psu-mcnl/fMRI-Arousal. All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.
References
Biswal, B., Zerrin Yetkin, F., Haughton, V. M. & Hyde, J. S. Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magn. Reson. Med. 34, 537–541 (1995).
Fox, M. D. & Raichle, M. E. Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nat. Rev. Neurosci. 8, 700–711 (2007).
Yu, M., Sporns, O. & Saykin, A. J. The human connectome in Alzheimer disease—relationship to biomarkers and genetics. Nat. Rev. Neurol. 17, 545–563 (2021).
Smith, S. M. et al. A positive-negative mode of population covariation links brain connectivity, demographics and behavior. Nat. Neurosci. 18, 1565–1567 (2015).
Zhang, D. & Raichle, M. E. Disease and the brain’s dark energy. Nat. Rev. Neurol. 6, 15–28 (2010).
Matsui, T., Murakami, T. & Ohki, K. Transient neuronal coactivations embedded in globally propagating waves underlie resting-state functional connectivity. Proc. Natl. Acad. Sci. 113, 6556–6561 (2016).
Gu, Y. et al. Brain activity fluctuations propagate as waves traversing the cortical hierarchy. Cereb. Cortex 31, 3986–4005 (2021).
Yousefi, B., Shin, J., Schumacher, E. H. & Keilholz, S. D. Quasi-periodic patterns of intrinsic brain activity in individuals and their relationship to global signal. Neuroimage 167, 297–308 (2018).
Yousefi, B. & Keilholz, S. Propagating patterns of intrinsic activity along macroscale gradients coordinate functional connections across the whole brain. Neuroimage 231, 117827 (2021).
Raut, R. V. et al. Global waves synchronize the brain’s functional systems with fluctuating arousal. Sci. Adv. 7, eabf2709 (2021).
Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J. & Hinton, G. Backpropagation and the brain. Nat. Rev. Neurosci. 21, 335–346 (2020).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Stringer, C. et al. Spontaneous behaviors drive multidimensional, brainwide activity. Science 364, eaav7893 (2019).
Okun, M., Steinmetz, N. A., Lak, A., Dervinis, M. & Harris, K. D. Distinct structure of cortical population activity on fast and infraslow timescales. Cereb. Cortex 29, 2196–2210 (2019).
Liu, X., Leopold, D. A. & Yang, Y. Single-neuron firing cascades underlie global spontaneous brain events. Proc. Natl. Acad. Sci. 118, e2105395118 (2021).
Yang, Y., Leopold, D. A., Duyn, J. H. & Liu, X. Hippocampal replay sequence governed by spontaneous brain-wide dynamics. PNAS Nexus 3, pgae078 (2024).
Yang, Y., Leopold, D. A., Duyn, J. H., Sipe, G. O. & Liu, X. Sensory encoding alternates with hippocampal ripples across cycles of forebrain spiking cascades. Adv. Sci. 12, 2406224 (2025).
Hasselmo, M. E. Neuromodulation and cortical function: modeling the physiological basis of behavior. Behav. Brain Res. 67, 1–27 (1995).
Buzsáki, G. Two-stage model of memory trace formation: a role for noisy brain states. Neuroscience 31, 551–570 (1989).
Honey, C. J., Newman, E. L. & Schapiro, A. C. Switching between internal and external modes: a multiscale learning principle. Netw. Neurosci. 1, 339–356 (2017).
Gu, Y. et al. An orderly sequence of autonomic and neural events at transient arousal changes. Neuroimage 264, 119720 (2022).
Yellin, D., Berkovich-Ohana, A. & Malach, R. Coupling between pupil fluctuations and resting-state fMRI uncovers a slow build-up of antagonistic responses in the human cortex. Neuroimage 106, 414–427 (2015).
Bolt, T. et al. Autonomic physiological coupling of the global fMRI signal. Nat. Neurosci. 28, 1327–1335 (2025).
Podvalny, E., Sanchez-Romero, R. & Cole, M. W. Functionality of arousal-regulating brain circuitry at rest predicts human cognitive abilities. Cereb. Cortex 34, bhae192 (2024).
Fox, M. D. et al. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc. Natl. Acad. Sci. 102, 9673–9678 (2005).
Mitra, A., Snyder, A. Z., Blazey, T. & Raichle, M. E. Lag threads organize the brain’s intrinsic activity. Proc. Natl. Acad. Sci. 112, E2235–E2244 (2015).
Bolt, T. et al. A parsimonious description of global functional brain organization in three spatiotemporal patterns. Nat. Neurosci. 25, 1093–1103 (2022).
Logothetis, N. K. et al. Hippocampal–cortical interaction during periods of subcortical silence. Nature 491, 547–553 (2012).
Buzsáki, G. Hippocampal sharp wave-ripple: a cognitive biomarker for episodic memory and planning. Hippocampus 25, 1073–1188 (2015).
Yoss, R. E., Moyer, N. J. & Hollenhorst, R. W. Pupil size and spontaneous pupillary waves associated with alertness, drowsiness, and sleep. Neurology 20, 545 (1970).
Reimer, J. et al. Pupil fluctuations track fast switching of cortical states during quiet wakefulness. Neuron 84, 355–362 (2014).
Siegle, J. H. et al. Survey of spiking in the mouse visual system reveals functional hierarchy. Nature 592, 86–92 (2021).
de Vries, S. E. J. et al. A large-scale standardized physiological survey reveals functional organization of the mouse visual cortex. Nat. Neurosci. 23, 138–151 (2020).
Steinmetz, N. A., Zatka-Haas, P., Carandini, M. & Harris, K. D. Distributed coding of choice, action and engagement across the mouse brain. Nature 576, 266–273 (2019).
Van Essen, D. C. et al. The human connectome project: a data acquisition perspective. Neuroimage 62, 2222–2231 (2012).
Margulies, D. S. et al. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc. Natl. Acad. Sci. 113, 12574–12579 (2016).
Allen, E. J. et al. A massive 7 T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126 (2022).
Li, W., Zhu, L., Wen, L. & Yang, Y. DeCap: decoding CLIP latents for zero-shot captioning via text-only training. In The Eleventh International Conference on Learning Representations https://openreview.net/forum?id=Lt8bMlhiwx2 (2023).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning Vol. 139 of Proceedings of Machine Learning Research 8748–8763 (eds Meila, M. & Zhang, T.) https://proceedings.mlr.press/v139/radford21a.html (PMLR, 2021).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning Vol. 119 of Proceedings of Machine Learning Research 1597–1607 (eds Daumé III, H. & Singh, A.) https://proceedings.mlr.press/v119/chen20j.html (PMLR, 2020).
van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Aggleton, J. P. et al. Hippocampal–anterior thalamic pathways for memory: uncovering a network of direct and indirect actions. Eur. J. Neurosci. 31, 2292–2307 (2010).
Hwang, K., Shine, J. M., Cole, M. W. & Sorenson, E. Thalamocortical contributions to cognitive task activity. Elife 11, e81282 (2022).
Higgins, C. et al. Replay bursts in humans coincide with activation of the default mode and parietal alpha networks. Neuron 109, 882–893.e7 (2021).
Norman, Y., Raccah, O., Liu, S., Parvizi, J. & Malach, R. Hippocampal ripples and their coordinated dialogue with the default mode network during recent and remote recollection. Neuron 109, 2767–2780.e5 (2021).
Arieli, A., Sterkin, A., Grinvald, A. & Aertsen, A. Dynamics of ongoing activity: explanation of the large variability in evoked cortical responses. Science 273, 1868–1871 (1996).
Livingstone, M. S. & Hubel, D. H. Effects of sleep and arousal on the processing of visual information in the cat. Nature 291, 554–561 (1981).
Hasenstaub, A., Sachdev, R. N. S. & McCormick, D. A. State changes rapidly modulate cortical neuronal responsiveness. J. Neurosci. 27, 9607–9622 (2007).
Fox, M. D., Snyder, A. Z., Zacks, J. M. & Raichle, M. E. Coherent spontaneous activity accounts for trial-to-trial variability in human evoked brain responses. Nat. Neurosci. 9, 23–25 (2006).
He, B. J. Spontaneous and task-evoked brain activity negatively interact. J. Neurosci. 33, 4672–4682 (2013).
McGinley, M. J. et al. Waking state: rapid variations modulate neural and behavioral responses. Neuron 87, 1143–1161 (2015).
Chen, W., Park, K., Pan, Y., Koretsky, A. P. & Du, C. Interactions between stimuli-evoked cortical activity and spontaneous low frequency oscillations measured with neuronal calcium. Neuroimage 210, 116554 (2020).
Wilson, M. A. & McNaughton, B. L. Reactivation of hippocampal ensemble memories during sleep. Science 265, 676–679 (1994).
Euston, D. R., Tatsuno, M. & McNaughton, B. L. Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science 318, 1147–1150 (2007).
Norman, Y. et al. Hippocampal sharp-wave ripples linked to visual episodic recollection in humans. Science 365, eaax1030 (2019).
Vaz, A. P., Inati, S. K., Brunel, N. & Zaghloul, K. A. Coupled ripple oscillations between the medial temporal lobe and neocortex retrieve human memory. Science 363, 975–978 (2019).
Sakon, J. J. & Kahana, M. J. Hippocampal ripples signal contextually mediated episodic recall. Proc. Natl. Acad. Sci. 119, e2201657119 (2022).
Karimi Abadchi, J. et al. Spatiotemporal patterns of neocortical activity around hippocampal sharp-wave ripples. Elife 9, e51972 (2020).
Kaplan, R. et al. Hippocampal sharp-wave ripples influence selective activation of the default mode network. Curr. Biol. 26, 686–691 (2016).
Klinzing, J. G., Niethard, N. & Born, J. Mechanisms of systems memory consolidation during sleep. Nat. Neurosci. 22, 1598–1610 (2019).
Dudai, Y., Karni, A. & Born, J. The consolidation and transformation of memory. Neuron 88, 20–32 (2015).
Palva, J. M. & Palva, S. Infra-slow fluctuations in electrophysiological recordings, blood-oxygenation-level-dependent signals, and psychophysical time series. Neuroimage 62, 2201–2211 (2012).
Kucyi, A., Hove, M. J., Esterman, M., Hutchison, R. M. & Valera, E. M. Dynamic brain network correlates of spontaneous fluctuations in attention. Cereb. Cortex 27, 1831–1840 (2017).
Shine, J. M. et al. The dynamics of functional brain networks: integrated network states during cognitive task performance. Neuron 92, 544–554 (2016).
McGinley, M. J., David, S. V. & McCormick, D. A. Cortical membrane potential signature of optimal states for sensory signal detection. Neuron 87, 179–192 (2015).
Bayer, J., Gläscher, J., Finsterbusch, J., Schulte, L. H. & Sommer, T. Linear and inverted U-shaped dose-response functions describe estrogen effects on hippocampal activity in young women. Nat. Commun. 9, 1220 (2018).
Baldi, E. & Bucherelli, C. The inverted u-shaped dose-effect relationships in learning and memory: modulation of arousal and consolidation. Nonlinearity Biol. Toxicol. Med. 3, 9–21 (2005).
Zerbes, G., Kausche, F. M., Müller, J. C., Wiedemann, K. & Schwabe, L. Glucocorticoids, noradrenergic arousal, and the control of memory retrieval. J. Cogn. Neurosci. 31, 288–298 (2019).
de Quervain, D. J.-F., Roozendaal, B. & McGaugh, J. L. Stress and glucocorticoids impair retrieval of long-term spatial memory. Nature 394, 787–790 (1998).
Liu, X. et al. Sleep-stage dependent patterning of slowly propagating brain activity. npj Biol. Timing Sleep 2, 1 (2025).
Siapas, A. G. & Wilson, M. A. Coordinated interactions between hippocampal ripples and cortical spindles during slow-wave sleep. Neuron 21, 1123–1128 (1998).
Diekelmann, S. & Born, J. The memory function of sleep. Nat. Rev. Neurosci. 11, 114–126 (2010).
Lecci, S. et al. Coordinated infraslow neural and cardiac oscillations mark fragility and offline periods in mammalian sleep. Sci. Adv. 3, e1602026 (2017).
Kjaerby, C. et al. Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat. Neurosci. 25, 1059–1070 (2022).
Liu, X. et al. Subcortical evidence for a contribution of arousal to fMRI studies of brain activity. Nat. Commun. 9, 395 (2018).
Turchi, J. et al. The basal forebrain regulates global resting-state fMRI fluctuations. Neuron 97, 940–952.e4 (2018).
Mesulam, M.-M., Mufson, E. J., Levey, A. I. & Wainer, B. H. Cholinergic innervation of cortex by the basal forebrain: cytochemistry and cortical connections of the septal area, diagonal band nuclei, nucleus basalis (Substantia innominata), and hypothalamus in the rhesus monkey. J. Comp. Neurol. 214, 170–197 (1983).
Pinto, L. et al. Fast modulation of visual perception by basal forebrain cholinergic neurons. Nat. Neurosci. 16, 1857–1863 (2013).
Ballinger, E. C., Ananth, M., Talmage, D. A. & Role, L. W. Basal forebrain cholinergic circuits and signaling in cognition and cognitive decline. Neuron 91, 1199–1218 (2016).
Bassett, D. S. et al. Dynamic reconfiguration of human brain networks during learning. Proc. Natl. Acad. Sci. 108, 7641–7646 (2011).
Braun, U. et al. Dynamic reconfiguration of frontal brain networks during executive cognition in humans. Proc. Natl. Acad. Sci. 112, 11678–11683 (2015).
Huijbers, W. et al. Explaining the encoding/retrieval flip: memory-related deactivations and activations in the posteromedial cortex. Neuropsychologia 50, 3764–3774 (2012).
Daselaar, S. et al. Posterior midline and ventral parietal activity is associated with retrieval success and encoding failure. Front. Hum. Neurosci. 3, 641 (2009).
Vannini, P. et al. What goes down must come up: role of the posteromedial cortices in encoding and retrieval. Cereb. Cortex 21, 22–34 (2011).
Han, F. et al. (Alzheimer’s Disease Neuroimaging Initiative). Reduced coupling between cerebrospinal fluid flow and global brain activity is linked to Alzheimer disease–related pathology. PLoS Biol. 19, e3001233 (2021).
Han, F., Liu, X., Mailman, R. B., Huang, X. & Liu, X. Resting-state global brain activity affects early β-amyloid accumulation in default mode network. Nat. Commun. 14, 7788 (2023).
Han, F. et al. Reduced coupling between cerebrospinal fluid flow and global brain activity is linked to tau pathology. Alzheimer’s Dement. 19, e075860 (2023).
Iliff, J. J. et al. Cerebral arterial pulsation drives paravascular CSF—interstitial fluid exchange in the murine brain. J. Neurosci. 33, 18190–18199 (2013).
van Veluw, S. J. et al. Vasomotion as a driving force for paravascular clearance in the awake mouse brain. Neuron 105, 549–561.e5 (2020).
Xie, L. et al. Sleep drives metabolite clearance from the adult brain. Science 342, 373–377 (2013).
Broggini, T. et al. Long-wavelength traveling waves of vasomotion modulate the perfusion of cortex. Neuron 112, 2349–2367.e8 (2024).
Liu, X. Decoupling between brain activity and cerebrospinal fluid movement in neurological disorders. J. Magn. Reson. Imaging 60, 1743–1752 (2024).
Whitehouse, P. J. et al. Alzheimer’s disease and senile dementia: loss of neurons in the basal forebrain. Science 215, 1237–1239 (1982).
Bartus, R. T. On neurodegenerative diseases, models, and treatment strategies: lessons learned and lessons forgotten a generation following the cholinergic hypothesis. Exp. Neurol. 163, 495–529 (2000).
Drachman, D. A. & Leavitt, J. Human memory and the cholinergic system: a relationship to aging? Arch. Neurol. 30, 113–121 (1974).
Hasselmo, M. E. The role of acetylcholine in learning and memory. Curr. Opin. Neurobiol. 16, 710–715 (2006).
Glasser, M. F. et al. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage 80, 105–124 (2013).
Salimi-Khorshidi, G. et al. Automatic denoising of functional MRI data: combining independent component analysis and hierarchical fusion of classifiers. Neuroimage 90, 449–468 (2014).
Kay, K. N., Rokem, A., Winawer, J., Dougherty, R. F. & Wandell, B. A. GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Front. Neurosci. 7, 247 (2013).
Biswal, B. B. et al. Toward discovery science of human brain function. Proc. Natl. Acad. Sci. 107, 4734–4739 (2010).
Schaefer, A. et al. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb. Cortex 28, 3095–3114 (2018).
Krauth, A. et al. A mean three-dimensional atlas of the human thalamus: generation from multiple histological data. Neuroimage 49, 2053–2062 (2010).
Edlow, B. L. et al. Neuroanatomic connectivity of the human ascending arousal system critical to consciousness and its disorders. J. Neuropathol. Exp. Neurol. 71, 531–546 (2012).
Liu, Z., de Zwart, J. A., van Gelderen, P., Kuo, L.-W. & Duyn, J. H. Statistical feature extraction for artifact removal from concurrent fMRI-EEG recordings. Neuroimage 59, 2073–2087 (2012).
Falahpour, M., Chang, C., Wong, C. W. & Liu, T. T. Template-based prediction of vigilance fluctuations in resting-state fMRI. Neuroimage 174, 317–327 (2018).
Stark, E. et al. Pyramidal cell-interneuron interactions underlie hippocampal ripple oscillations. Neuron 83, 467–480 (2014).
Kriegeskorte, N. et al. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141 (2008).
Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. BLEU: a method for automatic evaluation of machine translation. In Proc. 40th Annual Meeting on Association for Computational Linguistics ACL ’02 311–318 https://doi.org/10.3115/1073083.1073135 (Association for Computational Linguistics, USA, 2002).
Vedantam, R., Zitnick, C. & Parikh, D. CIDEr: consensus-based image description evaluation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4566–4575 https://doi.org/10.1109/CVPR.2015.7299087 (IEEE Computer Society, Los Alamitos, CA, USA, 2015).
Denkowski, M. & Lavie, A. Meteor universal: language specific translation evaluation for any target language. In Proc. Ninth Workshop on Statistical Machine Translation (eds Bojar, O., Buck, C., Federmann, C., Haddow, B., Koehn, P., Monz, C., Post, M. & Specia, L.) 376–380 https://aclanthology.org/W14-3348 (Association for Computational Linguistics, Baltimore, Maryland, USA, 2014).
Lin, C.-Y. ROUGE: a package for automatic evaluation of summaries. In Text Summarization Branches Out 74–81 https://aclanthology.org/W04-1013 (Association for Computational Linguistics, Barcelona, Spain, 2004).
Fischl, B. FreeSurfer. Neuroimage 62, 774–781 (2012).
Thomas Yeo, B. T. et al. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106, 1125–1165 (2011).
Acknowledgements
This work was supported by the Brain Initiative award (1RF1MH123247-01 to X.L.), the NIH R01 award (1R01NS113889-01A1 to X.L.), the Intramural Research Program of the National Institute of Mental Health (ZIA-MH002838 to D.A.L.), and the Intramural Research Program of the National Institute of Neurological Disorders and Stroke (ZIA-NS003027 to J.H.D.). The authors acknowledge the Penn State Institute for Computational and Data Sciences (RRID: SCR_025154) for providing access to computational research infrastructure within the Roar Core Facility (RRID: SCR_026424).
Author information
Contributions
Conceptualization: Y.Y., X.L. Methodology: Y.Y., X.L. Investigation: Y.Y., X.L. Supervision: X.L. Writing—original draft: Y.Y., D.A.L., J.H.D., and X.L. Writing—review and editing: Y.Y., J.H.D., and X.L.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Ryan Raut and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, Y., Leopold, D.A., Duyn, J.H. et al. Sensory encoding and memory retrieval are coordinated with propagating waves in the human brain. Nat Commun 17, 2343 (2026). https://doi.org/10.1038/s41467-026-69068-x
DOI: https://doi.org/10.1038/s41467-026-69068-x






