Main

A fundamental challenge in hydrology is to explain and predict water movement through the terrestrial portion of the hydrologic cycle and into the network of streams and lakes1. This knowledge is essential for applications ranging from water resource management to flood forecasting. Every watershed has a unique climatic and physiographic setting; therefore, understanding its hydrologic processes provides the opportunity for a step forward in hydrologic knowledge2. This Analysis builds on such watershed-scale hydrologic knowledge, by synthesizing the behaviour of hundreds of catchments at the global scale, towards a generalization of hydrologic processes3. Synthesis of hydrologic process knowledge helps to explain where and when different processes occur, and how they are controlled by the climate and landscape4,5. Answering this question has the potential to transform our ability to extrapolate process knowledge to data-scarce regions, to simplify predictive models by unifying perspectives from subdisciplines such as engineering hydrology and agricultural hydrology, and to improve water management based on knowledge of water storage and flow6,7.

Geoscientists have recently proposed the concept of Earth’s digital twin that moves towards a solution for comprehensive predictions required to support decisions on water sustainability and reduce the impacts of water-related hazards A digital twin is a dynamic simulation that aims to identically replicate the physical earth and its environments8,9,10. It is a virtual representation of the physical system that combines big data, remote sensing and artificial intelligence with process understanding and numerical modelling, providing scientists and policymakers with perspectives on our changing environment from birds-eye overviews to deep dives11. However, in hydrology, we lack the basic information needed to populate a digital twin, such as the dominant pathways and depths of water flow through watersheds, described here as hydrologic processes12. This includes information to address fundamental questions such as whether most water moves overland, through shallow or deep soil layers, or as groundwater. Hydrologic theories on the global distribution and physical controls on processes are relatively rare; such syntheses oppose the view that emphasizes the uniqueness of each watershed13. In search of unifying hydrologic theories, Sivapalan6 proposes self-organization in landscape properties as the strongest clue to the drivers of hydrologic process, resulting in theories such as the control of topographic convergence on the dynamic spatial pattern of saturated areas.

Synthesis efforts across the field of hydrology have demonstrated the value of large-sample approaches towards scaling up our hydrologic understanding. So far, these efforts have focused on hydrological data, notably national to global databases of meteorology, streamflow, water chemistry, soil moisture and landscape attributes14,15,16. In the realm of groundwater, studies have analysed permeability and porosity17 and depth to groundwater18 on continental to global scales. Alongside data-intensive research, there have been efforts to describe hydrographic features (for example, water bodies, river networks and observations) and their relationships, such as the Open Geospatial Consortium WaterML 2.0 standard for online water data19,20. There have been only limited synthesis efforts that use field-based process knowledge across large samples of watersheds to explain large-scale patterns of hydrologic behaviour. Two recent papers synthesize data from multiple research watersheds to evaluate the drivers of hydrologic response21,22. Several pioneering studies have analysed a single process on a global scale, merging many datasets to identify new patterns. These include groundwater depth data used to explain the global distribution of wetlands and hydrologic regulation of rooting depth23,24 as well as global stable isotope data used to show that ‘young’ water accounts for more than a third of global streamflow25. However, no study has previously attempted to integrate global knowledge on watershed runoff generation processes.

Although there are hundreds of intensively studied watersheds around the globe where field measurements have revealed the respective dominant hydrologic processes26, this knowledge is fragmented and challenging to search and integrate across regions and continents27. Searches by location or watershed name in citation databases or search engines may return many articles without clear runoff process descriptions and searches by process type are complicated by ambiguity in process naming28. Journal articles providing the clearest explanation of hydrologic process knowledge typically follow best practices in two steps. First, the authors use their expert knowledge of the watershed to translate field measurements of stores or fluxes into derived information about hydrologic processes. Second, the authors communicate the derived information to the reader in a consolidated and concise section that summarizes process knowledge. Such a summary is here called a hydrologist’s ‘perceptual model’ of the watershed functioning and can be expressed both as text and a schematic figure describing the main stores and fluxes that transport water within the watershed29. These perceptual models are a valuable tool to share hydrologic process knowledge and identify knowledge gaps, test hypotheses to upscale process knowledge, and complement other forms of knowledge sharing such as datasets of climate, streamflow and physical features of watersheds30,31. Aggregating process knowledge on a large scale is needed to satisfy new demands from the hydrologic modelling community, where continental-domain modelling frameworks have the potential to tailor the model structure to suit the dominant processes in individual watersheds32,33.

This Analysis describes an effort to compile and synthesize process knowledge from perceptual models of research watersheds across the globe. Our synthesis enables us to test hypotheses from the literature that propose overarching drivers of patterns of hydrologic processes. We test the classic theory that aridity controls the balance of dominant flow processes between surface and subsurface flow34,35, and a recent theory that lateral flow processes dominate in most global regions36. Underlying our synthesis is a new open-source spatial database that brings together descriptions of hydrologic processes from intensively studied watersheds worldwide. The database is searchable using a hierarchical, taxonomic classification of process groups. The database enables advances in hydrologic theory to understand how climate and landscape control dominant processes, advances in hydrologic models that simulate spatially variable processes across different geographical domains, and advances in managing environmental change through tools such as digital twin environments.

A database of process knowledge

Our search has so far returned 400 published descriptions of hydrologic processes in watersheds with a median area of 6.9 km2. Among these process descriptions, 131 are figures (including 63 identified in an earlier study of perceptual model figures37) and 269 are text descriptions. Hydrologists began to publish process descriptions in the 1970s, and the number of such descriptions continues to grow, showing the potential for future expansion of synthesis efforts as process descriptions for an increasing number of locations are available each year (Fig. 1b). Figure 1a shows the locations of the watersheds, superimposed on global biome boundaries. Watershed locations include every continent, and all biomes, with the exception of mangroves and flooded grasslands (Fig. 1c). The spatial reach of these 400 process descriptions makes possible global analyses of watershed function, and the database provides a valuable public resource for the hydrologic community. The following sections demonstrate key hypothesis-testing research made possible by synthesizing this process information.

Fig. 1: Research watershed locations, publication years and biomes.
figure 1

a, Global map of research watershed locations with descriptions of hydrologic processes (symbols signify figure or text descriptions) superimposed on a global classification71 of 14 biomes. FAO, Food and Agriculture Organization; USGS, United States Geological Survey. b,c, Counts of watershed process descriptions by publication year (b) and biome (c).

Clusters of watersheds occur in regions with a rich history of hydrologic process studies, that is, the USA, Europe, Brazil and Japan. Such clusters demonstrate the impact of scientific funding priorities on regional hydrologic knowledge and the benefits of formal or informal networks of research watersheds that encourage data sharing, collate watershed metadata and improve discoverability38,39. The bias towards research watersheds in the Global North leads to a bias towards humid, temperate regions, similar to biases in the global river gauge network40. Arid regions and grassland biomes are less well represented. These regions may not have dominant hydrologic processes that conform to the stereotypical hydrological cycle: in arid regions and grasslands, green (transpirable) water may be more important than blue (surface or ground) water, and relationships such as impacts of afforestation on streamflow may differ from temperate regions41. The deficit of research watersheds to study these regions has environmental justice implications42, as hydrological theories and tools in these regions have weaker linkages to process knowledge and, therefore, would be expected to produce poorer forecasts.

Search results enable us to visualize global patterns of processes recorded in research watersheds (Fig. 2). The figure includes widely reported processes (subsurface stormflow and overland flow) and less common processes (organic layer subsurface stormflow and non-perennial flow). In Fig. 2a–c, process occurrence aligns with our expectations. Subsurface stormflow is widely reported, with exceptions in locations such as the central USA, parts of South America and Asia, and Spain. In these more arid locations, conditions that generate subsurface stormflow may be less likely to occur, that is, highly weathered soils with restrictive layers, rising water tables or wetting front propagation43. Organic layer subsurface stormflow is found in northern permafrost regions where flow is restricted to the thawed active layer44, and as part of stormflow in some humid or tropical forests45. Overland flow is widely reported, with exceptions in temperate or boreal forested areas of the Pacific Northwest, Appalachians, Northern Europe and Japan. Forest soils typically have high macroporosity, promoting rapid drainage of water to deep soils and reducing overland flow46. In Fig. 2d, process occurrence does not align with our expectations, as more non-perennial flow processes were recorded in the wetter eastern USA than the drier western USA. Although non-perennial flow regimes are most commonly associated with dryland areas, our results support recent work reporting their occurrence in diverse climate and geographic settings47.

Fig. 2: Locations of research watersheds where selected processes were recorded.
figure 2

ad, Maps show recorded locations of subsurface stormflow (a), organic layer subsurface stormflow (b), overland flow (c) and non-perennial flow (ephemeral or intermittent channel flow) (d).

Global synthesis of observed hydrologic processes

We use the database to analyse and reveal overarching patterns in runoff generation mechanisms. In our first analysis, we visualize differences in hydrologic processes over space (Fig. 3). The figure shows the dominance of process types in each biome, with the ‘biome triangle’ organized by temperature and precipitation48. The size of each slice is proportional to the number of different processes regularly recorded in that category (processes that are recorded in at least one in eight watersheds in the biome). Some patterns shown in the figure follow our expectations, such as greater numbers of snow and ice processes recorded in cold regions, forests dominated by subsurface processes (as previously discussed) and fewer subsurface stormflow processes in hot and dry climates where shallow water tables are rarer. Other patterns are unexpected, but may be explained by examining the studies included in the database. For example, the surprising numbers of snow and ice processes recorded in deserts are due to research watersheds in the American West that are in the desert biome yet receive winter snowfall. In another unexpected result, the tropical grasslands biome has the highest proportion of subsurface and groundwater processes. We find that many studies from this biome describe soils with a high infiltration that promotes recharge to deep groundwater and minimizes infiltration excess flow, thereby differentiating this biome from desert environments where biological soil crusts reduce infiltration49. This finding might also reflect the emphases of studies in developing regions, which sometimes discuss broader hydrogeological features related to agricultural water use. Overall, the overview of spatial variability in processes shown in Fig. 3 provides guidance for global hydrologic and Earth system modelling efforts to ensure that models are capable of representing the common processes in each biome.

Fig. 3: Process types recorded in each biome.
figure 3

Number of different processes recorded in each biome, organized by temperature and precipitation. Processes are grouped by type, where red colours indicate surface processes, blue colours indicate subsurface processes and purple colours indicate channel processes.

We build on this first analysis by investigating factors that control the difference in processes over space. We focus on testing two hydrologic theories that predict global patterns in which processes occur. The first are Dunne’s34,35 hypotheses on the effect of aridity, slope and soil depth on the type and depth of dominant flow processes. Dunne predicts that, in arid-to-subhumid climates with thin vegetation, most runoff is derived from infiltration excess overland flow, with subsurface stormflow being less important. In humid climates with dense vegetation, under conditions of thinner soils and/or gentler, wider valleys, most runoff is derived from saturation excess (that is, precipitation falling onto saturated areas) and the return flow of groundwater discharging to the surface. In humid climates, conditions of deeper soils and/or steeper, narrower valleys favour runoff derived from subsurface stormflow, with peaks produced by saturation excess and return flow.

We use the hydrologic process database to test whether Dunne’s hypothesis is supported by the global body of watershed field studies. Figure 4 shows the total number of times processes are reported by vertically organized categories and Dunne’s categories against global classes of aridity (aridity metric from Willmott and Feddema50).

Fig. 4: Process types recorded by aridity class.
figure 4

a,b, Stacked bar charts showing fractions of process types recorded as occurring in research watersheds by aridity class for vertically organized process categories from canopy to groundwater (a) and Dunne process categories (infiltration excess, saturation excess or subsurface stormflow) (b). Positive aridity indices indicate humid climates, and negative aridity indices indicate arid climates. White numbers on bars indicate the number of process descriptions in each aridity class.

Our results agree with some parts of Dunne’s hypothesis. There are four key points. First, arid watersheds tend to favour the shallowest (surface) or deepest (groundwater) processes, whereas humid watersheds favour mid-depth processes (soil drainage and subsurface stormflow) (Fig. 4a). The lesser importance of soil drainage and subsurface stormflow in arid and subhumid watersheds probably occurs, in part, because of deeper water tables, as opposed to humid watersheds where the water table is closer to the ground surface23. Second, in accordance with Dunne’s model, arid climates have a greater fraction of infiltration excess processes and a lesser fraction of saturation excess processes compared with humid climates (Fig. 4b). Third, the prevalence of subsurface stormflow varies little with aridity class, except for the most humid watersheds where a higher incidence of subsurface stormflow process is recorded. Fourth, and contrary to Dunne’s hypothesis, saturation excess processes are more frequently described than infiltration excess processes for all aridity classes of research watersheds. These results could occur because of a myriad of controls on saturated areas—for example, saturated areas can develop because of ponding in surface depressions and topographic convergence of water, even when the water table is well below the ground surface51. Articles in the database report saturation excess overland flow in arid watersheds (aridity index <−0.5) in thin top soil layers or pockets52,53, in areas of topographic convergence54 and in alluvial near-stream areas55. In humid watersheds, Dunne hypothesizes that thinner soils and/or gentler slopes favour surface flow, while deeper soils and/or steeper slopes favour subsurface flow. In our database, slope and soil depth show a negative correlation for individual watersheds, and we therefore analysed these variables individually. However, our results showed weak or no trend of subsurface stormflow prevalence with soil depth or slope (Extended Data Fig. 1).

As well as the occurrence of surface or subsurface processes, a second key knowledge gap in hydrology is how lateral and vertical flow processes drive the hydrologic response. This is important for global Earth system, hydrological and ecohydrological modelling capabilities, as such models often neglect lateral processes7 despite some recent efforts to include them56,57. For example, a continental-scale integrated hydrology model that excluded lateral groundwater flow was shown to underestimate transpiration as a fraction of evapotranspiration by 15 percentage points58. Therefore, the second hypothesis that we test is the Fan et al.36 proposition that local terrain imposes a major hydrologic control by organizing down-valley drainage and convergence of surface and subsurface flow, creating drier hills and wetter valleys. They propose that lateral drainage must be accounted for in Earth system models to represent soil moisture, shallow groundwater, evapotranspiration and vegetation distribution and, therefore, energy, water and carbon fluxes. They hypothesize that the controls of lateral drainage on evapotranspiration and vegetation distribution are important everywhere except for in low-relief, arid areas (which have no surplus water to drain) and high-relief, ever-wet areas (which have plentiful water and limited water controls on evapotranspiration). Such areas are very limited, confined to Northern Africa and small areas in Western and Central Asia (low-relief, arid) and South-East Asia and small areas in Western Amazonia and Central Africa (high-relief, ever-wet).

The process descriptions in our database do not have sufficiently described details on controls on energy and vegetation to test the full nuances of Fan et al.’s hypothesis, but we can map locations where lateral flow processes dominate over vertical flow processes, and vice versa. To achieve this, for each watershed in the database, we extracted all processes included in the figure or text description that describe water movement in the soil, counting whether each describes lateral or vertical flow. Watersheds where lateral processes outnumber vertical processes are defined as lateral-dominated (Fig. 5).

Fig. 5: Lateral and vertical process dominance by location, landform and biome.
figure 5

ac, Global map showing watersheds in which investigators recorded a greater number of lateral or vertical soil water flow processes or equal numbers of each, shown by location (a) as well as by landform type (b) and biome type (c) (bar graphs).

We found that process descriptions for research watersheds record a greater or equal number of lateral soil water and subsurface stormflow processes compared with vertical processes in 84% of watersheds (Fig. 5, red and grey dots). The spatial distribution in lateral flow processes suggests control by climate or landform. The bar graphs show that lateral flow processes are more often dominant in mountains, humid forests and cold regions. Vertical processes are dominant in fewer regions, typically in drier biomes, including temperate grassland in the interior plains of the USA, Mediterranean scrub in Southwestern Australia and savannas in the east of Africa and South America. These areas often lie in cratons with subdued topography and less well-developed river networks. Therefore, Earth system, hydrological or ecohydrological models that neglect lateral movement of water in the soil are missing a process that has been found to be important across most of the globe7.

Discussion and conclusions

In hydrology, overarching hypotheses about the roles of climate, biome and landform in controlling runoff generation processes have been rare and have been difficult to test experimentally. However, such hypotheses can be tested and brought to life by interrogating process descriptions from hundreds of research watersheds. This knowledge synthesis approach leverages decades of grant funding, fieldwork effort and local expertise created through field hydrology programmes around the globe. In this study, we analysed published descriptions of hydrologic processes from 400 watersheds around the globe. We found that patterns of process occurrence for subsurface flow, organic layer subsurface flow and overland flow could be related to expected differences in process with climate and land cover. By contrast, non-perennial flow occurred in locations not traditionally associated with that process. Process occurrence showed distinct differences by biome, with forested biomes being dominated by subsurface processes, while grassland, desert and tundra showed greater numbers of surface processes.

To investigate underlying controls on these spatial patterns in processes, we tested how the occurrence of surface or subsurface processes is related to watershed aridity. We found a global tendency for arid watersheds to favour shallow (surface) or deep (groundwater) processes, whereas humid watersheds favour mid-depth processes (soil drainage and subsurface stormflow). Such analysis can be adapted to relate any process categorization to any climate or landscape metric. These data and search capabilities open the door to an efficient cycle of hypothesis generation and testing to investigate emergent patterns that relate landscape organization and hydrologic function. For modelling applications, the database can be used to determine where different processes are important in controlling river flows. This is critical in Earth system, hydrological or ecohydrological model development, where missing flow components such as lateral flow may compromise predictions, and in streamflow forecasting models such as the National Oceanic and Atmospheric Administration (NOAA)’s proposed Next Generation Water Resources Modeling Framework, which will allow spatial variability in model structure to suit regional characteristics including the dominant local processes59.

Underlying our synthesis is a global, searchable database of research watersheds with 400 published descriptions of dominant hydrologic flow pathways. We used a hierarchical taxonomy to label process descriptions, enabling flexible aggregation, analysis and visualization of process patterns. The database offers many future opportunities for deeper investigation into how hydrologists generate knowledge about runoff generation processes. By examining the underlying journal articles in the database, we can assess what types of field evidence are most valuable to deduce dominant flow pathways and evaluate the strength of evidence in individual watersheds. This information will be valuable for the design of future hydrologic observatories or research watersheds that seek to analyse dominant runoff generation processes for applications such as flood mitigation. The database adds value to existing research watershed studies, by putting them in the context of neighbouring watersheds and those from similar landscapes. By highlighting outlying watersheds compared with regional patterns, we can assess landscape characteristics that contribute to anomalies in hydrologic response. In the future, artificial intelligence methods might be leveraged to evaluate very large numbers of articles to extract process knowledge. Early efforts show the potential for the hydrologic sciences to treat ‘text as data’ and extract topic and location data from journal articles, although further improvements in process identification and geolocation accuracy will be required60.

We conclude this Analysis with a call to action to preserve hydrologic process knowledge. Today’s focus on artificial intelligence and machine learning has brought about a new appreciation of the value of data across many scientific fields. Despite this, in developing the database, we became keenly aware of the risk of loss of hydrologic data in the form of historical, watershed-specific knowledge. We found journal articles that included process descriptions but in which pinpointing the location of the watershed was difficult. Older articles (for example, pre-1990) commonly included a high-resolution map of a small watershed but indicated the exact watershed location only roughly on a large-domain map. Process descriptions in such studies cannot be linked to spatial data such as land use or topography and cannot be included in a spatial database. Across all research watersheds, we appeal to authors to expose missing or obscured process descriptions. Many articles provided field data inventories or statistics but did not include quantitative information on storages and fluxes, nor interpret datasets to provide expert descriptions of dominant hydrologic processes. Our informal discussions with field scientists hint at a common worry that there is not yet enough information for an authoritative description of dominant processes, which may, therefore, never be published. In other examples, process information was included in the article, but was spread across multiple sections and was not easily accessible. Instead, we found that alternative or interdisciplinary sources, such as a collection on forest hydrology61, sometimes provided more complete process descriptions than watershed-specific journal articles. In summary, developing a global understanding of hydrologic processes is critical to underpin accurate forecasting models and digital twin simulations of Earth’s water resources. Uncovering and preserving process descriptions enables us to conserve decades of field-based domain knowledge for future hydrologic generations.

Methods

Process description source criteria

Process descriptions included in the hydrologic process database were gathered from published journal articles and reports from recognized organizations such as US Geological Survey, US Forest Service or South Africa’s Water Research Commission. Descriptions were required to include a substantially complete description of runoff generation processes, that is, movement of water from the land surface to stream channel, in either figure or text form. We required that process descriptions resulted from field investigation of the watershed, whereby local hydrologic measurements were used to determine active stores and fluxes of water. On a 4-point scale for the strength of hydrologic field evidence, such evidence would be classed as stage 3 or stage 4 (where 4 is the strongest)62,63. This requirement was applied to the field investigation as a whole, not to individual processes. Descriptions of meteorology, land surface or floodplain processes were outside the study scope. We included descriptions at hillslope to watershed scales, the scales at which research watersheds are commonly instrumented and investigated to understand runoff generation. For process description figures, we followed the criteria from McMillan et al.37, that the figure should relate to one specific watershed, that it should label stores and fluxes and that it should relate to the physical watershed and not to a computer model. Text descriptions were additionally required to be presented as a coherent section and not scattered throughout the article. While some sources might contain sufficient scattered information to describe runoff generation in the watershed, it was impractical to disentangle such process knowledge from the remainder of the text.

We searched a wide variety of sources to maximize the number of process descriptions found. We used a systematic search from Google Scholar using the search terms ‘runoff generation’ and (‘perceptual model’ or ‘conceptual model’ or ‘conceptual diagram’). We used the first 500 results from this search (ordered by relevance). We used reference lists from previous publications cataloguing processes in research watersheds28,46,63,64,65, from the Hydrological Processes special issue ‘Research and Observatory Catchments: the Legacy and the Future’ and from the experimental hydrology wiki (https://experimental-hydrology.net). We used lists of research watersheds from the Long Term Ecological Research, NorthWatch and Critical Zone Observatory networks and searched Google Scholar using search terms of the watershed name and ‘runoff generation’. We reached out directly to field and catchment hydrologists involved in the Consortium of Universities for the Advancement of Hydrologic Science, Inc. interest group on ‘Research and Observatory Catchments’ and made a public call on social media for relevant papers. All of these sources were screened for eligible process descriptions as above.

Process description analysis

For each eligible process description, we collected information about the source document, the watershed and the process description. For the source document, we collected the citation, the DOI and licence information. For the watershed, we collected the name, area, and latitude and longitude of the basin outlet. These attributes were sometimes described in the text, but in other cases had to be manually derived from maps included in the article or from other articles describing the same watershed. We urge future authors to include latitude and longitude values with more than three-decimal-place precision in all articles describing research watersheds.

For the process description in a figure or in the text, we collected a list of named processes, and the equivalent standard process names from a taxonomic classification of hydrologic processes28. If processes were described as very rarely or never occurring, we did not include them in the list. Other than that criterion, we did not label processes as dominant or otherwise in the watershed as the decision was found to be too subjective to produce repeatable results. Using standard process names enables us to search for equivalent but differently named processes in different watersheds, and to search hierarchically for groups of processes. We describe processes as being ‘recorded’ in the watershed if the article names the process in the figure or text description. Our criteria specify that the article must include a substantially complete description of runoff generation processes. Therefore, we assume that non-recorded processes have a low probability of occurring and, thus, that over multiple watersheds the number of recordings and importance of the process scale with each other. We recorded whether the process description used subpanels (figure description) or subsections (text description) to describe spatiotemporal variability by specifying processes at different time periods or spatial zones (for example, by season, or by hillslope position). We recorded the number of divisions and the type of classification. For the purposes of our analysis, processes that occur in any time period or spatial zone were associated with the watershed location as specified by the coordinates of the watershed outlet. We collected additional information conveyed by the figure or text such as types of vegetation, soil and geology and whether any information on uncertainty was presented.

Process database specifications

The process descriptions are stored in a PostgreSQL relational database. Each figure or text description is linked to metadata, including the article citation and watershed location. Further, each description is linked to a list of processes that occur in the watershed, with standardized process names from a hydrologic process taxonomy28. This taxonomy allows hierarchical searching for watersheds exhibiting any process, including its subprocesses, such as overland flow, including infiltration excess and saturation excess flow processes. A schema for the database is shown in Extended Data Fig. 2. A public version of the database is available as described in the ‘Data availability’ section; this version redacts original text/figures for articles without an open-access licence.

Process depth analysis

To create Fig. 4, showing the fraction of processes in vertically organized categories, we assigned each process in each watershed to a depth category, as follows. Categories were specified via the hierarchical taxonomy28 so that each category included all subprocesses related to a parent process. Canopy processes included canopy evaporation, interception, canopy snow processes and canopy sublimation processes. Surface processes included all surface processes in the taxonomy with the exception of canopy processes (thereby including evapotranspiration, snow, glacier and frozen ground processes, overland flow and infiltration). Near-surface processes included subsurface stormflow in the organic horizon. Soil processes included all water fluxes in the unsaturated zone below the organic horizon. Subsurface stormflow processes included lateral matrix and macropore flow at soil horizons and at the soil–bedrock interface, and included variable source area subsurface stormflow. Groundwater processes included all groundwater fluxes within the watershed and included groundwater flow, perched water table processes, displacement of groundwater and return flow. Regional groundwater processes included regional groundwater flux and deep groundwater losses.

For each watershed in the database, we calculated an aridity index using the Thornthwaite moisture index that is based on the ratio of precipitation and potential evapotranspiration50, using monthly average climate values from the Climatic Research Unit Time Series (CRU TS) v3.23 dataset66. For each aridity class, we selected all watersheds in that class, and pooled all processes that were recorded in those watersheds with their number of occurrences. The number of process occurrences in each depth category was divided by the total number of process occurrences in the aridity class to give the fraction of processes in each depth category shown in Fig. 4.

Soil depth and slope analysis

Extended Data Fig. 1 shows the fraction of overland flow and subsurface stormflow process by landscape slope class and soil thickness class. Slope values were derived from a 1-km-pixel-size classification of median slope (among a variety of global topographic variables), based on a 250-m Global Multi-resolution Terrain Elevation Data 2010 dataset67. Larger pixel sizes from 5 km to 100 km were available, but we selected 1 km based on the small size of most watersheds in the database (median watershed size 6.9 km2). Slope classes were selected on a log scale based on the distribution of slope values in our dataset. Soil thickness values were derived from a global 1-km dataset of thickness of soil, regolith and sedimentary deposit layers that gives an average value of the thickness of unconsolidated material (including soil on upland hillslopes and sedimentary deposits in valley bottoms)68.

Lateral process analysis

To create Fig. 5, showing whether each watershed location is dominated by lateral or vertical processes, we considered only processes in the soil and subsurface stormflow categories. We used this subset of processes to balance the intention of the article by Fan et al.36, which guided the analysis, with the level of detail available in the database. Fan et al.36 describe lateral flow as the lateral ridge‐to‐valley convergence that routes water from hillslopes towards the nearest stream channel, creating drier hills and wetter valleys. In process descriptions in the database, soil water flux is typically differentiated into vertical drainage in the unsaturated zone, and lateral flow that occurs only at impeding layers including soil horizons and at the bedrock interface. Vertical drainage is also known as percolation, recharge and vertical matrix or macropore flow. Lateral flow at impeding layers is also known as subsurface stormflow, throughflow and interflow. In the saturated (groundwater) zone, lateral and vertical movement of water are not as clearly distinguished, with articles often referring to groundwater flow or groundwater displacement but not specifying the degree to which lateral or vertical water movement is involved. Therefore, for this Analysis, we did not include groundwater processes. For soil and subsurface stormflow categories, we assigned each process recorded in each watershed to a lateral or vertical category. As for the previous analyses, our method assumes that the number of recordings and importance of the lateral or vertical category scale with each other. Categories were specified via the hierarchical taxonomy28 so that each category included all subprocesses related to a parent process. Lateral processes included all subsurface stormflow processes and included lateral matrix flow, lateral macropore flow, lateral unsaturated flow and topographic convergence of catchment waters. Vertical processes included vertical matrix flow, vertical macropore flow and vertical drainage to groundwater. For each watershed in the database, we counted the number of recorded soil and subsurface stormflow processes according to the categories above. Watersheds where the recorded number of lateral processes was greater than, equal to or less than the number of vertical processes were respectively specified as lateral-dominated, equal or vertical-dominated.

The bar graphs in Fig. 5 show the fraction of lateral-dominated, equal or vertical-dominated watersheds by landform type and by biome type. Landform types are derived from a 250-m resolution World Terrestrial Ecosystems raster classification with four classes (plains, hills, mountains and tablelands), developed by the US Geological Survey, Esri and The Nature Conservancy69. These classes are a simplified reclassification of the World Ecological Facets Landform Classes derived using the Hammond Landform Classification Algorithm70. Biomes are labelled according to a global classification of 14 biomes by a team convened for the World Wildlife Fund71. We used the ArcGIS layer of these biomes from the RESOLVE Ecoregions and Biomes dataset72. For visual clarity, the bar graph combines the following biomes: temperate broadleaf and conifer forests combined into temperate forest; tropical dry broadleaf forest and mediterranean forest combined into dry forest; and tropical, temperate and montane grassland combined into grassland.