Introduction

Since the founding of the Correlates of War project in 19631, armed conflict research has benefited from advances in the scientific method, including data collection2,3, modeling4, and prediction5. However, an outbreak of armed conflict remains difficult to predict, and any attempt to do so requires an explanation of its underlying causes5,6. These challenges have led many scholars in environmental security research to refrain from offering causal explanations altogether. For instance, one of the field’s founders, Homer-Dixon, explicitly avoided “entangling himself in the metaphysical debate about the relative importance of causes of naturally caused armed conflict”7.

Ever since, a host of studies have attempted to overcome many of these limitations by using econometric, statistical, and qualitative causal inference methods for observational data8,9,10. Since the 1990s, pathways that causally trace armed conflict to environmental variables have become increasingly rich and diverse11,12,13,14,15,16. Observationally identified causes linking conflict, agriculture, and weather include rapacity dynamics17,18,19, intensifying enmities along political fracture lines20,21, and food prices22,23, among others.

Until now, however, methods have only been able to empirically confirm less exhaustive causal mechanisms that link conflict to environmental factors16,24,25. To estimate causal effects, randomized controlled trials (RCTs) have typically been regarded as the gold standard, as random assignment of units to treatment or control eliminates confounding between assignment and outcome. Since such rigorous experimentation does not apply to armed conflict, more exhaustive causal mechanisms are still in search of appropriate methodology16,25. This leaves a significant gap in the literature not only for scholarly purposes but also for policy-making, as policy interventions can only effectively address causes once they are identified and estimated.

In addressing this gap, our article outlines a methodology for inferring causality from non-experimental observations of armed conflict. We leverage recent advances in the theory and methods of causality26. In particular, Pearl’s contributions—such as causal diagrams, causal interventions, and a formalized approach to causal reasoning—provide a robust framework for testing commonly hypothesized causal pathways using observational data. Our cross-sectional dataset consists of 294 non-experimental observations, one for each subdistrict in Iraq (Arabic: ناحية nāḥiya), which serves as our unit of analysis. Our outcome variable is the count of conflict events, with each observation additionally described by explanatory variables, including demographics, vital resources, the environment, and weather. These observations were sampled from several geo-coded maps2,27,28,29,30,31,32. From these data, we derived an empirical causal mechanism outlining linkages between environmental variables and conflict. Represented as a causal graph, the mechanism illustrates causal pathways from environmental variables to armed conflict outcomes. The mechanism is characterized by the causal effects of these variables on the count of conflict events, accounting for causal spillover wherever these effects could be identified and estimated. Several findings confirm that causal effects on conflict events can be traced back to their natural causes.

While we refrain from making strong claims about having established a robust causal mechanism, our study does the following. First, it highlights exogenous dynamics that must be considered in environmental security research. Second, and more importantly, by demonstrating how advances in causal methodology can be leveraged in armed conflict research with non-experimental observations, we show how empirical causal structures, the identification of causal paths, and the estimation of causal effects can inform environmental security research. Third, we corroborate the existence of cause-and-effect relationships between environmental variables and conflict. Finally, we illustrate how causal frameworks can be used to inform more effective policy interventions.

Our study adapts causal methodology, including the retrieval of empirical causal structures and the estimation of causal effects under spillover conditions, to test existing hypotheses in armed conflict research using observational data—a novel approach in this domain. Furthermore, it applies these frameworks to environmental conflict research, offering new insights and paving the way for more effective policy interventions.

The article is structured as follows: Section 2 demarcates the research gap in the existing environmental security and armed conflict literature. Section 3 presents the hypotheses, and Section 4 outlines the results. Section 5 discusses key contributions, research limitations, future directions, practical implications, and conclusions. Finally, Section 6 describes the study’s data and methods.

Related studies

Armed conflict prediction is generally difficult6,33,34. It is particularly challenging in localities where armed conflict is expected to occur for the first time35. Efforts to overcome the challenges and pitfalls of armed conflict prediction have occasionally grounded prediction in environmental variables36. A specific line of environmental security research has focused on the causality of armed conflict. Studies have investigated both the connections tracing the causes of violent, organized conflict to scarce resources11,16,37 and the causal typologies of environmental conflict11,14,15.

Some studies have focused on and explicated specific pathways using econometric, statistical, and qualitative causal inference methods for observational data. An overview of these studies can be found in, e.g., Martin-Shields and Stojetz8; Von Uexkull and Buhaug9; Ide et al.10. Researchers have identified numerous causes linking conflict, agriculture, and the environment using observational data, including rapacity dynamics17,18,19, intensifying enmities along political fracture lines20,21, and food prices22,23, among others. Causes of conflict have been attributed to environmental stress and scarcity12,13,37, the allocation and management of scarce water resources38,39,40, and shocks in food production41.

Some high-profile studies such as Hsiang, Burke, and Miguel24 have attracted sharp criticism41 for capturing publication bias and making questionable omissions from data, rather than focusing on actual dynamics. The critique demonstrated that the inferred impact of environmental conditions on conflict is contingent upon the context, the type of violence, the involved actors, and the methodology itself. By extension, broad claims about causal pathways linking the environment to conflict are, at best, weak and inconclusive16,25,42.

Hence, while domain knowledge strongly suggests that environmental conditions contribute to organized armed conflict within countries, more comprehensive causal mechanisms linking these conditions to conflict remain less understood. Understanding these mechanisms in greater detail will likely require methodological advances to retrieve them from empirical data43. This is the research gap that our study aims to address.

Theory and hypotheses

This section derives our causal hypotheses from relevant findings. Adapted from Sakaguchi, Varughese, and Auld16, Fig. 1 outlines the basic mechanism of causal linkages between the environment and conflict, a mechanism that has been extensively hypothesized in environmental security literature. These hypothetical linkages are rooted in environmental causes. The figure distinguishes between direct and indirect linkages (i.e., paths A and B, respectively). The indirect linkages are based on the idea that the scarcity of vital resources—also referred to as environmental scarcity—hypothetically mediates the relationship between environmental causes and armed conflict outcomes. This is the rationale behind our hypotheses.

Fig. 1

Hypothesized causal pathways linking the environment to conflict.

Long-term weather patterns have been argued to directly cause armed conflict24,44,45,46,47. The direct link between long-term weather patterns and armed conflict can be seen in how populations respond to environmental changes. For instance, environmental disruptions affecting livelihoods can prompt community mobilization and lead armed groups to intervene, anticipating adverse outcomes. These actions may lead to direct causal effects of environmental processes on armed conflict. Such effects can originate from factors such as soil moisture, temperature, or the way different physical environments absorb or release accumulated heat or energy (i.e., latent heat or energy)16,24. Therefore, we hypothesize that changes in soil moisture, temperature, and latent energy directly cause changes in armed conflict activity (H1).

  • H1a: An increase in latent energy in the form of heat directly causes an increase in armed conflict activity.

  • H1b: An increase in skin temperature directly causes an increase in armed conflict activity.

  • H1c: An increase in soil moisture directly causes a decrease in armed conflict activity.

Further, environmental processes have been argued to indirectly cause armed conflict16,48,49. Causal mediation of environmental effects on armed conflict primarily concerns environmental scarcity7,12,17,37. Environmental scarcity has been proposed as a mediator of environmental effects on armed conflict7,16. While we elaborate on causal mediation more specifically below, we can already hypothesize that environmental processes indirectly cause armed conflict.

  • H2a: An increase in latent energy indirectly causes an increase in armed conflict activity.

  • H2b: An increase in skin temperature indirectly causes an increase in armed conflict activity.

  • H2c: An increase in soil moisture indirectly causes a decrease in armed conflict activity.

To add specificity to the indirect causal paths, causal mediation depends on specific causal conditions. Agricultural and pastoral conditions have been argued to shape social response to long-term weather patterns that induce migration50. Additionally, degradation and desertification of arable land, as well as the availability of water for agriculture, have been argued to mediate the environmental effects on violent conflict10,51,52. Wheat production has been shown to mediate the causal effects of temperature on the emergence of actual violence19. Since wheat is one of Iraq’s key crops53, we hypothesize that the environmental effects are also mediated by the production of agricultural resources, specifically wheat production.

  • H3a: Given the indirect paths from the environmental processes to armed conflict activity, wheat production causally mediates the effects of these processes on armed conflict activity, by further decreasing armed conflict activity.

In line with the previous reasoning, since the association between conflict activity and factors such as population size, growth, density, or migration has already been established54,55,56, these variables may also naturally mediate the effects of the environmental causes on armed conflict outcomes. Specifically, the scarcity of vital resources matters more to denser populations than to less dense ones48,57. Intuitively, a denser population is likely to cope with and mitigate tensions less effectively than a less dense population. Everything else being equal, the denser population is, therefore, more likely to succumb to organized violence caused by environmental processes. Our next hypothesis, H3b, follows from this reasoning.

  • H3b: Given the indirect pathways from the environmental processes to armed conflict activity, population density causally mediates these effects by further increasing armed conflict activity.

With these hypotheses, we can construct the entire hypothetical causal structure of linkages between environment and conflict, as shown in Fig. 2. As depicted in the figure, grounded in the environmental processes, the scarcity of vital resources exposes the population to existential stress. Both population density and the scarcity of agricultural resources aggravate these effects.

Fig. 2

All the hypotheses combined: The hypothetical causal structure.
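
The distinction between direct paths (H1) and indirect, mediated paths (H2 and H3) can be made concrete by encoding the hypothesized structure as a directed graph. The following is a minimal sketch, not the representation used in the study: the node names and the exact edge list are assumptions chosen to mirror the hypotheses, and the helper functions are hypothetical.

```python
# Hypothesized causal structure (H1-H3) as a directed graph.
# Node names and the exact edge list are illustrative assumptions.
HYPOTHESIZED_EDGES = {
    "latent_energy": ["conflict", "wheat_production", "population_density"],
    "skin_temperature": ["conflict", "wheat_production", "population_density"],
    "soil_moisture": ["conflict", "wheat_production", "population_density"],
    "wheat_production": ["conflict"],
    "population_density": ["conflict"],
    "conflict": [],
}


def directed_paths(edges, source, target, prefix=None):
    """Return every directed path from source to target as a list of nodes."""
    prefix = (prefix or []) + [source]
    if source == target:
        return [prefix]
    paths = []
    for child in edges.get(source, []):
        if child not in prefix:  # guard against revisiting a node
            paths.extend(directed_paths(edges, child, target, prefix))
    return paths


def split_paths(edges, source, target):
    """Separate direct paths (one edge) from indirect paths (via mediators)."""
    paths = directed_paths(edges, source, target)
    direct = [p for p in paths if len(p) == 2]
    indirect = [p for p in paths if len(p) > 2]
    return direct, indirect


direct, indirect = split_paths(HYPOTHESIZED_EDGES, "soil_moisture", "conflict")
mediators = sorted(p[1] for p in indirect)
print(direct)     # the H1c-style direct path
print(mediators)  # the H3-style mediators on the indirect paths
```

In this encoding, a path with a single edge corresponds to a direct hypothesis (H1a to H1c), while every longer path passes through one of the hypothesized mediators.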

Results

Empirical causal structure

In the theory section, we proposed a hypothetical causal structure for the linkages between environment and conflict. Figure 3 presents the causal structure that we empirically retrieved from the available non-experimental observations.

Fig. 3

The empirical causal structure retrieved from the aggregated data.

Although somewhat less expressive, the empirical causal structure largely corresponds to the hypothetical one in Fig. 2. The conflict nodes are clustered together, and the count of conflict events is the only node with exclusively incoming edges. Additionally, the structure is rooted in the environmental processes. Apart from the direct causal path from the temperature node to battle events, all other paths from the environmental processes to conflict events are indirect.

Because the population density, skin temperature, and wheat production nodes have the highest number of incoming and outgoing edges, these nodes are pivotal to the connectedness of the empirical causal structure. This supports the causal mediation of environmental processes on conflict outcomes. In fact, the absence of population density, wheat production, and skin temperature would significantly disconnect the environmental causes from these outcomes. Rather than accepting this evidence as conclusive, we further use the empirical structure to conduct hypothesis testing.
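
The two structural arguments above, counting incoming and outgoing edges and asking whether removing the pivotal nodes disconnects the environmental causes from conflict events, can be sketched in a few lines. The edge list below is purely illustrative: the actual edges of the empirical structure are those of Fig. 3 and are not reproduced here, and the function names are hypothetical.

```python
# Illustrative edge list only; the real edges are those shown in Fig. 3.
EDGES = [
    ("soil_moisture", "wheat_production"),
    ("latent_energy", "wheat_production"),
    ("latent_energy", "skin_temperature"),
    ("skin_temperature", "population_density"),
    ("skin_temperature", "battles"),
    ("wheat_production", "population_density"),
    ("wheat_production", "conflict_events"),
    ("population_density", "conflict_events"),
    ("battles", "conflict_events"),
]


def degree(edges, node):
    """Return (in-degree, out-degree) of a node."""
    indeg = sum(1 for _, dst in edges if dst == node)
    outdeg = sum(1 for src, _ in edges if src == node)
    return indeg, outdeg


def reachable(edges, source, target, removed=frozenset()):
    """Depth-first search for a directed path that avoids removed nodes."""
    if source in removed or target in removed:
        return False
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for src, dst in edges:
            if src == node and dst not in seen and dst not in removed:
                seen.add(dst)
                stack.append(dst)
    return False


pivotal = frozenset({"population_density", "wheat_production", "skin_temperature"})
for cause in ("soil_moisture", "latent_energy"):
    print(cause,
          reachable(EDGES, cause, "conflict_events"),
          reachable(EDGES, cause, "conflict_events", removed=pivotal))
```

A node whose in-degree is positive and whose out-degree is zero, such as the conflict event count in this toy graph, is a pure outcome; a cause that is reachable before removal but not after it is connected to the outcome only through the removed mediators.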

Causal hypotheses

The empirical causal structure can assist in validating the causal hypotheses of naturally caused armed conflict. Using this structure, we fitted a spatially explicit structural equation model (SESEM) and formulated and estimated the causal effects of our explanatory variables on the count of conflict events.

The SESEM model demonstrated acceptable performance, with the Comparative Fit Index (CFI) exceeding 0.90 and the Standardized Root Mean Square Residual (SRMR) falling below 0.08 across nearly all spatial distances58. These values indicate an acceptable model fit, highlighting the model’s effectiveness in capturing the underlying data structure. While the Root Mean Square Error of Approximation (RMSEA) occasionally exceeded 0.10, suggesting room for improvement, this is likely due to the limited number of available observations.
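
For readers less familiar with these indices, the following sketch shows how CFI and RMSEA are commonly computed from chi-square statistics and how the thresholds named above can be checked. It uses the textbook definitions; the chi-square values are hypothetical (the article does not report them in this section), the function names are assumptions, and some software divides by n rather than n - 1 in the RMSEA.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def acceptable_fit(cfi_value, srmr_value):
    """Thresholds used in the text: CFI above 0.90 and SRMR below 0.08."""
    return cfi_value > 0.90 and srmr_value < 0.08


# Hypothetical statistics for a model fitted to n = 294 observations.
example_cfi = cfi(chi2=30.0, df=10, chi2_base=250.0, df_base=15)
example_rmsea = rmsea(chi2=30.0, df=10, n=294)
print(round(example_cfi, 3), round(example_rmsea, 3))
print(acceptable_fit(example_cfi, srmr_value=0.05))
```

Because the sample size appears in the denominator of the RMSEA, a given amount of misfit yields a larger RMSEA when few observations are available.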

For each causal estimate, a hypothesis test was conducted to determine whether the estimate should be attributed to random error. Table 1 lists the causal estimates, the standard errors, and statistical significance.
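
One common way to carry out such a test is a two-sided Wald test, in which the estimate is divided by its standard error and compared against a standard normal distribution. The sketch below assumes this test; the article does not state which test statistic was used, and the numbers in the example are hypothetical rather than taken from Table 1.

```python
import math


def two_sided_p(estimate, std_error):
    """Return the z statistic and the two-sided normal p-value."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def significant(estimate, std_error, alpha=0.001):
    """True if the estimate differs from zero at the given level (0.1% here)."""
    return two_sided_p(estimate, std_error)[1] < alpha


# Hypothetical estimate and standard error, for illustration only.
z, p = two_sided_p(2.0, 0.5)
print(z, p, significant(2.0, 0.5))
```

Under this test, an estimate is significant at the 0.1% level only if it lies more than roughly 3.29 standard errors from zero.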

Table 1 The causal effects across varying distances.

Our first set of hypotheses posits that the environmental processes directly cause armed conflict outcomes. Specifically, changes in latent energy (H1a), skin temperature (H1b), and soil moisture (H1c) were hypothesized to directly affect the count of conflict events. Among these, only skin temperature shows a direct causal path to conflict events through battle events. The estimates, standard errors, and statistical significance for each isolated path from skin temperature to conflict events are provided in Table 2.

Table 2 The causal effects of skin temperature on conflict events across three different paths and varying distances.

Based on the empirical causal structure, the estimated causal effect of skin temperature on total conflict events is statistically significant at the 0.1% level and shows a positive relationship to conflict events across all spatial distances (see Table 1). This includes a 47.35 non-spillover effect at a distance of zero (see Table 2). We therefore accept (H1b). Since there is no evidence to reject the null hypotheses for (H1a) and (H1c), we do not accept either of them.

Further, our second set of hypotheses posits that the environmental processes indirectly cause conflict events. Latent energy was hypothesized to increase the count of conflicts indirectly (H2a). According to the empirical causal structure, all causal effects of latent energy on conflict events must be mediated. Everything else being equal, the estimated causal effects of latent energy on conflict events are statistically significant at the 0.1% level and positive across all spatial distances. This includes a 24.27 non-spillover effect at a distance of zero (see Table 1). Thus, we accept (H2a). Similarly, soil moisture was hypothesized to decrease the count of conflicts indirectly (H2c). With no direct paths from soil moisture to conflict events, its effects can only be mediated. Everything else being equal, the estimated causal effects of soil moisture on conflict events are statistically significant at the 0.1% level and negative across all spatial distances, including a -18.30 non-spillover effect at a distance of zero (see Table 1). Therefore, we also accept (H2c). However, for the indirect paths from skin temperature to armed conflict activity through population density (H2b), the estimated causal effects of skin temperature on conflict events are not statistically significant at the 0.1% level across all spatial distances (see Table 2). Consequently, we do not accept (H2b).

Finally, our third set of hypotheses posits that the causal effects of the environmental processes on conflict events are agriculturally and demographically mediated. Specifically, given the indirect paths from soil moisture and latent energy to armed conflict activity, wheat production was hypothesized to decrease the count of conflict events (H3a), while population density was hypothesized to increase the count of conflict events (H3b). Everything else being equal, the estimated causal effects of wheat production on conflict events are statistically significant at the 0.1% level and positive across all spatial distances, including a 14.02 non-spillover effect at a distance of zero (see Table 1). Similarly, the estimated causal effects of population density on conflict events are statistically significant at the 0.1% level and positive across all spatial distances, with a 29.66 non-spillover effect at a distance of zero (see Table 1). These findings lead us to reject the null hypotheses for both (H3a) and (H3b). Because the effect of wheat production is positive, contrary to the hypothesized decrease, we do not accept (H3a); because the effect of population density has the hypothesized positive sign, we accept (H3b) across all spatial bins. Moreover, the removal of wheat production, skin temperature, and population density from the empirical causal structure would significantly disconnect soil moisture and latent energy from conflict events. Our causal estimates corroborate these structural findings.
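
To illustrate what a mediated effect means numerically, the sketch below uses the standard product-of-coefficients rule for linear structural equation models: the effect along a path is the product of its edge coefficients, and the total indirect effect is the sum over all indirect paths. This is an assumption made for illustration; it ignores the spatial spillover terms of the SESEM, and the coefficients are hypothetical values that do not come from Table 1 or Table 2.

```python
def path_effect(coefs, path):
    """Multiply the edge coefficients along one directed path."""
    effect = 1.0
    for src, dst in zip(path, path[1:]):
        effect *= coefs[(src, dst)]
    return effect


def total_effect(coefs, paths):
    """Sum the path-specific effects over a collection of paths."""
    return sum(path_effect(coefs, p) for p in paths)


# Hypothetical standardized coefficients, for illustration only.
COEFS = {
    ("soil_moisture", "wheat_production"): -0.5,
    ("wheat_production", "conflict_events"): 0.4,
    ("soil_moisture", "population_density"): -0.2,
    ("population_density", "conflict_events"): 0.5,
}

INDIRECT_PATHS = [
    ["soil_moisture", "wheat_production", "conflict_events"],
    ["soil_moisture", "population_density", "conflict_events"],
]

via_wheat = path_effect(COEFS, INDIRECT_PATHS[0])
indirect = total_effect(COEFS, INDIRECT_PATHS)
print(via_wheat, indirect)
```

The toy numbers show that the sign of an indirect effect depends on every edge along the path, so a mediator with a positive effect on the outcome can still transmit a negative effect from an upstream cause.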

Seasonal effects

In addition to the aggregated results, we applied the causal discovery algorithm to temporally non-aggregated data slices at each season’s mid-point. The results are presented in Figs. 7, 8, 9 and 10 of the Appendix. As seen in the figures, applying the causal discovery algorithm to the seasonal data slices reveals highly irregular empirical causal structures that deviate from the prevailing domain knowledge.

Discussion

We make three contributions to armed conflict research. First, we demonstrate how advances in causal methodology can be applied to armed conflict research. While natural experiments on armed conflict are rare, and most observations are non-experimental24, causal assumptions can bridge the gap between experimental and non-experimental studies. By leveraging causal methodology, we show how to retrieve an empirical causal structure, identify causal paths, and estimate causal effects using non-experimental data. Additionally, the probability of distances between hypothesized and empirically discovered causal structures can be quantified and tested59,60, especially when domain knowledge informs causal discovery of naturally caused armed conflict. Otherwise, an empirical causal structure can be retrieved exploratorily, as shown in our study.

Second, in the context of environmental security research, we confirm the existence of cause-and-effect relationships between environmental and demographic variables and armed conflict outcomes. We demonstrate that environmental processes, mediated by agriculture and demographics, play a role in causing conflict. By disentangling the causal factors underlying naturally caused armed conflict, we addressed the research gap in the environmental security literature16,25,43. The early research avoided engaging in the metaphysical debate over the relative importance of natural causes of armed conflict7. By retrieving the empirical causal structure from non-experimental conflict observations, we identified the causal mechanism linking the environment to conflict. Our empirical causal structure shows that the environmental processes, particularly soil moisture and latent energy, can be at the core of naturally caused armed conflict. We also established direct and indirect causal effects, such as the direct aggravating effect of skin temperature and the indirect alleviating effect of soil moisture. Our findings show that the effects of these environmental causes on conflict events are mediated by demographics and vital resources. Additionally, we confirm that population density aggravates conflict. However, contrary to our hypothesis, we find no evidence that higher wheat production alleviates conflict activity.

Third, we show how our causal findings can inform the design of more effective policy interventions to address armed conflict outcomes. Our findings provide a foundation for policy approaches that mitigate armed conflict. Policy acceptance hinges on unpacking black-box predictions5, and because causality explains underlying mechanisms, it offers greater interpretability than other approaches to analysis do61. Causality also holds policy relevance. For instance, mediators such as population density and wheat production are key points where indirect causal paths from the environmental factors to conflict outcomes can be strategically targeted for intervention. Causal frameworks also enable validation of additional such points. Thus, understanding the causal mechanisms behind naturally caused armed conflict can guide the development of policies aimed at addressing its root causes.

Our research is constrained by several limitations. First, regarding our variables, many vital resources are influenced by factors such as precipitation, temperature, energy, and water availability24,51,52,62,63, which justifies the selection of wheat, skin temperature, soil moisture, and latent energy as explanatory variables. Population density is also a well-established variable54,55,64. However, the causal sufficiency assumption requires that no common cause of the modeled variables is left unobserved; any omitted, causally relevant variable can confound causal estimates65. In our case, this primarily applies to social and political variables (e.g., political power-sharing arrangements, inter-group animosities, and horizontal inequality)2,66,67. The absence of these variables may have violated the causal sufficiency assumption, potentially confounding our causal estimates. Unfortunately, such data are not available at an adequate resolution for a data-scarce country like Iraq. Even if these variables had been available, however, their inclusion might have introduced reverse or cyclical causality, thereby confounding our estimates. Hence, the trade-off between methodological rigor and domain knowledge is an inherent challenge of causal modeling.

Our second limitation relates to the geographical nature of subdistricts. The need to delimit the conflict under study both geographically and historically5 constrains the generalizability of our findings to other geographies. This reflects a broader trade-off: the need to localize armed conflict spatially and temporally versus the need to ensure sufficient variability in the sampled observations. Moreover, because subdistricts are geographical units, it cannot be assumed that the observations are independently and identically distributed. Neither conflict activity nor the explanatory variables necessarily conform to subdistrict borders. Specifically, tribal dynamics play a role in shaping armed conflict in Iraq, and the subdistrict and tribal boundaries do not always coincide68. As a result, it is plausible that some Iraqi subdistricts are affected by causal effects from other Iraqi subdistricts40,69,70. While our spatially explicit structural equation modeling accounted for such spillover effects, the causal discovery method used to retrieve the causal structure does not incorporate a remedy for this spatial dependency.

Our third limitation pertains to the temporal dimension of causality. Since causes are conventionally expected to precede their effects, temporality is a fundamental property of causal inference. However, the limited availability and quality of our time-series data constrained our ability to model the temporal dynamics of naturally caused armed conflict. Specifically, the available time-series were not sufficiently granular or consistent to support robust temporal modeling. This lack of time-series data prevented us from inferring seasonal effects beyond irregular causal structures that deviate from domain knowledge. The irregular causal structures for the seasonal snapshots are unlikely to be robust, as it is unlikely that the entire environmental conflict dynamic could have unfolded within a single month.

Several research directions could address the limitations outlined above. First, given the exclusion of social and political variables from our study, subsequent studies should assess the robustness of our causal findings under the assumption of unobserved confounding—i.e., by relaxing the causal sufficiency assumption. Causal frameworks for this purpose already exist69,71,72. An increasing number of these tools are now available as off-the-shelf solutions, providing researchers with practical means to integrate them into their methodology72,73,74.

Further research should also investigate the generalizability of our findings to other geographical localities. The smaller the size of locality, the less variable are the observations. Consequently, the less likely it is that their variability suffices for statistical significance. Any reduction in the size of locality can artificially make causal findings less statistically significant. However, it is also possible to overcome the reduced variability of observations, characteristic of small-size localities, by describing conflict observations in informationally dense terms. This can be achieved by describing each locality with additional explanatory variables, while dimensionally reducing the number of explanatory variables to a smaller number of latent factors. Armed conflict outcomes can then be modeled in terms of these latent factors. This approach allows us to examine more diverse contextual data and assess the extent to which our findings can be generalized and applied to different geographical regions and conflict contexts.
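
One simple way to reduce many explanatory variables to a smaller number of factors is principal component analysis, sketched below with a power iteration for the first component only. PCA is used here as a stand-in, since the paragraph above does not commit to a particular latent factor method; the function name and the toy data are hypothetical, and the sketch assumes the data are not constant.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return the leading principal direction and each row's score on it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the explanatory variables.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):  # power iteration on the covariance matrix
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data: four localities described by two perfectly correlated variables.
ROWS = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(ROWS)
print(direction, scores)
```

The resulting scores form a single latent factor that could replace the original variables in a downstream model of conflict outcomes.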

Geography also matters for strategy. Natural resources are inherently geo-located, as are competing factions. Claiming these resources, competing factions often act strategically. These dynamics rarely unfold in a socio-political and demographic vacuum. For instance, the already mentioned cross-subdistrict tribal networks can introduce both natural and strategic contingencies into conflict dynamics, as has been observed across Iraqi subdistricts. For geographical reasons, armed conflict within some Iraqi subdistricts may be subject to causal effects from other subdistricts. However, further investigation is warranted into whether such spillover effects are primarily driven by natural environmental factors or by strategic human agency. Causal frameworks that support such investigation are already available for use69,70.

Our findings should also be examined for temporal effects. The retrieved empirical causal structure already suggests the non-linearity of armed conflict outcomes75. Future research should, therefore, investigate how to infer causal effects on armed conflict dynamics across time. Causality can be temporally represented by causal time-series graphs76. Such graphs can identify time-varying causal paths and guide the estimation of temporal causal effects. However, not only would this require time-series data of sufficient quality, but also a plausibly hypothesized mechanism of temporal causal effects, which may go beyond our explanatory variables. Additionally, most currently available causal discovery methods for time-series data rely on the stationarity assumption, which limits their ability to detect and account for seasonal fluctuations in conflict dynamics77.

By explaining why phenomena occur as they do, causal analysis offers greater pragmatic utility for policymaking than any other methodological approach. This strengthens the case for accepting our findings and underscores the broader value of applying causal methodology. This also encourages adoption of causal methodology in other policy domains where decisions can benefit from empirically grounded causal evidence.

In the context of environmental security, understanding the causal linkages between environmental factors and conflict provides valuable guidance for designing conflict mitigation strategies. By targeting mediators along natural causal pathways to conflict outcomes, policy interventions can be crafted to disconnect these pathways before aggravating causal effects reach conflict outcomes. Such interventions can be cost-effective and even preventive. For instance, well-conceived social and migration policies aimed at reducing population density can disconnect some of these causal links. Similarly, development aid policies can mitigate conflict by addressing environmental stressors. This can be achieved by strategic investments in long-term hydrological infrastructure. Finally, our findings enable the geographical assessment of Iraqi subdistricts for their specific vulnerability to particular causal effects, thereby informing the spatial targeting of policy implementation.

In conclusion, quasi-experimental approaches often fall short in the study of armed conflict, and natural experiments are rare. Further, ethical considerations preclude rigorous experimentation in this domain. To the best of our knowledge, our study is the first to apply the recent advances in the theory and methods of causality to the analysis of naturally caused armed conflict. Relying exclusively on non-experimental observations, we derived an empirical causal structure that corresponds to the underlying mechanism linking the environmental factors to conflict outcomes. Based on this structure, we exemplified how to infer the causal effects of natural processes on armed conflict. While our findings rest on specific assumptions, they offer a foundation that can be strengthened through additional validation. Such validation—particularly through interdisciplinary methods—is therefore welcome.

Data and methods

Data sources

The armed conflict activity variables were sourced from the Armed Conflict Location and Event Data Program2 (ACLED). We also retrieved geo-coded maps from multiple sources, including the Humanitarian Data Exchange, the European Centre for Medium-Range Weather Forecasts of Copernicus Climate Change Service28, NASA27,29,31, the Center for International Earth Science Information Network of Columbia University30, and MapSPAM78. These maps provided grid-based representations of our explanatory variables. Our unit of analysis is the Iraqi subdistrict (n=294), encompassing all subdistricts across the country. The observational time frame spans from January 1, 2020, to January 1, 2022. Within this period, observations were aggregated into a single cross-sectional dataset. The choice of this time horizon was motivated by the high availability of recent data. In cases where a grid layer was unavailable for the entire period, we used data covering the maximum available portion of the horizon for the corresponding variable. The aggregation methods are specified in the sections below.
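
As a rough illustration of how grid-based values can be reduced to one observation per subdistrict, the sketch below averages the grid cells assigned to each unit and skips missing cells. The averaging rule, the subdistrict identifiers, and the function name are all assumptions made for illustration; the aggregation rules actually used for each variable are those specified in the sections that follow.

```python
from collections import defaultdict


def aggregate_cells(cells):
    """Average grid-cell values per subdistrict, ignoring missing cells.

    cells: iterable of (subdistrict_id, value) pairs, where value may be None.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for subdistrict_id, value in cells:
        if value is None:  # cell without data for this variable
            continue
        sums[subdistrict_id] += value
        counts[subdistrict_id] += 1
    return {sid: sums[sid] / counts[sid] for sid in sums}


# Hypothetical grid cells already matched to subdistrict identifiers.
CELLS = [("sub_A", 1.0), ("sub_A", 3.0), ("sub_B", 2.0), ("sub_B", None)]
print(aggregate_cells(CELLS))
```

A subdistrict for which every cell is missing is simply absent from the result, which makes such gaps easy to detect before modeling.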

Despite a strong rationale for including additional explanatory variables that capture the societal and political context of Iraq2,66,67, Iraq remains a data-scarce country, and such variables are generally unavailable at a sufficiently granular resolution. We discuss this unavailability as one of our key research limitations in the discussion section.

Conflict activity variables

We retrieved several armed conflict variables from ACLED2. These variables are geospatially linked to specific geographical coordinates and are geo-coded as counts. ACLED categorizes conflict activity into distinct types, including battles, explosions/remote violence, violence against civilians, protests, riots, and strategic developments.

Conflict events, violence against civilians

Violence against civilians refers to the deliberate infliction of harm on unarmed non-combatants by organized armed factions. Such events include acts of sexual violence, abductions, and forced disappearances2.

Conflict events, battles

In addition to violence against civilians, we also included conflict events categorized as battles, which are defined as violent engagements between two organized armed groups.

Total conflict events

Our outcome variable was the total count of conflict events, proxied by summing the counts of each specific conflict event type (i.e., battles, explosions/remote violence, violence against civilians, protests, riots, and strategic developments).

Explanatory variables

Environmental variables

Some environmental conditions pertain specifically to weather. While weather refers to short-term atmospheric conditions, climate describes the average weather of a region over an extended period. We specified our environmental variables in line with Sakaguchi, Varughese, and Auld16. Given that temperature, soil moisture, and heat have already been hypothesized, and empirically demonstrated, to be associated with and causal to violent conflict24, we selected these variables to represent the weather conditions in Iraq.

Skin temperature

Temperature is a physical quantity that indicates how hot matter is79,80. For our analysis, we selected skin temperature, the temperature at the interface between the Earth’s surface and the atmosphere, because it directly influences the growth and cultivation of agricultural resources, as well as the availability of water81. Skin temperature data were sourced from the ERA5-Land dataset28. Each pixel on the corresponding map, with a spatial resolution of 11,132 square meters, represents a temperature value measured in kelvins.

Soil moisture

Soil moisture refers to the total amount of water, including water vapor, present in unsaturated soil79,80. We selected soil moisture as an explanatory variable because it impacts the growth and cultivation of agricultural resources, as well as the availability of water. The soil moisture data were also sourced from the ERA5-Land dataset28, specifically at a depth of 28–100 cm. Each pixel on the map, with a resolution of 11,132 square meters, represents a volumetric fraction of water at this depth.

Latent energy

Also referred to as latent heat, latent energy refers to the energy released from the Earth’s surface to the atmosphere. Latent energy is associated with the evaporation or condensation of water vapor at the Earth’s surface79. It represents an environmental process that extends beyond temperature but still impacts the physical surroundings. We sourced the average latent heat net flux27 from the MODIS 006 MOD16A2 dataset of NASA and the United States Geological Survey. The flux represents the average latent energy passing through matter. Corresponding to a resolution of 500 square meters, each pixel on the map displays a value in joules.

Environmental scarcity

Environmental scarcity refers to the lack of vital resources on which human communities directly and critically depend. Since the scarcity of agricultural resources can catalyze violent conflict16, earlier findings on wheat production guided our proxy for crop availability in Iraq19.

Wheat production

We sourced the total wheat production for rainfed and irrigated crops from the 2020 version of Global Spatially Disaggregated Crop Production Statistics Data (MapSPAM, Version 1.0)78. Each pixel on the map, corresponding to a resolution of 10,000 square meters, shows a value in metric tons. Due to the constrained time horizon of this dataset, the wheat production variable was only available for 2020 as the most recent observation.

Demographics

Population density

Given that population density has been found relevant for armed conflict activity55, we sourced this variable from the Gridded Population of the World Version 4.11 dataset by the Center for International Earth Science Information Network at Columbia University30. Each pixel on the map, corresponding to a resolution of 927.67 square meters, shows an estimated number of people per 30 arc-second grid cell. Due to the constrained time horizon of this dataset, the population density variable was sourced for 2020 as the most recent available observation.

From geo-coded observations to subdistrict values

Our variables are geo-coded, meaning that each corresponding value is linked to specific geographical coordinates. Since our unit of analysis is the Iraqi subdistricts, the value for each variable within a subdistrict must be an aggregate of the geo-located values of that variable within that subdistrict. However, the geo-coded maps containing these variables do not include subdistrict borders. To enable the geographical aggregation of pixel values across subdistricts, a shapefile with the subdistrict borders was integrated into each map. We sourced this shapefile from the UN OCHA Humanitarian Data Exchange, and it stores the subdistrict borders as a geometry variable (i.e., a polygon). The 294 subdistrict borders are shown in Fig. 4.

Fig. 4. Subdistrict borders in Iraq.

Further, the conflict activity variables are geo-located. For each subdistrict, we counted the geo-located values of conflict activity variables reported within that subdistrict’s polygon. This procedure was applied to the count of battle events, violent events against civilians, and the total count of conflict events.
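The counting step can be illustrated with a minimal sketch. It assumes planar coordinates, simple polygons stored as vertex lists, and a ray-casting containment test. The function names and the toy data are hypothetical and are not the tooling used in this study.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon,
    given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_events(events, subdistricts):
    """Count events per subdistrict, by event type and in total.

    events: list of (x, y, event_type)
    subdistricts: dict mapping a subdistrict name to its polygon
    """
    counts = {name: Counter() for name in subdistricts}
    for x, y, event_type in events:
        for name, polygon in subdistricts.items():
            if point_in_polygon(x, y, polygon):
                counts[name][event_type] += 1
                counts[name]["total"] += 1
                break  # an event is assigned to at most one subdistrict
    return counts


# Hypothetical toy data: two square "subdistricts" and four events
subdistricts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
events = [
    (0.5, 0.5, "battles"),
    (0.2, 0.8, "violence_against_civilians"),
    (1.5, 0.5, "battles"),
    (3.0, 3.0, "riots"),  # falls outside both polygons and is not counted
]
counts = count_events(events, subdistricts)
```

In practice a geospatial library would handle projections, holes, and multi-part polygons. The sketch only shows the logic of assigning each geo-located event to the polygon that contains it.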

Furthermore, the available grids store values of the explanatory variables as pixels, each associated with specific geographical coordinates. We bounded these pixel values by the subdistrict polygons, as previously described. Because these gridded values, unlike the conflict event records, are not sparse, we aggregated them so as to capture the extreme values of each observed explanatory variable. Specifically, pixel values for soil moisture were aggregated geographically as pixel minima and temporally as subdistrict standard deviations. Pixel values for skin temperature were aggregated geographically as pixel maxima and temporally as subdistrict standard deviations. Pixel values for average latent heat flux were aggregated geographically as pixel maxima and temporally as subdistrict standard deviations. Moreover, pixel values for wheat production were aggregated geographically as pixel means for the year 2020, the most recent available observation. Finally, pixel values for population density were aggregated geographically as pixel maxima for the year 2020, the most recent available observation.
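One possible reading of this two-step aggregation is sketched below. It assumes that the geographical statistic (minimum or maximum) is taken per time step over the pixels inside a subdistrict, and that the sample standard deviation is then taken over the resulting time series. The order of operations, the choice of the sample (rather than population) standard deviation, and all values are assumptions made for illustration.

```python
from statistics import stdev


def aggregate(pixel_series, spatial):
    """Two-step aggregation for one subdistrict.

    pixel_series: list over time steps; each entry is the list of pixel
                  values that fall inside the subdistrict at that time step
    spatial: function applied across pixels at each time step (min or max)
    Returns the sample standard deviation of the per-time-step statistic.
    """
    per_step = [spatial(pixels) for pixels in pixel_series]
    return stdev(per_step)


# Hypothetical values for one subdistrict over three time steps
soil_moisture = [[0.20, 0.25, 0.30], [0.10, 0.22, 0.28], [0.30, 0.35, 0.40]]
skin_temperature = [[300.0, 305.0], [310.0, 308.0], [315.0, 312.0]]

# Minima per step are 0.20, 0.10, 0.30; maxima per step are 305, 310, 315
soil_moisture_value = aggregate(soil_moisture, min)
skin_temperature_value = aggregate(skin_temperature, max)
```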

Finally, considering that seasonality plays a crucial role in the relationship between environmental and conflict variables82, we extracted four non-aggregated time slices corresponding to the seasonal midpoints: January, April, July, and October 2020. The methods detailed in the following section were applied to both the aggregated data and the non-aggregated time slices, with a focus on demonstrating where the method proves most effective.

Methods

Causal methodology traditionally requires experimentation83,84,85,86. However, it is now possible to infer causality even from non-experimental observations26,87,88,89,90. Acknowledging the need for non-experimental approaches in environmental security, we argue that it is possible to retrieve the causal structure linking the environment to conflict. By applying causal methodology to non-experimental observations, the underlying paths and effects of the causal structure can be identified and quantified. This process unfolds in three stages: causal discovery, causal identification, and causal inference.

The following three subsections explain how causality is discovered, identified, and inferred from non-experimental observations. These subsections exemplify how to respond to a specific causal query: “What is the magnitude of the causal effect of soil moisture on the count of conflict events?”

Causal discovery

The purpose of causal discovery is to retrieve a causal structure from available observations65. These structures can be modeled graphically26. Each directed edge in such a causal graph represents causality between the node with an outgoing arrow (the cause) and the node with an incoming arrow (the effect)26. The graph on the left in Fig. 5 shows a directed edge: Soil moisture → Conflict events, indicating that a change in soil moisture causes a change in conflict events. In contrast, the graph in the middle shows a bidirected edge: Soil moisture ↔ Conflict events, suggesting a reciprocal relationship between soil moisture and conflict events.

Fig. 5. Bidirected edges and causal cycle in a causal structure.

Each node between a node with only outgoing arrows (i.e., a root cause) and a node with only incoming arrows (i.e., an effect or outcome) is a mediating node (e.g., population density). Further, the graph on the right in Fig. 5 is characterized by a causal cycle: Soil moisture → Conflict events → Population density → Soil moisture. Despite recent theoretical advances91, the simplest conception of causality requires bidirected edges and causal cycles to be absent from a causal graph, as they indicate hidden common causes and reverse causality, both of which can confound causal inference26. Graphs without bidirected edges and cycles are referred to as directed acyclic graphs (DAGs). The simplest conception of causality dictates that causal discovery should retrieve a DAG from available observations65.

Following this logic of the causal discovery stage, we retrieved a DAG from our observations using the Greedy Equivalence Search (GES) algorithm65. We selected GES because it is considered a suitable causal discovery method for small sample sizes65. We employed the Bayesian Information Criterion (BIC) as our scoring criterion. The GES algorithm adds and removes edges in a stepwise manner, scoring each configuration by its BIC to identify the structure that best fits the data. The output was the most likely DAG given our observations. The nodes of the DAG correspond to our armed conflict activity and explanatory variables, while the edges represent the causal relationships between them.
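To illustrate how a BIC score can guide the decision to add an edge, the sketch below compares the local score of a node with and without a single parent. It assumes a linear-Gaussian model and simulated data; it shows one scoring comparison, not the full GES search, and the function and variable names are hypothetical. Because the BIC decomposes over nodes, the change in the score of the whole graph when one edge is added equals the change in the local score of the child node.

```python
import math
import random


def local_bic(y, x=None):
    """BIC (lower is better, up to an additive constant) of a Gaussian node y
    with either no parent or a single linear parent x."""
    n = len(y)
    mean_y = sum(y) / n
    if x is None:
        residuals = [yi - mean_y for yi in y]
        k = 2  # intercept and noise variance
    else:
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
        k = 3  # intercept, slope, and noise variance
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + k * math.log(n)


# Simulated data (assumption): conflict depends on moisture, not on "unrelated"
random.seed(0)
n = 294
moisture = [random.gauss(0, 1) for _ in range(n)]
conflict = [-0.8 * m + random.gauss(0, 1) for m in moisture]
unrelated = [random.gauss(0, 1) for _ in range(n)]

# Positive gain: adding the edge lowers the BIC, so a forward step would keep it
gain_true_edge = local_bic(conflict) - local_bic(conflict, moisture)
gain_spurious = local_bic(conflict) - local_bic(conflict, unrelated)
```

The penalty term `k * math.log(n)` is what discourages spurious edges: an unrelated parent reduces the residual sum of squares only marginally, which usually does not offset the cost of the extra parameter.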

Causal identification

The purpose of causal identification is to determine whether, given a causal structure, the causal query has a unique answer92. If the query is unidentifiable, the identification process reveals this. Additionally, causal identification helps in formulating a quantity that provides a unique answer to the query26,92. This formula, which allows for the quantification of the answer, is referred to as an estimand93.

Given the arrow from Soil moisture → Conflict events in Fig. 6, a node like population density introduces an alternative path between soil moisture and conflict events. If this node is not explicitly considered, it is referred to as a confounder26. If all confounders can be accounted for, the causal query can be properly identified.

Fig. 6. Confounder: Population density.

Given the probabilistic interpretation of causal graphs26, let \(P(\cdot \mid \cdot)\) represent a conditional probability distribution. Let \(C\), \(M\), and \(D\) represent the conflict events, soil moisture, and population density variables, respectively, and let \(c\), \(m\), and \(d\) represent their corresponding realized values. The \(do(\cdot)\) operator represents intervention. If no variable were associated with both soil moisture and conflict events (see Fig. 6), the unique answer to our causal query would have been \(P(C \mid do(m)) = P(C \mid m)\). However, since population density introduces an alternative causal path between soil moisture and conflict events, failing to account for population density could prevent the determination of a unique answer to the query. Therefore, to identify the causal query properly, the confounding effect of population density must be marginalized out. As a result, the causal graph in Fig. 6 enables the identification of a unique answer to our query, with \(P(C \mid do(m)) = \sum_{d} P(C \mid m, d)\,P(d)\) representing this unique answer.
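A small numerical illustration may help to see why the adjustment matters. The sketch below assumes binary versions of the three variables and entirely hypothetical probabilities. It contrasts the interventional quantity, obtained by weighting with the marginal distribution of population density, with the observational conditional, which weights with the distribution of population density given soil moisture.

```python
# Hypothetical probabilities for binary D (density), M (moisture), C (conflict)
p_d = {0: 0.6, 1: 0.4}                      # P(D = d)
p_m1_given_d = {0: 0.3, 1: 0.8}             # P(M = 1 | D = d)
p_c1_given_md = {(0, 0): 0.5, (0, 1): 0.7,  # P(C = 1 | M = m, D = d)
                 (1, 0): 0.3, (1, 1): 0.5}


def interventional(m):
    """Backdoor adjustment: sum over d of P(C=1 | m, d) * P(d)."""
    return sum(p_c1_given_md[(m, d)] * p_d[d] for d in p_d)


def observational(m):
    """Plain conditioning: sum over d of P(C=1 | m, d) * P(d | m)."""
    p_m_given_d = {d: p_m1_given_d[d] if m == 1 else 1 - p_m1_given_d[d]
                   for d in p_d}
    joint = {d: p_m_given_d[d] * p_d[d] for d in p_d}  # P(M = m, D = d)
    p_m = sum(joint.values())                          # P(M = m)
    return sum(p_c1_given_md[(m, d)] * joint[d] / p_m for d in p_d)


causal_contrast = interventional(1) - interventional(0)  # 0.38 - 0.58
naive_contrast = observational(1) - observational(0)     # 0.428 - 0.532
```

With these numbers the adjusted contrast is about -0.20, whereas the unadjusted contrast is about -0.10. The gap arises because population density raises both the probability of high soil moisture and the probability of conflict in this toy setup.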

This reasoning illustrates an identification criterion known as the backdoor criterion26. For other identification methods, we refer the reader to Tian and Pearl94 and Shpitser and Pearl92. By applying this reasoning to the retrieved DAG, we were able to identify causal estimands for our explanatory variables, provided that access to the relevant variables made such identification feasible.

Causal estimation and hypothesis testing

The final stage of causal inference involves estimation and hypothesis testing. In this stage, an estimator (i.e., a method for calculating estimates) is applied to a causal estimand, using sampled observations26,93. This process produces an estimate of the causal effect, i.e., a causal estimate. Ultimately, an assessment is made to determine whether the estimate should be attributed to random error. If not, the estimate is deemed statistically significant.
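As a generic illustration of this stage, and not the spatially explicit procedure described below, the following sketch estimates a backdoor-adjusted effect with an ordinary linear regression and tests it against zero. It assumes linear relationships, simulated continuous data, and a normal approximation for the p-value; the function names and coefficients are hypothetical.

```python
import math
import random


def residualize(y, x):
    """Residuals of y after a simple linear regression on x (with intercept)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return [(b - mean_y) - slope * (a - mean_x) for a, b in zip(x, y)]


def adjusted_effect(outcome, treatment, confounder):
    """Estimate, standard error, and two-sided p-value of the treatment
    coefficient in a linear regression of outcome on treatment and confounder.
    Partialling out the confounder from both variables gives the same
    coefficient as the full regression (Frisch-Waugh-Lovell theorem)."""
    n = len(outcome)
    t_res = residualize(treatment, confounder)
    y_res = residualize(outcome, confounder)
    stt = sum(t * t for t in t_res)
    estimate = sum(t * y for t, y in zip(t_res, y_res)) / stt
    rss = sum((y - estimate * t) ** 2 for t, y in zip(t_res, y_res))
    std_error = math.sqrt(rss / (n - 3) / stt)  # three fitted coefficients
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    return estimate, std_error, p_value


# Simulated data (assumption): the true effect of moisture on conflict is -0.5
random.seed(42)
n = 294
density = [random.gauss(0, 1) for _ in range(n)]
moisture = [0.7 * d + random.gauss(0, 1) for d in density]
conflict = [-0.5 * m + 0.9 * d + random.gauss(0, 1)
            for m, d in zip(moisture, density)]

estimate, std_error, p_value = adjusted_effect(conflict, moisture, density)
```

A count outcome such as the number of conflict events would normally call for a count model or a transformation. The linear form is used here only to keep the estimator and the test transparent.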

Since the values of environmental variables in one subdistrict can influence the environmental security of another, the estimation procedure required a method capable of accounting for spatial confounding. To address this, we employed spatially explicit structural equation modeling (SESEM)95.

SESEM combines structural equation modeling (SEM) with spatial dependence modeling to account for effects across varying spatial lag distances. While SEM was used to implement the retrieved causal structure, spatial dependence modeling addressed the conflict dynamics that extended beyond the subdistrict boundaries.

A straightforward way to incorporate spatial dependence modeling via lag distances is to calculate the distances between subdistricts. This can be done in several ways. The most intuitive method involves computing the centroid of each subdistrict and measuring the distances between these centroids. Alternatively, a network can be formed by connecting adjacent subdistricts with edges, in which case the distance between any two subdistricts equals the length of the shortest path between them.
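Both distance definitions can be sketched in a few lines. The example assumes planar coordinates, simple polygons, and a hand-written adjacency list; real subdistrict borders would first need to be projected and their adjacency derived from shared boundaries. All names and data are hypothetical.

```python
import math
from collections import deque


def polygon_centroid(polygon):
    """Area-weighted centroid of a simple polygon (list of (x, y) vertices)."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)


def hop_distance(adjacency, start, goal):
    """Length of the shortest path between two subdistricts in the adjacency
    network, found by breadth-first search; None if they are not connected."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical toy map: three unit squares in a row
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(2, 0), (3, 0), (3, 1), (2, 1)],
}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

centroids = {name: polygon_centroid(p) for name, p in polygons.items()}
centroid_distance_ac = math.dist(centroids["A"], centroids["C"])
network_distance_ac = hop_distance(adjacency, "A", "C")
```

The two measures can diverge for irregular shapes: a large subdistrict may be adjacent to a neighbour whose centroid is far away, so the choice affects which pairs are treated as close.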

The first step in applying SESEM was to fit a non-spatial SEM to the data96. Next, spatially explicit variance–covariance matrices were computed across a range of lag distances, which were binned according to sample pair distances. We focused on the lowest 20% of these distances, as spillover effects are more likely to occur among neighboring subdistricts. To ensure reliable inference across each distance range, each bin was set to include 500 sample pairs. SEMs were then fitted for each lag distance, and edge coefficients, standard errors, and p-values were computed. Finally, individual causal paths were parameterized to obtain path-specific coefficients, standard errors, and p-values.
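The binning step can be sketched as follows. The example assumes centroid-based Euclidean distances on simulated coordinates and drops an incomplete final bin; both choices, as well as the function name, are assumptions made for illustration rather than a description of the software used.

```python
import math
import random
from itertools import combinations


def lag_bins(coords, fraction=0.2, pairs_per_bin=500):
    """Sort all sample pairs by distance, keep the closest `fraction` of them,
    and split these into consecutive bins of `pairs_per_bin` pairs each.
    Each pair is stored as (distance, i, j)."""
    pairs = sorted(
        (math.dist(coords[i], coords[j]), i, j)
        for i, j in combinations(range(len(coords)), 2)
    )
    kept = pairs[: int(len(pairs) * fraction)]
    bins = [kept[k:k + pairs_per_bin] for k in range(0, len(kept), pairs_per_bin)]
    # Assumption: an incomplete final bin is discarded
    return [b for b in bins if len(b) == pairs_per_bin]


# Simulated centroids for 294 units (hypothetical coordinates)
random.seed(1)
coords = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(294)]

bins = lag_bins(coords)
# Upper distance bound of each bin, i.e., the lag distance it represents
lag_upper_bounds = [b[-1][0] for b in bins]
```

With 294 units there are 43,071 pairs, so the lowest 20% comprises 8,614 pairs and yields 17 complete bins of 500 pairs under these assumptions. A covariance matrix would then be computed from the pairs in each bin and a separate SEM fitted per bin.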