Fig. 4 | Nature Communications

Fig. 4

From: Inferring causation from time series in Earth system sciences

Methodological challenges for causal discovery in complex spatio-temporal systems such as the Earth system. At the process level, autocorrelation (1), time delays (2), and nonlinearity (3), also in the form of state-dependence and synergistic behavior (4), require a careful selection of the estimation method. Further, a time series might contain signals from different processes acting on vastly different time scales (5). Noise distributions (6) can feature heavy tails and extreme values, which challenge the ubiquitous Gaussian assumption. At the data aggregation level, the most basic challenge is the definition of the causally relevant variables (7) that represent the subprocesses of interest, starting from spatio-temporally gridded data (e.g., from satellites) or station measurements. Unobserved variables (8) need to be taken into account in any causal interpretation of the estimated graph. Time sub-sampling (9) and aggregation (10) can make causal links appear contemporaneous and even cyclic because of insufficient time resolution (e.g., due to the standard practice of time averaging, depicted here in a time series graph [24]). Causal inferences are degraded by measurement errors (11) such as observational noise, systematic biases (first few samples), or even missing values (grey samples); the missingness may itself be causally related to the measured process, constituting a form of selection bias (12). Some datasets are of a discrete type (13), either due to quantization or as categorical data (e.g., an index representing different weather regimes), and require methods that handle discrete as well as mixed data types. In addition to uncertainties in the measured values, for paleo-climatic data even the measurement time points are typically known only with uncertainty (14), which especially challenges methods that exploit time order. At the computational and statistical level, the scalability of methods with respect to both sample size (15) and high dimensionality (16), the latter arising from the number of variables as well as from large time delays, is of crucial practical relevance for computational run-time and detection power. Finally, uncertainty estimation (17, width of links), which should also take data uncertainties into account, poses a major challenge.
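
The effect of time aggregation (10) can be illustrated with a small simulation. The following is a minimal sketch under assumed settings, not a method from the article: the toy process, its coefficients, the block width of 10, and the helper names `lagged_corr` and `block_average` are all hypothetical, and plain lagged correlation stands in for a proper causal discovery method. In the toy process, X drives Y with a delay of one time step; after averaging over non-overlapping blocks, the dependence shows up mainly at lag zero.

```python
import numpy as np


def lagged_corr(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t] (hypothetical helper)."""
    if lag == 0:
        return float(np.corrcoef(x, y)[0, 1])
    return float(np.corrcoef(x[:-lag], y[lag:])[0, 1])


def block_average(series, width):
    """Average over non-overlapping blocks of `width` samples (hypothetical helper)."""
    n = (len(series) // width) * width  # drop the incomplete final block
    return series[:n].reshape(-1, width).mean(axis=1)


# Assumed toy process: X is white noise and drives Y with a delay of one step.
rng = np.random.default_rng(0)
n = 20000
x = rng.standard_normal(n)
y = np.empty(n)
y[0] = rng.standard_normal()
y[1:] = 0.8 * x[:-1] + 0.6 * rng.standard_normal(n - 1)

# At the original resolution the dependence sits at lag 1, not lag 0.
raw_lag0 = lagged_corr(x, y, 0)
raw_lag1 = lagged_corr(x, y, 1)

# After averaging over blocks of 10 steps, most of the dependence
# falls within a single block and therefore appears at lag 0.
x_avg = block_average(x, 10)
y_avg = block_average(y, 10)
agg_lag0 = lagged_corr(x_avg, y_avg, 0)
agg_lag1 = lagged_corr(x_avg, y_avg, 1)

print(f"raw:        lag 0 = {raw_lag0:+.2f}, lag 1 = {raw_lag1:+.2f}")
print(f"aggregated: lag 0 = {agg_lag0:+.2f}, lag 1 = {agg_lag1:+.2f}")
```

Because the time order of cause and effect is no longer visible in the averaged series, a method that relies on time order cannot orient this link from the aggregated data alone.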
