Introduction

Since the dawn of civilization, the movements of ocean waters have attracted the attention of researchers and those who live near the sea. Among the phenomena that have most intrigued the inhabitants of coastal regions around the globe, the oscillatory movement of sea level stands out for its regular periodicity1.

This vertical oscillation of the surface of the sea or of another large body of water on Earth is called the tide. It is caused primarily by differences in the gravitational attraction of the Moon and, to a lesser extent, of the Sun at different points on Earth2.

The characterization of tidal temporal variations through statistical analysis (averages, extremes, and amplitude) provides essential technical support for: (i) nautical safety, (ii) the design of port structures, and (iii) integrated coastal management, ensuring operational accuracy and environmental resilience.

According to Dalazoana3, the main reference for tide data on the Brazilian coast is the Tide Tables (TT), published by the Navy’s Hydrography and Navigation Directorate (DHN). These records use the Reduction Level (RL) as a reference, a locally defined minimum level established so that no negative tide height values occur4.

Another reference level used is the observation of sea level to determine an average value. For geodesy, this is the altimetric reference surface, also called the geoid. This is the level surface of the gravity field that best adapts to mean sea level (MSL) and can extend into the solid body of the Earth5. This ideal level surface corresponds to the equilibrium surface, which is understood by observing the waters of the sea, assuming them to be absolutely calm6,7.

According to Marmer8, 19 years of observations represent the ideal period for obtaining altimetric data, since this span is needed to reduce the effects of periodic variations caused by astronomical influences. However, this is not a rule and depends on the conditions found in each country. In Venezuela, for example, the datum was established from 19 years of tide gauge observations at La Guaira between 1953 and 1971 (ref. 9). In Colombia, it was established from 18 years of tide gauge observations at Buenaventura between 1951 and 1968 (ref. 10). In Brazil, the Brazilian Vertical Datum (BVD) was initially determined provisionally at Torres/RS (Latitude: 29\(^\circ\)20’07”S; Longitude: 49\(^\circ\)43’36”W) with only 1 year of observations (1919–1920), and was replaced in 1958 by the Imbituba Datum, based on a longer series of sea level observations (1949–1957) (ref. 7).

The references currently used derive from measurements taken decades ago, presenting significant limitations for contemporary monitoring, since sea level serves as a fundamental parameter for assessing climate change11. This relevance becomes even more evident when we consider that thermal expansion of the oceans and the melting of glaciers are processes that contributed substantially to the rise in mean sea level observed throughout the 20th century12.

Sea level rise driven by thermal expansion has been widely documented in global studies, encompassing various fields of knowledge. Research in Earth sciences13,14,15,16, environmental sciences17,18, and even in decision sciences19 corroborates this phenomenon. Although sea-level rise is commonly linked to climate change and glacier melt, as previously mentioned, it is crucial to consider that local and regional factors may distort these measurements, either creating a false impression of increase or amplifying the observed variations. Among these factors, coastal subsidence stands out, that is, the natural sinking of coastal lands. This process can create the false impression of accelerated sea level rise, even when global elevation remains stable. Emblematic examples include the Saigon-Dong Nai River Estuary and the Mekong Delta Coast20, as well as the Texas Coast21,22. In the Brazilian context, Salvador shows a significant tendency toward subsidence, a condition explained by the location of the monitoring station on the geological fault that divides the upper and lower cities. This geological structure is associated with the formation of the Recôncavo Sedimentary Basin and, later, with the marine transgression that originated the Todos os Santos Bay23.

Climatic factors also exert a significant influence on sea level rise. Mimura highlights that ocean warming promotes significant thermal expansion, raising mean sea levels24. Vermeer and Rahmstorf propose a direct association between sea level variations and global mean temperature, using historical data to model this connection25. The IPCC itself highlights that more than 90% of the excess heat in the climate system has been absorbed by the oceans since 197026. Regarding rainfall contributions, although less significant than other mechanisms, Milly, Wetherald, Dunne and Delworth warn that their impact should not be underestimated, particularly on interannual and decadal scales27. Evidence of this influence is observed in regional studies, such as on the coast of Bangladesh, where continental precipitation has been shown to be a determining factor in the variability of sea level extremes28.

This study investigates the dependencies between sea level time series along the Brazilian coast and geodetic-meteorological variables at different time scales, employing cross-correlation approaches. For bivariate analyses, the coefficient \(\rho _{DCCA}\)29 was used to quantify the correlations between mean sea level (dependent variable y) and each independent variable individually: GNSS altimetry (x1), precipitation (x2), and temperature (x3). For multivariate evaluations, the \(DMC^2_x\) coefficient was adopted30, which generalizes the \(\rho _{DCCA}\) by simultaneously measuring the cross-correlation between y and multiple independent variables (x1, x2, x3, ..., xk). The methodological selection defined mean sea level as the response variable, while altimetry, precipitation and temperature were established as predictors, thus allowing an integrated analysis of the influencing factors.

The \(\rho _{DCCA}\) has been widely applied, as documented in the literature31,32,33,34,35,36. The \(DMC^2_x\), although more recent, has already demonstrated its potential both in simulation experiments and in empirical studies30,37, with applications ranging from financial crisis analysis38 to climate modeling39 and biomedical processing40. However, a systematic review of the Scopus database (period 2018–2025; see Footnote 1) revealed a significant methodological gap: no studies were identified that applied \(DMC^2_x\) to model, at multiple time scales, the relationship between sea level (y) and the integrated set of independent variables investigated here (precipitation, air temperature, and GNSS altimetry). This finding, corroborated by the absence of publications combining the descriptors “DMC coefficient,” “sea level,” and “multiscale analysis” in that period, supports the pioneering nature of the present approach.

Thus, while bivariate approaches, such as the \(\rho _{DCCA}\) coefficient, are valuable tools for diagnosing multiscale relationships between two variables, they are inherently limited in isolating the combined and potentially interactive effects of multiple forcings acting simultaneously on MSL. Conversely, traditional multivariate techniques, often based on assumptions of stationarity, fail to capture the scale-dependent dependencies that are fundamental to geophysical processes. The \(DMC^{2}_{x}\) coefficient overcomes this dual limitation by generalizing detrended cross-correlation analysis to multiple variables, while simultaneously preserving the multiscale property and robustness against nonstationarity. Therefore, its application in this study represents an innovative and particularly suitable methodological approach to unravel the integrated and scale-dependent contributions of geodetic (GNSS), hydrological (precipitation), and thermal (temperature) forcings to coastal sea-level variability.

In addition to the coefficients mentioned, this study used entropy to analyze fluctuations in sea level time series to quantify their degree of predictability. Low entropy values indicate regular patterns, associated with high forecasting skill, whereas high values reflect chaotic or noise-influenced behaviors, which hinder the anticipation of trends. This metric is well established in time-series research, with applications in meteorology41, medicine42, and coastal studies43,44,45, including sea level analyses.

This article is structured into five sections: (i) Introduction, which contextualizes the research and its objectives; (ii) Methods, detailing data selection and analytical procedures; (iii) Results, presenting the main findings; (iv) Discussion, interpreting the results; and (v) Conclusions. This organization seeks to clarify both the foundations and findings of the research, ensuring methodological transparency.

Methods

Study area

Every tide gauge station consists of a structure or equipment installed in coastal areas, ports, or estuaries, and serves to record water level variations over time. According to Keysers46, this information can also be useful in various applications, such as: reducing survey data for the maintenance and expansion of port and waterway capacity, implementing infrastructure in coastal regions, and studying possible adaptation and mitigation measures for the impacts of global sea level rise.

Fig. 1

Tide gauge stations controlled by IBGE and object of study of the research. Source:47.

These stations are spread all over the world and Brazil is no exception. In particular, seven of them, one of which has already been decommissioned, are operated by the Brazilian Institute of Geography and Statistics (IBGE) (Fig. 1), through the Permanent Tide Gauge Network for Geodesy (RMPG) and their data are freely available on the institute’s website.

Data processing

In this study, mean sea level was adopted as the dependent variable, while air temperature, precipitation, and GNSS altimetry were considered independent variables. Figure 2 illustrates the methodological steps employed to analyze the correlation between the modeled data.

Fig. 2

Methodological flowchart for the analysis of fluctuations in the mean sea level time series (dependent variable) and the independent variables (air temperature, precipitation, and GNSS altimetry). The steps include: (1) data acquisition and preprocessing, (2) imputation, (3) exploratory analysis of the time series, (4) determination of the entropy of the mean sea level series, (5) application of detrended cross-correlation measurement methods, (6) results. Source: Created by the authors.

For the mean sea level data, the IBGE time series for each station of the Permanent Tide Gauge Network for Geodesy (RMPG) were used. The files are structured as follows: the date in the format dd/mm/yyyy in the first column, the time in the format hh:mm in the second column, followed by the observed water level in meters, the astronomical prediction also in meters, and the observation after application of the 168-hour filter. The analyzed period was defined by the availability of data from the tide gauge stations and covers the interval between 2002 and 2023. The characteristics of each station, together with their collection periods and sampling rates, are available on the IBGE website, on the Permanent Tide Gauge Network for Geodesy page (visited on 07/01/2022).

For the air temperature and precipitation data, the automatic meteorological stations maintained by the National Institute of Meteorology (INMET) that were closest to each tide gauge station and had a satisfactory data record were used. In addition to air temperature and precipitation, the files provided by INMET include other meteorological parameters, organized as follows: measurement date in the format yyyy-mm-dd, time in the format hhmm, followed by values of total precipitation, atmospheric pressure at the station level, atmospheric pressure reduced to sea level, air temperature, relative humidity, and wind speed. Further details can be found on the INMET website (visited on 03/20/2025).

Following the same proximity criterion, GNSS altimetry data were obtained from the NGL (Nevada Geodetic Laboratory) rather than from the Brazilian Continuous Monitoring Network of IBGE (RBMC). This choice was primarily motivated by the need for homogeneity and internal consistency of the time series, which are critical factors for the robustness of the subsequent correlation analyses. The NGL processes data globally, on a daily basis, within a single and stable geodetic reference frame (IGS14), employing a uniform processing strategy for all stations. In contrast, time series derived from RBMC, although of great value for national geodetic applications, may contain discontinuities (jumps) on the order of a few millimeters associated with changes in the processing reference frame over time. Additionally, the use of a single processing center (NGL) avoids the introduction of systematic biases or discrepancies in vertical velocity estimates that may arise from the combination of heterogeneous processing strategies (different software, models, and reference station selections). Such biases, typically on the order of mm/yr, are comparable in magnitude to the signal of interest (vertical crustal motion) and could obscure or distort cross-correlation signals with sea level. Therefore, the homogeneous dataset provided by the NGL constitutes a more reliable and consistent source for crustal vertical motion at the locations of the tide gauge stations.

The files provided by the NGL follow this structure: station code; date in the format yymmmdd; decimal year; modified Julian day; GPS week; GPS day of week; longitude of the reference meridian (in degrees); integer and fractional parts of the east, north, and height coordinates; antenna height as specified in the RINEX header; the standard deviations of the east, north, and height coordinates; and the nominal latitude, longitude, and height of the station. Further details can be found on the Nevada Geodetic Laboratory website, geodesy.unr.edu (visited on 03/10/2025).

Table 1 presents the approximate distances between each tide gauge station and its nearest geodetic and meteorological stations used in this study.

Table 1 Approximate distance between the stations used.

Before the exploratory, bivariate, and multivariate analyses, the entire set of tide gauge, geodetic, and meteorological data underwent imputation to fill in missing values, using the imputeTS and forecast R packages, which are described in detail in a subsequent section.

Statistical methods

Sea level monitoring supports a wide range of applications, as already mentioned in this text. Due to this reality, researchers may be interested in correlating sea level fluctuations with other climatic and/or environmental variables20,21,22,24,25,27,28.

One of the coefficients used to measure correlation between the fluctuations of two signals as a function of time is the Pearson Coefficient48,49,50, although there are those who question the efficiency of the method for non-stationary time series51,52.

A good alternative is \(\rho _{DCCA}\), proposed by Zebende in 2011 (ref. 29). It is a coefficient that quantifies the level of cross-correlation at different scales of size n, based on DFA (Detrended Fluctuation Analysis)53 and DCCA (Detrended Cross-Correlation Analysis)54. It is defined as the ratio between the detrended covariance function \(F^2_{DCCA}\) and the product of the detrended variance functions \(F_{DFA}\) of the two series, as expressed in Eq. 1 (ref. 29):

$$\begin{aligned} \rho _{\textrm{DCCA}}(n) \approx \frac{F^2_{\textrm{DCCA}}(n)}{F_{\textrm{DFA}_1}(n) \cdot F_{\textrm{DFA}_2}(n)} \end{aligned}$$
(1)
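Eq. 1 can be illustrated with a short sketch. The authors report using R; the Python code below is only an illustration, and the function names (`rho_dcca`, `_profile`, `_residuals`), the use of overlapping boxes of n + 1 points, and the linear (first-order) detrending are assumptions rather than details confirmed by the text. The normalization constants of \(F^2_{DCCA}\) and \(F_{DFA}\) cancel in the ratio, so only the sums of residual products are accumulated.

```python
import math
import random


def _profile(series):
    """Integrated (cumulative-sum) profile of the mean-centred series."""
    mean = sum(series) / len(series)
    out, total = [], 0.0
    for value in series:
        total += value - mean
        out.append(total)
    return out


def _residuals(segment):
    """Residuals of a least-squares straight-line fit over one box."""
    m = len(segment)
    t_mean = (m - 1) / 2.0
    y_mean = sum(segment) / m
    sxx = sum((t - t_mean) ** 2 for t in range(m))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(segment)]


def rho_dcca(x, y, n):
    """Detrended cross-correlation coefficient at scale n (Eq. 1).

    Assumes overlapping boxes of n + 1 points and linear detrending.
    """
    if len(x) != len(y) or n < 2 or len(x) <= n:
        raise ValueError("series need equal length greater than n, with n >= 2")
    px, py = _profile(x), _profile(y)
    f_xy = f_xx = f_yy = 0.0
    for start in range(len(x) - n):
        rx = _residuals(px[start:start + n + 1])
        ry = _residuals(py[start:start + n + 1])
        f_xy += sum(a * b for a, b in zip(rx, ry))  # detrended covariance
        f_xx += sum(a * a for a in rx)              # detrended variance of x
        f_yy += sum(b * b for b in ry)              # detrended variance of y
    return f_xy / math.sqrt(f_xx * f_yy)


# Synthetic example: two noisy series sharing a common component.
rng = random.Random(42)
common = [rng.gauss(0, 1) for _ in range(500)]
x = [c + 0.5 * rng.gauss(0, 1) for c in common]
y = [c + 0.5 * rng.gauss(0, 1) for c in common]
for n in (7, 15, 30, 60):
    print(n, round(rho_dcca(x, y, n), 3))
```

The scales 7, 15, 30 and 60 in the loop mirror the ones used later for the sliding-window analysis; the synthetic series are placeholders, not the study data.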

One of its advantages is the ability to measure the correlation between series on different time scales, and its analysis is performed without the trend component that tends to mask true correlations35. According to Silva Filho, its construction takes into account the order of pairs of time series, which does not contradict the basic principle of series analysis, which is the temporal dependence between them, a fact that is not present in some cross-correlation coefficients, such as Pearson’s correlation coefficient55.

Whether the standard Pearson coefficient or \(\rho _{DCCA}\) is used, both are limited to correlating only two variables. The \(DMC^{2}_{x}\) (Detrended Multiple Cross-Correlation Coefficient), devised by Zebende and da Silva Filho30, generalizes \(\rho _{DCCA}\) and measures the level of cross-correlation between a dependent variable and two or more independent variables.

Following Zebende and da Silva Filho30, assume \(i+1\) time series with N points each, with \(\{y\}\) being the dependent variable and \(\{x_1\}\), \(\{x_2\}\), ..., \(\{x_i\}\) being the independent variables. Then:

$$\begin{aligned} DMC_{x}^{2}(n) \equiv \rho _{y,x_i}(n)^T\rho ^{-1}(n)\rho _{y,x_i}(n) \end{aligned}$$
(2)

Here

$$\begin{aligned} \rho _{y,x_i}(n)^T = [\rho _{y,x_1}(n),\rho _{y,x_2}(n),...,\rho _{y,x_i}(n)] \end{aligned}$$
(3)

is the vector of detrended cross-correlations between each predictor (independent variable) and the target (dependent variable), and \(\rho (n)\) is the \(i \times i\) detrended cross-correlation matrix of the predictors. By definition, \(\rho _{x_i,x_j}(n) = \rho _{x_j,x_i}(n)\) and \(\rho _{x_i,x_i}(n) = 1\). Therefore:

$$\begin{aligned} \rho ^{-1}(n) \equiv \begin{pmatrix} 1 & \rho _{x_1,x_2}(n) & \rho _{x_1,x_3}(n) & ... & \rho _{x_1,x_i}(n) \\ \rho _{x_1,x_2}(n) & 1 & \rho _{x_2,x_3}(n) & ... & \rho _{x_2,x_i}(n) \\ \vdots & \vdots & \vdots & ... & \vdots \\ \rho _{x_1,x_i}(n) & \rho _{x_2,x_i}(n) & \rho _{x_3,x_i}(n) & ... & 1 \\ \end{pmatrix}^{-1} \end{aligned}$$
(4)
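As a minimal sketch of Eqs. 2 to 4, the quadratic form can be evaluated without forming the inverse explicitly, by solving the linear system \(\rho (n)\, z = \rho _{y,x_i}(n)\) and taking the dot product with \(\rho _{y,x_i}(n)\). The helper names `solve_linear` and `dmc2` are hypothetical, and the inputs are assumed to be \(\rho _{DCCA}\) values already computed at a single scale n.

```python
def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictor correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[row][j] -= factor * aug[col][j]
    z = [0.0] * k
    for row in range(k - 1, -1, -1):
        tail = sum(aug[row][j] * z[j] for j in range(row + 1, k))
        z[row] = (aug[row][k] - tail) / aug[row][row]
    return z


def dmc2(rho_yx, rho_xx):
    """Eq. 2: rho_yx^T rho_xx^{-1} rho_yx at one scale n.

    rho_yx: list of rho_DCCA values between y and each predictor (Eq. 3).
    rho_xx: symmetric matrix of rho_DCCA values among predictors (Eq. 4).
    """
    z = solve_linear(rho_xx, rho_yx)
    return sum(r * zi for r, zi in zip(rho_yx, z))


# With uncorrelated predictors the matrix is the identity, so the
# coefficient reduces to the sum of squared bivariate coefficients.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(dmc2([0.3, 0.4, 0.5], identity))
```

Solving the system instead of inverting the matrix is a design choice made here for numerical convenience; it yields the same quantity as Eq. 2 whenever the predictor matrix is non-singular.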

Considering the DCCA (Detrended Cross-Correlation Analysis) correlations between the analyzed variables, we have the following relationships:

  • \(\rho _{y,x_1}\): DCCA coefficient between \(y\) and \(x_1\)

  • \(\rho _{y,x_2}\): DCCA coefficient between \(y\) and \(x_2\)

  • \(\rho _{y,x_3}\): DCCA coefficient between \(y\) and \(x_3\)

  • \(\rho _{x_1,x_2}\): DCCA coefficient between \(x_1\) and \(x_2\)

  • \(\rho _{x_1,x_3}\): DCCA coefficient between \(x_1\) and \(x_3\)

  • \(\rho _{x_2,x_3}\): DCCA coefficient between \(x_2\) and \(x_3\)

In the specific case of this study, the dependent variable y is the mean sea level and the three independent variables are GNSS altimetry (\(x_1\)), precipitation (\(x_2\)) and air temperature (\(x_3\)), so that the reduced expression can be expressed as follows

$$\begin{aligned} DMC_{x}^{2}(n)&= \frac{1}{ 1-\rho _{x_1,x_2}^2-\rho _{x_1,x_3}^2-\rho _{x_2,x_3}^2 + 2 \rho _{x_1,x_2}\rho _{x_1,x_3}\rho _{x_2,x_3}} \times \nonumber \\&\Big [ \rho _{y,x_1}^2 (1-\rho _{x_2,x_3}^2) + \rho _{y,x_2}^2 (1-\rho _{x_1,x_3}^2) + \rho _{y,x_3}^2 (1-\rho _{x_1,x_2}^2) \nonumber \\&\quad -2 \rho _{y,x_1} \rho _{y,x_2} (\rho _{x_1,x_2}-\rho _{x_1,x_3}\rho _{x_2,x_3}) \nonumber \\&\quad -2 \rho _{y,x_1} \rho _{y,x_3} (\rho _{x_1,x_3}-\rho _{x_1,x_2}\rho _{x_2,x_3}) \nonumber \\&\quad -2 \rho _{y,x_2} \rho _{y,x_3} (\rho _{x_2,x_3}-\rho _{x_1,x_2}\rho _{x_1,x_3}) \Big ]. \end{aligned}$$
(5)

The numerator represents the sum of the squared individual correlations between y and each \(x_i\), penalized by the interdependencies among the predictors. The denominator adjusts for collinearity among the predictors: the greater the correlation among \(x_1\), \(x_2\) and \(x_3\), the smaller its value. The \(DMC^2_x\) coefficient also depends on the scale n, inheriting the multiscale nature of the DCCA analysis.

To validate the equivalence between the full matrix formulation of the coefficient (Eq. 2) and its reduced expression (Eq. 5), a direct comparative analysis was conducted. The objective was to demonstrate numerically that the two approaches agree, so that the simplification does not compromise accuracy. The difference between the values obtained with the matrix formulation (DMCmat) and the reduced expression (DMCred) was calculated for the entire time series, as defined by:

$$\begin{aligned} Dif = DMC_{mat}-DMC_{red} \end{aligned}$$
(6)

where (Dif) represents the difference between the two approaches.

The analysis of the resulting plot (Fig. 3) provides the necessary quantitative evidence. The discrepancies between the models are on the order of \(10^{-16}\), magnitudes that lie at the threshold of the floating-point numerical precision of the computational system used. Therefore, the results numerically confirm the equivalence between the two formulations.
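The comparison in Eq. 6 can be reproduced in miniature for a single set of coefficients. The sketch below is an assumption-laden illustration, not the authors' R code: `dmc2_reduced` transcribes Eq. 5, `dmc2_matrix` evaluates Eq. 2 by Gaussian elimination, and the six input values are arbitrary placeholders rather than results from the study.

```python
def dmc2_reduced(r1, r2, r3, r12, r13, r23):
    """Eq. 5: closed-form DMC^2 for three predictors."""
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    bracket = (
        r1**2 * (1 - r23**2)
        + r2**2 * (1 - r13**2)
        + r3**2 * (1 - r12**2)
        - 2 * r1 * r2 * (r12 - r13 * r23)
        - 2 * r1 * r3 * (r13 - r12 * r23)
        - 2 * r2 * r3 * (r23 - r12 * r13)
    )
    return bracket / det


def dmc2_matrix(r1, r2, r3, r12, r13, r23):
    """Eq. 2 for three predictors: solve rho z = r, then return r . z."""
    # Augmented matrix [rho | r]; row swaps do not reorder the unknowns.
    rows = [
        [1.0, r12, r13, r1],
        [r12, 1.0, r23, r2],
        [r13, r23, 1.0, r3],
    ]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(rows[k][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for k in range(col + 1, 3):
            factor = rows[k][col] / rows[col][col]
            rows[k] = [a - factor * b for a, b in zip(rows[k], rows[col])]
    z = [0.0, 0.0, 0.0]
    for k in (2, 1, 0):
        tail = sum(rows[k][j] * z[j] for j in range(k + 1, 3))
        z[k] = (rows[k][3] - tail) / rows[k][k]
    return r1 * z[0] + r2 * z[1] + r3 * z[2]


# Placeholder coefficients: rho(y,x1), rho(y,x2), rho(y,x3),
# rho(x1,x2), rho(x1,x3), rho(x2,x3).
coeffs = (0.5, 0.3, -0.2, 0.2, -0.1, 0.4)
dif = dmc2_matrix(*coeffs) - dmc2_reduced(*coeffs)  # Eq. 6
print(dif)
```

Any nonzero difference printed here comes from floating-point rounding alone, since the two functions evaluate the same algebraic expression by different routes.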

Fig. 3

(A) \(DMC^{2}_x\) coefficient at different time scales of size n for the EMSAL time series (May/2007 to December/2023), computed with the reduced expression (Eq. 5) and with the matrix method (Eq. 2). (B) Difference between the matrix method and the reduced expression. Source: Created by the authors.

For any case, \(0 \le DMC^2_x \le 1\), which can be divided into intervals, suggesting an intensity of detrended multiple cross-correlation, as shown in Table 2.

Table 2 Detrended Multiple Cross-Correlation Intensity.

For the \(DMC^2_x\) specifically, the sliding window procedure56,57 was used. This procedure, initially applied with the Hurst exponent in the temporal evaluation of financial series, consists of choosing a window of size W, usually 1000 observations58, although analyses with 365 points can also be found59,60. It makes it possible to measure dynamically, as a function of time, the degree of dependence of sea level on the other variables (GNSS altimetry, precipitation and air temperature).

In this research, the sliding window procedure was executed with the help of the R language61, using specific scales for n equal to 7, 15, 30 and 60 days.
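The sliding window idea can be sketched generically. The authors ran it in R; the Python helper names below (`sliding_windows`, `rolling_statistic`) and the `step` parameter are hypothetical, and the window mean is only a stand-in for whatever statistic is evaluated in each window (in the study, \(DMC^2_x\) at a given scale n).

```python
def sliding_windows(series, width, step=1):
    """Yield (start_index, window) for consecutive windows of `width` points."""
    if width < 1 or step < 1:
        raise ValueError("width and step must be positive integers")
    for start in range(0, len(series) - width + 1, step):
        yield start, series[start:start + width]


def rolling_statistic(series, width, statistic, step=1):
    """Apply `statistic` to every window and collect the results in order."""
    return [statistic(window) for _, window in sliding_windows(series, width, step)]


def window_mean(window):
    """Placeholder statistic; any function of one window could be used."""
    return sum(window) / len(window)


daily_values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rolling_statistic(daily_values, width=3, statistic=window_mean))
```

For a multivariate statistic, the same start indices would be used to slice y and each predictor consistently, so that every window covers the same dates in all series.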

Additionally, entropy was used to measure the level of uncertainty, randomness, or unpredictability associated with the mean sea level variable. Inspired by information theory developed by Claude Shannon (1948), it quantifies the degree of disorder or the amount of information needed to describe a given data set62.

Among the methods for measuring entropy, this research used the FastApEn method, an evolution of ApEn. The ApEn (Approximate Entropy) method was proposed by Pincus (1991) as a measure of the regularity and predictability of noisy time series. Its main advantage lies in its ability to handle relatively small series, being less computationally demanding in relation to the structural complexity of the series63,64.

Let \(x = \{x_1, x_2, \dots , x_N\}\) be a time series of length N.

  1. For each \(i = 1, \dots , N-m + 1\), we define the vector:

    $$\textbf{u}(i) = \{x_i, x_{i+1}, \dots , x_{i+m-1}\}$$

  2. The distance between two m-dimensional vectors is defined as:

    $$d[\textbf{u}(i), \textbf{u}(j)] = \max _{k = 1, \dots , m} |x_{i+k-1}-x_{j+k-1}|$$

  3. For a tolerance \(r > 0\), we calculate:

    $$C_i^m(r) = \frac{1}{N-m + 1} \sum _{j = 1}^{N-m + 1} \Theta (r-d[\textbf{u}(i), \textbf{u}(j)])$$

    where \(\Theta (\cdot )\) is the Heaviside function.

  4. The approximate entropy is given by:

    $$\text {ApEn}(m, r, N) = \phi ^m(r)-\phi ^{m+1}(r)$$

    with:

    $$\phi ^m(r) = \frac{1}{N-m + 1} \sum _{i = 1}^{N-m + 1} \ln C_i^m(r)$$

Where:

  • m: Embedding dimension (usually \(m=2\));

  • r: Tolerance, usually a fraction of the standard deviation (SD) of the series (\(r = 0.2 \times \text {SD}\), for example);

  • N: Size of the series (or window) considered.
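The four steps above can be sketched directly. This is the plain ApEn definition, not the optimized FastApEn variant used in the study; the function name `approximate_entropy` is hypothetical, `r` is passed as an absolute tolerance, and \(\Theta (r-d)\) is assumed to equal 1 when \(d \le r\), so each vector always matches itself.

```python
import math
import random
import statistics


def approximate_entropy(x, m=2, r=0.2):
    """ApEn(m, r, N) following steps 1 to 4; r is an absolute tolerance."""
    if len(x) < m + 2:
        raise ValueError("series is too short for the chosen m")

    def phi(dim):
        count = len(x) - dim + 1
        vectors = [x[i:i + dim] for i in range(count)]  # step 1
        total = 0.0
        for u in vectors:
            # Steps 2 and 3: count vectors within Chebyshev distance r of u.
            matches = sum(
                1 for v in vectors
                if max(abs(a - b) for a, b in zip(u, v)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)  # step 4


# A strictly periodic series is highly regular; white noise is not.
periodic = [0.0, 1.0] * 50
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(200)]
print(approximate_entropy(periodic, m=2, r=0.2 * statistics.pstdev(periodic)))
print(approximate_entropy(noise, m=2, r=0.2 * statistics.pstdev(noise)))
```

The cost grows with the square of the series length, which is the motivation for faster variants on long records.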

However, ApEn presents two important limitations: the first is that it strongly depends on the sample size and the parameters m, r and \(\tau\); and the second is its sensitivity to the inclusion of new data, since it considers self-comparisons in the subsequences63,64.

The FastApEn (Fast Approximate Entropy) method65,66 is a computationally optimized version of ApEn63. Its calculation is based on the same principle of counting similar patterns, but with simplifications that make it more efficient for large data sets. The main advantage is the significant improvement in computational performance without significant loss of accuracy, which makes it particularly useful for large data sets, such as the series evaluated in this research.

To facilitate interpretation and comparison between different windows, scales and/or series, the entropy measures were min-max normalized to the range [0, 1] using Eq. 7.

$$\begin{aligned} E_{norm,i} = \frac{x_i-\min (x)}{\max (x)-\min (x)} \end{aligned}$$
(7)

In Eq. 7, \(x_i\) denotes the entropy of a given time series at point i, and \(\min (x)\) and \(\max (x)\) denote the minimum and maximum entropy values observed in that series.
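A minimal sketch of the normalization in Eq. 7 follows. The function name `min_max_normalize` is hypothetical, and the handling of a constant series (returning zeros to avoid division by zero) is an assumption, since the text does not address that case.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] as in Eq. 7."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series carries no spread, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


entropy_values = [2.0, 4.0, 6.0]
print(min_max_normalize(entropy_values))  # [0.0, 0.5, 1.0]
```

Because the minimum and maximum are taken per series, normalized values are comparable in shape across stations but no longer in absolute magnitude.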

Results

Exploratory data analysis

Table 3 presents the descriptive statistics (average, standard deviation, coefficient of variation, skewness, and the results of the KPSS stationarity test) for the time series of mean sea level, GNSS-derived altimetry, precipitation, and air temperature at the seven tide gauge stations analyzed. Moreover, the percentage of imputed data in each of the series considered is reported.

Table 3 Exploratory Analysis of Time Series and Imputation Percentage.

It should be noted that the MSL values are referred to an IBGE-specific reference, so only their variations, not their absolute values, should be interpreted. The coefficient of variation, however, reveals distinct patterns of stability and variability that help to better understand local climatic and oceanographic behavior. In summary, the MSL presents low coefficients of variation (CV) in all stations, particularly in EMFOR (0.72%) and EMSAL (0.85%), indicating high stability in these time series. Even in locations such as EMARC (6.04%) and EMMAC (8.53%), the CV remains relatively low, which is expected for this variable.

The analysis of the coefficients of variation (CV) reveals distinct patterns among the investigated variables. GNSS altimetry presents remarkable stability, with CVs below 0.4% at all stations, confirming its accuracy for geodetic monitoring and its suitability as a reliable reference for detecting real sea level variations. In contrast, precipitation demonstrates the greatest variability, with CVs above 100% at all stations, reaching extremes of 396.73% at EMARC and 256.70% at EMFOR. This reflects its intermittent nature and pronounced seasonality along the Brazilian coast, marked by alternating dry periods and intense rainfall events67,68,69. Air temperature exhibits intermediate behavior, with CVs between 3.18% (EMFOR) and 20.98% (EMIMB), indicating relative thermal stability, although with discernible seasonal fluctuations (Table 3). These findings are consistent with the results of the KPSS test70, which identified non-stationarity in this variable at most stations, suggesting the presence of trends, seasonal cycles, heteroscedasticity, or long-term temporal dependence.
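The contrast between low and very high CVs can be made concrete with a small sketch. The function name `coefficient_of_variation` is hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator was used), and the two series are invented toy data, not station records.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: 100 * standard deviation / |mean|."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    # Assumption: population standard deviation (pstdev), not the sample one.
    return 100.0 * statistics.pstdev(values) / abs(mean)


steady = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # fluctuates around its mean
intermittent = [0.0] * 9 + [10.0]                   # mostly dry, one wet day
print(coefficient_of_variation(steady))
print(coefficient_of_variation(intermittent))
```

An intermittent series with many zeros and a few large values has a standard deviation well above its mean, which is why CVs above 100% are unsurprising for daily precipitation.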

The missing data in the time series (Table 3) were imputed using a combined approach that used the Kalman filter, implemented through the R package imputeTS71, and seasonal decomposition, performed with the forecast package72. This dual strategy was adopted considering the specific characteristics of the analyzed time series, where conventional imputation methods, typically based on correlations between attributes, are inadequate. As demonstrated by Pereira et al. in a study with INMET data73, univariate time series require algorithms capable of exploring intrinsic temporal dependencies, such as the methods employed here. The choice to combine the Kalman filter with seasonal decomposition is justified by the former’s ability to capture underlying temporal patterns, while the latter efficiently isolates trend and seasonal components. The prior validation of these techniques in similar contexts73 attests to the robustness of the imputation methodology adopted in this study.

Missing values across the time series ranged from 0.00% to 33.20%: from 4.9% to 21.41% for Santana, 3.05% to 11.59% for Belém, 0.00% to 15.30% for Fortaleza, 1.60% to 6.72% for Salvador, 0.00% to 7.49% for Arraial do Cabo, 2.04% to 33.20% for Macaé, and 3.83% to 13.80% for Imbituba (Table 3).

The following section presents the results of the entropy analysis applied to sea level time series, with the main objective of quantifying the degree of predictability of tide gauge behavior at the studied coastal stations. This approach will allow a comparative assessment of the dynamic complexity between the different locations analyzed, as well as the characterization of the observed temporal variability patterns. Furthermore, the results will provide support for investigating possible relationships between the obtained entropy values and local oceanographic factors specific to each station.

Entropy

Before correlating the tide gauge series with the other studied variables (GNSS altimetry, precipitation, and air temperature), an entropy analysis of the mean sea level was performed at each station (Arraial do Cabo, Belém, Fortaleza, Imbituba, Macaé, Salvador and Santana) to quantify the uncertainty associated with this variable. For this calculation, the FastApEn method65,66, a computationally optimized version of ApEn63, was adopted, applying a 30-day sliding window (w = 30), a period that corresponds to the lunar synodic cycle (approximately 29.5 days)74,75. The results are presented in Fig. 4, while Table 4 displays the descriptive statistics of the analysis, with normalized values in the interval [0,1].

Fig. 4

Entropy of the mean sea level time series of tide gauges on the Brazilian coast for a 30-day window (w=30). Note: EMARC, EMBEL, EMIMB, EMFOR, EMMAC, EMSAL, and EMSAN refer to the tide gauge stations analyzed, and the entropy values were normalized between 0 and 1, with the horizontal red line in the corresponding plots representing the mean entropy over the period. Source: Created by the authors.

Table 4 Mean Sea Level Entropy of Tide Gauge Stations on the Brazilian Coast by the FastApEn method normalized to a 30-day sliding window (w=30).

Analysis of the results reveals that the Salvador station presented the highest mean entropy value (0.5262 on a 0-1 normalized scale) among all the stations studied, indicating the greatest dynamic complexity and, consequently, less predictable behavior in its time series. At the same time, Salvador stood out for its lowest coefficient of variation (23.27%), demonstrating that, despite its high complexity, its dynamic patterns maintain remarkable temporal stability. These findings are visually corroborated by Fig. 4, which shows entropy fluctuations within a relatively narrow range throughout the analyzed time series with a 30-day window (w=30). The combination of these factors suggests a hydrodynamic system with marked nonlinear characteristics, but with an underlying structure that remains consistently stable throughout the observation period.

The Imbituba station presented a high average entropy (0.496), combined with one of the lowest coefficients of variation (25.31%) and the lowest asymmetry among all stations (0.019). These results suggest that Imbituba has a system with complex but stable dynamics, characterized by an approximately symmetric distribution of entropy values throughout the analyzed period (w = 30). In contrast, the Santana station showed the lowest average entropy (0.3631) and the highest coefficient of variation (32.98%), indicating a more predictable behavior, but with greater temporal variability in system complexity. These distinct patterns between Imbituba and Santana were consistently captured by the 30-day moving window analysis (w = 30), which, in this research, was adequate to characterize the temporal variability of coastal systems.

The Arraial do Cabo, Belém and Fortaleza stations presented relatively low mean entropy values (0.4415 to 0.4890 for 30-day windows, w=30), combined with coefficients of variation greater than 31.0%. These results indicate an intermediate dynamic, both in complexity and temporal stability, for the analyzed period. Additionally, these stations showed the highest asymmetry values (0.2549, 0.2312 and 0.2105, respectively), suggesting the occurrence of sporadic events of greater complexity, potentially associated with specific meteorological or oceanographic phenomena during the analysis cycle. In contrast, Macaé presented a mean entropy of 0.4852 (w=30), with a coefficient of variation of 28.82% and relatively low asymmetry (0.0837). These values reveal a slightly asymmetric distribution, with a tendency toward more stable behavior over the 30-day analysis windows.
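As a rough illustration of how the statistics discussed above can be obtained, the following Python sketch computes a sliding-window entropy series and its mean, coefficient of variation and skewness. It is only a sketch under stated assumptions: it uses the classic approximate entropy of Pincus rather than the FastApEn algorithm, assumes an embedding dimension m = 2, a tolerance of 0.2 standard deviations and min-max normalization to the 0-1 range, and runs on synthetic data. All function names are hypothetical and none of these choices is taken from the study.

```python
import math
import random
import statistics


def approx_entropy(x, m=2, r_factor=0.2):
    """Classic approximate entropy (assumption: not the FastApEn variant)."""
    n = len(x)
    r = r_factor * statistics.pstdev(x)  # tolerance relative to the window spread

    def phi(mm):
        vecs = [x[i:i + mm] for i in range(n - mm + 1)]
        total = 0.0
        for a in vecs:
            # Count templates within r (Chebyshev distance, self-match included).
            c = sum(1 for b in vecs
                    if max(abs(p - q) for p, q in zip(a, b)) <= r)
            total += math.log(c / len(vecs))
        return total / len(vecs)

    return phi(m) - phi(m + 1)


def sliding_entropy(x, w=30):
    """Entropy of every window of w consecutive values."""
    return [approx_entropy(x[i:i + w]) for i in range(len(x) - w + 1)]


def normalize01(values):
    """Min-max scaling to [0, 1] (assumed normalization; needs max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def summarize(values):
    """Mean, coefficient of variation (%) and skewness of a series."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / len(values)
    return {"mean": mean, "cv_percent": 100 * sd / mean, "skewness": skew}


# Synthetic daily series (hypothetical data, not a tide gauge record).
random.seed(0)
series = [math.sin(2 * math.pi * t / 12.4) + 0.3 * random.gauss(0, 1)
          for t in range(120)]
entropy = normalize01(sliding_entropy(series, w=30))
stats = summarize(entropy)
print(stats)
```

With this kind of summary, a station with a high mean and a low coefficient of variation would correspond to the "complex but stable" behavior described for Salvador and Imbituba.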

The next section investigates the bivariate cross-correlations between sea level data and climate variables (GNSS altimetry, precipitation and air temperature) at multiple time scales (n days). This analysis will allow us to characterize the evolution of dependency relationships across time scales.

Cross-correlation coefficient

The cross-correlation analysis between mean sea level and each independent variable (GNSS altimetry, precipitation and air temperature) was performed using the \(\rho _{DCCA}\) coefficient at multiple time scales (n days), as shown in Figs. 5, 6 and 7.
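For readers who want to reproduce this type of analysis, the sketch below shows one common way to compute the \(\rho _{DCCA}\) coefficient at a given scale n. It assumes linear detrending inside overlapping boxes of n + 1 points, which is the usual formulation but is not confirmed here as the exact configuration used in this study. The function names and the synthetic input series are hypothetical.

```python
import math
import random


def profile(x):
    """Integrated (cumulative, mean-removed) series."""
    mean = sum(x) / len(x)
    out, acc = [], 0.0
    for v in x:
        acc += v - mean
        out.append(acc)
    return out


def linear_residuals(seg):
    """Residuals of an ordinary least-squares line fitted to seg."""
    k = len(seg)
    t_mean = (k - 1) / 2
    s_mean = sum(seg) / k
    sxx = sum((t - t_mean) ** 2 for t in range(k))
    sxy = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(seg))
    slope = sxy / sxx
    return [s - (s_mean + slope * (t - t_mean)) for t, s in enumerate(seg)]


def rho_dcca(x, y, n):
    """Detrended cross-correlation coefficient at scale n (use n >= 2)."""
    px, py = profile(x), profile(y)
    fxy = fxx = fyy = 0.0
    for i in range(len(px) - n):  # overlapping boxes of n + 1 points
        rx = linear_residuals(px[i:i + n + 1])
        ry = linear_residuals(py[i:i + n + 1])
        fxy += sum(a * b for a, b in zip(rx, ry))  # detrended covariance
        fxx += sum(a * a for a in rx)              # detrended variance of x
        fyy += sum(b * b for b in ry)              # detrended variance of y
    # Box-size normalization factors cancel in the ratio.
    return fxy / math.sqrt(fxx * fyy)


# Synthetic anticorrelated pair (hypothetical data).
random.seed(1)
x = [random.gauss(0, 1) for _ in range(300)]
y = [-a + 0.1 * random.gauss(0, 1) for a in x]
curve = {n: rho_dcca(x, y, n) for n in (4, 8, 16, 32)}
print(curve)
```

Evaluating the function over a range of n, as in the last lines, gives a curve of the coefficient against scale, which is the kind of curve plotted in the figures that follow. Values near -1 indicate anticorrelation and values near +1 indicate positive cross-correlation.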

Fig. 5

Mean sea level coefficient \(\rho _{DCCA}\) vs. GNSS altimetry of tide gauge stations along the Brazilian coast. Note: EMARC, EMBEL, EMIMB, EMFOR, EMMAC, EMSAL, and EMSAN refer to the tide gauge stations analyzed. Source: Created by the authors.

Figure 5 presents the cross-correlation analysis (\(\rho _{DCCA}\)) between mean sea level and GNSS altimetry, revealing distinct patterns among the analyzed coastal stations. The Belém and Fortaleza stations stand out for exhibiting significant positive correlations at longer time scales (\(n>\) 100 days), while Arraial do Cabo, Macaé, Salvador, Santana and Imbituba present predominantly negative values. In particular, Salvador and Santana show the most pronounced anticorrelations, a result consistent with the expected effect of vertical ground motions on mean sea level measurements.

These results highlight the striking heterogeneity in coastal dynamics along the Brazilian coast. The distinct patterns observed suggest that, in addition to crustal movements, other local and regional factors, such as geomorphological characteristics, ocean circulation patterns, and variations in the continental shelf, significantly influence the relationship between GNSS altimetry and mean sea level recorded at each station. Analysis across multiple time scales (n days) allowed us to capture these regional differences more comprehensively.

Fig. 6

Mean sea level coefficient \(\rho _{DCCA}\) vs. precipitation at tide gauge stations along the Brazilian coast. Note: EMARC, EMBEL, EMIMB, EMFOR, EMMAC, EMSAL, and EMSAN refer to the tide gauge stations analyzed. Source: Created by the authors.

Figure 6 shows the cross-correlation between sea level and daily accumulated precipitation, revealing distinct patterns among the analyzed stations. Salvador and Santana stand out with the highest positive correlation values (\(\rho _{DCCA}\) greater than 0.6) at scales greater than 200 days, while Belém exhibits a moderate trend (\(\rho _{DCCA}\) of approximately 0.4) at the same scales, a result that may be attenuated by the influence of Amazon River dynamics and the large local tidal amplitudes. Fortaleza and Imbituba display more variable behavior, alternating between positive and negative values across different temporal scales, suggesting a less stable relationship between precipitation and sea level. Macaé and Arraial do Cabo, in turn, exhibit positive but weaker correlations (\(\rho _{DCCA}\) less than 0.2), indicating a possibly more localized or indirect influence of precipitation. These results reinforce the need to consider both temporal scales and regional specificities when assessing ocean-atmosphere interactions.

Fig. 7

Coefficient \(\rho _{DCCA}\) of mean sea level vs. air temperature at tide gauge stations along the Brazilian coast. Note: EMARC, EMBEL, EMIMB, EMFOR, EMMAC, EMSAL, and EMSAN refer to the tide gauge stations analyzed. Source: Created by the authors.

Figure 7 presents the cross-correlations between mean sea level and air temperature, revealing distinct patterns among the analyzed stations. In general, negative correlations predominate at various time scales, although with significant regional variations. Arraial do Cabo (EMARC) stands out as a notable exception, presenting an increasing positive correlation that exceeds 0.3 at time scales longer than 100 days, suggesting a significant influence of temperature on sea level in this specific region. Fortaleza and Belém also exhibit positive correlations, but restricted to intermediate time scales (30 to 100 days), followed by sharp declines at larger time scales. In contrast, Imbituba, Santana and Salvador show a strong negative correlation, indicating an inverse relationship between the analyzed phenomena. Macaé presents the most irregular pattern, with correlations that oscillate without a defined trend, which points to a particularly complex local dynamic between air temperature and sea level.

The next section presents the results of the multiple cross-correlation analysis between sea level and the other variables, using the Detrended Multiple Cross-Correlation (\(DMC^2_x\)) method with and without the sliding window procedure.

Detrended multiple cross-correlation

This section presents the level of multiple correlation between the dependent variable (y), represented by mean sea level, and the independent variables: GNSS altimetry (x1), precipitation (x2), and air temperature (x3) (Fig. 8). To evaluate the temporal dynamics of these correlations, the \(DMC^2_x\) method was applied with a 365-day window (w=365), allowing the analysis of daily variations over the studied period for specific scales (Fig. 9). It is worth noting that the window size took into account previous studies that used the cross-correlation coefficient to analyze similar variables60,78.
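To make the construction of the multiple coefficient concrete, the sketch below assumes the usual definition \(DMC^2_x = \rho _{y,x}^{T}\,\rho ^{-1}\,\rho _{y,x}\), where \(\rho _{y,x}\) collects the \(\rho _{DCCA}\) values between the dependent variable and each independent variable at scale n, and \(\rho\) is the matrix of \(\rho _{DCCA}\) values among the independent variables. The helper names are hypothetical and the numerical values are invented for illustration only; they are not results from this study.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, k))
        z[r] = (m[r][k] - tail) / m[r][r]
    return z


def dmc2(rho_yx, rho_xx):
    """Quadratic form rho_yx^T * inverse(rho_xx) * rho_yx."""
    z = solve(rho_xx, rho_yx)
    return sum(r * v for r, v in zip(rho_yx, z))


# Hypothetical rho_DCCA values at one scale n:
# y vs. (x1 = GNSS altimetry, x2 = precipitation, x3 = air temperature).
rho_yx = [-0.6, 0.4, -0.3]
# Hypothetical rho_DCCA values among x1, x2 and x3 (unit diagonal).
rho_xx = [
    [1.0, 0.1, -0.2],
    [0.1, 1.0, 0.3],
    [-0.2, 0.3, 1.0],
]
example = dmc2(rho_yx, rho_xx)
print(example)
```

When the independent variables are mutually uncorrelated, the matrix reduces to the identity and the coefficient is simply the sum of the squared bivariate coefficients. Correlation among predictors changes this sum, which is why the inverse matrix appears in the definition. Repeating the calculation inside each 365-day window would yield a time-varying version of the coefficient.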

Fig. 8

\(DMC^2_x\) of the Mean sea level with GNSS altimetry, precipitation and temperature. Note: EMARC, EMBEL, EMIMB, EMFOR, EMMAC, EMSAL and EMSAN refer to the tide gauge stations analyzed, and n corresponds to the temporal scale. Source: Created by the authors.

Figure 8 shows that stations such as Arraial do Cabo, Belém, Fortaleza, Salvador, Macaé and Santana exhibit significant correlation peaks at scales greater than 100 days. This behavior may indicate a cumulative influence of the independent variables on mean sea level at seasonal to interannual scales, consistent with previous studies showing that climate trends and interannual temperature variability significantly affect sea level79,80. In Imbituba, the \(DMC^2_x\) values are more pronounced at short time scales, which may be related to more immediate local effects, such as intense rainfall events temporarily affecting sea level81,82,83. Salvador and Santana stand out, as they present higher values at practically all scales, indicating a more persistent relationship between climatic and geodetic variables and mean sea level. One possibility is that these locations are more sensitive to the combined effects of subsidence and climate variations84.

Fig. 9

Detrended multiple correlation for specific time scales (\(n=7\), \(n=15\), \(n=30\) and \(n=60\) days) between mean sea level (y), GNSS altimetry (\(x_1\)), precipitation (\(x_2\)) and air temperature (\(x_3\)) for a 365-day window (\(w=365\)). Note: EMARC, EMBEL, EMIMB, EMFOR, EMMAC, EMSAL, and EMSAN refer to the tide gauge stations analyzed. Source: Created by the authors.

Figure 9 shows the level of multiple correlation between mean sea level as the dependent variable and GNSS altimetry, precipitation and air temperature as independent variables, using the sliding window approach. Even visually, the large variation between seasons over the years is apparent, which highlights the nonstationary nature of the relationship between mean sea level and environmental forcings79,85. The Salvador and Santana stations present higher values at the 60-day scale (\(n=60\)), which suggests a stronger effect of the independent variables on mean sea level at monthly scales. Dangendorf, in 2014, had already highlighted the delayed response of sea level to hydrometeorological processes80. These same stations present the largest \(DMC^2_x\) amplitudes (close to 0.8), indicating greater sensitivity of the dependent variable to environmental forcings, possibly due to more intense seasonal variations. In contrast, Belém and Macaé present more modest values, indicating that they are located in regions less subject to the combined effects of the independent variables. Although the graph does not distinguish which variable is most correlated, it is possible that temperature and precipitation play an important role at 30- and 60-day time scales, as observed in other studies on oceanic and atmospheric forcings84. Abrupt peaks in the curves, such as the one in Fortaleza in 2021, may be related to extreme events, such as intense rainfall or heat waves, which are common in this region.

Discussion

The results of the exploratory analysis indicate that the MSL presents high stability at all stations analyzed, with consistently low coefficients of variation, a characteristic of tide gauge series and indicative that the detected variations reflect larger-scale physical processes rather than measurement noise. The greater stability identified at EMFOR and EMSAL, for example, may be associated with a lesser influence of high-energy hydrodynamic regimes or variable coastal currents, while slightly higher values at EMARC and EMMAC may reflect greater exposure to regional meteorological oscillations and more complex ocean-atmosphere interactions.

For GNSS altimetry, CVs remained below 0.4% at all stations. Together with the literature on vertical movements of the Earth’s surface86,87,88,89, this result highlights its reliability as a geodetic reference for detecting real variations in sea level.

On the other hand, precipitation showed high variability, reflecting its intermittent and seasonal nature along the Brazilian coast, which is consistent with the literature on tropical and subtropical rainfall regimes90,91,92,93,94. It is important to pay attention to this rainfall instability, as it may indirectly influence the behavior of the MSL, especially in estuarine regions or near river mouths, where freshwater flows can induce temporary variations in sea level.

Air temperature showed intermediate stability, evidenced by moderate CVs, suggesting that this variable is more predictable than precipitation, but still subject to well-defined seasonal fluctuations. Even so, the detection of nonstationarity by the KPSS test in most of the series reinforces the need to consider long-term trends and cycles when modeling this variable.

The imputation of missing data, although necessary to enable continuous time series analyses, introduces potential biases that must be considered. High imputation rates (e.g., Macaé with 33.2% for the MSL variable) may affect the results in two main ways. First, for entropy analysis, algorithms such as the Kalman filter tend to generate smoother series, which may underestimate the true dynamic complexity, particularly at stations with the highest proportions of missing data (EMMAC, EMSAN). Second, for detrended correlation analyses (\(\rho _{DCCA}\) and \(DMC^{2}_{x}\)), imputation may artificially inflate short-term autocorrelation, potentially affecting the coefficients at shorter time scales. Nevertheless, the central findings of this study demonstrate robustness for several reasons: the strongest and most physically interpretable correlations emerge at long time scales (\(n > 100\) days), where the effect of high-frequency imputation is minimized; the coherent spatial patterns observed, such as the marked contrast in temperature correlation between Arraial do Cabo and Imbituba, are consistent with known geophysical processes and are unlikely to be artifacts of imputation; and stations with low imputation rates (EMSAL, EMARC) exhibit consistent and plausible signals. Therefore, although results for series with high imputation levels should be interpreted with caution, the main conclusions regarding the relative and integrated influence of the investigated variables remain valid.
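The second effect described above, the inflation of short-term autocorrelation by imputation, can be illustrated with a toy experiment. The sketch below is not the procedure used in this study: it replaces the Kalman filter with plain linear interpolation (a deliberately simpler stand-in that also produces smooth fills), uses synthetic white noise instead of tide gauge data, and removes one third of the values in regular 10-day blocks. All names are hypothetical.

```python
import random


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a complete series."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


def interpolate_gaps(x):
    """Fill None entries linearly between the nearest observed neighbours.

    Assumes the first and last entries are observed.
    """
    out = list(x)
    known = [i for i, v in enumerate(out) if v is not None]
    for lo, hi in zip(known[:-1], known[1:]):
        for i in range(lo + 1, hi):
            w = (i - lo) / (hi - lo)
            out[i] = (1 - w) * out[lo] + w * out[hi]
    return out


# Synthetic white noise: no serial dependence by construction.
random.seed(2)
truth = [random.gauss(0, 1) for _ in range(600)]

# Remove days 10-19 of every 30-day block (one third of the record).
gappy = [None if 10 <= t % 30 < 20 else v for t, v in enumerate(truth)]
filled = interpolate_gaps(gappy)

before = lag1_autocorr(truth)
after = lag1_autocorr(filled)
print(before, after)
```

Because the filled segments vary smoothly between their endpoints, the reconstructed series shows a clearly higher lag-1 autocorrelation than the original noise, even though no real persistence was present. The same mechanism is why coefficients at the shortest scales deserve the most caution at stations with high imputation rates.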

From a methodological point of view, the combined imputation approach, which joins the Kalman filter (implemented through the R package imputeTS71) with seasonal decomposition (performed with the forecast package72), proved adequate for dealing with the different natures of the observed variability, ensuring the statistical coherence of the reconstructed series and preserving their temporal structures.

The entropy analysis reinforced that the dynamics of the tide gauge series are not homogeneous across locations, reflecting the influence of local characteristics such as wind patterns, meteorological systems and coastal currents. The combination of high entropy and low variability observed in Salvador and Imbituba suggests the presence of structured chaotic processes, while the lower entropy associated with greater variability, as in Santana, indicates a greater influence of deterministic or periodic components. These results demonstrate that the MSL responds to multiple forcing scenarios and that the predictability of its dynamics depends on both the temporal scale and the regional oceanographic and meteorological context79,95.

Cross-analysis using the \(\rho _{DCCA}\) coefficient revealed distinct patterns of correlation between mean sea level and the evaluated variables (GNSS altimetry, precipitation, and air temperature) at the seven coastal stations studied. The comparison between the GNSS altimetry and mean sea level series generally showed stronger and more consistent correlations, especially at the Salvador and Santana stations, where \(\rho _{DCCA}\) values greater than 0.6 were observed on longer time scales, indicating good agreement between the variables.

In contrast, the relationship between mean sea level and precipitation was weaker and more variable across locations. Although Arraial do Cabo and Fortaleza showed positive correlations at intermediate scales, stations such as Imbituba and Santana exhibited negative correlations at longer scales, suggesting that the effect of precipitation on the mean sea level may be indirect, driven by factors such as surface runoff, the local hydrological regime and the topography of the coastal region96.

Regarding air temperature, the stations presented more pronounced contrasts. Arraial do Cabo stood out with an increasing positive correlation across the analyzed time scales, suggesting a consistent influence97,98,99, while other stations, such as Imbituba and Santana, revealed significant negative correlations. These patterns may be associated with local ocean-atmosphere processes, such as upwelling or variations in salinity and water density100,101.

The statistical patterns revealed by the \(\rho _{DCCA}\) and \(DMC^{2}_{x}\) analyses find coherent explanations within the specific oceanographic, hydrological, and geological contexts of each study region.

The strong anticorrelations between GNSS altimetry and MSL at Salvador (EMSAL) and Santana (EMSAN) (Fig. 5) provide robust statistical evidence of the predominance of vertical land motion (VLM) at these locations. In Salvador, this signal is attributed to differential subsidence along the geological fault system of the Recôncavo Graben, a structure associated with the sedimentary basin23. In Santana, located over the thick sedimentary basin of the Amazon River mouth, VLM likely results from isostatic compaction of sediments. In these cases, the tide gauge records a relative sea-level rise that is amplified or even dominated by land subsidence, highlighting the critical need for geodetic corrections to properly isolate the true oceanic signal.

The generally weak and unstable correlations with precipitation (Fig. 6) reflect the complex and multiscale nature of the hydrological forcing. Precipitation influences coastal MSL through mechanisms that act in opposite directions and across different time scales: the direct input of freshwater (raising MSL) and the elastic loading of the crust due to increased continental water storage (lowering relative MSL). The competition and partial cancellation between these effects, combined with the high spatial variability of rainfall, explain the low values of \(\rho _{DCCA}\) and the variable contribution of precipitation to \(DMC^{2}_{x}\) over time (Fig. 9).

The regional contrasts in the correlations with air temperature (Fig. 7) directly map distinct oceanographic regimes. The strong and persistent positive correlation at Arraial do Cabo (EMARC) is a clear statistical signature of thermal expansion (the steric effect) as the primary control on local MSL. In contrast, the negative correlations at Imbituba (EMIMB) and Santana (EMSAN) suggest the dominance of dynamic processes. In the Imbituba region, the Cape Santa Marta upwelling system brings cold, dense waters to the surface102,103, which may be associated with dynamic sea-level rise due to geostrophic adjustments of the Brazil Current, generating the observed inverse relationship. In Santana, the dynamics of the Amazon River plume govern variations in ocean dynamic height largely independently of local air temperature104,105.

The superior strength of the multiple correlation (\(DMC^{2}_{x}\)) at long time scales (Fig. 8) statistically encapsulates the integration of low-frequency processes. At seasonal to interannual scales, the combined effects of integrated thermal expansion, continental hydrological loading, and continuous crustal motions act in synergy, explaining a larger fraction of MSL variance than any individual forcing alone. The dynamic sliding-window analysis (Fig. 9) further captures the nonstationarity of these relationships. Sharp peaks, such as the one observed in Fortaleza in 2021, are likely signatures of compound extreme events, in which heat anomalies, precipitation, and possibly atmospheric loading acted concomitantly to produce an extreme MSL anomaly, whose variance is exceptionally well explained by the multivariate coefficient.

Study limitations

This study presents limitations that must be acknowledged for an appropriate interpretation of the results and to guide future research.

Regarding the data, the presence of missing values and the subsequent imputation process may have underestimated the true complexity (entropy) of the series and influenced the correlation estimates at shorter time scales.

Additionally, the spatial separation between the tide gauge stations and their corresponding geodetic and meteorological stations is a limiting factor that may introduce uncertainties, so the associated results should be interpreted with caution.

Although the distances between some tide gauge stations and their corresponding GNSS stations are relatively large, particularly in Arraial do Cabo and Macaé, this limitation reflects the actual spatial distribution of the available geodetic infrastructure in Brazil. Nevertheless, previous studies have shown that regional vertical land motion signals exhibit spatial coherence over scales of hundreds of kilometers, especially when analyzed using long time series such as those employed in this study106,107,108. Furthermore, the internal consistency of the NGL time series and the multiscale nature of the applied analyses help mitigate the effects of this limitation, thereby ensuring the robustness of the identified correlations.

In contrast, the distances between the tide gauge stations and the meteorological stations are more critical, ranging from 0.85 km to approximately 73.21 km (Table 1), particularly with respect to the precipitation variable, given its well-known high micro-scale variability. Such separation may introduce noise into the high-frequency bivariate correlations between precipitation and MSL, possibly contributing to the greater instability observed in the \(\rho _{DCCA}\) values for this analysis (Fig. 6). Nevertheless, the multiscale and multivariate framework that constitutes the core of this study partially mitigates this limitation. The most robust and physically interpretable results, such as the high \(DMC^{2}_{x}\) values at seasonal and interannual scales, depend on large-scale climate variability, which is more consistently captured even by more distant stations. Moreover, the distinct regional patterns observed suggest that large-scale physical processes dominate the integrated relationship. Future studies could benefit from the use of spatially interpolated reanalysis products or denser sensor networks to more precisely isolate strictly local effects.

From a methodological and scope perspective, the analysis was limited to three selected non-oceanic forcings. Other important variables in coastal sea-level studies, such as atmospheric pressure (inverse barometer effect), wind stress (dynamic forcing), and variations in large-scale ocean currents, were not included. Consequently, a portion of the MSL variance remains unexplained by the applied methodology. It is also important to emphasize that the statistical methods employed (\(\rho _{DCCA}\) and \(DMC^{2}_{x}\)) are correlation-based tools and do not, by themselves, establish causal relationships. The physical interpretations were inferred based on established geophysical knowledge. Finally, the results are representative of the specific monitoring locations, and their spatial extrapolation should be approached with caution, particularly in heterogeneous coastal regions.

Despite these limitations, the consistency of the revealed spatial patterns with known physical processes, the robustness of the findings at long time scales (which are less susceptible to high-frequency noise), and the innovative nature of the multivariate and multiscale approach lend strong support to the main conclusions. Future work may partially overcome these limitations through the use of longer and more complete time series, the incorporation of additional relevant variables, the application of causal discovery methods within multiscale frameworks, and the integration of remote sensing data to achieve broader spatial coverage.

Conclusions

This study investigated, through a bivariate, multivariate, and multiscale approach, the dependency relationships between mean sea level (MSL) and geodetic-meteorological variables (GNSS altimetry, precipitation, and air temperature) along the Brazilian coast. The combined application of the \(\rho _{DCCA}\) and \(DMC^2_x\) coefficients, complemented by entropy analyses and sliding-window techniques, allowed us to conclude that GNSS altimetry is the most stable predictor variable and exhibits the strongest cross-correlation with MSL. The anticorrelation patterns identified, which are particularly pronounced in Salvador (EMSAL) and Santana (EMSAN), provide robust evidence of the influence of vertical crustal movements on tide gauge measurements, reinforcing the critical need to incorporate geodetic corrections in order to distinguish between true sea-level variations and land motion.

In contrast, precipitation showed a weaker and more unstable relationship with MSL, with correlations varying significantly across stations and temporal scales. Its high intrinsic variability, along with occasionally negative correlation patterns, suggests an indirect influence modulated by local factors such as topography and hydrological regimes. Air temperature, on the other hand, exhibited distinctly regional behavior: the positive correlation in Arraial do Cabo (EMARC) is consistent with the thermal expansion mechanism, whereas the negative correlations in Imbituba (EMIMB) and Santana (EMSAN) point to complex local ocean-atmosphere processes.

The multiple cross-correlation (\(DMC^2_x\)) proved to be a superior tool for capturing the combined influence of the variables, with higher values at larger temporal scales (\(n > 100\) days) for most stations, indicating that cumulative effects become more noticeable at seasonal to interannual scales. The dynamic analysis with sliding windows confirmed the nonstationary nature of these relationships, capturing temporal fluctuations and specific extreme events. Complementarily, the entropy analysis revealed a spectrum of dynamic complexity along the coast, ranging from structured chaotic systems in Salvador and Imbituba to more predictable behaviors in Santana.

In summary, this research demonstrated the effectiveness and innovative character of applying the \(DMC^2_x\) coefficient in coastal studies, filling a methodological gap identified in the literature. The findings reinforce that MSL variability along the Brazilian coast is governed by a complex interaction of factors, in which geodetic and climatic influences are predominant but vary in magnitude and behavior across regions. These results have direct implications for coastal planning and for assessing risks associated with climate change, providing essential support for more effective public policies and for improving the calibration of forecasting models. For future work, we suggest incorporating additional forcing variables and applying this methodology to longer time series, with the aim of distinguishing long-term climate-change effects from natural variability more clearly across multiple scales.