Accurate initial field estimation for weather forecasting with a variational constrained neural network

Wang, Wuxin; Zhang, Jinrong; Su, Qingguo; Chai, Xingyu; Lu, Jingze; Ni, Weicheng; Duan, Boheng; Ren, Kaijun

doi:10.1038/s41612-024-00776-1


Article
Open access
Published: 30 September 2024

Wuxin Wang^1,2,
Jinrong Zhang^2 (ORCID: orcid.org/0009-0000-5023-1139),
Qingguo Su^1,2,
Xingyu Chai^1,2 (ORCID: orcid.org/0000-0002-0910-9876),
Jingze Lu^1,2,
Weicheng Ni^2,
Boheng Duan^2 &
Kaijun Ren^1,2

npj Climate and Atmospheric Science volume 7, Article number: 223 (2024)

Abstract

Weather forecasting is crucial for scientific research and society. Recently, deep learning (DL) methods have achieved significant advancements in medium-range weather forecasting. However, they generally depend on the initial fields generated by the computationally expensive four-dimensional variational (4DVar) data assimilation (DA) technique, which limits their real-time applicability in multivariate three-dimensional (3D) weather forecasting. Here we propose 4DVarFormer by exploring the potential of integrating the 4DVar constraint into an attention-based neural network. 4DVarFormer eliminates the need for background error covariance statistics and the complex adjoint model development. It can generate multivariate 3D weather states within 0.37 s. Moreover, 4DVarFormer can capture inter-variable relationships, allowing the assimilation of observed variables to correct unobserved variables. Hence, medium-range forecasts initiated by 4DVarFormer outperform those of DL-based DA methods and achieve performance comparable to the forecasts initiated by ERA5 reanalyses. These promising findings contribute to future advancements in integrated end-to-end DL weather forecasting systems.

Introduction

The importance of accurate predictions for future Earth systems, ranging from minutes to centuries and from local to global scales, has significantly influenced humanity’s continuous efforts to observe and comprehend our planet¹. In recent decades, advancements in observation, theoretical understanding, and high-performance computing have led to the development of numerical weather prediction (NWP) models. These models provide warnings for local extreme events with lead times ranging from hours to days and can predict global weather up to 2 weeks in advance¹. However, the complexity of these physical models necessitates significant investments in supercomputing clusters to enable the development and operationalization of NWP. In recent years, deep learning (DL)-based approaches for weather forecasting have emerged, demonstrating performance comparable to that of state-of-the-art (SOTA) NWP models^2,3,4,5,6. Significantly, these methods are orders of magnitude more efficient at forecasting than traditional NWP models. It is crucial to note that high-quality and accurate initial fields serve as the foundation for making skillful forecasts. These fields, when combined with the forecasting models, govern the evolution of the state trajectory in both space and time. The preparation of accurate initial fields has garnered increased attention in the domain of Earth science⁷. Accordingly, significant efforts have been devoted to developing data assimilation (DA) techniques that effectively integrate existing atmospheric knowledge, often in the form of forecasting models, with observations to achieve the best estimation of the initial fields^8,9,10,11.

With notable advancements in DA methods and the proliferation of observations, substantial improvements have been achieved in the accuracy of initial fields^{12,13,14,15,16}. In particular, the four-dimensional variational (4DVar) method is widely acknowledged as one of the most advanced DA techniques, producing dynamically and physically constrained analysis fields¹⁷. It implicitly implements the flow-dependent background error covariance matrix and propagates information within assimilation windows using the tangent linear model (TLM)¹⁸ and the adjoint model (ADM)¹⁹. This method has been extensively adopted in most leading operational NWP centers. Furthermore, the current generation of initial fields for DL-based weather forecasting models also frequently depends on the analysis field produced by the conventional 4DVar approach^2,3,6. However, the background error covariance statistics in 4DVar, along with the development of the ADM, require highly trained experts to innovate better programs and algorithms. This process can be time-consuming and expensive. Additionally, the iterative minimization of the 4DVar cost function entails computational costs typically two to three orders of magnitude higher than those of a single model forecast¹¹. Thus, relying on initial fields derived from the traditional 4DVar method constrains the real-time applicability of current DL-based weather forecasting models for multivariate 3D weather systems.
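
For reference, the 4DVar cost function mentioned above is commonly written in the strong-constraint textbook form below. This is a sketch of the standard formulation, not necessarily the exact one used in this work. The symbols are assumptions introduced for illustration: $\mathbf{x}^b$ is the background field, $\mathbf{B}$ the background error covariance matrix, $\mathbf{y}_k$ the observations at time $k$ within the assimilation window, $\mathbf{R}_k$ the observation error covariance, $\mathcal{H}_k$ the observation operator, and $\mathcal{M}_{0\rightarrow k}$ the forecast model that propagates the initial state to time $k$.

```latex
J(\mathbf{x}_0) =
  \frac{1}{2}\,(\mathbf{x}_0 - \mathbf{x}^b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}_0 - \mathbf{x}^b)
  + \frac{1}{2}\sum_{k=0}^{K}
    \bigl(\mathcal{H}_k(\mathcal{M}_{0\rightarrow k}(\mathbf{x}_0)) - \mathbf{y}_k\bigr)^{\mathrm{T}}
    \mathbf{R}_k^{-1}
    \bigl(\mathcal{H}_k(\mathcal{M}_{0\rightarrow k}(\mathbf{x}_0)) - \mathbf{y}_k\bigr)

\nabla_{\mathbf{x}_0} J =
  \mathbf{B}^{-1}(\mathbf{x}_0 - \mathbf{x}^b)
  + \sum_{k=0}^{K}
    \mathbf{M}_{0\rightarrow k}^{\mathrm{T}}\,\mathbf{H}_k^{\mathrm{T}}\,\mathbf{R}_k^{-1}
    \bigl(\mathcal{H}_k(\mathcal{M}_{0\rightarrow k}(\mathbf{x}_0)) - \mathbf{y}_k\bigr)
```

Here $\mathbf{M}_{0\rightarrow k}$ and $\mathbf{H}_k$ denote the tangent linear operators, and $\mathbf{M}_{0\rightarrow k}^{\mathrm{T}}$ is the adjoint model. The gradient makes the two costs discussed above visible: the first term needs $\mathbf{B}^{-1}$, and the second needs the adjoint, evaluated at every iteration of the minimization.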

Fortunately, the conceptual similarity between DA and DL^20,21, as well as the increasing use of DL techniques in the atmospheric and oceanic domains^{3,22,23,24,25}, has prompted exploration of the application of DL methods for DA²⁶. Unlike conventional DA approaches, the “pure” data-driven DL-based DA models use neurons to automatically establish implicit mappings from input background fields and observations to target analysis or reanalysis fields. This accelerates the DA process significantly^27,28,29. For instance, Pawar et al.²⁷ proposed a long short-term memory (LSTM) model to estimate the nudging term and tested it on the Lorenz96 physical model; the learned term not only guides the state trajectories toward the observations but also acts as a stabilizer. Additionally, Wu et al.²⁸ introduced a fast DA (FDA) method that replaces the DA process with a trained neural network that uses 4DVar results as target outputs. When tested on the Lorenz63 physical model, FDA demonstrated superior computational performance compared to 4DVar, while maintaining similar quality. However, these methods do not incorporate the underlying physics^27,28,29 and fail to adequately learn the propagation of observational information across variables. Therefore, when applied to complex multivariate three-dimensional (3D) weather systems, these methods are theoretically unable to efficiently correct other variables using observed variables.

On the other hand, an approach worth mentioning involves the use of DL-based surrogate models to replace the forward model or the ADM traditionally used in DA^{11,23,30,31,32,33,34,35}. One prominent illustration of this methodology is the 4DVarNet model²³, which constitutes a neural and differentiable implementation of the 4DVar cost function as well as a gradient-based neural solver of the cost function. The trained model has demonstrated the capacity to expedite the 4DVar method with fewer gradient-based iterations and has exhibited effective performance in reconstructing the dynamics of the Lorenz96 system, as well as the sea surface height (SSH)^23,34,35. However, using a neural network that extracts only the gradient information makes it challenging to integrate physical constraints with background field information. Additionally, the ConvLSTM model used in 4DVarNet has difficulty learning long-range dependencies in data owing to its non-parallelizability and vanishing gradients²⁵. Moreover, the autoregressive strategy leads to high GPU memory costs during training and inference. These inherent limitations hamper the effectiveness of recent methods in real-time estimation of multivariate 3D weather states and their applicability to real-world scenarios, because of the excessive computing resource requirements and the limited representation of intrinsic relationships within the multivariate 3D weather system. Therefore, substantial advancements in multivariate 3D weather state estimation are necessary to enable the use of DL-based DA methods in real-world initial field estimation. These advancements should aim to comprehensively capture correlations within weather systems while reducing the demand for computational resources.

In response to the above requirements, we present 4DVarFormer, a variational-constrained neural network designed to estimate accurate multivariate 3D initial fields in real time for the pre-trained medium-range forecasting model FourCastNet². FourCastNet is the first DL-based weather forecasting model that can achieve performance comparable to that of the high-resolution operational NWP model. If 4DVarFormer proves effective for FourCastNet, this suggests the possibility of generalization to other DL-based weather forecasting models. Specifically, 4DVarFormer serves as a non-iterative solver of the 4DVar cost function by combining an attention-based neural network with the 4DVar physical constraint. This architecture enables capturing interactions between the kinetic information within the assimilation window and the background field, thereby enhancing its capacity to utilize observed variables to correct other variables. Notably, 4DVarFormer eliminates the necessity for background error covariance statistics and the development of complex adjoint models. Our observing system simulation experiments (OSSEs)^1,35,36,37 demonstrate the superior performance of 4DVarFormer over 4DVarNet and ViT in terms of initial field estimation. 4DVarFormer can estimate multivariate 3D initial fields in the East China region within 0.37 s, effectively driving FourCastNet to generate medium-range forecasts comparable to those using ERA5 as initial fields. Furthermore, our model skillfully propagates information from 10-m wind field observations to upper-air geopotential fields and from upper-air temperature observations to upper-air relative humidity, indicating its capacity to accurately depict the intrinsic correlations within the multivariate 3D weather system. Finally, the sensitivity experiments illustrate 4DVarFormer’s effectiveness in adapting to variations in background fields, thereby highlighting its potential for real-world weather DA.

Overall, our contribution to this work can be summarized as follows:

We propose a 4DVar-constrained model, 4DVarFormer, for accurate real-time estimation of multivariate 3D weather states.
The medium-range forecast initialized by 4DVarFormer demonstrates comparable performance to the forecast driven by ERA5 while outperforming those initialized by 4DVarNet and ViT.
We assess the model’s ability to capture correlations between variables in the atmospheric system through a series of sensitivity experiments, thus enhancing the model’s interpretability.

Results

The performance of 4DVarFormer was evaluated using data from the East China region spanning the years 2021 and 2022. Our experiments focused on two aspects of 4DVarFormer’s performance. The first is the accuracy and stability of the 30-day DA cycle, and the second is the accuracy of the 7-day medium-range forecasts initialized using 4DVarFormer. These experiments were conducted at 5-day intervals, yielding a total of 139 sets. For details of the experimental methodology, please see Fig. S4. To determine how 4DVarFormer’s performance compares to other DL-based DA methods, we focus on 4DVarNet^23,34,35 and ViT³⁸. 4DVarNet is a strong DL-based baseline that has achieved superior performance when reconstructing the SSH. Further, ViT and its variants^2,3,6,39 have demonstrated remarkable performance in various computer vision tasks and complex dynamic modeling. Comparison with these two baseline models therefore provides a meaningful test of the effectiveness of 4DVarFormer.

Assessments of assimilation metrics for the 4DVarFormer model

We conducted a series of 30-day DA cycle experiments, where we utilized 3-day forecast background fields initialized by ERA5. These background fields were combined with noisy observations of surface wind and upper-air temperatures at the 50, 500, 850, and 1000 hPa pressure levels as the models’ inputs. In recent studies, ViT-based models such as FourCastNet², PanGu³, and FuXi⁶ have demonstrated remarkable effectiveness in capturing the intricate dynamics of the high-resolution weather system. This indicates the potential of ViT-based models for modeling the multivariate correlations within the weather system. Consequently, we conducted a comparison between our 4DVarFormer, the original ViT model³⁸, and the SOTA DL-based DA method called 4DVarNet. Both 4DVarFormer and 4DVarNet assimilate 6-hourly observations within a 24-h DA window, while ViT assimilates only the observations at the initial moment.

Figure 1 illustrates the time series of the averaged latitude-weighted root mean square error (RMSE) and anomaly correlation coefficient (ACC) for the 4DVarFormer, 4DVarNet, and ViT models, considering 4 surface variables and 20 upper-air variables (the details of the RMSE and ACC metrics can be found in the Supplementary Text). Notably, 4DVarFormer consistently outperforms both 4DVarNet and ViT for all tested variables. In terms of RMSE (lower values indicate better performance), 4DVarFormer exhibits an overall 34.6% improvement compared to 4DVarNet and a 43.2% improvement compared to ViT. In terms of ACC (higher values indicate better performance), 4DVarFormer exhibits an overall 10.9% improvement compared to 4DVarNet and a 20.2% improvement compared to ViT. The advantage of 4DVarFormer is even more pronounced for variables like Z500 and R500.
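
The exact metric definitions are given in the Supplementary Text, which is not reproduced here. As a rough illustration only, the sketch below implements one widely used form of the latitude-weighted RMSE and ACC on a regular latitude-longitude grid. The function names, the cosine-latitude weighting normalized to unit mean, and the `clim` climatology argument are all assumptions made for this example.

```python
import numpy as np

def latitude_weights(lat_deg):
    """Cosine-latitude weights normalized to unit mean (assumed convention)."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    return w / w.mean()

def lat_weighted_rmse(pred, truth, lat_deg):
    """Latitude-weighted RMSE for fields of shape (n_lat, n_lon)."""
    w = latitude_weights(lat_deg)[:, None]  # broadcast over longitude
    return float(np.sqrt(np.mean(w * (pred - truth) ** 2)))

def lat_weighted_acc(pred, truth, clim, lat_deg):
    """Latitude-weighted anomaly correlation with respect to a climatology."""
    w = latitude_weights(lat_deg)[:, None]
    pred_anom = pred - clim
    truth_anom = truth - clim
    numerator = np.sum(w * pred_anom * truth_anom)
    denominator = np.sqrt(np.sum(w * pred_anom ** 2) * np.sum(w * truth_anom ** 2))
    return float(numerator / denominator)
```

With these definitions, a perfect analysis gives an RMSE of 0 and an ACC of 1, which matches the "lower is better" and "higher is better" readings used in the text.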

Fig. 1: Comparison of the average latitude-weighted RMSE (first and second rows) and ACC (third and fourth rows) of the 139 sets of 30-day assimilation cycles using 4DVarNet (blue lines), ViT (yellow lines), and 4DVarFormer (red lines) for four surface variables (MSLP, T2M, U10, and V10) and six upper-air variables (Z500, T850, T500, R500, U500, and V500), using testing data from 2021 to 2022.

To better understand why 4DVarFormer achieves enhanced assimilation performance, we present the assimilation outcomes of all neural networks in Fig. 2. The analysis focused on two upper-air variables, Z500 and T850, as well as two surface variables, T2M and 10-m wind speed, comparing the outcomes with the ERA5 ground truth. The depicted instances are all set at 00:00 UTC on 19 July 2021, when there is a tropical cyclone in the East China region. The results obtained from 4DVarFormer closely approximate the ground truth and demonstrate the best reconstruction of the tropical cyclone’s position and shape. All DL-based models exhibit varying degrees of mesh artifacts, indicating that both ConvLSTM- and attention-based models tend to predict average values within the kernel or patch. This tendency to converge on average values is a common trait of DL-based vision algorithms³. In contrast, ERA5 appears less smooth due to its calculation of an ensemble average estimated value at each grid cell by solving a 4DVar system composed of a complex partial differential equation. Notably, solving the numerical 4DVar cost function for a single member on a high-performance computer typically takes about 25 min⁴⁰. In contrast, 4DVarFormer requires only 0.37 s, accelerating the 4DVar process by a factor of approximately 4000.
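
The quoted speed-up can be checked with simple arithmetic from the two timings stated above (about 25 min for one traditional 4DVar member, 0.37 s for 4DVarFormer). The variable names below are illustrative only.

```python
# Timings quoted in the text; the 25 min figure is itself approximate.
traditional_4dvar_seconds = 25 * 60   # 25 min expressed in seconds
fourdvarformer_seconds = 0.37

speedup = traditional_4dvar_seconds / fourdvarformer_seconds
print(f"speed-up factor: {speedup:.0f}")  # about 4054, i.e. roughly 4000 times
```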

**Fig. 2: Visualization of assimilation results.**

In Figs. S5 and S6, we provide further evaluations of 4DVarFormer with the error profiles and the power spectral densities.

Assessments of 7-day forecast metrics driven by the 4DVarFormer model

Our research focuses on leveraging neural networks to acquire accurate multivariate 3D initial fields, thereby enabling accurate medium-range weather forecasts. To evaluate this ability, the analysis fields estimated by each model on every fifth day at 00:00 UTC are utilized as the initial fields for the pre-trained FourCastNet to generate forecasts. We then compare the resulting 6-hourly forecasts over 7 days against forecasts generated using ERA5 as the initial field.

The overall forecast results for 2021 and 2022 are depicted in Fig. 3. Across all tested variables, encompassing upper-air and surface variables, 4DVarFormer demonstrates consistently superior accuracy compared to both 4DVarNet and ViT. Specifically, in terms of RMSE, 4DVarFormer exhibits overall 10.7% lower values than 4DVarNet and 13.6% lower values than ViT. In terms of ACC, 4DVarFormer exhibits overall 8.5% higher values than 4DVarNet and 11.8% higher values than ViT. This advantage holds across all lead times, ranging from 6 to 168 h (i.e., 7 days), with some variables such as T2M and R500 showing particularly significant advantages. Importantly, the accuracy of the forecasts initialized by 4DVarFormer is comparable to that of the forecasts driven by ERA5 for all variables. This heightened accuracy and alignment can be attributed to the capability of 4DVarFormer to utilize the physical constraints inherent in 4DVar.

Fig. 3: Comparison of the average latitude-weighted RMSE (first and second rows) and ACC (third and fourth rows) of the 139 sets of 7-day medium-range forecasts using ERA5 (black lines) and outputs of 4DVarNet (blue lines), ViT (yellow lines), and 4DVarFormer (red lines) as initial fields.

Sensitivity experiments performed for interpretability

The skillful initial field estimation achieved by the 4DVarFormer model underscores the pivotal role played by multivariate coupling dynamics embodied within the model’s architecture. One of the advantages of the 4DVarFormer model is its ability to capture the relationship between temperature and relative humidity. This capability is facilitated by two key factors: the gradient of the 4DVar cost function, constructed by the pre-trained FourCastNet, and the robust representation of long-range correlations through the attention mechanism. In this subsection, we conduct sensitivity experiments to elucidate the fundamental physical constraints in 4DVarFormer. The control experiment discussed in the “Assessments of assimilation metrics for the 4DVarFormer model” and “Assessments of 7-day forecast metrics driven by the 4DVarFormer model” sections assimilated observations of 10-m wind and upper-air temperature. Here, we undertake several sensitivity experiments using the testing set to assess the model’s proficiency in assimilating different observational variables and their impacts. Specifically, one experiment assimilates only the 10-m wind observations, while another assimilates only the upper-air temperature observations. The wind assimilation sensitivity experiment explores the effects on the upper-air geopotential fields, whereas the temperature sensitivity experiment examines the interplay between temperature and relative humidity. All sensitivity analyses for the assimilation experiments use the aforementioned trained model without fine-tuning. We also examine the influence of background fields with different forecast lead times on the assimilation outcomes. The findings demonstrate the stability of our model when using the background fields of 3–5 days’ forecasts as inputs, thereby ensuring consistent forecasting proficiency. To further investigate the impact of the 4DVar physical constraints on the assimilation performance of the 4DVarFormer model and the accuracy of the initialized forecasts, we perform a set of experiments that vary the length of the DA window (DAW). Furthermore, we compare the tropical cyclone track forecasts of two super typhoons in 2021 produced by FourCastNet initialized with 4DVarFormer and with ERA5, in order to explore the flow dependence implicit in the 4DVarFormer model.

Assimilating the surface wind field effects on the upper-air geopotential

Surface wind significantly influences the atmospheric system, particularly through its relationship with pressure, which is a well-established feature of the atmosphere^41,42,43. To investigate the influence of surface wind observations on 4DVarFormer’s estimation of upper-air geopotential states, we carried out an experiment where only 10-m wind observations were assimilated. This was done to discern the specific impact of surface wind observations on the upper-air pressure, considering the high correlation between pressure and geopotential. Figure 4a presents the average RMSEs and ACCs of Z500, Z850, U10, and V10 for 7-day forecasts initiated from the estimated initial fields. While the forecast skill is somewhat diminished compared to the control experiment (section “Assessments of 7-day forecast metrics driven by the 4DVarFormer model”) when only 10-m wind observations are assimilated, the overall performance remains notably superior to that of 4DVarNet and ViT.

**Fig. 4: Visualization of examples when only assimilating 10-m wind observations.**

Figure 4b displays the background error (x^b − x^t) in the 10-m wind field and the upper-air geopotential fields at 00:00 UTC on 19 July 2021, when there is a tropical cyclone in the East China region. The FourCastNet forecasts reveal a negative error in the north and a positive error in the south for both Z500 and Z850, exhibiting a symmetrical structure on the north and south sides of the tropical cyclone. Furthermore, U10 and V10 exhibit negative errors in the tropical cyclone eye and positive errors around it. This suggests that FourCastNet is underestimating the strength of the tropical cyclone. Figure 4c depicts the analysis increments (x^a − x^b) resulting from the assimilation of 10-m wind observations. The 10-m wind field increments from all three models show clear spatial correspondence with the background error and an accurate correction amount. Conversely, the increments of Z500 and Z850 show that the corrections made by 4DVarNet and ViT are limited. Specifically, the analysis increments from 4DVarNet lack symmetry in the central structure of the tropical cyclone in both the northern and southern regions, and contain relatively small corrections. This limitation hinders the complete correction of errors in the background field. On the other hand, ViT exhibits a significant positive bias throughout the entire East China region, with increments that poorly correspond to the background error. This result suggests that ViT fails to accurately propagate information from the 10-m wind field to the upper-air geopotential fields and does not adequately capture the wind-pressure relationship. In contrast, the analysis increments obtained by 4DVarFormer exhibit a strong correlation with the 10-m wind field’s increments and closely correspond to the background errors, indicating that 4DVarFormer is capable of reasonably correcting the upper-air geopotential fields by assimilating the 10-m wind observations. This ability showcases the model’s capacity to capture the essential wind-pressure relationships within the atmospheric system.
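
The two diagnostics used in this comparison can be sketched in a few lines. The snippet below computes the background error (x^b − x^t) and the analysis increment (x^a − x^b) and measures how well the two patterns correspond. The helper names and the use of a centered pattern correlation are assumptions for illustration; the text compares the fields visually rather than through a stated score.

```python
import numpy as np

def background_error(x_b, x_t):
    """Background minus truth, x^b - x^t."""
    return x_b - x_t

def analysis_increment(x_a, x_b):
    """Analysis minus background, x^a - x^b."""
    return x_a - x_b

def pattern_correlation(a, b):
    """Centered Pearson correlation between two fields of the same shape."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```

A perfect analysis (x^a = x^t) has an increment equal to the negative of the background error, so a well-behaved increment should be strongly anti-correlated with the background error and of similar magnitude.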

In addition to the results already reported above, we also visualize the results for the initial fields assimilating only the 10-m wind observation in Fig. 5. It can be seen that both 4DVarFormer and ViT can correctly capture the location of the tropical cyclone, while 4DVarNet exhibits two tropical cyclone centers. This may be caused by the excessive weighting of the background field in the 4DVar cost function optimized by 4DVarNet. For the upper-air geopotential fields and mean sea level pressure, only 4DVarFormer can describe the spatial structures well. This indicates that 4DVarFormer can effectively extract the information from the observed data of the 10-m wind field to correct the geopotential fields and pressure fields. This result corresponds to the analysis increments in Fig. 4c and again shows that 4DVarFormer learned the wind-pressure relationship.

**Fig. 5: Visualization of assimilation results.**

In Fig. S7, we provide further evaluations of 4DVarFormer with the error profiles when only the 10-m wind observations are assimilated.

Assimilating the upper-air temperature effects on the relative humidity

The relative humidity is known to be dependent on air temperature^44,45. Therefore, the assimilation of upper-air temperature observations is expected to correct relative humidity errors. Here, we further examine the effects of assimilating upper-air temperature observations on the estimation and forecast of upper-air relative humidity. Figure 6a presents the RMSEs and ACCs of R500, R850, T500, and T850. While the forecast skill of the temperature and relative humidity fields is somewhat diminished in the forecasts driven solely by the assimilation of upper-air temperature observations in 4DVarFormer compared to the control experiment (section “Assessments of 7-day forecast metrics driven by the 4DVarFormer model”), the overall performance remains notably superior to that of 4DVarNet and ViT.
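
To make the temperature dependence concrete, the sketch below uses the Bolton (1980) Magnus-type approximation for saturation vapor pressure over liquid water, which is a standard meteorological formula and not part of the 4DVarFormer method. The function names are illustrative. At a fixed water vapor content, warming the air raises the saturation vapor pressure and therefore lowers the relative humidity.

```python
import math

def saturation_vapor_pressure_hpa(t_celsius):
    """Bolton (1980) approximation over liquid water, in hPa."""
    return 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))

def relative_humidity_percent(vapor_pressure_hpa, t_celsius):
    """Relative humidity for a given actual vapor pressure and temperature."""
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(t_celsius)

# Same moisture content, two temperatures: RH drops as the air warms.
rh_cold = relative_humidity_percent(6.112, 0.0)
rh_warm = relative_humidity_percent(6.112, 5.0)
```

This inverse dependence is one reason a temperature correction can be expected to carry information about relative humidity errors.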

**Fig. 6: Visualization of examples when only assimilating upper-air temperature observations.**

Figure 6b depicts the background error (x^b − x^t) in the upper-air temperature and relative humidity fields at a specific moment in time, while Fig. 6c illustrates the analysis increments (x^a − x^b) resulting from assimilating the upper-air temperature observations by each neural network. The temperature and relative humidity increments from the three models differ markedly. In particular, the increments of the ViT model exhibit significant anomalies in the northwest over land, suggesting that the model does not effectively learn the information present in temperature observations during its training phase. Thus, it underestimates the influence of temperature features and fails to separate the information from the wind and temperature observations. While 4DVarNet shows some improvement, its increments are still clearly underestimated compared to those of 4DVarFormer. In contrast, the analysis increments obtained by 4DVarFormer present a more reasonable distribution, with the relative humidity increments showing a somewhat negative correlation with the temperature increments. This alignment is more consistent with the relationship between the two error distributions in the background field, suggesting that 4DVarFormer can effectively capture the fundamental temperature-humidity relationship in the atmospheric system and correct the corresponding relative humidity through the assimilation of upper-air temperature observations.

In addition to the findings already presented above, we have visualized the results of assimilating solely the upper-air temperature observations in Fig. 7. It is evident from the figure that only 4DVarFormer accurately captures the spatial structure of the relative humidity. On the other hand, both 4DVarNet and ViT do not significantly correct the background relative humidity. This result suggests that only 4DVarFormer possesses the ability to transfer information from temperature observations to humidity, while neither 4DVarNet nor ViT exhibits this capability. These results align once again to highlight that 4DVarFormer effectively learns the temperature-humidity relationship.

**Fig. 7: Visualization of assimilation results.**

In Fig. S8, we provide further evaluations of 4DVarFormer with the error profiles when only the upper-air temperature observations are assimilated.

Sensitivity experiments using different forecast timing background fields

We additionally tested the performance of 4DVarFormer with background fields taken from forecasts of different lead times. Figure 8 shows the performance of the 7-day forecasts starting from assimilation results using the background field with 3-day (red line), 4-day (blue line), and 5-day (yellow line) lead times, respectively. 4DVarFormer shows similar performance on the three background fields, with both RMSEs and ACCs very close to those of forecasts started from ERA5. Only the ACC of T2M fluctuates somewhat, but it remains close to the ACC obtained from ERA5, which also indicates that the model performs well.

Fig. 8: Comparison of the average latitude-weighted RMSE (first and second rows) and ACC (third and fourth rows) of the 139 sets of 7-day medium-range forecasts using ERA5 and outputs of 4DVarFormer (red lines for using the 3-day forecasting background field as 4DVarFormer’s input, blue lines for the 4-day forecasting background field as the input, and yellow lines for the 5-day forecasting background field as the input).

In Fig. S9, we provide further evaluations of 4DVarFormer with the error profiles when background fields with different lead times are used. Interestingly, as illustrated in Figs. 8 and S9, the forecast bias resulting from the assimilation of the 5-day background field is the largest, yet its metrics demonstrate the least oscillation amplitudes throughout the forecast horizon. This phenomenon may be attributed to the discontinuities in the analysis increments produced by 4DVarFormer between patches, whereas forecasts generated by FourCastNet tend to exhibit smoother transitions over extended forecast durations. Consequently, utilizing a background field with a 5-day forecast mitigates the discontinuities in the analytical field to some extent, thereby diminishing the impact of instability on subsequent forecasts.

Our experiments aimed to assess the performance of the 4DVarFormer model in a single assimilation using background fields of varying quality (i.e., different forecast lead times), in order to evaluate the model’s stability. As shown in Fig. S1, forecast errors increase as the lead time extends. This leads to larger biases when background fields with longer lead times are used for assimilation (see Fig. S9). This indicates that our model not only considers the impact of observations on assimilation but also accounts for the influence of background field quality. Hence, the error level of the background field in the training set influences the performance of our model. In future research, we will address the strong dependence of neural networks on the background field distribution of the training set, with the aim of developing more robust assimilation models.

Sensitivity experiments using different DAWs

In this section, we aim to further explore the role of 4DVar physical constraints by examining the impact of the analysis fields generated by 4DVarFormer, using different DAWs, on forecast accuracy. In our experiments, the assimilation window is left-closed and right-open, i.e., the 6-h DAW indicates that only observations at the initial moment are assimilated. The 12-h DAW indicates that only observations at the initial moment and 6 h after the initial moment are assimilated. We conducted experiments using DAWs of 6 h (without the 4DVar constraint), 12 h, 18 h, and 24 h to assess the effect of incorporating physical constraints of varying time lengths through the gradient of the 4DVar cost function. To begin with, we conducted a set of 30-day DA cycle experiments, and the results are presented in Fig. 9. As the length of the DAW increases, the analysis error decreases, indicating that including more temporal observations has a positive impact on the performance of the 4DVarFormer. Furthermore, we carried out a 7-day medium-range forecast experiment, and the results are shown in Fig. 10. The forecasts generated from analysis fields using longer DAWs exhibit lower RMSEs and higher ACCs. This demonstrates that incorporating information through the 4DVar constraints contributes positively to forecast accuracy.
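
The left-closed, right-open window convention described above can be made explicit with a small helper. The function name and the 6-hourly default are assumptions for illustration; the 6-hourly observation interval is taken from the control experiment.

```python
def observation_offsets(daw_hours, step_hours=6):
    """Hours after the initial moment at which observations are assimilated.

    The window is [0, daw_hours): the start is included, the end is not.
    """
    if daw_hours <= 0 or daw_hours % step_hours != 0:
        raise ValueError("DAW length must be a positive multiple of the step")
    return list(range(0, daw_hours, step_hours))

for daw in (6, 12, 18, 24):
    print(daw, observation_offsets(daw))
# 6 [0]
# 12 [0, 6]
# 18 [0, 6, 12]
# 24 [0, 6, 12, 18]
```

Under this convention the 6-h DAW contains a single observation time, which is why it corresponds to the case without the 4DVar constraint.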

Fig. 9: Comparison of the average latitude-weighted RMSE (first and second rows) and ACC (third and fourth rows) of the 139 sets of 30-day assimilation cycles using DAWs of 6 h (blue lines), 12 h (yellow lines), 18 h (green lines), and 24 h (red lines) for four surface variables (MSLP, T2M, U10, and V10) and six upper-air variables (Z500, T850, T500, R500, U500, and V500), using testing data from 2021 to 2022.

Fig. 10: Comparison of the average latitude-weighted RMSE (first and second rows) and ACC (third and fourth rows) of the 139 sets of 7-day medium-range forecasts initialized with ERA5 and with the outputs of 4DVarFormer (red lines for the 6-h DAW, blue lines for the 12-h DAW, green lines for the 18-h DAW, and red lines for the 24-h DAW).

Tropical cyclone forecasting experiments

This section demonstrates tropical cyclone track forecasting using initial fields generated by 4DVarFormer. The purpose is to evaluate the adaptability of 4DVarFormer to extreme weather conditions and its ability to capture flow-dependent information. We located the cyclone eye at the position of the minimum MSLP. As depicted in Fig. 11, the FourCastNet model employed in this study has limited performance, which makes it difficult to predict the paths of tropical cyclones accurately even when ERA5 is used as input. However, for strong tropical cyclones, the forecasts initialized with fields generated by 4DVarFormer followed the actual tracks more closely. For Super Typhoon Surigae, forecasts initialized by ViT and 4DVarNet accurately predicted the track within a 4-day period. However, only the forecast initialized by 4DVarFormer captured the change in the cyclone's direction beyond 4 days. This distinctive predictive capability was further confirmed in the case of Super Typhoon Chanthu. These successes can be attributed, at least in part, to the implicit flow-dependent characteristics of the 4DVarFormer model, which enhance its ability to initialize forecasts of extreme weather events.
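Locating the cyclone eye as the MSLP minimum can be sketched with NumPy as follows. The function name `locate_cyclone_eye`, the grid coordinates, and the synthetic pressure field are assumptions made for illustration; an operational tracker would usually also restrict the search to a neighbourhood of the previous position, a refinement the text does not describe.

```python
import numpy as np


def locate_cyclone_eye(mslp, lats, lons):
    """Return (lat, lon) of the minimum MSLP on a 2D latitude-longitude grid."""
    # argmin works on the flattened array; unravel_index recovers (row, col).
    i, j = np.unravel_index(np.argmin(mslp), mslp.shape)
    return float(lats[i]), float(lons[j])


# Synthetic 160 x 160 field (made-up values, not data from the study).
lats = 10.0 + 0.25 * np.arange(160)
lons = 100.0 + 0.25 * np.arange(160)
lon2d, lat2d = np.meshgrid(lons, lats)
# Gaussian pressure depression centred at 20N, 125E on a 1010 hPa background.
mslp = 1010.0 - 60.0 * np.exp(-((lat2d - 20.0) ** 2 + (lon2d - 125.0) ** 2) / 8.0)

eye = locate_cyclone_eye(mslp, lats, lons)
print(eye)
```

Applying the same function to each forecast step gives a sequence of positions that can be compared with the best-track data.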

**Fig. 11: Comparison of the tropical cyclone forecasting initialized by 4DVarFormer and ERA5.**

Visualization of the attention weights

For the Transformer model, the diversity of features plays a crucial role in its ability to capture attention effectively⁴⁶. We therefore investigated whether the model learns a wide range of diverse features by examining the rank of its attention maps. As an example, we extracted the attention weights from the last block of the 4DVarFormer model and visualized the resulting attention maps in Figs. 12 and 13. Both the cross-attention and the self-attention of 4DVarFormer exhibit attention maps with nearly full rank, which indicates that the model learns a diverse set of features. Additionally, the attention maps obtained from different heads of the multi-head attention are all distinct, suggesting that each head captures different features. For example, in the cross-attention, the attention map of the fifth head displays localized features. This implies that the fifth head focuses more on the relationship between each embedding of the background field and the embeddings of gradients at spatially adjacent locations.
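One common way to quantify whether an attention map is "nearly full rank" is to compute its numerical rank. The sketch below uses NumPy on synthetic matrices; it is not the authors' analysis code, and the helper name `attention_rank` and both test matrices are assumptions chosen to contrast a diverse map with a degenerate one.

```python
import numpy as np


def attention_rank(attn):
    """Numerical rank of a square attention map (each row holds one query's weights)."""
    return int(np.linalg.matrix_rank(attn))


rng = np.random.default_rng(0)
n_tokens = 400  # size of the attention maps shown in Figs. 12 and 13

# A diverse map: row-wise softmax of random scores.
scores = rng.normal(size=(n_tokens, n_tokens))
diverse = np.exp(scores - scores.max(axis=1, keepdims=True))
diverse /= diverse.sum(axis=1, keepdims=True)

# A degenerate map: every query attends uniformly, so all rows are identical.
uniform = np.full((n_tokens, n_tokens), 1.0 / n_tokens)

print(attention_rank(diverse), attention_rank(uniform))
```

A map whose rows are all alike collapses to rank one, whereas a map in which different queries attend to different keys keeps a rank close to the number of tokens.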

**Fig. 12: Attention map of the cross attention module (400 × 400) from the last block of 4DVarFormer.**

**Fig. 13: Attention map of the self-attention module (400 × 400) from the last block of 4DVarFormer.**

Discussion

DL-based weather forecasting models have made substantial progress in medium-range weather forecasting. Nonetheless, these models still depend on the outputs of the conventional 4DVar method for initialization, and the NWP-based 4DVar method entails significant computational costs, impeding the real-time operationalization of DL-based weather forecasting. Recent advancements in DL-based DA models present a promising approach for nonlinear modeling and rapid assimilation. Nevertheless, current DL-based DA models struggle to estimate multivariate 3D initial fields accurately because they lack physical constraints and cannot capture multivariate kinetic relationships.

Recently, a new direction has emerged to enable real-time 4DVar by using DL-based forecasting models in place of the forward model and the ADM. In this setting, 4DVarNet has been successfully applied to reconstruct Lorenz63/96 systems and SSH. However, the non-parallelizability and vanishing gradients of 4DVarNet, together with its expensive autoregressive strategy, make it hard to extend directly to real-world initial field estimation. This study introduces the 4DVarFormer model, which is built upon the constraint of 4DVar and incorporates the advancements of the ViT model in extracting global correlations. Unlike traditional 4DVar, 4DVarFormer does not need to estimate the complex background error covariance matrix or develop the complex ADM, which greatly reduces the difficulty of developing assimilation systems. The 4DVar-based physical constraint in 4DVarFormer improves its capability to capture multivariate relationships by establishing connections between observed and unobserved variables using the auto-differentiation of the forecast model and the attention mechanism. The effectiveness and superiority of the 4DVarFormer model in estimating multivariate 3D states are demonstrated in this work, highlighting its ability to represent the interconnected physical relationships within the atmospheric system.

In addition to integrating DL-based models with 4DVar, several studies combine DL-based forecasting models with the Ensemble Kalman Filter (EnKF) for ensemble assimilation^47,48,49. These studies leverage the rapid inference capability of the DL-based model to generate a large number of ensemble members. However, few related studies have applied these methods to the assimilation of 3D multivariate fields. This may be because research on ensemble forecasting techniques for DL-based forecasting is still in its early stages. Furthermore, generative adversarial networks (GANs) have also been employed to develop accurate DA models^50,51. Nevertheless, owing to the instability of GAN training, further exploration is required to train a stable and reliable assimilation model for multivariate 3D complex weather systems. The 4DVarFormer proposed in this paper has a simple structure and training process. It is nonetheless promising to explore combining ensemble methods and generative models with 4DVarFormer, which would leverage the advantages of these methods and further enhance the practical utility of DL-based assimilation.

Furthermore, the 4DVarFormer model demonstrates the capability to estimate multivariate 3D initial fields in the East China region, effectively driving the pre-trained FourCastNet to generate medium-range forecasts comparable to those using ERA5 as initial fields. In addition, the black-box nature of DL-based models presents a notable challenge to their interpretability in real-time DA. While prior studies have mainly reported the results of DL-based DA, there has been limited analysis of the mechanisms behind multivariate 3D weather state estimation in a DL context. To address this gap, we designed and executed sensitivity experiments that demonstrate the ability of 4DVarFormer to capture the wind-pressure relationship and the temperature-humidity relationship. Such sensitivity experiments are critical for advancing the understanding of the kinetic soundness of a DL-based DA model. We also conducted a sensitivity experiment to evaluate the impact of different DAWs on the assimilation and forecasting performance of 4DVarFormer. The results demonstrate the positive influence of the 4DVar physical constraints. Furthermore, we assessed the performance of 4DVarFormer in the tropical cyclone track forecasting task. The results of two super typhoon tracking tests show that the tropical cyclone tracks initialized with 4DVarFormer exhibited improvements compared to those initialized with ERA5. This suggests that 4DVarFormer is capable of learning flow-dependent characteristics. Additionally, we visualized the attention maps in 4DVarFormer and observed that they captured a wide range of diverse features. It is crucial to note that 4DVarFormer is a model framework that primarily utilizes the Transformer model to extract gradient information from the background field and the 4DVar cost function. Importantly, this framework remains theoretically independent of the architecture of the forecasting model.

Further enhancements to the configuration of 4DVarFormer, as presented in this study, are warranted. Firstly, the initial fields produced by the DL-based model demonstrate some smoothing and mesh checkerboard issues, likely stemming from the model’s limited identifiable resolution compared to the data resolution. To mitigate this concern, exploring the use of GANs^52,53 or diffusion models^54,55 for post-processing the output results to enhance the model’s identifiable resolution is advisable. For example, the introduction of a diffusion model in the FuXi-Extreme model³⁹ has exhibited potential in enhancing forecast accuracy for extreme weather events, potentially attributed to the improved model identifiable resolution achieved through diffusion models.

Furthermore, the training of the 4DVarFormer model requires the use of a pre-trained forecast model to calculate the gradient as an input, leading to a substantial consumption of GPU memory and hindering efficient model training and scalability. A potential solution to this challenge involves developing lightweight forecast models, similar to the use of a low-resolution TLM in the inner loop of traditional 4DVar^56,57. This approach aims to optimize GPU memory utilization while ensuring the accuracy of the lightweight forecast model, thereby enabling the extension of the 4DVarFormer model to encompass additional variables, larger regions, or even global state estimation.

The effective implementation and robust performance of 3D weather state estimation using 4DVarFormer indicate its potential for practical real-world assimilation. Consequently, our promising findings contribute to future advancements in integrated end-to-end DL weather forecasting systems. Furthermore, the associated variational-constrained networks can be readily extended to other geoscience domains, including oceanography, demonstrating their promise as a valuable complement to the 4DVar technique.

Materials and methods

Datasets

ERA5, the fifth generation of the ECMWF reanalysis dataset, provides invaluable hourly data on surface and upper-air variables with a horizontal resolution of approximately 31 km. Covering the period from January 1940 to the present day, this extensive dataset is a comprehensive and enduring source of weather information⁵⁸. Produced by an ensemble 4DVar assimilation system that integrates high-quality global observations with Integrated Forecast System results, it is widely recognized as the most comprehensive and accurate reanalysis archive available. Due to its extensive coverage and accuracy, we have chosen to utilize the ERA5 reanalysis dataset as the ground truth for our work.

We have extracted a subset of the ERA5 dataset, specifically focusing on the East China region over a 13-year period. The spatial coverage spans from 100°E to 140°E in longitude and 10°N to 50°N in latitude, with a spatial resolution of 0.25° (160 × 160 latitude-longitude grid points) and a temporal resolution of 6 h. Our efforts concentrate on assimilating observations of upper-air temperature at four levels (50, 500, 850, 1000 hPa) and the 10-m wind to accurately estimate five upper-air atmospheric variables at four levels and four surface variables. The upper-air atmospheric variables include geopotential (Z), temperature (T), u component of wind (U), v component of wind (V), and relative humidity (R), while the surface variables encompass 2 m-temperature (T2M), 10 m u wind component (U10), 10 m v wind component (V10), and mean sea level pressure (MSLP). Consequently, our work involves the estimation, forecasting, and evaluation of a total of 24 variables.

We partitioned the data into training, validation, and testing sets. The training set comprises 14,600 samples, with 10 years’ worth of data from 2010 to 2019 (14,600 = 365 × 4 × 10). Similarly, the validation set consists of 1460 samples representing the year 2020 (1460 = 365 × 4 × 1), and the out-of-sample testing involves 2920 samples covering the years 2021 to 2022 (2920 = 365 × 4 × 2).
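The grid size, variable count, and sample counts quoted above follow from simple arithmetic, reproduced here as a quick check. The variable names are illustrative only; the 365-day year mirrors the counts given in the text, so leap days in 2012, 2016, and 2020 are not reflected in these totals.

```python
# Grid: 40 degrees of longitude and latitude at 0.25 degree spacing.
n_lon = int((140 - 100) / 0.25)
n_lat = int((50 - 10) / 0.25)

# Variables: 5 upper-air variables on 4 pressure levels plus 4 surface variables.
n_vars = 5 * 4 + 4

# Samples: 4 analysis times per day (6-h resolution), 365 days per year.
samples_per_year = 365 * 4
n_train = 10 * samples_per_year  # 2010 to 2019
n_val = 1 * samples_per_year     # 2020
n_test = 2 * samples_per_year    # 2021 to 2022

print(n_lat, n_lon, n_vars, n_train, n_val, n_test)
```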

The experiments used the OSSE setting^1,35,36,37 and a FourCastNet forecast model trained on the experimental data (for details, please see Supplementary Text, Table S1 and Fig. S1). To simulate the observed wind field, we first converted the U10 and V10 variables from the ERA5 reanalysis to wind speed and direction. Gaussian stochastic perturbations were then applied to the wind speed, with a standard deviation of 2 m/s for wind speeds below 20 m/s, and a standard deviation of 10% of the wind speed for wind speeds above 20 m/s⁵⁹. Subsequently, the perturbed wind field was transformed back to observations of U10 and V10. To simulate the temperature observations, we added Gaussian random perturbations to the temperature in the ERA5 reanalysis, with error standard deviations of 2 K, 0.5 K, 1.4 K, and 2.2 K for the 50, 500, 850, and 1000 hPa levels, respectively. These errors are assumed based on the retrieval error distribution described by Wang et al.⁶⁰, rounded to one decimal place.
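The observation simulation described above can be sketched with NumPy as follows. The function names, the random generator, and the use of a mathematical angle for wind direction are assumptions made for illustration; any consistent angle convention gives the same result because the conversion is inverted afterwards. The sketch does not guard against the rare case in which a perturbed speed becomes negative.

```python
import numpy as np

# Observation-error standard deviations (K) per pressure level (hPa), from the text.
TEMP_SIGMA_K = {50: 2.0, 500: 0.5, 850: 1.4, 1000: 2.2}


def perturb_wind(u10, v10, rng):
    """Add speed-dependent Gaussian noise to the 10-m wind, keeping direction fixed."""
    speed = np.hypot(u10, v10)
    direction = np.arctan2(v10, u10)  # radians, mathematical convention
    # 2 m/s below 20 m/s, 10 % of the wind speed otherwise.
    sigma = np.where(speed < 20.0, 2.0, 0.1 * speed)
    noisy_speed = speed + rng.normal(size=speed.shape) * sigma
    # Transform back to the u and v components.
    return noisy_speed * np.cos(direction), noisy_speed * np.sin(direction)


def perturb_temperature(temp, level_hpa, rng):
    """Add Gaussian noise with the level-dependent standard deviation."""
    return temp + rng.normal(scale=TEMP_SIGMA_K[level_hpa], size=temp.shape)
```

The two thresholds meet continuously at 20 m/s, where 10% of the wind speed equals 2 m/s.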

4DVarFormer: the multivariate 3D weather state estimation model

Inspired by the dynamical modeling applications using the variant ViT models^2,3,6 and considering the physical consistency in the weather system, a computationally efficient 4DVar-constrained attention-based neural network model³⁸, known as 4DVarFormer, has been established for multivariate 3D weather initial field estimation. It takes the multivariate 3D background x^b(t₀) at time t₀ and observations of 10-m wind and temperature at four pressure levels (${\bf{y}}({t}_{0}),\ldots ,{\bf{y}}({t}_{{N}_{f}})$, where N_f = 3 is the index of the last observation time, giving four observation times in the 24-h assimilation window) as inputs to generate accurate initial fields x^a(t₀). The theoretical foundation of the 4DVarFormer procedure aligns with the cost function of the 4DVar method:

$$\mathcal{J}_{4DVar}=\frac{1}{2}\left\Vert \mathbf{x}^{b}(t_{0})-\mathbf{x}(t_{0})\right\Vert_{\mathbf{B}^{-1}}^{2}+\frac{1}{2}\sum_{k=0}^{N_{f}}\left\Vert \mathbf{y}(t_{k})-\mathcal{H}_{k}\left(\mathcal{M}_{t_{0}\to t_{k}}(\mathbf{x}(t_{0}))\right)\right\Vert_{\mathbf{R}^{-1}}^{2},$$

(1)

where ${{\mathcal{M}}}_{{t}_{0}\to {t}_{k}}$ is the pre-trained FourCastNet model that predicts the state x(t_k) at time t_k from the state x(t₀) at time t₀ (for details of the FourCastNet training, please see Supplementary Text and Table S1). ${{\mathcal{H}}}_{k}$ represents the observation operator at time t_k. B and R represent the background and observation error covariance matrices, respectively, both of which are assumed to be constant.

In the case of propagating the initial perturbation using a TLM, the expression for the analysis increment can be derived as follows:

$$\Delta \mathbf{x}(t_{0})=-2\left(\mathbf{I}+\mathbf{B}\sum_{k=0}^{N_{f}}\mathbf{M}_{t_{0}\to t_{k}}^{T}\mathbf{H}_{k}^{T}\mathbf{R}^{-1}\mathbf{H}_{k}\mathbf{M}_{t_{0}\to t_{k}}\right)^{-1}\mathbf{B}\left(\nabla_{\mathbf{x}^{b}(t_{0})}\mathcal{J}_{4DVar}^{o}(\mathbf{x}^{b}(t_{0}))\right)^{T},$$

(2)

where $\mathbf{M}_{t_{0}\to t_{k}}$ is the tangent linear version of $\mathcal{M}_{t_{0}\to t_{k}}$ and H_k is the tangent linear version of $\mathcal{H}_{k}$. $\mathcal{J}_{4DVar}^{o}=\frac{1}{2}\sum_{k=0}^{N_{f}}\Vert \mathbf{y}(t_{k})-\mathcal{H}_{k}(\mathcal{M}_{t_{0}\to t_{k}}(\mathbf{x}(t_{0})))\Vert_{\mathbf{R}^{-1}}^{2}$ denotes the observation term of the 4DVar cost function (for the detailed derivation, please see Supplementary Text).

4DVarFormer is constructed based on this theory. Unlike traditional 4DVar systems, 4DVarFormer acts as a non-iterative 4DVar solver: a neural network learns the operation $-2\left(\mathbf{I}+\mathbf{B}\sum_{k=0}^{N_{f}}\mathbf{M}_{t_{0}\to t_{k}}^{T}\mathbf{H}_{k}^{T}\mathbf{R}^{-1}\mathbf{H}_{k}\mathbf{M}_{t_{0}\to t_{k}}\right)^{-1}\mathbf{B}$. This operation comprises the pertinent features of the background field and the relevant features of the ADM. Since the features of the ADM can be derived from the gradient, the 4DVarFormer model takes the background field and the gradient of $\mathcal{J}_{4DVar}^{o}$ as inputs. This process yields initial fields that align with the trajectory information of weather states (see Fig. 14a). Notably, 4DVarFormer does not need to estimate the covariance matrix B, while the covariance matrix R is set as a diagonal matrix:

$${\bf{R}}=\left[\begin{array}{llllll}{\sigma }_{T50}^{2}{{\bf{I}}}_{T50}&&&&&\\ &{\sigma }_{T500}^{2}{{\bf{I}}}_{T500}&&&&\\ &&{\sigma }_{T850}^{2}{{\bf{I}}}_{T850}&&&\\ &&&{\sigma }_{T1000}^{2}{{\bf{I}}}_{T1000}&&\\ &&&&{\sigma }_{U10}^{2}{{\bf{I}}}_{U10}&\\ &&&&&{\sigma }_{V10}^{2}{{\bf{I}}}_{V10}\\ \end{array}\right]\,,$$

(3)

where σ_variable (variable ∈ {T50, T500, T850, T1000, U10, V10}) represents the standard deviation of the observation error of that variable, and I_variable denotes the identity matrix whose dimension equals the number of observation points of that variable (25,600 in this study).

**Fig. 14: 4DVarFormer for 3D-multivariate weather state estimation.**

The model architecture of the 4DVarFormer consists of two main components: the 4DVar gradient block and the assimilation network. Firstly, in the 4DVar gradient block, the background field x^b(t₀) is fed into the pre-trained FourCastNet to predict the next N_f states of the weather on the grids (${{\bf{x}}}^{f}({t}_{1}),\cdots \,,{{\bf{x}}}^{f}({t}_{{N}_{f}})$, in our study N_f = 3). Then, as shown in Fig. 14b, the observation term ${{\mathcal{J}}}_{4DVar}^{o}({{\bf{x}}}^{b}({t}_{0}))$ of the 4DVar cost function is computed from the prior states (${{\bf{x}}}^{b}({t}_{0}),{{\bf{x}}}^{f}({t}_{1}),\cdots \,,{{\bf{x}}}^{f}({t}_{{N}_{f}})$) and the observations (${\bf{y}}({t}_{0}),\cdots \,,{\bf{y}}({t}_{{N}_{f}})$). With PyTorch’s automatic differentiation, we can directly obtain the gradient ${\nabla }_{{{\bf{x}}}^{b}({t}_{0})}{{\mathcal{J}}}_{4DVar}^{o}({{\bf{x}}}^{b}({t}_{0}))$ of this cost function.
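The gradient block described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a small convolution stands in for the pre-trained FourCastNet, the observation operator simply selects two channels, and R is treated as diagonal with one variance per observed channel.

```python
import torch


def observation_cost(x0, forecast_step, observe, obs, obs_var):
    """J^o(x0) = 1/2 * sum_k ||y_k - H(M_{t0->tk}(x0))||^2 weighted by a diagonal R^-1."""
    cost = x0.new_zeros(())
    state = x0
    for k, y_k in enumerate(obs):
        if k > 0:
            state = forecast_step(state)   # advance the state from t_{k-1} to t_k
        innovation = y_k - observe(state)  # y(t_k) - H(x(t_k))
        cost = cost + 0.5 * (innovation ** 2 / obs_var).sum()
    return cost


def observation_gradient(xb, forecast_step, observe, obs, obs_var):
    """Gradient of J^o with respect to the background field, via autograd."""
    x0 = xb.detach().clone().requires_grad_(True)
    cost = observation_cost(x0, forecast_step, observe, obs, obs_var)
    (grad,) = torch.autograd.grad(cost, x0)
    return grad


# Toy stand-ins (hypothetical): forecast model and observation operator.
torch.manual_seed(0)
toy_model = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)
observe = lambda x: x[:, :2]                           # observe the first two channels
xb = torch.randn(1, 4, 16, 16)                         # background field
obs = [torch.randn(1, 2, 16, 16) for _ in range(4)]    # y(t_0), ..., y(t_3)
obs_var = torch.tensor([4.0, 0.25]).view(1, 2, 1, 1)   # sigma^2 per observed channel

grad = observation_gradient(xb, toy_model, observe, obs, obs_var)
print(grad.shape)
```

Because the forecast model is differentiated by backpropagation, no hand-written tangent linear or adjoint code is needed. The gradient also reaches the unobserved channels through the forecast model, which is how observed variables can inform unobserved ones.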

Subsequently, the gradient ${\nabla }_{{{\bf{x}}}^{b}({t}_{0})}{{\mathcal{J}}}_{4DVar}^{o}({{\bf{x}}}^{b}({t}_{0}))$ and the background field x^b(t₀) are used as the inputs of the following assimilation network (Fig. 14c) to generate the analysis increment Δx(t₀). Recently, the ViT³⁸ and its variants have demonstrated remarkable performance in various computer vision and complex dynamical systems modeling tasks, which learn the correlation of the input data by the attention mechanism. The attention mechanism is calculated as follows:

$$\,\text{Attention}\,({\bf{Q}},{\bf{K}},{\bf{V}})=\frac{{\bf{Q}}{{\bf{K}}}^{T}}{\sqrt{D}}{\bf{V}},$$

(4)

where Q, K, and V represent the query, key, and value features, respectively, and D denotes the length of the features. To reduce the spatial dimensions of the inputs and accelerate training, the patch embeddings divide the gradient and the background field separately into patches of size 8 × 8, with each patch projected to a feature vector. The attention-based assimilation network then processes the embedded data, and a simple linear projection layer produces the prediction. The assimilation network comprises L stacked blocks (in our work, L = 4), each formed sequentially by a multi-head cross-attention (MHCA), a feedforward block, a multi-head self-attention (MHSA), and another feedforward block. Each MHCA takes the linear projection of the background feature as the query feature, and the linear projections of the output from the previous block as the key and value features. The final analysis field is computed as the sum of the background field and the analysis increment.
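The patch embedding and a single cross-attention step can be sketched with NumPy as follows. Several choices here are assumptions made for illustration: the embedding width `dim`, the random projection matrices, the use of a single head without learned query, key, and value projections, and the use of gradient tokens as keys and values in the first block. The sketch also applies a row-wise softmax to the scaled scores, as in the standard ViT, although Eq. (4) as printed does not show it.

```python
import numpy as np


def patchify(field, patch=8):
    """Split a (C, H, W) field into non-overlapping patches, one flattened row per patch."""
    c, h, w = field.shape
    gh, gw = h // patch, w // patch
    x = field.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (gh, gw, C, patch, patch)
    return x.reshape(gh * gw, c * patch * patch)


def attention(q, k, v):
    """Scaled dot-product attention; returns the output and the attention map."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ v, weights


rng = np.random.default_rng(0)
background = rng.normal(size=(24, 160, 160))  # 24 variables on the 160 x 160 grid
gradient = rng.normal(size=(24, 160, 160))
dim = 64                                      # hypothetical embedding width
patch_len = 24 * 8 * 8
w_b = rng.normal(size=(patch_len, dim)) / np.sqrt(patch_len)
w_g = rng.normal(size=(patch_len, dim)) / np.sqrt(patch_len)

bg_tokens = patchify(background) @ w_b    # queries from the background field
grad_tokens = patchify(gradient) @ w_g    # keys and values from the gradient
out, attn_map = attention(bg_tokens, grad_tokens, grad_tokens)
print(bg_tokens.shape, attn_map.shape)
```

With 8 × 8 patches on the 160 × 160 grid, each field becomes 20 × 20 = 400 tokens, which matches the 400 × 400 attention maps shown in Figs. 12 and 13.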

Model training strategy

This section outlines the training process for DL-based DA models (4DVarFormer, 4DVarNet, and ViT). The training step involves supervised training and optimizing the DL-based DA models to estimate the initial fields using the training dataset. The loss function used is ${{\mathcal{L}}}_{1}$ loss, which is defined as follows:

$${{\mathcal{L}}}_{1}=\frac{1}{C\times H\times W}\mathop{\sum }\limits_{c=1}^{C}\mathop{\sum }\limits_{i=1}^{H}\mathop{\sum }\limits_{j=1}^{W}\vert {\hat{{\bf{X}}}}_{c,i,j}-{{\bf{X}}}_{c,i,j}\vert ,$$

(5)

where C, H, and W are the number of channels and the number of grid points in the latitude and longitude directions, respectively. c, i, and j are the indices for the channel, the latitude, and the longitude coordinates, respectively. ${\hat{{\bf{X}}}}_{c,i,j}$ and X_c,i,j are the model estimate and the ground truth for the same variable and location (latitude and longitude coordinates) at the initial time. The ${{\mathcal{L}}}_{1}$ loss is averaged over all the grid points and variables.
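Eq. (5) is the mean absolute error over all channels and grid points, which can be written in one line with NumPy. The function name `l1_loss` is illustrative, and the sketch assumes the fields have already been brought to comparable scales; that preprocessing is not specified in this section.

```python
import numpy as np


def l1_loss(estimate, truth):
    """Mean absolute error over channels, latitudes, and longitudes (Eq. 5)."""
    if estimate.shape != truth.shape:
        raise ValueError("estimate and truth must have the same (C, H, W) shape")
    return float(np.abs(estimate - truth).mean())


# Tiny worked example with C = 1, H = 2, W = 2.
estimate = np.array([[[1.0, 2.0], [3.0, 4.0]]])
truth = np.zeros((1, 2, 2))
loss = l1_loss(estimate, truth)  # (1 + 2 + 3 + 4) / 4
print(loss)
```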

All the DL-based DA models in the study were developed using the PyTorch framework. Specifically, the 4DVarFormer and ViT models were trained on a single NVIDIA V100 GPU for 100 epochs using the AdamW optimizer with parameters β₁ = 0.9 and β₂ = 0.95 and an initial learning rate of 5e−4. The 4DVarNet model, in contrast, was trained on two NVIDIA V100 GPUs with the Adam optimizer, as described in the original paper³⁵, and its number of autoregressive iteration steps was set to 10. All of these models were trained with a batch size of 64 and cosine learning rate decay. After each training epoch, we evaluated the ${{\mathcal{L}}}_{1}$ loss of the model estimates on the validation set and saved the model parameters with the minimum loss (for more details of the training hyperparameters, the evaluation metrics, the configurations of all neural networks, and the flow chart of the evaluation framework, please see Supplementary Text, Tables S2–S4 and Figs. S2–S4). All models were trained using only background fields from 3-day forecasts together with 10-m wind and upper-air temperature observations, and all relevant tests were performed on the trained models.
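The cosine learning rate decay mentioned above can be illustrated with the standard cosine annealing formula. The per-epoch stepping, the absence of warm-up, and the minimum learning rate of zero are assumptions; the actual schedule settings are given in the Supplementary tables rather than in this section.

```python
import math


def cosine_lr(epoch, total_epochs=100, base_lr=5e-4, min_lr=0.0):
    """Cosine-annealed learning rate at the start of a given epoch."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [cosine_lr(epoch) for epoch in range(100)]
print(schedule[0], schedule[50], schedule[99])
```

The rate starts at the initial value of 5e−4, reaches half of it midway through training, and approaches the minimum value in the final epochs.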

Data availability

We downloaded a subset of the ERA5 dataset from the official website of the Copernicus Climate Data Store (CDS) at https://cds.climate.copernicus.eu/.

Code availability

We used the FourCastNet code to build the weather forecasting model; it is available at the official implementation repository https://github.com/NVlabs/FourCastNet. The code of the compared 4DVarNet model is available at https://github.com/CIA-Oceanix/4dvarnet-core, and the code of the ViT model is available at https://github.com/lucidrains/vit-pytorch. The source code used for training and running the 4DVarFormer model in this work is available in a GitHub repository at https://github.com/wuxinwang1997/4DVarFormer.

References

Gettelman, A. et al. The future of earth system prediction: advances in model-data fusion. Sci. Adv. 8, eabn3488 (2022).
Kurth, T. et al. Fourcastnet: Accelerating global high-resolution weather forecasting using adaptive fourier neural operators. 1–11 (2023).
Bi, K. et al. Accurate medium-range global weather forecasting with 3d neural networks. Nature 619, 533–538 (2023).
Nguyen, T., Brandstetter, J., Kapoor, A., Gupta, J. K. & Grover, A. Climax: A foundation model for weather and climate. 25904–25938 (PMLR, 2023).
Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
Chen, L. et al. Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast. npj Clim. Atmos. Sci. 6, 190 (2023).
Wang, B., Zou, X. & Zhu, J. Data assimilation and its applications. Proc. Natl Acad. Sci. USA 97, 11143–11144 (2000).
Courtier, P., Thépaut, J.-N. & Hollingsworth, A. A strategy for operational implementation of 4d-var, using an incremental approach. Q. J. R. Meteorol. Soc. 120, 1367–1387 (1994).
Evensen, G. et al. Data Assimilation: The Ensemble Kalman Filter, Vol. 2 (Springer, 2009).
Bocquet, M. & Sakov, P. An iterative ensemble Kalman smoother. Q. J. R. Meteorol. Soc. 140, 1521–1535 (2014).
Kotamarthi, R. et al. Efficient high-dimensional variational data assimilation with machine-learned reduced-order models. Geosci. Model Dev. 15, 3433–3445 (2022).
Minamide, M. & Zhang, F. Assimilation of all-sky infrared radiances from Himawari-8 and impacts of moisture and hydrometer initialization on convection-permitting tropical cyclone prediction. Mon. Weather Rev. 146, 3241–3258 (2018).
Lu, X. & Wang, X. Improving hurricane analyses and predictions with TCI, IFEX field campaign observations, and CIMSS AMVs using the advanced hybrid data assimilation system for HWRF. Part II: observation impacts on the analysis and prediction of Patricia (2015). Mon. Weather Rev. 148, 1407–1430 (2020).
Feng, J. et al. Improving typhoon predictions by assimilating the retrieval of atmospheric temperature profiles from the fengyun-4a’s geostationary interferometric infrared sounder (GIIRS). Atmos. Res. 280, 106391 (2022).
Bhate, J., Munsi, A., Kesarkar, A., Kutty, G. & Deb, S. K. Impact of assimilation of satellite retrieved ocean surface winds on the tropical cyclone simulations over the north Indian ocean. Earth Space Sci. 8, e2020EA001517 (2021).
Düben, P. et al. Machine Learning at ECMWF: A Roadmap for the Next 10 Years. European Centre for Medium-Range Weather Forecasts. Tech. Rep. 878 (2021).
Rabier, F., Järvinen, H., Klinker, E., Mahfouf, J.-F. & Simmons, A. The ECMWF operational implementation of four-dimensional variational assimilation. I: experimental results with simplified physics. Q. J. R. Meteorol. Soc. 126, 1143–1170 (2000).
Lorenc, A. C. The potential of the ensemble Kalman filter for NWP—a comparison with 4d-var. Q. J. R. Meteorol. Soc. 129, 3183–3203 (2003).
Lewis, J. M. & Derber, J. C. The use of adjoint equations to solve a variational adjustment problem with advective constraints. Tellus A 37, 309–322 (1985).
Geer, A. J. Learning earth system models from observations: machine learning or data assimilation? Philos. Trans. R. Soc. A 379, 20200089 (2021).
Bocquet, M., Brajard, J., Carrassi, A. & Bertino, L. Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization. Found. Data Sci. 2, 55–80 (2020).
Reichstein, M. et al. Deep learning and process understanding for data-driven earth system science. Nature 566, 195–204 (2019).
Fablet, R. et al. Learning variational data assimilation models and solvers. J. Adv. Model. Earth Syst. 13, e2021MS002572 (2021).
Ravuri, S. et al. Skilful precipitation nowcasting using deep generative models of radar. Nature 597, 672–677 (2021).
Zhou, L. & Zhang, R.-H. A self-attention–based neural network for three-dimensional multivariate modeling and its skillful ENSO predictions. Sci. Adv. 9, eadf2827 (2023).
Cheng, S. et al. Machine learning with data assimilation and uncertainty quantification for dynamical systems: a review. IEEE/CAA J. Autom. Sin. 10, 1361–1387 (2023).
Pawar, S., Ahmed, S. E., San, O., Rasheed, A. & Navon, I. M. Long short-term memory embedded nudging schemes for nonlinear data assimilation of geophysical flows. Phys. Fluids 32, 076606 (2020).
Wu, P. et al. Fast data assimilation (FDA): data assimilation by machine learning for faster optimize model state. J. Comput. Sci. 51, 101323 (2021).
Arcucci, R., Zhu, J., Hu, S. & Guo, Y.-K. Deep data assimilation: integrating deep learning with data assimilation. Appl. Sci. 11, 1114 (2021).
Errico, R. M., Vukićević, T. & Raeder, K. Examination of the accuracy of a tangent linear model. Tellus A Dyn. Meteorol. Oceanogr. 45, 462–477 (1993).
Errico, R. M. What is an adjoint model? Bull. Am. Meteorol. Soc. 78, 2577–2592 (1997).
Nonnenmacher, M. & Greenberg, D. S. Deep emulators for differentiation, forecasting, and parametrization in earth science simulators. J. Adv. Model. Earth Syst. 13, e2021MS002554 (2021).
Dong, R., Leng, H., Zhao, J., Song, J. & Liang, S. A framework for four-dimensional variational data assimilation based on machine learning. Entropy 24, 264 (2022).
Beauchamp, M., Febvre, Q., Georgenthum, H. & Fablet, R. 4DVarNet-SSH: end-to-end learning of variational interpolation schemes for nadir and wide-swath satellite altimetry. Geosci. Model Dev. Discuss. 2022, 1–37 (2022).
Fablet, R., Febvre, Q. & Chapron, B. Multimodal 4dvarnets for the reconstruction of sea surface dynamics from sst-ssh synergies. IEEE Transactions on Geoscience and Remote Sensing. 61, 1–14 (2023).
Kleist, D. T. & Ide, K. An OSSE-based evaluation of hybrid variational–ensemble data assimilation for the NCEP GFS. Part I: system description and 3d-hybrid results. Mon. Weather Rev. 143, 433–451 (2015).
Kleist, D. T. & Ide, K. An OSSE-based evaluation of hybrid variational–ensemble data assimilation for the NCEP GFS. Part II: 4DEnVar and hybrid variants. Mon. Weather Rev. 143, 452–470 (2015).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale (2020).
Zhong, X. et al. FuXi-Extreme: improving extreme rainfall and wind forecasts with diffusion model. Preprint at arXiv https://doi.org/10.48550/arXiv.2310.19822 (2023).
Zhu, S. et al. A four-dimensional ensemble-variational (4DEnVar) data assimilation system based on GRAPES-GFS: system description and primary tests. J. Adv. Model. Earth Syst. 14, e2021MS002737 (2022).
Atkinson, G. D. & Holliday, C. R. Tropical cyclone minimum sea level pressure/maximum sustained wind relationship for the western north pacific. Mon. Weather Rev. 105, 421–427 (1977).
Hagiwara, T., Osano, S. & Akashi, S. Relationship between the CI-number and central pressure and maximum wind speed in typhoons. J. Meteor. Res. 42, 59–67 (1990).
Chavas, D. R., Reed, K. A. & Knaff, J. A. Physical understanding of the tropical cyclone wind-pressure relationship. Nat. Commun. 8, 1360 (2017).
Yang, Y., Endreny, T. A. & Nowak, D. J. A physically based analytical spatial air temperature and humidity model. J. Geophys. Res. Atmos. 118, 10–449 (2013).
Davis, R. E., McGregor, G. R. & Enfield, K. B. Humidity: a review and primer on atmospheric moisture and human health. Environ. Res. 144, 106–116 (2016).
Han, D., Pan, X., Han, Y., Song, S. & Huang, G. Flatten transformer: vision transformer using focused linear attention 5961–5971 (2023).
Brajard, J., Carrassi, A., Bocquet, M. & Bertino, L. Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: a case study with the Lorenz 96 model. J. Comput. Sci. 44, 101171 (2020).
Chattopadhyay, A., Mustafa, M., Hassanzadeh, P., Bach, E. & Kashinath, K. Towards physics-inspired data-driven weather forecasting: integrating data assimilation with a deep spatial-transformer-based U-Net in a case study with ERA5. Geosci. Model Develop. 15, 2221–2237 (2022).
Chattopadhyay, A., Nabizadeh, E., Bach, E. & Hassanzadeh, P. Deep learning-enhanced ensemble-based data assimilation for high-dimensional nonlinear dynamical systems. J. Comput. Phys. 477, 111918 (2023).
Bao, J., Li, L. & Davis, A. Variational autoencoder or generative adversarial networks? A comparison of two deep learning methods for flow and transport data assimilation. Math. Geosci. 54, 1017–1042 (2022).
Silva, V. L., Heaney, C. E., Li, Y. & Pain, C. C. Data assimilation predictive GAN (DA-PredGAN) applied to a spatio-temporal compartmental model in epidemiology. J. Sci. Comput. 94, 25 (2023).
Goodfellow, I. et al. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27, 2672–2680 (2014).
Stengel, K., Glaws, A., Hettinger, D. & King, R. N. Adversarial super-resolution of climatological wind and solar data. Proc. Natl Acad. Sci. USA 117, 16805–16815 (2020).
Hess, P., Drüke, M., Petri, S., Strnad, F. M. & Boers, N. Physically constrained generative adversarial networks for improving precipitation fields from earth system models. Nat. Mach. Intell. 4, 828–839 (2022).
Croitoru, F.-A., Hondru, V., Ionescu, R. T. & Shah, M. Diffusion models in vision: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 10850–10869 (2023).
Segami, A. et al. Operational mesoscale weather prediction with Japan spectral model. J. Meteorol. Soc. Jpn. Ser. II 67, 907–924 (1989).
Zhang, X. et al. Development of an efficient regional four-dimensional variational data assimilation system for WRF. J. Atmos. Ocean. Technol. 31, 2777–2794 (2014).
Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
Shi, X., Duan, B. & Ren, K. A more accurate field-to-field method towards the wind retrieval of HY-2B scatterometer. Remote Sens. 13, 2419 (2021).
Wang, S., Lu, F. & Feng, Y. An investigation of the Fengyun-4A/B GIIRS performance on temperature and humidity retrievals. Atmosphere 13, 1830 (2022).

Acknowledgements

We appreciate the researchers at ECMWF for their efforts in collecting, archiving, disseminating, and maintaining the ERA5 reanalysis dataset, without which this study would not have been feasible. This work was supported by the Science and Technology Innovation Program of Hunan Province (2022RC3070) and the National Natural Science Foundation of China (Grant No. 42205161). We would like to express our sincere gratitude to Huipeng Wang and Taikang Yuan for their help in revising the manuscript and analyzing the experimental results.

Author information

Authors and Affiliations

College of Computer Science and Technology, National University of Defense Technology, Deya Street, Changsha, 410073, Hunan, China
Wuxin Wang, Qingguo Su, Xingyu Chai, Jingze Lu & Kaijun Ren
College of Meteorology and Oceanography, National University of Defense Technology, Deya Street, Changsha, 410073, Hunan, China
Wuxin Wang, Jinrong Zhang, Qingguo Su, Xingyu Chai, Jingze Lu, Weicheng Ni, Boheng Duan & Kaijun Ren

Authors

Wuxin Wang
Jinrong Zhang
Qingguo Su
Xingyu Chai
Jingze Lu
Weicheng Ni
Boheng Duan
Kaijun Ren

Contributions

K.J.R., B.H.D., and W.X.W. designed the project. K.J.R. and B.H.D. managed and oversaw the project. W.X.W. and Q.G.S. performed the model training and evaluation. W.X.W. and B.H.D. improved the model design. W.X.W., Q.G.S., X.Y.C., and J.R.Z. analyzed the experimental results. W.X.W., J.Z.L., and W.C.N. wrote and revised the manuscript.

Corresponding authors

Correspondence to Weicheng Ni, Boheng Duan or Kaijun Ren.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, W., Zhang, J., Su, Q. et al. Accurate initial field estimation for weather forecasting with a variational constrained neural network. npj Clim Atmos Sci 7, 223 (2024). https://doi.org/10.1038/s41612-024-00776-1

Received: 11 January 2024
Accepted: 16 September 2024
Published: 30 September 2024
DOI: https://doi.org/10.1038/s41612-024-00776-1