FuXi-DA: a generalized deep learning data assimilation framework for assimilating satellite observations

Xu, Xiaoze; Sun, Xiuyu; Han, Wei; Zhong, Xiaohui; Chen, Lei; Gao, Zhiqiu; Li, Hao

doi:10.1038/s41612-025-01039-3


Article
Open access
Published: 26 April 2025


Xiaoze Xu^1,2,3^na1,
Xiuyu Sun²^na1,
Wei Han^3,4,
Xiaohui Zhong^2,5,
Lei Chen²,
Zhiqiu Gao¹ &
Hao Li^2,5

npj Climate and Atmospheric Science volume 8, Article number: 156 (2025)


Abstract

Data assimilation (DA), an indispensable component of contemporary Numerical Weather Prediction (NWP) systems, plays a crucial role in generating the analysis that significantly impacts forecast performance. Nevertheless, developing an efficient DA system poses significant challenges, particularly in establishing intricate relationships between the background field and the vast amount of multi-source observation data within limited operational time windows. Recently, deep learning-based (DL-based) weather forecast models have shown promise in matching, and even surpassing, the leading operational NWP models worldwide. This success motivates the exploration of DL-based DA frameworks. DL models possess multi-modal modeling capabilities that enable the fusion of multi-source data in feature space, a process closely analogous to assimilating multi-source observational data in DA systems. In this study, we introduce FuXi-DA, a generalized DL-based DA framework for assimilating satellite observations. By assimilating data from the Advanced Geosynchronous Radiation Imager aboard Fengyun-4B, FuXi-DA consistently reduces analysis errors and significantly improves forecast performance. Furthermore, FuXi-DA has been validated against established atmospheric physics, demonstrating its consistency and reliability.


Introduction

Accurate weather forecasts are crucial for saving lives, managing emergencies, mitigating disaster impacts, and preventing economic losses due to severe weather events. At present, weather forecasts primarily rely on numerical weather prediction (NWP) models¹. These traditional models generate forecasts by solving the governing partial differential equations, using the current state of the atmosphere and surface as initial conditions. Since 2022, a new generation of deep learning (DL) models has been recognized for its potential in medium-range weather forecasting, demonstrating forecast performance comparable to leading NWP models^2,3,4,5. Weather forecasting is essentially an “initial value” problem, in which the greatest source of uncertainty is the initial conditions⁶. The accuracy of these initial conditions is therefore of paramount importance, regardless of the forecasting method used^7,8. Currently, both NWP and DL-based weather forecasting models rely on initial fields, commonly referred to as the ‘analysis’, generated by data assimilation (DA) systems. The analysis is considered the most accurate estimate of the current atmospheric state and is obtained by combining short-term NWP forecasts, referred to as the ‘background’, with observational data. DA systems thus establish the foundation for reliable weather forecasting and are as important as the forecasting models themselves.

Over the past few decades, advances in DA systems have significantly enhanced the accuracy of weather forecasts¹, through improvements in assimilation algorithms^9,10,11, the refinement of assimilation techniques^12,13, and the increased use of observational data^14,15. Despite these advances, DA systems still face several challenges. One primary challenge is that traditional DA systems cannot fully exploit the large volume of available observations. For instance, while the European Centre for Medium-Range Weather Forecasts (ECMWF) receives 800 million observations daily, only 60 million of these meet the quality control criteria for assimilation into the Integrated Forecasting System (IFS)¹⁶. One reason is that, owing to computational constraints and limited understanding of error correlations, current DA systems handle correlated observation errors through thinning, which excludes numerous valuable observations^13,15. Moreover, much of the satellite data is assimilated only under clear-sky conditions^17,18, neglecting valuable information on clouds and precipitation. Although all-sky assimilation techniques for microwave observations have gradually matured¹⁴, all-sky assimilation of infrared observations remains challenging^14,18.

Another challenge lies in the high computational cost of widely adopted DA methods, such as four-dimensional variational (4D-Var)¹⁹, ensemble Kalman filter (EnKF)²⁰, and ensemble-variational (En-Var) methods¹⁰, a cost that grows as the volume of observational data increases. Notably, satellite observations account for the majority of data in modern operational observing systems and are crucial for DA systems¹⁵. However, satellites measure quantities such as radiance, brightness temperature, or reflectivity, which do not directly correspond to model variables like temperature, humidity, or ozone. Observation operators must therefore be used to convert model variables into model equivalents of the observed quantities for comparison, further increasing computational demands and potentially introducing additional errors²¹. Moreover, the complexity of DA workflows and the heavy computational load require NWP centers to start assimilation and forecasting early to deliver products on time²². Consequently, operational products cannot assimilate the observations collected between the start of product generation and delivery²², even though these observations typically contain crucial information about the current atmospheric state. These challenges underscore the need for continuous innovation in DA techniques to improve the efficiency and accuracy of weather forecasting in the face of growing data volumes and computational demands.

In recent years, DL models have demonstrated significant potential across various aspects of weather and climate modeling, catalyzing numerous efforts to integrate DL models with DA techniques²³. Many of these efforts have been proof-of-concept investigations in simplified dynamical systems, such as the Lorenz systems and the shallow water equations^24,25,26,27,28. Other studies have used DL models to tackle specific challenges within the traditional DA framework, including replacing physical parametrization schemes²⁹, constructing satellite observation operators^30,31, and optimizing cost functions^32,33. More recent studies have trained DL models to replace the entire DA process. FengWu-Adas and DiffDA assimilated simulated observations derived from the ERA5 reanalysis dataset, establishing DL-based assimilation and forecasting systems^34,35. Maddy used DL models to assimilate satellite observations indirectly: temperature, humidity, and total precipitable water were first retrieved from the satellite observations and then assimilated using AI-DA³⁶. However, this method relies heavily on the accuracy of the retrieval algorithms, which may introduce additional errors^37,38.

Given the pivotal role of satellite DA in improving the skill of medium-range weather forecasts^15,39, this study introduces FuXi-DA, a pioneering, generalized DL-based DA framework. It is designed to assimilate satellite observations directly and to produce analyses that optimize the forecast performance of DL-based weather forecasting models. Figure 1 compares FuXi-DA with the traditional DA process, taking the variational method as an example. Compared with traditional assimilation methods, FuXi-DA offers several new features:

To achieve spatial matching between observational data and background fields, FuXi-DA processes the observational data into ‘super-observations’ with a spatial resolution of 0.25° through spatial averaging. This operation resembles the traditional DA technique of forming super-observations for assimilation^40,41. Because less information is discarded than with thinning, super-observations make better use of the total information content¹³. Moreover, FuXi-DA performs no further sparsification, ensuring full use of the observational data.
To address the discrepancy in information content between background and observational data and to facilitate their integration, FuXi-DA employs separate encoders for the background and the observations, together with a unified fusion neural network that merges their information. The encoders map the background and observations into a common feature space, eliminating the need for observation operators. The fusion network combines both types of information in this feature space, avoiding the iterative optimization required by traditional variational methods. As a result, FuXi-DA has a substantially lower computational cost than traditional DA methods.
Furthermore, FuXi-DA trains the DA model jointly with the DL-based weather forecasting model FuXi. By adding a forecast error loss to supervise the training of the assimilation model, FuXi-DA further enhances medium-range forecast performance.
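The super-observation step in the list above can be pictured with the following minimal Python sketch. It assumes that each AGRI pixel is assigned to the nearest point of a regular 0.25° latitude–longitude grid (721 × 1440 points, as in ERA5) and that all valid pixels falling on the same grid point are averaged with equal weight. The function name `super_observations` and these averaging choices are illustrative assumptions, not the FuXi-DA preprocessing code.

```python
import math
from collections import defaultdict


def super_observations(lats, lons, values, res=0.25):
    """Average pixel observations onto a regular res-degree latitude-longitude grid.

    Hypothetical helper: each pixel is assigned to its nearest grid point
    (row 0 = 90N, column 0 = 0E) and all valid pixels sharing a grid point are
    averaged with equal weight. Returns {(row, col): mean value}; grid points
    without any observation are simply absent (treated as missing).
    """
    n_lon = int(round(360.0 / res))
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in zip(lats, lons, values):
        if value is None or math.isnan(value):
            continue  # skip missing or flagged pixels
        row = int(math.floor((90.0 - lat) / res + 0.5))
        col = int(math.floor((lon % 360.0) / res + 0.5)) % n_lon  # wrap 360E back to 0E
        sums[(row, col)] += value
        counts[(row, col)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Because every valid pixel contributes to some grid-point average, no observation is thrown away, which is the property that distinguishes super-observations from thinning.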

In this study, we assess the efficacy of FuXi-DA by using it to assimilate observations from the Advanced Geosynchronous Radiation Imager (AGRI) aboard the Fengyun-4B satellite. Our results show that assimilating AGRI data consistently enhances forecast accuracy, demonstrating FuXi-DA’s capability to leverage satellite data for forecast improvement. Furthermore, a series of single-observation experiments confirms FuXi-DA’s consistency with established atmospheric physics. Although FuXi-DA was developed to refine the analysis for the FuXi forecasting model, its framework is flexible enough to be adapted to other DL-based weather forecasting models.

Results

To demonstrate the effectiveness of FuXi-DA in assimilating satellite observations, this study uses as background the 6-h forecast fields generated by the DL-based weather forecasting model FuXi from ERA5 initial conditions, and assimilates gridded AGRI data from the Fengyun-4B satellite (see the Data section for details). As shown in Fig. 1, the FuXi-DA model is trained with ERA5 reanalysis data as labels, while the fixed FuXi model provides forecast supervision (see the Model training section). Data from June 2022 to June 2023 are used to train the FuXi-DA model, and data from August 2023 to December 2023 are used to evaluate its performance. Assimilation is performed daily at 00:00, 06:00, 12:00, and 18:00 UTC. Because of the limited spatial coverage and information content of the observations, the evaluation was not performed in a cycling assimilation setting.

To evaluate assimilation performance, three experimental configurations were designed: EXP_ASSI, EXP_CORR, and EXP_CTRL. In EXP_ASSI, FuXi-DA was used to assimilate AGRI data into the background, producing the assimilated analysis. In EXP_CORR, the AGRI data were removed and a DL model was trained to correct the errors between the background and ERA5; the corrected background was then used as the analysis for forecasting (see the Experimental configurations section). In EXP_CTRL, the unprocessed background was used directly as the analysis for forecasting. EXP_CORR serves to isolate the impact of the AGRI observations: because a DL model can learn from historical background–ERA5 pairs and bring the background closer to ERA5 without any observations, comparing EXP_ASSI with EXP_CORR, rather than with EXP_CTRL, separates the contribution of the observations from that of the correction.

The evaluation provides a comprehensive assessment of FuXi-DA by examining its impact on the accuracy of both the analysis and the forecasts. ERA5 data serve as the ground truth for all metrics. The evaluation focuses on five upper-air atmospheric variables: geopotential (Z), temperature (T), the u-component of wind (U), the v-component of wind (V), and relative humidity (R). After assimilating AGRI data with FuXi-DA, analysis errors are significantly reduced, particularly for relative humidity and geopotential, with the largest effects in the middle and upper troposphere within the area covered by the satellite observations. The assimilation of AGRI data also enhances the skill of global forecasts: improvements are statistically significant for lead times of up to 7 days, and the RMSE of Z500 is notably reduced throughout the forecast period. Moreover, the FuXi-DA model exhibits robust physical consistency and automatically distinguishes between cloudy and clear conditions in the observations. Consequently, there is no need to develop and apply cloud detection algorithms to preprocess satellite data within the FuXi-DA framework.

Analysis verifications

This section evaluates the improvements in the analyses after assimilating AGRI data with FuXi-DA. We first compare EXP_CTRL and EXP_ASSI within the observation region of the Fengyun-4B satellite. Figure 2a presents the time-averaged normalized difference of the regionally-averaged latitude-weighted RMSE of EXP_ASSI relative to EXP_CTRL. Assimilating AGRI observations significantly reduces the RMSEs of all variables, most notably relative humidity and geopotential. To exclude the influence of the background correction, we then compare EXP_ASSI with EXP_CORR, which applies the correction but does not assimilate AGRI data. As shown in Fig. 2b, significant improvements in relative humidity and geopotential remain in the middle and upper troposphere, with the RMSEs of R300, R500, Z300, and Z500 decreasing by ~4.47%, 2.77%, 3.01%, and 2.02%, respectively. The substantial improvement in humidity can be attributed to AGRI’s three water vapor channels (channels 9–11). In addition, the significant decrease in the RMSE of low-level temperature, particularly at 1000 hPa, is associated with the assimilation of AGRI’s three window channels (channels 12–14).
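For readers unfamiliar with the metric, the sketch below shows one common way to compute a latitude-weighted RMSE and the normalized difference between two experiments. The cos(latitude) weighting, normalized by its mean, follows common practice in evaluating DL weather models; defining the normalized difference as (RMSE_exp − RMSE_ref) / RMSE_ref and averaging the per-time values are assumptions about how the quantities in Fig. 2 are formed, and the function names are hypothetical.

```python
import math


def lat_weighted_rmse(field, truth, lats):
    """Latitude-weighted RMSE of a 2-D field [n_lat][n_lon] against truth.

    Each row is weighted by cos(latitude) divided by the mean weight over the
    rows, so a spatially uniform error e yields an RMSE equal to e.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats]
    mean_w = sum(weights) / len(weights)
    total, count = 0.0, 0
    for w, f_row, t_row in zip(weights, field, truth):
        for f, t in zip(f_row, t_row):
            total += (w / mean_w) * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


def normalized_difference(rmse_exp, rmse_ref):
    """Negative values mean the experiment has a smaller RMSE than the reference."""
    return (rmse_exp - rmse_ref) / rmse_ref


def time_averaged_normalized_difference(rmse_exp_series, rmse_ref_series):
    """Average the per-time normalized differences over all analysis times."""
    diffs = [normalized_difference(e, r) for e, r in zip(rmse_exp_series, rmse_ref_series)]
    return sum(diffs) / len(diffs)
```

For example, an RMSE of 0.955 in one experiment against 1.0 in the reference gives a normalized difference of −0.045, i.e. a 4.5% reduction.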

**Fig. 2: The time-averaged normalized difference of the regionally-averaged latitude-weighted RMSE for 65 upper-air atmospheric variables.**

Beyond these overall statistics, we present time series of the RMSE and of the normalized difference of the globally-averaged latitude-weighted RMSE to illustrate whether FuXi-DA consistently improves the quality of the background (see Fig. 3). The experiments EXP_CTRL, EXP_CORR, and EXP_ASSI are represented by black, blue, and red lines, respectively. The background fields at 06:00 and 18:00 UTC are more accurate than those at 00:00 and 12:00 UTC, so the results for 00:00 and 12:00 UTC are presented separately from those for 06:00 and 18:00 UTC. This difference likely arises because the background fields are 6-h FuXi forecasts initialized from ERA5: the 06:00 and 18:00 UTC backgrounds are initialized from ERA5 at 00:00 and 12:00 UTC, when ERA5 is of higher quality because radiosondes provide data for assimilation almost exclusively at those times.

**Fig. 3: Comparison of EXP_CTRL, EXP_CORR and EXP_ASSI analysis.**

As shown in Fig. 3, EXP_CORR and EXP_ASSI outperform EXP_CTRL throughout the testing period, as indicated by the negative normalized RMSE differences (see Fig. 3e–h, m–p). Because EXP_CTRL has smaller errors at 06:00 and 18:00 UTC (see Fig. 3i–l), the RMSE reductions in EXP_CORR and, especially, EXP_ASSI are less pronounced at these times (see Fig. 3m–p). This suggests that FuXi-DA places varying degrees of trust in the background depending on its accuracy, analogous to how background error statistics are estimated and used in traditional DA methods. Overall, EXP_ASSI consistently outperforms the other experiments across all variables during the testing period. Figure 3q–t illustrates the spatial distribution of the RMSE percentage reduction of EXP_ASSI relative to EXP_CORR for four variables: R300, T300, U300, and V300. These reductions were quantified as the normalized difference of the regionally-averaged latitude-weighted RMSE computed within 5° × 5° boxes. The reductions are largest within the observation area (see Fig. 3q–t), as expected, since EXP_ASSI, unlike EXP_CORR, incorporates Fengyun-4B observations.
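The 5° × 5° maps described above can be sketched with a block-wise version of the latitude-weighted RMSE. The sketch assumes a 0.25° grid, so a 5° box spans 20 × 20 grid points; it weights rows by cos(latitude) normalized within each box and works on error fields (experiment minus ERA5) for a single time. Averaging such maps over all analysis times would give a time-averaged map. The function and argument names are hypothetical.

```python
import math


def box_normalized_difference(err_exp, err_ref, lats, box=20):
    """Normalized RMSE difference per box of `box` x `box` grid points.

    err_exp, err_ref: 2-D lists [n_lat][n_lon] of errors (experiment minus truth).
    lats: latitude (degrees) of each row; box=20 corresponds to 5 deg on an
    assumed 0.25 deg grid. Returns {(box_row, box_col): (RMSE_exp - RMSE_ref) / RMSE_ref}.
    """
    n_lat, n_lon = len(lats), len(err_exp[0])
    result = {}
    for i0 in range(0, n_lat, box):
        rows = range(i0, min(i0 + box, n_lat))
        weights = {i: math.cos(math.radians(lats[i])) for i in rows}
        mean_w = sum(weights.values()) / len(weights)  # normalize weights within the box
        for j0 in range(0, n_lon, box):
            cols = range(j0, min(j0 + box, n_lon))
            se_exp = se_ref = 0.0
            for i in rows:
                for j in cols:
                    se_exp += weights[i] / mean_w * err_exp[i][j] ** 2
                    se_ref += weights[i] / mean_w * err_ref[i][j] ** 2
            n = len(rows) * len(cols)
            rmse_exp, rmse_ref = math.sqrt(se_exp / n), math.sqrt(se_ref / n)
            if rmse_ref > 0:  # skip boxes where the reference is perfect
                result[(i0 // box, j0 // box)] = (rmse_exp - rmse_ref) / rmse_ref
    return result
```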

To further examine the impact of assimilating AGRI observations, Fig. 4 shows the error distributions and analysis increments at 12:00 UTC on October 15, 2023. Figure 4a–f presents the spatial distributions of R300 and Z500 for ERA5, EXP_CORR, and EXP_ASSI, which show negligible differences among them. Figure 4g, j shows the differences between EXP_CORR and ERA5, while Fig. 4h, k shows the differences between EXP_ASSI and EXP_CORR, i.e. the analysis increments due to AGRI. These increments are predominantly localized within the observational area (see Fig. 4h, k) and reduce the analysis errors there. Specifically, Fig. 4g shows a significant negative anomaly north of Australia, and assimilating AGRI data produces a positive increment in the same region (see Fig. 4h). A similar correction appears in geopotential: Fig. 4j shows positive anomalies east of Australia and negative anomalies west of Australia, and assimilating AGRI yields a negative increment to the east and a positive increment to the west (see Fig. 4k).

**Fig. 4: Visualization of spatial distributions of ERA5, EXP_CORR, EXP_ASSI, biases, and analysis increments at 12:00 on October 15 2023.**

Forecast verifications

The primary objective of DA is to improve forecast accuracy. This subsection evaluates the effect of FuXi-DA on forecast performance. Figure 5 presents the time-averaged normalized RMSE difference of 10-day global forecasts among EXP_CTRL, EXP_CORR, and EXP_ASSI for three variables at three pressure levels. The RMSE of EXP_CTRL is shown in Fig. S1. Beyond day 5, the forecast RMSE of EXP_CTRL is slightly higher than the results presented in Chen et al.⁴, because Chen et al.⁴ use the FuXi-Short and FuXi-Medium models for the 0–5 day and 5–10 day forecasts, respectively, whereas only FuXi-Short is used in this study. EXP_ASSI benefits from both correction and assimilation, whereas EXP_CORR benefits from correction alone, which makes EXP_ASSI the better-performing experiment. As in the analysis, assimilating AGRI data significantly improves the accuracy of relative humidity. The impact of the observations on forecast accuracy is concentrated in the middle and upper troposphere: the differences between EXP_CORR and EXP_ASSI are larger at 300 and 500 hPa and smaller at 850 hPa. The improvement from assimilating AGRI data gradually diminishes with increasing lead time. Nevertheless, during the first 7 days EXP_ASSI consistently shows significant improvement over EXP_CORR, as evidenced by the red shaded area below the blue solid line in the figure. Specifically, the reduction in Z500 error decreases from 0.67% on day 1 to 0.34% by day 7. The normalized RMSE differences for the Northern Hemisphere, Southern Hemisphere, and Tropics are shown in Figs. S2–S4. As in the global results, assimilating AGRI data reduces the humidity forecast RMSE to varying degrees in all three regions. The improvement in the Tropics dissipates more rapidly with lead time than in the Northern and Southern Hemispheres.
Compared with EXP_CORR, EXP_ASSI produces poorer long-range geopotential forecasts in the Tropics, although its 6-h forecast is better. This is partly because forecast RMSEs in the Tropics are relatively small (Fig. S5), which limits the room for improvement. Adaptability issues between the assimilation and forecasting models may also contribute. In future work, we will investigate improved methods for their joint optimization and application. Nevertheless, the geopotential forecasts of EXP_ASSI in the Tropics remain significantly more accurate than those of EXP_CTRL.

**Fig. 5: Comparison of global 10-day forecasts after a single assimilation.**

During training, FuXi-DA jointly optimizes analysis and forecast accuracy: it minimizes not only the analysis loss but also the loss of the forecast obtained by feeding the analysis into FuXi (see the Model training section). This training approach is one of the key advantages of FuXi-DA, and it significantly improves forecast accuracy (see Fig. 6). Compared with training on the analysis loss alone, training on both the analysis and forecast losses yields similar accuracy for 6-h forecasts but significantly higher accuracy at longer lead times. This improvement is likely due to the enhanced dynamic consistency of the initial fields resulting from forecast supervision.
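The joint objective described above can be written schematically as an analysis term plus forecast terms obtained from a frozen forecast model. The pure-Python sketch below only evaluates the loss value on flat state vectors. The number of autoregressive 6-h steps, the equal weighting of the steps, and `forecast_weight` are hypothetical choices; the actual configuration is given in the Model training section. In a real DL framework, gradients of this loss would flow through the fixed FuXi model into the FuXi-DA parameters, while FuXi's own parameters would stay frozen.

```python
def mse(a, b):
    """Mean squared error between two equal-length state vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(analysis, era5_t0, era5_targets, forecast_step, forecast_weight=1.0):
    """Analysis loss plus averaged forecast loss from a frozen forecast model.

    analysis:        state produced by the DA model (flat list)
    era5_t0:         ERA5 state at the analysis time (label for the analysis loss)
    era5_targets:    ERA5 states at successive 6-h lead times (hypothetical rollout length)
    forecast_step:   frozen one-step forecast model, state -> state
    forecast_weight: hypothetical weight balancing the two terms
    """
    loss = mse(analysis, era5_t0)
    state = analysis
    for target in era5_targets:
        state = forecast_step(state)  # autoregressive rollout starting from the analysis
        loss += forecast_weight * mse(state, target) / len(era5_targets)
    return loss
```

Setting `forecast_weight=0.0` recovers the analysis-only training that Fig. 6 uses as the comparison baseline.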

**Fig. 6: Comparison of global 10-day forecasts with and without forecast loss after a single assimilation.**

AGRI, which is equipped with three channels designed to measure water vapor, plays an important role in improving humidity forecasts. Figure 7 illustrates the spatial distribution of the improvements in relative humidity forecasts attributable to the assimilation of AGRI data. These improvements were quantified as the normalized difference of the regionally-averaged latitude-weighted RMSE computed within 5° × 5° boxes. At a lead time of 1 day, the improvements are found mainly within the observational area. As the lead time increases, the influence of the assimilation spreads over a broader area, propagating predominantly eastward with the mid-latitude westerlies. By day 7, the impact of assimilating AGRI data has become global. Although forecast performance degrades in some regions, most of these degradations are not significant, and overall the assimilation of AGRI data has a positive effect across most regions of the globe.

**Fig. 7: Visualization of spatial distribution of time-averaged normalized RMSE difference for humidity forecasts: EXP_ASSI compared with EXP_CORR.**

Interpretation

After evaluating the impact of FuXi-DA on the analysis and on forecast performance, we further investigated the extent to which FuXi-DA aligns with prior knowledge of atmospheric physics. Traditional DA methods often use single-observation experiments to verify that the impact of observations is physically reasonable^17,42,43; the rationale and methodology of these experiments are discussed in the Single-observation test section. We employed a similar approach, with modifications required by the nature of FuXi-DA. Unlike traditional methods, the FuXi-DA network cannot take a single observation as input. We therefore ran FuXi-DA twice. The first run used a 6-h forecast and the original observations as inputs; the second run used the same inputs with a perturbation introduced at one point in the original observations. The difference between the outputs of the two runs gives the analysis increment attributable to the perturbation. This approach is derived from the traditional incremental analysis formula, and its validity is substantiated in the Single-observation test section.
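Why two runs suffice can be seen from the standard linear analysis equation. The derivation below is a sketch under the assumption of a linear (or linearized) analysis with gain matrix K, background x_b, observation vector y, linear observation operator H, background and observation error covariances B and R, and a perturbation of size δ applied to observation j (e_j is the j-th unit vector). FuXi-DA itself is a nonlinear network, so the result carries over to it only approximately.

```latex
\begin{aligned}
\mathbf{x}_a &= \mathbf{x}_b + \mathbf{K}\bigl(\mathbf{y} - \mathbf{H}\mathbf{x}_b\bigr),
\qquad
\mathbf{K} = \mathbf{B}\mathbf{H}^{\mathsf{T}}\bigl(\mathbf{H}\mathbf{B}\mathbf{H}^{\mathsf{T}} + \mathbf{R}\bigr)^{-1},\\
\mathbf{x}_a' &= \mathbf{x}_b + \mathbf{K}\bigl(\mathbf{y} + \delta\,\mathbf{e}_j - \mathbf{H}\mathbf{x}_b\bigr),\\
\mathbf{x}_a' - \mathbf{x}_a &= \delta\,\mathbf{K}\mathbf{e}_j .
\end{aligned}
```

Because the background and all unperturbed observations are identical in the two runs, their contributions cancel, and the difference isolates δ times the j-th column of K: its amplitude scales with δ, and its spatial pattern is set by B, H, and R.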

We selected two observation points at 00:00 UTC on October 26, 2023, one under clear conditions and the other under cloudy conditions (see Fig. 8). The AGRI Cloud Mask (CLM) product⁴⁴ was used to distinguish between these conditions. Since relative humidity showed the most significant improvement after assimilating AGRI data, channels 9 (a high-level water vapor channel) and 11 (a low-level water vapor channel) were used for the single-observation experiments. Table 1 lists the five perturbation experiments conducted in this study. The assimilation time window is two hours, extending from one hour before to one hour after the analysis time. To investigate how the timing of a perturbation within the window affects its impact, perturbations were introduced at the beginning, middle, and end of the window. The effects of a 1 K perturbation at the clear point are presented in Fig. 9. After a 1 K perturbation is introduced to channel 9, relative humidity decreases (see Fig. 9a–c), which is consistent with radiative transfer theory: less water vapor means less absorption of the radiation emitted by the warmer layers below, and therefore a higher brightness temperature⁴⁵. For illustration, Fig. 9g shows the humidity Jacobian function for channel 9, which represents the sensitivity of the satellite-observed brightness temperature to humidity; its negative values indicate a negative correlation between brightness temperature and humidity. Further experiments examined the effect of the perturbation magnitude. A perturbation of −1 K produced positive increments of similar magnitude to the negative increments from a 1 K perturbation (see Fig. S6a–c). A 5 K perturbation produced relative humidity increments approximately five times larger than those from a 1 K perturbation (see Fig. S6d–f), with nonlinearities similar to those observed in previous work³². The R500 analysis increments resulting from a 1 K perturbation to channel 11 show patterns similar to those for channel 9 (see Fig. 9h–j). Note that the analysis increments spread from the observation point into the surrounding area, which is attributable to the spatial correlations in the background field.
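The linear scaling with perturbation size, the sign reversal, and the spreading of the increment away from the observation point can all be reproduced with a one-dimensional toy analysis. The sketch below is not FuXi-DA. It assumes a Gaussian-shaped background error covariance with a hypothetical length scale, a single observation y = h·x_j with a scalar "Jacobian" h (negative, mimicking the humidity Jacobian of a water vapor channel), and the single-observation gain K = B hᵀ(h B hᵀ + r)⁻¹. Because the toy is linear, it shows exact proportionality, whereas the increments from FuXi-DA are only approximately proportional.

```python
import math


def single_obs_increment(n_points, obs_index, innovation, jacobian=1.0,
                         sigma_b=1.0, sigma_o=1.0, length=3.0):
    """Toy 1-D analysis increment from one observation y = jacobian * x[obs_index].

    Background error covariance (hypothetical):
        B[i][j] = sigma_b**2 * exp(-(i - j)**2 / (2 * length**2))
    Increment: dx = B[:, obs_index] * jacobian / (jacobian**2 * B[j][j] + sigma_o**2) * innovation
    """
    hbh = jacobian ** 2 * sigma_b ** 2        # H B H^T (a scalar for one observation)
    gain = jacobian / (hbh + sigma_o ** 2)    # H^T (H B H^T + R)^-1
    return [
        sigma_b ** 2 * math.exp(-((i - obs_index) ** 2) / (2.0 * length ** 2)) * gain * innovation
        for i in range(n_points)
    ]
```

With `jacobian=-1.0`, a +1 K brightness-temperature innovation yields a negative humidity increment, as for channel 9 in Fig. 9a–c; the increment peaks at the observation point and decays with distance at a rate set by `length`.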

**Fig. 8: Visualization of spatial distributions of AGRI CLM product and the selected observation locations at 00:00 on October 26, 2023.**

**Table 1: Observation perturbation tests.**


**Fig. 9: The analysis increment of relative humidity resulting from introducing a perturbation of 1 K to the clear observation (pentagram).**

In theory, the vertical distribution of the relative humidity increments should correspond to the Jacobian functions of the local temperature and moisture profile at the location of the single observation (see the Jacobian section). To further demonstrate physical consistency, Fig. 9 also shows the vertical profiles of the relative humidity increment and the humidity Jacobian functions. The atmospheric variables generated by FuXi-DA are available on only 13 vertical levels, and such a sparse vertical profile is insufficient for a fast radiative transfer model to produce an accurate Jacobian. The Jacobian computed from a standard atmospheric profile was therefore used as a reference. The peak of the analysis increment caused by perturbing channel 9 and the peak of its humidity Jacobian both lie near 300 hPa (see Fig. 9d–g), whereas for channel 11 both peaks lie lower, near 500 hPa (see Fig. 9k–n). Although the increments of both channels align well with their respective Jacobian functions, this alone does not demonstrate physical consistency, because the Jacobians shown correspond to a standard atmospheric profile rather than to the background profile. However, the relative vertical offset between the channel 9 and channel 11 increments, which matches the offset between their Jacobians (one peaking high and the other low), provides strong evidence for the physical consistency of FuXi-DA. For both channels, the analysis increments remain small after a 1 K perturbation. This could be attributed to the absence of observation sparsification: adjacent observations are correlated and jointly constrain the same location in the background, so a perturbation to a single observation that is not supported by its neighbors produces only a small increment, whereas a consistent change across multiple observations would be expected to produce a larger one. Another possible reason is the high accuracy of the background, which leads the assimilation model to trust the background more. Additionally, all perturbation experiments show a notable trend: the impact of an observation increases towards the end of the assimilation window, consistent with the behavior observed in traditional 4D-Var assimilation systems⁴⁶.
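The argument that unthinned, mutually supporting observations make each individual observation's increment small can be illustrated with a scalar toy problem. The sketch assumes n observations of the same background value, each with uncorrelated error variance σo² (an idealization; the discussion above concerns correlated neighboring observations), and background error variance σb². Under these assumptions the optimal gain gives each observation a weight σb² / (σo² + n σb²), so perturbing one observation moves the analysis by that weight, while perturbing all n coherently moves it n times as far. The function name is hypothetical.

```python
def per_observation_weight(n_obs, sigma_b=1.0, sigma_o=1.0):
    """Gain given to each of n_obs observations of one scalar background value.

    Closed form of K = B H^T (H B H^T + R)^-1 with H a column of ones,
    B = sigma_b**2 and R = sigma_o**2 * I (uncorrelated observation errors).
    """
    return sigma_b ** 2 / (sigma_o ** 2 + n_obs * sigma_b ** 2)


for n in (1, 4, 16):
    w = per_observation_weight(n)
    print(f"n_obs={n:2d}: perturb one obs -> {w:.3f}, perturb all -> {n * w:.3f}")
```

With unit variances, the weight of a single perturbed observation falls from 0.500 to 0.200 and 0.059 as n grows from 1 to 4 and 16, while the response to a coherent perturbation of all observations rises toward 1.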

Traditional satellite DA methods require complex modeling to assimilate observations under cloudy conditions^15,17,18. Moreover, if a DA system lacks all-sky assimilation capability, the numerous observations affected by clouds are discarded during cloud detection. In this study, FuXi-DA assimilates AGRI observations without distinguishing between cloudy and clear-sky conditions. To assess whether the model can assimilate cloudy observations, we examined the relative humidity analysis increment induced by a 1 K perturbation applied to channels 9 and 11 under cloudy conditions (see Fig. 8). The analysis increment around the cloudy observation point is insignificant (see Fig. S7), suggesting that FuXi-DA may not efficiently assimilate cloudy observations. This is acceptable, because the strong absorption of infrared radiation by clouds makes it difficult to retrieve information from beneath the cloud layer. Furthermore, the background contains no hydrometeors (such as liquid water, ice, snow, hail, and graupel) or other particles (including those in mixed phases), which makes it difficult for FuXi-DA to relate cloudy observations to the background. Nevertheless, the different responses of FuXi-DA under clear and cloudy conditions show that it can distinguish between the two on its own, removing the need for an explicit cloud detection step. Cloud information is not directly included in FuXi-DA’s input but is learned from the data. On the one hand, the total precipitation variable in the background field implicitly carries cloud information: as shown in Fig. 8 and Fig. S8, the distribution of total precipitation closely matches that of the clouds. On the other hand, the original observations include both high-level channels (e.g., channel 9) and window channels (e.g., channel 13), and the differences between these channels provide information about clouds. Future efforts to incorporate hydrometeor variables into the background field are expected to enhance the model’s all-sky assimilation capability.
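FuXi-DA learns cloud information implicitly and does not compute any explicit cloud index. Purely as a toy illustration of why the difference between a high-level channel and a window channel carries cloud information, the sketch below compares a water vapor channel (channel 9) with a window channel (channel 13). The pixel values, the helper names, and the −10 K threshold are hypothetical.

```python
def wv_minus_window(bt_ch9: float, bt_ch13: float) -> float:
    """Brightness-temperature difference BT(ch9) - BT(ch13), in kelvin."""
    return bt_ch9 - bt_ch13


def looks_cloudy(bt_ch9: float, bt_ch13: float, threshold_k: float = -10.0) -> bool:
    """Crude high-cloud flag from the water-vapour minus window difference.

    Clear sky: the window channel sees the warm surface, so the difference is
    strongly negative. Thick high cloud: both channels see a similar cold cloud
    top, so the difference approaches zero. threshold_k is hypothetical.
    """
    return wv_minus_window(bt_ch9, bt_ch13) > threshold_k


clear = (235.0, 290.0)   # hypothetical clear-sky pixel
cloudy = (221.0, 223.0)  # hypothetical pixel under a thick, cold high cloud
print(wv_minus_window(*clear), looks_cloudy(*clear))    # -55.0 False
print(wv_minus_window(*cloudy), looks_cloudy(*cloudy))  # -2.0 True
```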

Discussion

Both numerical weather prediction (NWP) systems and recently developed deep learning-based (DL-based) weather forecasting models rely on accurate initial conditions. Satellite observations have become the primary source of observational information for DA systems, yet assimilating them is complex and computationally intensive. In this paper, we developed a scalable satellite DA framework based on DL models, named FuXi-DA, and used it to successfully assimilate data from Fengyun-4B/AGRI. Results show that assimilating AGRI data significantly improves both analysis and forecast accuracy. Additionally, FuXi-DA behaves consistently with prior knowledge of atmospheric physics. The successful assimilation of AGRI data in FuXi-DA represents a pivotal step in the transition from traditional DA methods to DL-based DA models.

FuXi-DA uses an encoder-decoder architecture to process super-observation data at 0.25° resolution, eliminating the need for sparsification and enabling the use of all available observational information. By introducing new encoder-decoder pairs, it can be readily adapted to assimilate other regular satellite observations, such as those from advanced infrared sounders, microwave sounders, and imagers¹⁵. Future research will explore the balance among gridded observation resolution, computational resources, and assimilation performance. We will also develop new techniques to handle heterogeneous and sparse observational data, including radio occultation data from Global Navigation Satellite Systems, radiosonde soundings, and aircraft meteorological reports^47,48,49, with the goal of applying FuXi-DA to real-world observing systems characterized by a wide variety and large volume of data.

FuXi-DA uses encoder networks to project both background and observational data into a unified feature space and integrates this information through a fusion network, eliminating the need for observation operators. Including the satellite zenith angle and spatiotemporal information as inputs enables FuXi-DA to account for systematic biases that depend on satellite zenith angle or spatiotemporal position. Additionally, the variation in FuXi-DA’s assimilation performance across background fields of different accuracies indicates that it effectively weights background and observational data. Single-observation perturbation experiments also demonstrate FuXi-DA’s capacity to autonomously differentiate between clear-sky and cloudy observations. However, FuXi-DA has yet to demonstrate the ability to assimilate observational data under all-sky conditions. Incorporating hydrometeors and other particles into the forecast model may extend its all-sky assimilation capabilities. To further enhance FuXi-DA’s feature extraction and integration capabilities, we plan to adopt state-of-the-art training strategies, such as contrastive learning to align disparate features⁵⁰, and to explore advanced architectures such as Transformers for feature fusion⁵¹. It is also essential to develop algorithms within DL-based assimilation systems that detect and correct sudden observational biases, such as those resulting from satellite instrument degradation or contamination during extreme events⁵², thereby mitigating their impact on the assimilation process.

AI-based DA methods rely on high-quality datasets. Accurate reanalysis data, such as ERA5 and the forthcoming ERA6, are crucial for enhancing FuXi-DA’s performance. Research on supervised learning directly from sparse observational data is also essential. For instance, radio occultation data from global navigation satellites are especially effective for constraining assimilation results in the upper atmosphere, because they remain unbiased in the upper troposphere and stratosphere⁴⁷. Additionally, long-term forecast supervision was incorporated into the training of FuXi-DA, extending the optimization window beyond the traditional assimilation window, which is an advantage over traditional DA methods. Looking ahead, specialized assimilation models can be developed for specific forecast lead times or particular weather events, such as typhoons.

In summary, this study explores the feasibility of assimilating satellite data using DL methods. FuXi-DA proves highly effective at assimilating satellite observations to improve forecast accuracy. Its strong performance in merging background and observational information, together with its simple, scalable design and fast computation, provides a robust basis for developing an end-to-end DL-based weather prediction system.

Methods

Data

ERA5 is the fifth generation of the ECMWF reanalysis dataset, produced by assimilating high-quality and abundant global observations with ECMWF’s Integrated Forecasting System (IFS). ERA5 is widely regarded as the most comprehensive and accurate reanalysis archive; we therefore use it as the ground truth for model training. ERA5 data are also used as input to the FuXi model to generate the background. In this work, we use 5 upper-air atmospheric variables at 13 pressure levels (50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa) and 5 surface variables. The upper-air variables are geopotential (Z), temperature (T), u component of wind (U), v component of wind (V), and relative humidity (R). The surface variables are 2-m temperature (T2M), 10-m u wind component (U10), 10-m v wind component (V10), mean sea level pressure (MSL), and total precipitation (TP). In total, 5 × 13 + 5 = 70 variables are used, with a spatial resolution of 0.25° and a temporal resolution of 6 h.
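As a quick check of the variable count, the sketch below enumerates the 5 × 13 upper-air channels and the 5 surface channels. The naming scheme (e.g. `Z500`) and the ordering are assumptions for illustration, not the FuXi data layout.

```python
PRESSURE_LEVELS_HPA = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]
UPPER_AIR = ["Z", "T", "U", "V", "R"]         # geopotential, temperature, u, v, relative humidity
SURFACE = ["T2M", "U10", "V10", "MSL", "TP"]


def channel_names():
    """Flattened channel list; the variable-major ordering is an illustrative assumption."""
    upper = [f"{var}{level}" for var in UPPER_AIR for level in PRESSURE_LEVELS_HPA]
    return upper + SURFACE


names = channel_names()
print(len(names))             # 70
print(names[:3], names[-2:])  # ['Z50', 'Z100', 'Z150'] ['MSL', 'TP']
```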

Fengyun-4B/AGRI has 15 channels: 3 visible (VIS), 3 near-infrared (NIR), 2 mid-wave infrared (IR), 3 water vapor, and 4 long-wave IR channels. The nadir footprint is 1 km for the VIS channels, 2 km for the NIR channels and one mid-wave IR channel, and 4 km for the remaining channels⁵³. In this study, brightness temperatures from channels 8–15 are assimilated (see Table 2) within a 2-h assimilation time window (from one hour before to one hour after the analysis time). The AGRI observation interval is 15 minutes, so the 2-h window contains 8 time frames. The 2-h window was not chosen through rigorous experiments; however, extending it to 6 h did not yield significant benefits for the assimilation results. This may be attributed to the fixed observation range of AGRI.

Table 2 Characteristics of Fengyun-4B/AGRI channels 8–15


Additionally, because satellite observation biases usually vary with scan angle and geographic location^52,54,55, the longitude, latitude, satellite zenith angle, and observation time of each observation are encoded as additional channels: the cosine of longitude, the sine of latitude, and the cosine of the satellite zenith angle. For time encoding, the day of year is normalized to the [0, 1] range, and its cosine and sine are computed as $\cos\left(\frac{\text{day of year}}{366}\times 2\pi \right)$ and $\sin\left(\frac{\text{day of year}}{366}\times 2\pi \right)$. The hour and minute of the observation time are processed in the same way, giving $\cos\left(\frac{\text{hour}\times 60+\text{minute}}{60\times 24}\times 2\pi \right)$ and $\sin\left(\frac{\text{hour}\times 60+\text{minute}}{60\times 24}\times 2\pi \right)$. The observation input therefore consists of 15 channels: 8 brightness temperature channels and 7 spatiotemporal encoding channels. The footprint of these channels is 4 km (2748 × 2748 scattered latitude-longitude points), whereas the background fields have a spatial resolution of 0.25° (721 × 1440 latitude-longitude grid points), so matching the positions of the two data types within the network poses a challenge. We therefore processed the satellite observations into “super-observations” that match the resolution of the background fields by averaging multiple observations within each background grid cell. This approach makes the observations more representative of the spatial scales resolved by the model and effectively reduces observation uncertainty^40,41. The procedure is: (1) crop a regular grid area that covers the observation area, based on the background coordinates (for AGRI data, the corresponding background size is approximately 640 × 640); (2) average the satellite observations within ±0.125° of each grid point in the cropped area to represent the observation at that grid point; and (3) mask positions in the cropped area that lack matched observations by assigning them a value of 0. After processing, the satellite observations form 640 × 640 latitude-longitude grid point data with a spatial resolution of 0.25°.
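A minimal plain-Python sketch of the spatiotemporal encoding and the super-observation averaging described above. It assumes longitude, latitude, and zenith angle are given in degrees, bins each observation to its nearest grid point on a 0.25° grid (equivalent to the ±0.125° window), and orders the seven encoding channels arbitrarily. The function names are hypothetical; this is not the FuXi-DA preprocessing code.

```python
import math


def encode_spatiotemporal(lon_deg, lat_deg, sat_zenith_deg, day_of_year, hour, minute):
    """Seven encoding channels for one observation (channel order is an assumption)."""
    day_phase = day_of_year / 366 * 2 * math.pi
    time_phase = (hour * 60 + minute) / (60 * 24) * 2 * math.pi
    return [
        math.cos(math.radians(lon_deg)),
        math.sin(math.radians(lat_deg)),
        math.cos(math.radians(sat_zenith_deg)),
        math.cos(day_phase), math.sin(day_phase),
        math.cos(time_phase), math.sin(time_phase),
    ]


def super_observations(obs, lat0, lon0, nlat, nlon, step=0.25):
    """Average scattered (lat, lon, value) observations onto a regular grid.

    Row i lies at lat0 - i*step (latitudes descending, as in ERA5) and column j at
    lon0 + j*step. Snapping each observation to its nearest grid point is the same
    as averaging everything within +/- step/2 (ties at exactly step/2 follow
    Python's round-half-to-even). Longitude wrap-around is ignored for brevity.
    Empty grid points are set to 0; the returned mask marks filled points.
    """
    sums = [[0.0] * nlon for _ in range(nlat)]
    counts = [[0] * nlon for _ in range(nlat)]
    for lat, lon, value in obs:
        i = round((lat0 - lat) / step)
        j = round((lon - lon0) / step)
        if 0 <= i < nlat and 0 <= j < nlon:  # drop observations outside the crop
            sums[i][j] += value
            counts[i][j] += 1
    grid = [[sums[i][j] / counts[i][j] if counts[i][j] else 0.0 for j in range(nlon)]
            for i in range(nlat)]
    mask = [[counts[i][j] > 0 for j in range(nlon)] for i in range(nlat)]
    return grid, mask


# 8 brightness temperatures + 7 encodings = 15 observation channels per frame
print(8 + len(encode_spatiotemporal(105.0, 0.0, 0.0, 183, 12, 0)))  # 15

obs = [(40.0, 100.0, 250.0), (39.95, 100.05, 252.0), (39.75, 100.25, 260.0)]
grid, mask = super_observations(obs, lat0=40.0, lon0=100.0, nlat=2, nlon=2)
print(grid)  # [[251.0, 0.0], [0.0, 260.0]]
print(mask)  # [[True, False], [False, True]]
```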

The training data are provided as input-output pairs: the inputs are background fields and satellite observations, and the target is ERA5 data. The available data span June 2022 to December 2023 and are divided into training, validation, and testing sets. The training set contains 1460 samples from June 2022 to May 2023, the validation set contains 244 samples from June 2023 to July 2023, and out-of-sample testing is performed on 612 samples from August 2023 to December 2023.
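The sample counts are consistent with complete 6-hourly coverage of each period, as the short check below shows. It assumes four analysis times per day; the helper name is hypothetical.

```python
from datetime import date

SAMPLES_PER_DAY = 4  # 6-hourly analyses (00, 06, 12 and 18 UTC assumed)


def n_samples(first_day: date, last_day: date) -> int:
    """Number of 6-hourly samples from first_day to last_day, both inclusive."""
    return ((last_day - first_day).days + 1) * SAMPLES_PER_DAY


train_n = n_samples(date(2022, 6, 1), date(2023, 5, 31))
val_n = n_samples(date(2023, 6, 1), date(2023, 7, 31))
test_n = n_samples(date(2023, 8, 1), date(2023, 12, 31))
print(train_n, val_n, test_n)  # 1460 244 612
```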

FuXi-DA model

In three-dimensional variational assimilation (3D-Var), the analysis can be written in the general form⁸:

$${{\bf{x}}}^{{\bf{a}}}={{\bf{x}}}^{{\bf{b}}}+{{\bf{BH}}}^{{\bf{T}}}{({{\bf{HBH}}}^{{\bf{T}}}+{\bf{R}})}^{-1}({{\bf{y}}}^{{\bf{o}}}-H[{{\bf{x}}}^{{\bf{b}}}])$$

(1)

where x^a is the analysis state vector, x^b is the background state vector, y^o is the observation vector, H is the observation operator, ${\bf{H}}=\frac{\partial H({\bf{x}})}{\partial {\bf{x}}}$ is the tangent linear operator of H, B is the background error covariance matrix, and R is the observation error covariance matrix. ${\bf{K}}={{\bf{BH}}}^{{\bf{T}}}{({{\bf{HBH}}}^{{\bf{T}}}+{\bf{R}})}^{-1}$ is usually called the gain, or weight, matrix of the analysis. Inspired by this form, FuXi-DA treats the DA process as incremental learning. As shown in Fig. 10, FuXi-DA adopts a multi-branch U-net architecture⁵⁶ to learn representations of the multi-source information x^b and y^o. The contracting path of the U-net progressively reduces the resolution of the input data to capture global context, giving the model a large receptive field for capturing long-range dependencies and learning spatial correlations such as those of the background and observation errors. The symmetric expanding path progressively increases the resolution of the feature maps to recover spatial detail, allowing the model to accurately localize and refine the learned representations. Fusion modules are applied to the feature spaces of x^b and y^o to learn the analysis increments and adjust the background information at various scales. The fusion module also follows the overall U-net architecture.

**Fig. 10: Architecture of FuXi-DA model.**

The inputs to FuXi-DA are the background field and satellite observations. The background fields have dimensions of 70 × 721 × 1440, where 70, 721, and 1440 are the number of background variables and the numbers of latitude and longitude grid points, respectively. The satellite observations have dimensions of 8 × 15 × 640 × 640, where 8, 15, 640, and 640 are the number of time frames, the number of observation variables, and the numbers of latitude and longitude grid points, respectively. The 15 observation variables comprise 8 brightness temperature channels and 7 encoding channels (one for latitude, one for longitude, one for satellite zenith angle, and four for time). Before being fed to FuXi-DA, the background input is resampled to 70 × 720 × 1440 through bilinear interpolation to facilitate downsampling within the model. The observation input is reshaped to 120 × 640 × 640 by merging the time and variable dimensions. This reshaping is needed because the current version of FuXi-DA does not model the temporal dimension explicitly and relies on 2-dimensional (2D) convolutions; temporal information is instead pre-encoded and passed to the model through the channel dimension. The model also constructs conditional information from the background and observational data to integrate the two sources: cropping the background input to the observation area and concatenating it with the observation input yields initial condition information with dimensions of 190 × 640 × 640.
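The channel bookkeeping above can be checked with a few lines of arithmetic. The frame-major merge order in `merged_channel` (a hypothetical helper) is an assumption; the text only states that the time and variable dimensions are merged.

```python
N_FRAMES = 8         # 15-min AGRI frames in the 2-h window
N_BT = 8             # brightness temperatures from channels 8-15
N_ENC = 7            # spatiotemporal encoding channels
N_BACKGROUND = 70    # background variables (5 x 13 upper-air + 5 surface)

N_OBS_VARS = N_BT + N_ENC                    # 15 variables per frame
OBS_CHANNELS = N_FRAMES * N_OBS_VARS         # 8 x 15 merged into 120 channels
COND_CHANNELS = N_BACKGROUND + OBS_CHANNELS  # cropped background + observations


def merged_channel(frame: int, var: int) -> int:
    """Channel index after merging (frame, variable) into one axis.

    Frame-major order, equivalent to reshaping an (8, 15, 640, 640) array to
    (120, 640, 640) in C order; the ordering used in FuXi-DA is an assumption.
    """
    return frame * N_OBS_VARS + var


print(OBS_CHANNELS, COND_CHANNELS)                  # 120 190
print(merged_channel(0, 0), merged_channel(7, 14))  # 0 119
```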

As shown in Fig. 10a, these multimodal inputs are fed to separate branches: the observation branch (blue), the condition branch (yellow), and the background branch (green). The background and observation branches independently learn representations of their modality-specific information, while the condition branch captures the differences and complementarities between them. The branches communicate through the fusion modules, in which multi-source information interacts within a unified feature space, thus avoiding the need for observation operators. All branches use the U-net architecture to capture context information over a large receptive field. In the background branch, a skip connection concatenates the output of the first fusion module stage with that of the last fusion module stage, strengthening the focus on local details that might otherwise be lost during downsampling.

To accurately adjust the relative weights of observations and background, fusion modules are applied at different scales using stage-by-stage corrected condition information (see Fig. 10b). Each fusion module receives information from the three branches and lets it interact through a U-net architecture with two downsampling and two upsampling stages. The fusion module produces three outputs. The first output, which implicitly contains information about analysis increments, is added to the background branch. This operation is analogous to the increment term in Equation (1), with the weight matrix learned from historical data. The second output, which carries observation bias information, is added to the observation branch. This operation is based on the traditional bias correction process, in which predictors computed from the background field are typically used for bias correction^52,54,57. The third output is passed to the subsequent network as the corrected condition information, serving as the new condition that initiates the next correction.

The downsample and upsample modules used in FuXi-DA are shown in Fig. 10c. The downsample module consists of a 2 × 2 2D convolution layer with a stride of 2, a layer normalization layer⁵⁸, a sigmoid-weighted linear unit (SiLU) activation⁵⁹, and a 3 × 3 2D convolution layer with a stride of 1. The upsample module consists of a 3 × 3 2D convolution layer with a stride of 1, a layer normalization layer, a SiLU activation, a second 3 × 3 2D convolution layer with a stride of 1, and a pixel-shuffle layer⁶⁰ with an upscaling factor of 2.
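To see how feature-map shapes evolve through these modules, the sketch below applies the standard convolution output-size formula. It assumes padding 1 for the 3 × 3 convolutions (so they preserve spatial size) and uses hypothetical channel counts; only shapes are computed, no weights.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 2D convolution along one spatial axis (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_shape(c, h, w, c_out):
    """(C, H, W) after: 2x2 conv stride 2 -> LayerNorm -> SiLU -> 3x3 conv stride 1."""
    h, w = conv2d_out(h, 2, 2), conv2d_out(w, 2, 2)        # halves H and W
    # LayerNorm and SiLU do not change the shape.
    h, w = conv2d_out(h, 3, 1, 1), conv2d_out(w, 3, 1, 1)  # padding=1 assumed: size kept
    return c_out, h, w


def upsample_shape(c, h, w, c_out, r=2):
    """(C, H, W) after: 3x3 conv -> LayerNorm -> SiLU -> 3x3 conv -> pixel shuffle(r)."""
    h, w = conv2d_out(h, 3, 1, 1), conv2d_out(w, 3, 1, 1)
    h, w = conv2d_out(h, 3, 1, 1), conv2d_out(w, 3, 1, 1)
    # The last conv must emit c_out * r^2 channels; pixel shuffle turns
    # (c_out * r^2, H, W) into (c_out, H * r, W * r).
    return c_out, h * r, w * r


shape = (64, 720, 1440)  # hypothetical 64 feature channels on the 720 x 1440 background
for _ in range(2):
    shape = downsample_shape(*shape, c_out=shape[0] * 2)
print(shape)                                 # (256, 180, 360)
print(upsample_shape(*shape, c_out=128))     # (128, 360, 720)
print(downsample_shape(120, 640, 640, 64))   # (64, 320, 320)
print(conv2d_out(721, 2, 2) * 2)             # 720
```

With 721 rows, one stride-2 downsampling yields 360 rows and a ×2 pixel shuffle restores only 720, so encoder and decoder feature maps would no longer align; this illustrates why an even size such as 720 is convenient.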

Model training

The FuXi-DA model is built with the PyTorch framework⁶¹. It was trained on 4 Nvidia A100 GPUs for 6000 iterations with a batch size of 1 per GPU, and training took about 8 h. The AdamW⁶² optimizer is used with β₁ = 0.9, β₂ = 0.999, and a weight decay coefficient of 1 × 10⁻⁵. The learning rate follows a warmup phase and then a cosine annealing schedule⁶³. During warmup, the learning rate increases according to:

$$\mathrm{lrate}=\frac{\mathrm{step\_num}-1}{\mathrm{warmup\_steps}}\times (\mathrm{stop\_lrate}-\mathrm{start\_lrate})+\mathrm{start\_lrate}$$

(2)

This increases the learning rate linearly over the first warmup_steps training steps. We used warmup_steps = 500, start_lrate = 1 × 10⁻⁸, and stop_lrate = 2 × 10⁻³. The learning rate then decreases following a cosine annealing schedule with an initial learning rate of 2 × 10⁻³ and a maximum number of iterations of 6000. We use the latitude-weighted L1 loss to minimize the errors between FuXi-DA’s outputs and ERA5, defined as follows:

$$L1\_Loss=\frac{1}{C\times H\times W}\mathop{\sum }\limits_{c=1}^{C}\mathop{\sum }\limits_{i=1}^{H}\mathop{\sum }\limits_{j=1}^{W}{\alpha }_{i}| {\widehat{{\bf{X}}}}_{c,i,j}-{{\bf{X}}}_{c,i,j}|$$

(3)

where C, H, and W are the number of channels and the numbers of latitude and longitude grid points, respectively. $\widehat{{\bf{X}}}$ is the ground truth, ${\bf{X}}$ is the model output, and ${\alpha }_{i}=H\times \cos {\Phi }_{i}/\sum _{i=1}^{H}\cos {\Phi }_{i}$ is the weight at latitude Φ_i. In practice, we jointly train FuXi-DA with the DL-based weather forecasting model FuXi using a multi-time-step loss: the output of FuXi-DA is used as the input to FuXi for long-range forecasting, and the errors between the forecasts and ERA5 are minimized. The parameters of FuXi are not updated; FuXi only propagates gradients during training to optimize the parameters of FuXi-DA. This long-range forecast supervision improves the assimilation results of FuXi-DA and its performance in long-term forecasting. The final loss function of FuXi-DA is defined as follows:

$$Total\,{\_}Loss=L1{\_}Los{s}^{0}+\frac{1}{T}\mathop{\sum}\limits_{t=1}^{T}L1{\_}\,Los{s}^{t}$$

(4)

Here T is the number of forecast time steps: for t = 0, X in L1_Loss denotes the analysis field, and for t > 0, X denotes the forecast field at step t. In this study, we set T = 10, so the loss covers the analysis field and 10 forecast fields spanning lead times of 6–60 h. This approach provides an optimization window that extends beyond the traditional assimilation window and leverages supervision from medium-range forecasts during training.
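The training schedule and objective in this section can be summarized in a plain-Python sketch: the warmup-plus-cosine learning rate (Eq. 2 followed by cosine annealing), the latitude-weighted L1 loss (Eq. 3), and the multi-time-step loss (Eq. 4). Assumptions: the cosine phase starts immediately after warmup and decays toward 0 over 6000 steps; `forecast_step` is a hypothetical single-state stand-in for FuXi; and no gradients are computed. In the actual system, FuXi-DA and FuXi would be PyTorch modules, with FuXi’s parameters frozen (for example, by setting `requires_grad` to False) so that gradients flow through FuXi into FuXi-DA.

```python
import math


# --- Learning-rate schedule: linear warmup (Eq. 2), then cosine annealing ---
def learning_rate(step, warmup_steps=500, start_lrate=1e-8, stop_lrate=2e-3,
                  cosine_steps=6000, min_lrate=0.0):
    """Learning rate at a 1-based training step.

    Assumption: the cosine phase starts right after warmup and decays from
    stop_lrate towards min_lrate (0, PyTorch's CosineAnnealingLR default).
    """
    if step <= warmup_steps:
        return (step - 1) / warmup_steps * (stop_lrate - start_lrate) + start_lrate
    k = min(step - warmup_steps - 1, cosine_steps)  # 0 on the first post-warmup step
    return min_lrate + 0.5 * (stop_lrate - min_lrate) * (1 + math.cos(math.pi * k / cosine_steps))


# --- Latitude-weighted L1 loss (Eq. 3) ---
def latitude_weights(lats_deg):
    """alpha_i = H * cos(phi_i) / sum_k cos(phi_k); the weights average to 1."""
    cosines = [math.cos(math.radians(phi)) for phi in lats_deg]
    total = sum(cosines)
    return [len(lats_deg) * c / total for c in cosines]


def weighted_l1(pred, target, lats_deg):
    """Eq. 3 for fields stored as nested lists indexed [channel][lat][lon]."""
    alpha = latitude_weights(lats_deg)
    n_c, n_h, n_w = len(pred), len(pred[0]), len(pred[0][0])
    total = 0.0
    for c in range(n_c):
        for i in range(n_h):
            for j in range(n_w):
                total += alpha[i] * abs(target[c][i][j] - pred[c][i][j])
    return total / (n_c * n_h * n_w)


# --- Multi-time-step loss (Eq. 4) ---
def total_loss(analysis, forecast_step, targets, lats_deg, n_steps=10):
    """Analysis loss plus the mean loss over n_steps autoregressive forecast steps.

    forecast_step is a hypothetical stand-in for the frozen FuXi model (one 6-h step);
    targets[t] is the ERA5 field at lead time 6*t hours, t = 0..n_steps.
    """
    loss = weighted_l1(analysis, targets[0], lats_deg)
    state, rollout = analysis, 0.0
    for t in range(1, n_steps + 1):
        state = forecast_step(state)
        rollout += weighted_l1(state, targets[t], lats_deg)
    return loss + rollout / n_steps


def persistence(state):
    """Hypothetical forecast step that returns its input unchanged."""
    return state


lats = [60.0, 0.0, -60.0]
analysis_field = [[[1.0], [1.0], [1.0]]]       # one channel, 3 latitudes, 1 longitude
era5_targets = [[[[0.0], [0.0], [0.0]]]] * 11  # toy ERA5 targets for t = 0..10

print(learning_rate(1), learning_rate(501))  # 1e-08 0.002
print(total_loss(analysis_field, persistence, era5_targets, lats))  # approximately 2.0
```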

Experimental configurations

This study designs three experimental configurations to assess the assimilation performance of FuXi-DA. In the assimilation experiment (EXP_ASSI), FuXi-DA assimilates AGRI data into the background field, generating the assimilated analysis field. In the correction experiment (EXP_CORR), a DL model is employed to correct the error between the background field and ERA5, yielding the corrected background field, which is then used as the analysis field for forecasting. In the control experiment (EXP_CTRL), the background field remains unprocessed and is directly used as the analysis field.

The EXP_CORR experiment is designed to isolate the impact of assimilating AGRI data. The FuXi model is not perfect, and the background field generated by its forecasts may contain errors that can be reduced through additional correction. Moreover, FuXi is trained on ERA5 data from 1979 to 2015, whereas the dataset used in this study spans June 2022 to December 2023. The quality of ERA5 varies over time because the types and quantity of observations it assimilates change. When FuXi is used directly to generate the background field for this study, additional errors may therefore arise because the model is not adapted to the new data. A DL model trained with the background field as input and ERA5 as the target learns the relationship between the two, so its output is closer to ERA5. Consequently, the improvements in EXP_ASSI result from both the assimilation of AGRI data and background correction, and comparing EXP_ASSI with EXP_CORR separates the effect of AGRI data assimilation from that of background correction. To ensure consistency with EXP_ASSI, the EXP_CORR model retains the fundamental architecture of FuXi-DA but includes only the background branch, without the fusion module (see Fig. S9). The EXP_CORR model is trained in the same way as FuXi-DA.

Evaluation method

The latitude-weighted root mean square error (RMSE) is used to evaluate analysis and forecast performance and is calculated as follows:

$$\,\text{RMSE}\,(c)=\sqrt{\frac{1}{H\times W}\mathop{\sum }\limits_{i=1}^{H}\mathop{\sum }\limits_{j=1}^{W}{\alpha }_{i}{({\widehat{{\bf{X}}}}_{c,i,j}-{{\bf{X}}}_{c,i,j})}^{2}}$$

(5)

where H and W are the total numbers of latitude and longitude grid points, and ${\alpha }_{i}=H\times \cos {\Phi }_{i}/\sum _{i=1}^{H}\cos {\Phi }_{i}$ is the weight at latitude $\Phi_{i}$. Similarly, the regionally averaged latitude-weighted RMSE is calculated as follows:

$$\,\text{RMSE}\,(c)=\sqrt{\frac{1}{({H}_{\max }-{H}_{\min })\times ({W}_{\max }-{W}_{\min })}\mathop{\sum }\limits_{i={H}_{\min }}^{{H}_{\max }}\mathop{\sum }\limits_{j={W}_{\min }}^{{W}_{\max }}{\widehat{\alpha }}_{i}{({\widehat{{\bf{X}}}}_{c,i,j}-{{\bf{X}}}_{c,i,j})}^{2}}$$

(6)

where ${H}_{\min }$, ${H}_{\max }$, ${W}_{\min }$ and ${W}_{\max }$ represent the range of grid point indices corresponding to the selected region. ${\widehat{\alpha }}_{i}=({H}_{\max }-{H}_{\min })\times \cos {\Phi }_{i}/\mathop{\sum }\nolimits_{i={H}_{\min }}^{{H}_{\max }}\cos {\Phi }_{i}$ is the weight at latitude Φ_i.
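A minimal sketch of Eqs. (5) and (6) for a single variable. It treats the regional index range as half-open (Python `range`), so the number of selected rows is exactly H_max − H_min, matching the normalization in Eq. (6); the weights are renormalized over the selected rows only. The function name is hypothetical.

```python
import math


def lat_weighted_rmse(pred, truth, lats_deg, rows=None, cols=None):
    """Latitude-weighted RMSE of one variable (Eq. 5), or of a sub-region (Eq. 6).

    pred/truth are [H][W] nested lists. rows/cols are optional half-open index
    ranges (range objects) selecting the region; by default the whole grid is used.
    """
    rows = range(len(truth)) if rows is None else rows
    cols = range(len(truth[0])) if cols is None else cols
    cosines = {i: math.cos(math.radians(lats_deg[i])) for i in rows}
    norm = sum(cosines.values())
    sq = 0.0
    for i in rows:
        alpha = len(rows) * cosines[i] / norm  # weights average to 1 over the region
        for j in cols:
            sq += alpha * (truth[i][j] - pred[i][j]) ** 2
    return math.sqrt(sq / (len(rows) * len(cols)))


lats = [60.0, 0.0, -60.0]
truth = [[0.0] * 4 for _ in range(3)]
pred_equator_error = [[0.0] * 4, [2.0] * 4, [0.0] * 4]
# An error confined to the equator row is up-weighted (alpha = 1.5 there).
print(lat_weighted_rmse(pred_equator_error, truth, lats))  # approximately 1.414
```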

The normalized RMSE difference between experiments A and B is calculated as (RMSE_A − RMSE_B)/RMSE_B, so negative values indicate that A has a lower error than B. We used the paired normalized RMSE differences to perform a t-test of significance. Because the normalized RMSE difference exhibits small autocorrelation and the test set is large (approximately 612 samples), this study did not apply variance inflation⁶⁴.
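The significance test can be sketched as a one-sample t statistic on the per-sample normalized differences. The RMSE values below are made up and the function names are hypothetical; for a sample as large as the test set, |t| above roughly 1.96 indicates significance at the 5% level (two-sided).

```python
import math
import statistics


def normalized_difference(rmse_a, rmse_b):
    """(RMSE_A - RMSE_B) / RMSE_B; negative values mean experiment A has lower error."""
    return (rmse_a - rmse_b) / rmse_b


def paired_t_statistic(rmses_a, rmses_b):
    """t statistic of the per-sample normalized differences against a zero mean.

    No variance inflation for autocorrelation is applied, matching the choice
    described in the text. Returns (t, mean normalized difference).
    """
    diffs = [normalized_difference(a, b) for a, b in zip(rmses_a, rmses_b)]
    mean = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean / std_err, mean


# Hypothetical per-sample RMSEs for an assimilation run (A) and a control run (B)
t, mean_diff = paired_t_statistic([0.90, 0.95, 0.92, 0.88], [1.0, 1.0, 1.0, 1.0])
print(round(mean_diff, 4), round(t, 2))
```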

Single-observation test

To verify that a new DA system works properly, the traditional approach is to run the full DA process while assimilating only one observation, which is referred to as a “single-observation test”. Here, we briefly explain this approach using 3D-Var as an example. According to Equation (1), the analysis increment can be written as:

$${{\bf{x}}}^{{\bf{a}}}-{{\bf{x}}}^{{\bf{b}}}={{\bf{BH}}}^{{\bf{T}}}{({{\bf{HBH}}}^{{\bf{T}}}+{\bf{R}})}^{-1}({{\bf{y}}}^{{\bf{o}}}-H[{{\bf{x}}}^{{\bf{b}}}])$$

(7)

For a single observation, ${({{\bf{HBH}}}^{{\bf{T}}}+{\bf{R}})}^{-1}({{\bf{y}}}^{{\bf{o}}}-H[{{\bf{x}}}^{{\bf{b}}}])$ reduces to a scalar, so the increment is proportional to ${{\bf{BH}}}^{{\bf{T}}}$. The matrix B describes how the increment is spread, and H, generally referred to as the Jacobian matrix, represents the sensitivity of the observation to the background variables. In traditional DA, y^o − H[x^b] can be calculated directly, and H can be derived from a fast radiative transfer model. The proper operation of the DA system can then be verified by checking whether the distribution of analysis increments is consistent with theory.
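To illustrate how B spreads a single-observation increment, the sketch below evaluates Eq. (7) on a hypothetical 1D grid with a Gaussian-correlated B and an observation operator that simply samples one grid point. The covariance model, its length scale, and the error variances are assumptions chosen for illustration.

```python
import math


def single_obs_increment(n, obs_index, innovation, sigma_b=1.0, length=1.5, sigma_o=1.0):
    """3D-Var increment (Eq. 7) for a single observation of grid point obs_index.

    B is a hypothetical Gaussian-correlated background-error covariance on a
    1D grid, and H simply picks out grid point obs_index, so B H^T is column
    obs_index of B and H B H^T is the scalar B[obs_index][obs_index].
    """
    B = [[sigma_b**2 * math.exp(-((i - j) ** 2) / (2 * length**2)) for j in range(n)]
         for i in range(n)]
    scalar = innovation / (B[obs_index][obs_index] + sigma_o**2)  # (H B H^T + R)^-1 d
    return [B[i][obs_index] * scalar for i in range(n)]


inc = single_obs_increment(n=7, obs_index=3, innovation=1.0)
print([round(v, 3) for v in inc])
```

The increment peaks at the observed point and decays symmetrically with the shape of the corresponding column of B, which is the pattern a single-observation test checks for.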

Because FuXi-DA has no explicit observation operator, we designed a dedicated single-observation test for it. This method involves two runs of FuXi-DA. The first run uses the background field and the original observations as inputs to obtain the first analysis field. For the second run, we perturb the original observations at a specific point and use the background field and the modified observations as inputs to obtain the second analysis field. The difference between the two analysis fields is the analysis increment caused by the perturbation. Starting from the traditional incremental analysis formula, we justify this method below.

Writing the gain matrix as K, Equation (1) simplifies to:

$${{\bf{x}}}^{{\bf{a}}}={{\bf{x}}}^{{\bf{b}}}+{\bf{K}}({{\bf{y}}}^{{\bf{o}}}-H[{{\bf{x}}}^{{\bf{b}}}])$$

(8)

If a perturbation Δy is added to y^o, the new analysis ${{\bf{x}}}^{{{\bf{a}}}^{{\prime} }}$ can be expressed as:

$${{\bf{x}}}^{{{\bf{a}}}^{{\prime} }}={{\bf{x}}}^{{\bf{b}}}+{\bf{K}}({{\bf{y}}}^{{\bf{o}}}+\Delta {\bf{y}}-H[{{\bf{x}}}^{{\bf{b}}}])$$

(9)

From equations (8) and (9), the analysis increment ${\mathbf{\Delta }}{{\bf{x}}}^{{{\bf{a}}}^{{\prime} }}$ resulting from the perturbation can be expressed as:

$${\mathbf{\Delta }}{{\bf{x}}}^{{{\bf{a}}}^{{\prime} }}={\bf{K}}[\Delta {\bf{y}}]$$

(10)

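Thus, provided that the gain matrix is fixed, the analysis increment depends only on the perturbation and not on the background or the unperturbed observations. For a nonlinear model such as FuXi-DA, this relationship holds approximately when the perturbation is small.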
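The two-run procedure can be sketched with a linear analysis that uses a fixed, hypothetical gain matrix and a hypothetical linear observation operator. Differencing the perturbed and unperturbed analyses recovers K·Δy, here the first column of K, regardless of the background or the original observations, which is the property used above.

```python
def analysis(xb, yo, gain, obs_op):
    """x^a = x^b + K (y^o - H[x^b]) with a fixed gain matrix K (Eq. 8)."""
    innovation = [y - hx for y, hx in zip(yo, obs_op(xb))]
    return [x + sum(k * d for k, d in zip(row, innovation)) for x, row in zip(xb, gain)]


def perturbation_increment(xb, yo, dy, gain, obs_op):
    """Run the analysis twice, with and without dy added to y^o, and difference them."""
    xa = analysis(xb, yo, gain, obs_op)
    xa_pert = analysis(xb, [y + d for y, d in zip(yo, dy)], gain, obs_op)
    return [p - a for a, p in zip(xa, xa_pert)]


def obs_op(x):
    """Hypothetical linear observation operator: 3 state variables -> 2 observations."""
    return [x[0] + x[1], x[2]]


K = [[0.4, 0.0], [0.3, 0.1], [0.0, 0.5]]  # hypothetical fixed 3x2 gain matrix
dy = [1.0, 0.0]                            # perturb only the first observation
inc_1 = perturbation_increment([1.0, 2.0, 3.0], [3.5, 2.0], dy, K, obs_op)
inc_2 = perturbation_increment([5.0, -1.0, 0.5], [0.0, 9.0], dy, K, obs_op)
print([round(v, 6) for v in inc_1], [round(v, 6) for v in inc_2])  # both ≈ [0.4, 0.3, 0.0]
```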

Jacobian

The radiative transfer model (RTM) RTTOV version 13.2 is used as the observation operator to simulate clear-sky radiances and calculate the Jacobian matrix. The atmospheric profile used in this study is the U.S. Standard Atmosphere. RTTOV is developed by the Satellite Application Facility on Numerical Weather Prediction (NWP SAF) of EUMETSAT. The Jacobian quantifies the sensitivity of a radiance to a physico-chemical parameter: for a channel with central wavenumber ν, it is the sensitivity of the brightness temperature to a change in a geophysical parameter X, which in our case is humidity. It is defined as:

$${{\bf{J}}}_{\nu }({\bf{X}})=\frac{\partial \,\mathrm{BT}(\nu )}{\partial {\bf{X}}}$$

(11)
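RTTOV computes Jacobians with its own dedicated K (Jacobian) model. Purely to illustrate Eq. (11), the sketch below estimates the derivative of brightness temperature with respect to each layer of a humidity profile by central finite differences, using a hypothetical toy forward model that stands in for an RTM. The model, its coefficients, and the profile are invented for this example, and finite differencing is a swapped-in technique, not what RTTOV does.

```python
import math

# Hypothetical per-layer humidity sensitivities of a toy forward model (not RTTOV).
SENSITIVITY = [0.5, 2.0, 4.0, 1.0]


def toy_brightness_temperature(q):
    """Toy stand-in for an RTM: BT (K) decreasing with the humidity q of each layer."""
    return 260.0 - sum(a * math.log1p(qk) for a, qk in zip(SENSITIVITY, q))


def finite_difference_jacobian(bt_fn, x, h=1e-4):
    """Central-difference estimate of dBT/dx_k for each element of the profile x (Eq. 11)."""
    jac = []
    for k in range(len(x)):
        up, down = list(x), list(x)
        up[k] += h
        down[k] -= h
        jac.append((bt_fn(up) - bt_fn(down)) / (2 * h))
    return jac


q = [0.1, 0.5, 2.0, 8.0]  # hypothetical humidity profile, g/kg
jac = finite_difference_jacobian(toy_brightness_temperature, q)
print([round(j, 4) for j in jac])
# Analytic values for comparison: -a_k / (1 + q_k)
print([round(-a / (1 + qk), 4) for a, qk in zip(SENSITIVITY, q)])
```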

Data availability

The ERA5 reanalysis dataset is available at https://cds.climate.copernicus.eu/; the AGRI data and AGRI CLM products are available at http://satellite.nsmc.org.cn/PortalSite/Data/DataView.aspx; the FuXi model is available at https://github.com/tpys/FuXi; and the source code employed for running FuXi-DA model is available at https://github.com/xuxiaoze/FuXi-DA.

References

Bauer, P., Thorpe, A. & Brunet, G. The quiet revolution of numerical weather prediction. Nature 525, 47–55 (2015).
Pathak, J. et al. Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv preprint arXiv:2202.11214 (2022).
Bi, K. et al. Accurate medium-range global weather forecasting with 3d neural networks. Nature 619, 533–538 (2023).
Chen, L. et al. FuXi: a cascade machine learning forecasting system for 15-day global weather forecast. npj Clim. Atmos. Sci. 6, 190 (2023).
Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
Article CAS Google Scholar
Bjerknes, V. Das problem der wettervorhersage, betrachtet vom standpunkte der mechanik und der physik. Meteor. Z. 21, 1–7 (1904).
Google Scholar
Lorenz, E. N. Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130–141 (1963).
Article Google Scholar
Lorenc, A. C. Analysis methods for numerical weather prediction. Q. J. R. Meteorol. Soc. 112, 1177–1194 (1986).
Article Google Scholar
Houtekamer, P. L. & Zhang, F. Review of the ensemble kalman filter for atmospheric data assimilation. Monthly Weather Rev. 144, 4489–4532 (2016).
Article Google Scholar
Bannister, R. N. A review of operational methods of variational and ensemble-variational data assimilation. Q. J. R. Meteorol. Soc. 143, 607–633 (2017).
Article Google Scholar
Carrassi, A., Bocquet, M., Bertino, L. & Evensen, G. Data assimilation in the geosciences: an overview of methods, issues, and perspectives. Wiley Interdiscip. Rev. Clim. Change 9, e535 (2018).
Article Google Scholar
Coopmann, O., Guidard, V., Fourrié, N., Josse, B. & Marécal, V. Update of infrared atmospheric sounding interferometer (iasi) channel selection with correlated observation errors for numerical weather prediction (nwp). Atmos. Measure. Techniques 13, 2659–2680 (2020).
Article CAS Google Scholar
Duncan, D. I., Bormann, N., Geer, A. J. & Weston, P. Superobbing and thinning scales for all-sky humidity sounder assimilation. Monthly Weather Rev. 152, 1821–1837 (2024).
Geer, A. J. et al. All-sky satellite data assimilation at operational weather forecasting centres. Q. J. R. Meteorol. Soc. 144, 1191–1217 (2018).
Article Google Scholar
Eyre, J. et al. Assimilation of satellite data in numerical weather prediction. part ii: recent years. Q. J. R. Meteorol. Soc. 148, 521–556 (2022).
Article Google Scholar
Observations used in ECMWF. https://www.ecmwf.int/en/research/data-assimilation/observations. Accessed: 2024-04-05.
Bauer, P., Geer, A. J., Lopez, P. & Salmond, D. Direct 4d-var assimilation of all-sky radiances. part i: Implementation. Q. J. R. Meteorol. Soc. 136, 1868–1885 (2010).
Article Google Scholar
Li, J. et al. Satellite all-sky infrared radiance assimilation: recent progress and future perspectives. Adv. Atmos. Sci. 39, 9–21 (2022).
Article Google Scholar
Rabier, F., Thépaut, J.-N. & Courtier, P. Extended assimilation and forecast experiments with a four-dimensional variational assimilation system. Q. J. R. Meteorol. Soc. 124, 1861–1887 (1998).
Google Scholar
Evensen, G. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. Oceans 99, 10143–10162 (1994).
Article Google Scholar
Watts, P. D. & McNally, A. P. Identification and correction of radiative transfer modelling errors for atmospheric sounders: AIRS and AMSU-A. Proceedings of the ECMWF Workshop on Assimilation of High Resolution Sounders in NWP. Reading, UK, 23–38 (2004).
Haseler, J. Early-delivery suite. (2004). https://www.ecmwf.int/node/9793.
Cheng, S. et al. Machine learning with data assimilation and uncertainty quantification for dynamical systems: a review. IEEE/CAA J. Automatica Sin. 10, 1361–1387 (2023).
Article Google Scholar
Brajard, J., Carrassi, A., Bocquet, M. & Bertino, L. Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: a case study with the lorenz 96 model. J. Comput. Sci. 44, 101171 (2020).
Article Google Scholar
Arcucci, R., Zhu, J., Hu, S. & Guo, Y.-K. Deep data assimilation: integrating deep learning with data assimilation. App. Sci. 11, 1114 (2021).
Article CAS Google Scholar
Fablet, R. et al. Learning variational data assimilation models and solvers. J. Adv. Model. Earth Syst. 13, e2021MS002572 (2021).
Article Google Scholar
Legler, S. & Janjić, T. Combining data assimilation and machine learning to estimate parameters of a convective-scale model. Quart. J. R. Meteorol. Soc. 148, 860–874 (2022).
Article Google Scholar
Li, X., Xiao, C., Cheng, A. & Lin, H. Joint estimation of parameter and state with hybrid data assimilation and machine learning. Authorea Preprints (2022).
Hatfield, S. et al. Building tangent-linear and adjoint models for data assimilation with neural networks. J. Adv. Model. Earth Syst. 13, e2021MS002521 (2021).
Article Google Scholar
Liang, X. et al. A deep-learning-based microwave radiative transfer emulator for data assimilation and remote sensing. IEEE J. Selected Topics Appl. Earth Observ. Remote Sens. 15, 8819–8833 (2022).
Article Google Scholar
Stegmann, P. G., Johnson, B., Moradi, I., Karpowicz, B. & McCarty, W. A deep learning approach to fast radiative transfer. J Quantitative Spectrosc Radiative Transfer 280, 108088 (2022).
Article CAS Google Scholar
Melinc, B. & Zaplotnik, Ž. 3d-var data assimilation using a variational autoencoder. Q. J. R. Meteorol. Soc. 150, 2273–2295 (2024).
Article Google Scholar
Xiao, Y. et al. Towards a self-contained data-driven global weather forecasting framework. Forty-first International Conference on Machine Learning. https://openreview.net/forum?id=Y2WorV5ag6 (2024).
Chen, K. et al. Towards an end-to-end artificial intelligence driven global weather forecasting system. arXiv preprint arXiv:2312.12462 (2023).
Huang, L., Gianinazzi, L., Yu, Y., Dueben, P. D. & Hoefler, T. DiffDA: a diffusion model for weather-scale data assimilation. Proceedings of the 41st International Conference on Machine Learning. 19798–19815 (Vienna, Austria, 2024).
Maddy, E. S., Boukabara, S. A. & Iturbide-Sanchez, F. Assessing the feasibility of an NWP satellite data assimilation system entirely based on AI techniques. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 17, 9828–9845 (2024).
Andersson, E. et al. Global observing system experiments on operational statistical retrievals of satellite sounding data. Mon. Weather Rev. 119, 1851–1865 (1991).
Eyre, J. R., English, S. J. & Forsythe, M. Assimilation of satellite data in numerical weather prediction. Part I: the early years. Q. J. R. Meteorol. Soc. 146, 49–68 (2020).
Geer, A. et al. The growing impact of satellite observations sensitive to humidity, cloud and precipitation. Q. J. R. Meteorol. Soc. 143, 3189–3206 (2017).
Geer, A. J. & Bauer, P. Enhanced use of all-sky microwave observations sensitive to water vapour, cloud and precipitation (ECMWF Reading, 2010).
Bell, W. et al. The assimilation of SSMIS radiances in numerical weather prediction models. IEEE Trans. Geosci. Remote Sens. 46, 884–900 (2008).
Xie, H., Han, W. & Bi, L. Assimilating FY-3D MWRI 23.8 GHz observations in the CMA-GFS 4D-Var system based on a pseudo all-sky data assimilation method. Q. J. R. Meteorol. Soc. 149, 3014–3043 (2023).
Andersson, E. et al. Use of cloud-cleared radiances in three/four-dimensional variational data assimilation. Q. J. R. Meteorol. Soc. 120, 627–653 (1994).
Min, M. et al. Developing the science product algorithm testbed for Chinese next-generation geostationary meteorological satellites: Fengyun-4 series. J. Meteorol. Res. 31, 708–719 (2017).
Sieglaff, J. M., Schmit, T. J., Menzel, W. P. & Ackerman, S. A. Inferring convective weather characteristics with geostationary high spectral resolution IR window measurements: a look into the future. J. Atmos. Ocean. Technol. 26, 1527–1541 (2009).
McNally, A. P. On the sensitivity of a 4D-Var analysis system to satellite observations located at different times within the assimilation window. Q. J. R. Meteorol. Soc. 145, 2806–2816 (2019).
Kursinski, E., Hajj, G., Schofield, J., Linfield, R. & Hardy, K. R. Observing Earth’s atmosphere with radio occultation measurements using the Global Positioning System. J. Geophys. Res. Atmos. 102, 23429–23465 (1997).
Moninger, W. R., Mamrosh, R. D. & Pauley, P. M. Automated meteorological reports from commercial aircraft. Bull. Am. Meteorol. Soc. 84, 203–216 (2003).
Ingleby, B. et al. Progress toward high-resolution, real-time radiosonde reports. Bull. Am. Meteorol. Soc. 97, 2149–2161 (2016).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. Proceedings of the 37th International Conference on Machine Learning. 1597–1607 (2020).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inform. Process. Syst. 30, 5998–6008 (2017).
Auligné, T., McNally, A. P. & Dee, D. P. Adaptive bias correction for satellite data in a numerical weather prediction system. Q. J. R. Meteorol. Soc. 133, 631–642 (2007).
Fengyun-4B/AGRI observation. http://www.nsmc.org.cn/nsmc/en/instrument/AGRI.html. Accessed: 2024-04-11.
Harris, B. & Kelly, G. A satellite radiance-bias correction scheme for data assimilation. Q. J. R. Meteorol. Soc. 127, 1453–1468 (2001).
Yin, R., Han, W., Gao, Z. & Di, D. The evaluation of FY4A’s Geostationary Interferometric Infrared Sounder (GIIRS) long-wave temperature sounding channels using the GRAPES global 4D-Var. Q. J. R. Meteorol. Soc. 146, 1459–1476 (2020).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241 (Springer, Cham, 2015).
Eyre, J. A bias correction scheme for simulated TOVS brightness temperatures. ECMWF Tech. Memo. Shinfield Park, Reading, 35 (1992).
Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. arXiv preprint arXiv:1607.06450 (2016).
Elfwing, S., Uchibe, E. & Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 107, 3–11 (2018).
Shi, W. et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1874–1883 (Las Vegas, NV, USA, 2016).
Paszke, A. et al. Automatic differentiation in PyTorch. 31st Conference on Neural Information Processing Systems (NIPS), https://openreview.net/forum?id=BJJsrmfCZ (Long Beach, CA, USA, 2017).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. International Conference on Learning Representations. https://openreview.net/forum?id=Bkg6RiCqY7 (2019).
Loshchilov, I. & Hutter, F. SGDR: stochastic gradient descent with warm restarts. International Conference on Learning Representations. https://openreview.net/forum?id=Skq89Scxx (2017).
Geer, A. J. Significance of changes in medium-range forecast scores. Tellus A: Dyn. Meteorol. Oceanogr. 68, 30229 (2016).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (U2442219, 42175082) and the National Key R&D Program of China (2022YFC3004004). We are grateful to ECMWF for providing the ERA5 reanalysis dataset, to the CMA Earth System Modeling and Prediction Centre for providing the AGRI data and AGRI CLM products, and to the Satellite Application Facility on Numerical Weather Prediction of EUMETSAT for providing RTTOV.

Author information

These authors contributed equally: Xiaoze Xu, Xiuyu Sun.

Authors and Affiliations

School of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing, China
Xiaoze Xu & Zhiqiu Gao
Shanghai Academy of Artificial Intelligence for Science, Shanghai, China
Xiaoze Xu, Xiuyu Sun, Xiaohui Zhong, Lei Chen & Hao Li
Earth System Modeling and Prediction Centre, China Meteorological Administration, Beijing, China
Xiaoze Xu & Wei Han
State Key Laboratory of Severe Weather Meteorological Science and Technology (LASW), Beijing, China
Wei Han
Artificial Intelligence Innovation and Incubation Institute, Fudan University, Shanghai, China
Xiaohui Zhong & Hao Li

Contributions

Conceptualization: W.H., H.L. and X.S. Supervision: H.L. and W.H. Methodology: H.L., W.H., X.S. and X.X. Training: X.S. and X.X. Evaluation: X.X. and X.S. Interpretation: X.X. and W.H. Model training environment: L.C. Writing–original draft: X.X., X.Z. and X.S. Writing–review and editing: X.S., X.Z., H.L., W.H., Z.G. and X.X.

Corresponding authors

Correspondence to Wei Han or Hao Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Xu, X., Sun, X., Han, W. et al. FuXi-DA: a generalized deep learning data assimilation framework for assimilating satellite observations. npj Clim Atmos Sci 8, 156 (2025). https://doi.org/10.1038/s41612-025-01039-3

Received: 05 November 2024
Accepted: 07 April 2025
Published: 26 April 2025
DOI: https://doi.org/10.1038/s41612-025-01039-3
