Main

Atmospheric aerosols have a critical role in Earth’s climate system, affecting radiative forcing, cloud microphysics and atmospheric chemistry1,2. Owing to their diverse optical and microphysical properties, combined with complex chemical compositions, aerosols influence weather and climate in various ways11,12. Key components, such as black carbon (BC) and dust, show considerable variability in terms of radiative forcing, making aerosols a major source of uncertainty in climate change assessments13,14. In addition, the complex chemical reactivity and wide particle size ranges of aerosols can degrade air quality15,16, posing health risks that include respiratory, cardiovascular and neurological diseases17. Accurate forecasting of aerosol distributions and compositions is therefore essential for improving air-quality management, protecting public health and mitigating climate change.

However, aerosol forecasting presents markedly greater complexity and cost than weather forecasting owing to the need to account for diverse aerosol sources and types, intricate chemical reactions, physical processes, and multiscale interactions with weather systems3,4. These complexities result in nonlinear and highly variable processes for aerosol generation, transport, transformation and removal, contributing substantially to forecast uncertainty18. To enable short- to medium-term aerosol forecasting, traditional physics-based forecasting systems, such as the Copernicus Atmosphere Monitoring Service (CAMS) from the European Centre for Medium-Range Weather Forecasts5 and NASA’s Goddard Earth Observing System Forward Processing (GEOS-FP)10, couple numerical weather prediction (NWP) models with atmospheric chemical transport models. These systems must simultaneously resolve atmospheric dynamics and compute thousands of aerosol-related chemical reactions and microphysical interactions, further intensifying the already high computational cost of NWP19,20. Recent advances in machine learning have opened new avenues of investigation, leading researchers to explore advanced neural networks as complementary tools for NWP21,22,23,24,25 and for downstream tasks, such as forecasting oceanic variables26,27. These neural network models have shown considerable promise in enhancing computational efficiency and accuracy in weather forecasting; however, machine-learning research specifically targeting global aerosol forecasting remains notably underdeveloped. Although recent studies have begun applying deep learning to aerosol forecasting on both global and regional scales28,29, these efforts depend largely on NWP inputs and are often restricted to single aerosol metrics such as total aerosol optical depth (AOD).
The operational integration of machine-learning models for simultaneous global-scale aerosol component and meteorological forecasting remains incomplete, particularly because of the challenges in representing coupled aerosol–weather processes, generalizing across diverse aerosol types and addressing computational constraints.

Here we present a machine-learning-driven Global Aerosol–Meteorology Forecasting System (AI-GAMFS), designed to rapidly simulate complex aerosol–meteorology interactions across spatial and temporal scales. AI-GAMFS was trained on 42 years of Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2)30 atmospheric reanalysis data. Evaluation with global Aerosol Robotic Network (AERONET)31 and Chinese Aerosol Remote Sensing Network (CARSNET)32 observations suggests improved performance of the operational AI-GAMFS, compared with the state-of-the-art CAMS and regional dust models, in forecasting AOD and dust components. Compared with GEOS-FP, it also provides improved global AOD forecasts, comparable dust forecasting skill and improved key surface aerosol component forecasts over the USA and China, with an order-of-magnitude reduction in computational cost.

AI-GAMFS

AI-GAMFS is designed to provide global 5-day aerosol–meteorology forecasts at approximately 50-km spatial resolution and 3-hourly temporal intervals (01:30, 04:30, 07:30, …, 22:30 UTC), forecasting AOD, the optical properties and surface concentrations of key aerosol components—including sulfate, dust, BC, organic carbon (OC) and sea salt (SS)—as well as surface and upper-level meteorological variables (Supplementary Table 1) that govern aerosol lifecycle dynamics. Its architecture comprises three core modules (Fig. 1a): (1) cube embedding, which extracts three-dimensional spatiotemporal features from the input feature matrix; (2) a vision transformer, which uses a multiheaded self-attention mechanism to process and understand complex relationships between features; and (3) cube unembedding, which reconstructs high-dimensional features back to the original spatial resolution using deconvolution and upsampling techniques. To ensure the accuracy and fidelity of the forecasts, skip connections are incorporated. Working synergistically, these modules accurately forecast the spatial fields of aerosol and meteorological states at the next time step, using the previous time step as input.
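As an illustration of the forecast cadence described above, the following minimal sketch lists the valid times of one 5-day, 3-hourly forecast. The function name `valid_times` and the constants are hypothetical; the initialization used (22:30 UTC on 26 July 2024) is simply one example date, and nothing here reflects the actual AI-GAMFS code.

```python
from datetime import datetime, timedelta, timezone

STEP_HOURS = 3      # output interval stated in the text
FORECAST_DAYS = 5   # forecast length stated in the text


def valid_times(init_time):
    """Return the valid time of every 3-hourly step of a 5-day forecast."""
    n_steps = FORECAST_DAYS * 24 // STEP_HOURS
    return [init_time + timedelta(hours=STEP_HOURS * k) for k in range(1, n_steps + 1)]


# Example initialization (assumed for illustration): 22:30 UTC, 26 July 2024.
init = datetime(2024, 7, 26, 22, 30, tzinfo=timezone.utc)
times = valid_times(init)
print(len(times), times[0].isoformat(), times[-1].isoformat())
```

With a 22:30 UTC initialization, every step falls on one of the eight daily timestamps (01:30, 04:30, …, 22:30 UTC), giving 40 steps over 120 h.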

Fig. 1: Architecture of the machine-learning-driven AI-GAMFS.

a, The AI-GAMFS model consists of three primary components: cube embedding, a vision transformer and cube unembedding. T, initial time; Δt, forecast lead time; C1 and C2, channels for aerosol and meteorological variables, respectively; H and W, height and width of the embedded feature tensor, respectively; A, embedded feature tensor before downsampling; A′, feature tensor after upsampling, which has the same dimensions as A. b, The temporal aggregation strategy used in AI-GAMFS for relay forecasting at specified lead times, achieved by integrating four pretrained models—3-h, 6-h, 9-h and 12-h models—each trained with identical configurations.

We trained four base models separately, with forecast lead times of 3 h, 6 h, 9 h and 12 h. Each base model contains approximately 1.2 billion parameters and was trained for 80 epochs with the same framework and settings, which took 10 days on 8 L40 graphics processing units (GPUs). To mitigate the error accumulation caused by many iterations of a single model, a temporal aggregation strategy was used to perform relay forecasting with the four base models (Fig. 1b). Once pretraining and relay connection are completed, the final AI-GAMFS model generates 5-day operational forecasts in approximately 39 s on a single L40 GPU, using real-time GEOS-FP analysis fields as input. This is approximately 360 times faster than conventional GEOS-FP forecasts (which require approximately 4–6 h).

Relay forecasting reduces accumulation errors

Building on this relay architecture designed to curb error growth, we proceeded to systematically evaluate and optimize its configuration. Using 4 pretrained base AI-GAMFS models with forecast lead times of 3 h, 6 h, 9 h and 12 h, we designed 4 progressive forecasting schemes to identify the optimal relay forecasting strategy: the 3-h single model, the 3- and 6-h relay, the 3-h, 6-h and 9-h relay, and the 3-h, 6-h, 9-h and 12-h relay. Extended Data Fig. 1a illustrates the frequency with which the 4 pretrained base models (with lead times of 3 h, 6 h, 9 h and 12 h) are invoked under these 4 forecasting strategies. Notably, for forecasts with a specific lead time, when at least two pretrained base models are used in the relay, we prioritize models with longer lead times, iteratively using their forecast results as inputs for the next forecast time step, thereby minimizing the number of iterations as much as possible.
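The longest-lead-first rule described above can be sketched as a greedy decomposition of the target lead time. This is a minimal illustration under the assumption that the relay always takes the longest available base-model step that still fits; the function name `relay_schedule` is hypothetical, and the exact scheduling used in AI-GAMFS (Extended Data Fig. 1a) may differ.

```python
def relay_schedule(lead_hours, model_leads=(3, 6, 9, 12)):
    """Greedily split a lead time into base-model steps, longest first.

    Each returned entry is the lead time (h) of the base model called at that
    step; the output of one step is the input of the next.
    """
    shortest = min(model_leads)
    if lead_hours <= 0 or lead_hours % shortest != 0:
        raise ValueError("lead time must be a positive multiple of the shortest model lead")
    steps = []
    remaining = lead_hours
    for lead in sorted(model_leads, reverse=True):
        count, remaining = divmod(remaining, lead)
        steps.extend([lead] * count)
    if remaining != 0:
        raise ValueError("lead time cannot be reached with the given base models")
    return steps


# Number of model calls needed for a 120-h forecast under each strategy.
for leads in [(3,), (3, 6), (3, 6, 9), (3, 6, 9, 12)]:
    print(leads, len(relay_schedule(120, leads)))
```

Under this greedy reading, a 120-h forecast needs 10 calls with the four-model relay, compared with 40 calls for the 3-h single model, which is the reduction in iterations that the relay strategy is meant to achieve.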

We compared the global 5-day forecasting accuracy of AI-GAMFS (initialized by MERRA-2 reanalysis and run daily at 22:30 UTC) for all 12 aerosol variables using different relay forecasting strategies, with the 2022 MERRA-2 data (test set) as a baseline. The aerosol variables include AOD, total scattering AOD (TSAOD), sulfate, dust, BC, OC and SS AOD (SUAOD, DUAOD, BCAOD, OCAOD and SSAOD, respectively), and sulfate, dust, BC, OC and SS surface mass concentration (SUSMC, DUSMC, BCSMC, OCSMC and SSSMC, respectively). Extended Data Fig. 1b,c shows time series of the global spatial correlation coefficient (R) and latitude-weighted root-mean-square error (RMSE) for these aerosol variables, respectively. The results indicate that within a 24-h forecast horizon, the performance of the 3-h, 6-h, 9-h and 12-h relay model is similar to that of the 3-h single model and the other relay models. However, for forecast lead times beyond 24 h, the 3-h, 6-h, 9-h and 12-h relay model shows superior accuracy for nearly all aerosol variables, in terms of both R and RMSE. For example, for all aerosol variables at a 120-h lead time, the average RMSE value for the 3-h, 6-h, 9-h and 12-h relay model is typically 15.1%, 5.6% and 3.2% lower than that of the 3-h single model, the 3-h and 6-h relay model, and the 3-h, 6-h and 9-h relay model, respectively. This advantage is also evident in global forecasts for various meteorological variables (Extended Data Fig. 2). The relay with 4 base models generally yields results that are comparable to or slightly better than those of the 3-h and 6-h relay and the 3-h, 6-h and 9-h relay models, and substantially better than those of the 3-h single model. However, we note that although the aggregation strategy helps alleviate short- to medium-term error accumulation, the improvement tends to plateau as the number of base models increases.
Therefore, we ultimately selected the 3-h, 6-h, 9-h and 12-h relay model strategy as the final AI-GAMFS model, which was used in all subsequent evaluations and analyses.
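The two scores used throughout the evaluation can be sketched as follows. This is a minimal illustration that assumes the convention common in machine-learning weather forecasting, in which each grid row is weighted by the cosine of its latitude (normalized to a mean of 1), and that assumes R is an unweighted Pearson correlation over all grid points; the text does not spell out either formula, and all function names are hypothetical.

```python
import math


def latitude_weights(lats_deg):
    """cos(latitude) weights, normalized so that their mean is 1."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def latitude_weighted_rmse(forecast, reference, lats_deg):
    """RMSE over a lat x lon grid, each row weighted by cos(latitude)."""
    weights = latitude_weights(lats_deg)
    total, count = 0.0, 0
    for w, f_row, r_row in zip(weights, forecast, reference):
        for f, r in zip(f_row, r_row):
            total += w * (f - r) ** 2
            count += 1
    return math.sqrt(total / count)


def spatial_correlation(forecast, reference):
    """Pearson correlation between two fields flattened over the grid."""
    f = [v for row in forecast for v in row]
    r = [v for row in reference for v in row]
    f_mean, r_mean = sum(f) / len(f), sum(r) / len(r)
    cov = sum((a - f_mean) * (b - r_mean) for a, b in zip(f, r))
    f_var = sum((a - f_mean) ** 2 for a in f)
    r_var = sum((b - r_mean) ** 2 for b in r)
    return cov / math.sqrt(f_var * r_var)


# Toy 3 x 2 grid: the forecast is wrong by 1 only in the equatorial row.
lats = [-60.0, 0.0, 60.0]
reference = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
forecast = [[1.0, 2.0], [4.0, 5.0], [5.0, 6.0]]
print(latitude_weighted_rmse(forecast, reference, lats))
print(spatial_correlation(forecast, reference))
```

The weighting compensates for the shrinking area of grid cells towards the poles on a regular latitude–longitude grid, so that errors at low latitudes count for more than errors of the same size near the poles.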

Enhanced global aerosol forecasts

Globally, AOD is one of the most widely observed atmospheric aerosol parameters and is used extensively in climate change research, air-quality monitoring and environmental assessments. As a key component of AOD, DUAOD serves as an essential metric for monitoring the global dust cycle and its impacts. This study provided comprehensive evaluation of the 5-day, 3-hourly global AOD and DUAOD forecasts generated by AI-GAMFS, initialized daily at 22:30 UTC, utilizing MERRA-2 evaluation data from 2023. The performance of operational AI-GAMFS, initialized by GEOS-FP analyses, is compared with that of CAMS (run daily at 00:00 UTC), one of the leading global aerosol forecast models, as illustrated in Fig. 2a. Meanwhile, forecasts from AI-GAMFS initialized with MERRA-2 reanalysis are also presented to evaluate the impact of initial conditions. Over the 0–120-h forecast period, operational AI-GAMFS consistently outperforms CAMS in forecasting both AOD and DUAOD, as measured by both R and RMSE. Specifically, AI-GAMFS shows a clear advantage during the 0–2-day period, improving the average R value by 11.5% and 13.8%, and reducing the average RMSE by 22.3% and 37.3%, for AOD and DUAOD relative to CAMS, respectively. However, as the forecast lead time increases, the advantage of operational AI-GAMFS diminishes. Nevertheless, at a 120-h lead time, operational AI-GAMFS still produces a lower RMSE than CAMS, with reductions of approximately 11.3% and 25.2% for AOD and DUAOD, respectively. Our analysis also shows that the impact of initial conditions on AI-GAMFS is most pronounced within the first 48 h, with little to no effect thereafter.

Fig. 2: Superior performance of AI-GAMFS in global AOD and DUAOD forecasting throughout 2023.

a,b, Global 5-day forecast accuracy for AOD and DUAOD at 3-hourly intervals from AI-GAMFS (initialized by MERRA-2 reanalysis), operational AI-GAMFS (initialized by GEOS-FP analyses) and CAMS. Performance was evaluated against MERRA-2 (a) and AERONET observations (b) using R and RMSE as metrics. For AERONET, evaluation metrics for each step were calculated by aggregating all matched samples globally. c, Spatial distribution of the average RMSE for each step of the 5-day AOD and DUAOD forecasts (a total of 40 steps) from operational AI-GAMFS, alongside the RMSE difference between CAMS and operational AI-GAMFS. Percentages in the lower-left corners of the RMSE difference panels indicate the proportion of sites where operational AI-GAMFS showed lower RMSEs than those of CAMS.

Given the differences in initial conditions between AI-GAMFS and CAMS, AI-GAMFS might benefit from using MERRA-2 as the reference data for evaluation. To ensure a fairer comparison, we additionally used Level-2.0 instantaneous global aerosol observations from AERONET in 2023 to evaluate the 5-day, 3-hourly AOD and DUAOD forecast performance of both operational AI-GAMFS and CAMS, for which DUAOD was evaluated against the AERONET coarse-mode AOD (AODc). Figure 2b presents time series of the R and RMSE values, calculated from all matched global samples for 2023, at each forecast lead time. Operational AI-GAMFS shows high forecasting skill against AERONET observations, albeit with a predictable degradation over time. Specifically, the model maintains reasonable accuracy throughout the forecast period (days 1–5), with mean AOD R ranging from 0.57 to 0.78 (RMSE 0.12 to 0.15) and DUAOD R ranging from 0.65 to 0.73 (RMSE 0.04 to 0.06). Consistent with the evaluation using MERRA-2 as the reference, operational AI-GAMFS also provides more accurate AOD and DUAOD forecasts than CAMS. Statistically, across all 40 forecast steps (3-h intervals), operational AI-GAMFS outperforms CAMS at 31 and 36 steps for R, and at 37 and 40 steps for RMSE, for AOD and DUAOD, respectively. Importantly, this consistent accuracy between the MERRA-2-driven and operational (GEOS-FP-driven) configurations affirms the reliability and effectiveness of AI-GAMFS in a real-world operational environment.

Figure 2c further illustrates the spatial distribution of the average RMSE for each step of the 5-day AOD and DUAOD forecasts (a total of 40 steps) from operational AI-GAMFS at each AERONET site, alongside the RMSE difference between CAMS and operational AI-GAMFS. Overall, for AOD, operational AI-GAMFS shows lower RMSE values than those of CAMS at 61.6% of AERONET sites, located primarily in the USA, Europe, Africa and Southeast Asia. Given that China is one of the regions with considerable aerosol loading, yet critically lacks AERONET coverage, we conducted a complementary evaluation using continuous AOD observations from 26 CARSNET sites in 2023 (Extended Data Fig. 3a,b). The results show that operational AI-GAMFS provides acceptable forecasting skill over China, with the mean R ranging from 0.44 to 0.65 and the mean RMSE ranging from 0.26 to 0.34 throughout the forecast period (days 1–5). This evaluation also confirms the robust superiority of operational AI-GAMFS compared with CAMS, achieving a higher R at 63.3% of the forecast steps and a lower RMSE at 61.5% of the sites. Moreover, for global DUAOD forecasting, operational AI-GAMFS shows a clear advantage over CAMS, achieving a lower RMSE at 86.0% of the sites worldwide (Fig. 2c). These results robustly demonstrate the superior performance of operational AI-GAMFS compared with CAMS in terms of global AOD and DUAOD forecasting.

Regional dust storms

East Asia is one of the regions affected most severely by dust storms, highlighting the critical need for accurate forecasts of such events. The operational AI-GAMFS model forecasts both DUAOD and DUSMC, thereby presenting the opportunity to assess its performance relative to several well-established physics-based dust forecasting models. For this evaluation, we used East Asia dust forecast products for 2023 derived from forecasts of CAMS and four physics-based dust forecasting models deployed at the Sand and Dust Storm Warning Advisory and Assessment System (SDS-WAS) Asian Regional Centre. These models include SILAM from the Finnish Meteorological Institute (FMI-SILAM)6, CUACE/Dust from the China Meteorological Administration (CMA-CUACE/Dust)7, MASINGAR from the Japan Meteorological Agency (JMA-MASINGAR)8 and ADAM3 from the Korea Meteorological Administration (KMA-ADAM3)9. We evaluated the 5-day forecast accuracy of DUAOD and DUSMC from operational AI-GAMFS (initialized daily at 22:30 UTC), CAMS, FMI-SILAM, CMA-CUACE/Dust and KMA-ADAM3 (initialized daily at 00:00 UTC), relative to MERRA-2 data from 2023. Because JMA-MASINGAR initializes daily at 12:00 UTC and provides 3-day forecasts, we adjusted the initialization time for AI-GAMFS to 10:30 UTC for comparison. In addition, owing to differences in forecast coverage areas and temporal resolutions across models, we conducted the evaluation only for the overlapping East Asia region (Extended Data Fig. 4a) and applied temporal interpolation to the different models.

Extended Data Fig. 4b,c shows the time series of spatial R and latitude-weighted RMSE for these models. Consistent with its global performance, operational AI-GAMFS notably outperforms all 5 physics-based dust forecast models over East Asia throughout their respective forecast periods of 72 h (JMA-MASINGAR; Extended Data Fig. 4c) and 120 h (other 4 models; Extended Data Fig. 4b). Specifically, the spatial R for DUAOD at a 72-h lead time is improved by 12.0%, 21.4%, 34.2%, 105.1% and 199.7% relative to FMI-SILAM, CAMS, JMA-MASINGAR, CMA-CUACE/Dust and KMA-ADAM3, respectively. At the 120-h lead time (that is, 5 days), the improvement of operational AI-GAMFS is 4.9%, 16.9%, 90.4% and 133.5% relative to FMI-SILAM, CAMS, CMA-CUACE/Dust and KMA-ADAM3, respectively. For DUSMC, operational AI-GAMFS has a latitude-weighted RMSE of 82.5 μg m−3 at a 72-h lead time, which is approximately 34.4%, 42.7% and 60.3% lower than that of FMI-SILAM, KMA-ADAM3 and CMA-CUACE/Dust, respectively; an even larger reduction of approximately 74.1% is observed relative to JMA-MASINGAR. This regional advantage of operational AI-GAMFS over physics-based dust models is further confirmed by 1-year records of AODc from the AERONET Beijing-CAMS site and AOD from four CARSNET sites in the northwestern desert region of China (Supplementary Figs. 1 and 2a).

Taking the mega dust storm in northern China in April 2023 as an example, we found that operational AI-GAMFS can reliably reproduce the entire dust transport process, including the affected areas and the intensity (Extended Data Fig. 5). This is further confirmed by better statistical metrics compared with those of other models. More importantly, operational AI-GAMFS not only forecasts dust transport paths within 1–2 days, but also forecasts enhanced dust emissions in the Gobi Desert up to 3–4 days in advance. Typically, it is a challenge for regional dust forecasting models to capture such features.

Aerosol component forecasting

In addition to forecasting AOD and dust-related properties, AI-GAMFS simultaneously forecasts TSAOD, the optical properties of other aerosol components (that is, sulfates, BC, OC and SS) and their surface concentrations. These component forecasts enable precise assessments of their specific impacts on climate, air quality and public health. We used conventional GEOS-FP as a reference baseline because it represents state-of-the-art atmospheric aerosol component forecasting and provides output configurations fully consistent with operational AI-GAMFS. Using MERRA-2 data collected from July to August 2024 as reference, we evaluate accuracy using the spatial R and latitude-weighted RMSE values, as shown in Fig. 3a,b. Additional metrics for surface and upper-level meteorological variables are provided in Extended Data Fig. 6.

Fig. 3: Comparison of global aerosol component forecast accuracy between operational AI-GAMFS and GEOS-FP.

a,b, Scorecards of spatial R (a) and latitude-weighted RMSE (b) for 5-day, 3-hourly global forecasts of 12 aerosol variables from operational AI-GAMFS and GEOS-FP, evaluated against MERRA-2 reanalysis during July–August 2024. c, The 5-day forecast accuracy (measured by RMSE) of operational AI-GAMFS and GEOS-FP, evaluated against independent aerosol observations from July–August 2024: AOD and DUAOD were evaluated against global AERONET observations, and BCSMC, OCSMC and SUSMC were evaluated against the IMPROVE network observations in the USA. For comparison with the daily IMPROVE data, 3-hourly forecasts were averaged to daily means.

The scorecards indicate that operational AI-GAMFS delivers exceptional forecasting performance across all 12 aerosol variables. For the first 1–3 days, operational AI-GAMFS outperforms GEOS-FP in terms of all variables and at all lead times, except for BCSMC and OCSMC at specific time points (based on the R value). At longer lead times, AI-GAMFS consistently outperforms GEOS-FP, except for two SS-related variables: SSAOD and SSSMC. Aerosol component forecasts are highly sensitive to the accuracy of weather forecasts. Although operational AI-GAMFS does not surpass GEOS-FP in terms of the accuracy of certain meteorological variables, such as wind speed, sea-level pressure and temperature, improvements in the forecast accuracy of key variables—such as specific humidity and precipitation—that influence aerosol emissions, transformation and deposition, enable AI-GAMFS to enhance its aerosol simulations (Extended Data Fig. 6). However, forecast accuracy for wind speed declines beyond 2 days, which negatively impacts the forecast of SS aerosols.

Independent evaluation beyond MERRA-2 was conducted using global aerosol observations from July–August 2024. Using ground-based observations of AOD and AODc from AERONET, AOD from CARSNET, and BCSMC, OCSMC and SUSMC from the Interagency Monitoring of Protected Visual Environments (IMPROVE) network33, the Environmental Protection Agency Chemical Speciation Network (EPA-CSN)33 and the China Atmospheric Watch Network (CAWNET)34, we compared aerosol component forecast performance between operational AI-GAMFS and GEOS-FP. Globally, operational AI-GAMFS achieves mean RMSE values of 0.11–0.16 for AOD and 0.03–0.05 for DUAOD across forecast days 1–5 (Fig. 3c). It outperforms GEOS-FP for AOD, achieving lower RMSE in 36 out of 40 forecast steps against AERONET. Over China, as evaluated by 24 CARSNET sites, the model shows RMSE values between 0.33 and 0.35 (Extended Data Fig. 3c). Despite these regionally elevated errors, it maintains an advantage over GEOS-FP, achieving lower RMSE in 20 out of 25 steps and at 58.3% of sites (Extended Data Fig. 3c,d). In contrast, the performance of global DUAOD forecasts is overall comparable to that of GEOS-FP during days 1–4, albeit weaker on day 5 (Fig. 3c). Furthermore, AOD observations from two independent CARSNET sites in dust-dominated regions provide additional evidence for the improved regional dust forecasting by operational AI-GAMFS in China (Supplementary Fig. 2b).

For surface aerosol components evaluated against the IMPROVE network across the USA, operational AI-GAMFS consistently shows lower RMSE in daily forecasts over the 5-day period compared with GEOS-FP. The RMSE ranges for BCSMC, OCSMC and SUSMC were 0.45–0.51 μg m−3, 4.6–6.7 μg m−3 and 1.0–1.3 μg m−3, representing reductions of approximately 64.4–86.2%, 74.5–88.3% and 42.2–61.0%, respectively (Fig. 3c). Improvements in BCSMC and OCSMC are most pronounced across the western USA, influenced by frequent wildfires, whereas SUSMC improvements are clustered in the eastern USA, which is strongly influenced by anthropogenic emissions (Supplementary Figs. 3–5). Similar advantages were confirmed against the EPA-CSN network (Supplementary Figs. 6–8), indicating the overall superiority of operational AI-GAMFS in forecasting wildfire-related BC and OC, as well as anthropogenic sulfate. In China, CAWNET observations reveal that operational AI-GAMFS outperforms GEOS-FP in BCSMC forecasts across all 5 days, with a higher daily R at 52.5–75.0% of sites and a lower RMSE at 62.5–72.5% of sites (Supplementary Figs. 9 and 12). Comparable superiority is also observed for OCSMC and SUSMC (Supplementary Figs. 10–12).

Tracking pollutant-type events

A distinguishing feature of data-driven forecasting is its ability to rapidly track and segregate aerosol pollution types, closely mirroring real-world patterns, at a fraction of the computational cost and time required by traditional physics-based aerosol forecasting models. Figure 4 illustrates a case study with a 3-day lead time, highlighting the performance of operational AI-GAMFS in forecasting global AOD and its 5 key components: SUAOD, DUAOD, BCAOD, OCAOD and SSAOD. A more comprehensive evaluation is provided in Extended Data Figs. 7 and 8, which include maps of the spatiotemporal evolution of additional aerosol and meteorological variables.

Fig. 4: Case study of operational medium-range global aerosol optical component forecasts.

a–c, The 3-day lead-time global forecasts for AOD and its 5 key components (SUAOD, DUAOD, BCAOD, OCAOD and SSAOD) from operational AI-GAMFS (a), GEOS-FP (b) and MERRA-2 (c), initialized at 22:30 UTC on 26 July 2024. d,e, Forecasting errors of operational AI-GAMFS (d) and GEOS-FP (e) relative to MERRA-2 reanalysis data. Overall accuracy metrics (that is, spatial R and RMSE values) for operational AI-GAMFS and GEOS-FP are also indicated in the lower-left corner of each panel in d and e.

In general, operational AI-GAMFS produces forecasts that align more closely with MERRA-2 than those of GEOS-FP do, effectively mitigating the spatial smoothing artefacts commonly introduced by longer forecast lead times. In terms of AOD forecasting, it achieves higher R and notably lower RMSE values than GEOS-FP. This improvement is consistent across forecasts of various aerosol optical components and surface concentrations (Extended Data Fig. 7). Overall, the exceptional performance of operational AI-GAMFS is attributable largely to its precise forecasting of key meteorological variables (Extended Data Fig. 8).

Saharan dust and Central African wildfires are long-established sources of global aerosol loading, and their accurate forecasting remains a major challenge. We further evaluate the performance of operational AI-GAMFS in forecasting regional dust and BC at half-day intervals, as shown in Extended Data Figs. 9 and 10. Compared with GEOS-FP, operational AI-GAMFS markedly improves the simulation of Saharan dust and Central African wildfire aerosols, as evidenced by enhanced R and reduced RMSE values. Furthermore, it successfully captures the trans-Atlantic transport of dust and smoke aerosols, underscoring its robustness in forecasting long-range aerosol transport. Notably, it also captures the spatiotemporal evolution of smoke aerosols in South America, in addition to those in Africa (Extended Data Fig. 10).

Discussion

We introduce AI-GAMFS, a data-driven system for operational global aerosol forecasting. By extracting valuable insights from 42 years of coupled aerosol–meteorology reanalysis data, it demonstrates the potential to propel operational weather forecasting towards more complex forecasts in environmental meteorology. Our results show that operational AI-GAMFS outperforms several physics-based global and regional aerosol forecasting systems, delivering improved deterministic forecasts for AOD and multiple aerosol components. Unlike physics-based models, such as GEOS-FP and CAMS running on 6- or 12-hourly cycles, AI-GAMFS generates forecasts at 3-hourly intervals, thereby enhancing timeliness and better capturing the spatiotemporal variations of aerosols.

Although operational AI-GAMFS shows considerable potential in refining global aerosol forecasting, further enhancements are attainable through several strategic improvements. First, training strategies could be strengthened by incorporating multi-time-step rolling inputs23 to improve temporal coherence and autoregressive performance. Future models should also integrate dynamic anthropogenic emission inventories25 and time-varying background fields to better capture long-term aerosol trends and mitigate the impact of shifting anthropogenic activities. Second, the current training dataset of approximately 120,000 time steps is notably smaller than that of other data-driven weather models, such as Pangu-Weather (approximately 340,000 samples)22. This limited data volume probably contributes to the model’s lower accuracy in forecasting key meteorological variables, including wind speed and temperature, compared with GEOS-FP, thereby affecting the accuracy of SS forecasts. Expanding the training dataset with higher temporal resolution records, where computationally feasible, will be critical to improving skill across aerosol components. In addition, owing to limitations in accessing GEOS-FP historical forecast data and the scarcity of surface aerosol component observations, accumulating longer time series of both observational and forecast data will be necessary to enable robust cross-seasonal evaluations of operational AI-GAMFS’s stability and forecasting capabilities. Finally, a fundamental direction for future development lies in embedding physicochemical constraints and atmospheric dynamics as inductive biases into the model architecture—shifting from purely data-driven learning towards a hybrid physics–machine-learning framework that ensures forecasts remain inherently consistent with Earth system principles.

Methods

Datasets details

MERRA-2 reanalysis

MERRA-2, developed by NASA’s Global Modeling and Assimilation Office (GMAO), is a comprehensive atmospheric reanalysis dataset that spans global atmospheric and climate conditions from 1980 to the present30,35. By assimilating satellite, ground-based and additional observational data into the Goddard Earth Observing System, version 5 (GEOS-5) Earth system model, MERRA-2 provides high-precision meteorological parameters and multilayer atmospheric profiles. With its extensive temporal coverage, high spatial resolution (approximately 50 km) and robust consistency, MERRA-2 has become an indispensable tool for climate change research, air-quality monitoring and environmental policymaking. A defining innovation of MERRA-2 is its aerosol dataset, which integrates joint meteorological and aerosol data assimilation. To our knowledge, this marks the first time aerosol radiative effects have been incorporated directly into the atmospheric model30,35, enhancing the fidelity of aerosol–meteorology interactions. MERRA-2 provides high-resolution data across multiple aerosol components—dust, sulfate, BC, OC and SS—with precise parameters such as spatial distribution, optical depth, concentration and radiative properties.

We used three subsets of the MERRA-2 time-averaged products—aerosol variables (tavg1_2d_aer_Nx), surface atmospheric variables (tavg1_2d_flx_Nx) and upper-air atmospheric variables (tavg3_3d_asm_Nv)—to train, test and evaluate AI-GAMFS, covering 44 years of data from 1980 to 2023. The dataset has spatial resolution of 0.5° × 0.625° (361 × 576 latitude–longitude grid points). For each subset, only data from timestamps corresponding to the 3-hourly overlapping periods (01:30, 04:30, 07:30, …, 22:30 UTC) were used. We focused on forecasting 12 aerosol variables: AOD, TSAOD, SUAOD, DUAOD, BCAOD, OCAOD, SSAOD, SUSMC, DUSMC, BCSMC, OCSMC and SSSMC. All aerosol optical variables are available at the wavelength of 550 nm. In addition, we forecasted 6 surface atmospheric variables and 4 upper-air atmospheric variables at 9 model levels (72, 68, 63, 60, 56, 53, 51, 48 and 45, corresponding to pressure levels of 985 hPa, 925 hPa, 850 hPa, 800 hPa, 700 hPa, 600 hPa, 525 hPa, 413 hPa and 288 hPa, respectively). Specifically, the six surface atmospheric variables are: surface specific humidity (QLML), surface air temperature (TLML), surface eastwards wind (ULML), surface northwards wind (VLML), sea-level pressure (SLP) and total precipitation (PRECTOT). The four upper-air atmospheric variables are: specific humidity (QV), air temperature (T), eastwards wind (U) and northwards wind (V). In total, we forecast and evaluated 54 variables. Detailed information on the MERRA-2 variables used in this study is provided in Supplementary Table 1.
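The total of 54 variables follows directly from the counts given above (12 aerosol, 6 surface and 4 upper-air variables at 9 levels). The short sketch below reproduces that arithmetic. The channel naming scheme (for example `QV_985hPa`) and the channel ordering are hypothetical and chosen only for illustration; they are not taken from the AI-GAMFS implementation.

```python
AEROSOL_VARS = ["AOD", "TSAOD", "SUAOD", "DUAOD", "BCAOD", "OCAOD", "SSAOD",
                "SUSMC", "DUSMC", "BCSMC", "OCSMC", "SSSMC"]
SURFACE_VARS = ["QLML", "TLML", "ULML", "VLML", "SLP", "PRECTOT"]
UPPER_AIR_VARS = ["QV", "T", "U", "V"]

# Model level -> corresponding pressure level (hPa), as listed in the text.
MODEL_LEVELS = {72: 985, 68: 925, 63: 850, 60: 800, 56: 700,
                53: 600, 51: 525, 48: 413, 45: 288}

GRID_SHAPE = (361, 576)  # latitude x longitude points at 0.5 x 0.625 degrees


def channel_names():
    """List one name per forecast channel (hypothetical ordering and naming)."""
    names = list(AEROSOL_VARS) + list(SURFACE_VARS)
    for var in UPPER_AIR_VARS:
        for pressure in MODEL_LEVELS.values():
            names.append(f"{var}_{pressure}hPa")
    return names


channels = channel_names()
print(len(channels))                                   # channels per time step
print(len(channels) * GRID_SHAPE[0] * GRID_SHAPE[1])   # values per time step
```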

GEOS-FP analyses and forecasts

GEOS-FP is a near-real-time analysis and forecasting system developed by the GMAO10. This system provides global meteorological and aerosol analyses (that is, assimilation fields) and generates 5-day (or 10-day) global forecasts, initialized daily at 00:00 UTC (12:00 UTC). It has a grid resolution of approximately 25 km (latitude 0.25°, longitude 0.3125°). GEOS-FP uses the same model configuration as MERRA-230, including the simulation of dust, sulfate, BC, OC and SS via the Goddard Chemistry Aerosol Radiation and Transport model36,37. In addition, GEOS-FP incorporates the assimilation of satellite-based bias-corrected AOD data38.

We used three subsets from the GEOS-FP time-averaged analysis and forecast products (see also Supplementary Table 1), which are consistent with the nomenclature of the MERRA-2 data, to conduct a 5-day comparison experiment between AI-GAMFS historical deterministic and operational forecasts. These subsets contain 54 target variables that fully align with the inputs and outputs of the AI-GAMFS forecasts. To evaluate the forecast performance of AI-GAMFS relative to other global and regional aerosol forecasting models, we used the historical GEOS-FP analyses and MERRA-2 reanalysis data from 22:30 UTC each day in 2023 to drive AI-GAMFS and generate daily 5-day forecasts for the entire year of 2023. In contrast, collecting historical GEOS-FP forecast data is more challenging because the GMAO archives only the most recent 2 weeks of forecast data. Consequently, we collected only the GEOS-FP analyses at 00:00 UTC and 5-day forecast data (initialized daily at 00:00 UTC) from July to August 2024 for the near-real-time operational comparison between AI-GAMFS and GEOS-FP. To drive AI-GAMFS and conduct the comparative analysis, we used bilinear interpolation to resample the GEOS-FP analysis and forecast data to match the spatial resolution of 0.5° × 0.625°.
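As a minimal sketch of the bilinear resampling step, the following NumPy function interpolates a field from a regular 0.25° × 0.3125° grid to a regular 0.5° × 0.625° grid. The grid origins (latitude from −90°, longitude from −180°), the clamping at the edges and the function name `bilinear_regrid` are assumptions made for illustration; the actual regridding tool used in this study is not specified.

```python
import numpy as np


def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2D field (lat, lon) onto a target grid.

    Source axes must be ascending; targets outside the source range are clamped.
    """
    def neighbours(src, dst):
        # Index of the lower neighbour and fractional distance to the upper one.
        idx = np.clip(np.searchsorted(src, dst, side="right") - 1, 0, len(src) - 2)
        frac = np.clip((dst - src[idx]) / (src[idx + 1] - src[idx]), 0.0, 1.0)
        return idx, frac

    i, fy = neighbours(src_lat, dst_lat)
    j, fx = neighbours(src_lon, dst_lon)
    fy, fx = fy[:, None], fx[None, :]
    f00 = field[np.ix_(i, j)]
    f01 = field[np.ix_(i, j + 1)]
    f10 = field[np.ix_(i + 1, j)]
    f11 = field[np.ix_(i + 1, j + 1)]
    return (1 - fy) * ((1 - fx) * f00 + fx * f01) + fy * ((1 - fx) * f10 + fx * f11)


# Assumed grid definitions (721 x 1152 source, 361 x 576 target).
SRC_LAT = np.linspace(-90.0, 90.0, 721)
SRC_LON = -180.0 + 0.3125 * np.arange(1152)
DST_LAT = np.linspace(-90.0, 90.0, 361)
DST_LON = -180.0 + 0.625 * np.arange(576)
```

This sketch does not wrap longitudes across the dateline; that is sufficient here only because the assumed target longitudes lie inside the source range.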

CAMS aerosol forecasts

CAMS, developed by the European Centre for Medium-Range Weather Forecasts, is one of the most advanced global aerosol forecasting systems5. It provides twice-daily forecasts of global atmospheric composition, including 5-day forecasts of AOD and DUAOD. Using data assimilation techniques, CAMS integrates prior forecasts with current satellite observations to derive optimal initial conditions. It then applies a numerical atmospheric model based on physical and chemical principles to forecast the evolution of aerosols and other atmospheric compositions over the next 5 days5,39. The spatial resolution of the CAMS aerosol forecast product at a single level is 0.4° × 0.4°, with 1-h temporal resolution.

In this study, CAMS served as the baseline for global AOD and DUAOD forecasts based on a physical model, facilitating comprehensive comparison with AI-GAMFS. We used the 5-day global AOD and DUAOD forecasts for the entire year of 2023, initialized daily at 00:00 UTC. To align with AI-GAMFS for comparison or analysis, we resampled the CAMS forecast data to match the spatial resolution of 0.5° × 0.625° and the temporal resolution of 3 h, using time interpolation and bilinear interpolation.

Physics-based dust forecasts

In this study, we used the 2023 dust forecast products from five physics-based dust forecasting models developed by various institutions and deployed in the SDS-WAS Asian Regional Centre. These products include two global models: CAMS and FMI-SILAM6, with FMI-SILAM having 1-h temporal resolution and spatial resolution of 0.2° × 0.2°. In addition, we analysed three regional models: CMA-CUACE/Dust7, with 3-h temporal resolution and a spatial resolution of 0.5° × 0.5°; JMA-MASINGAR8, with 1-h temporal resolution and a spatial resolution of 0.5° × 0.5°; and KMA-ADAM39, with 3-h temporal resolution and a spatial resolution of 0.5° × 0.5° (Supplementary Table 1). The JMA-MASINGAR model provides forecasts with a 3-day lead time, and all other models provide forecasts with a 5-day lead time. Detailed descriptions of these models can be found in their respective technical documentation5,6,7,8,9.

Owing to differences in initialization times and dust output variables across the models, we used DUAOD forecast outputs from CAMS, FMI-SILAM, CMA-CUACE/Dust, JMA-MASINGAR and KMA-ADAM3, and for DUSMC, we utilized outputs from all models except CAMS. Notably, except for JMA-MASINGAR, which initializes at 12:00 UTC, all other forecast products begin at 00:00 UTC. To facilitate comparison, we unified the spatial and temporal resolutions of all model outputs to match the AI-GAMFS spatial resolution of 0.5° × 0.625° and 3-h temporal resolution.

AERONET and CARSNET measurements

AERONET is a global aerosol observation network that provides high-quality ground-based measurements of aerosol optical properties31. The network consists of numerous automated stations equipped with sun photometers to monitor AOD and other aerosol parameters in real time. AERONET data are widely regarded as the ‘gold standard’ in atmospheric aerosol observations, serving as high-precision references for climate studies, air-quality monitoring and satellite remote-sensing validation. In this study, we used instantaneous AOD and AODc data (Version 3.0, Level 2.0)40 from all available AERONET sites worldwide during 2023 and July–August 2024. Owing to the lack of a direct method to derive DUAOD, we used AODc as a proxy41 to evaluate the DUAOD forecasts. Furthermore, AERONET does not provide AOD or AODc measurements at 550 nm; therefore, we derived AODc at 550 nm from AODc at 500 nm using the Ångström exponent. For AOD, we used the following quadratic polynomial interpolation method42,43 to convert observations at 4 adjacent wavelengths (440 nm, 500 nm, 675 nm and 870 nm) into AOD values at 550 nm:

$$\mathrm{ln}({\tau }_{\lambda })={a}_{0}+{a}_{1}\mathrm{ln}(\lambda )+{a}_{2}{[\mathrm{ln}(\lambda )]}^{2}$$
(1)

where a0, a1 and a2 represent fitting coefficients, and τλ denotes the AOD value at the respective wavelength λ.
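A minimal sketch of both wavelength conversions is given below. The quadratic fit follows equation (1) using a least-squares fit in log–log space; the Ångström conversion uses the standard power law τ ∝ λ^(−α). The function names are hypothetical, and the choice of wavelength pair used to obtain the exponent `alpha` is not specified in the text, so it is left as an input.

```python
import numpy as np


def aod_at_550(wavelengths_nm, aod):
    """Fit ln(tau) = a0 + a1*ln(lambda) + a2*ln(lambda)^2 and evaluate at 550 nm."""
    x = np.log(np.asarray(wavelengths_nm, dtype=float))
    y = np.log(np.asarray(aod, dtype=float))
    a2, a1, a0 = np.polyfit(x, y, deg=2)  # polyfit returns the highest power first
    x550 = np.log(550.0)
    return float(np.exp(a0 + a1 * x550 + a2 * x550 ** 2))


def angstrom_shift(aod_ref, alpha, ref_nm=500.0, target_nm=550.0):
    """Shift AOD from ref_nm to target_nm with the power law tau ~ lambda^(-alpha)."""
    return aod_ref * (target_nm / ref_nm) ** (-alpha)
```

For observations that follow a pure power law, the fitted a2 is close to zero and both functions give the same value at 550 nm.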

To complement the sparse distribution of AERONET sites across China, we incorporated cloud-screened instantaneous AOD observations at 550 nm (Level 2.0) from CARSNET during 2023 and July–August 2024 into the regional evaluation. CARSNET, which is a ground-based aerosol monitoring network established by the China Meteorological Administration in 200232, uses a systematic calibration protocol: field instruments are calibrated against CARSNET reference standards that are themselves regularly calibrated in coordination with the AERONET programme32,44. Consequently, CARSNET provides AOD data with accuracy comparable to that of AERONET, showing an estimated uncertainty of 0.01–0.02 (ref. 32). To ensure the accuracy of the evaluation, we averaged the AERONET or CARSNET instantaneous observations within a half-hour window before and after the forecast lead time, which served as the reference truth.

In situ aerosol component measurements

To evaluate the forecast accuracy of operational AI-GAMFS for surface aerosol components against GEOS-FP, we collected in situ measurements of aerosol chemical components over the USA and China during July–August 2024. For the USA, daily data on BCSMC, OCSMC and SUSMC were obtained from the IMPROVE network33, with additional daily OCSMC and SUSMC data sourced from the EPA-CSN network33. All datasets were screened using available data-quality flags. For China, we used quality-controlled data from the CAWNET network34, including hourly BCSMC and daily OCSMC and SUSMC measurements.

AI-GAMFS details

As illustrated in Fig. 1a, the AI-GAMFS architecture consists of three primary modules: cube embedding, a vision transformer and cube unembedding. The base model of AI-GAMFS is an autoregressive model that uses the spatial feature tensor at the previous time step (\({X}^{t-n}\)) as input to forecast the spatial feature tensor at the next time step (\({X}^{t}\)). Here t − n and t represent the previous and upcoming time steps, respectively. The base model considers time steps of 3 h, 6 h, 9 h and 12 h. Using the output of the base model as input, AI-GAMFS can generate forecasts for different lead times. For detailed descriptions of the modelling process for each module and the sensitivity analysis of key hyperparameters, see Supplementary Note 1 and Supplementary Fig. 15.

Training strategy

We used the MERRA-2 reanalysis with 3-h temporal resolution to train the AI-GAMFS model. Data from 1980 to 2021 were used for training, data from 2022 served as the test set and data from 2023 were used for validation. All input variables, except for time features, were standardized before being processed by the embedding layer, and the output from the unembedding layer was unstandardized to generate the final forecasts. The model uses a rolling training approach, where pairs of samples from two consecutive time points (\({X}^{t-n}\) and \({X}^{t}\)) are fed iteratively into the model for training.
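The standardization and sample pairing described above can be sketched as follows. This assumes one scalar mean and standard deviation per variable, computed over the training period; the exact form of the statistics (for example, per variable or per grid point) is not stated in the text, and all function names are hypothetical.

```python
import numpy as np


def fit_channel_stats(train):
    """Per-variable mean and standard deviation from data of shape (N, C, H, W)."""
    mean = train.mean(axis=(0, 2, 3)).reshape(-1, 1, 1)
    std = train.std(axis=(0, 2, 3)).reshape(-1, 1, 1)
    return mean, std


def standardize(x, mean, std):
    """Map physical values to zero-mean, unit-variance values per variable."""
    return (x - mean) / std


def unstandardize(z, mean, std):
    """Invert standardize() to recover physical units."""
    return z * std + mean


def rolling_pairs(num_times, n_steps):
    """Index pairs (input, target) separated by n_steps 3-hourly time steps."""
    return [(i, i + n_steps) for i in range(num_times - n_steps)]
```

With 3-hourly data, `n_steps` of 1, 2, 3 and 4 would correspond to the 3-h, 6-h, 9-h and 12-h models, respectively.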

For the standardized samples, the mean absolute error was used as the loss function:

$${L}_{1}=\frac{1}{C\times H\times W}\mathop{\sum }\limits_{c=1}^{C}\mathop{\sum }\limits_{i=1}^{H}\mathop{\sum }\limits_{j=1}^{W}|{\hat{X}}_{c,i,j}^{t}-{X}_{c,i,j}^{t}|$$
(2)

where C, H and W denote the number of variables, the latitudinal grid points and the longitudinal grid points, respectively; c, i and j are the indices for variables, latitude and longitude coordinates, respectively; and \({X}_{c,i,j}^{t}\) and \({\hat{X}}_{c,i,j}^{t}\) represent the ‘ground truth’ (that is, MERRA-2) and the forecasted value at the specified forecasting time, respectively.
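Equation (2) is the mean absolute error over all variables and grid points. A minimal NumPy sketch, with a hypothetical function name, is:

```python
import numpy as np


def l1_loss(pred, truth):
    """Mean absolute error over a (C, H, W) forecast and its reference field."""
    if pred.shape != truth.shape or pred.ndim != 3:
        raise ValueError("expected two arrays of identical shape (C, H, W)")
    c, h, w = pred.shape
    return float(np.abs(pred - truth).sum() / (c * h * w))
```

Because the inputs are standardized, each variable contributes on a comparable scale regardless of its physical units.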

The AI-GAMFS framework was implemented on the PyTorch platform. Each model, corresponding to a specified lead time and containing approximately 1.2 billion parameters, was trained on a server equipped with 8 L40 GPUs for 80 epochs (approximately 10 days). We used the Adam optimizer with \({\beta }_{1}\) = 0.9 and \({\beta }_{2}\) = 0.999, and an initial learning rate of \(3\times {10}^{-4}\), which was decayed using a cosine annealing schedule to 0.001 of its initial value. Training was conducted in 32-bit floating-point precision with a dropout rate of 0.15 to mitigate overfitting.
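The learning-rate schedule can be illustrated with the standard cosine annealing formula. This sketch assumes a single half-cosine decay without warm-up or restarts; whether the schedule was stepped per epoch or per iteration is not stated in the text, and the function name is hypothetical.

```python
import math


def cosine_lr(step, total_steps, lr_init=3e-4, final_ratio=1e-3):
    """Cosine-annealed learning rate decaying from lr_init to lr_init * final_ratio."""
    lr_min = lr_init * final_ratio
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 20, 40, 60, 80):
        print(epoch, cosine_lr(epoch, 80))
```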

To evaluate the temporal robustness of AI-GAMFS against potential long-term aerosol trends within MERRA-2, we conducted a stratified cross-validation experiment. The training set (1980–2021) was partitioned into seven contiguous 6-year subsets. The model was iteratively trained on six of the seven subsets and validated on the remaining withheld subset (Supplementary Fig. 13). Results from all seven validations were compared with the performance on the 2022 test set of the model trained on the full 1980–2021 period. The consistent performance observed across all validations indicates that AI-GAMFS captures underlying evolution patterns rather than temporal artefacts, supporting its reliability for extrapolative forecasting (Supplementary Fig. 14).
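The partitioning used in this experiment can be sketched as follows. The function name and the return format are illustrative assumptions; only the split into seven contiguous 6-year blocks comes from the text.

```python
def contiguous_folds(first_year=1980, last_year=2021, n_folds=7):
    """Split a range of years into contiguous blocks; hold out one block per fold."""
    years = list(range(first_year, last_year + 1))
    size, remainder = divmod(len(years), n_folds)
    if remainder:
        raise ValueError("number of years must be divisible by n_folds")
    folds = []
    for k in range(n_folds):
        held_out = years[k * size:(k + 1) * size]
        train = [y for y in years if y not in held_out]
        folds.append((train, held_out))
    return folds


if __name__ == "__main__":
    for train, held_out in contiguous_folds():
        print(f"validate on {held_out[0]}-{held_out[-1]}, train on {len(train)} years")
```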

Forecasting strategy

Similar to physics-based forecasting models, we observed that forecast errors in deep-learning models accumulate and amplify as the number of rolling iterations increases. Inspired by the temporal aggregation method from Pangu-Weather22, we adopted a relay forecasting strategy that reduces the number of model iterations without compromising the forecast time resolution. Using the same modelling framework and configurations, we trained four AI-GAMFS models with lead times of 3 h, 6 h, 9 h and 12 h, referred to as the 3-h, 6-h, 9-h and 12-h models, respectively. For forecasts with specific lead times, we prioritize the 12-h model and combine it with shorter timescale models in a relay fashion (Fig. 1b and Extended Data Fig. 1). As an example, for a lead time of 54 h, the 12-h forecast model is first invoked 4 times, followed by a single invocation of the 6-h forecast model (Extended Data Fig. 1a). Although this strategy sacrifices some computational efficiency, it takes advantage of the high-speed capabilities of GPUs, enabling the model to produce a 5-day forecast in approximately 39 s on a single L40 GPU.
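One way to express the relay strategy is a greedy, longest-step-first decomposition of the lead time. This is a sketch that reproduces the 54-h example above; the exact ordering shown in Extended Data Fig. 1 is not reproduced here, and the function name is hypothetical.

```python
def relay_schedule(lead_hours, steps=(12, 9, 6, 3)):
    """Decompose a lead time into model invocations, longest step first."""
    if lead_hours <= 0 or lead_hours % 3:
        raise ValueError("lead time must be a positive multiple of 3 h")
    plan = []
    remaining = lead_hours
    for step in sorted(steps, reverse=True):
        count, remaining = divmod(remaining, step)
        plan.extend([step] * count)
    if remaining:
        raise ValueError("lead time cannot be reached with the given steps")
    return plan


if __name__ == "__main__":
    print(relay_schedule(54))  # four 12-h steps followed by one 6-h step
```

Under this scheme, any lead time up to 5 days needs at most 11 invocations (for example, ten 12-h steps and one 9-h step for 117 h), compared with up to 40 invocations when iterating a single 3-h model.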

Evaluation experiment

To rigorously evaluate the forecasting capabilities of AI-GAMFS, we conducted a series of evaluation experiments, using MERRA-2 reanalysis and observational data as reference baselines.

AI-GAMFS relay forecast evaluation

We compared 4 AI-GAMFS model configurations on the 2022 test set, encompassing all 54 aerosol and meteorological variables. These configurations included: a 3-h single model, a 3-h and 6-h relay model, a 3-h, 6-h and 9-h relay model, and a 3-h, 6-h, 9-h and 12-h relay model. This evaluation provides insight into the optimal relay configurations for enhanced predictive performance.

AI-GAMFS versus regional dust forecasting models

We evaluated AI-GAMFS forecasts against five physics-based dust forecasting models across East Asia, using the 2023 validation dataset with MERRA-2 as the baseline. The models included in this comparison—CAMS, CMA-CUACE/Dust, FMI-SILAM, JMA-MASINGAR and KMA-ADAM3—are either specialized dust forecasting models or aerosol models with dust-specific outputs. The evaluation focused on two critical parameters: DUAOD and DUSMC, which allowed us to evaluate AI-GAMFS’s accuracy and reliability in forecasting dust storm events. We also used the full year of AODc observations for 2023 from AERONET at the Beijing-CAMS site for independent evaluation.

AI-GAMFS versus CAMS in global AOD and DUAOD forecasts

We conducted a spatial comparison of AI-GAMFS and CAMS in terms of their 5-day AOD and DUAOD forecasts on a global scale in 2023, using MERRA-2 as the baseline. In addition, the forecasts from AI-GAMFS and CAMS were further evaluated against both global AERONET observations and CARSNET observations from China throughout 2023.

Operational performance of AI-GAMFS versus GEOS-FP

AI-GAMFS is designed for real-time operational forecasting and utilizes GEOS-FP real-time analyses to generate global 5-day aerosol–meteorology forecasts. To evaluate its operational forecasting capabilities for various aerosol components and meteorological variables, we analysed GEOS-FP forecast outputs for July and August 2024. A detailed comparative assessment of AI-GAMFS and GEOS-FP was performed using MERRA-2 as the reference baseline, focusing on all 54 target aerosol and meteorological variables. In addition, the aerosol component forecasts from AI-GAMFS and GEOS-FP were evaluated further against AERONET, CARSNET, IMPROVE, EPA-CSN and CAWNET observations.

Evaluation metrics

For the site-scale evaluation, using independent observations as the reference baseline, we used two metrics: RMSE and Pearson’s R. For the spatial evaluation, using MERRA-2 as the reference baseline, we used two metrics: latitude-weighted RMSE and spatial R, defined as follows:

$$\mathrm{RMSE}(c,t)=\sqrt{\frac{\mathop{\sum }\limits_{i=1}^{{N}_{\mathrm{lat}}}\mathop{\sum }\limits_{j=1}^{{N}_{\mathrm{lon}}}{w}_{i}{({\hat{X}}_{c,i,j}^{t}-{X}_{c,i,j}^{t})}^{2}}{{N}_{\mathrm{lat}}\times {N}_{\mathrm{lon}}}}$$
(3)
$$R(c,t)=\frac{\mathop{\sum }\limits_{i=1}^{{N}_{\mathrm{lat}}}\mathop{\sum }\limits_{j=1}^{{N}_{\mathrm{lon}}}({\hat{X}}_{c,i,j}^{t}-{\overline{\hat{X}}}_{c,i,j}^{t})({X}_{c,i,j}^{t}-{\bar{X}}_{c,i,j}^{t})}{\sqrt{\mathop{\sum }\limits_{i=1}^{{N}_{\mathrm{lat}}}\mathop{\sum }\limits_{j=1}^{{N}_{\mathrm{lon}}}{({\hat{X}}_{c,i,j}^{t}-{\overline{\hat{X}}}_{c,i,j}^{t})}^{2}}\times \sqrt{\mathop{\sum }\limits_{i=1}^{{N}_{\mathrm{lat}}}\mathop{\sum }\limits_{j=1}^{{N}_{\mathrm{lon}}}{({X}_{c,i,j}^{t}-{\bar{X}}_{c,i,j}^{t})}^{2}}}$$
(4)

where the latitude weight wi is given by \({w}_{i}={N}_{\mathrm{lat}}\times \frac{\cos {\phi }_{i}}{{\sum }_{i=1}^{{N}_{\mathrm{lat}}}\cos {\phi }_{i}}\), c represents the specified variable, ϕi refers to the latitudinal value, Nlat and Nlon are the total numbers of latitudinal and longitudinal grid points, respectively, and \({\bar{\hat{X}}}_{c,i,j}^{t}\) and \({\bar{X}}_{c,i,j}^{t}\) correspond to the spatial averages over all grid points of the forecast field and the ground-truth field, respectively.
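Equations (3) and (4) can be sketched in NumPy for a single variable and lead time as follows. The function names are hypothetical; fields are assumed to be 2D arrays of shape (Nlat, Nlon) with latitudes in degrees, and the correlation is unweighted, as written in equation (4).

```python
import numpy as np


def lat_weights(lat_deg):
    """Cosine-latitude weights normalized so that their mean is 1."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    return len(w) * w / w.sum()


def lat_weighted_rmse(pred, truth, lat_deg):
    """Latitude-weighted RMSE of equation (3) for one variable and lead time."""
    w = lat_weights(lat_deg)[:, None]
    return float(np.sqrt(np.mean(w * (pred - truth) ** 2)))


def spatial_r(pred, truth):
    """Spatial correlation of equation (4) between forecast and reference fields."""
    p = pred - pred.mean()
    t = truth - truth.mean()
    return float((p * t).sum() / np.sqrt((p ** 2).sum() * (t ** 2).sum()))
```

Because the weights average to 1, a spatially uniform error of magnitude e gives a weighted RMSE of exactly e, which is a convenient sanity check.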