Abstract
Numerical weather prediction is the cornerstone of modern weather forecasting, yet its operational implementation demands vast computational resources. While artificial intelligence (AI)-based forecasting models offer a computationally efficient alternative, these purely data-driven approaches often sacrifice physical consistency. Here, we bridge physics-based and AI-based models through a novel, efficient hybrid framework that integrates a low-resolution atmospheric dynamical core with a neural operator within a multigrid architecture. This framework achieves performance comparable to that of state-of-the-art medium-range global weather forecasting models at much lower training cost, while enhancing the physical consistency that black-box models often lack. Furthermore, because training the neural network does not require gradient propagation through the dynamical core, the framework offers substantial flexibility in the choice of dynamical core and can be readily extended to a wide range of operational forecasting systems.
Introduction
Accurate and timely weather forecasting is crucial for a wide range of societal and economic activities1, such as disaster preparedness, agricultural planning, transportation management, and energy grid operation. Over the past decades, numerical weather prediction (NWP) models, grounded in the governing equations of atmospheric dynamics and thermodynamics, have continuously evolved and remain the cornerstone of operational weather prediction1,2. Despite substantial progress, NWP models inevitably contain errors, largely due to approximations in parameterization schemes, uncertainties in initial conditions, and the incompleteness of the models themselves3. Furthermore, because NWP systems require immense computational resources to operate, improving their forecasting capability involves trade-offs among key factors such as spatio-temporal resolution, the complexity of physical parameterization schemes, and the number of members in ensemble forecasts4.
In recent years, artificial intelligence (AI)-based forecasting methods have opened new possibilities for overcoming these limitations3,5,6,7,8,9. Unlike physics-based NWP methods, these data-driven models learn complex spatio-temporal patterns directly from massive historical datasets10,11. Once trained, they run orders of magnitude faster than conventional NWP systems at inference while demonstrating superior skill on various deterministic metrics. These advantages in speed and energy consumption make AI-based models highly attractive for scenarios requiring rapid updates and large-scale deployment.
Despite the success of numerous purely data-driven models, their predictions may violate physical consistency because they do not explicitly integrate fundamental physical laws12, thereby undermining forecast reliability and trustworthiness. To address this, researchers have sought to develop “physics-AI hybrid models” that aim to combine the strengths of both approaches for predictions that are both efficient and physically consistent. Key strategies include using AI to enhance specific NWP components (e.g., closure or parameterization)13,14,15,16, leveraging AI to generate initial/boundary conditions17,18,19, embedding physical knowledge into AI models20,21,22, and performing online error correction for NWP models2,23.
While existing hybrid methods have made significant progress, they still face several challenges. First, AI-enhanced NWP components often generalize poorly from offline training to online deployment24. Although online training can mitigate this issue, it demands the meticulous design of specialized interfaces between NWP and AI models25,26, and may even require rewriting NWP models to make them differentiable16,22. Second, while AI-assisted techniques for generating initial conditions or optimizing data assimilation provide valuable enhancements, the computational burden of high-resolution NWP simulations remains substantial. Third, physics-guided AI models commonly employ simplified physical equations as soft constraints, and they have been validated mainly on low-resolution datasets. Finally, although AI-based online error correction shows great potential, most studies have not demonstrated performance surpassing that of state-of-the-art (SOTA) high-resolution NWP models.
To overcome these limitations, we introduce the hybrid multigrid neural operator (HMgNO), which fuses a low-resolution dynamical core and a neural operator within a multigrid architecture to harness the distinct strengths of both paradigms. The dynamical core provides a strong physical foundation by solving the atmospheric primitive equations, while the neural operator acts as a learned corrector for systematic biases. Furthermore, the framework is designed for modularity: during training, the two components interact solely through data exchange, and no gradients are backpropagated through the dynamical core. This “plug-and-play” architecture makes it straightforward to integrate diverse dynamical cores. Experiments on medium-range global weather forecasting demonstrate that the accuracy of our framework is competitive with that of SOTA data-driven models, while significantly enhancing physical consistency and substantially reducing training costs.
The main contributions of this work are summarized as follows:
(1) We introduce a novel framework that effectively integrates an atmospheric dynamical core with a data-driven model, achieving performance comparable to SOTA methods in medium-range global weather forecasting.
(2) Compared with purely data-driven methods, our hybrid framework demonstrates enhanced physical consistency and autoregressive forecast stability.
(3) Our framework is computationally efficient, requiring substantially fewer training resources and less training time than existing SOTA data-driven and hybrid models.
Results
Deterministic evaluation
To conduct a comprehensive evaluation of the deterministic forecasting performance, we benchmark HMgNO against three advanced models: the operational NWP model HRES, the purely data-driven model Pangu-Weather5, and the SOTA hybrid model NeuralGCM16. Following the established methodology in the field27, we employ the latitude-weighted root mean square error (RMSE) and anomaly correlation coefficient (ACC) as our primary evaluation metrics. Given the forecast result \({\hat{x}}_{t,h,w}\) and its ground truth \({x}_{t,h,w}\) at the same time, the latitude-weighted RMSE and ACC for each variable are defined as follows:
where \(m\) is the climatological mean computed over 2004–2015 as a function of the day of year and the time of day. For consistency, lead times from 6 to 240 h (10 days) are evaluated. All evaluations use forecasts initialized at 00 and 12 UTC on the test set.
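The two metrics can be sketched as follows. This is a minimal sketch assuming the WeatherBench-style conventions commonly used in the field: latitude weights \(L(h)=\cos\varphi_h\) normalized to a mean of 1, RMSE computed for each initialization and then averaged over initializations, and ACC computed from anomalies with respect to the climatology \(m\) without further mean removal. The function names and the (T, H, W) array layout are illustrative assumptions, not the paper's code.

```python
import numpy as np


def latitude_weights(lat_deg):
    """L(h) = cos(lat_h), normalized so that the weights average to 1."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()


def weighted_rmse(pred, truth, lat_deg):
    """pred, truth: arrays of shape (T, H, W) for one variable and lead time.

    The latitude-weighted RMSE is computed per initialization t and then
    averaged over t (an assumed convention).
    """
    L = latitude_weights(lat_deg)[None, :, None]
    per_init = np.sqrt((L * (pred - truth) ** 2).mean(axis=(1, 2)))
    return float(per_init.mean())


def weighted_acc(pred, truth, clim, lat_deg):
    """Latitude-weighted ACC of anomalies relative to the climatology clim."""
    L = latitude_weights(lat_deg)[None, :, None]
    a_pred, a_true = pred - clim, truth - clim
    num = (L * a_pred * a_true).sum(axis=(1, 2))
    den = np.sqrt((L * a_pred**2).sum(axis=(1, 2)) * (L * a_true**2).sum(axis=(1, 2)))
    return float((num / den).mean())


# Usage on synthetic data (illustrative 1.40625-degree grid).
rng = np.random.default_rng(0)
lat = np.linspace(-89.296875, 89.296875, 128)
truth = rng.standard_normal((4, 128, 256))
clim = np.zeros((128, 256))
print(weighted_rmse(truth + 2.0, truth, lat))  # a constant offset of 2 gives an RMSE close to 2
print(weighted_acc(truth, truth, clim, lat))   # a perfect forecast gives an ACC close to 1
```

Normalizing the weights to a mean of 1 keeps the RMSE in the units of the variable, so a uniform bias of 2 yields an RMSE of 2 regardless of latitude.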
Figure 1 shows these two deterministic metrics across lead times for three surface variables (T2M, 10U, and 10V) and five upper-air variables (Z500, T850, U500, V500, and Q700). Except at short lead times, the RMSE curves of Pangu-Weather and NeuralGCM 1.4 nearly coincide. HMgNO has a slightly higher RMSE than the operational HRES at short lead times, but its error grows more slowly, yielding the smallest error at long lead times. For example, for 500 hPa geopotential (Z500) at a 5-day lead time, the RMSE of HMgNO (333.86 m² s⁻²) is higher than those of HRES (319.99), NeuralGCM (297.23), and Pangu-Weather (293.14). At a 10-day lead time, however, HMgNO has the lowest RMSE (765.06), compared with HRES (812.71), Pangu-Weather (786.64), and NeuralGCM (784.56). The same trend holds for the other upper-air variables and for the surface variables.
The figure displays the globally averaged latitude-weighted root mean square error (RMSE; first and second rows) and anomaly correlation coefficient (ACC; third and fourth rows) of the HRES (grayish lime green lines), Pangu-Weather (blue lines), NeuralGCM 1.4 (orange lines), and HMgNO (red lines) for three surface variables (T2M, 10U, and 10V) and five upper-air variables (Z500, T850, U500, V500, and Q700), using testing data from 2018. For RMSE, lower is better. For ACC, higher is better. Note that NeuralGCM 1.4 does not produce forecasts for 2-meter temperature or 10-m wind components.
The ACC quantifies the spatial correlation between the predicted and observed anomaly fields (deviations from the climatological mean); a higher ACC indicates greater forecast skill. By this metric, data-driven models show a notable advantage over traditional NWP models, especially at medium to long lead times. Among them, hybrid models stand out at long lead times: both NeuralGCM and HMgNO achieve higher ACC values beyond a one-week lead time for most variables. Z500 illustrates this well: HMgNO holds a slight lead at short lead times, but as the lead time extends, NeuralGCM gradually overtakes all other models. At long lead times, the gap between HMgNO and NeuralGCM gradually narrows, and both models outperform Pangu-Weather and HRES. We suspect that using more training data, increasing the vertical resolution, and adding specific cloud ice and specific cloud liquid water content, as in NeuralGCM, could further improve the performance of HMgNO. Furthermore, while NeuralGCM performs outstandingly for pressure-level variables, it does not provide forecasts of 2-m temperature or 10-m wind: although surface pressure is one of its prognostic variables, it neither forecasts 2-m temperature and 10-m wind directly nor provides diagnostic routines for these surface variables. In contrast, HMgNO, like other data-driven models, directly predicts these high-impact surface variables with strong skill.
Geostrophic wind balance
In mid-to-high latitudes, large-scale atmospheric flow is governed by a fundamental diagnostic relationship known as geostrophic balance, which holds when friction and acceleration are negligible28. The geostrophic wind vector, \({V}_{g}\), is defined as follows:
\[{V}_{g}=\frac{1}{f}\,{\bf{k}}\times {\bigtriangledown }_{p}\varPhi \]
where \(f=2\varOmega \sin (\varphi )\) is the Coriolis parameter, \(\varOmega\) is the angular velocity of Earth's rotation, \(\varphi\) is latitude, \(\varPhi\) is the geopotential, \({\bf{k}}\) is the vertical unit vector, and \({\bigtriangledown }_{p}\) denotes the horizontal gradient on an isobaric surface. The vector difference between the actual horizontal wind and the geostrophic wind is defined as the ageostrophic wind. Although the ageostrophic wind is usually much weaker than the geostrophic wind, it is critically important for the evolution of atmospheric motion. Whether a forecast model produces a realistic ratio of ageostrophic to geostrophic wind is therefore a key criterion for assessing the physical consistency between its predicted mass field (e.g., geopotential, temperature) and wind field12.
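In spherical coordinates, geostrophic balance gives \(u_g=-\frac{1}{fa}\frac{\partial \varPhi}{\partial \varphi}\) and \(v_g=\frac{1}{fa\cos\varphi}\frac{\partial \varPhi}{\partial \lambda}\), where \(a\) is Earth's radius and \(\lambda\) is longitude. The sketch below applies these on a regular latitude-longitude grid and forms a wind ratio. It assumes centered finite differences, an extratropical mask of |latitude| ≥ 20° (where \(f\) is not close to zero), and a ratio of area-weighted mean speeds; the paper's exact averaging is not specified here, and all function names are hypothetical.

```python
import numpy as np

OMEGA = 7.292e-5    # Earth's rotation rate [rad s^-1]
A_EARTH = 6.371e6   # mean Earth radius [m]


def geostrophic_wind(phi, lat_deg, lon_deg):
    """Geostrophic wind [m s^-1] from geopotential phi [m^2 s^-2], shape (nlat, nlon)."""
    lat = np.deg2rad(lat_deg)[:, None]
    dlon = np.deg2rad(lon_deg[1] - lon_deg[0])
    f = 2 * OMEGA * np.sin(lat)
    # Periodic centered difference in longitude.
    dphi_dlon = (np.roll(phi, -1, axis=1) - np.roll(phi, 1, axis=1)) / (2 * dlon)
    # Centered difference in latitude (one-sided at the first and last rows).
    dphi_dlat = np.gradient(phi, np.deg2rad(lat_deg), axis=0)
    u_g = -dphi_dlat / (f * A_EARTH)
    v_g = dphi_dlon / (f * A_EARTH * np.cos(lat))
    return u_g, v_g


def wind_ratio(u, v, u_g, v_g, lat_deg, min_abs_lat=20.0):
    """Area-weighted mean ageostrophic speed divided by mean geostrophic speed."""
    w = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, None], u.shape)
    mask = np.broadcast_to((np.abs(lat_deg) >= min_abs_lat)[:, None], u.shape)
    ageo = np.hypot(u - u_g, v - v_g)
    geo = np.hypot(u_g, v_g)
    return float((w * ageo)[mask].sum() / (w * geo)[mask].sum())


# Usage: a zonally symmetric geopotential that decreases toward both poles,
# which should produce westerly geostrophic winds in both hemispheres.
lat = np.linspace(-89.296875, 89.296875, 128)
lon = np.arange(256) * 1.40625
phi = 5.5e4 - 1.0e4 * np.sin(np.deg2rad(lat))[:, None] ** 2 * np.ones((1, lon.size))
u_g, v_g = geostrophic_wind(phi, lat, lon)
print(wind_ratio(1.1 * u_g, v_g, u_g, v_g, lat))  # a 10% super-geostrophic wind gives about 0.1
```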
The vertical profiles of geostrophic wind speed, ageostrophic wind speed, and the ratio of ageostrophic to geostrophic wind speed (hereafter “wind ratio”) shown in Fig. 2 were averaged over the extratropics (latitude ≥ 20°) and calculated after regridding all fields to 1.40625° resolution. As illustrated, the ERA5 wind-ratio profile has an asymmetric bow-shaped structure: (1) In the planetary boundary layer, strong friction produces large ageostrophic winds and hence a high wind ratio; (2) In the free atmosphere, where the flow is governed predominantly by the pressure gradient and Coriolis forces, the ratio decreases sharply, reaching a minimum in the mid-troposphere (~500 hPa); (3) In the upper troposphere, the ratio increases again, influenced by jet streams and other dynamical processes; (4) It decreases again around 150–100 hPa as the total wind speed and jet-related accelerations weaken; (5) Above 100 hPa, the ratio rises once more owing to atmospheric wave breaking and wave–mean-flow interactions.
The figure presents vertical profiles averaged over the extratropical regions (latitude ≥ 20°) for HRES (grayish lime green lines), Pangu-Weather (blue lines), Dinosaur (gray lines), NeuralGCM 2.8 (light orange lines), NeuralGCM 1.4 (orange lines), SMgNO (purple lines), HMgNO (red lines), and ERA5 (black lines). Values are averaged over all forecasts initialized in 2018. The first row shows geostrophic wind speed, the second row ageostrophic wind speed, and the third row the ratio of ageostrophic to geostrophic wind speed. The four columns show the results for lead times of 1, 3, 5, and 7 days, respectively.
A comparative analysis shows that the wind ratio of HRES most closely matches that of ERA5. Its geostrophic wind speed is highly consistent with ERA5 at all lead times, while its ageostrophic wind speed deviates slightly in the troposphere and lower stratosphere at longer lead times. In contrast, the purely data-driven models (Pangu-Weather and SMgNO) struggle to maintain physical consistency and long-term stability, and their wind-ratio errors grow with lead time. The hybrid models, NeuralGCM and HMgNO, reproduce vertical profiles similar to that of ERA5 and significantly outperform the purely data-driven models. Although HMgNO drifts slightly with lead time, it outperforms both NeuralGCM 1.4 and Pangu-Weather at most pressure levels for lead times up to 7 days, the exceptions being the uppermost level (50 hPa) and the near-surface levels below 850 hPa.
Case study
In conventional NWP models, accurate surface temperature prediction depends on complex, empirical boundary-layer parameterization schemes29. These schemes, which describe the intricate exchanges of energy, moisture, and momentum at the Earth–atmosphere interface, are a substantial source of model uncertainty. To determine whether HMgNO can learn the effects of these boundary processes directly from data, without conventional parameterizations, we evaluated its T2M forecasts over three different surface types. Figure 3 shows the 6-hourly evolution of T2M over the Pacific, the Amazon rainforest, and the Australian desert for the 10 days following 00 UTC on 30 September 2018. Over the Pacific, T2M varies little because of the high heat capacity of water, and all models perform comparably. Over the Amazon rainforest, T2M exhibits a pronounced and highly regular diurnal cycle. HRES captures the cycle but underestimates its amplitude, so its forecast daytime maxima fall short of those in ERA5, whereas Pangu-Weather and HMgNO reproduce the diurnal variation more accurately. Over the Australian desert, where moisture is scarce, T2M fluctuates strongly. Here, Pangu-Weather struggles to maintain accurate T2M forecasts at long lead times, and although HRES remains stable over long lead times, its errors are larger than those of HMgNO. Overall, these cases indicate that, despite lacking explicit surface-type information and boundary-layer parameterization schemes, HMgNO can learn the effects of different land surface processes from data.
The figure illustrates the temporal evolution of 2-m temperature at three locations for HRES (grayish lime green lines), Pangu-Weather (blue lines), SMgNO (purple lines), HMgNO (red lines), and ERA5 (black lines). For all cases, the input time is 00:00 UTC on 30 September 2018, and the values are plotted every 6 h.
Spectral analysis
Figure 4 shows the horizontal kinetic energy (HKE) spectra at 100, 500, and 850 hPa (the formula is given in the Supplementary Information). All model outputs were regridded to a common 1.40625° resolution, and the spectra were averaged over all forecasts initialized in 2018. HRES performs exceptionally well at all selected levels and lead times, with HKE spectra nearly overlapping those of ERA5. For the other models, the energy at large scales remains highly consistent with ERA5 early in the forecast, but as the wavenumber increases (i.e., at smaller spatial scales), they show varying degrees of energy loss. Comparing NeuralGCM 1.4 with NeuralGCM 2.8 shows that the wavenumber at which the spectrum begins to drop off increases with resolution, indicating that higher-resolution models resolve finer-scale atmospheric features. Vertically, energy attenuation is generally more severe in the upper atmosphere (100 hPa) than at lower levels. NeuralGCM 1.4 outperforms HMgNO and Pangu-Weather at 100 hPa, primarily because its training data have a higher vertical resolution in the upper atmosphere. Conversely, at 500 and 850 hPa, Pangu-Weather performs best overall within the 7-day window. We attribute this to its high-resolution (0.25°) training data and to its combination of four models trained for different lead times (1, 3, 6, and 24 h) through a greedy algorithm, which reduces the number of autoregressive iterations, and hence the cumulative error, in long-range forecasts.
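The HKE formula used for Fig. 4 is given in the Supplementary Information and is typically expressed in terms of spherical-harmonic total wavenumber. The sketch below deliberately substitutes a simpler proxy: a zonal-wavenumber kinetic energy spectrum computed with an FFT along each latitude circle and averaged with cos(latitude) weights. It illustrates how energy is distributed across scales but is not the paper's diagnostic; the function name and grid are assumptions.

```python
import numpy as np


def zonal_ke_spectrum(u, v, lat_deg):
    """Latitude-weighted zonal-wavenumber kinetic energy spectrum (a proxy).

    u, v: wind components of shape (nlat, nlon) on a regular lat-lon grid.
    Returns E(k) for zonal wavenumbers k = 0 .. nlon // 2, normalized so that
    sum_k E(k) equals the area-weighted mean of (u**2 + v**2) / 2.
    """
    nlon = u.shape[-1]
    u_hat = np.fft.rfft(u, axis=-1) / nlon
    v_hat = np.fft.rfft(v, axis=-1) / nlon
    power = np.abs(u_hat) ** 2 + np.abs(v_hat) ** 2
    # One-sided spectrum: count each k > 0 twice (for +k and -k), except the
    # Nyquist bin when nlon is even, which has no separate negative partner.
    power[:, 1:(nlon + 1) // 2] *= 2.0
    w = np.cos(np.deg2rad(lat_deg))[:, None]
    return 0.5 * (w * power).sum(axis=0) / w.sum()


# Usage: a single zonal wave with k = 5 and amplitude 3 m/s in u.
lat = np.linspace(-89.296875, 89.296875, 128)
lon = np.deg2rad(np.arange(256) * 1.40625)
u = 3.0 * np.cos(5 * lon)[None, :] * np.ones((lat.size, 1))
v = np.zeros_like(u)
E = zonal_ke_spectrum(u, v, lat)
print(int(np.argmax(E)), round(float(E[5]), 6))
```

All the energy (a mean of 3²/4 = 2.25 m² s⁻²) lands in the k = 5 bin. A forecast that smooths out small scales would show E(k) falling below the reference at large k, which is the behavior discussed for Fig. 4.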
The figure displays the horizontal kinetic energy spectra for HRES (grayish lime green lines), Pangu-Weather (blue lines), NeuralGCM 2.8 (light orange lines), NeuralGCM 1.4 (orange lines), HMgNO (red lines), and ERA5 (black lines). Values are averaged over all forecasts initialized in 2018. Columns from left to right correspond to lead times of 1, 3, 5, and 7 days. Rows from top to bottom represent the 100, 500, and 850 hPa pressure levels.
Apart from HRES, HMgNO is the model whose spectra align most closely with ERA5 at 500 and 850 hPa during the first 3 days. This indicates that HMgNO can maintain a low RMSE while resolving mesoscale dynamical features at short lead times. However, because of the inherent chaos of the atmosphere, uncertainty grows substantially at longer lead times. In such cases, maintaining a “sharp” energy spectrum tends to exacerbate the “double-penalty” effect30,31: a sharp but misplaced feature is penalized once where it is predicted and again where it actually occurs. To mitigate this, HMgNO produces somewhat smoother predictions at long lead times, trading spectral bias against RMSE. This trade-off is discussed in more detail in the Ablation study section.
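A toy one-dimensional illustration of the double-penalty effect follows. It is a hypothetical setup, not drawn from the paper: a sharp feature is equally likely to end up at x = 0 or x = 3. A forecast that commits to one sharp position is heavily penalized whenever it is wrong, whereas the blurred mean of the two outcomes halves the expected MSE while losing about half of its peak amplitude, i.e., small-scale variance.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)


def feature(center, width=1.0):
    """A sharp Gaussian 'weather feature' (e.g. a trough) centered at `center`."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)


def mse(a, b):
    return float(np.mean((a - b) ** 2))


# At a long lead time the feature is equally likely to end up at x = 0 or x = 3.
outcomes = [feature(0.0), feature(3.0)]

sharp = feature(3.0)                          # commits to a single position
smooth = 0.5 * (feature(0.0) + feature(3.0))  # blurred, lower-amplitude forecast

mse_sharp = np.mean([mse(sharp, t) for t in outcomes])
mse_smooth = np.mean([mse(smooth, t) for t in outcomes])
print(mse_sharp / mse_smooth)  # the sharp forecast's expected MSE is about twice as large
print(smooth.max())            # but the smooth forecast's peak is only about half as high
```

Minimizing a pointwise loss therefore rewards the blurred forecast, which is exactly the pressure toward smoothing that the refinement-stage loss functions try to counterbalance.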
Training costs
As summarized in Table 1, training large-scale, data-driven weather forecasting models entails substantial computational and data costs. For instance, Pangu-Weather used 39 years of high-resolution (0.25°) data, with an estimated training-data volume of approximately 60 terabytes (TB), and was trained continuously for 16 days on 192 NVIDIA V100 graphics processing units (GPUs), at an estimated cost of tens to hundreds of thousands of dollars. Although hybrid models such as NeuralGCM can perform well with lower-resolution data, their training costs remain substantial: the 2.8° version of NeuralGCM was trained for one day on 16 Google Cloud Tensor Processing Unit (TPU) v4 units, and its 1.4° version required one week of training on the same hardware. Moreover, unlike many data-driven models, NeuralGCM requires data on 37 vertical levels to interpolate accurately from the native pressure levels to its internal sigma coordinate. In contrast, HMgNO achieves comparable performance at a training cost several orders of magnitude lower than those of Pangu-Weather and NeuralGCM 1.4. Specifically, HMgNO was trained on only 12 years of data, less than one-third of the amount used by Pangu-Weather and NeuralGCM, which reduces the data storage requirement to less than 1 TB. The entire training was completed in just 1.6 days on a small setup of 4 NVIDIA GeForce RTX 4090 GPUs. This reduction in hardware scale and training time lowers training costs to tens of dollars, greatly lowering the barrier to model development.
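As a rough cross-check of the figures quoted above, the sketch below converts each training setup into accelerator-hours. Device-hours are only a crude proxy: they ignore differences between V100, TPU v4, and RTX 4090 hardware and their prices, so they should not be read as cost ratios. By this proxy, Pangu-Weather used about 480 times and NeuralGCM 1.4 about 17.5 times the device-hours of HMgNO.

```python
# Training setups quoted in the text: (number of accelerators, wall-clock days).
setups = {
    "Pangu-Weather (V100)": (192, 16.0),
    "NeuralGCM 2.8 (TPU v4)": (16, 1.0),
    "NeuralGCM 1.4 (TPU v4)": (16, 7.0),   # "one week"
    "HMgNO (RTX 4090)": (4, 1.6),
}

device_hours = {name: n * days * 24 for name, (n, days) in setups.items()}
baseline = device_hours["HMgNO (RTX 4090)"]

for name, hours in device_hours.items():
    print(f"{name:24s} {hours:9.1f} device-hours ({hours / baseline:6.1f}x HMgNO)")

# Training-data length: 12 years for HMgNO versus 39 years for Pangu-Weather.
data_fraction = 12 / 39
print(f"data fraction: {data_fraction:.2f}")  # below one-third
```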
Ablation study
To systematically evaluate the contributions of the dynamical core and the data-driven model within our hybrid architecture, we compared the spherical multigrid neural operator (SMgNO)32, Dinosaur16, and HMgNO across various metrics. As shown in Fig. S1 and Fig. 2, Dinosaur T63 exhibits large forecast errors and limited skill because of its low resolution and its lack of parameterizations for unresolved scales. However, as a physics-based model, its primary strength lies in preventing the non-physical accumulation of energy or momentum imbalances, thereby generating atmospheric flows that remain stable over long integrations. In contrast, the purely data-driven SMgNO performs remarkably well on deterministic metrics, achieving RMSE and ACC comparable to those of HRES. Nevertheless, it struggles to maintain physical consistency and realistic long-term atmospheric dynamics: as illustrated in Fig. 2, its geopotential and wind fields gradually decouple with increasing lead time, and its mean wind speed progressively deviates from actual conditions. Integrating Dinosaur and SMgNO within the HMgNO framework leverages their individual strengths while mitigating their respective limitations. HMgNO not only yields more accurate forecasts on deterministic metrics (Fig. S1) but also shows substantially enhanced physical consistency (Fig. 2). Crucially, these improvements come with only a marginal increase in computational overhead relative to SMgNO (Table 1). In other words, HMgNO strikes a favorable balance among forecast performance, physical consistency, and resource utilization.
Furthermore, to investigate how the third training stage contributes to enhancing high-frequency information, we trained models with different loss functions in this stage while keeping the pre-training and fine-tuning stages identical. Figures S2 and S3 show the HKE spectra and RMSE of HMgNO trained with these loss functions. “HMgNO wo refinement” denotes the baseline model trained with only the first two stages, without the refinement stage. “HMgNO AMSE”, “HMgNO AMAE”, and “HMgNO” denote models trained in the third stage with the adjusted mean squared error (AMSE)33, the adjusted mean absolute error (AMAE) with \(p=0\) and \(w=1\), and the hybrid spectral error (HSE), respectively. The formulas for these loss functions are provided in the experiment settings section. Comparing these models reveals distinct trade-offs. The HKE spectra of HMgNO AMAE align almost perfectly with those of ERA5 and do not degrade as the lead time increases, but this model has the highest RMSE. Conversely, HMgNO wo refinement achieves the best RMSE and ACC but smooths out mesoscale information, an effect that worsens with increasing lead time. HMgNO AMSE and HMgNO lie between these two extremes and represent different trade-off strategies. Overall, although HMgNO AMSE has slightly better energy spectra than HMgNO, it suffers from larger forecast errors, whereas HMgNO achieves a better balance between minimizing forecast error and maintaining spectral fidelity.
Discussion
In this work, we propose HMgNO, a novel forecasting framework based on a multigrid architecture that integrates a low-resolution physical model with an AI model. Experiments on medium-range global weather forecasting show that our hybrid framework accumulates errors remarkably slowly during autoregressive forecasting. Despite using substantially less training data and fewer vertical levels, HMgNO performs similarly to the 1.4° version of NeuralGCM16, and both produce more skillful forecasts than the SOTA operational model, HRES, and the leading data-driven model, Pangu-Weather5, particularly at longer lead times. Furthermore, diagnostic analysis shows that HMgNO enhances the physical consistency among forecast variables compared with purely data-driven approaches. Critically, HMgNO achieves these results while reducing training costs by orders of magnitude relative to Pangu-Weather and NeuralGCM, substantially lowering the barrier to the research, development, and iterative refinement of advanced hybrid weather forecasting models.
Despite its notable performance, HMgNO has some limitations. First, the current training data have a limited number of vertical levels, which introduces non-negligible errors when converting data between pressure levels and sigma levels. These errors increase the uncertainty of the dynamical core's forecasts and thereby limit overall performance. Second, like other data-driven approaches, HMgNO exhibits a trade-off between forecast error and spectral fidelity, which merits further research.
In future work, we will increase the horizontal and vertical resolutions or train HMgNO directly on sigma-level data, which will reduce or eliminate errors arising from data conversion and improve overall performance. We will also use ensemble techniques to quantify atmospheric uncertainty and achieve a better trade-off between forecast error and spectral bias. Furthermore, we plan to employ latent-space data assimilation techniques, enabling the integration of multi-source observations within the learned manifold for further improvement.
Methods
Overview of the hybrid framework
In previous work34, we combined numerical solvers with planar multigrid neural operators and demonstrated their effectiveness and stability on several typical two-dimensional partial differential equations, such as the Navier–Stokes and shallow water equations. In this paper, we extend and optimize this approach and test its performance in the more complex domain of global weather forecasting. As illustrated in Fig. 5a, our hybrid framework, HMgNO, fuses a physics-based dynamical core with SMgNO32 to deliver efficient, accurate, and physically consistent medium-range global weather forecasts. The detailed procedure is outlined in Algorithm 1. HMgNO first applies a restriction operator \(R\) to downsample a given high-resolution state \({x}_{{t}_{0}}^{{HR}}\) to a low-resolution initial state \({x}_{{t}_{0}}^{{LR}}\) compatible with the chosen dynamical core \({ {\mathcal M} }\). The dynamical core then integrates forward to produce a low-resolution forecast \({x}_{{t}_{n}}^{{LR}}\) after a coupling interval \(\varDelta {t}_{c}\). This forecast is mapped back to the original high resolution by the prolongation operator \(P\), yielding an approximate forecast \({\widetilde{x}}_{{t}_{n}}^{{HR}}\). Finally, SMgNO corrects the deviations in this forecast, and the corrected state \({x}_{{t}_{n}}^{{HR}}\) serves as a new, more accurate initial condition for the next integration cycle. This “integrate-correct-update” loop is repeated until the desired lead time is reached (Fig. 5b). Through this online correction mechanism, HMgNO uses the nonlinear fitting capability of neural networks to suppress error accumulation and amplification during long integrations, guiding the predicted trajectory toward a more realistic atmospheric evolution.
a Overview of HMgNO. The left part is the numerical module, which has no trainable parameters. The right part is SMgNO, which performs the online correction. b Sketch of the forecast evolution from \(t\) to \(t+n\varDelta {t}_{c}\). c Comparison of tendency fitting and trajectory fitting. Tendency fitting tries to match the tendencies (rates of change; dotted lines) of the reference trajectory (solid). Trajectory fitting pulls the predicted states directly toward the reference states (dashed line and arrows).
A critical distinction between our approach and NeuralGCM lies in the coupling strategy. As shown in Table S1, NeuralGCM employs “tendency fitting”35, in which the neural network \({NN}\) corrects the tendencies (rates of change) of meteorological variables computed by the dynamical core \({{ {\mathcal M} }}_{{DC}}\). HMgNO instead employs “trajectory fitting”35, which directly corrects the predicted atmospheric states. Figure 5c illustrates the difference between the two approaches. Although NeuralGCM achieves outstanding performance, its training requires backpropagating gradients through the dynamical core's integration steps, so the dynamical core must be fully differentiable. Applying this hybrid approach to other NWP models often requires rewriting legacy Fortran or C code in differentiable frameworks (e.g., JAX, PyTorch) or designing specialized interfaces between the NWP and AI models25,26, which is a massive undertaking. In HMgNO, by contrast, the neural network corrects the systematic bias of the trajectory after integration, so no gradients need to be backpropagated through the dynamical core. Consequently, existing physical models can be coupled with AI models through our framework without rewriting their code in differentiable frameworks. This flexibility greatly enhances the framework's applicability and potential for widespread adoption across diverse forecasting systems.
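The key property of trajectory fitting, that the corrector is trained only on the dynamical core's outputs, can be shown with a toy linear system. In this hypothetical sketch, `black_box_core` stands in for an existing solver that can be called but is never differentiated, and ordinary least squares stands in for SMgNO: the correction is fitted to pairs of (core forecast, reference state), and no derivative of the core is ever needed.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical linear "true atmosphere" and a biased "dynamical core".
A_true = np.eye(dim) + 0.05 * rng.standard_normal((dim, dim))
A_core = A_true + 0.02 * rng.standard_normal((dim, dim))


def black_box_core(x):
    """Stand-in for an existing (e.g. Fortran) solver: it is called, never differentiated."""
    return A_core @ x


def make_pairs(n):
    x_prev = rng.standard_normal((n, dim))
    x_core = np.stack([black_box_core(x) for x in x_prev])  # plain data exchange
    x_next = x_prev @ A_true.T                               # reference states
    return x_core, x_next


# Trajectory fitting: learn a correction that pulls the core's forecast state
# toward the reference state. Least squares replaces SMgNO in this sketch.
x_core, x_next = make_pairs(500)
W, *_ = np.linalg.lstsq(x_core, x_next - x_core, rcond=None)

# Evaluate on fresh samples.
x_core_test, x_next_test = make_pairs(100)
x_hybrid_test = x_core_test + x_core_test @ W
rmse_core = float(np.sqrt(np.mean((x_core_test - x_next_test) ** 2)))
rmse_hybrid = float(np.sqrt(np.mean((x_hybrid_test - x_next_test) ** 2)))
print(rmse_core, rmse_hybrid)
```

Because this toy bias is exactly linear, the correction removes it almost entirely. With a real atmosphere and a neural corrector the gain is partial, but the training interface stays the same: only the core's outputs enter the loss.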
Algorithm 1
The algorithm of the hybrid multigrid neural operator (HMgNO).
1. Initialization: Input the high-resolution initial field \({x}_{{t}_{0}}^{\mathrm{HR}}\), the dynamical-core time step \(\varDelta t\), the coupling interval \(\varDelta {t}_{c}\), and the number of iterations \(N\)
2. Restriction from fine to coarse level: \({x}_{{t}_{0}}^{\mathrm{LR}}=R\left({x}_{{t}_{0}}^{\mathrm{HR}}\right)\)
3. for \(n=1:N\) do
4. \({t}_{n}={t}_{0}+n\varDelta t\)
5. Advance through dynamical core: \({x}_{{t}_{n}}^{\mathrm{LR}}={\mathcal{M}} \left({x}_{{t}_{n-1}}^{\rm{LR}}\right)\)
6. if \(n\varDelta t\bmod \varDelta {t}_{c}=0\) then
7. Prolongation from coarse to fine level: \({\widetilde{x}}_{{t}_{n}}^{\mathrm{HR}}=P({x}_{{t}_{n}}^{\mathrm{LR}})\)
8. Calculate change: \(\varDelta \widetilde{x}={\widetilde{x}}_{{t}_{n}}^{\mathrm{HR}}-{x}_{{t}_{n}-\varDelta {t}_{c}}^{\mathrm{HR}}\)
9. Correct through SMgNO: \(\varDelta {\widetilde{x}}^{{\prime} }=SMgNO\left(\varDelta \widetilde{x},\left({x}_{{t}_{n}-\varDelta {t}_{c}}^{\mathrm{HR}},{\widetilde{x}}_{{t}_{n}}^{\mathrm{HR}}\right)\right)\)
10. Correct the solution: \({x}_{{t}_{n}}^{\mathrm{HR}}={x}_{{t}_{n}-\varDelta {t}_{c}}^{\mathrm{HR}}+{\lambda }_{\theta }\varDelta \widetilde{x}+(1-{\lambda }_{\theta })\varDelta {\widetilde{x}}^{{\prime} }\)
11. Save the corrected solution \({x}_{{t}_{n}}^{\mathrm{HR}}\)
12. Restriction from fine to coarse level: \({x}_{{t}_{n}}^{\mathrm{LR}}=R({x}_{{t}_{n}}^{\mathrm{HR}})\)
13. end if
14. end for
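The integrate-correct-update loop of Algorithm 1 can be sketched as follows. This is a toy sketch under stated assumptions: the restriction, prolongation, dynamical core, and corrector are hypothetical stand-ins acting on a 1D array, the blending weight \(\lambda_\theta\) is a fixed scalar rather than a learned parameter, the 20-minute core step follows the Dinosaur configuration, and the 6-hour coupling interval is an assumed value. Between corrections, `x_hr` holds \(x_{t_n-\varDelta t_c}^{\mathrm{HR}}\).

```python
import numpy as np


def hmgno_rollout(x0_hr, core_step, restrict, prolong, corrector, lam,
                  n_steps, dt_min=20, dtc_min=360):
    """Sketch of Algorithm 1; returns the corrected high-resolution states.

    dt_min:  dynamical-core time step in minutes (Delta t).
    dtc_min: coupling interval in minutes (Delta t_c, assumed to be 6 h here).
    lam:     blending weight lambda_theta (a fixed scalar in this sketch).
    """
    x_hr = x0_hr                   # holds x^HR at t_n - Delta t_c
    x_lr = restrict(x_hr)          # step 2
    saved = []
    for n in range(1, n_steps + 1):
        x_lr = core_step(x_lr)                             # step 5
        if (n * dt_min) % dtc_min == 0:                    # step 6
            x_tilde = prolong(x_lr)                        # step 7
            dx = x_tilde - x_hr                            # step 8
            dx_corr = corrector(dx, x_hr, x_tilde)         # step 9
            x_hr = x_hr + lam * dx + (1 - lam) * dx_corr   # step 10
            saved.append(x_hr)                             # step 11
            x_lr = restrict(x_hr)                          # step 12
    return saved


# Hypothetical toy operators on a 1D "grid" with 8 fine and 4 coarse points.
def restrict(x):
    return x.reshape(-1, 2).mean(axis=1)      # average neighboring pairs


def prolong(x):
    return np.repeat(x, 2)                    # piecewise-constant interpolation


def core_step(x):
    return 0.99 * x                           # damped toy "physics"


def no_correction(dx, x_prev, x_tilde):
    return np.zeros_like(dx)                  # placeholder for SMgNO


# One day of 20-minute core steps, corrected every 6 hours.
states = hmgno_rollout(np.ones(8), core_step, restrict, prolong,
                       no_correction, lam=1.0, n_steps=72)
print(len(states))
```

With `lam=1.0` the corrector is ignored and the loop reduces to the uncorrected dynamical core sampled every 6 h; a trained corrector with `lam < 1` would blend in the learned increment at each coupling step.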
Dynamical core
The dynamical core of an NWP model is a simplified atmospheric model grounded in physical laws36 that simulates the fluid-dynamical and thermodynamic processes explicitly resolvable on the discrete model grid37. It defines the causal chains and feedback mechanisms that govern interactions among key atmospheric variables. Because it is built on the conservation laws for mass, momentum, and energy, it provides a robust physical prior and a reliable, dynamically consistent background field in HMgNO. This allows SMgNO to focus on learning and correcting the systematic biases that arise from the low-resolution grid and from neglecting physical processes at unresolved scales. Compared with purely data-driven models, which must learn the entire evolution of the geophysical system from scratch, this hybrid approach is more likely to yield predictions that conform to physical laws.
In this paper, we use a dynamical core named Dinosaur16 to construct our hybrid model. Dinosaur is an open-source global atmospheric dynamical core that uses spectral methods to solve the primitive equations numerically in terrain-following sigma coordinates38. Because it is built on the JAX framework39, Dinosaur can exploit hardware accelerators such as GPUs and TPUs. Further details are given in Kochkov et al.16. For horizontal discretization, we use a quadratically truncated T63 equidistant grid. The vertical discretization has 17 sigma levels. Time integration uses implicit-explicit Runge-Kutta methods40 with a time step of 20 minutes. Detailed configurations are shown in Table 2.
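The sketch below illustrates why an implicit-explicit (IMEX) scheme tolerates a relatively long time step. It applies the simplest first-order IMEX Euler scheme to the scalar toy equation \(dy/dt=-ky+f(y)\). The stiff linear term, a stand-in for fast waves, is treated implicitly, and the slow term is treated explicitly. This is not Dinosaur's IMEX Runge-Kutta scheme40. The toy equation and the function names `imex_euler` and `explicit_euler` are illustrative assumptions.

```python
from typing import Callable


def imex_euler(y0: float, k: float, slow: Callable[[float], float], dt: float, n_steps: int) -> float:
    """First-order IMEX Euler for dy/dt = -k*y + slow(y)."""
    y = y0
    for _ in range(n_steps):
        # Implicit in the stiff linear term, explicit in the slow term:
        # (y_new - y) / dt = -k * y_new + slow(y)  =>  y_new = (y + dt * slow(y)) / (1 + dt * k)
        y = (y + dt * slow(y)) / (1.0 + dt * k)
    return y


def explicit_euler(y0: float, k: float, slow: Callable[[float], float], dt: float, n_steps: int) -> float:
    """Fully explicit Euler for the same equation, for comparison."""
    y = y0
    for _ in range(n_steps):
        y = y + dt * (-k * y + slow(y))
    return y
```

Take `k = 1000`, `dt = 0.1` (so \(k\varDelta t=100\)), and a constant slow forcing of 1.0. The IMEX iterate then converges to the steady state \(1/k=0.001\). The explicit iterate is amplified by a factor of 99 in magnitude per step and diverges.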
SMgNO
In our hybrid framework, we deliberately use a low-resolution dynamical core to exploit physical priors at a low computational cost. A neural network then corrects the core's systematic errors so that the hybrid model achieves accuracy comparable to that of SOTA high-resolution NWP models. In this work, we use SMgNO32 to correct the deviations of the dynamical core. SMgNO is designed for efficient global weather forecasting. It combines the multigrid neural operator41 with spherical harmonic-based convolutions in a manner that respects the underlying spherical geometry. Specifically, SMgNO consists primarily of the system operator, smoothing operator, restriction operator, prolongation operator, and the underlying spherical Fourier neural operator (SFNO)42. For details of SMgNO, see Hu et al.32. To enhance its performance, we introduce three critical modifications. First, we use discrete-continuous (DISCO) convolutions43 as restriction operations, instead of 2D convolutions with periodic padding in the longitudinal direction, to improve the accuracy of information transfer across scales. Second, following Liu-Schiaffini et al.44, we integrate DISCO convolutions into the SFNO to strengthen its ability to capture local information. Third, we replace the multi-layer perceptrons in SFNO with Fourier analysis networks45 to improve its ability to model periodicity.
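The third modification swaps multi-layer perceptrons for Fourier analysis network (FAN) layers45. Following the formulation of Dong et al.45, a FAN layer concatenates cosine and sine features of one linear projection with a conventional activated projection, \(\phi (x)=[\cos ({W}_{p}x)\,\|\,\sin ({W}_{p}x)\,\|\,\sigma ({B}_{\bar{p}}+{W}_{\bar{p}}x)]\). The pure-Python sketch below is a minimal illustration, not the configuration used inside SMgNO. It assumes GELU for \(\sigma\), random Gaussian initialisation, and hypothetical layer sizes.

```python
import math
import random
from typing import List


def gelu(z: float) -> float:
    """GELU activation (tanh approximation)."""
    return 0.5 * z * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (z + 0.044715 * z ** 3)))


def matvec(matrix: List[List[float]], x: List[float]) -> List[float]:
    """Dense matrix-vector product on nested lists."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in matrix]


class FANLayer:
    """Hypothetical FAN layer: [cos(W_p x) | sin(W_p x) | gelu(W_q x + b_q)]."""

    def __init__(self, d_in: int, d_periodic: int, d_other: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d_in)
        # W_p: one projection shared by the cosine and the sine branch
        self.w_p = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_periodic)]
        # W_q, b_q: ordinary (non-periodic) branch, as in a standard MLP layer
        self.w_q = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_other)]
        self.b_q = [0.0] * d_other

    def __call__(self, x: List[float]) -> List[float]:
        p = matvec(self.w_p, x)
        q = matvec(self.w_q, x)
        periodic = [math.cos(v) for v in p] + [math.sin(v) for v in p]
        other = [gelu(v + b) for v, b in zip(q, self.b_q)]
        return periodic + other
```

The output width is `2 * d_periodic + d_other`. For an all-zero input, the layer returns ones in the cosine slots and zeros elsewhere.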
Data
Data used for training and evaluating our models are primarily derived from ERA546 and WeatherBench47. ERA5 is the fifth-generation global atmospheric reanalysis dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). ERA5 provides hourly reanalysis data from 1940 to the present, with a horizontal resolution of 0.25° (approximately 31 km) and 37 vertical pressure levels, covering a wide range of atmospheric and surface variables. Owing to its high quality and coverage, ERA5 is widely regarded as the SOTA benchmark for weather and climate research. To facilitate standardized evaluation in data-driven weather forecasting, WeatherBench47 offers preprocessed, regridded ERA5 data from 1979 to 2018 at multiple spatial resolutions (e.g., 1.40625°, 2.8125°, and 5.625°) and an hourly temporal resolution.
In this study, we chose a total of 70 target variables at 1.40625° resolution as the input and output of our model. These are five upper-air atmospheric variables (geopotential, temperature, specific humidity, zonal wind, and meridional wind), each at 13 pressure levels (50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa), and five surface variables (2-m temperature, 10-m zonal wind, 10-m meridional wind, surface pressure, and mean sea level pressure). The abbreviation for each variable is given in Table S2. In addition, two constant fields (the land-sea mask and the orography) were used as input variables. Surface pressure and mean sea level pressure were first downloaded from ERA5 at 0.703125° resolution and then regridded using the method provided in WeatherBench47. All other variables were downloaded directly from the WeatherBench47 dataset. The data were then sampled at a 6-hour frequency and partitioned into a training set (2004–2015), a validation set (2016–2017), and a test set (2018). This split ensures out-of-sample evaluation and a fair comparison with prior work. The data were z-score normalized before training, and the evaluation metrics were calculated after de-normalization. The forecasts and deterministic evaluation metrics for Pangu-Weather and HRES were downloaded from WeatherBench 227.
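A short sketch makes the channel layout above concrete. The short variable names are hypothetical placeholders; the paper's abbreviations are listed in Table S2. The z-score statistics are assumed to be computed per channel on the training years only.

```python
import statistics
from typing import List, Tuple

PRESSURE_LEVELS_HPA = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]
UPPER_AIR = ["z", "t", "q", "u", "v"]         # hypothetical names for the five upper-air variables
SURFACE = ["t2m", "u10", "v10", "sp", "msl"]  # hypothetical names for the five surface variables
CONSTANTS = ["lsm", "orography"]              # constant input fields (not predicted)


def target_channels() -> List[str]:
    """Predicted channels: 5 upper-air variables x 13 levels + 5 surface variables."""
    upper = [f"{var}{level}" for var in UPPER_AIR for level in PRESSURE_LEVELS_HPA]
    return upper + SURFACE


def grid_shape(resolution_deg: float) -> Tuple[int, int]:
    """(n_lat, n_lon) for an equiangular global grid with the given spacing."""
    return round(180 / resolution_deg), round(360 / resolution_deg)


def fit_zscore(train_values: List[float]) -> Tuple[float, float]:
    """Per-channel mean and standard deviation, computed on training data only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def normalize(value: float, mean: float, std: float) -> float:
    return (value - mean) / std


def denormalize(z: float, mean: float, std: float) -> float:
    return z * std + mean
```

`len(target_channels())` is 70, matching the count above. Adding the two constant fields gives 72 input channels, and `grid_shape(1.40625)` gives `(128, 256)`.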
Experiment settings
The proposed framework was implemented in PyTorch48, building on the training workflow established by ClimaX49. Training is divided into three stages: pre-training, fine-tuning, and refinement of high-frequency information. In the pre-training stage, we used supervised learning to predict a single time step, establishing the fundamental mapping of the model. Starting from the pre-trained weights, the fine-tuning stage optimizes autoregressive forecasting over 16 steps to improve long-lead accuracy. Finally, the refinement stage also performs autoregressive forecasting over 16 steps, but with a spectral loss to improve the prediction of high-frequency details. Both HMgNO and SMgNO were trained in the bfloat16 (Brain Floating Point) half-precision format with the Muon optimizer50,51 (\({\beta }_{1}=0.9,{\beta }_{2}=0.95\)). Both followed the three-stage training strategy on four NVIDIA GeForce RTX 4090 GPUs.
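The difference between single-step pre-training and 16-step autoregressive fine-tuning can be sketched as follows. The model's own prediction is fed back as the next input, and the per-step losses are averaged. Averaging rather than summing over steps is an assumption. The `model` and `loss_fn` callables are hypothetical placeholders, and gradient computation is omitted.

```python
from typing import Callable, Sequence


def rollout_loss(
    model: Callable[[float], float],
    x0: float,
    targets: Sequence[float],
    loss_fn: Callable[[float, float], float],
) -> float:
    """Mean loss over an autoregressive rollout of len(targets) steps."""
    x = x0
    total = 0.0
    for target in targets:
        x = model(x)                  # the prediction becomes the next input
        total += loss_fn(x, target)   # compare each step with its ground truth
    return total / len(targets)
```

With a single target this reduces to the single-step pre-training objective. Fine-tuning and refinement would pass 16 targets.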
The training parameters for each stage were as follows. Pre-training ran for 50 epochs with a batch size of 14 and a base learning rate of 5 × 10⁻⁴. It used a linear warmup schedule for the first 5 epochs, followed by a cosine annealing schedule52 for the remaining 45 epochs. Fine-tuning ran for 30 epochs with a batch size of 1 and a base learning rate of 5 × 10⁻⁶. It again used a linear warmup for 3 epochs, followed by cosine annealing52 for the remaining epochs. Refinement ran for 10 epochs with a batch size of 1 and an initial learning rate of 1 × 10⁻⁶ under a cosine annealing schedule52. The loss function for the first two stages is the latitude-weighted mean absolute error (MAE), defined as follows:
\[\mathrm{MAE}=\frac{1}{TCHW}\sum_{t=1}^{T}\sum_{c=1}^{C}\sum_{h=1}^{H}\sum_{w=1}^{W}w(h)\left|{\hat{x}}_{t,c,h,w}-{x}_{t,c,h,w}\right|,\qquad w(h)=\frac{\cos {\varphi }_{h}}{\frac{1}{H}\sum_{h^{\prime}=1}^{H}\cos {\varphi }_{h^{\prime}}}\]
where \({\hat{x}}_{t,c,h,w}\) and \({x}_{t,c,h,w}\) are the predicted and ground-truth values, and \(w(h)\) is the latitude weight. \({\varphi }_{h}\) is the latitude, and \(T\), \(C\), \(H\), and \(W\) are the numbers of time steps, channels, and grid points in latitude and longitude, respectively. The loss function for the refinement stage is the HSE with \(p=32\) and \({c}_{0}=0.4\), which is inspired by AMSE33. Specifically, a real-valued function \(g\left(\lambda ,\varphi \right)\) defined on the unit sphere \({{\boldsymbol{S}}}^{2}\) can be expressed in terms of spherical harmonics:
\[g\left(\lambda ,\varphi \right)=\sum_{l=0}^{L}\sum_{m=-l}^{l}{\alpha }_{lm}^{g}\,{Y}_{lm}\left(\lambda ,\varphi \right)\]
where \(\lambda \in \left[-\pi ,\pi \right]\) is the longitude and \(\varphi \in [0,\pi ]\) is the colatitude. \(L\) is the maximum truncation of the total wavenumber, and \({\alpha }_{{lm}}^{g}\) is the coefficient of the spherical harmonic \({Y}_{{lm}}\). Based on this spherical harmonic transform, Subich et al.33 introduced AMSE to improve the prediction of small-scale information:
where the spectral error, power spectral density, and coherence of \(\hat{x}\) and \(x\) are calculated as follows:
Here, \({(\cdot )}^{* }\) denotes the complex conjugate and \(\Re (\cdot )\) denotes the real part. To balance forecast error and high-frequency bias, we modified this loss function and proposed the HSE:
where \(w(l)=\min \left(1.0,\max \left(0.0,\frac{{{\rm{Coh}}}_{l}\left(\hat{x},x\right)-{c}_{0}}{1-{c}_{0}}\right)\right)\) is the adaptive weight, and \(p\in [0,L]\) is a natural number.
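The linear-warmup-plus-cosine-annealing schedules listed in the training parameters can be sketched as a per-epoch function. This is a minimal sketch under stated assumptions. The function name `lr_at_epoch` is hypothetical. The rate is assumed to update once per epoch and to decay to a floor of zero, whereas the actual training code may step the scheduler per iteration.

```python
import math


def lr_at_epoch(epoch: int, base_lr: float, warmup_epochs: int, total_epochs: int,
                min_lr: float = 0.0) -> float:
    """Linear warmup for `warmup_epochs`, then cosine annealing down to `min_lr`."""
    if epoch < warmup_epochs:
        # linear ramp: base_lr / warmup_epochs at epoch 0, base_lr at the last warmup epoch
        return base_lr * (epoch + 1) / warmup_epochs
    decay_epochs = max(1, total_epochs - warmup_epochs)
    progress = min(1.0, (epoch - warmup_epochs) / decay_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions, the three stages correspond to the following calls:
- pre-training: `lr_at_epoch(e, 5e-4, 5, 50)`
- fine-tuning: `lr_at_epoch(e, 5e-6, 3, 30)`
- refinement, with no warmup: `lr_at_epoch(e, 1e-6, 0, 10)`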
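The latitude-weighted MAE and the coherence-based adaptive weight \(w(l)\) from the Experiment settings can be sketched in pure Python. The sketch rests on several assumptions:
- the latitude weights are \(\cos {\varphi }_{h}\) normalised to a mean of one;
- inputs are nested lists of shape [T][C][H][W];
- for the spectral part, power spectral density and coherence follow the per-wavenumber definitions used by Subich et al.33, applied to the complex coefficients \({\alpha }_{lm}\) of a single total wavenumber \(l\).

The HSE formula itself is not reproduced here.

```python
import math
from typing import List, Sequence


def latitude_weights(lats_deg: Sequence[float]) -> List[float]:
    """cos(latitude) weights normalised to a mean of one."""
    cosines = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cosines) / len(cosines)
    return [c / mean_cos for c in cosines]


def lat_weighted_mae(pred, truth, lats_deg: Sequence[float]) -> float:
    """Latitude-weighted MAE for nested lists shaped [T][C][H][W]."""
    weights = latitude_weights(lats_deg)
    total, count = 0.0, 0
    for pred_t, truth_t in zip(pred, truth):
        for pred_c, truth_c in zip(pred_t, truth_t):
            for w_h, pred_row, truth_row in zip(weights, pred_c, truth_c):
                for p, x in zip(pred_row, truth_row):
                    total += w_h * abs(p - x)
                    count += 1
    return total / count


def psd(coeffs: Sequence[complex]) -> float:
    """Power at one total wavenumber l: sum over m of |alpha_lm|^2."""
    return sum(abs(a) ** 2 for a in coeffs)


def coherence(pred: Sequence[complex], truth: Sequence[complex]) -> float:
    """Re(sum_m pred_lm * conj(truth_lm)) / sqrt(PSD_l(pred) * PSD_l(truth))."""
    cross = sum((a * b.conjugate()).real for a, b in zip(pred, truth))
    denom = math.sqrt(psd(pred) * psd(truth))
    return cross / denom if denom > 0.0 else 0.0  # assumption: zero power -> zero coherence


def adaptive_weight(coh: float, c0: float = 0.4) -> float:
    """w(l) = min(1, max(0, (Coh_l - c0) / (1 - c0)))."""
    return min(1.0, max(0.0, (coh - c0) / (1.0 - c0)))
```

With \({c}_{0}=0.4\), perfectly coherent coefficients give \(w(l)=1\), anti-correlated coefficients give \(w(l)=0\), and a coherence of 0.7 gives \(w(l)=0.5\).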
Data availability
WeatherBench is publicly available at https://github.com/pangeo-data/WeatherBench. WeatherBench 2 is publicly available at https://github.com/google-research/weatherbench2. ERA5 is downloaded from https://cds.climate.copernicus.eu/.
Code availability
The code is available upon request from yinfukang@nudt.edu.cn.
References
Bauer, P., Thorpe, A. & Brunet, G. The quiet revolution of numerical weather prediction. Nature 525, 47–55 (2015).
Arcomano, T. et al. A hybrid approach to atmospheric modeling that combines machine learning with a physics-based numerical model. J. Adv. Model. Earth Syst. 14, e2021MS002712 (2022).
Chen, L. et al. FuXi: a cascade machine learning forecasting system for 15-day global weather forecast. Npj Clim. Atmos. Sci. 6, 190 (2023).
Potvin, C. K. & Flora, M. L. Sensitivity of idealized supercell simulations to horizontal grid spacing: implications for warn-on-forecast. Mon. Weather Rev. 143, 2998–3024 (2015).
Bi, K. et al. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 533–538 (2023).
Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 382, 1416–1421 (2023).
Chen, K. et al. The operational medium-range deterministic weather forecasting can be extended beyond a 10-day lead time. Commun. Earth Environ. 6, 518 (2025).
Allen, A. et al. End-to-end data-driven weather prediction. Nature 641, 1172–1179 (2025).
Price, I. et al. Probabilistic weather forecasting with machine learning. Nature 1–7, https://doi.org/10.1038/s41586-024-08252-9 (2024).
Voosen, P. The AI weather forecaster arrives. Science 382, 1232–1233 (2023).
Voosen, P. AI is set to revolutionize weather forecasts. Science 382, 748–749 (2023).
Bonavita, M. On some limitations of current machine learning weather prediction models. Geophys. Res. Lett. 51, e2023GL107377 (2024).
Pan, S. & Duraisamy, K. Data-driven discovery of closure models. SIAM J. Appl. Dyn. Syst. 17, 2381–2413 (2018).
Bolton, T. & Zanna, L. Applications of deep learning to ocean data inference and subgrid parameterization. J. Adv. Model. Earth Syst. 11, 376–399 (2019).
Han, Y., Zhang, G. J. & Wang, Y. An ensemble of neural networks for moist physics processes, its generalizability and stable integration. J. Adv. Model. Earth Syst. 15, e2022MS003508 (2023).
Kochkov, D. et al. Neural general circulation models for weather and climate. Nature 632, 1060–1066 (2024).
Xu, H., Zhao, Y., Zhao, D., Duan, Y. & Xu, X. Exploring the typhoon intensity forecasting through integrating AI weather forecasting with regional numerical weather model. Npj Clim. Atmos. Sci. 8, 1–10 (2025).
Liu, H.-Y. et al. A hybrid machine learning/physics-based modeling framework for 2-week extended prediction of tropical cyclones. J. Geophys. Res. Mach. Learn. Comput. 1, e2024JH000207 (2024).
Xu, H., Zhao, Y., Zhao, D., Duan, Y. & Xu, X. Improvement of disastrous extreme precipitation forecasting in north China by Pangu-weather AI-driven regional WRF model. Environ. Res. Lett. 19, 054051 (2024).
Luo, Y., Fang, S., Wu, B., Wen, Q. & Sun, L. Physics-guided learning of meteorological dynamics for weather downscaling and forecasting. In Proc. 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 2010–2020. https://doi.org/10.1145/3711896.3737081 (Association for Computing Machinery, 2025).
Zheng, J., Ling, Q. & Feng, Y. Physics-assisted and topology-informed deep learning for weather prediction. In Proc. Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25 (ed. Kwok, J.) 7958–7966. https://doi.org/10.24963/ijcai.2025/885 (International Joint Conferences on Artificial Intelligence Organization, 2025).
Gelbrecht, M., Klöwer, M. & Boers, N. PseudospectralNet: toward hybrid atmospheric models for climate simulations. J. Adv. Model. Earth Syst. 17, e2025MS004969 (2025).
Agarwal, N., Amrhein, D. E. & Grooms, I. Cross-attractor transforms: improving forecasts by learning optimal maps between dynamical systems and imperfect models. Geophys. Res. Lett. 52, e2024GL110472 (2025).
Rasp, S. Coupled online learning as a way to tackle instabilities and biases in neural network parameterizations: general algorithms and Lorenz 96 case study (v1.0). Geosci. Model Dev. 13, 2185–2196 (2020).
Atkinson, J. et al. FTorch: a library for coupling PyTorch models to Fortran. J. Open Source Softw. 10, 7602 (2025).
Ott, J. et al. A Fortran-Keras deep learning bridge for scientific computing. Sci. Program. 2020, 8888811 (2020).
Rasp, S. et al. WeatherBench 2: a benchmark for the next generation of data-driven global weather models. J. Adv. Model. Earth Syst. 16, e2023MS004019 (2024).
Holton, J. R. & Hakim, G. J. Basic conservation laws. in An Introduction to Dynamic Meteorology (Fifth Edition) (eds Holton, J. R. & Hakim, G. J.) 31–66. https://doi.org/10.1016/B978-0-12-384866-6.00002-7 (Academic Press, 2013).
Karlbauer, M. et al. Advancing parsimonious deep learning weather prediction using the HEALPix mesh. J. Adv. Model. Earth Syst. 16, e2023MS004021 (2024).
Hoffman, R. N., Liu, Z., Louis, J.-F. & Grassoti, C. Distortion representation of forecast errors. Mon. Weather Rev. 123, 2758–2770 (1995).
Ghelli, A., Coelho, C., Mittermaier, M. & Power, C. Progress and challenges in forecast verification. Meteorol. Appl. 20, 129–129 (2013).
Hu, Y. et al. Spherical multigrid neural operator for improving autoregressive global weather forecasting. Sci. Rep. 15, 11522 (2025).
Subich, C., Husain, S. Z., Separovic, L. & Yang, J. Fixing the double penalty in data-driven weather forecasting through a modified spherical harmonic loss function. in Proc. 42nd International Conference on Machine Learning 57191–57211 (PMLR, 2025).
Hu, Y., Zhang, W., Yin, F. & Wu, J. HMgNO: hybrid multigrid neural operator with low-order numerical solver for partial differential equations. Neural Netw. 190, 107649 (2025).
Melchers, H., Crommelin, D., Koren, B., Menkovski, V. & Sanderse, B. Comparison of neural closure models for discretised PDEs. Comput. Math. Appl. 143, 94–107 (2023).
Zhang, Y. et al. History and status of atmospheric dynamical core model development in China. in Numerical Weather Prediction: East Asian Perspectives (ed. Park, S. K.) 3–36. https://doi.org/10.1007/978-3-031-40567-9_1 (Springer International Publishing, 2023).
Wood, N. Dynamical cores for NWP: an uncertain landscape. in Uncertainties in Numerical Weather Prediction (eds Ólafsson, H. & Bao, J.-W.) 1–46. https://doi.org/10.1016/B978-0-12-815491-5.00001-X (Elsevier, 2021).
Bourke, W. A multi-level spectral model. I. Formulation and hemispheric integrations. https://journals.ametsoc.org/view/journals/mwre/102/10/1520-0493_1974_102_0687_amlsmi_2_0_co_2.xml (1974).
Bradbury, J. et al. JAX: composable transformations of Python+NumPy programs. Github https://github.com/jax-ml/jax (2018).
Whitaker, J. S. & Kar, S. K. Implicit–explicit Runge–Kutta methods for fast–slow wave problems. https://doi.org/10.1175/MWR-D-13-00132.1 (2013).
He, J., Liu, X. & Xu, J. MgNO: efficient parameterization of linear operators via multigrid. In The Twelfth International Conference on Learning Representations (OpenReview.net, 2024).
Bonev, B. et al. Spherical Fourier neural operators: Learning stable dynamics on the sphere. In Proc. 40th International Conference on Machine Learning (Honolulu, 2023).
Ocampo, J., Price, M. A. & McEwen, J. Scalable and equivariant spherical CNNs by discrete-continuous (DISCO) convolutions. In The Eleventh International Conference on Learning Representations (OpenReview.net, 2023).
Liu-Schiaffini, M. et al. Neural operators with localized integral and differential kernels. In Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) 235, 32576–32594 (PMLR, 2024).
Dong, Y. et al. Fourier analysis network (MIT Press, 2025).
Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
Rasp, S. et al. WeatherBench: a benchmark data set for data-driven weather forecasting. J. Adv. Model. Earth Syst. 12, e2020MS002203 (2020).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Proc. 33rd International Conference on Neural Information Processing Systems 8026–8037 (Curran Associates Inc., 2019).
Nguyen, T., Brandstetter, J., Kapoor, A., Gupta, J. K. & Grover, A. ClimaX: A foundation model for weather and climate. In Proc. 40th International Conference on Machine Learning 25904–25938 (PMLR, 2023).
Jordan, K. et al. Muon: an optimizer for hidden layers in neural networks. Github https://kellerjordan.github.io/posts/muon (2024).
Liu, J. et al. Muon is scalable for LLM training. Preprint at https://doi.org/10.48550/arXiv.2502.16982 (2025).
Loshchilov, I. & Hutter, F. SGDR: Stochastic gradient descent with warm restarts. In International Conference on Learning Representations (OpenReview.net, 2017).
Acknowledgements
We acknowledge funding from the National Natural Science Foundation of China under grant No. 42375155.
Author information
Contributions
F.Y., W.Z., and Y.H. designed the project. W.Z., K.R., and J.S. managed and oversaw the project. Y.H. performed the model training and evaluation. F.Y. and Y.H. improved the model design. K.D. established the model training environment. F.Y. and Y.H. wrote the manuscript. All authors discussed, commented, and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hu, Y., Yin, F., Zhang, W. et al. A hybrid framework for global weather forecasting via low-resolution dynamical core and multigrid neural operator. npj Clim Atmos Sci 9, 112 (2026). https://doi.org/10.1038/s41612-026-01374-z