Introduction

Fine particulate matter (PM2.5) poses significant threats to human health, ecosystems, and climate1,2,3. In the Yangtze River Delta (YRD) region, strict emission control measures in recent years have effectively reduced the intensity and frequency of severe PM2.5 pollution events4. However, an unexpected resurgence of PM2.5 pollution occurred during the 2023–2024 winter, suggesting possible shifts in pollution patterns due to changes in emissions and atmospheric conditions. This pronounced rebound calls for a comprehensive, multi-perspective analysis of the characteristics and underlying mechanisms of this extreme pollution episode.

In Shanghai’s typical winter pollution episodes, nitrate (NO3) constitutes the dominant component of PM2.5, driven by adverse meteorological conditions and elevated emissions5,6,7. Previous studies demonstrated that under stagnant meteorological conditions, the substantial pollutant emissions in the YRD can rapidly build up high PM2.5 concentrations. Furthermore, winter cold fronts facilitate the transport of pollutants from the North China Plain (NCP) to the YRD, further deteriorating air quality in Shanghai and the broader YRD region8,9,10. NO3 has been the dominant contributor to wintertime PM2.5 pollution events in Shanghai, surpassing sulfate (SO42−) since strict emission controls were introduced11,12,13. NO3 formation is primarily affected by its precursors nitrogen oxides (NOx), volatile organic compounds (VOCs), and ammonia (NH3)14,15,16. The process involves the oxidation of NOx to nitric acid (HNO3), which subsequently condenses into the particulate phase in the presence of alkalinity14,17. Recent studies reported that elevated atmospheric oxidation capacity (AOC) enhanced the production of NO3, particularly at night under high relative humidity through the formation and hydrolysis of dinitrogen pentoxide (N2O5), leading to high NO3 and PM2.5 levels in the YRD18,19,20,21.

In regional PM2.5 pollution studies, chemical transport models (CTMs) and machine learning methods are widely used and highly effective research tools. CTMs can reproduce the spatiotemporal distribution and evolution of atmospheric pollutants. Studies employing the source-apportionment Community Multiscale Air Quality (CMAQ) CTM identified industry, transportation, and power plants as major emission sources in the YRD22,23. Wu et al.24 used the NAQPMS CTM to highlight the role of regional transport in elevating PM2.5 levels in Shanghai. Xie et al.25 utilized the UCD/CIT CTM to analyze a typical pollution event during the 2017–2018 winter, providing insights into the aging processes of pollutants. In addition to traditional CTMs, machine learning methods have the potential to be a complementary tool for predicting and interpreting PM2.5 pollution dynamics. Wu et al.26 employed a long short-term memory (LSTM) network to predict seasonal PM2.5 patterns in Beijing. Hou et al.27 reported the important contribution of photochemical processes to NO3 formation using random forest models combined with Shapley Additive Explanations (SHAP). Advanced techniques such as CatBoost and the Tree-structured Parzen Estimator (TPE) have also been applied to evaluate the impacts of long-term emissions and operational factors in China28. While CTMs effectively depict spatiotemporal distributions and source contributions, and interpretable machine learning models can assess the importance of specific factors, few studies have integrated both approaches to comprehensively analyze pollution episodes. Moreover, previous research often focused on a single winter pollution event, although multiple events with potentially distinct characteristics and drivers typically occur throughout the winter. Relying on a single event may miss the overall picture of wintertime pollution.

This study integrated the traditional CMAQ model with interpretable machine learning (XGBOOST with SHAP) to leverage their respective strengths in analyzing the characteristics and driving factors of PM2.5 pollution during the 2023–2024 winter in Shanghai. We also conducted sensitivity analyses of important precursors (NH3, NOx, VOCs, and SO2) to quantify their impacts on NO3 and PM2.5 concentrations.

Results

Characteristics of PM2.5 pollution

We evaluated the CMAQ model performance by comparing simulated PM2.5 concentrations and chemical components with ground-based observations, which showed good agreement between modeled and measured values. The statistical metrics included mean fractional bias (MFB), mean fractional error (MFE), mean normalized bias (MNB), mean normalized error (MNE), and root mean square error (RMSE), assessed against recommended criteria29,30. As shown in Table S1, although the model slightly underestimated wintertime PM2.5 concentrations, all statistical metrics met the recommended standards. The regional and Shanghai PM2.5 MFB values were −0.04 and −0.13, respectively (within the ±0.6 threshold), and the corresponding MFE values were 0.48 and 0.31 (below the 0.75 threshold). Similarly, the inorganic components in Shanghai agreed with observations. Although EC and OC simulations were less accurate, their performance remained within acceptable limits. Given the good performance for the species of concern, we consider the overall model performance satisfactory. Figure S1 presents the time series comparison of simulated and observed PM2.5 concentrations. The model successfully captured both the temporal trends and peak values, though some underestimation occurred during extreme events. It is worth noting that simulated and observed values aligned well before and after these underestimated periods, suggesting the discrepancies may stem from unaccounted complex processes or social factors (e.g., firework emissions during the Spring Festival). These uncertainties are addressed in the Discussion section. In summary, the model reproduced both the magnitude and temporal patterns of PM2.5 concentrations, supporting our analysis of pollution characteristics.
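
The evaluation metrics named above can be sketched in a few lines of Python. The formulas below are the commonly used definitions of MFB, MFE, MNB, MNE, and RMSE and are an assumption here, since the text does not write them out; the function name `evaluation_metrics` and the sample values are hypothetical, while the ±0.6 and 0.75 thresholds are those quoted in the text.

```python
import math

def evaluation_metrics(model, obs):
    """Return MFB, MFE, MNB, MNE and RMSE for paired model/observation values."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and equally long")
    n = len(model)
    pairs = list(zip(model, obs))
    # Fractional metrics normalise by the sum of model and observation.
    mfb = 2.0 / n * sum((m - o) / (m + o) for m, o in pairs)
    mfe = 2.0 / n * sum(abs(m - o) / (m + o) for m, o in pairs)
    # Normalised metrics normalise by the observation only.
    mnb = sum((m - o) / o for m, o in pairs) / n
    mne = sum(abs(m - o) / o for m, o in pairs) / n
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in pairs) / n)
    return {"MFB": mfb, "MFE": mfe, "MNB": mnb, "MNE": mne, "RMSE": rmse}

# Hypothetical daily PM2.5 values (ug m-3), for illustration only.
simulated = [40.0, 60.0, 90.0]
observed = [50.0, 60.0, 110.0]
metrics = evaluation_metrics(simulated, observed)

# Thresholds quoted in the text: |MFB| within 0.6 and MFE below 0.75.
meets_criteria = abs(metrics["MFB"]) <= 0.6 and metrics["MFE"] <= 0.75
```

In this toy case the model underestimates two of the three days, so MFB and MNB are negative, which mirrors the sign of the slight underestimation reported for Shanghai.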

During the winter of 2023–2024, Shanghai experienced prolonged, frequent, and high-intensity PM2.5 pollution, marking one of the most severe periods in the past six years (Table S2). Over the three-month period, the daily average PM2.5 concentration in Shanghai was 48.7 μg m−3, with a peak concentration of 162.7 μg m−3. Pollution days (PM2.5 > 35 μg m−3) totaled 38, with an average concentration of 79.5 μg m−3 during these days. PM2.5 concentrations were particularly high in December and January, coinciding with a prolonged pollution episode around New Year. A sharp increase in PM2.5 levels began on 23 December, with concentrations remaining elevated for several days, frequently exceeding 75 μg m−3 and peaking at 152.5 μg m−3. Although precipitation temporarily reduced PM2.5 levels during this period, the concentrations rebounded rapidly afterward, resulting in a prolonged pollution episode lasting until 15 January. While the highest PM2.5 concentration of the winter occurred on 10 February, overall levels in February were relatively low, with the highest number of clean days (PM2.5 < 20 μg m−3) recorded in this winter.

This study divided the pollution period into four distinct pollution events (defined as periods with three or more consecutive pollution days): P1 (3–9 December), P2 (23 December–14 January), P3 (25–29 January), and P4 (8–12 February). PM2.5 composition varied across events, with secondary inorganic aerosols (SIA) dominating throughout the winter, accounting for 60% of the total mass (33% NO3, 13% SO42−, and 14% NH4+) (Fig. 1). It is worth noting that NO3 dominated three of the four pollution events during the 2023–2024 winter in Shanghai (Fig. 2). OC was another significant component, contributing 14% on average, with higher contributions on clean days (27%). Other primary components, including dust and metals, contributed minimally but exhibited higher variability during pollution events. The spatiotemporal distribution of PM2.5 and its components, derived from the CMAQ model, revealed high PM2.5 concentrations in the northern inland areas near the NCP (Fig. S2). Elevated NO3 levels occurred in the western YRD, gradually decreasing toward surrounding regions. Regional PM2.5 and NO3 concentrations in Shanghai and adjacent provinces (Jiangsu and Zhejiang) exhibited similar patterns, indicating a shared pollution process across this region.
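
The event definition used here (three or more consecutive days with PM2.5 > 35 μg m−3) can be illustrated with a short run-detection routine. This is a minimal sketch: the function name `find_pollution_events` and the sample series are hypothetical and do not reproduce the observed Shanghai record.

```python
def find_pollution_events(daily_pm25, threshold=35.0, min_days=3):
    """Return inclusive (start, end) index pairs for runs of at least
    `min_days` consecutive days with PM2.5 above `threshold`."""
    events = []
    run_start = None
    for day, value in enumerate(daily_pm25):
        if value > threshold:
            if run_start is None:
                run_start = day  # a new run of pollution days begins
        else:
            if run_start is not None and day - run_start >= min_days:
                events.append((run_start, day - 1))
            run_start = None
    # Close a run that extends to the end of the series.
    if run_start is not None and len(daily_pm25) - run_start >= min_days:
        events.append((run_start, len(daily_pm25) - 1))
    return events

# Hypothetical daily means (ug m-3); the two-day run at indices 1-2 is too short.
series = [20, 40, 50, 30, 36, 80, 120, 60, 25, 45, 50, 55]
events = find_pollution_events(series)
```

With this series, `events` contains two runs, covering indices 4 to 7 and 9 to 11.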

Fig. 1: Observed PM2.5 concentrations and their chemical composition in Shanghai during the winter of 2023–2024.

The data are presented for the entire winter (Total), four pollution events (P1: 3–9 December; P2: 23 December–14 January; P3: 25–29 January; P4: 8–12 February), and clean days (Clean). The category “Resolved Element” represents the combined mass of Cl, K+, Mg2+, and Ca2+.

Fig. 2: Incremental mass ratio (IMR) of major PM2.5 chemical components in Shanghai during the four pollution events.

The data are presented for the four pollution events. Categories and symbols used here are the same as in Fig. 1.

Contributions of variables and sources in NO3

Given that NO3 was the dominant component of PM2.5 in Shanghai during the winter, we used the models to analyze the causes of NO3 pollution. Figure 3 shows the SHAP analysis results for the entire winter, identifying VOCs, RH, and NH3 as the most influential factors. Nitrate formation occurs primarily through two pathways: NOx oxidation to nitric acid followed by neutralization with ammonia, and heterogeneous hydrolysis of N2O516,31. In Shanghai’s pollution episodes, the first pathway typically dominates under ammonium-rich conditions32,33,34. The oxidation of NOx proceeds primarily through OH or ozone (O3). In VOC-limited regimes, which often occur in urban winter, higher VOC concentrations enhance oxidant levels and accelerate HNO3 formation17,35. Furthermore, NH3 preferentially neutralizes sulfuric acid first, with additional NH3 subsequently stabilizing nitrate, highlighting the crucial role of NH3 availability in nitrate production36. High RH can promote nitrate formation through enhanced aqueous reactions and N2O5 heterogeneous hydrolysis, while also facilitating particle hygroscopic growth, though precipitation accompanies near-saturation conditions20,37,38. Source apportionment results revealed that 16.8% of NO3 concentrations originated from local emissions, 39.4% from adjacent transport from Jiangsu and Zhejiang, and 26.6% from long-range transport from the NCP (Fig. 4). We also conducted indirect validation by incorporating PM2.5 concentration observations from multiple sites (Fig. S2). Under stagnant meteorological conditions, the availability of sufficient precursors accelerated NO3 formation, leading to pollution outbreaks. However, the specific drivers of pollution varied across events.

Fig. 3: Summary plot of SHAP values.

SHAP values of factors influencing NO3 concentrations in Shanghai during the entire winter of 2023–2024.

Fig. 4: Time series of meteorological conditions and NO3 regional sources in Shanghai.

a Observed wind, relative humidity (RH), and precipitation (PRE). b Comparison of observed and simulated NO3 concentrations in Shanghai. c Simulated regional contributions to NO3 in Shanghai (SH Shanghai, JSZJ Jiangsu and Zhejiang, AH Anhui, NCP North China Plain, bdy boundary sources). Red shading indicates pollution days (PM2.5 > 35 μg m−3).

P1 (3–9 December)

This event was characterized by high contributions from both primary and secondary pollutants. NO3 accounted for 30.2% of the IMR, while other components, including primary aerosols, also made substantial contributions, indicating a mixed pollution event. Adverse meteorological conditions and regional transport were the primary drivers. High RH (~80%) promoted N2O5 hydrolysis, while low temperature and high humidity favored nitric acid partitioning to the particle phase39,40. Southwesterly transport from Zhejiang significantly elevated pollutant levels in Shanghai. Shanghai’s aerosol concentrations began to decrease following the rapid decline in Zhejiang’s pollution levels, while contributions from the NCP increased (Fig. S3).

P2 (23 December–14 January)

This prolonged event was dominated by SIA, particularly NO3 (IMR = 40.5%). Both meteorological and chemical processes contributed to this event. Meteorological conditions were characterized by stagnation and frequent fog events, as recorded by the National Meteorological Center of CMA (http://www.nmc.cn). The event was segmented into three sub-periods: the first period (23 December–1 January), similar to P1, was characterized by transport from the NCP triggering a PM2.5 surge on 31 December following substantial local chemical formation (Figs. S1 and S4); the second period (3–7 January) was dominated by local and adjacent emissions under stable meteorological conditions, with NO3 and PM2.5 levels peaking on 6 January due to fog and local emissions; and the third period (9–14 January) was influenced by local chemical processes, with over 80% of NO3 contributions originating from Shanghai, Jiangsu, and Zhejiang.

P3 (25–29 January)

NO3 (IMR = 44.6%) dominated this event, driven by stable meteorological conditions. Stable winds prevailed across central and eastern China, facilitating the continuous transport of pollution from the NCP. PM2.5 observations corroborate these findings: under northwesterly flow, decreasing concentrations in Zhengzhou were followed by significant increases in Shanghai several days later (Fig. S3). The low RH (~50%) during this period limited NO3 formation, but the persistent influx of pollutants from the NCP resulted in elevated pollution levels that were difficult to disperse.

P4 (8–12 February)

This event coincided with the Spring Festival and was characterized by high IMR from elements (17.3%), sulfate (17.2%), and nitrate (18.2%). Intensive fireworks emissions exacerbated PM2.5 pollution, although these emissions were not included in the models. VOCs and RH were identified as the most influential factors (Fig. S5), with VOCs promoting local NO3 formation under stable meteorological conditions. Previous observational studies indicate that fireworks can significantly elevate PM2.5 concentrations while causing only slight increases in nitrate levels, because their primary emissions consist mainly of potassium and sulfate with some nitrate32,41,42,43. Therefore, firework emissions may have played a larger role in intensifying this PM2.5 pollution event44.

NO3 and PM2.5 sensitivity to precursors

Given the dominance of NO3 in most pollution events, a sensitivity experiment was conducted to assess the impacts of reducing emissions of NH3, NOx, VOCs, and SO2 on PM2.5 and NO3 concentrations. Figure 5 illustrated the changes in NO3 and PM2.5 concentrations in Shanghai under different emission reduction scenarios (20%, 40%, 60%, and 80%). For NO3, both NH3 and NOx reduction were effective, with NH3 showing greater efficacy than NOx. In contrast, VOCs reduction had a limited impact, and the differences in effectiveness among these precursors became more pronounced as the emission reduction ratio increased. A 20% reduction in NH3, NOx, and VOCs decreased NO3 concentrations by 10.8%, 8.7%, and 8.3%, respectively, while an 80% reduction decreased concentrations by 61.6%, 47.5%, and 21.9%, respectively. However, SO2 reduction slightly increased NO3 concentrations under all scenarios. For PM2.5, NH3 and VOCs reductions were highly effective at low reduction levels (20%), decreasing PM2.5 by 4.8% and 5.6%, respectively, compared to NOx (2.7%) and SO2 (2.6%). At higher reduction levels (80%), NH3 reduction remained the most effective, decreasing PM2.5 by 26.8%, compared to NOx (16.3%), VOCs (13.3%), and SO2 (6.7%).
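
To make the nonlinearity in these numbers explicit, the reported NO3 responses can be divided by the corresponding emission cut. The sketch below uses only the percentages quoted above; the "efficiency" ratio (percent NO3 decrease per percent emission cut) is an illustrative quantity introduced here, not a metric defined in this study.

```python
# Percentage decreases in Shanghai NO3 reported in the text for single-precursor cuts.
no3_response = {
    "NH3": {20: 10.8, 80: 61.6},
    "NOx": {20: 8.7, 80: 47.5},
    "VOCs": {20: 8.3, 80: 21.9},
}

def efficiency(response_pct, reduction_pct):
    """Percent NO3 decrease per percent emission cut (illustrative ratio)."""
    return response_pct / reduction_pct

eff = {
    precursor: {cut: efficiency(resp, cut) for cut, resp in values.items()}
    for precursor, values in no3_response.items()
}

for precursor, values in eff.items():
    trend = "rises" if values[80] > values[20] else "falls"
    print(f"{precursor}: efficiency {trend} between the 20% and 80% cuts")
```

On these figures the ratio grows with deeper cuts for NH3 and NOx but shrinks for VOCs, which is one way to read the statement that the differences among precursors widen as the reduction ratio increases.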

Fig. 5: Responses of NO3 and PM2.5 concentrations in Shanghai under emission reduction scenarios of NH3, NOx, VOC, and SO2.

The values in the plots represent the baseline concentrations of NO3 (a) and PM2.5 (b) in Shanghai during the winter of 2023–2024.

These differences were attributed to the underlying chemical processes. Gas-phase photooxidation (OH + NO2) dominates NO3 formation, making NOx reduction directly impactful16,31,45. NH3, as a key alkaline gas, neutralizes nitric and sulfuric acids, so reducing NH3 limits particulate formation17,46,47,48. Additionally, the differences in the lifetimes of HNO3 and NO3 against deposition further influenced the response to NH3 reduction35. VOCs influenced multiple chemical processes, including secondary organic aerosol (SOA) formation and photochemical reactions, but their impact on NO3 reduction required higher reduction ratios combined with NOx control to overcome nonlinear feedbacks49,50. SHAP analysis under different scenarios also identified VOCs as the most influential factor in most cases and indicated an increasing influence of O3 on NO3 with higher NOx reduction ratios, suggesting important interactions between O3 and NO3 (Fig. S6). As for SO2, because SO42− and NO3 formation compete chemically for bases, reduced SO2 emissions led to an increase in NO3 concentrations49,51,52.

We also analyzed the response of the entire region (Fig. 6). Here, the “dominant precursor” is the one of the four precursors whose reduction yields the largest decrease in pollutant concentrations. Under a 20% emission reduction ratio, NH3 was the dominant precursor for NO3 in Shanghai and eastern Zhejiang, VOCs in Jiangsu and parts of the NCP, and NOx in most other areas. As the emission reduction ratio increased, the areas where VOCs had been the dominant precursor shifted to NOx, while southeastern China gradually transitioned to NH3. For PM2.5, the overall trend was similar to that of NO3, but at low emission reduction ratios, VOCs rather than NOx were the dominant precursor in the NCP and northern YRD. However, as the emission reduction ratio increased, more regions transitioned to NOx or NH3 as the dominant precursor. Due to the relatively low sulfate concentrations and the partial offset of SO2 reduction effects by increased NO3, no region was dominated by SO2.
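
The selection of a dominant precursor can be sketched as a simple ranking of the responses in one grid cell. The function `dominant_precursor` is hypothetical, and the 5% margin for flagging near-ties is one possible reading of the relative-difference criterion in the Fig. 6 caption, whose exact formula is not given; the input values are the Shanghai PM2.5 decreases reported earlier for 20% and 80% cuts.

```python
def dominant_precursor(responses, margin=0.05):
    """Return the precursor with the largest concentration decrease, or
    "transitional" when the top two responses differ by less than `margin`
    relative to the largest one (an assumed definition)."""
    ranked = sorted(responses.items(), key=lambda item: item[1], reverse=True)
    (best_name, best), (_, second) = ranked[0], ranked[1]
    if best <= 0:
        return "none"  # no precursor cut lowers the concentration
    if (best - second) / best < margin:
        return "transitional"
    return best_name

# Percent decreases in Shanghai PM2.5 reported in the text.
pm25_cut20 = {"NH3": 4.8, "NOx": 2.7, "VOCs": 5.6, "SO2": 2.6}
pm25_cut80 = {"NH3": 26.8, "NOx": 16.3, "VOCs": 13.3, "SO2": 6.7}

label_20 = dominant_precursor(pm25_cut20)
label_80 = dominant_precursor(pm25_cut80)
```

Applied to the Shanghai PM2.5 numbers, this rule selects VOCs at the 20% level and NH3 at the 80% level, in line with the shift described for higher reduction ratios.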

Fig. 6: Spatial distributions of dominant precursors influencing NO3 and PM2.5 under emission reduction scenarios of NH3, NOx, VOC, and SO2.

The maps display the dominant precursors of NO3 (a–d) and PM2.5 (e–h) for different scenarios and regions. White shaded areas indicate where the relative difference between the responses to different precursors is less than 5%.

These patterns were likely associated with the atmospheric background, particularly at low reduction levels. Notably, the responses to NH3, NOx, and VOCs reductions in many eastern regions were similarly effective (Figs. S7 and S8), indicating a “transitional regime” that was challenging to characterize directly. VOCs were the dominant precursor in the southern NCP. As this region is a major agricultural and industrial hub with substantial ammonia emissions, nitrate formation typically occurred under ammonia-saturated conditions. Consequently, VOCs reduction may serve as an effective strategy to reduce nitrate concentration in these areas, particularly during winter when VOC-limited conditions significantly constrain oxidant availability14,17. In contrast, central China (Hunan and Hubei) exhibited greater sensitivity to NOx reduction. It could be attributed to its high agricultural ammonia emissions with lower NOx levels than the southern NCP, allowing NOx to emerge as the dominant precursor35. NH3 was the dominant precursor in eastern China, consistent with the region’s elevated NOx levels53.

We further analyzed the response to 20% reductions under multiple-precursor scenarios (Fig. 7). The most effective scenarios for reducing nitrate in Shanghai were the NH3-NOx-VOC and NH3-NOx-VOC-SO2 controls, indicating the benefit of reducing all three precursors together. The combined VOC-NH3 and NOx-NH3 scenarios also showed significant effects, confirming the influence of ammonia on nitrate in Shanghai. SO2 reduction failed to decrease nitrate concentrations and even caused slight increases in some regions, owing to competition for ammonia between sulfate and nitrate formation. The PM2.5 response patterns to combined reductions were similar to those of nitrate (Fig. S9). The combined reduction analysis demonstrated the absence of linear additive effects when compared with single-precursor reductions, with some regions even exhibiting diminished efficacy. For example, the reduction effects of VOCs in the southern NCP were attenuated under combined reduction scenarios, suggesting nonlinear atmospheric interactions. Nevertheless, the regional dominant precursors maintained their predominant roles across combined reduction scenarios (Fig. 6), highlighting their importance for informing integrated control strategies.

Fig. 7: Spatial distributions of nitrate responses to different combined emission reduction scenarios.

a NOx-VOC, b NOx-NH3, c NOx-SO2, d VOC-NH3, e VOC-SO2, f NH3-SO2, g NOx-VOC-NH3, and h NOx-VOC-NH3-SO2.

Discussion

In this study, observation data were used to analyze the characteristics of PM2.5 pollution, and the simulation results were integrated to provide further insights. However, the models employed can lead to some uncertainties. The emission inventory, which provided data only up to 2020, was adjusted to reflect 2023–2024 conditions, introducing additional uncertainties. Furthermore, the emission inventory was allocated at a monthly resolution, distinguishing between workdays and weekends, which limited the ability to assess the contribution of emission changes to PM2.5 pollution during the study period. Given the overall improvement in model performance after emission adjustments, the consistency with previous similar studies, and the inherent uncertainties in emission inventories themselves, we consider this emission adjustment approach to be acceptable54,55,56. Additionally, the impact of fireworks emissions was not included in the models due to limited emission process data, leading to significant underestimation of pollutants such as metals and sulfur, particularly during the Spring Festival. Moreover, while this study utilized source apportionment and machine learning models to analyze pollution events, it lacked methods to examine physicochemical processes and mechanisms in detail. However, since this study focuses on pollutant concentration characteristics rather than microphysical and chemical processes, and considering the model successfully captures both the peak concentrations and temporal evolution patterns of particulate matter, we maintain that the model results are acceptable for investigating the spatiotemporal distribution of particle concentrations. The machine learning model incorporated only 16 factors, which may be insufficient to fully capture the influences of meteorological and chemical conditions, while certain omitted variables may have been partially represented due to collinear relationships with the selected factors. 
Although the selected time configuration may affect short-term model robustness, its impact on the winter-scale model was negligible.

Beyond the limitations of the models themselves, short-term simulations cannot account for the effects of climatic oscillations such as the El Niño-Southern Oscillation (ENSO). The winter of 2023–2024 coincided with an El Niño event, which was associated with stable meteorological conditions that exacerbated pollution events57. To assess the El Niño impacts, we conducted an experiment using meteorological data from 2017–2018 (a La Niña year) combined with the 2023–2024 emission inventories (Fig. S10). The results reveal an overall reduction in peak concentrations, yet multiple pollution events still occurred, suggesting that the 2023–2024 El Niño event likely intensified pollution levels. Given that El Niño is a recurring climate phenomenon, we maintain that the 2023–2024 meteorological conditions (despite being influenced by a moderate El Niño) remain representative of typical years. Comparison with the original 2017–2018 severe pollution cases (Table S2) indicates significant emission reduction benefits in recent years. Nevertheless, additional emission control may be required to prevent pollution during unfavorable years. Future studies should consider more detailed processes and larger temporal scales to comprehensively analyze the causes of pollution events in Shanghai and other regions.

The study conducted a simple sensitivity analysis of emission reductions without considering further combinations of multiple precursors or region-specific control scenarios. This could limit the assessment of local concentration impacts resulting from varying provincial policy implementations, particularly in provinces with substantially different regulatory frameworks compared to neighboring regions. Nevertheless, the results remain valuable. They indicate that while NO3 was the dominant contributor to PM2.5 pollution in Shanghai, optimal emission reduction strategies may differ when targeting total PM2.5 reduction. These findings can be generalized to other regions. Furthermore, it is worth noting that, despite the identification of a “dominant precursor”, the effects of reducing different precursors may show limited differences in some cases, highlighting the need to investigate the “transitional regime”. While this study primarily focused on the nitrate component of PM2.5 pollution, further research on the underlying mechanisms and integrated air quality control strategies is still required, given the uncertainties in aerosol processes and the frequent co-occurrence of O3 and PM2.5 pollution58,59.

Methods

Observation

Meteorological data and PM2.5 concentrations, including its components, were collected from December 2023 to February 2024. Meteorological observation data were from the National Climatic Data Center (NCDC; https://www.ncdc.noaa.gov, last access: March 1, 2025). PM2.5 and its components were observed at the Shanghai Academy of Environmental Sciences (SAES; 31.17° N, 121.42° E) in the southwest of the central urban area of Shanghai, which can be regarded as a representative urban site influenced by a wide mixture of emission sources. A detailed description can be found in previous studies60,61. PM2.5 concentrations were measured by an online particulate monitor (FH 62 C14 series, Thermo Fisher Scientific Inc.) using beta attenuation techniques equipped with a verified PM2.5 cyclone. Carbonaceous species (OC and EC) were analyzed by a semicontinuous OC/EC analyzer (model RT-4, Sunset Laboratory Inc.) equipped with an upstream parallel-plate organic denuder62. Surface PM2.5 observation data in other cities were from the national air quality monitoring network developed by the China National Environmental Monitoring Center (http://www.cnemc.cn, last access: March 1, 2025).

We used incremental mass ratio (IMR) to identify the components driving PM2.5 pollution episodes63. The IMR of a certain component is calculated as the ratio of the increment of component \(i\) (\(\varDelta {C}_{i}\)) to the increment of total PM2.5 (\(\varDelta {{PM}}_{2.5}\)) during the pollution episode (Eq. (1)). We selected a pollution event (consecutive days with PM2.5 > 35 μg m−3) and five days prior to its occurrence for comparison. A high IMR value indicated a significant contribution of the component to the specific PM2.5 pollution event.

$${{IMR}}_{i}=\frac{\varDelta {C}_{i}}{\varDelta {{PM}}_{2.5}}\times 100 \tag{1}$$
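
Eq. (1) can be evaluated with a few lines of Python. In this sketch the increments are taken as the difference between the event mean and the mean of the five preceding days, which is an assumption about how the comparison period is averaged; the function name and all concentrations are hypothetical.

```python
def incremental_mass_ratio(event_component, prior_component, event_pm25, prior_pm25):
    """IMR (%) of one component, following Eq. (1).

    Increments are event-mean minus prior-period-mean concentrations
    (an assumed averaging choice)."""
    def mean(values):
        return sum(values) / len(values)

    delta_c = mean(event_component) - mean(prior_component)
    delta_pm = mean(event_pm25) - mean(prior_pm25)
    if delta_pm == 0:
        raise ValueError("PM2.5 increment is zero; IMR is undefined")
    return delta_c / delta_pm * 100.0

# Hypothetical concentrations (ug m-3): five days before the event, then a three-day event.
prior_no3 = [5.0, 5.0, 5.0, 5.0, 5.0]
event_no3 = [25.0, 35.0, 30.0]
prior_pm = [20.0, 20.0, 20.0, 20.0, 20.0]
event_pm = [70.0, 90.0, 80.0]

imr_no3 = incremental_mass_ratio(event_no3, prior_no3, event_pm, prior_pm)
```

Here nitrate rises by 25 μg m−3 while PM2.5 rises by 60 μg m−3, so nitrate explains a little over 40% of the increment.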

Model setup

We employed a modified Community Multiscale Air Quality (CMAQ) model v5.0.2 with the revised Statewide Air Pollution Research Center (SAPRC-11) chemical mechanism to simulate PM2.5 levels and identify emission sources64,65. The simulation period spanned from November 28, 2023 to February 29, 2024, with the first three days excluded as the spin-up period. The model employed a 36 km horizontal resolution (197 × 127 grid cells) covering all of China, with a nested 12 km resolution (97 × 88 grid cells) over the YRD to provide detailed simulations for Shanghai (Fig. S11). In addition, single-precursor emission reduction scenarios were conducted for four precursors, applying reductions of 20%, 40%, 60%, and 80% separately to NH3, NOx, VOCs, and SO2 emissions. We further conducted eight combined emission reduction scenarios, applying 20% reductions to two, three (NH3-NOx-VOCs), and all four species to better inform integrated control strategies.
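
The scenario design can be enumerated programmatically as a consistency check. The sketch below assumes that each of the four precursors is cut separately at each of the four levels, and that the eight combined scenarios are the six precursor pairs, one triple, and the four-species case, following the panels of Fig. 7; the variable names are hypothetical.

```python
from itertools import combinations

species = ["NH3", "NOx", "VOCs", "SO2"]
levels = [20, 40, 60, 80]  # percent reductions

# Single-precursor runs: one species cut at one level per run.
single_scenarios = [(name, level) for name in species for level in levels]

# Combined runs at a 20% cut: all pairs, one triple, and all four species.
pairs = list(combinations(species, 2))
triple = [("NH3", "NOx", "VOCs")]
all_four = [tuple(species)]
combined_scenarios = pairs + triple + all_four

print(len(single_scenarios), "single-precursor runs")
print(len(combined_scenarios), "combined runs")
```

Under these assumptions the design comprises 16 single-precursor runs and 8 combined runs, the latter matching the eight panels of Fig. 7.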

We used the Weather Research and Forecasting (WRF) model v4.1.2 to provide the meteorological driving fields, based on the FNL (Final) Operational Global Analysis data from the National Centers for Environmental Prediction (NCEP) (https://rda.ucar.edu/datasets/ds083.2, last access: March 1, 2025). The vertical grid was divided into 18 sigma levels, extending from the surface to the upper troposphere. The original anthropogenic emission inventory was the Multi-resolution Emission Inventory model for Climate and air pollution research (MEIC) v1.4 (http://meicmodel.org.cn, last access: March 1, 2025), which we adjusted to 2023–2024 based on power generation, transportation turnover, and national development plans. Biogenic emissions were generated using the Model of Emissions of Gases and Aerosols from Nature (MEGAN) v2.166. The simulation results were validated against observations, demonstrating good agreement between the modeled and measured values (Table S1).

Emission adjustment method

We adjusted the five sectors (transportation, industry, power, residential, and agriculture) in the MEIC data to reflect 2023–2024 conditions based on 2019 data (Table S3). Following these emission adjustments, the model demonstrated improved performance (Table S1) when validated against observation data, confirming the reasonableness of our adjustment methodology.

For the transportation sector, which is a major source of NOx and carbon monoxide (CO) emissions, we adjusted NOx emissions based on official passenger and freight turnover data (https://www.mot.gov.cn/tongjishuju, last access: March 1, 2025) (Table S4). Since CO emissions had minimal impact on our simulation results, they were kept unadjusted along with other pollutants.

The industrial sector, a major source of SO2, NOx, and particulate matter (PM) in China, was identified as exhibiting the most significant emission trends among all sectors67. We assumed that the annual emission trend remained constant and extrapolated emissions to 2023–2024 from the 2018–2019 change (excluding NOx, as specified below) according to the following equations (Eqs. (2)–(4)):

$${\delta }_{i,j,k}=\frac{{E}_{i,j,k,2019}}{{E}_{i,j,k,2018}} \tag{2}$$
$${E}_{i,j,k,2023}={\delta }_{i,j,k}^{4}\cdot {E}_{i,j,k,2019} \tag{3}$$
$${E}_{i,j,k,2024}={\delta }_{i,j,k}^{5}\cdot {E}_{i,j,k,2019} \tag{4}$$

where \({E}_{i,j,k,t}\) denotes the emissions of species \(k\) (SO2 and PM) at grid location \((i,j)\) in year \(t\), and \({\delta }_{i,j,k}\) is the emission adjustment coefficient of the industrial sector.
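
As a worked illustration of Eqs. (2)–(4), the sketch below applies the constant-ratio extrapolation to a single grid cell and species. The function name and the emission values are hypothetical; the check for a non-positive 2018 value is an added safeguard, not part of the described method.

```python
def extrapolate_industrial(e_2018, e_2019):
    """Apply Eqs. (2)-(4) to one grid cell and one species."""
    if e_2018 <= 0:
        raise ValueError("2018 emissions must be positive to form the ratio")
    delta = e_2019 / e_2018                 # Eq. (2): year-on-year ratio
    return {
        "delta": delta,
        2023: delta ** 4 * e_2019,          # Eq. (3): four further years
        2024: delta ** 5 * e_2019,          # Eq. (4): five further years
    }

# Hypothetical SO2 emissions for one grid cell (arbitrary units).
result = extrapolate_industrial(e_2018=100.0, e_2019=95.0)
```

A 5% decline between 2018 and 2019 compounds to roughly a 19% lower value in 2023 than in 2019, which shows how sensitive the extrapolated emissions are to the single-year ratio.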

For the power sector, we adjusted all emissions based on national thermal power generating capacity data for 2019, 2023, and 2024 obtained from the China Electricity Council (https://cec.org.cn/index.html, last access: March 1, 2025) using the following equations (Eqs. (5) and (6)):

$${\delta }_{{power},2023}=\frac{{E}_{{power},2023}}{{E}_{{power},2019}}=1.197 \tag{5}$$
$${\delta }_{{power},2024}=\frac{{E}_{{power},2024}}{{E}_{{power},2019}}=1.262 \tag{6}$$

where \({E}_{{power},2019}\), \({E}_{{power},2023}\), and \({E}_{{power},2024}\) represent the power generating capacity in 2019, 2023, and 2024. \({\delta }_{p{ower},2023}\) and \({\delta }_{p{ower},2024}\) represent the emission adjustment coefficients for power sector in 2023 and 2024, respectively.

For the residential and agricultural sectors, we assumed minimal interannual variation and an insignificant impact on the simulation results, and therefore retained the unadjusted 2019 MEIC values.

We also adjusted VOCs emissions from the transportation and industrial sectors. According to a five-year plan on energy conservation and emission reduction released by the State Council in 2022, China aims to reduce NOx and VOCs emissions by more than 10% from 2020 levels by 2025 (http://www.gov.cn/xinwen/2022-01/24/content_5670214.htm, last access: March 1, 2025). Following the VOCs emission control plans developed by Zhong et al.68, our study assumes strict implementation of these measures over the past five years and establishes reduction coefficients for industrial and transportation VOCs emissions, together with an annual NOx reduction rate of 2% for the industrial sector.

Machine learning method

Boosting is a tree-based ensemble learning method, and eXtreme Gradient Boosting (XGBOOST) is an efficient extension of it that introduces critical improvements in computational efficiency and model robustness28,69. In this study, we employed XGBOOST to investigate the factors influencing NO3 concentrations, incorporating 16 factors that encompass 9 chemical species (NO2, NO, N2O5, NO3, OH, O3, HO2, VOCs, and NH3) and 7 meteorological parameters (temperature [T], relative humidity [RH], zonal wind speed [U10], meridional wind speed [V10], surface pressure [PRSFC], planetary boundary layer height [PBL], and solar radiation [RGRND]). These factors were selected based on their relevance to the primary reaction pathways of nitric acid formation, including the heterogeneous hydrolysis of N2O5 and NO2, gas-phase reactions, and the oxidation of VOCs by NO3 radicals16,31,70. Additional tests incorporating other variables (e.g., pH and heterogeneous uptake coefficients) exhibited similar model performance (Fig. S12). Shapley Additive Explanations analysis indicated that these supplementary variables showed importance patterns comparable to those of the existing variables, likely because they share overlapping chemical mechanisms (e.g., pH and ammonia both influence nitrate neutralization reactions48,71,72), suggesting that their effects are already partially represented by the existing variables. Thus, we excluded them from the model. All data were obtained from the CMAQ model.
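To make the boosting idea concrete, the following is a toy Python sketch of gradient boosting with one-split regression trees (stumps) under squared-error loss. It is not XGBOOST and not the model used in this study: XGBOOST adds regularization, second-order gradient information, deeper trees, and many engineering optimizations. The three synthetic predictors, the response, and all hyperparameter values are hypothetical.

```python
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for low, high in zip(values, values[1:]):
            threshold = 0.5 * (low + high)
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(X, y, n_rounds=30, learning_rate=0.3):
    """Start from the mean of y, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [target - p for target, p in zip(y, predictions)]
        j, threshold, left_mean, right_mean = fit_stump(X, residuals)
        stumps.append((j, threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if row[j] <= threshold else right_mean)
            for row, p in zip(X, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if row[j] <= thr else rm)
                      for j, thr, lm, rm in stumps)


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic data: three hypothetical predictors, of which only two drive the response.
random.seed(0)
X = [[random.random(), random.random(), random.random()] for _ in range(100)]
y = [3.0 * a - 2.0 * b + random.gauss(0.0, 0.1) for a, b, _ in X]

model = fit_boosting(X, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict(model, row) for row in X])
```

Each round fits a weak learner to what the current ensemble still gets wrong, so the training error of the ensemble falls below that of the constant mean predictor it starts from.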

The Shapley Additive Explanations (SHAP) method was used to interpret the XGBOOST model outputs, providing a robust framework for assessing the contribution of each factor to the predicted NO3 concentrations. Shapley values quantify the average marginal contribution of a factor across all possible combinations of the other factors, which ensures a fair and well-justified attribution28,73,74. The marginal contribution of each factor in the XGBOOST model to the predicted \({{Nitrate}}_{i}\), i.e., its Shapley value, is obtained from the following formulas (Eqs. (7) and (8)):

$${{Nitrate}}_{i}={{Nitrate}}_{i({base})}+\mathop{\sum }\limits_{j=1}^{N}{shap}({x}_{i,j})$$
(7)
$${shap}({x}_{i,j})=\sum _{S\subseteq \{{x}_{i,1},\,{x}_{i,2},\ldots ,{x}_{i,N}\}\backslash \{{x}_{i,j}\}}\frac{\left|S\right|!\left(N-\left|S\right|-1\right)!}{N!}\left({{Nitrate}}_{(S\cup \{{x}_{i,j}\})}-{{Nitrate}}_{(S)}\right)$$
(8)

where \({{Nitrate}}_{i}\) is the predicted value generated for each sample \(i\) with \(N\) factors. \({{Nitrate}}_{i({base})}\) is the expected value (base value) of \({{Nitrate}}_{i}\). \({x}_{i,j}\) denotes the factor \(j\) for sample \(i\). \({shap}({x}_{i,j})\) is the Shapley value of the impact of the factor \(j\) on \({{Nitrate}}_{i}\) at sample \(i\). \(S\) is a subset of the factors used in the model that excludes \({x}_{i,j}\), and \(\left|S\right|\) is the number of factors in \(S\). \({{Nitrate}}_{(S)}\) is the value predicted using only the factors in subset \(S\). \({shap}({x}_{i,j}) > 0\) indicates a positive effect of factor \(j\) that increases the prediction above the base value, and \({shap}({x}_{i,j}) < 0\) indicates the opposite. Therefore, Shapley values can be used to infer the specific processes that affect the change in the concentration of pollutants for each sample27,75.
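Eqs. (7) and (8) can be checked on a small example by enumerating every subset directly. The sketch below is illustrative only: the toy prediction function, the three factor values, and the zero baseline are hypothetical, and the prediction for a subset \(S\) is approximated by replacing the factors outside \(S\) with fixed baseline values, which is a simplification of the expectation-based definition. Exact enumeration is exponential in \(N\), so practical SHAP implementations for tree models rely on faster algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each factor in x by enumerating all subsets (Eq. (8))."""
    n = len(x)

    def value(subset):
        # Factors outside the subset are replaced by their baseline values
        # (a simplifying assumption made for this illustration).
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(z)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical predictor: one additive term and one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Additivity as in Eq. (7): base value plus all Shapley values gives the prediction.
reconstructed = toy_model(baseline) + sum(phi)
```

In this example the additive factor receives its full contribution (2), while the interaction term (2 × 3 = 6) is split equally between the two interacting factors (3 each), and the base value plus the three Shapley values reproduces the prediction of 8.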