Machine learning-enhanced high-resolution exposure assessment of ultrafine particles

Jianyao, Yudie; Yuan, Hongyong; Su, Guofeng; Wang, Jing; Weng, Wenguo; Zhang, Xiaole

doi:10.1038/s41467-025-56581-8

Download PDF

Article
Open access
Published: 31 January 2025

Machine learning-enhanced high-resolution exposure assessment of ultrafine particles

Nature Communications volume 16, Article number: 1209 (2025) Cite this article

10k Accesses
21 Citations
5 Altmetric
Metrics details

Subjects

Abstract

Ultrafine particles (UFPs) under 100 nm pose significant health risks inadequately addressed by traditional mass-based metrics. The WHO emphasizes particle number concentration (PNC) for assessing UFP exposure, but large-scale evaluations remain scarce. In this study, we developed a stacking-based machine learning framework integrating data-driven and physical-chemical models for a national-scale UFP exposure assessment at 1 km spatial and 1-hour temporal resolutions, leveraging long-term standardized PNC measurements in Switzerland. Approximately 20% (1.7 million) of the Swiss population experiences high UFP exposure exceeding an annual mean of 10⁴ particles‧cm⁻³, with a national average of (9.3 ± 4.7)×10³ particles‧cm⁻³, ranging from (5.5 ± 2.3)×10³ (rural) to (1.4 ± 0.5)×10⁴ particles‧cm⁻³ (urban). A nonlinear relationship is identified between the WHO-recommended 1-hour and 24-hour exposure reference levels, suggesting their non-interchangeability. UFP spatial heterogeneity, quantified by coefficient of variation, ranges from 4.7 ± 4.2 (urban) to 13.8 ± 15.1 (rural) times greater than PM_2.5. These findings provide crucial insights for the development of future UFP standards.

Global high-resolution ultrafine particle number concentrations through data fusion with machine learning

Article Open access 14 November 2025

Low-smoke fuels for residential heating linked to an increase in ultrafine particle emissions

Article 02 March 2026

Daily PM_2.5 concentration estimates by county, ZIP code, and census tract in 11 western states 2008–2018

Article Open access 19 April 2021

Introduction

Exposure to ambient particulate matter is recognized as a leading global health concern¹. Over the past decades, extensive epidemiological studies have investigated the relationship between mortality risk and long-term exposure to mass-based metrics, e.g. ambient PM_2.5 and PM₁₀^2,3,4,5. However, toxicological studies suggest that the relevant dose metric may not only be related to particle mass^6,7, but also dependent on the size, shape, and surface properties of particles^8,9,10,11,12.

Ultrafine particles (UFPs), defined as particles with diameters less than 100 nm, pose significant health risks due to their small size and high specific surface area, which allow them to easily penetrate and contact with human tissues and organs^13,14. Despite their substantial contribution to the total number of particles, UFPs contribute minimally to overall mass^15,16,17, rendering mass-based metrics inadequate for accurately characterizing their influence. Limited studies indicate a potential association between increased UFP exposure and adverse effects in vulnerable populations, e.g. elderly and children, particularly those with respiratory and cardiovascular diseases^7,18,19,20. However, definitive health effects of UFPs remain inconclusive, primarily due to insufficient exposure data on spatial variations in UFP concentrations^2,21,22.

The World Health Organization (WHO) has recommended in its recent Global Air Quality Guidelines (AQG)²³ that ambient UFPs should be quantified by particle number concentration (PNC). The WHO also advocates using emerging science and technology to enhance methods for assessing UFP exposure, which is crucial for epidemiological studies and effective UFP management. Monitoring PNC is inherently more challenging than measuring mass-based metrics due to the predominance of nanoscale particles and the need for high resolution and sensitivity. The transient nature of UFPs also leads to highly variable results depending on time and location. This process requires expensive, research-grade instruments²⁴, resulting in limited monitoring data. Consequently, the development of exposure assessment methodologies for PNC is still in its early stages^25,26.

To determine the spatial variability of ambient concentrations, atmospheric dispersion models have been employed to evaluate the UFP exposure in major European cities¹⁵, with a particular focus on urban areas^27,28. However, the uncertainty associated with particle number emission inventories is significantly higher than that of commonly regulated pollutants^{15,29,30,31,32,33,34}. Additionally, the complex microphysics of number-based aerosol dynamics introduces substantial uncertainties and high computational costs^35,36. As an alternative, data-driven methods, e.g. land-use regression (LUR) models, have been utilized to assess PNC exposure. In Amsterdam, a measurement campaign collected PNC data from 50 homes over a one-week period each, marking the initial efforts to evaluate PNC exposure using LUR models³⁷. Inspired by the Measurements of UFP and Soot in two Cities (MUSiC) and the European Study of Cohorts for Air Pollution Effects (ESCAPE) projects, LUR models have increasingly been applied for small-scale PNC exposure assessment in European cities^38,39,40, as well as in North American cities^4,41,42 and in China⁴³.

Recently, machine learning (ML) methods have been proposed for assessing UFP exposure. Random forest (RF) machine learning models have been integrated with LUR to improve UFP exposure assessment^24,44. Additionally, LUR has been combined with Deep Convolutional Neural Network (CNN) models to generate high-resolution models of annual median ambient PNC using aerial images⁴. These studies have demonstrated that ML methods outperform traditional LUR models⁴⁴.

Most existing models primarily rely on geographic information (e.g. land use, road and traffic, and population density). However, they do not fully utilize the extensive data available on regulated pollutants (e.g. NOx, Ozone, PM_2.5, and PM₁₀) from both monitoring and air quality modeling sources. By leveraging this abundant data, emerging ML methods can more effectively capture the complex relationships between UFP and these regulated pollutants, potentially leading to significant improvements in the accuracy and performance of UFP exposure assessments.

This study presents the development of an innovative stacking-based ensemble machine learning framework that integrates data-driven and physical-chemical models to conduct a national-scale assessment of UFP exposure at high spatial (1 km) and temporal (1 h) resolutions, encompassing urban centers, urban clusters, and rural areas. The globally unique long-term standardized UFP measurements in Switzerland and the superior design of the model significantly enhanced its generalization capabilities. High spatial and temporal resolutions were achieved by leveraging the Copernicus Atmosphere Monitoring Service (CAMS) validated air quality reanalysis dataset, in conjunction with meteorological data from the ECMWF Reanalysis v5 (ERA5) and the traffic volume with 100 m resolution from the Open Transport Map (OTM)⁴². The high-resolution PNC obtained from the model was employed to comprehensively assess the national UFP exposure across different area types. It reveals a relatively high exposure to UFP with the national average of (9.3 ± 4.7) × 10³ particles‧cm⁻³. We highlighted a nonlinear relationship between the annual cumulative durations of high UFP exposure respectively based on the 24-h mean and the 1-h reference levels recommended by the WHO AQG, and disparities among various types of areas. This study promotes the adoption of innovative ML models as a cost-effective method for evaluating UFP exposure, utilizing available measurements of regulated pollutants. Furthermore, high-resolution UFP data allowed detailed exposure analyses, revealing crucial patterns and informing public health. It enhances our understanding of UFP exposure on a national scale and provides valuable insights for future UFP standards.

Results

Data-driven model for particle number concentration

A data-driven Stacking Technique for Ensemble Modeling of Particle Number Concentration (Stem-PNC) has been proposed to estimate PNC based on regulated pollutants (i.e. NOx, PM₁₀, PM_2.5, and Ozone), in conjunction with meteorological parameters (i.e. wind speed, temperature, radiation, relative humidity, and precipitation) and traffic information. Stem-PNC was trained using the globally unique, long-term standardized measurements of PNC concentrations, from the Swiss National Air Pollution Monitoring Network (NABEL). Since 2005, NABEL has measured the number concentration of particles with diameters from 5 nm to 3 μm⁴⁵. Stem-PNC was trained on hourly PNC measurements collected over four years (2016–2019, covering 78% of the data) and tested on data from 2020 (22%). 5% of the training dataset was allocated as a validation set for hyperparameter tuning, ensuring that the test dataset remained completely independent for the final evaluation. The model utilized pollutant datasets from NABEL, meteorological data (MeteoSwiss), traffic data, and temporal features. Both training and predictions were conducted on data collected from monitoring stations. Additional details on Stem-PNC and the training dataset are provided in the “Methods” section.

We applied Stem-PNC to estimate the particle number concentrations at the five NABEL stations for the year 2020. Despite the significant changes in PNC distributions due to the COVID-19 pandemic, which differed markedly from the 2016–2019 training dataset, Stem-PNC successfully predicted the weekly temporal trends of PNC at all five sites, as illustrated in Fig. 1a. Quantitative evaluation of the model performance is shown in Fig. 1b–e. The model accuracy improved with longer averaging periods, as indicated by the increase in the coefficient of determination (R²) from 0.85 for hourly averages to 0.92 for monthly averages. This suggests that Stem-PNC is well-suited for long-term exposure assessment. The strong agreement between Stem-PNC predictions and actual measurements highlights the robust generalization capabilities of the model.

**Fig. 1: Performance evaluation of the Stacking Technique for Ensemble Modeling of Particle Number Concentration (Stem-PNC).**

These capabilities are attributed to three synergistic factors: the extensive training datasets, the inclusion of regulated pollutants in the model, and the stacking technique. The large training dataset enhances the generalization ability by providing a diverse set of PNC distributions, reducing the likelihood of overfitting, and enabling the model to capture accurate underlying relationships between PNC and regulated pollutants. The stacking technique further boosts generalization by integrating the predictive strengths of multiple models, resulting in more accurate and reliable predictions of unseen data.

Figure 1f demonstrates that Stem-PNC outperforms a range of models, from simple linear models to more advanced Deep Learning models. Notably, despite its relatively lower complexity compared to the Deep Learning framework, the performance of the Stem-PNC stacking model is comparable with the Deep Learning model. This is evidenced by a higher R² (0.845), a lower Root Mean Square Error (RMSE = 4594), and a significantly reduced Mean Bias (124). However, deep learning models have higher hardware requirements, such as the need for A6000 GPUs, compared to the Stem-PNC model, which can operate effectively on less specialized hardware. These findings highlight the cost-efficiency and practical utility of the Stem-PNC stacking model, especially in resource-constrained environments. Supplementary Notes 2.2 and 2.7 provide detailed information about the hardware requirements and computational costs for each model.

As discussed in Supplementary Note 2.10, among the deep learning models, the CNN with sliding time windows achieved an R² of 0.848, slightly outperforming the Stem-PNC model (0.845). However, Stem-PNC maintained the lowest mean bias (123.5), likely due to the robustness of its ensemble approach. While these results suggest that deep learning models, e.g. CNN, did not significantly outperform the Stem-PNC model, this should be interpreted with caution. The current findings do not necessarily indicate that deep learning methods are inherently less effective; rather, the relatively small and spatially sparse PNC dataset may limit their full potential. With a larger and more spatially comprehensive dataset, the performance of deep learning models, particularly CNNs, may improve substantially, allowing them to better exploit complex spatial and temporal patterns in the future.

Generalization ability of Stem-PNC

In this study, we tried to improve the spatial performance mainly through three aspects. Firstly, we implemented a data pooling strategy by combining measurements from different stations during model training. This approach enhances the model’s ability to identify broader spatial patterns, thereby improving its generalizability across diverse locations. Secondly, we applied a random time sequence for training by shuffling the temporal order of the data. This allows the model to focus on capturing the relationship between PNC and the immediate input variables without being influenced by temporal patterns or trends. This approach reduces the risk of overfitting to temporal variations. Thirdly, we integrated data-driven and physical-chemical models to strengthen temporal and spatial performance by leveraging the strengths of both approaches. Physical-chemical models provide temporally and spatially consistent inputs that support robust generalization across different times and areas, while the data-driven model captures complex relationships between PNC and regulated pollutants, e.g. NOx, PM₁₀, PM_2.5, and Ozone.

To evaluate the spatial and temporal generalization capabilities of Stem-PNC, a five-fold station-holdout cross-validation method was employed. In this approach, data from each of the five monitoring stations were sequentially excluded from the training dataset, ensuring that each station’s data served as an independent test set in one of the folds. Consequently, the model was trained on the data from the remaining four stations with the data between 2016 and 2019 and then evaluated on the withheld station’s data from 2020. The detailed configurations of the cross-validations are shown in Table S10 in Supplementary Note 5.

By excluding one station during each training iteration, this method provides a rigorous test of the model’s spatial generalization capabilities, as the excluded station’s data has never been used in training and is reserved solely for evaluation purposes. This approach simulates real-world conditions by predicting PNC levels at an unseen location, ensuring that the model learns generalizable spatial patterns rather than overfitting to specific stations. This approach provides a robust evaluation of the model’s performance across various spatial domains. Further details are provided in Supplementary Note 4. This procedure facilitated a comprehensive assessment of the model’s generalization performance over both spatial and temporal dimensions. The process was repeated iteratively for each station, providing a thorough analysis of the Stem-PNC model’s capacity to generalize across varied spatial domains and temporal intervals.

Figure 2 illustrates that Stem-PNC demonstrated commendable performance on the independent test sets, corresponding to the data from stations sequentially excluded in each fold of the validation process. For the BER (urban, roadside) and LUG (urban, background) stations, the predictions generated by Stem-PNC exhibited strong concordance with the actual measurements, as evidenced by the scattering of data points along the 1:1 reference line. This alignment suggested that Stem-PNC possessed robust spatial and temporal generalization capabilities within urban environments, which are critical for assessing exposure to PNCs. Conversely, at the HAE (rural, next to motorway) and RIG (rural, altitude > 1000 m) stations, discrepancies were observed wherein Stem-PNC respectively underestimated and overestimated the PNCs when the data of these stations were excluded from the training dataset. These outcomes were anticipated due to distinct environmental characteristics: the proximity of HAE to a major motorway results in higher emission levels, while RIG, being in a rural and elevated locale, typically registers at the lower end of the PNC spectrum. Such extremities in PNCs at HAE and RIG represent the upper and lower bounds of particle number distributions observed across the network, likely mirroring the broader national context of Switzerland.

**Fig. 2: The cross-validation results at National Air Pollution Monitoring Network (NABEL) stations.**

In Supplementary Note 4, we provide scatter plots of weekly and daily measurements versus model predictions across various stations, allowing for a detailed assessment of the model’s performance at shorter time scales. We observe that discrepancies, including both underestimation and overestimation, are more pronounced at the HAE and RIG stations, with some predictions deviating beyond the 1:2 and 2:1 reference lines. These deviations are notably reduced when data is averaged over longer periods, such as monthly averages. This pattern suggests that spatially divergent locations, which may not be fully represented within the training dataset, could yield larger deviations in short-term predictions. Therefore, we recommend interpreting short-term results with caution in such areas, as longer-term averaged predictions provide more robust estimates. The limitations of spatial generalizability and their implications for divergent locations are further documented and quantified in Supplementary Note 4, where we evaluate prediction accuracy in relation to station-specific characteristics, highlighting how the model’s generalizability may be affected in areas with extreme features.

Further analysis revealed that Stem-PNC underestimated PNCs at BAS (suburban, background), located in a park within a suburban region. Figure 1 illustrates that PNC levels at BAS were low, nearing the lower boundary of the spectrum. Excluding data at BAS caused Stem-PNC to approximate from lower concentration levels, predominantly influenced by the rural station RIG, which led to the underestimation in suburban areas. Therefore, integrating data from BAS can substantially improve the spatial and temporal generalization capabilities of Stem-PNC, particularly for suburban and rural settings.

Overall, the monthly mean predictions generated by the Stem-PNC model consistently fell within a factor of two of the actual measurements, with R² values ranging from 0.76 to 0.91, indicating a notable degree of accuracy. These statistical metrics underscore the model’s efficacy in capturing the underlying patterns and dynamics governing UFP across diverse geographical locations and temporal periods.

Distribution of particle number concentration in Switzerland

For the implementation of the developed Stem-PNC model in UFP exposure assessment across Switzerland, it is crucial to obtain data on regulated pollutants (NOx, PM₁₀, PM_2.5, and Ozone) with high spatial and temporal resolutions. Such comprehensive data are typically produced by sophisticated air quality model systems. We utilized the Copernicus Atmosphere Monitoring Service (CAMS) validated air quality reanalysis dataset⁴⁶, which provides data at a spatial resolution of 0.1 degrees (approximately 10 km) and a temporal resolution of one hour. This dataset integrates extensive validated observational data from satellites, ground-based, and airborne measurement systems with advanced atmospheric models. The integration of observational data and model results has substantially enhanced the accuracy and reliability of the calculated spatial-temporal distributions of regulated pollutants.

To improve the spatial resolution of CAMS data from 10 km to 1 km, we developed a machine learning-based downscaling technique using Light Gradient Boosting Machine (lightGBM)^47,48 and Gradient Boosting Machine⁴⁹. Details about the downscaling technique can be found in “Methods” section. The Stem-PNC leveraged the downscaling outputs from CAMS as well as the meteorological data from ECMWF Reanalysis v5 (ERA5)⁵⁰ to estimate particle number concentrations across Switzerland. These estimates were achieved with high temporal (hourly) and spatial (1 km) resolutions.

In Fig. 3, the workflow outlines the four main components of training and spatial extrapolation with Stem-PNC. First, feature selection and target outputs are established using data from multiple sources. Stem-PNC’s input format is structured as N_time × M_Features, including meteorological, traffic, and temporal data, with PNC as the output. The model utilizes a two-layer stacking approach with a base model and a meta-model, employing cross-validation to assess generalization. For spatial extrapolation, the trained Stem-PNC model, using pre-trained weights, is combined with the CAMS Downscale Model to incorporate high-resolution CAMS data. By integrating this data with ERA5 meteorological, traffic, and temporal features as 2D inputs, Stem-PNC can estimate 2D PNC distributions across broader regions, capturing localized variations in particle concentrations.

**Fig. 3: Workflow for Training and Estimating.**

Figure 4a–d present the seasonal average distributions of particle number concentrations in Switzerland for 2020, estimated by the Stem-PNC model in conjunction with the CAMS and ERA5 dataset. The results generally indicate elevated PNCs in the northern regions and along major roadways, attributable to significant emissions of fine particles from vehicular traffic. Particle number concentrations were higher in January to March (JFB) and October to December (OND) as compared to April to June (AMJ) and July to September (JAS). The observed seasonal fluctuations in PNC align with those of other regulated pollutants, e.g. NOx. These variations are primarily attributed to reduced atmospheric mixing and the low height of the atmospheric boundary layer during colder seasons, which facilitates the accumulation of PNCs. Additionally, summer rainfall might be also a factor that influences the variation of PNC¹⁴.

**Fig. 4: Estimated seasonal distributions of particle number concentration (PNC) in Switzerland for 2020.**

Figure 4e compares the annual average PNC measurements from the five NABEL stations with the results from the Stem-PNC model. It is important to note that the CAMS and ERA5 datasets, which have relatively low resolutions, were used as input data. Therefore, the results reflect the performance of the entire estimation system. The site-specific and grid-average estimations were respectively derived using sub-grid (at the location of the measurement site) and grid-averaged (1 km by 1 km) traffic information. Both the site-specific and grid-average estimations closely align with the measurements. The largest deviations were observed at the HAE station. The difference between the site-specific estimation and the measurement reached a maximum of 1.7 × 10³ particles‧cm⁻³, representing an approximately 8% deviation from the measurements at the HAE station. The difference between the grid-average estimation and the measurement peaked at 4.4 × 10³ particles‧cm⁻³, corresponding to about a 20% deviation from the measurement. While the site-specific estimations outperformed the grid-average estimations, both of them underestimated the PNC, likely due to the localized high emissions from the adjacent motorways. The grid-average estimations reflected the averaged PNC within the grid, where most of the area was far from the motorways, thus resulting in lower PNCs. Overall, the Stem-PNC model, in conjunction with the CAMS and ERA5 datasets, provides PNC estimations with sufficient accuracy for exposure assessment.

Exposure assessment of ultrafine particles

Figure 5a, b represents the statistics of the gridded (1 km by 1 km) annual mean exposure assessment of UFPs in Switzerland. The population count generally exhibits a downward trend as PNC increases, as shown in Fig. 5a. However, it should be noted that approximately 20% (1.7 million) of the Swiss population is exposed to high UFP levels exceeding an annual mean of 10⁴ particles‧cm⁻³, which is considered as high exposure (24-h mean) suggested by WHO AQG²³. Three distinct peaks are observed below 2 × 10⁴ particles‧cm⁻³. The first peak, at the lower end of the spectrum close to 3000 particles‧cm⁻³, is primarily due to contributions from rural areas. The second peak, around 10⁴ particles‧cm⁻³, is attributed to urban centers and urban clusters. Notably, there is a minor peak in PNC between 1.5 × 10⁴ and 1.7 × 10⁴ particles‧cm⁻³, corresponding to regions encompassing major transportation hubs and road networks (in Supplementary Note 7.1), indicating that local increases in UFP are driven by these specific conditions.

Figure 5b summarizes the UFP exposure in urban centers, urban clusters, and rural areas. Approximately 2.12 million residents (24%) live in urban centers, 4.16 million (47%) in urban clusters, and 2.56 million (29%) in rural areas, as indicated by the cumulative population. Among those exposed to UFP levels exceeding 2 × 10⁴ particles‧cm⁻³, 55.8% are from urban centers, 38.3% from urban clusters, and 5.9% from rural areas. While there is a pronounced focus on UFP exposure risks in urban centers, attention should also be extended to certain high-UFP areas within urban clusters.

Considering that most activities of individuals occur within a specific spatial extent, evaluating community-level exposure may be more reasonable than focusing on a single grid. Community-level exposure was assessed using population-weighted PNCs of the grids (1 km by 1 km) within the corresponding community, as detailed in Supplementary Note 7.3. Grids with higher population densities exert greater influence on the community-level exposure assessment. As shown in Fig. 5c, approximately 4.6% of the total population lives in communities with annual mean UFP exposure levels above 2 × 10⁴ particles‧cm⁻³, and 31.2% experience community-level UFP exposure between 10⁴ and 2 × 10⁴ particles‧cm⁻³.

The violin plots in Fig. 5d depict the distributions of community-level annual mean UFP exposure in urban centers, urban clusters, and rural areas, highlighting the range and frequency of UFP exposure within each area type. There are 2686 communities: 93 in urban centers (3.5%), 803 in urban clusters (29.9%), and 1790 in rural areas (66.6%). The community-level UFP exposure in urban centers exhibits the widest range, with values from 6.1 × 10³ to 2.7 × 10⁴ particles‧cm⁻³, and a median of 1.2 × 10⁴ particles‧cm⁻³. Urban clusters display a mid-range distribution, with values from 1.3 × 10³ to 2.3 × 10⁴ particles‧cm⁻³, and a median of 7.8 × 10³ particles‧cm⁻³. Conversely, rural areas show a more compact distribution, with values from 1.5 × 10³ to 1.7 × 10⁴ particles‧cm⁻³, and a median of 4.4 × 10³ particles‧cm⁻³. Overall, community-level exposure ranges from (5.5 ± 2.3) × 10³ particles‧cm⁻³ in rural areas, (8.5 ± 3.1) × 10³ particles‧cm⁻³ in urban clusters, to (1.4 ± 0.5) × 10⁴ particles‧cm⁻³ in urban centers, with a national average of (9.3 ± 4.7) × 10³ particles‧cm⁻³.

We adopted the Coefficient of Variation (CV) to investigate the spatial heterogeneity of UFP and PM_2.5 within communities across different area types, as shown in Fig. 5e. The median CVs for UFP and PM_2.5 at the community level in Switzerland are 0.38 and 0.032, respectively, indicating that the spatial distribution of UFP is significantly more heterogeneous than that of PM_2.5. The median CVs for UFP are relatively consistent across different area types, ranging from 0.38 in urban centers and rural areas to 0.41 in urban clusters. In contrast, the heterogeneity of PM_2.5 varies more widely across area types, with median CVs of 0.026 in rural areas, 0.087 in urban clusters, and 0.12 in urban centers. This indicates that the spatial distribution of PM_2.5 is much more homogeneous in rural areas compared to urban clusters and urban centers. The average ratio between the CVs of UFP and PM_2.5 is 10.1 ± 12.2 among all the communities, ranging from 4.7 ± 4.2 in urban centers to 10.0 ± 11.5 in urban clusters, and 13.8 ± 15.1 in rural areas. These substantial differences in spatial heterogeneity suggest that monitoring strategies for UFP may need to differ from those currently used for mass-based measurements.

Evaluation of the high UFP exposure levels suggested by WHO

Community-level exposure to UFP was evaluated based on the 24-h mean and 1-h reference levels for high exposure as suggested by the recent WHO AQG²³, where high UFP exposure is considered to be more than 10⁴ particles‧cm⁻³ for a 24-h mean or 2 × 10⁴ particles‧cm⁻³ by hourly PNC.

We calculated the annual cumulative hours and days for each community that exceeded the WHO-recommended 1-h and 24-h mean reference levels for high PNC, as shown in Fig. 6a, b. Most communities fall below the 1-h high exposure level, with those exceeding it primarily located in urban centers, as depicted in Fig. 6a. The average duration exceeding the 1-h high exposure level is about 597 ± 1050 h per year. The results indicate that the temporal duration and spatial extent of communities exceeding the 24-h mean high exposure level (Fig. 6b) are significantly larger than those exceeding the hourly high exposure level (Fig. 6a), suggesting that it is more common to surpass the 24-h mean high exposure level. The average duration exceeding the 24-h mean high exposure level is 122 ± 118 days per year.

**Fig. 6: Distributions of the annual duration of communities exceeding high ultrafine particle (UFP) exposure levels recommended by the World Health Organization Air Quality Guidelines (WHO AQG).**

Figure 6c clearly illustrates the significant nonlinear relationship between the number of days when the PNC exceeds the WHO-recommended daily average and the number of hours when the PNC exceeds the hourly average. This indicates that the two WHO-recommended reference levels might not consistently reflect the same high UFP exposure patterns. The population with various durations of high exposure remains relatively stable based on the hourly level, as shown in the bottom panel of Fig. 6c. However, the durations display an increasing trend with longer durations based on the 24-h mean level, as indicated by the left panel of Fig. 6c. The nonlinear relationship between the two reference levels should be considered in future recommendations for high UFP exposure.

Figure 7a presents the distributions of the annual cumulative hours exceeding the hourly high exposure level of UFPs in the communities within urban centers, urban clusters, and rural areas. Regarding the lower end of the distribution, approximately 47.9% of the total population experiences hourly high UFP exposure for less than 24 h per year, similar to the fraction in urban clusters (49%). This fraction is significantly higher in rural areas, reaching 86.3% of the residents, due to the lack of emission sources such as road traffic. In contrast, only 13.4% of the residents in urban centers experience less than 24 h of high exposure annually. The average annual durations exceeding the 1-h high exposure level are about (1.59 ± 1.45) × 10³ h in urban centers, 318 ± 577 h in urban clusters, and only 59 ± 226 h in rural areas.

Fig. 7: Distributions of the exposed population fractions categorized by annual cumulative durations exceeding the 1-h and 24-h mean high ultrafine particle (UFP) exposure levels recommended by World Health Organization Air Quality Guidelines (WHO AQG).

Approximately 25.5% of the total population exceeds the hourly high exposure level for more than 720 h (1 month) annually, with this fraction rising to 62.4% in urban centers. The exposed total population, as well as those in urban clusters and rural areas, generally decreases as the cumulative duration of high exposure increases. Conversely, in urban centers, the exposed population displays an upward trend, reaching 21.9% for cumulative durations between 2880 and 5760 h (4–8 months) per year. No population experiences the hourly high exposure level for more than 5760 h annually. Figure 7b reveals a negative correlation between the exposure populations in urban centers and urban clusters, indicating that the hourly high exposure periods exceeding 2 × 10⁴ particles‧cm⁻³ are common in urban centers but rare in urban clusters.

As shown in Fig. 7c, when the 24-h mean high UFP exposure reference level is applied, the exposed total population rises as the cumulative duration of high exposure increases, peaking at 21.1% of the total population with more than 240 days of high exposure, which is opposite to the downward trend observed based on the hourly reference level (Fig. 7a). About 56% of the total population experiences high UFP exposure for over 60 days based on the 24-h mean reference level. Particularly in urban centers, the majority (81.7%) of the population experiences high exposure for over 120 days intervals, and the fraction in urban clusters is 36.9%. It indicates that about 80% of the residents in rural areas exceed the 24-h mean reference level for less than 30 days. The average annual durations exceeding the 24-h mean high exposure level are about 239 ± 104 days in urban centers, 102 ± 94 days in urban clusters, and only 27 ± 52 days in rural areas. A positive correlation between the exposure populations in urban centers and urban clusters has been observed as shown in Fig. 7d.

Discussion

The innovative stacking-based ensemble machine learning model, Stem-PNC, integrates data-driven and physical-chemical models to achieve comprehensive UFP exposure assessments with high accuracy and resolution. The Stem-PNC model, utilizing the stacking method, demonstrates significant advantages in air quality modeling due to its strong generalization capabilities and flexibility in model selection. Leveraging the unparalleled long-term standardized PNC measurements in Switzerland, along with the extensive dataset of the regulated pollutants and model results from CAMS, the generalization abilities of the model are significantly enhanced. The Stem-PNC model has achieved a high R² of 0.85 and a low mean bias of 124 when compared to measurements. The largest annual mean deviations were at the HAE station near motorways, with a maximum site-specific estimation difference of 1.7 × 10³ particles‧cm⁻³ (approximately 8% of the mean PNC). Notably, Stem-PNC demands only 4% of the computational time and energy consumption needed by deep learning models. These highlight its cost-efficiency and practical utility, underscoring its superior predictive and generalization abilities compared to other models.

By integrating machine learning downscaling techniques with CAMS grid data and ERA5 meteorological data, we achieved high-quality UFP estimations at a 1 km spatial resolution and hourly temporal resolution. The national-scale UFP exposure assessment has provided the first comprehensive picture of the exposure across various area types, i.e. urban centers, urban clusters, and rural areas. The analysis has revealed that approximately 20% (1.7 million) of the Swiss population is exposed to high UFP levels exceeding an annual mean of 10⁴ particles‧cm⁻³. The annual mean community-level exposure is as high as (1.4 ± 0.5) × 10⁴ particles‧cm⁻³ in urban centers, (8.5 ± 3.1) × 10³ particles‧cm⁻³ in urban clusters, and decreases to (5.5 ± 2.3) × 10³ particles‧cm⁻³ in rural areas. Notably, about 10% and 5.6% of the population in urban centers and urban clusters, respectively, about half a million residents in total, face an annual mean UFP exposure above 2 × 10⁴ particles‧cm⁻³. Therefore, UFP exposure risk should be considered for both urban centers and clusters.

A significant nonlinear relationship was identified between the number of days when the PNC exceeds the WHO-recommended daily average and the number of hours when the PNC exceeds the hourly average, following an exponential relation with an exponent of 0.35. The average durations exceeding the 1-h and 24-h mean high exposure levels are about 597 ± 1050 h and 122 ± 118 days per year. Additionally, disparities in the trends for high exposure durations among different types of areas were observed. There is a negative correlation between the exposure populations in urban centers and urban clusters based on the 1-h reference level (2 × 10⁴ particles‧cm⁻³), indicating that the hourly high exposure periods are common in urban centers but rare in urban clusters. Conversely, a positive correlation is observed between the exposure populations in urban centers and urban clusters when the 24-h mean reference level is applied, suggesting that the two reference levels might not consistently reflect the same high UFP exposure patterns.

This nonlinear relationship between the two WHO-recommended reference levels should be considered in future recommendations for high UFP exposure. The current WHO AQG suggested that the two reference levels for high PNC are interchangeable. However, the results here indicate that they are not equivalent, and both are necessary to manage UFP exposure effectively. One level aims to control daily exposure to prevent it from being excessively high, while the other focuses on controlling hourly exposure to prevent extraordinarily high peaks. The spatial heterogeneity of UFP is about 10 times larger than that of PM_2.5 on average, ranging from 4.7 ± 4.2 in urban centers to 10.0 ± 11.5 in urban clusters, and 13.8 ± 15.1 in rural areas, indicating that monitoring strategies for UFP should differ from those currently used for mass-based measurements.

The Stem-PNC model, trained using the long-term standardized PNC measurements in Switzerland, could be fine-tuned with a small dataset of UFP measurements from another region, and utilized for large-scale UFP exposure assessment in other countries, where similar emission standards are applied. The combination of temporal and spatial generalization, along with computational efficiency, positions Stem-PNC as a valuable tool in UFP modeling and epidemiological studies. Furthermore, the exposure conditions in Switzerland investigated in this study could serve as a reference, providing crucial insights for the development of future UFP standards.

Methods

Stacking technique for ensemble modeling of particle number concentration

The Stacking Technique^51,52 for Ensemble Modeling of Particle Number Concentration (Stem-PNC) employs the stacking technique, also known as stacked generalization, an ensemble learning approach that combines multiple prediction models to enhance the overall predictive performance. Stem-PNC consists of two model levels. The first level comprises four base learners: K-Nearest Neighbors (KNN), Decision Tree (Tree), Random Forest (RF), and Light Gradient Boosting Machine (LGB). These models are chosen for their ability to capture the nonlinear relationships between PNC, regulated pollutants, meteorological data, and traffic information. The second layer, known as the meta-learner, aggregates the predictions of the base learners by training a sub-learner. Stem-PNC utilizes a multilayer perceptron (MLP) neural network as the meta-learner.

$${{{\rm{P}}}}{{{{\rm{NC}}}}}_{{pred}}=f({{{\rm{NABEL}}}},{{{\rm{Meteo}}}},{{{\rm{Traffic}}}},{{{\rm{Time}}}};\Theta ),$$

(1)

where PNC_pred is the predicted PNC, derived from a machine learning model f such as Random Forest or LightGBM, where Θ represents the parameters of the model. The data used as features in this model include NABEL (NABEL data), Meteo (Meteorological Data), Traffic (Traffic Data), and Time (Time Data).

The stacked model offers enhanced generalization capabilities, enabling broad scenario application and the integration of diverse predictor types, thereby increasing the model’s robustness and stability. The stacking model used in this study is as follows:

$$\hat{y}={f}_{{MLP}}{\left\{{\left[{f}_{{KNN}}\left({{{\bf{X}}}}\right){f}_{{RF}}\left({{{\bf{X}}}}\right){f}_{{LGB}}\left({{{\bf{X}}}}\right){f}_{{Tree}}\left({{{\bf{X}}}}\right)\right]}_{{{{\rm{First\; Layer}}}}}\right\}}_{{{{\rm{Second\; Layer}}}}},$$

(2)

where X represents the input feature set and $\hat{y}$ denotes the target variable.

The PNC measurements are taken continuously with a high temporal resolution of 1 h at five different stations across Switzerland, Basel-Biningen (BAS, suburban, background), Bern-Bollwerk (BER, urban, roadside), Harkingen-A1 (HAE, rural, next to motorway), Lugano-University (LUG, urban, background), Rigi-Seebodenalp (RIG, rural, altitude > 1000 m). In addition to PNC, regulated pollutants (NOx, PM₁₀, PM_2.5, and Ozone) are also monitored at these NABEL sites. Stem-PNC was trained using hourly PNC measurements collected over a four-year period (2016–2019) at these five stations. Metrological data were obtained from MeteoSwiss. Details about the study area and the dataset are provided in Supplementary Note 1.

Stem-PNC demonstrates superior predictive and generalized performance compared to machine learning models and greater time efficiency than deep learning methods. By combining multiple base models into a meta-model, Stem-PNC enhances prediction accuracy and reduces overfitting. This ensemble approach integrates various strengths of different models, leading to more robust and reliable predictions. Additionally, the simpler, faster-to-train base models in Stem-PNC require less hyperparameter tuning and computational power than deep learning models, resulting in quicker predictions and efficient performance.

The performance of Stem-PNC was evaluated against five different types of data-driven machine learning methods: Linear Models, Support Vector Machines, Neighbor-based Models, Tree-based Models, and Deep Learning (DL) Models. Statistical metrics such as the R², M-Bias, RMSE, and Mean Absolute Error (MAE) were used for these comparisons. A smaller absolute value of M-Bias indicates better generalization capability compared to other models. Mean prediction metrics are calculated within each category, with standard deviation represented as error bars in Fig. 1f. Among the categories, Tree-based Models exhibit the highest standard deviation. To enhance clarity, some metrics with similar evaluation functions were omitted. Additional details and definitions of these statistical metrics, the structure of stacking, the criteria for categorizing models, the rationale for their division, hyper-parameterization, and more prediction results in each category are available in Supplementary Note 2. To further analyze the predictive model’s results, model interpretability was visualized using SHapley Additive exPlanations (SHAP^53,54) for Random Forests, as detailed in Supplementary Note 2.8.

We conducted additional experiments focusing on the most correlated features: NO_X, PM₁₀, and traffic. To evaluate their individual and combined predictive power, we trained models using each feature separately, in pairs, and with all three features together, comparing their performance to that of the comprehensive, full-feature Stem-PNC model. Our results indicate that, while models using only the most correlated features (NO_X, PM₁₀, and traffic) achieved reasonable accuracy, their predictive performance was notably lower than that of the full-feature Stem-PNC model.

Specifically, R² values ranged from 0.679 with NOx alone to 0.794 when combining NOx, PM₁₀, and traffic. Correspondingly, root mean square error (RMSE) increased considerably across the reduced-feature models, with RMSE values ranging from 5308 to 6623. Incorporating additional features (e.g. ozone) further improved model performance, increasing R² to 0.813 and reducing RMSE errors (RMSE = 5053). The full-feature Stem-PNC model achieved the highest accuracy, with R² reaching 0.845 and significantly lower RMSE (RMSE = 4594). This variation is primarily due to the unique characteristics of the RIG station, where the rural setting and high altitude (>1000 m) introduce additional complexities for models with fewer features. A detailed analysis is provided in Supplementary Note 2.9.

These findings illustrate the limitations of models trained solely on highly correlated features and underscore the advantage of incorporating a broader range of input variables. Each additional feature contributes incrementally to the model’s predictive accuracy.

Score-metrics to generalization ability of Stem-PNC

For evaluating prediction results, certain metrics such as the Coefficient of Determination (R²) and Explained Variance Score (EVS) naturally range from 0 to 1. However, metrics like Mean Bias (M-Bias), Root Mean Square Error (RMSE), Mean Square Error (MSE), and Mean Absolute Error (MAE) vary over a broader range and are more challenging to compare directly due to their disparate magnitudes. To facilitate comparisons across these metrics, the Minmax scalar is employed for normalization. For each metric x (such as MAE, MSE, RMSE, M-Bias), normalization is calculated using the formula:

$${{{\rm{Score}}}}{-}{{{\rm{Metrics}}}}=1-\frac{x-{x}_{\min }}{{x}_{\max }-{x}_{\min }},$$

(3)

Here, x_min and x_max are the minimum and maximum values of the metric in the dataset, respectively. This formula ensures that a lower original metric value translates into a higher normalized score, reflecting better performance.

Downscaling CAMS data: methodology and formulas

Although the spatial resolution of CAMS data is adequate for broad-scale assessments, it necessitates refinement to fulfill the requirements of detailed PNC exposure analysis in Switzerland, particularly given the significant spatial variability of PNC. The formulas for calculating prediction features and the rationale for selecting models for high-resolution CAMS predictions are detailed in Supplementary Notes 6.1 and 6.2, respectively.

This refinement was achieved by utilizing long-term observational data of regulated pollutants collected from 16 NABEL stations across Switzerland. These stations include Basel-Binningen (BAS [Suburban]), Bern-Bollwerk (BER [Urban, Traffic]), Beromünster (BRM [Rural, Altitude < 1000 m]), Chaumont (CHA [Rural, Altitude > 1000 m]), Davos-Seehornwald (DAV [Rural, Altitude > 1000 m]), Dübendorf-Empa (DUE [Suburban]), Härkingen-A1 (HAE [Rural, Motorway]), Jungfraujoch (JUN [High Alpine]), Lausanne-César-Roux (LAU [Urban, Traffic]), Lugano-Università (LUG [Urban]), Magadino-Cadenazzo (MAG [Rural, Altitude < 1000 m]), Payerne (PAY [Rural, Altitude < 1000 m]), Rigi-Seebodenalp (RIG [Rural, Altitude > 1000 m]), Sion-Aéroport-A9 (SIO [Rural, Motorway]), Tänikon (TAE [Rural, Altitude < 1000 m]), and Zürich-Kaserne (ZUE [Urban]). Similar to the approach in Stem-PNC, meteorological data, date, and traffic information were also integrated into the downscaling method. The high-resolution CAMS prediction model was trained using data from 2016 to 2019 and used to predict the data for 2020.

By downscaling CAMS data, higher-resolution pollutant distributions can be obtained. These distributions are influenced by various factors including meteorological conditions, human activities, and topography⁵⁵, all of which are crucial for improving the reliability and accuracy of model predictions. To enhance data richness and correct discrepancies, the ratio of pollutant data from NABEL to CAMS, referred to as Measurements in Eq. (4), is utilized initially for data correction.

To further enrich the feature set, meteorological, traffic, and time-related data are incorporated. Fusing multiple datasets mitigates the limitations inherent in relying on a single data source, thereby ensuring more continuous and comprehensive coverage. Machine learning methods are employed to discern complex patterns and relationships among these factors, enhancing our understanding of how they collectively influence pollutant distribution.

Moreover, machine learning enhances the flexibility and adaptability of model predictions, particularly when applied to different datasets. This adaptability ensures that once a model is trained and validated, it can effectively predict air quality in different regions, even those without extensive monitoring networks. This capability is vital for maintaining good generalization performance across varied geographical locations.

$${{{\rm{Measurements}}}}=\frac{{{{\rm{pollutio}}}}{{{{\rm{n}}}}}_{{NABEL}}}{{{{\rm{pollutio}}}}{{{{\rm{n}}}}}_{{CAMS}}}\propto f({{{\rm{pollutio}}}}{{{{\rm{n}}}}}_{{CAMS}},{{{\rm{Meteo}}}},{{{\rm{Traffic}}}},{{{\rm{Time}}}}),$$

(4)

The predicted Measurements were then multiplied by the linearly interpolated pollution data to obtain high-resolution pollutant distributions in Eq. (5). This process enhances the accuracy of the scaled-up data, ensuring that the finer details of pollution patterns are captured effectively.

$${{{\rm{pollutio}}}}{{{{\rm{n}}}}}_{{High}{-}{res\; pollution}}={{{\rm{pollutio}}}}{{{{\rm{n}}}}}_{{CAMS}}\times {{{{\rm{Measurements}}}}}_{{pred}}$$

(5)

PNC estimation: methodology and formulas

The Stem-PNC model, integrated with machine learning-based downscaling of CAMS data, is utilized to estimate PNC. Initially, the Stem-PNC model is trained specifically for time-series predictions. Following this, the CAMS downscaling process produces distributions of regulated pollutants at a high spatial resolution of 1 km$\times$1 km. These high-resolution outputs are then merged with meteorological, traffic, and time data to derive a comprehensive high-resolution PNC using Eq. (6).

This multi-source integration significantly improves the model’s accuracy in estimating spatial and temporal variations in PNC. Additionally, this approach allows for the application of the trained Stem-PNC model to accurately estimate the spatial and temporal distributions of PNC. As a result, it enables precise predictions of PNC across various regions and times, thereby enhancing the model’s utility and effectiveness in real-world scenarios.

$${{{\rm{P}}}}{{{{\rm{NC}}}}}_{{High}-{res}}\propto f({{{\rm{pollutio}}}}{{{{\rm{n}}}}}_{{High}-{res}},{{{\rm{Meteo}}}},{{{\rm{Traffic}}}},{{{\rm{Time}}}})$$

(6)

In this study, we focus on exposure assessment, selecting a spatial resolution of 1 km to align with the available population distribution data. The estimated PNC values represent only the average PNC within each grid cell. As illustrated by the sub-grid and grid-average results in Fig. 4e, the 1 km resolution may not capture fine-scale variations near high-emission sources, e.g. the motorways. Supplementary Fig. S20 in Supplementary Note 6.6 indicates that while grid averages effectively represent the median values of PNC distributions, closely matching sub-grid results and measurements, they may miss high hourly PNC values. Future studies would benefit from improved spatial resolution of PNC estimates. However, finer spatial scales can increase measurement errors and noise^56,57, necessitating denser monitoring networks and larger PNC observation datasets for model training, which are currently unavailable. Additionally, more detailed population distribution data, or movement data, would be required for accurate exposure assessment at finer spatial estimations.

Population data and area types

The population data utilized in this analysis is sourced from the Gridded Population of the World (GPW), v4⁵⁸, specifically from the year 2020 and at a resolution of 2.5 arc-minutes (~1 km at the equator). The categorization of area types is based on the Eurostat methodology—Applying the Degree of Urbanisation⁵⁹. According to this classification, areas are divided into three types: Urban Centers, characterized by a population density of at least 1500 inhabitants per km²; Urban Clusters, with a population density of at least 300 inhabitants per km²; and Rural Areas, where the population density falls below 300 inhabitants per km². Further details regarding the estimated PNC and population can be found in Supplementary Notes 7 and 8.

In this study, we mainly focused on long-term exposure, and we used annual population data due to minimal changes in Switzerland^60,61. In contrast, developing countries experience more dynamic shifts, including immigration. Regarding short-term exposure, future work could consider combining dynamic population mapping^62,63,64,65 with hourly PNC estimations to improve exposure assessments at finer temporal resolutions. Although this study uses static population density for time-averaged exposure, the model’s temporal accuracy suggests that integrating dynamic data could better capture exposure during peak hours, allowing for more precise evaluations in regions with significant population changes. This integration would require additional data sources, such as cell phone-based position data, which may become available in the future. For more detailed explanations, see Supplementary Note 7.5.

Data availability

A complete example of a small dataset illustrating the steps for PNC estimation and the trained model applied in this study have been deposited in the Zenodo database under accession code https://doi.org/10.5281/zenodo.14630763 [https://doi.org/10.5281/zenodo.14634738]⁶⁶. The data that support the findings of this study are available as follows: the National Air Pollution Monitoring Network (NABEL) data can be available at: [https://www.bafu.admin.ch/bafu/en/home/topics/air/state/data/data-query-nabel.html]. The Copernicus Atmosphere Monitoring Service (CAMS) data⁴⁶ can be available at: [https://ads.atmosphere.copernicus.eu/cdsapp#!/dataset/cams-europe-air-quality-forecasts?tab=form]. The Copernicus Climate Data (ERA5-Land Hourly Data, Meteo data⁵⁰) can be available at: [https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=overview]. And the traffic data are from Open Transport Map which can be available at: [https://hub.plan4all.eu/otm] and for download at: [https://opentransportmap.info/download]. And the population data can be available at: [https://sedac.ciesin.columbia.edu/data/collection/gpw-v4]⁵⁸.

Code availability

All calculation and plotting codes⁶⁷ in this study are in https://github.com/Jyyd/PNC_Estimate. https://doi.org/10.5281/zenodo.14554168. In particular, the code for CAMS downscaling is derived from https://github.com/zxiaole/pncEstimator.

References

Cohen, A. J. et al. Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution: an analysis of data from the Global Burden of Diseases Study 2015. Lancet 389, 1907–1918 (2017).
Article PubMed PubMed Central MATH Google Scholar
Burnett, R. et al. Global estimates of mortality associated with long-term exposure to outdoor fine particulate matter. Proc. Natl Acad. Sci. USA 115, 9592–9597 (2018).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
Burnett, R. T. et al. An integrated risk function for estimating the global burden of disease attributable to ambient fine particulate matter exposure. Environ. Health Perspect. 122, 397–403 (2014).
Article ADS PubMed PubMed Central MATH Google Scholar
Lloyd, M. et al. Predicting spatial variations in annual average outdoor ultrafine particle concentrations in Montreal and Toronto, Canada: Integrating land use regression and deep learning models. Environ. Int. 178, 108106 (2023).
Article CAS PubMed MATH Google Scholar
Zhang, X. et al. Ecological study on global health effects due to source-specific ambient fine particulate matter exposure. Environ. Sci. Technol. 57, 1278–1291 (2023).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
Donaldson, K., Li, X. Y. & MacNee, W. Ultrafine (nanometre) particle mediated lung injury. J. Aerosol. Sci. 29, 553–560 (1998).
Article ADS CAS Google Scholar
Oberdörster, G. Pulmonary effects of inhaled ultrafine particles. Int. Arch. Occup. Environ. Health 74, 1–8 (2000).
Article MATH Google Scholar
Bruinink, A., Wang, J. & Wick, P. Effect of particle agglomeration in nanotoxicology. Arch. Toxicol. 89, 659–675 (2015).
Article CAS PubMed MATH Google Scholar
Kendall, M. & Holgate, S. Health impact and toxicological effects of nanomaterials in the lung. Respirology 17, 743–758 (2012).
Article PubMed MATH Google Scholar
Nel, A. E. et al. Understanding biophysicochemical interactions at the nano–bio interface. Nat. Mater. 8, 543–557 (2009).
Article ADS CAS PubMed MATH Google Scholar
Rahim, M. F., Pal, D. & Ariya, P. A. Physicochemical studies of aerosols at Montreal Trudeau Airport: the importance of airborne nanoparticles containing metal contaminants. Environ. Pollut. 246, 734–744 (2019).
Article CAS PubMed Google Scholar
Gao, H. et al. Quantifying respiratory tract deposition of airborne graphene nanoplatelets: the impact of plate-like shape and folded structure. NanoImpact 21, 100292 (2021).
Article CAS PubMed Google Scholar
Hopke, P. K., Feng, Y. & Dai, Q. Source apportionment of particle number concentrations: a global review. Sci. Total Environ. 819, 153104 (2022).
Article CAS PubMed MATH Google Scholar
Zhu, Y. et al. Airborne particle number concentrations in China: a critical review. Environ. Pollut. 307, 119470 (2022).
Article CAS PubMed Google Scholar
Karl, M. et al. Modeling and measurements of urban aerosol processes on the neighborhood scale in Rotterdam, Oslo and Helsinki. Atmos. Chem. Phys. 16, 4817–4835 (2016).
Article ADS CAS MATH Google Scholar
Zhang, X., Chen, X. & Wang, J. A number-based inventory of size-resolved black carbon particle emissions by global civil aviation. Nat. Commun. 10, 534 (2019).
Article ADS PubMed PubMed Central MATH Google Scholar
Zhang, X., Karl, M., Zhang, L. & Wang, J. Influence of aviation emission on the particle number concentration near Zurich Airport. Environ. Sci. Technol. 54, 14161–14171 (2020).
Article ADS CAS PubMed Google Scholar
Breitner, S. et al. Sub-micrometer particulate air pollution and cardiovascular mortality in Beijing, China. Sci. Total Environ. 409, 5196–5204 (2011).
Article ADS CAS PubMed MATH Google Scholar
Leitte, A. M. et al. Associations between size-segregated particle number concentrations and respiratory mortality in Beijing, China. Int. J. Environ. Health Res. 22, 119–133 (2012).
Article CAS PubMed MATH Google Scholar
Meng, X. et al. Size-fractionated particle number concentrations and daily mortality in a Chinese city. Environ. Health Perspect. 121, 1174–1178 (2013).
Article PubMed PubMed Central MATH Google Scholar
Sokhi, R. S. et al. Advances in air quality research – current and emerging challenges. Atmos. Chem. Phys. 22, 4615–4703 (2022).
Article ADS CAS MATH Google Scholar
Ostro, B. et al. Associations of mortality with long-term exposures to fine and ultrafine particles, species and sources: results from the California Teachers Study Cohort. Environ. Health Perspect. 123, 549–556 (2015).
Article PubMed PubMed Central MATH Google Scholar
World Health Organization, others. WHO Global Air Quality Guidelines: Particulate Matter (PM.5 and PM10), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide (World Health Organization, 2021).
Rahman, M. M., Karunasinghe, J., Clifford, S., Knibbs, L. D. & Morawska, L. New insights into the spatial distribution of particle number concentrations by applying non-parametric land use regression modelling. Sci. Total Environ. 702, 134708 (2020).
Article CAS PubMed Google Scholar
Buonanno, G., Stabile, L. & Morawska, L. Personal exposure to ultrafine particles: the influence of time-activity patterns. Sci. Total Environ. 468-469, 903–907 (2014).
Article ADS CAS PubMed MATH Google Scholar
Kumar, P. et al. Ultrafine particles in cities. Environ. Int. 66, 1–10 (2014).
Article CAS PubMed MATH Google Scholar
Zhong, J., Harrison, R. M., James Bloss, W., Visschedijk, A. J. H. & Denier van der Gon, H. A. C. Modelling the dispersion of particle number concentrations in the West Midlands, UK using the ADMS-Urban model. Environ. Int. 181, 108273 (2023).
Article PubMed Google Scholar
Karl, M. et al. Measurement and modeling of ship-related ultrafine particles and secondary organic aerosols in a Mediterranean Port City. Toxics 11, 771 (2023).
Article CAS PubMed PubMed Central MATH Google Scholar
Feng, X., Zhang, X., He, C. & Wang, J. Contributions of traffic and industrial emission reductions to the air quality improvement after the lockdown of Wuhan and neighboring cities due to COVID-19. Toxics 9, 358 (2021).
Article CAS PubMed PubMed Central MATH Google Scholar
Feng, X. et al. A hybrid model for enhanced forecasting of PM2.5 spatiotemporal concentrations with high resolution and accuracy. Environ. Pollut. 355, 124263 (2024).
Article CAS PubMed MATH Google Scholar
Feng, X., Zhang, X. & Wang, J. Update of SO2 emission inventory in the Megacity of Chongqing, China by inverse modeling. Atmos. Environ. 294, 119519 (2023).
Article CAS MATH Google Scholar
Zhang, J. et al. Developing a high-resolution emission inventory of China’s aviation sector using real-world flight trajectory data. Environ. Sci. Technol. 56, 5743–5752 (2022).
Article ADS CAS PubMed MATH Google Scholar
Zhang, X. et al. Dynamic harmonization of source-oriented and receptor models for source apportionment. Sci. Total Environ. 859, 160312 (2023).
Article CAS PubMed Google Scholar
Zhang, C. et al. Mitigation effects of alternative aviation fuels on non-volatile particulate matter emissions from aircraft gas turbine engines: a review. Sci. Total Environ. 820, 153233 (2022).
Article CAS PubMed Google Scholar
Karl, M. et al. Description and evaluation of the community aerosol dynamics model MAFOR v2.0. Geosci. Model Dev. 15, 3969–4026 (2022).
Article ADS CAS MATH Google Scholar
Pan, Z. et al. High fidelity simulation of ultrafine PM filtration by multiscale fibrous media characterized by a combination of X-ray CT and FIB-SEM. J. Membr. Sci. 620, 118925 (2021).
Article CAS Google Scholar
Hoek, G. et al. Land use regression model for ultrafine particles in Amsterdam. Environ. Sci. Technol. 45, 622–628 (2011).
Article ADS CAS PubMed MATH Google Scholar
Cattani, G. et al. Development of land-use regression models for exposure assessment to ultrafine particles in Rome, Italy. Atmos. Environ. 156, 52–60 (2017).
Article ADS CAS MATH Google Scholar
Eeftens, M. et al. Development of land use regression models for nitrogen dioxide, ultrafine particles, lung deposited surface area, and four other markers of particulate matter pollution in the Swiss SAPALDIA regions. Environ. Health 15, 53 (2016).
Article PubMed PubMed Central Google Scholar
van Nunen, E. et al. Land use regression models for ultrafine particles in six European areas. Environ. Sci. Technol. 51, 3336–3345 (2017).
Article ADS PubMed PubMed Central MATH Google Scholar
Zwack, L. M., Hanna, S. R., Spengler, J. D. & Levy, J. I. Using advanced dispersion models and mobile monitoring to characterize spatial patterns of ultrafine particles in an urban area. Atmos. Environ. 45, 4822–4829 (2011).
Article ADS CAS MATH Google Scholar
Weichenthal, S. et al. A land use regression model for ambient ultrafine particles in Montreal, Canada: a comparison of linear regression and a machine learning approach. Environ. Res. 146, 65–72 (2016).
Article CAS PubMed MATH Google Scholar
Ge, Y. et al. High spatial resolution land-use regression model for urban ultrafine particle exposure assessment in Shanghai, China. Sci. Total Environ. 816, 151633 (2022).
Article CAS PubMed Google Scholar
Amini, H. et al. Harnessing AI to unmask Copenhagen’s invisible air pollutants: a study on three ultrafine particle metrics. Environ. Pollut. 346, 123664 (2024).
Article CAS PubMed Google Scholar
Hueglin, C., Bugmann, S., Herich, H. & Mueller, M. Trend and spatial variability of ambient ultrafine particle concentration in Switzerland In 19th ETH Nanoparticles Conference (NPC) (ETH Zurich, 2015).
Inness, A. et al. The CAMS reanalysis of atmospheric composition. Atmos. Chem. Phys. 19, 3515–3556 (2019).
Article ADS CAS MATH Google Scholar
Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. In Neural Information Processing Systems (Association for Computing Machinery, 2017).
Yan, J. et al. LightGBM: accelerated genomically designed crop breeding through ensemble learning. Genome Biol. 22, 271 (2021).
Article PubMed PubMed Central Google Scholar
Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
Article MathSciNet MATH Google Scholar
Muñoz-Sabater, J. ERA5-Land Hourly data from 1950 to Present https://doi.org/10.24381/cds.e2161bac (2019).
Wolpert, D. H. Stacked generalization. Neural Netw. 5, 241–259 (1992).
Article MATH Google Scholar
Breiman, L. Stacked regressions. Mach. Learn. 24, 49–64 (1996).
Article MATH Google Scholar
Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).
Article PubMed PubMed Central MATH Google Scholar
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, Inc., 2017).
Liu, Y., Zhou, Y. & Lu, J. Exploring the relationship between air pollution and meteorological conditions in China under environmental governance. Sci. Rep. 10, 14518 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Jiang, Z. A survey on spatial prediction methods. IEEE Trans. Knowl. Data Eng. 31, 1645–1664 (2019).
Article MATH Google Scholar
Saha, P. K., Hankey, S., Marshall, J. D., Robinson, A. L. & Presto, A. A. High-spatial-resolution estimates of ultrafine particle concentrations across the continental united states. Environ. Sci. Technol. 55, 10320–10331 (2021).
Article ADS CAS PubMed Google Scholar
Center for International Earth Science Information Network C. C. U. Gridded Population of the World, Version 4 (GPWv4): Population Density Adjusted to Match 2015 Revision UN WPP Country Totals, Revision 11 https://doi.org/10.7927/H4F47M65 (NASA Socioeconomic Data and Applications Center, 2018).
European Union F, UN-Habitat, OECD, The World Bank. Applying the Degree of Urbanisation (World Bank Group, 2021).
Federal Statistical Office. Demographic Balance by Canton - 1971-2023. Federal Statistical Office https://www.bfs.admin.ch/asset/en/32208093 (2024).
Newsham, N. & Rowe, F. Understanding trajectories of population decline across rural and urban Europe: a sequence analysis. Popul. Space Place 29, e2630 (2023).
Article MATH Google Scholar
Freire S. Modeling of spatiotemporal distribution of urban population at high resolution – value for risk assessment and emergency management. In Geographic Information and Cartography for Risk and Crisis Management: Towards Better Solutions (eds Konecny, M., Zlatanova, S. & Bandrova, T. L.) (Springer, 2010).
Kobayashi, T., Medina, R. M. & Cova, T. J. Visualizing diurnal population change in urban areas for emergency management. Prof. Geogr. 63, 113–130 (2011).
Article PubMed MATH Google Scholar
Sun, Z. et al. Heat exposure assessment based on high-resolution spatio-temporal data of population dynamics and temperature variations. J. Environ. Manag. 349, 119576 (2024).
Article Google Scholar
United Nations Department of Economic and Social Affairs. World Population Prospects 2024: Summary of Results (DESA Publications, 2024).
Jianyao, Y. & Zhang, X. Machine learning-enhanced high-resolution exposure assessment of ultra fine particles (Mini Dataset + Stem-PNC weights). Zenodo https://doi.org/10.5281/zenodo.14634738 (2025).
Jianyao, Y. & Zhang, X. Machine learning-enhanced high-resolution exposure assessment of ultra fine particles. Zenodo https://doi.org/10.5281/zenodo.14554168 (2024).

Download references

Acknowledgements

This work is partially supported by the National Key Research and Development Program of China (No.2023YFC3008803 and No.2023YFC3008805) to X.Z. and Y.J., the Excellent Young Scientists Fund Program (Overseas) to X.Z., the Independent Research Project of the China Institute for Radiation Protection (No. ZFYHHJMN-2023001) to X.Z., and MASSEV (Monitoring, Mapping and Modelling Air Quality for Sustainable Environments in Cities and Communities) (40CF40_221599) to J.W. We acknowledge the Swiss National Air Pollution Monitoring Network (NABEL) for making the particle number concentration dataset publicly available.

Author information

These authors contributed equally: Yudie Jianyao, Hongyong Yuan.

Authors and Affiliations

School of Safety Science, Tsinghua University, Beijing, China
Yudie Jianyao, Hongyong Yuan, Guofeng Su, Wenguo Weng & Xiaole Zhang
Institute of Public Safety Research, Tsinghua University, Beijing, China
Yudie Jianyao, Hongyong Yuan, Guofeng Su, Wenguo Weng & Xiaole Zhang
Institute of Environmental Engineering (IfU), ETH Zürich, Zürich, Switzerland
Jing Wang
Laboratory for Advanced Analytical Technologies, Empa, Dübendorf, Switzerland
Jing Wang

Authors

Yudie Jianyao
View author publications
Search author on:PubMed Google Scholar
Hongyong Yuan
View author publications
Search author on:PubMed Google Scholar
Guofeng Su
View author publications
Search author on:PubMed Google Scholar
Jing Wang
View author publications
Search author on:PubMed Google Scholar
Wenguo Weng
View author publications
Search author on:PubMed Google Scholar
Xiaole Zhang
View author publications
Search author on:PubMed Google Scholar

Contributions

X.Z. conceived the study. X.Z., H.Y., and Y.J. designed the development procedures. X.Z., Y.J., and J.W. collected and processed the raw particle number concentration data. Y.J. and X.Z. developed the Stem-PNC code. X.Z., Y.J., J.W., and H.Y. analyzed the results. X.Z., Y.J., J.W., G.S., W.W., and H.Y. discussed the data and methods during the development. Y.J., X.Z., and H.Y. wrote the paper. All the authors reviewed and edited the paper.

Corresponding author

Correspondence to Xiaole Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Nicolas Moussiopoulos and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (download PDF )

Transparent Peer Review file (download PDF )

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Jianyao, Y., Yuan, H., Su, G. et al. Machine learning-enhanced high-resolution exposure assessment of ultrafine particles. Nat Commun 16, 1209 (2025). https://doi.org/10.1038/s41467-025-56581-8

Download citation

Received: 14 July 2024
Accepted: 20 January 2025
Published: 31 January 2025
Version of record: 31 January 2025
DOI: https://doi.org/10.1038/s41467-025-56581-8