Introduction

Aerosol particles have a significant impact on Earth’s radiation balance due to their interactions with solar radiation and clouds. Particles’ ability to scatter and absorb radiation, known as the aerosol direct effect, is influenced by their mixing state – how different aerosol types are distributed within the population1,2. This mixing state can range from external mixing (single species) to internal mixing (mixture of species). Newly emitted aerosols usually have external mixing, while aging processes lead to internal mixing. Aerosol particles consist of diverse organic and inorganic components, showing significant variability in composition and abundance across time and space. Previous studies emphasize the importance of mixing state in understanding aerosols’ optical properties (AOPs)1,3,4,5. For example, studies demonstrate a greater positive forcing for internally mixed black carbon aerosols under the assumption of core-shell mixing in contrast to homogeneous volume-mixing and external mixing scenarios2,6,7.

Accurately modeling aerosol populations and predicting their impact on air quality, weather, and climate has long been a major challenge. Despite a good understanding of the underlying physics, resolving many small-scale processes, especially within atmospheric models, remains difficult. Precise quantification of AOPs, including the mass extinction coefficient (ke), single-scattering albedo (ω), and asymmetry factor (g), is crucial for improving the forecasting capabilities of atmospheric composition models (ACMs).

Accurate representation of the AOPs of internally mixed particles remains a significant challenge in ACMs1,8. Currently, many ACMs use large databases of pre-calculated AOPs in the form of look-up tables9,10,11,12,13,14,15. These AOPs are often archived using Mie calculations for a discrete set of chemical and micro-physical attributes (such as particle size and refractive index)8,16. However, aerosol size and composition vary strongly in space and time in model simulations. Interpolation is therefore inevitable whenever the model queries AOPs for a set of input parameters different from the archived values, and it may lead to non-trivial errors because the AOPs depend non-linearly on those parameters. Such errors can be reduced by adding more parameters to the interpolation. For example, ref. 12 implemented polynomial fits to account for the variability of the median diameter of log-normal modes during atmospheric transport (due to faster sedimentation of large particles). The database, however, grows larger as the number of parameters increases, making the offline AOPs less convenient for use with diverse applications14,17.

Due to the significant computational burden, online calculation of AOPs using Mie code is only feasible for specific applications and impractical for real-time use in ACMs15. To tackle this issue, several attempts have been made at online calculation of AOPs, often by parameterizing the Mie calculations for variable aerosol size and composition10,18,19,20. Yet these methods are also subject to large uncertainties and errors stemming from their underlying assumptions and interpolations. This highlights the immediate demand for accurate and computationally efficient tools for online calculation of AOPs that are consistent with the aerosol chemical and microphysical characteristics in ACMs14.

In recent years, Machine Learning (ML) and, more specifically, Deep Learning (DL) have gained prominence in weather and climate research. Applications include weather prediction21,22, post-processing of numerical model outputs23, and the replacement of model physics24 and parameterizations25,26. The methods range from emulation27 to solving partial differential equations (PDEs) with widely adopted ML algorithms28. ML has advanced significantly in recent years, particularly after 2010, as a result of the development of effective techniques for training neural networks (NNs) of considerable size. NNs excel at learning knowledge representations in very high-dimensional spaces, which they store as the connection weights between the neurons of the network. The network architecture thus acts as a mapping of the knowledge space of a given domain. As demonstrated in recent studies, it is feasible to predict the optical properties of aerosol particles by means of a NN, rather than by solving Maxwell’s equations as in Mie calculations14,17,29.

Therefore, the objective of this study is to develop a NN-based emulator to replace the current aerosol optics parameterization for internally mixed aerosols used in ACMs such as ICON-ART. We present a multi-layer fully connected feed-forward NN that derives optical properties for spherical particles covering a large size range accurately and efficiently, thereby meeting the emerging requirements in both remote sensing and atmospheric modeling of aerosol particles. This study builds upon prior work that employed ML techniques for emulating aerosol optics and radiative transfer modeling14,15,17,24,29,30,31,32,33. The overarching objective here is to devise an approach capable of robust generalization, as the existing literature lacks a discussion of the challenges of using a neural network-based approach in real-world applications.

Results

MieAI training and testing

In this study, we use a multi-layer perceptron (MLP) with multiple hidden layers, named MieAI, to emulate the calculation of AOPs using Mie theory. MieAI accounts for the mixing state of particles through inputs such as the size parameter, shell thickness, and refractive index (RI) of both core and shell. It outputs three AOPs: Qext, Qsca, and g. ω is then calculated from Qext and Qsca using Eq. (10).

The MieAI model, selected after hyper-parameter tuning, was trained with a budget of 5000 epochs to minimize the loss function. The NN was trained on 500,000 samples and verified on 100,000 test samples randomly chosen from the 600,000 Mie samples. Supplementary Fig. 3 shows the evolution of two loss metrics, the mean squared error (MSE) and the mean absolute percentage error (MAPE), throughout the training of the NN. Both validation losses decrease continuously and monotonically, which indicates that the model fits the data well and that its ability to predict AOPs keeps improving as training progresses.

It’s important to highlight that our model’s training incorporates an early stopping mechanism, with a patience parameter set at 50. This strategy ensures that the model training halts at the 2548th epoch, optimizing the MieAI model with a validation MSE of 0.01187. This early stopping mechanism is a prudent approach to prevent overfitting and ensure that the model generalizes well to unseen data.
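The early stopping rule described above can be sketched in a few lines. This is a minimal illustration, assuming that training halts once the validation loss has not improved for `patience` consecutive epochs; the function name and the list-of-losses interface are hypothetical and are not taken from the MieAI code.

```python
def early_stopping_epoch(val_losses, patience=50):
    """Return (stop_epoch, best_epoch, best_loss) for a list of validation losses.

    Epochs are counted from 1. Training stops as soon as `patience` epochs
    have passed without a new minimum of the validation loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best model: remember it and reset the waiting period.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: halt training here.
            return epoch, best_epoch, best_loss
    # The epoch budget ran out before the patience was exhausted.
    return len(val_losses), best_epoch, best_loss
```

In practice the weights from `best_epoch` would typically be restored after stopping, so that the reported validation loss belongs to the retained model.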

To verify the optimized network, we compared its AOP predictions against the true AOP values obtained from Mie calculations; the R2 values for the AOPs modeled in this study are shown in Fig. 1a–c. The trained NN models the AOPs well, with R2 values of 0.994, 0.994, and 0.997 for Qext, Qsca, and g, respectively. These results indicate that the selected NN captures the relationships within the data.

Fig. 1: MieAI training and evaluation.

a–c MieAI predictions against true AOPs estimated using Mie calculations for the test dataset. d–f Distribution of NN errors for the test dataset. Here, the error reported is the percentage error of MieAI with respect to true AOPs estimated using Mie calculations.

However, it is worth noting that while MieAI excels in predicting these three AOPs overall, there are specific regions where it encounters challenges. In particular, these challenges become apparent in the case of Qext and Qsca, especially when these values fall below 2 and 1, respectively. In these regions, MieAI appears to struggle, leading to more substantial discrepancies between its predictions and the actual values.

To gain deeper insights into these discrepancies, we examine the distribution of relative errors in MieAI predictions, as illustrated in Fig. 1d-f. This analysis reveals that MieAI tends to underestimate Qext and Qsca slightly when compared to those calculated using the Mie theory. However, notably, there is no such bias observed in the case of g. These findings provide valuable insights into the performance characteristics of MieAI and highlight specific areas where further model refinement may be warranted.
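The two evaluation metrics used in this section can be written down compactly. The sketch below assumes that R2 denotes the coefficient of determination and that the percentage error is signed as (prediction − truth)/truth, so that negative values correspond to an underestimate by the emulator; both conventions are assumptions, since the text does not spell them out.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination between reference and predicted values."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def percentage_errors(y_true, y_pred):
    """Signed percentage error of each prediction relative to the reference."""
    return [100.0 * (p - t) / t for t, p in zip(y_true, y_pred)]
```

A histogram of `percentage_errors` for each AOP then gives the kind of error distribution shown in Fig. 1d–f, where a peak shifted toward negative values indicates a systematic underestimate.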

MieAI validation using ICON-ART simulations

Next, we compare the AOP predictions of MieAI against those estimated using Mie theory for the outputs of ICON-ART simulations; the comparisons are shown in Figs. 2–4 for the real events examined in this study. For this purpose, the number concentrations of the constituent species of the mixed modes were taken from ICON-ART output. We first map the ICON-ART number concentrations to RIs for core and shell, as shown in Supplementary Fig. 1, and the ICON-ART modes to bins assuming log-normal distributions, as illustrated in Fig. 5. MieAI predictions for the bins are then integrated back to modes and compared with the Mie calculations.
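The mapping from a log-normal mode to size bins, and the integration of per-bin results back to the mode, can be illustrated as follows. This is a sketch under stated assumptions rather than the ICON-ART procedure: the bins are taken to be logarithmically spaced, each bin is represented by its geometric-mean diameter, and the bulk efficiency is taken as the cross-section-weighted mean of the per-bin efficiencies. All function names are hypothetical.

```python
import math


def log_spaced_edges(d_min, d_max, n_bins):
    """Bin edges equally spaced in log(diameter)."""
    ratio = (d_max / d_min) ** (1.0 / n_bins)
    return [d_min * ratio ** i for i in range(n_bins + 1)]


def lognormal_bin_fractions(median_d, sigma_g, edges):
    """Number fraction of a log-normal mode that falls into each bin."""
    def cdf(d):
        z = math.log(d / median_d) / (math.sqrt(2.0) * math.log(sigma_g))
        return 0.5 * (1.0 + math.erf(z))
    return [cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])]


def bulk_efficiency(fractions, edges, q_bins):
    """Cross-section-weighted mean of per-bin efficiencies (bins -> mode)."""
    centers = [math.sqrt(lo * hi) for lo, hi in zip(edges[:-1], edges[1:])]
    weights = [f * d ** 2 for f, d in zip(fractions, centers)]
    return sum(w * q for w, q in zip(weights, q_bins)) / sum(weights)
```

Here `q_bins` would hold the efficiencies predicted for each bin, for example by the emulator or by a Mie code, so that both methods are aggregated to the mode in the same way before being compared.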

Fig. 2: Comparison of AOPs predicted by MieAI against those estimated using Mie theory for coarse mode internally mixed aerosol particles at an altitude of 15 km above sea level for the La Soufrière volcanic eruption (denoted by the plus symbol) event simulated using ICON-ART.

Here, left column (a, d, g, j) shows the AOPs estimated using Mie theory, middle column (b, e, h, k) shows the same predicted from MieAI and right column (c, f, i, l) shows the relative error of MieAI AOPs prediction against Mie calculations. m The geographical distribution of the aerosol median diameter simulated using ICON-ART, whereas (n) shows the geographical variation of shell thickness, as a fraction of the total particle diameter (in percentage), of the coated aerosol.

Fig. 3: Comparison of AOPs predicted by MieAI against those estimated using Mie theory for coarse mode internally mixed aerosol particles at an altitude of 850 m above sea level for Australian biomass burning event simulated using ICON-ART.

Here, left column (a, d, g, j) shows the AOPs estimated using Mie theory, middle column (b, e, h, k) shows the same predicted from MieAI and right column (c, f, i, l) shows the relative error of MieAI AOPs prediction against Mie calculations. m The geographical distribution of the aerosol median diameter simulated using ICON-ART whereas (n) shows the geographical variation of shell thickness, as a fraction of the total particle diameter (in percentage), of the coated aerosol.

Fig. 4: Comparison of AOPs predicted by MieAI against those estimated using Mie theory for coarse mode internally mixed aerosol particles at an altitude of 5 km above sea level in ICON-ART simulation of a dust event over central Europe.

Here, left column (a, d, g, j) shows the AOPs estimated using Mie theory, middle column (b, e, h, k) shows the same predicted from MieAI and right column (c, f, i, l) shows the relative error of MieAI AOPs prediction against Mie calculations. m The geographical distribution of the aerosol median diameter simulated using ICON-ART, whereas (n) shows the geographical variation of shell thickness, as a fraction of the total particle diameter (in percentage), of the coated aerosol.

Fig. 5: Coated internally mixed aerosol particle.

The particle is assumed to be composed of an insoluble core and a soluble shell. The core consists of black carbon, volcanic ash, sea salt and dust, whereas the shell consists of organic matter, inorganic matter (such as ammonium (NH4), nitrate (NO3), chlorine (Cl), sulfate (SO4) and sodium (Na)) and water (H2O). Here, Dc represents the diameter of the core and Dt is the total diameter of the coated, mixed aerosol particle, which consists of both core and shell. Refractive indices (RI) for all chemical species constituting the mixed aerosol particle except dust are obtained from ref. 65, whereas those for dust are obtained from ref. 64.

Figure 2 shows the spatial distribution of AOPs for simulated internally mixed volcanic aerosols in coarse mode, obtained from both Mie and MieAI (see Supplementary Fig. 4 for the comparison in accumulation mode). The figure shows the AOPs derived after the La Soufrière volcanic eruption in April 2021, at an altitude of 15 km above sea level, 27 hours after the start of the simulation. The median diameters, shown in Fig. 2m, range from 100 nm to 1200 nm. The shell (coating) thickness, depicted in Fig. 2n, varies from 10 to 80% of the total diameter. The majority of particles have median diameters exceeding 500 nm and thick coatings (more than 50% coating fraction). In this case, volcanic ash constitutes the core, whereas water and inorganic species (sulfate, nitrate and ammonium) are the constituents of the coating/shell. Qsca (Fig. 2d) and consequently Qext (Fig. 2a) follow the distribution of both median diameter and coating fraction: higher values of Qext and Qsca are observed in regions characterized by lower median diameters and coating fractions. Conversely, both ω (negative correlation; Fig. 2g) and g (positive correlation; Fig. 2j) correlate more strongly with changes in median diameter, with a lesser influence from the coating fraction. MieAI, shown in Fig. 2b, e, h, k, captures these dependencies and agrees very well with the Mie theory estimates, with relative errors (depicted in Fig. 2c, f, i) generally staying within 10% for all AOPs except g (Fig. 2l), where the relative error reaches up to 12%. This suggests that the NN model captures the relationships between particle morphology, mixing state, and optical properties. The network errors depend to some degree on the coating fraction for all AOPs except g. In the case of g, network errors closely track the distribution of median diameter, with higher relative errors occurring in regions where the median diameter is smaller.

Figure 3 shows a comparison of the bulk AOPs estimated from MieAI and Mie for the wildfire case. This case study centers on an Australian wildfire event from 2019, and the comparison is made at an average altitude of 850 m above sea level after 23 hours of simulation (23rd of November 2019, 23:00 UTC). The selected altitude corresponds to the mass-weighted height of the plume; moreover, the plume at that level is widespread, with high concentrations compared to other model levels. The selected time step is towards the end of the one-day simulation, which allows for transport and aging of the aerosol. In this case, soot constitutes the core, whereas water, organic and inorganic species (sulfate, nitrate and ammonium) are the constituents of the coating/shell. Here, the median diameter (Fig. 3m) for the internally mixed aerosol in coarse mode ranges from 50 nm to 1000 nm, whereas the shell (coating) thickness (Fig. 3n) varies from 35 to 50% of the total diameter (see Supplementary Fig. 5 for the comparison in accumulation mode). This simulation predominantly features aerosol particles with total diameters exceeding 900 nm. Similar to the volcano case, all four optical properties follow the distribution of median diameters. In particular, changes in Qext (Fig. 3a), Qsca (Fig. 3d), and ω (Fig. 3g) are negatively correlated with the variations in median diameter, while g (Fig. 3j) is positively correlated with them. None of the optical properties appears to be sensitive to variations in the shell thickness. The comparison between MieAI, shown in Fig. 3b, e, h, k, and Mie calculations, shown in Fig. 3a, d, g, j, shows excellent agreement, reaffirming the robustness of the employed NN model in emulating Mie theory for internally mixed aerosols. The relative errors for all optical properties, shown in Fig. 3c, f, i, l, remain within 10% in this case. In contrast to the La Soufrière case study, network errors in this instance respond to changes in median diameter rather than to variations in the coating fraction.

Finally, Fig. 4 compares the predictions of MieAI, using the model trained with quantile transformation, with Mie calculations for coarse mode internally mixed particles in the ICON-ART simulation of the dust case. The case is a dust event over central Europe, and the simulation includes emissions of a comprehensive range of aerosol species, including sea salt, dust, and soot. The figure shows the comparison at an altitude of 5 km above sea level only. The median diameter (Fig. 4m) for mixed-phase aerosols within the coarse mode ranges from 200 nm to 2300 nm, with the majority of these particles having a median diameter of less than 500 nm. The shell (coating) thickness varies from 0 up to 50% of the total diameter, as shown in Fig. 4n, although a substantial proportion of the particles have a shell thickness of less than 10%. As anticipated, the optical properties (Fig. 4a, d, g, j) are sensitive to changes in median diameter, mirroring the patterns observed in the previous cases. As in the previous case, the influence of shell thickness remains relatively limited. MieAI (Fig. 4b, e, h, k) captures the variations in AOPs, with relative errors (Fig. 4c, f, i) staying below 10% for Qext, Qsca and ω. The prediction accuracy for g (Fig. 4l) is also reasonably strong, with errors generally remaining within 15%. The magnitude of the errors in g is sensitive to the coating fraction, which distinguishes it from the other three optical properties. For a complementary comparison in the accumulation mode, please refer to Supplementary Fig. 6.

The comparisons between AOPs estimated using Mie theory and the predictions made by MieAI, employing a model trained without quantile transformation, are presented in Supplementary Fig. 7. As clearly evident from the figure, the MieAI model, when trained without quantile transformation, exhibits notable shortcomings in capturing the variations in AOPs, with the exception of Qext. This discrepancy becomes particularly conspicuous despite the model’s impressive performance on the test dataset, where correlation coefficients (R) exceeded 0.98 for all AOPs examined, including Qext, ω, and g, as demonstrated in Supplementary Fig. 8.

The divergence between the model’s performance on the test dataset and its application to real-world data underscores a critical limitation in its ability to generalize beyond the training context. Implications of this observation are far-reaching and offer valuable insights into the complexities of emulating intricate physical mechanisms using NNs, particularly when not validated against real-world scenarios. Consequently, it underscores the critical importance of comprehensive preprocessing of datasets before their integration into ML models, serving as a precautionary measure against potential pitfalls in model generalization.

In summation, this comprehensive analysis underscores the robustness of the MieAI model (with quantile transformation) in reproducing the optical properties of internally mixed aerosols. Note that the MieAI was trained using a dataset which had shell thickness up to 40% only whereas the comparisons in all three cases include shell thickness beyond 40% (up to 50%). Thus, the comparisons clearly demonstrate the extrapolating capability of MieAI. The fact that the model successfully extrapolates its predictions beyond the training data’s confines is a testament to its inherent capacity to generalize and capture the underlying physical principles governing the interactions between aerosol particles and light. This characteristic is particularly valuable in real-world scenarios where aerosol properties can exhibit a wide range of variability, often extending beyond the confines of training data. MieAI’s capacity to accurately predict optical properties for aerosols with shell thicknesses up to 50% highlights its versatility and reliability as a Mie emulator.

Computational efficiency of MieAI

In addition to its high fidelity in modeling AOPs, MieAI offers a remarkable advantage in computational efficiency, showcasing significant computational enhancements in comparison to traditional Mie calculations employed for the same purpose. As indicated in Table 1, MieAI demonstrates a computational speedup exceeding 500 times that of Mie calculations across all scenarios investigated in this study.

Table 1 Timing results (in seconds) of MieAI and Mie calculations for different real cases investigated in this study

The extent of performance gain is particularly noteworthy; for instance, during the 2019 dust event over central Europe with 28,800 ICON grid cells, MieAI exhibited a speedup of approximately 500 times. As the number of grids increases, this gain becomes more pronounced, with speedups surpassing three orders of magnitude when compared to Mie calculations. This phenomenon is exemplified in the ICON-ART simulations for events such as the La Soufrière volcanic eruption with 73,500 ICON grid cells, where MieAI achieved a speedup of around 1900 times, and the Australian wildfire event with 74,865 ICON grid cells, boasting a remarkable speedup of around 1800 times.

Furthermore, the computational cost associated with MieAI training is exceedingly minimal, taking approximately 3 hours and 20 minutes. This stands in stark contrast to the runtime requirements of ICON-ART simulations. Notably, MieAI training utilized a single computing node from a high-performance computing (HPC) cluster equipped with multiple nodes, each housing 36 Intel Xeon CPUs. It is pertinent to mention that both MieAI predictions and Mie calculations were executed utilizing a single CPU core.

Discussion

This study endeavors to introduce an innovative and computationally efficient framework, aptly named MieAI, specifically designed for calculating the bulk optical properties of internally mixed and coated aerosols characterized by a log-normal size distribution. Our approach leverages a straightforward multi-layer perceptron, a type of artificial neural network, to unravel the intricate relationship between AOPs and their physico-chemical characteristics, such as particle size distribution, mixing state, and chemical composition. Central to MieAI is the representation of both core and shell as ternary systems, subsequently linked to RIs via a volume mixing approach.

To validate our approach, we evaluated it against Mie calculations, the reference method, which is known for its precision but is computationally slow. The comparison showed that the NN-based MieAI approach attains high accuracy, with errors confined within 10%, while also delivering a speed improvement of three orders of magnitude.

Furthermore, our study underscores the importance of careful pre-processing in enhancing the accuracy and generalizability of NN-based methodologies. We emphasize the need for rigorous evaluation of novel ML-based approaches prior to their widespread deployment in scientific applications. Moreover, the MieAI model proposed in this paper emulates the Mie calculations for thinly coated aerosols, assuming the aerosol particles to be spherical and to have a core-shell configuration. However, this approach can be extended to account for non-spherical shapes and morphologically complex configurations such as embedded, partly embedded and thickly coated configurations1,33,34.

With its generic design, the approach presented herein holds versatile applicability, seamlessly integrating into ACMs that adopt either bin or modal frameworks for representing aerosols and their optical properties. Moreover, the same framework can be extended to accommodate externally mixed aerosols and aerosol models featuring non-spherical shapes.

The substantial precision achieved through our developed approach bears the potential to significantly contribute to the ongoing efforts aimed at mitigating uncertainties in aerosol forcing estimations. By bridging the gap between precision and computational efficiency, MieAI emerges as a valuable asset in the realm of physics-based weather and climate models, especially ACMs; poised to contribute substantially to advancing our understanding of aerosol-climate interactions and fostering more robust climate models.

Methods

Mie calculation of aerosol optical properties

Optical properties are a function of the particle size and the wavelength-dependent refractive indices (RIs) of the constituents of the aerosol particles35. Both relative RI of the particle with respect to surrounding medium and particle shape should be accounted for in radiation interaction studies29. If the particle shape is spherical, Mie theory can be used to calculate the optical properties. Mie theory uses Maxwell’s equations to solve a 3-D electromagnetic wave equation whose solution can be written as an infinite series of products of orthogonal functions36. As per Mie theory, the extinction (Qext) / scattering (Qsca) efficiencies and g of a spherical particle can be written as:

$${Q}_{ext}=\frac{2}{{x}^{2}}\mathop{\sum }\limits_{n=1}^{\infty }(2n+1)\,{\rm{Re}}({a}_{n}+{b}_{n})$$
(1)
$${Q}_{sca}=\frac{2}{{x}^{2}}\mathop{\sum }\limits_{n=1}^{\infty }(2n+1)\left(| {a}_{n}{| }^{2}+| {b}_{n}{| }^{2}\right)$$
(2)
$$g=\frac{4}{{Q}_{sca}{x}^{2}}\left[\mathop{\sum }\limits_{n=1}^{\infty }\frac{n(n+2)}{n+1}{\rm{Re}}({a}_{n}{a}_{n+1}^{* }+{b}_{n}{b}_{n+1}^{* })+\mathop{\sum }\limits_{n=1}^{\infty }\frac{2n+1}{n(n+1)}{\rm{Re}}({a}_{n}{b}_{n}^{* })\right]$$
(3)

Here, an and bn are the Mie scattering coefficients and x is the size parameter which, in turn, is given by:

$$x=\frac{\pi {d}_{p}}{\lambda }$$

where dp is the particle diameter.
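To make Eqs. (1)–(3) concrete, the following sketch evaluates the truncated series for a homogeneous sphere. It is a simplification and not the coated-sphere code used in this study: the scattering coefficients follow the widely used Bohren and Huffman style recurrences, the truncation index x + 4x^(1/3) + 2 is the customary choice in such codes, and the RI is assumed to be written as n + ik with k ≥ 0 for absorbing particles. The function name is hypothetical.

```python
import math


def mie_homogeneous(x, m):
    """Return (Q_ext, Q_sca, g) of a homogeneous sphere from Eqs. (1)-(3).

    x: size parameter (pi * d_p / lambda); m: complex RI relative to the medium.
    """
    m = complex(m)
    y = m * x
    n_stop = int(x + 4.0 * x ** (1.0 / 3.0) + 2.0)  # series truncation index
    n_mx = max(n_stop, int(abs(y))) + 15

    # Logarithmic derivative D_n(mx), computed by downward recurrence.
    D = [0j] * (n_mx + 1)
    for n in range(n_mx, 0, -1):
        D[n - 1] = n / y - 1.0 / (D[n] + n / y)

    # Riccati-Bessel functions psi_n and chi_n by upward recurrence.
    psi_prev, psi_cur = math.cos(x), math.sin(x)
    chi_prev, chi_cur = -math.sin(x), math.cos(x)
    a, b = [], []
    for n in range(1, n_stop + 2):  # one extra term supplies a_{n+1}, b_{n+1}
        psi = (2 * n - 1) / x * psi_cur - psi_prev
        chi = (2 * n - 1) / x * chi_cur - chi_prev
        xi, xi_prev = complex(psi, -chi), complex(psi_cur, -chi_cur)
        ta = D[n] / m + n / x
        tb = D[n] * m + n / x
        a.append((ta * psi - psi_cur) / (ta * xi - xi_prev))
        b.append((tb * psi - psi_cur) / (tb * xi - xi_prev))
        psi_prev, psi_cur = psi_cur, psi
        chi_prev, chi_cur = chi_cur, chi

    q_ext = q_sca = g_sum = 0.0
    for n in range(1, n_stop + 1):
        an, bn, an1, bn1 = a[n - 1], b[n - 1], a[n], b[n]
        q_ext += (2 * n + 1) * (an + bn).real                     # Eq. (1)
        q_sca += (2 * n + 1) * (abs(an) ** 2 + abs(bn) ** 2)      # Eq. (2)
        g_sum += n * (n + 2) / (n + 1) * (
            an * an1.conjugate() + bn * bn1.conjugate()).real     # Eq. (3)
        g_sum += (2 * n + 1) / (n * (n + 1)) * (an * bn.conjugate()).real
    q_ext *= 2.0 / x ** 2
    q_sca *= 2.0 / x ** 2
    g = 4.0 * g_sum / (q_sca * x ** 2)
    return q_ext, q_sca, g
```

For a non-absorbing sphere (real m) the routine should return Qext equal to Qsca, and for x much smaller than 1 it should approach the Rayleigh limit; both are useful sanity checks before a coated-sphere solver is substituted.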

An approximate solution for Qext, Qsca and g can be obtained by truncating the infinite series as explained by ref. 36. Mie codes calculate the Mie scattering coefficients (an and bn), which are solely dependent on particle diameter (dp), incident wavelength (λ) and RI (Bλ), followed by determination of the number of terms required before truncation and calculation of the series. Mass extinction coefficient (ke) is obtained from Qext37:

$${k}_{e}(l,\lambda ,{B}_{\lambda })=\frac{\int\nolimits_{0}^{\infty }\frac{\pi }{4}{d}_{p}^{2}{Q}_{ext}({d}_{p},\lambda ,{B}_{\lambda }){\psi }_{0,l}({d}_{p})d{d}_{p}}{\int\nolimits_{0}^{\infty }{\rho }_{p}[\frac{\pi }{6}{d}_{p}^{3}]{\psi }_{0,l}({d}_{p})d{d}_{p}}$$
(4)

where ψ0,l and ψ3,l are the parameters of the log-normal distribution for aerosol mode l.
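Equation (4) can be evaluated numerically once Qext is available as a function of diameter. The sketch below assumes that ψ0,l is a log-normal number distribution described by a median diameter and a geometric standard deviation (its normalization cancels in the ratio), that ρp is a constant particle density, and that a trapezoidal rule on a logarithmic diameter grid is accurate enough; the function names and default grid settings are hypothetical.

```python
import math


def lognormal_pdf(d, median_d, sigma_g):
    """Normalized log-normal number distribution in diameter."""
    ln_s = math.log(sigma_g)
    z = math.log(d / median_d) / ln_s
    return math.exp(-0.5 * z * z) / (d * ln_s * math.sqrt(2.0 * math.pi))


def mass_extinction(q_ext, median_d, sigma_g, rho_p, n_points=2000, n_sigma=8.0):
    """Trapezoidal evaluation of Eq. (4); q_ext is a callable q_ext(d)."""
    ln_s = math.log(sigma_g)
    u_lo = math.log(median_d) - n_sigma * ln_s
    du = 2.0 * n_sigma * ln_s / n_points
    num = den = 0.0
    for i in range(n_points + 1):
        d = math.exp(u_lo + i * du)
        # Trapezoid weight; dd = d * du on the logarithmic grid.
        w = (0.5 if i in (0, n_points) else 1.0) * du * d
        psi = lognormal_pdf(d, median_d, sigma_g)
        num += w * math.pi / 4.0 * d ** 2 * q_ext(d) * psi   # extinction cross-section
        den += w * rho_p * math.pi / 6.0 * d ** 3 * psi      # particle mass
    return num / den
```

With SI units (diameters in m, density in kg m⁻³) the result is in m² kg⁻¹. For a constant Qext the integral has the closed form 3Qext/(2ρp) · exp(−2.5 ln²σg)/dmedian, which provides a convenient check of the quadrature.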

To calculate the optical properties of internally mixed aerosol particles using Mie calculations, some assumptions are required. Mie theory assumes that the particles are spherical. In reality, the majority of aerosol particles are non-spherical. However, liquid coating frequently leads to the formation of spherical coating surfaces, which justifies the assumption of particle sphericity in mixed mode models. Recent studies suggest that coated particles can also exhibit non-spherical shapes, which complicates this assumption15,38,39,40,41. Nevertheless, the coated sphere remains a practical approximation in many cases and is a widely used configuration in ACMs14,15,42,43. In this study, aerosol particles were assumed to be spherical in a core-shell configuration, with the solid phase as the core and the liquid species as the shell. Both core and shell are considered as ternary systems of different chemical species. For example, the core is a ternary system consisting of dust, sea salt and soot, whereas the shell consists of water, inorganic and organic species, as shown in Fig. 5. This assumption does not imply the one-to-one existence of such mixtures in nature. Rather, it covers a wide range of the possible RIs for core and shell occurring in the atmosphere, which is shown in Supplementary Fig. 1. We employ the PyMieScatt Python library for computing the optical characteristics of a coated sphere using Mie theory44. This library is built on Mie codes originally written by refs. 45 and 7, rooted in the concepts presented by ref. 36.

Emulation of the Mie Calculation: MieAI

In this study, we propose a multi-layer fully connected NN, commonly known as a multi-layer perceptron (MLP), to emulate the Mie calculation of AOPs; we call this emulator MieAI. As a universal function approximator, the feed-forward NN is well suited for modeling nonlinear processes. The schematic diagram of the MLP is shown in Fig. 6. Specifically, it is used here to establish the relationships between the micro-physical parameters of aerosol particles and the corresponding single-scattering properties14,15,17,46. In a fully connected network, each neuron is connected to all nodes in the preceding and following layers. The output \({O}_{i}^{(l)}\) of the i-th node in the fully connected layer l can be calculated from the output of the previous layer l − 1 with a non-linear activation function (ϕ).

$${O}_{i}^{(l)}=\phi \left(\mathop{\sum }\limits_{j=1}^{{N}^{(l-1)}}{w}_{i,j}^{(l)}{O}_{j}^{(l-1)}+{b}_{i}^{(l)}\right)$$
(5)

Here, \({w}_{i,j}^{(l)}\) represents the weight connecting the j-th neuron in layer l − 1 to the i-th neuron in layer l, and \({b}_{i}^{(l)}\) represents the bias term of the i-th neuron in layer l. \({N}^{(l-1)}\) is the number of neurons in layer l − 1.
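Equation (5) translates directly into code. The following sketch applies it layer by layer for a small fully connected network; the ReLU activation for hidden layers and the linear output layer are assumptions made for illustration, as are the function names, and the weights in any usage would come from training rather than being set by hand.

```python
def relu(z):
    """Rectified linear unit, used here as an example activation phi."""
    return max(0.0, z)


def identity(z):
    """Linear activation, used here for the output layer."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """Eq. (5): O_i = phi(sum_j w_ij * O_j + b_i) for every neuron i."""
    return [
        activation(sum(w_ij * o_j for w_ij, o_j in zip(row, inputs)) + b_i)
        for row, b_i in zip(weights, biases)
    ]


def mlp_forward(features, layers):
    """Propagate input features through a list of (weights, biases, activation)."""
    out = list(features)
    for weights, biases, activation in layers:
        out = dense_layer(out, weights, biases, activation)
    return out
```

Each entry of `weights` is one row of the weight matrix, so a layer with N(l−1) inputs and N(l) neurons is stored as N(l) rows of length N(l−1).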

Fig. 6: MieAI Architecture. MieAI is a NN based model with multiple hidden layers.

The first and last layers represent the input and output of MieAI, respectively. Here, size parameter (x), wavelength (λ), coating fraction (f), and the real and imaginary parts of the refractive indices for both core (\(R{I}_{re}^{c}\) and \(R{I}_{im}^{c}\)) and shell (\(R{I}_{re}^{s}\) and \(R{I}_{im}^{s}\)) constitute the input of MieAI, whereas the extinction efficiency (Ext), scattering efficiency (Sca) and asymmetry parameter (Asy) are the outputs.

For estimating AOPs using MieAI, 7 aerosol micro-physical parameters are regarded as input features (X = [x1, x2, …, x7]) and 3 single-scattering properties (Y = [Qext, Qsca, g]) are output targets as shown in Fig. 6. Here the input features are the size parameter (x), wavelength (λ), coating fraction (f) for coated, internally mixed aerosol and RIs for both core (RIc) and shell (RIs). Using a dataset comprising known input and output matrices, denoted as X and Y respectively, the model undergoes training to optimize its parameters - weights (w) and biases (b). This optimization is achieved via back-propagation, which minimizes the cost function Cy:

$${C}_{y}=\mathop{\sum }\limits_{i=1}^{N}{({y}_{true,i}-{y}_{pred,i})}^{2}$$
(6)

This function quantifies the error between the predicted values (ypred) generated through forward propagation in the NN and the actual values (ytrue). The cost function Cy is differentiable with respect to the model parameters (w and b), enabling the application of various gradient descent techniques for efficient optimization.
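To illustrate how the differentiability of the cost function enables gradient descent, the sketch below minimizes Eq. (6) for a single linear layer with analytic gradients. The single-layer model, the random data and the learning rate are assumptions chosen for brevity; the full network is trained with back-propagation through all layers.

```python
import numpy as np

def cost(y_true, y_pred):
    # Eq. (6): sum of squared errors over all samples
    return np.sum((y_true - y_pred) ** 2)

def cost_gradients(x, y_true, w, b):
    # Analytic gradients of Eq. (6) for the linear model y_pred = x @ w + b
    residual = (x @ w + b) - y_true
    grad_w = 2.0 * x.T @ residual
    grad_b = 2.0 * residual.sum(axis=0)
    return grad_w, grad_b

rng = np.random.default_rng(1)
x = rng.uniform(size=(32, 7))       # placeholder inputs
y_true = rng.uniform(size=(32, 3))  # placeholder targets
w = np.zeros((7, 3))
b = np.zeros(3)

learning_rate = 1e-3  # illustrative value
history = []
for _ in range(200):
    history.append(cost(y_true, x @ w + b))
    grad_w, grad_b = cost_gradients(x, y_true, w, b)
    w -= learning_rate * grad_w  # plain gradient descent update
    b -= learning_rate * grad_b
```

The list `history` records the cost before each update, which makes it easy to check that the chosen step size actually reduces Cy.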

Training Data and its preprocessing

To facilitate MieAI training, a total of 30 distinct combinations of core and shell chemical compositions are considered, as listed in Supplementary Table 1. The computation of optical characteristics relies on the wavelength-dependent RI. As emphasized by47, distinct peaks in the real component of the RI manifest as prominent maxima in Qext. At the same time, ω and, consequently, the absorption efficiency (Qabs) are governed by the imaginary component of the RI. In Supplementary Fig. 1a, we present the real and imaginary components of the RI for the chemical species composing both the core and shell of aerosol particles43,47,48. Supplementary Fig. 1b illustrates the variations in the real and imaginary components of the RI for internally mixed and coated particles as a function of changes in the chemical composition of the core and shell across various wavelengths of solar radiation. The real part of the RI ranges from 1.1 to 2.75 for the core and from 1.2 to 2 for the shell, depending on the specific chemical compositions of the core and shell. Meanwhile, the imaginary component varies from values as low as 10−8 to 0.5 for the core and from 10−9 to 1 for the shell. As noted above, the core is characterized as a volume-averaged ternary system involving mineral dust, sea salt, and soot, while the shell is likewise modeled as a ternary system of water, inorganic, and organic constituents.

The training, validation and test datasets for MieAI are generated by randomly selecting 600,000 samples (about 2%) from more than 45 million possible combinations of input features arising from varying wavelength (0.2 to 100 μm), shell thickness (from 0 to 40% of total diameter with a step size of 0.1%), core diameter (from 10 nm to 20 μm) and RI, considering the 30 different combinations for core and shell discussed before. The randomly selected samples were divided into training (70%), validation (15%) and test (15%) datasets for optimizing the NN architecture and parameters.
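A random 70/15/15 partition of this kind can be sketched as follows. The function name, the seed and the use of index arrays are assumptions made for illustration; the sampling code actually used for MieAI is not given in the text.

```python
import numpy as np

def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    # Shuffle sample indices once, then cut them into three disjoint parts
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(600_000)
```

Working with index arrays keeps the input and target matrices aligned, because the same indices can be applied to both X and Y.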

Both input and target datasets have a large variability, so it is important to normalize them before feeding them to the NN in order to improve the model's learning ability. The input and target data are therefore transformed using Min-Max normalization before training, and the output from the NN model is afterwards denormalized to its original optical-property space. We first normalized the training dataset and used the same normalization scale to transform the validation and test datasets to avoid data leakage during model training.
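The leakage-free normalization described above amounts to fitting the minimum and range on the training data only and reusing them everywhere else. The sketch below shows this with NumPy; the helper names and the random placeholder data are assumptions.

```python
import numpy as np

def fit_min_max(train):
    # Per-feature minimum and range, estimated from the training data only
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return lo, span

def normalize(data, lo, span):
    return (data - lo) / span

def denormalize(scaled, lo, span):
    # Inverse transform back to the original space
    return scaled * span + lo

rng = np.random.default_rng(2)
train = rng.uniform(0.2, 100.0, size=(1000, 7))  # placeholder features
test = rng.uniform(0.2, 100.0, size=(200, 7))

lo, span = fit_min_max(train)
train_scaled = normalize(train, lo, span)
test_scaled = normalize(test, lo, span)  # reuses the training scale
```

Because the scale comes from the training set, validation or test values can fall slightly outside [0, 1]; this is expected and is the price of avoiding leakage.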

Due to the non-normal distribution of the target AOPs in the training dataset, we perform a quantile distribution mapping of the raw target AOPs to a normal distribution. Quantile mapping transforms all input features to the same target distribution (a Gaussian distribution in this case) based on the formula G−1(F(X)), where F is the cumulative distribution function (CDF) of the input feature and G−1 is the quantile function of the target distribution G49. Quantile mapping smooths out uneven distributions and is less influenced by outliers than scaling methods such as the min-max transformation. Quantile mapping has been used extensively in meteorology for bias correction50 and statistical downscaling51. We use the Python library scikit-learn for performing quantile mapping in this study. As shown in Fig. 7, the raw training dataset for g is bi-modal, with one peak near 0 and another near 1. Although non-linear algorithms like MieAI do not assume a Gaussian distribution, they perform better if the variables are Gaussian distributed. Thus, mapping to the normal distribution improves the generalization of the trained network. During inference, the predicted AOPs are transformed back to the original distribution using the inverse quantile transform with the same parameters used during training.

Fig. 7: Transformation of MieAI output using quantile mapping.

Quantile mapping transforms input features to a Gaussian distribution with mean 0 and standard deviation 1. Here, left column (a, c, e, g) shows AOPs before quantile mapping and right column (b, d, f, h) shows AOPs after quantile mapping.
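The mapping G−1(F(X)) and its inverse can be illustrated with a small NumPy implementation based on empirical quantiles. This is a stand-in written for clarity, not the scikit-learn routine used in the study; the number of quantiles, the clipping constant and the synthetic bi-modal sample are assumptions.

```python
import numpy as np
from statistics import NormalDist

def fit_quantile_map(train_values, n_quantiles=1000, eps=1e-7):
    # Empirical quantiles of the data (estimate of F) and matching Gaussian quantiles (G^-1)
    probs = np.linspace(0.0, 1.0, n_quantiles)
    data_q = np.quantile(train_values, probs)
    inv_cdf = NormalDist().inv_cdf
    # Clip probabilities so the Gaussian quantiles stay finite at 0 and 1
    gauss_q = np.array([inv_cdf(p) for p in np.clip(probs, eps, 1.0 - eps)])
    return data_q, gauss_q

def to_gaussian(values, data_q, gauss_q):
    # G^-1(F(x)) evaluated by piecewise-linear interpolation
    return np.interp(values, data_q, gauss_q)

def from_gaussian(z, data_q, gauss_q):
    # Inverse mapping, applied to network predictions at inference time
    return np.interp(z, gauss_q, data_q)

rng = np.random.default_rng(3)
# Synthetic bi-modal sample with peaks near 0 and 1 (placeholder for g)
raw = np.concatenate([rng.beta(2, 8, size=5000), rng.beta(8, 2, size=5000)])

data_q, gauss_q = fit_quantile_map(raw)
z = to_gaussian(raw, data_q, gauss_q)
recovered = from_gaussian(z, data_q, gauss_q)
```

scikit-learn provides comparable functionality through `QuantileTransformer` with `output_distribution="normal"`; whether that exact class and its default settings were used for MieAI is not stated in the text.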

Optimization and assessment of MieAI

In addition to the model parameters optimized by the NN training procedure, there are hyper-parameters that define the model architecture and control the learning process, such as the number of hidden layers, the number of neurons in each layer, the activation function, batch size and the learning rate of the optimizer. The mean squared error (Eq. (6)) is employed as the loss function for the optimizer to minimize. We apply a non-linear activation function to all of the layers except the output where we apply linear activation to restrict the NN output between 0 and 152. After the training, the weight matrices in the NN are saved and used afterwards for evaluation using ICON-ART simulations.

To assess the performance of the network, we used the coefficient of determination (R2) and Mean Absolute Percentage Error (MAPE) as metrics to evaluate the fitness of the predictions with the true values. R2 is defined as:

$${R}^{2}=1-\frac{{\Sigma }_{i = 1}^{M}{({y}_{i}-{f_i})}^{2}}{{\Sigma }_{i = 1}^{M}{({y}_{i}-{\bar{y}})}^{2}}$$
(7)

Here, fi is the value predicted by MieAI and yi is the true value. \({\bar{y}}\) is the average of all true values. The closer R2 is to 1, the higher the performance of MieAI. The MAPE metric is defined as:

$$MAPE=\frac{100}{N}\mathop{\sum }\limits_{i=1}^{N}\left| \frac{{Y}_{mie,i}-{Y}_{MieAI,i}}{{Y}_{mie,i}}\right|$$
(8)

Here, YMieAI is AOP prediction from MieAI, Ymie is the AOP estimated using Mie theory and N is the number of times AOPs are predicted using MieAI.
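Both metrics follow directly from Eqs. (7) and (8). The short example below evaluates them on four made-up value pairs, chosen only so that the arithmetic is easy to verify by hand.

```python
import numpy as np

def r_squared(y_true, y_pred):
    # Eq. (7): 1 - residual sum of squares / total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def mape(y_mie, y_mieai):
    # Eq. (8): mean absolute percentage error, in percent
    return 100.0 * np.mean(np.abs((y_mie - y_mieai) / y_mie))

# Made-up reference and emulated values (not results from the study)
y_mie = np.array([2.0, 4.0, 5.0, 10.0])
y_mieai = np.array([2.2, 3.8, 5.0, 9.0])

r2_value = r_squared(y_mie, y_mieai)   # 1 - 1.08 / 34.75
mape_value = mape(y_mie, y_mieai)      # 6.25
```

Note that MAPE divides by the reference value, so it becomes unstable when Ymie approaches zero; this matters for quantities such as g that can be close to 0.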

To avoid over-fitting and other training related issues, we chose our NN hyper-parameters using the keras-tuner hyper-parameter optimization library and apply early stopping with the patience parameter set as 5053. The hyper-parameters of the model have been optimized using Bayesian optimization. The hyper-parameter tuning procedure is executed in two stages, each of which tunes distinct aspects of the model. In the first stage, we focus on optimizing critical architectural components, including the number of hidden layers, the neuron count in each hidden layer, activation functions, and the choice of optimizer. The second stage then fine-tunes the learning rate and the batch size of the training data for the NN selected in the first stage. During hyper-parameter optimization, we trained various NN architectures for 200 epochs. The corresponding MSE values for these NN architectures are presented in Supplementary Table 2 (first stage) and Supplementary Table 3 (second stage). The resultant optimal values for all hyper-parameters are shown in Table 2.

Table 2 Hyper-parameter tuning of MieAI model

As depicted in Supplementary Table 2, the MieAI model with the Adam optimizer, 5 hidden layers with 64 neurons in each layer and the GELU activation function performed best, with the lowest MSE. With the aim of selecting the most accurate NN with the smallest possible number of trainable parameters, we performed the second stage of tuning, wherein we varied the number of hidden layers and the number of neurons in each layer along with the batch size and learning rate of the Adam optimizer selected in the first stage. As shown in Supplementary Table 3, the MieAI model with 4 hidden layers outperformed the 5-layer NN selected in the first stage when batch size and learning rate were also optimized. We apply early stopping with patience set as 50 and reduce the learning rate of the optimizer by one-fifth if the validation loss plateaus during both hyper-parameter tuning and training of the network. Therefore, the best NN after hyper-parameter optimization is an MLP with 4 hidden layers, each having 64 neurons, trained using the Adam optimizer with a learning rate of 0.01 and a training batch size of 128.

ICON-ART model system

In addition to evaluating the trained MieAI using test datasets, we conducted three reference ICON-ART simulations for real-world events to validate the MieAI prediction of AOPs against Mie calculations. The ICON modeling framework solves the nonhydrostatic and compressible Navier-Stokes equations on an icosahedral-triangular grid54. The model can predict processes across scales, from global to local, as highlighted by55 and56. Complementing the ICON model, the ART module is responsible for simulating trace gases and aerosols in both the troposphere and stratosphere. This module encompasses processes spanning emission, transport, physicochemical transformation and removal of gases and aerosols, as well as their interactions with clouds and radiation12,57,58. Deutscher Wetterdienst (DWD) uses ICON for operational weather forecasting and ICON-ART for mineral dust and pollen forecasting.

ICON-ART uses the European Centre for Medium-Range Weather Forecasts (ECMWF) radiation scheme ecRad59 as the standard radiation scheme for numerical weather prediction60,61. To calculate the local radiative transfer parameters, ecRad needs the kel,j, ωl,j and gl,j for every mode l and every waveband j for 30 wavelength bands between 0.2 and 100 μm. These are often obtained using Mie calculations. Together with the local aerosol mass mixing ratios (ψ3,l) from ART and air density (ρa), they allow for calculation of the volume specific extinction coefficient37:

$${\beta }_{ext,l,j}={k}_{el,j}\cdot {\rho }_{a}\cdot {\psi }_{3,l}\cdot 1{0}^{-6}$$
(9)

ω gives the scattering coefficient:

$${\beta }_{scat,l,j}={\omega }_{l,j}\cdot {\beta }_{ext,l,j}$$
(10)

These volume specific properties are then converted to values per model layer by multiplying with the respective layer height (Δz), followed by summation across all model layers to calculate total aerosol optical depth (AOD) for the ART aerosol within a specific waveband. These computed values then serve as input parameters for the radiation scheme12. This approach ensures full coupling and feedback between aerosol processes, radiation, and the atmospheric state48,62.
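The chain from Eq. (9) through Eq. (10) to the layer-summed optical depth can be sketched for a single mode and waveband as follows. All numerical values are placeholders, and the units are assumed to follow the convention implied by the 10−6 factor in Eq. (9).

```python
import numpy as np

def extinction_coefficient(k_e, rho_a, psi_3):
    # Eq. (9): volume specific extinction coefficient per model layer
    return k_e * rho_a * psi_3 * 1e-6

def scattering_coefficient(omega, beta_ext):
    # Eq. (10): scattering coefficient from single-scattering albedo
    return omega * beta_ext

def optical_depth(beta_ext, dz):
    # Multiply by layer height and sum over all model layers
    return np.sum(beta_ext * dz)

# Placeholder profiles for three model layers (not simulation output)
k_e = 3.0                            # mass extinction coefficient
omega = 0.9                          # single-scattering albedo
rho_a = np.array([1.2, 1.0, 0.8])    # air density
psi_3 = np.array([50.0, 20.0, 5.0])  # aerosol mass mixing ratio
dz = np.array([100.0, 200.0, 400.0]) # layer height

beta_ext = extinction_coefficient(k_e, rho_a, psi_3)
beta_scat = scattering_coefficient(omega, beta_ext)
aod = optical_depth(beta_ext, dz)    # 0.018 + 0.012 + 0.0048
```

In the full model the same calculation is repeated for every mode l and every waveband j before the values are passed to the radiation scheme.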

The present study focuses on the interaction of internally mixed aerosols with radiation, which is addressed through the use of the AEROsol DYNamic module (AERODYN) in ICON-ART. This module enables examination of aerosol dynamics processes, including nucleation, condensation and coagulation, that generate internally mixed aerosols. AERODYN comprises a flexible number of log-normal modes (up to 10) that account for Aitken, accumulation, and coarse particles in soluble, insoluble, and mixed states, alongside a giant insoluble mode43. The term “mixed state” here refers to an aerosol that is composed of an insoluble core and a soluble shell, where the latter constitutes no less than 5% of the overall mass of the aerosol. The prognostic equations for number density and mass concentration are solved for each species and each mode while maintaining constant standard deviations. Two distinct circumstances lead to a change of particle mode. The first is when the mass fraction of soluble coating on insoluble particles surpasses the 5% threshold, leading to a transition from insoluble to mixed mode. The second is when the diameter threshold of the soluble and mixed mode is exceeded, resulting in a shift to a larger mode. Changes in the particle modes can modify the optical properties of particles, with consequential impacts on both the atmospheric state and radiation43,63.

In ICON-ART, each aerosol component was assigned an RI, and the RI values were obtained from64 for dust and65 for other species. The volume-average mixing rule is used to compute the complex RI of both core and shell, which then serves as input for the core-shell calculation. To facilitate a comparison between Mie calculations and MieAI predictions, we derived bulk AOPs for each aerosol mode by aggregating optical properties across individual aerosol population bins. To achieve this, we first mapped each aerosol mode, based on its median diameter, to 15 log-normal bins, as illustrated in Supplementary Fig. 2. Both Mie calculations and MieAI emulation were then applied to these bins, and the results were subsequently integrated to obtain bulk optical properties for each mode. For our validation, we employed RI values at a wavelength (λ) of 550 nm.
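The volume-average mixing rule mentioned above weights each component's complex RI by its volume fraction. The sketch below assumes this simple linear form; the RI values and volumes are illustrative placeholders rather than the values taken from the cited databases.

```python
import numpy as np

def volume_average_ri(refractive_indices, volumes):
    # Volume-weighted mean of complex refractive indices
    ri = np.asarray(refractive_indices, dtype=complex)
    vol = np.asarray(volumes, dtype=float)
    return np.sum(ri * vol) / np.sum(vol)

# Placeholder ternary core: three components with made-up RIs and volumes
core_ri = [1.53 + 0.002j, 1.50 + 1e-8j, 1.75 + 0.44j]
core_volumes = [0.6, 0.3, 0.1]
m_core = volume_average_ri(core_ri, core_volumes)

# Simple check case: equal volumes give the arithmetic mean
m_check = volume_average_ri([1.5 + 0.0j, 1.7 + 0.2j], [1.0, 1.0])
```

The resulting complex numbers for core and shell would then be split into real and imaginary parts to form four of the seven MieAI input features.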

Case studies

In order to validate the accuracy and computational efficiency of MieAI, we apply both MieAI and the Mie code to the outputs of three case studies with different aerosol species. In the following, we briefly explain the experiments. Table 3 summarizes the relevant aerosol characteristics in each experiment. Note that MieAI was trained exclusively on a dataset featuring shell thicknesses up to 40%, while the comparisons in all three cases include shell thicknesses beyond 40%, reaching up to 50%. This extension aims to demonstrate the generalization capability of MieAI. Additionally, the stability of the Mie code output diminishes as the coating exceeds 50%. Furthermore, we hypothesize that particles undergo a transition into an optically soluble mode beyond a coating threshold of 0.5, i.e., they are treated as particles in soluble mode instead of the mixed mode. Importantly, our focus is not to validate the model simulations of these events. Rather, we aim to evaluate the MieAI performance with real model data.

Table 3 Summary of the mixed mode properties in case studies

The first numerical experiment (case volcano) is a simulation of the La Soufrière eruption in April 2021 and was performed by ref. 63. Located on the island of St Vincent in the Caribbean, La Soufrière erupted during 09–21 April 2021 and emitted volcanic material such as ash and SO2 in 49 eruption phases. The simulation covered the initial four days of the eruption, encompassing 43 of the 49 eruption phases, starting from 09 April at 12 UTC. The simulation had a grid spacing of 13 km with 2 nested grids around the volcano with 6.6 and 3.3 km grid spacing, respectively. The model employed 90 vertical levels to resolve the atmosphere up to 75 km. The experiment accounts for aging of volcanic ash (ash coated by a sulfate-water mixture) due to aerosol dynamics. More details on this experiment are provided by ref. 63.

The second case (case wildfire) investigated the catastrophic 2019–20 Australian wildfires in Queensland, which severely affected over 7.5 million hectares and caused a decline in air quality. A 1-day simulation was performed on November 23rd in an area on the eastern coast of Australia (150E–160E, 23S–33S). The model featured a grid spacing of 6.6 km, extending vertically to 20 km with 125 levels in a limited area setting. The emission fluxes are taken from the Global Fire Assimilation System (GFAS). 25% of the particle mass is emitted in the Aitken mode and 75% in the accumulation mode. The emission height is parameterized with the plume rise model according to refs. 66,67,68,69.

The third case study (case dust) centered around a dust event over central Europe during 22–27 June 2019, involving global-scale simulations with a 40 km grid. The simulation considers comprehensive aerosol emissions (including sea salt, dust, and soot) and their dynamic processes (such as nucleation, condensation and coagulation), with simplifications made in gas-phase chemistry for operational forecasting. Similar to the wildfire case study, chemical species were reinitialized daily using CAM-Chem data.