Introduction

Sulfuric acid (H2SO4, SA) plays a crucial role in atmospheric new particle formation due to its extremely low volatility, strong hygroscopicity, and capacity for hydrogen bonding1,2,3,4. Current understanding suggests that gaseous SA participates in nucleation through binary or ternary mechanisms, interacting with water, ammonia, amines, and low-volatility organic compounds5,6,7,8. The nucleation rates are positively correlated with SA concentration, highlighting the importance of accurate SA measurements to quantify particle formation processes2,9,10. In addition to initiating nucleation, SA also contributes to aerosol growth through condensation on existing particles10,11. Therefore, a comprehensive understanding of its distribution and formation in the atmosphere is essential for a thorough investigation of new particle formation, aerosol growth, and sulfur chemistry.

While primary emissions contribute to SA in environments such as volcanoes, tunnels, or roadside areas, secondary formation dominates SA in most regions12. Precursors of SA originate from both natural and anthropogenic sources, such as ocean emissions, industrial activity, and shipping. SO2, from either dimethyl sulfide oxidation or primary emission, can be oxidized to SO3, which reacts rapidly with water to produce SA13,14,15,16. During the day, OH radicals drive this oxidation, whereas at night, stabilized Criegee Intermediates (SCIs) and, in some cases, residual OH radicals play important roles17,18. SA is removed from the atmosphere through nucleation, condensation on pre-existing particles, and deposition, with condensation generally recognized as the dominant sink17,19,20. Direct measurement of gaseous SA is technically challenging due to its low concentrations and fast formation and loss dynamics. Nitrate-based Chemical Ionization Mass Spectrometry (CIMS) is the most widely used technique for detecting atmospheric SA, which is ionized and detected as the ions HSO4−, H2SO4NO3−, and H2SO4HNO3NO3− 21,22. Calibration is typically performed using SA generated in situ via the reaction of SO2 with OH radicals, but this process remains challenging due to uncertainties in OH radical generation, wall losses, and ion-molecule reaction efficiencies. Operational complexity, instrument sensitivity, and the need for regular calibration limit the deployment of CIMS for long-term or widespread monitoring. Field studies in forest, rural, and urban areas have shown that SA concentrations peak during the day, typically ranging from 10⁶ to 10⁸ molecules/cm³ 17,20,23,24. Nighttime levels, though lower, remain relatively stable in the range of 10⁴ to 10⁶ molecules/cm³ 17,20,23,24. In exceptional cases, such as during volcanic eruptions, SA concentrations can reach up to 10⁹ molecules/cm³ due to elevated emissions of SA and SO2 25.
Human activities increase SO2 emissions and, by extension, SA formation, but higher particle concentrations also accelerate its removal, creating a non-linear relationship between SA concentration and pollution level24.

To overcome the limitations of direct measurements, proxy models have been developed to estimate gaseous SA concentrations based on its formation and loss pathways and on routinely monitored atmospheric variables, such as SO2 concentration, global solar radiation (SR) or ultraviolet radiation B (UVB), relative humidity (RH), and the condensation sink (CS)17,20,24,26. Petäjä et al.20 first proposed such proxies using data measured at the Hyytiälä SMEAR II station in Finland, exploring different parameterizations with OH radicals, UVB, and SR. Mikkonen et al.24 expanded this approach by combining data from multiple locations and periods to develop a more generalizable model. However, the applicability of these proxies to other atmospheric settings remains uncertain because of environmental differences. In particular, the relative importance of key factors, such as SO2, CS, and RH, differs across environments, highlighting the need for localized optimization. The proxy represented by Eq. 1 has been widely used in subsequent studies to localize parameters for daytime SA vapor, where a denotes an apparent coefficient, k represents the temperature-dependent reaction rate (cm³/molecule/s), and b, c, d, and e are fitted exponents.

$${SA}=a\times k\times {{SR}}^{b}\times {{{SO}}_{2}}^{c}\times {{CS}}^{d}\times {{RH}}^{e}$$
(1)
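As a minimal sketch, Eq. 1 can be evaluated directly once the coefficient and exponents have been fitted. The function name `sa_proxy` and the input values in the example call are illustrative placeholders, not values from this study; inputs are assumed to be in the same units as the data used to fit the coefficients.

```python
def sa_proxy(k, sr, so2, cs, rh, a, b, c, d, e):
    """Evaluate Eq. 1: SA = a * k * SR^b * SO2^c * CS^d * RH^e.

    k      : temperature-dependent reaction rate
    sr     : global solar radiation
    so2    : SO2 concentration
    cs     : condensation sink
    rh     : relative humidity
    a..e   : fitted apparent coefficient and exponents
    """
    return a * k * sr**b * so2**c * cs**d * rh**e


# Placeholder inputs and parameters, chosen only to show the call shape.
example = sa_proxy(k=1.0e-12, sr=500.0, so2=1.0e10, cs=0.01, rh=60.0,
                   a=1.0, b=1.0, c=1.0, d=-1.0, e=0.0)
```

Setting an exponent to zero removes the corresponding variable, which is how simpler variants without CS or RH can be expressed with the same function.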

Compared to the daytime proxies, nighttime SA proxies have received relatively little attention. Guo et al.18 emphasized the important role of SCIs in nighttime SA production in Beijing. Dada et al.17 incorporated both OH and SCIs (represented by O3 × alkenes) as oxidants, alongside CS and the nucleation sink, and established a proxy that can estimate SA formation in both daytime and nighttime in Beijing and Hyytiälä. Although nighttime SA concentrations are typically lower than daytime levels, an accurate representation of nighttime SA remains crucial for understanding the fate and atmospheric impacts of SA.

Despite a growing body of research, existing measurements are insufficient to characterize SA’s global distribution or temporal variability. SA measurements in coastal regions, where anthropogenic emissions interact with ship emissions and oceanic species, remain limited. Furthermore, few studies have explicitly examined nighttime formation mechanisms in such regions. These gaps motivated the present study, in which we conducted field measurements of gaseous SA and associated parameters at two coastal sites in Hong Kong to investigate SA characteristics in coastal boundary layers. The performance of existing proxies was evaluated, and improved models tailored to coastal conditions were then developed. Additionally, a mechanism analysis for SA was performed to identify dominant drivers and sinks. We aim to propose reliable proxies for accurately estimating SA levels in Hong Kong and similar subtropical coastal environments.

Results and discussion

SA concentration and characteristics from two field campaigns

Two field campaigns were conducted during fall and winter: one at the Cape D’Aguilar Supersite Air Quality Monitoring Station (CD supersite: 22.22 °N, 114.25 °E) in 2018 and one at the Hong Kong University of Science and Technology (HKUST) supersite (22.33 °N, 114.27 °E) in 2022. Figures 1 and S1 illustrate the overall observation results of gaseous SA and key parameters relevant to its formation. Due to instrument maintenance and availability issues, SO2 data at the HKUST supersite were unavailable at the beginning of the campaign, while SR data at the CD supersite were missing on specific days. In both campaigns, SA showed clear diurnal variations, with concentrations peaking at midday. At the CD supersite in 2018, the diurnal peak concentration was (8.0 ± 7.5) × 10⁶ molecules/cm³, while at the HKUST supersite in 2022, it was (4.2 ± 2.9) × 10⁶ molecules/cm³. Nighttime SA concentrations decreased to (6.7 ± 3.4) × 10⁵ and (2.8 ± 1.3) × 10⁵ molecules/cm³ at the respective sites. The median concentrations of SA were 2.4 × 10⁶ molecules/cm³ in 2018 and 1.3 × 10⁶ molecules/cm³ in 2022. The lower SA concentration observed at HKUST in 2022 was consistent with reduced levels of SO2, O3, and isoprene compared to the CD supersite. As shown in Fig. S2, no obvious correlation was observed between SA and CO, suggesting negligible primary emission and a predominant role of secondary formation in determining SA levels. The secondary formation of SA is expected to be determined by the concentrations of SO2, OH radicals, and SCIs, and by the value of CS. An extremely strong correlation was observed between SA and SR, due to the decisive role of SR in OH radical formation in the daytime. A positive but moderate correlation was observed between SA and SO2 (Figs. S2, S3a), suggesting that elevated SO2 leads to an increase in SA but does not determine its diurnal variation.
In contrast, SA showed poor correlations with CS and PM2.5, likely due to the narrow variability of CS and PM2.5 during the field campaigns. It should also be noted that CS and SA are dynamically coupled: an increase in SA can contribute to particle growth that subsequently enhances CS, leading to a delayed feedback effect rather than an immediate response. The seasonal similarity and comparable SR intensities between the two campaigns suggested analogous daytime OH radical levels, and similar PM2.5 concentrations contributed to comparable CS conditions. However, the average concentration of SO2, the primary precursor of SA, was about three times lower at HKUST in 2022 (0.40 ± 0.29 ppb) than at the CD supersite in 2018 (1.23 ± 0.61 ppb), which is likely the key reason for the reduced SA formation in 2022.

Fig. 1: Time series and diurnal variations of measured species.

a Time series of H2SO4, SO2, SR, and PM2.5 measured at the CD supersite in 2018 and the HKUST supersite in 2022. b Diurnal variation of H2SO4 and (c) SO2 measured in both field campaigns.

When compared with previous studies (Fig. 2), SA concentrations measured at both sites in this study fell within the typical range reported for other rural sites13,17,23,24,27. This similarity can be attributed to comparable levels of SO2, SR, and CS. Forest sites with lower SO2 levels generally exhibited lower SA concentrations, whereas in urban areas, despite elevated SO2 levels due to intensive human emissions, SA concentrations often showed comparable levels to rural and suburban locations17,18,26,28,29. This intriguing phenomenon can be explained by the concurrent higher CS values in urban environments, which accelerate SA consumption, offsetting the enhanced formation due to elevated precursor concentrations. Notable exceptions (SC in Fig. 2) include the Kilpilahti site in Finland and Maïdo observatory on Réunion island, which experienced exceptionally high SO₂ and SA concentrations due to unique local influences, such as emissions from the oil refinery at Kilpilahti, and volcanic activity at Maïdo17,25. These findings underscore the intricate interplay between SA precursor concentrations, loss rates, and environmental factors, all of which jointly determine SA concentrations across diverse atmospheric settings.

Fig. 2: Comparison of SA and SO2 concentration, SR, and CS across multiple studies.

Comparison of the results in this work with those reported in previous studies (SMEAR II: SMEAR II-station13,17,20,24, NW: Niwot Ridge24, Amazon T3: Amazon T3 site57, KSU: Kent State University23, AM: Agia Marina17, MV: Michelstadt–Vielbrunn27, Hop: Hohenpeissenberg13, Melpitz24, SPC: San Pietro Capofiume24, EML: Eagle Mountain Lake29, SMEAR III: SMEAR III-station17, SORPES: the Station for Observing Regional Processes of the Earth System in Nanjing19, Atlanta24, Budapest17, BUCT: Beijing University of Chemical Technology17,18,26,28, THU: Tsinghua University28, IUECAS: the Institute of Urban Environment, Chinese Academy of Sciences58, Kilpilahti17, Maïdo: Maïdo observatory25). Bars represent the 5th to 95th percentile range; lines and dots indicate median and mean values, respectively.

Evaluating the applicability of previously proposed proxies

The functions of typical proxies used in previous studies are summarized in Table 1, including linear and nonlinear regression-based proxies (DL1-DL3 and DNL1-DNL6) for daytime SA and proxy A1 for whole-day SA estimation, which are described in detail in the Methods section. To evaluate the potential of using proxy functions to estimate SA concentrations, we first assessed the applicability of previously established proxies from other studies (Table S1) under Hong Kong coastal atmospheric conditions. Since SR was the only available indicator for OH radicals during the two field campaigns, proxies based on UVB or directly on OH radical measurements were not included in Table S1. All of these previous proxies take the basic functional forms listed in Table 1, with parameter coefficients empirically derived from different measurement datasets to quantify the sensitivity of SA formation to key atmospheric drivers such as SR, SO2 oxidation, and the condensation sink. Among the selected proxies, P1-P6 are specifically designed for daytime SA estimation, while P7 and P8 can be used for estimating SA concentrations throughout the entire day. The comparison between the estimated and measured SA concentrations is presented in Fig. 3. Proxies P1 to P8 showed strong correlations with observed SA (Pearson r = 0.69–0.87), but variations in the slope, intercept, and relative error (RE) indicate that not all proxies are well-suited to the coastal conditions in Hong Kong.
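The evaluation metrics named above (Pearson r, regression slope and intercept, and RE) can be computed as in the following sketch. The regression treats estimated SA as the dependent variable and measured SA as the independent variable. The RE formula used here, the mean of |estimated − measured| / measured, is an assumption, since its exact definition is not given in this section, and the function name `evaluate_proxy` is illustrative.

```python
from math import sqrt


def evaluate_proxy(measured, estimated):
    """Return Pearson r, OLS slope/intercept (estimated vs. measured), and RE."""
    n = len(measured)
    if n < 2 or n != len(estimated):
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(measured) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    syy = sum((y - mean_y) ** 2 for y in estimated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, estimated))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    # Assumed RE definition: mean absolute deviation relative to the measurement.
    re = sum(abs(y - x) / x for x, y in zip(measured, estimated)) / n
    return {"r": r, "slope": slope, "intercept": intercept, "re": re}


# Synthetic example: a proxy that overestimates every measurement by a factor of 2.
stats = evaluate_proxy([1.0e6, 2.0e6, 3.0e6, 4.0e6],
                       [2.0e6, 4.0e6, 6.0e6, 8.0e6])
```

The synthetic case illustrates why r alone is not sufficient: a proxy can correlate perfectly with the measurements while its slope and RE reveal a systematic bias.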

Fig. 3: Comparison of estimated SA concentrations from previous proxies with measurements.

Comparison between SA estimated using proxies P1-P8 proposed in previous studies (Table S1) and measured SA concentrations from 8th November to 3rd December 2018 at the CD supersite.

Table 1 Proxy functions used to construct proxies in this work

Proxy P1, proposed by Petäjä et al.20, was one of the earliest attempts to estimate SA concentration. It yielded a moderate correlation (r = 0.69) but a low regression slope of 0.39, indicating its inapplicability to SA estimation in Hong Kong. Proxy P2, with the function of DNL5 proposed by Mikkonen et al.24, further considered the influence of RH on CS using multi-site data and performed better in estimating SA, with a slope of 1.07 and a relative error (RE) of 0.63. Proxies P3 and P4, from Kuerten et al.27, resemble DNL4 and DNL2, respectively. Although both proxies produced slopes close to 1 and low RE values, they tended to underestimate SA. Interestingly, the simpler P4 outperformed P3, likely because RH and CS varied little during the campaign, so including them added noise rather than explanatory power27. This highlights a key trade-off in proxy design: additional variables can improve realism but may reduce robustness when their variability is small or their measurements are uncertain.

Proxy P5, established by Größ et al.30 with a linear function of DL1, consistently overestimated SA concentrations in our study, which might be attributed to differences in CS calculation methods. In their work, CS was calculated for particles from 2 nm to 10 µm and adjusted for ambient RH30, while CS in this work was calculated for dry aerosols (14.1 to 736.5 nm), yielding smaller CS values and thus higher estimated SA. Proxy P6 further considered nucleation as an additional sink for SA and predicted SA concentrations that matched our measurements17. This agreement is likely due to similar atmospheric conditions between Agia Marina, Cyprus, where P6 was developed, and the CD supersite.

P7 and P8, with the same function as A1, were designed to simulate full-day SA concentration under boreal and megacity environments, respectively17. However, P7 with a regression slope of 0.37 and a high RE showed significant inconsistency with SA observation in Hong Kong, suggesting its inapplicability for the coastal measurements. While proxy P8 performed reasonably well during the daytime, its full-day simulation exhibited a non-linear relationship with the measurements. In summary, previously proposed proxies vary in their transferability due to site-specific dependencies of their empirical coefficients. P2 and P6 performed relatively well due to the usage of multi-site datasets covering different conditions or a similar environment to this work, but others did not translate well to Hong Kong’s unique coastal atmospheric profile. These results highlight the need for region-specific proxy development, especially for accurate nighttime estimation, where current models remain limited.

Construction of daytime proxies

Based on the proxy functions in Table 1, linear or nonlinear least-squares regressions were performed with the measurement data at the CD supersite from 8th November to 3rd December 2018 (Fig. 4), and the optimized parameters for each proxy are summarized in Table 2. The proxies were then validated against SA measurements from 4th December to 19th December 2018 at the same site (Fig. S4). Due to the lack of CS data at the HKUST supersite in 2022, only DL2, DNL2, and DNL3 were validated against the measured data there (Fig. 5). Across these evaluations, the constructed proxies showed strong correlations with measured SA (R = 0.69–0.91), low relative errors (RE = 0.42–0.97), and slopes close to 1, outperforming the previously proposed proxies.
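To illustrate how power-law parameters of this kind can be estimated, the sketch below fits SA = A × SR^b × [SO2]^c by ordinary least squares on log-transformed data. This is a simplified stand-in and not the regression procedure used in this work: it swaps the nonlinear least-squares fit for a log-linear one, folds the product a × k into a single constant A, and assumes strictly positive inputs. The function names and the synthetic data are hypothetical.

```python
from math import exp, log


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors do not vary enough")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_law(sa, sr, so2):
    """Fit ln(SA) = ln(A) + b*ln(SR) + c*ln(SO2); return (A, b, c)."""
    rows = [[1.0, log(r), log(s)] for r, s in zip(sr, so2)]
    y = [log(v) for v in sa]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(3)]
    ln_a, b, c = solve_linear(xtx, xty)
    return exp(ln_a), b, c


# Synthetic, noise-free data generated from known parameters (A=2, b=0.9, c=1.1).
sr_demo = [100.0, 200.0, 400.0, 800.0, 300.0]
so2_demo = [0.5, 1.0, 0.8, 1.5, 2.0]
sa_demo = [2.0 * r**0.9 * s**1.1 for r, s in zip(sr_demo, so2_demo)]
a_fit, b_fit, c_fit = fit_power_law(sa_demo, sr_demo, so2_demo)
```

A log-space fit weights relative rather than absolute errors, so on real data it can return parameters that differ from those of a nonlinear least-squares fit to the untransformed concentrations.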

Fig. 4: Reconstruction of SA concentrations at the CD supersite using different proxies.

Reconstructed SA concentrations from linear (DL1-DL3) and non-linear (DNL1-DNL6, DNL1-PM, DNL4-PM, DNL5-PM) proxies versus measured SA from 8th November to 3rd December 2018 at the CD supersite.

Fig. 5: Comparison of estimated SA concentrations from proxies with measurement.

Estimated SA concentrations from proxies DL2, DNL2, DNL3, DNL1-PM, DNL4-PM, and DNL5-PM versus measured values from 14th November to 20th December 2022 at the HKUST supersite.

Table 2 Obtained parameters for daytime proxies in Table 1

The fitted parameters for DL1 and DL2 in our study were consistent with previously reported ranges, while the fitted parameter for DL3 was slightly higher than that derived by Mikkonen et al.24. Among the linear proxies, the simplified DL2, which excludes CS, showed the best performance, as reflected by its higher correlation coefficient and lower RE value. This result can be attributed to the weak correlation between CS and SA during our campaign (Fig. S3b), echoing previous observations by Kuerten et al.27 and the discussion above. When CS fluctuates within a limited range, [SO2] and SR alone may suffice for estimating daytime SA levels.

The non-linear proxies DNL1-DNL5 introduce additional parameters, enhancing flexibility in modeling SA behavior. Generally, the exponents for SR (b) and [SO2] (c) were close to 1, consistent with previous studies by Mikkonen et al.24 and Kuerten et al.27. In our study, the exponent for CS (d) ranged from −0.66 to −0.87, slightly higher than the theoretical value of −1, but still within a plausible range and lower than those reported by Mikkonen et al.24 and Kuerten et al.27. DNL1, which used the same variables as DL1 but incorporated nonlinearity, performed similarly to its linear counterpart. DNL2, which excluded CS, performed better than DNL1 and the linear proxies DL2 and DL3. Although measured SA was negatively correlated with RH (Fig. S3c), likely because RH decreases after sunrise as temperatures rise, RH had minimal impact on SA estimation in DNL3 and DNL4. Specifically, the values of \({RH}^{-0.14}\) and \({RH}^{0.18}\) during the campaign ranged narrowly from 0.53 to 0.57 and from 2.1 to 2.3, respectively. This limited variation reduced the influence of RH on SA formation in these proxies. In DNL5, the exponents for CS and RH were fixed to the same value (−0.66). As shown in Fig. S5, while RH had little effect on the diurnal variation of 1/CS, combining the two as 1/(CS × RH) increased the exponent of CS from −0.86 in DNL1 to −0.66 in DNL5. Nonetheless, DNL5 still underperformed compared to DNL2 and DNL3. DNL6 further considered nucleation as an additional sink for SA17, and the fitted coefficient ‘a’, representing OH radicals as a function of SR, closely matched the value in P6 proposed by Dada et al.17. However, the value of ‘kn’ obtained in our study was nine orders of magnitude lower than that in proxy P6, suggesting a negligible contribution of the nucleation process to SA consumption under our conditions. While this observation aligns with local atmospheric behavior, further investigation is warranted due to potential uncertainties in proxy construction.
Overall, all constructed proxies (DL1-DL3, DNL1-DNL6) effectively estimated daytime SA. Judged by the values of R and RE and by the slope of the reconstructed SA versus measured SA, non-linear functions slightly outperformed linear ones, and among them, DNL2 demonstrated the highest accuracy.
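The theoretical CS exponent of −1 referred to above follows from a pseudo-steady-state balance between OH-driven production and condensational loss, which is a common starting point for proxies of this kind. A brief sketch, assuming that these are the only production and loss terms and that [OH] scales linearly with SR:

$$\frac{d\left[{SA}\right]}{dt}=k\left[{OH}\right]\left[{{SO}}_{2}\right]-{CS}\times \left[{SA}\right]\approx 0\;\Rightarrow\;\left[{SA}\right]\approx \frac{k\left[{OH}\right]\left[{{SO}}_{2}\right]}{{CS}}\propto k\times {SR}\times \left[{{SO}}_{2}\right]\times {{CS}}^{-1}$$

Under these assumptions, Eq. 1 would have b = c = 1, d = −1, and e = 0. Fitted exponents that depart from these values, such as a CS exponent between −0.66 and −0.87, absorb processes that this simple balance leaves out.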

Given the strong correlation between CS and PM2.5 (Fig. S6), we explored the use of PM2.5 as a potential surrogate for CS in conditions where direct size-resolved data are unavailable. Although CS is strongly influenced by the size distribution, our measurements showed clear positive correlations between CS and PM2.5 concentration, and between PM2.5 and the mean aerosol diameter (Fig. S6), suggesting that PM2.5 can serve as a reasonable indicator of CS variability. However, caution is warranted because using PM2.5 in this way assumes a relatively homogeneous particle composition and hygroscopicity, which may not always hold true across regions and seasons, and thus validation is recommended before broader application. Using this approach, we constructed DNL1-PM, DNL4-PM, and DNL5-PM, as shown in Table 2. In DNL1-PM, the exponent ‘d’ for PM2.5 was only −0.007, resulting in \({PM}^{d}\) values close to 1. Consequently, the parameters in DNL1-PM were nearly identical to those in DNL2. Again, the RH exponent had a minimal effect in DNL4-PM, while its addition changed the parameter ‘d’ to −0.67. Figures 4, 5, and S4 show that DNL1-PM, DNL4-PM, and DNL5-PM performed comparably to DNL1-DNL6, suggesting that replacing CS with PM2.5 can be a practical alternative for daytime SA proxies, especially in scenarios where direct CS measurements are not available.

Construction of whole-day SA proxies

To develop a proxy capable of estimating gaseous SA concentrations over both day and night, A1 was initially established by considering OH radicals as the daytime oxidant and SCIs as an oxidant throughout the day. As shown in Table 3, the fitted parameter ‘a’ was 7.7 × 10³, yielding an average product a × k of 7.3 × 10⁻⁹, which is similar to the value of 8.6 × 10⁻⁹ (P7) previously reported by Dada et al.17. However, the estimated apparent reaction rate of alkenes with O3 (3.7 × 10⁻³⁰ cm³/molecule/s) was 4–16 times lower than previously reported values (see Table S1). Consistent with proxy DNL6 above, the ‘kn’ value representing the nucleation consumption of SA indicated that nucleation played only a minor role in the SA sink. Overall, A1 still underperformed, particularly during nighttime, when SA formation was significantly underestimated (Fig. 6a). This suggests that using SCIs, represented by [O3] × [Alkenes], may not sufficiently account for nighttime SA formation in coastal Hong Kong.

Fig. 6: SA estimation and box model simulation.

a The estimated SA from proxy A1 and (b) the simulated SA from box model M0 versus measured SA from 8th November to 3rd December 2018 at the CD supersite (unit: molecule/cm3). c Simulated OH radicals (molecule/cm3) in the model M0 and M3. d Time series of estimated SA from M3-2 and observed SA. e Budget analysis of SO3 (molecule/cm3/s) in model M3-2.

Table 3 Obtained parameters for proxies designed for nighttime and full day

To investigate the reaction mechanisms of SA, an observation-based photochemical box model (PBM) built on the Master Chemical Mechanism (MCM v3.3.1) was employed to simulate SA concentrations (Figs. 6b and S7). A good linear relationship was observed between the simulated and observed SA levels; however, the model slightly underestimated the observed SA concentrations, particularly during nighttime. Since SA is generated from the reaction of SO3 with H2O, the budget analysis of SO3 can provide insights into the formation pathways of SA (Fig. S8). Notably, OH radicals were identified not only as the main contributor to daytime SA formation but also accounted for 96% of nighttime SA formation in the default model M0. This result contrasts with previous findings suggesting that SCIs should dominate nighttime SA formation17. The reduced role of SCIs in SA formation might be attributed to the incomplete mechanisms of SCIs reacting with SO2 in M0. Therefore, modifications were made to improve the model performance and further investigate the formation mechanism of SA. The default reaction rate of SCIs with SO2 is 7 × 10⁻¹⁴ cm³/molecule/s, which is lower than the values recommended by the International Union of Pure and Applied Chemistry (IUPAC) for most SCIs (https://iupac.aeris-data.fr/). Given the abundance of different SCIs in our model, the reaction rates for CH2OO, CH3CHOO, MACROO, and MVKOO reacting with SO2 were adjusted to the recommended values (M1 in Table S2). These modifications in M1 resulted in an increase in the average formation rate of SA from 4.5 × 10³ to 3.9 × 10⁴ molecule/cm³/s during nighttime, and from 7.6 × 10⁴ to 1.1 × 10⁵ molecule/cm³/s during daytime. While the simulated daytime SA from M1 still aligned well with observations, the nighttime SA was overestimated (Fig. S7b).

As reported in previous studies, H2O or (H2O)2 could be dominant sinks for some SCIs. However, the default mechanisms only included reactions between H2O and SCIs with rates of 1 × 10⁻¹⁷ cm³/molecule/s, which do not accurately reflect real atmospheric conditions31,32,33. Additionally, the reaction rates between SCIs and NO2 were set to 1 × 10⁻¹⁵ cm³/molecule/s, while the recommended reaction rates of CH2OO + NO2 and CH3CHOO + NO2 are 3 × 10⁻¹² and 2 × 10⁻¹² cm³/molecule/s, respectively34,35. Therefore, mechanisms related to the reactions of SCIs with H2O and NO2 were added or modified in the model as M2. Due to the competition of H2O and NO2 with SO2 for SCIs, the formation rate of SO3 decreased in M2. Similar to models M0 and M1, the daytime SA was well reproduced in M2, and the simulated SA in the early evening showed better consistency with observations compared to M0 and M1 (Fig. S7c). However, the nighttime SA simulated by M2 was still lower than the observations, although the discrepancy was smaller than that observed in M0. These results suggest that, in addition to SCIs, there may be another source contributing to nighttime SA formation.

It is noteworthy that the average simulated nighttime OH concentration in M0, M1, and M2 was approximately 1.4 × 10⁵ molecule/cm³ (Fig. 6c), which was three times lower than the levels observed in 2020 at the same site and season36. Nighttime OH radicals are mainly generated from the oxidation of VOCs by NO3 radicals and O3, via the resulting RO2 and HO2 radicals reacting with NO or via Criegee intermediates. Although nighttime O3 concentrations are typically low in many urban areas, high nighttime O3 concentrations (annual average of 20–30 ppbv) were observed in this region, favoring the formation of NO3 radicals and subsequent OH production. These processes are complex and are not yet fully represented in the MCM mechanism37. Moreover, a number of VOCs remain undermeasured or are not fully represented in the model. The combination of incomplete mechanisms and missing VOC reactivity potentially led to the underestimation of OH radicals. To determine whether the underestimation of nighttime SA was due to potentially missing sources of OH radicals, a direct unknown source with generation rates of 3 × 10⁶, 4 × 10⁶, and 5 × 10⁶ molecules/cm³/s was added to the model (M3, including M3-1, M3-2, and M3-3). As shown in Fig. 6c, the additional source of OH radicals resulted in average nighttime OH radical concentrations of 3.9 × 10⁵, 4.7 × 10⁵, and 5.4 × 10⁵ molecules/cm³ for M3-1, M3-2, and M3-3, respectively. The simulated SA from M3 demonstrated better consistency with observations (Figs. 6d and S9). Based on the budget analysis of M3-2, OH radicals and SCIs contributed 69% and 31% to the nighttime SA formation, respectively, while daytime SA remained predominantly influenced by OH radicals (Fig. 6e). The role of SCIs in SA formation was more important during the evening, which aligns with the findings of Guo et al.18. These results emphasize that not only SCIs but also OH radicals played important roles in nighttime SA formation in Hong Kong.
Due to the challenges associated with directly measuring OH radicals, particularly at low levels during nighttime, identifying reliable surrogate indicators remains essential for practical applications.

However, according to box model simulations, nighttime OH concentrations showed limited variation (Figs. 6c and S10). Observation data from Zou et al.36 also show no significant fluctuations during the night. This stability suggests that a single representative value may be sufficient to represent nighttime OH radicals. To explore this, we evaluated several potential indicators, including benzene, toluene, and NOx, to estimate nighttime OH radicals in proxies N (Table 1) and N-PM (see SI). Regardless of the indicator used, the fitted parameter g, representing the initial OH concentration in the evening, remained consistent at 5.9 × 10⁵ molecule/cm³ (Table S3), while the extremely low values of \({a}^{\prime}\times {\left[{In}.\right]}^{{b}^{\prime}}\) from different indicators had minimal impact on the OH radical estimation. The 25th, 50th, and 75th percentiles of observed SA during the same period were 5.6 × 10⁵, 7.1 × 10⁵, and 8.6 × 10⁵ molecule/cm³, respectively, reflecting low variation in nighttime SA. These results suggest a relatively stable nighttime chemical environment in the coastal background environment36. The fitted OH concentrations were slightly higher than the values obtained in model M3-2, which could be partly explained by the fitted CS exponent in proxy N being higher than −1. The final expressions for proxies N and N-PM can then be obtained. Figure 7a, b compare the estimated nighttime SA from proxies N and N-PM with observed values, both showing strong agreement, supporting the effectiveness of using a constant nighttime OH value in proxy development.

Fig. 7: Comparison between estimation and measurement.

The estimated SA from the proxy of N (a), N-PM (b), A2 (c), and A2-PM (d) versus measured SA from 8th November to 3rd December 2018 at the CD supersite.

Based on these insights, we proposed a piecewise full-day proxy, A2, which uses proxy N for nighttime SA and proxy DNL1 for daytime SA (Table 1). To maintain consistency, the exponents for CS and SO2 were held constant across both the day and night components. Regression results showed an OH radical concentration of 5.7 × 10⁵ molecule/cm³, with parameter values for CS and the alkene oxidation term similar to those in proxy N. Parameters for the daytime function (a = 4.0 × 10⁴ and b = 0.87) also aligned well with those of the earlier non-linear functions in Table 2. In addition, proxy A2-PM, constructed with PM2.5 concentration, was also obtained (Table 3). Figure 7c, d show strong linear correlations between the estimated SA from both proxies and the observed SA, validating their accuracy.

The constructed proxies N, N-PM, A2, and A2-PM were further validated using data collected from 4th to 19th December 2018 at the CD supersite and from 14th November to 19th December 2022 at the HKUST supersite. As shown in Fig. S11, all proxies maintained good correlations with observed SA, supporting their robustness and applicability across different locations and timeframes. The results underscore the significance of OH radicals in nighttime SA formation, with average concentrations between 4 × 10⁵ and 6 × 10⁵ molecules/cm³. Despite the challenges in developing reliable nighttime SA proxies, N and N-PM effectively reproduced nighttime SA dynamics. The piecewise proxies A2 and A2-PM offer practical solutions for estimating SA throughout the entire day and show promise for broader application in Hong Kong and similar coastal environments. Further research is needed to better understand nighttime SA formation mechanisms and to refine indicators for OH radicals, which remain critical to improving proxy accuracy and usability.

Atmospheric implication

This study provides a comprehensive analysis of SA formation in Hong Kong, combining field measurements, box model simulations, and statistical proxy development. A series of localized linear and nonlinear daytime proxies tailored for subtropical coastal environments were constructed, with DNL2 (incorporating SO2 and SR) demonstrating the best performance. In addition, for cases where CS data are unavailable or uncertain, PM2.5 was found to be a practical substitute, enhancing the usability of the proxies for long-term or resource-limited monitoring efforts. Based on the important role of nighttime OH radicals in SA formation, new nighttime proxies (N and N-PM) and full-day piecewise proxies (A2 and A2-PM) were developed.

$${A}_{2}=\left\{\begin{array}{rcl}\left[{SA}\right] &=& 4.0\times {10}^{4}\times k\times {{SR}}^{0.87}\times \left[{{SO}}_{2}\right]\times {{CS}}^{-0.8}\,(D)\\ \left[{SA}\right] & = & \left(k\times 5.7\times {10}^{5}+1.7\times {10}^{-30}\times \left[{O}_{3}\right]\times \left[{Alkene}\right]\right)\times \left[{{SO}}_{2}\right]\times {{CS}}^{-0.8}\,\left(N\right)\end{array}\right.$$
$${A}_{2-{\rm{PM}}}=\left\{\begin{array}{rcl}\left[{SA}\right] &=& 5.4\times {10}^{6}\times k\times {{SR}}^{0.91}\times \left[{{SO}}_{2}\right]\times {{PM}}^{-0.58}\,(D)\\ \left[{SA}\right] & = & \left(k\times 1.0\times {10}^{8}+1.6\times {10}^{-28}\times \left[{O}_{3}\right]\times \left[{Alkene}\right]\right)\times \left[{{SO}}_{2}\right]\times {{PM}}^{-0.58}\,\left(N\right)\end{array}\right.$$
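
As an illustration of how the piecewise proxy A2 above could be evaluated, the following minimal Python sketch applies the daytime branch (D) or the nighttime branch (N) with the fitted coefficients. The function name `proxy_a2`, the explicit `daytime` flag, and the input values are hypothetical. The sketch also assumes that all inputs are supplied in the same units as those used for the regression (here taken to be molecules/cm³ for the gases and s⁻¹ for CS), which should be checked against Table 1 before use.

```python
def proxy_a2(k, so2, cs, sr=0.0, o3=0.0, alkene=0.0, daytime=True):
    """Evaluate the piecewise proxy A2 with the fitted coefficients.

    k       : OH + SO2 rate coefficient (cm3/molecule/s), e.g. from Eq. 4
    so2     : SO2 concentration (assumed molecules/cm3)
    cs      : condensation sink (s-1)
    sr      : solar radiation (same units as in the fit; an assumption)
    o3, alkene : concentrations (assumed molecules/cm3)
    daytime : how day and night are separated is left to the caller
    """
    if daytime:
        # Daytime branch (D): SR stands in for OH
        return 4.0e4 * k * sr ** 0.87 * so2 * cs ** -0.8
    # Nighttime branch (N): constant OH term plus the O3 x alkene (SCI) term
    return (k * 5.7e5 + 1.7e-30 * o3 * alkene) * so2 * cs ** -0.8


# Hypothetical inputs, for illustration only
k_example = 1.0e-12
so2_example = 2.46e10
cs_example = 0.02
day = proxy_a2(k_example, so2_example, cs_example, sr=500.0, daytime=True)
night = proxy_a2(k_example, so2_example, cs_example, daytime=False)
print(f"day: {day:.2e}, night: {night:.2e}")
```

Proxy A2-PM has the same structure, with PM2.5 replacing CS and the corresponding coefficients and exponents from the second expression.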

Although the current work focuses on the autumn–winter transition period, the underlying physical and chemical relationships embedded in the proxies remain valid across seasons. The observed air mass types and meteorological conditions were representative of typical regional patterns, but the influence of precursor levels should be evaluated across seasons to confirm robustness. While the proxies are optimized for Hong Kong’s coastal subtropical conditions, their structure and variable dependence suggest transferability to other maritime or coastal atmospheres with comparable chemical and meteorological regimes. The ranges of key parameters in Fig. 2 can serve as diagnostic indicators to assess whether these proxies are suitable for use in other locations. Overall, this study offers several robust and practical proxy models for estimating SA concentrations and provides a framework for extending SA analysis to regions and periods lacking direct measurements. Further research is still needed to better understand the role of nucleation in the SA sink and the mechanisms underlying nighttime OH radical and SA production, and thereby to develop a complete understanding of the fate and impacts of SA in the atmosphere.

Methods

Field measurements

Two field campaigns were conducted to measure atmospheric SA concentration in Hong Kong during fall and winter. The first campaign took place at the CD supersite from 1st November to 19th December 2018, and the second campaign was held at the HKUST supersite over the same period in 2022. The CD supersite, located at the southern tip of Hong Kong Island and facing the South China Sea, serves as a regional background station38. It is surrounded by vegetation and situated about 10 km from the nearest urban center. The site is influenced by local biogenic emissions, marine air masses from the South China Sea, and long-range transport from the eastern coastal area of south China and the Pearl River Delta (PRD) region. Previous back-trajectory analyses have shown that the site alternates between clean marine conditions and polluted continental outflow39,40. The HKUST supersite is located on the university campus, positioned on a coastal cliff facing Port Shelter and Silver Strand Bay. This site represents a typical suburban environment with minimal nearby residential or commercial activity. It experiences regional air mass influences similar to those at the CD supersite. Unlike the CD supersite, the HKUST site is also affected by on-campus human activities and construction41.

Gaseous SA concentrations were measured by a nitrate-based Time-of-Flight Chemical Ionization Mass Spectrometer (ToF-CIMS, Aerodyne Inc., USA) in both campaigns. An HR version and a long version of the ToF were employed at the CD supersite and HKUST supersite, respectively39, with higher mass resolution for the latter (9000 for m/z > 200) than for the former (5200 for m/z > 200). Both resolutions were sufficient for the precise identification of the targeted ions. Both instruments share the same ionization and detection principles. Primary ions, (HNO3)nNO3- (n = 0,1,2), were generated by exposing a mixture of an HNO3 stream (5–8 mL/min) and sheath gas (20–30 L/min) to soft X-ray radiation. Ambient air was pumped through a sample inlet at a flow of 10 L/min into the center of a coaxial laminar flow reactor, where SA was ionized by (HNO3)nNO3- to form HSO4-, H2SO4NO3-, and H2SO4HNO3NO3-, which were then detected by the ToF mass spectrometer. Instrument calibration was performed onsite using a custom-built SA calibrator before and at the end of each field campaign42,43. The calibration methodology is described in detail in our previous studies39,40. Briefly, a known concentration of SA was produced by reacting SO2 with a defined amount of OH radicals, which were generated via the photolysis of H2O under irradiation by a UV lamp (185 nm). The calibration factor of SA (\(\gamma\)) in Eq. 2 was 9.88 × 10⁹ and 6.14 × 10⁹ molecules cm⁻³ in the CD and HKUST campaigns, respectively. The limit of detection, calculated as three times the standard deviation of the zero signal, was 1.2 × 10⁵ and 3.1 × 10⁴ molecules cm⁻³ in the CD and HKUST campaigns, respectively. Zero air was injected periodically, and the background signal was subtracted before the calculation.

$$\left[\mathrm{H_2SO_4}\right]=\gamma \times \ln\left(1+\frac{\mathrm{HSO_4^-}+\mathrm{H_2SO_4NO_3^-}+\mathrm{H_2SO_4HNO_3NO_3^-}}{\sum_{j=0}^{2}\mathrm{NO_3^-}\cdot\left(\mathrm{HNO_3}\right)_{j}}\right)$$
(2)
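
A minimal sketch of how Eq. 2 converts ion signals into an SA concentration is given below. The function name `sa_concentration` and the count rates are hypothetical and serve only to illustrate the arithmetic. The calibration factor is the value reported for the CD campaign.

```python
import math


def sa_concentration(hso4, sa_no3, sa_hno3_no3, reagent_ions, gamma):
    """Eq. 2: gamma * ln(1 + sum of SA product ions / sum of reagent ions).

    hso4, sa_no3, sa_hno3_no3 : background-subtracted signals of HSO4-,
        H2SO4NO3-, and H2SO4HNO3NO3-
    reagent_ions : signals of NO3-(HNO3)j for j = 0, 1, 2
    gamma        : calibration factor (molecules cm-3)
    """
    sa_signal = hso4 + sa_no3 + sa_hno3_no3
    reagent_signal = sum(reagent_ions)
    # log1p(x) evaluates ln(1 + x) accurately for small x
    return gamma * math.log1p(sa_signal / reagent_signal)


# Hypothetical count rates (counts/s), for illustration only
gamma_cd = 9.88e9
example = sa_concentration(30.0, 15.0, 5.0, [2.0e4, 6.0e4, 2.0e4], gamma_cd)
print(f"[H2SO4] = {example:.2e} molecules cm-3")
```

Because the SA ion signals are usually much smaller than the reagent ion signals, the logarithm is close to the simple ratio, so the result is nearly proportional to the normalized SA signal.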

Volatile organic compounds (VOCs) were measured by a Proton Transfer Reaction ToF-MS (PTR-QiToF-MS, Ionicon Inc., Austria)44,45 and online gas chromatography (Syntech Spectras GC955 Series 600/800, The Netherlands) at the CD supersite, and by a VOCUS-PTR-ToF-MS (VOCUS 2 R, Aerodyne, Inc., USA) at the HKUST supersite. Both PTR-ToF-MS instruments were periodically calibrated using multi-component VOC gas standards, which included 27 and 15 VOC species (covering key VOCs such as benzene, toluene, and isoprene) for the campaigns in 2018 (RESTEK canister, IONICON Analytik, Austria) and in 2022 (Apel-Riemer Environmental, Inc., USA), respectively45. Concentrations were retrieved using the measured sensitivities for calibrated species and sensitivities estimated theoretically from proton-transfer reaction rate coefficients (kPTR) for uncalibrated species41. Trace gases, including NO, NO2, O3, SO2, and CO (42i, 49i, 43i, and 48i, Thermo Fisher Scientific Inc., USA), and meteorological parameters (wind, temperature, RH, and SR) were also measured concurrently at both sites. PM2.5 mass concentration was measured at the CD supersite by a SHARP monitor (model 5030, Thermo Scientific Inc., USA), and the particle number size distribution from 14.1 to 736.5 nm was measured using a Scanning Mobility Particle Sizer spectrometer (SMPS 3938, TSI, USA). Particle size and number concentration data were not available at the HKUST supersite during the campaign. Instead, PM2.5 concentrations were obtained from the nearby Tap Mun station, which previous studies have shown to correlate closely with HKUST measurements under similar environmental conditions46.

Box model simulation

An observation-based photochemical box model (PBM) built on the Master Chemical Mechanism (MCM v3.3.1) (http://mcm.york.ac.uk)47,48,49,50 was employed to simulate the formation of gaseous SA and OH radicals at the CD supersite from November 8 to December 3, 2018. The PBM simulates atmospheric chemical processes of inorganic and organic species in a zero-dimensional framework. Concentrations of trace gases and VOCs, together with meteorological parameters, are used as time-resolved constraints, while the model integrates the coupled gas-phase reactions to reproduce the evolution of the target species. This approach has been successfully used in our previous studies39,51,52. In addition to the default reaction mechanisms for SA formation in the MCM, a condensation sink (CS) term for SA was included in the baseline case (M0, Table S2). Further modifications (M1, M2, and M3 in Table S2) were made to improve the model performance and investigate the gaseous SA formation mechanisms, including adjusted reaction rates for SCIs with SO2 based on experimental results (M1) and additional reactions with H2O and NO2 (M2). An additional nighttime OH source (with a generation rate of 3 × 10⁶ to 5 × 10⁶ molecules/cm³/s) was also added in M3 to explore nighttime SA formation. Measured SO2, VOC species, other trace gases, CS, and meteorological parameters were input into the model every 10 min to constrain the simulations. The CS values ranged from 0.005 s⁻¹ to 0.07 s⁻¹, with an average value of 0.02 ± 0.01 s⁻¹. Details on the CS calculation are provided in the next section. Further information on the model framework and input parameters can be found in our previous study, which used the same model setup to investigate the formation and sink of nitro-phenolic compounds during the same field campaign39.

Proxy construction

As mentioned above, atmospheric SA is primarily formed through the oxidation of SO2 by OH radical and SCIs, while its removal is controlled by the CS and nucleation processes. The changes in gaseous SA concentration can be expressed as Eq. 3:

$$\frac{d\left[{SA}\right]}{dt}=k\times \left[{OH}\right]\times \left[{{SO}}_{2}\right]+{k}_{{SCIs}}\times \left[{SCIs}\right]\times \left[{{SO}}_{2}\right]-{CS}\times \left[{SA}\right]-{k}_{n}\times {\left[{SA}\right]}^{2}$$
(3)
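
To illustrate the balance between the source and sink terms in Eq. 3, the sketch below integrates the net rate of change of [SA] (production minus loss) forward in time with a simple explicit Euler step. The function name `integrate_sa`, the production rate, and the time step are hypothetical choices for illustration. This is not the box model used in this study.

```python
def integrate_sa(production, cs, k_n=0.0, sa0=0.0, dt=1.0, steps=600):
    """Explicit Euler integration of the SA budget.

    production : total source term, k[OH][SO2] + k_SCIs[SCIs][SO2]
                 (molecules/cm3/s)
    cs         : condensation sink (s-1)
    k_n        : apparent nucleation loss coefficient (cm3/molecule/s)
    dt, steps  : time step (s) and number of steps
    """
    sa = sa0
    for _ in range(steps):
        dsa_dt = production - cs * sa - k_n * sa * sa
        sa += dsa_dt * dt
    return sa


# Hypothetical values: k = 1e-12 cm3/molecule/s, [OH] = 1e6 and
# [SO2] = 2.46e10 molecules/cm3, CS = 0.02 s-1, nucleation neglected
production = 1.0e-12 * 1.0e6 * 2.46e10
cs = 0.02
sa_end = integrate_sa(production, cs)   # after 600 s
sa_ss = production / cs                 # steady-state value without nucleation
print(f"after 10 min: {sa_end:.3e}, steady state: {sa_ss:.3e}")
```

With a CS of about 0.02 s⁻¹, the relaxation time 1/CS is about 50 s, so [SA] approaches its steady-state value well within one 10-min averaging interval.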

Here, [SA], [OH], [SO2], and [SCIs] are the concentrations of SA, OH radicals, SO2, and SCIs, respectively, in units of molecules/cm³. k and \({k}_{{SCIs}}\) denote the reaction rates of SO2 with OH radicals and SCIs, and \({k}_{n}\) represents the apparent consumption rate of SA via nucleation. The average reaction rate \({k}_{{SCIs}}\) reflects the contribution of multiple types of SCIs present in the atmosphere. The temperature-dependent reaction rate k (cm³/molecule/s) can be calculated from Eq. 4 and Eq. 5 (ref. 24), with the constants k1 = 4 × 10⁻³¹, k2 = 3.3, k3 = 2 × 10⁻¹², k4 = −0.8, and [M] = 0.101 × (1.381 × 10⁻²³ × T)⁻¹, where [M] is the number density of air molecules (molecules/cm³) and T is the temperature (K):

$$k=\frac{A\times {k}_{3}}{A+{k}_{3}}\times \exp \left({k}_{4}\times {\left[1+{\left({\log }_{10}\frac{A}{{k}_{3}}\right)}^{2}\right]}^{-1}\right)$$
(4)
$$A={k}_{1}\cdot [M]\cdot {(300/T)}^{{k}_{2}}$$
(5)
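
Eq. 4 and Eq. 5 can be evaluated directly, as in the short sketch below, which uses the constants listed above. The function names are hypothetical, and the squared term is interpreted as the square of the base-10 logarithm, as in the standard fall-off expression for termolecular reactions.

```python
import math

# Constants as listed in the text
K1, K2, K3, K4 = 4e-31, 3.3, 2e-12, -0.8
K_B = 1.381e-23  # Boltzmann constant (J/K)


def air_number_density(temp_k):
    """[M] = 0.101 / (k_B * T), in molecules/cm3, with T in K."""
    return 0.101 / (K_B * temp_k)


def k_oh_so2(temp_k):
    """Temperature-dependent OH + SO2 rate coefficient (cm3/molecule/s)."""
    a = K1 * air_number_density(temp_k) * (300.0 / temp_k) ** K2   # Eq. 5
    falloff = math.exp(K4 / (1.0 + math.log10(a / K3) ** 2))
    return a * K3 / (a + K3) * falloff                             # Eq. 4


for temp in (288.0, 298.0):
    print(f"T = {temp:.0f} K: k = {k_oh_so2(temp):.2e} cm3/molecule/s")
```

At near-surface temperatures this gives a rate coefficient on the order of 10⁻¹² cm³/molecule/s.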

CS was calculated from Eq. 6, where D is the diffusion coefficient of gaseous SA, \({r}_{i}\) is the geometric mean particle radius in size bin i, and \({N}_{i}\) is the particle number concentration in that bin53. The transitional correction factor, \({\beta }_{{M}_{i}}\), is given by Eq. 7. The Knudsen number (Kn) is defined as \(\frac{{\lambda }_{v}}{{r}_{i}}\), with \({\lambda }_{v}\) being the mean free path. The mass accommodation coefficient (α) is typically assumed to be 1 (ref. 53):

$${CS}=4\pi D\mathop{\sum }\limits_{i}{\beta }_{{M}_{i}}{r}_{i}{N}_{i}$$
(6)
$${\beta }_{M}=\frac{{K}_{n}+1}{0.377{K}_{n}+1+\frac{4}{3}{\alpha }^{-1}{{K}_{n}}^{2}+\frac{4}{3}{\alpha }^{-1}{K}_{n}}$$
(7)
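
The CS calculation in Eq. 6 and Eq. 7 can be sketched as follows. The diffusion coefficient (0.09 cm²/s) and mean free path (100 nm) used as defaults are illustrative assumptions, not values taken from this study, and the three-bin size distribution is hypothetical.

```python
import math


def fuchs_sutugin(kn, alpha=1.0):
    """Eq. 7: transitional correction factor for Knudsen number kn."""
    return (kn + 1.0) / (
        0.377 * kn + 1.0 + (4.0 / (3.0 * alpha)) * (kn ** 2 + kn)
    )


def condensation_sink(radii_nm, number_cm3, diff_cm2_s=0.09,
                      mfp_nm=100.0, alpha=1.0):
    """Eq. 6: CS = 4 * pi * D * sum(beta_i * r_i * N_i), in s-1.

    radii_nm   : geometric mean radius of each size bin (nm)
    number_cm3 : particle number concentration in each bin (cm-3)
    diff_cm2_s : diffusion coefficient of SA vapour (assumed value)
    mfp_nm     : mean free path of SA vapour (assumed value)
    """
    total = 0.0
    for r_nm, n in zip(radii_nm, number_cm3):
        kn = mfp_nm / r_nm          # Knudsen number, lambda_v / r_i
        r_cm = r_nm * 1e-7          # nm -> cm
        total += fuchs_sutugin(kn, alpha) * r_cm * n
    return 4.0 * math.pi * diff_cm2_s * total


# Hypothetical three-bin size distribution, for illustration only
cs_example = condensation_sink([10.0, 50.0, 150.0], [3000.0, 2000.0, 300.0])
print(f"CS = {cs_example:.2e} s-1")
```

The correction factor approaches 1 for large particles (Kn → 0) and 3α/(4Kn) for very small particles (Kn → ∞), so larger particles contribute disproportionately to CS per particle.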

To simplify the system, the formation and consumption of gaseous SA were assumed to be in a steady state20. Under this assumption, Eq. 3 can be solved as Eq. 8. If nucleation is neglected, the simplified form becomes Eq. 9. Given that OH oxidation is the dominant SA source during the daytime, SCIs can be neglected, and Eq. 8 simplifies to Eq. 10. Neglecting both nucleation, which is a valid assumption on most non-NPF days, and the relatively minor daytime contribution of SCIs yields the simplest and most widely used daytime SA proxy20,24,27,30, shown in Eq. 11:

$$[{SA}]=-\frac{{CS}}{2\times {k}_{n}}+{\left[{\left(\frac{CS}{2\times {k}_{n}}\right)}^{2}+\frac{\left[S{O}_{2}\right]}{{k}_{n}}\times \left(k\times [OH]+{k}_{SCIs}\times \left[SCIs\right]\right)\right]}^{1/2}$$
(8)
$$\left[{SA}\right]=(k\times \left[{OH}\right]+{k}_{{SCIs}}\times \left[{SCIs}\right])\times \left[{{SO}}_{2}\right]\times {{CS}}^{-1}$$
(9)
$$[{SA}]=-\frac{{CS}}{2\times {k}_{n}}+{\left[{\left(\frac{CS}{2\times {k}_{n}}\right)}^{2}+\frac{\left[S{O}_{2}\right]}{{k}_{n}}\times (k\times [OH])\right]}^{1/2}$$
(10)
$$\left[{SA}\right]=k\times \left[{OH}\right]\times \left[{{SO}}_{2}\right]\times {{CS}}^{-1}$$
(11)
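
The relationship between the steady-state expressions can be checked numerically, as in the sketch below. Eq. 8 is written in the algebraically equivalent form [SA] = 2P / (CS + (CS² + 4 k_n P)^{1/2}), where P is the total production term. This form avoids subtracting two nearly equal numbers when k_n is small and reduces to Eq. 9 when k_n = 0. The function names and all input values are hypothetical.

```python
import math


def sa_eq9(k, oh, k_sci, sci, so2, cs):
    """Eq. 9: steady state without nucleation."""
    return (k * oh + k_sci * sci) * so2 / cs


def sa_eq8(k, oh, k_sci, sci, so2, cs, k_n):
    """Eq. 8 in a numerically stable, algebraically equivalent form."""
    production = (k * oh + k_sci * sci) * so2
    return 2.0 * production / (
        cs + math.sqrt(cs ** 2 + 4.0 * k_n * production)
    )


# Hypothetical inputs, for illustration only (SCI term set to zero)
args = dict(k=1.0e-12, oh=1.0e6, k_sci=0.0, sci=0.0, so2=2.46e10, cs=0.02)
without_nucleation = sa_eq9(**args)
with_nucleation = sa_eq8(k_n=1.0e-9, **args)
print(f"Eq. 9: {without_nucleation:.3e}, Eq. 8: {with_nucleation:.3e}")
```

Setting the SCI term to zero in these functions gives Eq. 10 and Eq. 11, respectively.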

Direct measurements of OH radicals and SCIs remain technically challenging and less practical than SA measurements. However, previous studies indicate that daytime OH radical levels correlate strongly with SR, particularly UVB, because OH is formed via O3 and HONO photolysis and the reaction of HO2 with NO (refs. 26,30,54,55). Since only SR was measured during the two field campaigns, SR was used as an indicator for daytime OH concentrations in the present study. SCIs, mainly generated from the ozonolysis of alkenes56, were represented using the product of O3 and alkene concentrations ([O3] × [Alkene])17,20,24,26,27,30. At both the CD and HKUST sites, isoprene was the dominant alkene species observed during the campaigns, with average concentrations of 0.81 ± 0.48 ppb and 0.28 ± 0.21 ppb, respectively, much higher than those of monoterpenes. Moreover, isoprene reacts rapidly with O3 and produces SCIs with high yield, making it a reasonable proxy variable for alkene-derived SCIs under subtropical coastal conditions41,45.

A series of gaseous SA proxies can then be developed based on Eq. 1, Eq. 8 to Eq. 11, and the indicators for OH radicals and SCIs. For the daytime, a general proxy expressed as Eq. 1 can be used. Three linear regression-based proxies (DL1-DL3) can be obtained based on Eq. 1 and Eq. 11, as summarized in Table 1. In these equations, k is derived from Eq. 4. Proxy DL1 is based on the fundamental theory and assumes a linear relationship between SR and OH. DL2 is a simplified proxy that excludes CS, based on the finding of Kuerten et al.27 that a proxy without CS still performed adequately when CS variability was small. DL3 incorporates RH, as suggested by Mikkonen et al.24, who noted its influence on aerosol distribution and CS. To account for non-equilibrium conditions and for other pollutants that influence or compete in the SO2 + OH reaction24, nonlinear proxies with individual exponents for the proxy variables (DNL1-DNL5) were also derived and tested. In addition, considering nucleation as a sink of SA, DNL6 was derived from Eq. 10 (ref. 17).

The CS calculation requires continuous SMPS measurements; however, such instrumentation is not available at many air quality monitoring stations, and long-term data are often limited. This limitation reduces the practical applicability of CS-based proxies. Although CS is strongly affected by the particle size distribution, it correlated positively with PM2.5 concentration in our measurements (Fig. S6). Therefore, alternative proxies (DNL1-PM, DNL4-PM, and DNL5-PM) were constructed by substituting PM2.5 for CS in the corresponding nonlinear proxies, i.e., DNL1, DNL4, and DNL5. PM2.5 data were collected with a time resolution of 2 min and subsequently averaged to 10 min to match the proxy construction interval.

In addition, to simulate SA concentrations through the entire day, we also applied a function derived from Eq. 8, designated as proxy A1 (ref. 17). However, regression results indicated that A1 was insufficient for capturing SA variability over a full day, particularly during nighttime periods. This limitation is explored further in the results section. The inadequacy of A1 stems from its exclusive consideration of SCIs as nighttime contributors to SA formation. Several studies have shown that OH radicals, typically associated with daytime photochemistry, may also be generated during the night through the oxidation of VOCs by NO3 radicals and O3, resulting in non-negligible nighttime OH levels. For example, Zou et al.36 reported an average nighttime OH concentration of (5.1 ± 1.8) × 10⁵ molecules cm⁻³ during the 2020 autumn field campaign at the CD supersite. Similar findings from Myers et al.57 highlight the significance of nighttime OH in SA formation in the Amazon forest.

Given these insights, we attempted to identify suitable indicators of nighttime OH levels that could be used to construct a nighttime proxy for SA; more detailed discussions are provided in the SI. By replacing [OH] in Eq. 9 with the empirical expression \({a}^{\prime}\times {\left[{In}.\right]}^{{b}^{\prime}}+g\), proxy N can be derived for nighttime SA estimation, in which [In.] is the concentration of the selected chemical indicator, including benzene, toluene, and NOx (molecules/cm³). To provide a complete daily representation of SA formation, we developed a piecewise function, designated as A2, which combines the daytime proxy DNL1 with c = 1 and the nighttime proxy N. Setting c to 1 keeps the exponents of the variables shared by the daytime and nighttime parts of A2 identical. Additionally, PM was tested as an alternative to CS in constructing parallel versions of the nighttime and whole-day proxies, referred to as N-PM and A2-PM, respectively.

To assess the general applicability of these proxies, we divided the collected dataset into two parts: one for proxy fitting and derivation, and the other for validation. The proxy parameters were derived using data collected at the CD supersite from 8th November to 3rd December 2018, and validation was performed using data collected at the same location from 4th to 19th December 2018. To test transferability across environments, additional verification was conducted using data collected at the HKUST supersite from 14th November to 19th December 2022. Proxy performance was assessed by comparing the reconstructed and measured SA concentrations, [SA]s and [SA]m. The relative error (RE) was calculated by Eq. 12, where Num. is the total number of SA samples19.

$${RE}=\frac{1}{{Num}.}\sum \frac{|{\left[{SA}\right]}_{s}-{\left[{SA}\right]}_{m}|}{{\left[{SA}\right]}_{m}}$$
(12)
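
Eq. 12 is the mean absolute deviation of the reconstructed SA from the measured SA, normalized by the measured value. A minimal sketch is given below. The function name `relative_error` and the sample values are hypothetical.

```python
def relative_error(simulated, measured):
    """Eq. 12: mean of |[SA]s - [SA]m| / [SA]m over all samples."""
    if len(simulated) != len(measured):
        raise ValueError("simulated and measured must have the same length")
    if not measured:
        raise ValueError("at least one sample is required")
    total = sum(abs(s - m) / m for s, m in zip(simulated, measured))
    return total / len(measured)


# Hypothetical sample values (molecules/cm3), for illustration only
re_example = relative_error([1.1e6, 0.8e6], [1.0e6, 1.0e6])
print(f"RE = {re_example:.2f}")
```

Because each deviation is divided by the measured concentration, samples with low measured SA (for example at night) can weigh heavily in RE.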