Introduction

Fine particulate matter, i.e., PM2.5 (particulate matter with an aerodynamic diameter less than or equal to 2.5 µm), remains one of the air pollutants of greatest interest to the scientific community. PM2.5 can be emitted directly by multiple primary sources or produced secondarily through complex gas-to-particle conversion processes, and during transport its chemical composition can change substantially with the meteorological conditions it encounters1,2,3,4,5,6,7,8. One PM2.5 component of great relevance because of its health impacts and climate forcing is black carbon (BC)9,10,11,12,13,14,15,16. BC is found predominantly in the fine fraction of PM, and it originates mainly from combustion processes17,18,19. Thus, assessing the relationship between BC and PM2.5 has become more relevant than assessing its relationship with the coarser fraction (i.e., PM10, particulate matter with an aerodynamic diameter less than or equal to 10 µm)20.

Current air quality mitigation efforts focus not only on the mass concentration of PM2.5 but also on its composition. Several studies have reported BC levels in urban centers across the globe21,22,23,24,25,26,27,28,29, and BC has become an area of increasing interest. To address long-term variations of BC, most studies now rely on instruments that determine BC concentrations in near real time using optical techniques27,30,31,32,33,34, in contrast to methods in which fine particulate matter is collected on filters, typically over 24-hour intervals, and subsequently analyzed in the laboratory to determine BC and other chemical species of interest35,36. BC measurements with fine temporal resolution can be related to the other gas-phase species monitored by routine air quality stations and to the mass concentration of PM2.5 or even PM10 (PM2.5-to-PM10 ratios and other PM10-related indicators can provide further information on source contributions), thus offering more information about the possible origin of BC37,38. To carry out a preliminary source contribution analysis of ambient BC levels, several studies have used aethalometers32,39,40,41,42. These instruments are frequently used to quantify BC emitted directly by sources, ambient BC levels, or the contributions of fossil fuel combustion and biomass burning to the BC levels observed at a receptor site. Aethalometers relate the optical attenuation of light by particles deposited on a filter to BC mass concentration through wavelength-dependent mass-specific attenuation cross-section values43,44. Near real-time reporting of BC mass concentration is achieved with a filter tape that advances the collection spot at a predetermined frequency.
Aethalometers that measure at multiple wavelengths are particularly useful. BC absorption dominates at 880 nm, but attenuation readings at other wavelengths have enabled attempts to assess the presence of brown carbon (BrC) in the sampled particulate matter34. However, estimating BrC content from multi-wavelength attenuation is not straightforward: it involves assumptions about the Absorption Ångström Exponent (AAE) and source-specific optical properties, which can vary significantly with the type of fuel burned and with atmospheric aging. Multi-wavelength measurements have also enabled algorithms that attempt to apportion observed BC levels among sources, typically distinguishing between fossil fuel combustion and biomass burning. Although both are combustion (i.e., pyrogenic) sources, biomass burning is often associated with anthropogenic activities such as residential cooking or open waste burning21,41.

Despite the considerable number of studies that have used the “aethalometer model”41 to estimate the contributions of fossil fuel combustion and biomass burning to observed BC levels, uncertainties remain in applying this methodology, particularly regarding the assumed AAE for biomass burning. For example, Healy et al.23 found that using a typical biomass burning AAE led to a large underestimation of the biomass burning contribution during wildfire events, because this parameter spans a broad range of values depending on the biomass being burned. Similarly, Ravi Kiran et al.40 recognized that assigning a value to the biomass burning AAE is challenging and that the inherent uncertainties can cloud attribution estimates; they therefore proposed an empirical method to estimate this model parameter from light attenuation at other wavelengths. Thus, robust source attribution approaches are needed that retain the advantages of aethalometers: relatively low cost and high-frequency, near real-time measurements.

Here, we propose complementing the “aethalometer model” with ambient BC and CO data to explore possible values of the biomass burning AAE for source contribution estimates. CO and BC are both emitted mainly by combustion sources, and previous studies have used the BC-to-CO ratio to identify air masses impacted mainly by fossil fuel combustion (low values) or biomass burning (high values) under urban37,45,46 and regional conditions47,48. In the proposed methodology, ambient BC/CO ratios characteristic of source conditions provide a route, independent of the aethalometer model, to derive values for the biomass burning AAE. As a case study, we analyze one year of air quality data (gas-phase species and particulate matter), including BC mass concentrations from a 7-wavelength aethalometer, collected by routine monitoring stations at two sites in the Monterrey Metropolitan Area, arguably one of the most polluted urban conurbations in Latin America. The two sites represent different conditions: a suburban site typically upwind of major anthropogenic sources and an urban site mostly downwind of them. First, we analyze the air quality data to describe the potential sources that impact the selected monitoring sites; we then analyze the behavior and variation of BC levels throughout the year; finally, we compare the source contributions obtained with the proposed method against those from the conventional approach.

Methods

Urban case study

The Monterrey Metropolitan Area (MMA; altitude of 540 meters above sea level [m.a.s.l.]; geographic coordinates 25° 40’ 30.3’’ N, 100° 19.11’ W) is one of the three most important urban areas in Mexico. The geography and demographics of this region, along with its socioeconomic dynamics, pose significant challenges for air quality and public health. The MMA has an area of 7440 km² and a population of over 5 million inhabitants (the second most populated conurbation in the country), and it contributed 7.9% of the national Gross Domestic Product in 2023 (third in the country)49. It has a semi-arid climate with low rainfall and high temperatures for most of the year. The surrounding topography is dominated by a mountain chain exceeding 3,000 m.a.s.l. (the Sierra Madre Oriental), which contributes to the formation of microclimates within the city, with significant differences in temperature and humidity between urban and nearby rural areas. These conditions play an important role in the air quality of the MMA, as the mountainous topography limits pollutant dispersion, which can lead to the accumulation of harmful particles and gases, especially during thermal inversion events. Uncontrolled urban growth (2.6-fold between 1990 and 2020)50 and heat-absorbing building materials significantly raise local temperatures, fostering a heat island effect in the conurbation51.

Likewise, with a constantly increasing population and insufficient transportation infrastructure, the region faces challenges in managing exhaust gas and particulate emissions from vehicular traffic. In addition, industrial activity, including the production of steel, cement, and other energy-intensive goods, contributes to the mix of pollutants that affect air quality in the area. Air quality in the MMA is monitored by the Integrated Environmental Monitoring System (SIMA, for its name in Spanish: Sistema Integral de Monitoreo Ambiental), which comprises 15 stations strategically distributed across the urban area. These stations provide hourly measurements of the pollutants and meteorological parameters relevant to this study. A total of 14 variables were considered: eight air pollutants, namely the criteria pollutants CO, SO₂, NO₂, O₃, PM₁₀, and PM₂.₅, plus NO and BC; and six meteorological variables, namely temperature (T), relative humidity (RH), atmospheric pressure (P), solar radiation (SR), wind speed (WS), and wind direction (WD). Among the pollutants of interest, BC stands out because of the limited information available for the MMA23. Its main sources in the region include vehicular transport, industrial activities, and domestic combustion, highlighting the need for targeted strategies to reduce emissions and mitigate their environmental and health impacts52.

Sampling sites

BC was measured from January to December 2022 at two sampling sites of the SIMA network, located in the municipalities of Apodaca and Santa Catarina within the MMA (Fig. 1). The Apodaca (APO) station (25° 44’ 29.81” N, 100° 18’ 60.00” W, 612 m.a.s.l.) is located in the northeastern part of the MMA. The Apodaca municipality is recognized as an important industrial center, with companies in the metals manufacturing, automotive, aerospace, metalworking, and plastic products sectors. Vehicular traffic in Apodaca is moderate to intense, with a notable share of heavy-duty diesel vehicles. In addition, burning biomass to grill food is a common activity in this municipality53. The Santa Catarina (SCT) station (25° 40’ 48.00” N, 100° 27’ 36.00” W, 691 m.a.s.l.), in turn, is in an area of moderate vehicular traffic with a significant contribution from diesel vehicles. The municipality is part of the dynamic industrial corridor of the northeast, housing a wide range of industries, including chemical, petrochemical, metalworking, electronics, construction, and ceramics and glass manufacturing. Historically, the highest average PM concentrations in the MMA have occurred at the SCT site. Besides local emission sources, this is related to the prevailing east-to-west wind pattern in the basin, which makes SCT a downwind site for most of the year.

Fig. 1

Depiction of the monitoring sites used in this study (Source: Esri, Maxar, Earthstar Geographics, and the GIS User Community).

BC was monitored with two Magee Scientific AE33 aethalometers with a resolution of 1 ng m⁻³, a detection limit (DL) of < 0.005 µg m⁻³ for 1-h averages, a sensitivity of 0.03 µg m⁻³ for 1-min data, and an airflow of 5 L min⁻¹. The instruments were calibrated with a neutral density optical filter kit. Samples were collected through a PM₂.₅ inlet. Light attenuation was measured at seven wavelengths: 370, 470, 520, 590, 660, 880, and 950 nm. The aethalometer measures at two spots in parallel to compensate for filter loading effects54,55. The BC data used in this study are hourly averages computed in real time by the instrument’s internal software, which applies automatic corrections for filter loading and multiple light scattering. The aethalometers were installed inside SIMA monitoring stations, with sampling inlets approximately three meters above ground level. Quality-assured/quality-controlled (QA/QC) hourly averages of air pollutant concentrations and meteorological variables were downloaded from the SIMA data clearinghouse (http://aire.nl.gob.mx/).

Data analysis

Python 3 was used as the programming language, together with libraries including NumPy56, Pandas57, SciPy58, and Matplotlib59. All analyses were conducted on the Google Colab platform (https://colab.research.google.com/), which provided an interactive and reproducible computing environment. These tools supported efficient data preprocessing and the application of the statistical methods described below. The methodology was organized into two sequential stages: data preprocessing and statistical analysis.

Data preprocessing

Hourly air quality and meteorological data were collected from January to December 2022 at the two monitoring stations described in Sect. 2.2. Given the hourly resolution, 8,760 data points per variable were expected at each site; the actual number of available observations varied across variables and sites. A summary of missing data before and after imputation is presented in Figure S1 (Supplementary Material, SM), enabling a clear assessment of data completeness. The preprocessing phase aimed to ensure the consistency and integrity of the dataset prior to statistical analysis. An initial assessment was conducted to identify outliers, missing values, and other anomalies in the dataset60,61. Outliers were detected using a combination of visual inspection, descriptive statistics, and the interquartile range (IQR) method52. The IQR method was selected for its robustness to non-normal distributions and its ability to detect atypical values without compromising true environmental events. Summary boxplots and diagnostics are provided in the SM (Figure S2).
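The IQR screening described above can be sketched as follows. This is a minimal illustration that assumes the conventional Tukey fences (k = 1.5), which the text does not specify; the function name and the example values are hypothetical. Flagged values are returned for inspection rather than removed, consistent with the selective treatment described in this section.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR].

    NaN values compare as False, so missing data are never flagged.
    k = 1.5 is the conventional Tukey fence (an assumption, not from the study).
    """
    q1, q3 = series.quantile([0.25, 0.75])  # quantiles skip NaN by default
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical hourly BC values (µg m-3) with one spike and one gap
bc = pd.Series([1.2, 1.5, 1.1, 1.8, 1.4, 25.0, 1.3, np.nan])
flags = iqr_outlier_mask(bc)
print(bc[flags])  # candidate outliers, to be inspected before any correction
```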

Missing values were handled per variable using linear interpolation between neighboring non-missing observations. This method was preferred over more complex imputation techniques, such as K-Nearest Neighbors (K-NN), because it preserves the temporal continuity of the data and avoids introducing artificial correlations. The choice of imputation approach was informed by the extent and distribution of missing data, as shown in Figure S3. Linear interpolation was applied to variables with limited and scattered missing values. However, where extended gaps were present, such as BC at both monitoring stations and CO at Santa Catarina, missing data were preserved and flagged for interpretation rather than imputed, to avoid distorting potential trends or episodic events.
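The rule above (interpolate short, scattered gaps; keep extended gaps missing) can be sketched with pandas as below. The gap-length threshold `max_gap` and the function name are hypothetical, since the text does not state where "limited" ends and "extended" begins. Note that pandas' own `limit` argument would partially fill a long gap, so this sketch measures each run of missing values and masks long runs explicitly.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(series: pd.Series, max_gap: int = 3) -> pd.Series:
    """Linearly interpolate interior NaN runs of at most `max_gap` steps.

    Longer runs are left as NaN so that extended gaps (e.g., instrument downtime)
    are preserved and can be flagged instead of imputed.
    """
    is_nan = series.isna()
    # Consecutive values with the same missing/non-missing status share a run id
    run_id = is_nan.ne(is_nan.shift(fill_value=False)).cumsum()
    run_len = run_id.map(run_id.value_counts())
    filled = series.interpolate(method="linear", limit_area="inside")
    return filled.mask(is_nan & (run_len > max_gap))


# Hypothetical hourly series: a 1-h gap (filled) and a 4-h gap (kept as NaN)
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, np.nan, 8.0])
out = interpolate_short_gaps(s, max_gap=3)
print(out.tolist())
```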

Outlier treatment was applied selectively, without generalized removal across variables. Only physically invalid values, such as SR readings exceeding 1 kWh/m² or anomalous temperature spikes, were corrected. Exploratory visual and statistical diagnostics indicated that such anomalies were limited to four variables in Apodaca (T, P, SR, and RH) and one variable in Santa Catarina (SR). All other variables were preserved in their original form to maintain the natural variability of the dataset and to support the exploratory analysis of anomalies potentially associated with maintenance activities, extreme meteorological conditions, or urban dynamics.

Where missing values or uncorrectable outliers could not be addressed reliably, the corresponding record was removed to maintain consistency across synchronized variables; that is, the entire row for the affected timestamp was discarded. Figure S4 reports the remaining percentage of missing data per variable after preprocessing, while Figure S5 visualizes the final dataset.
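A sketch of the selective correction and row-wise removal described above, under the assumption that physically invalid readings are first set to missing and any timestamp left unresolved is then dropped as a whole row. The text does not say how invalid values were corrected, so this sketch simply treats them as missing; the column names follow the text, but the RH bounds and the `PHYSICAL_LIMITS` table are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical plausibility bounds: SR follows the 1 kWh/m² criterion in the text;
# the RH bounds are simply its physical range (an assumption for illustration).
PHYSICAL_LIMITS = {"SR": (0.0, 1.0), "RH": (0.0, 100.0)}


def apply_physical_limits(df: pd.DataFrame, limits=PHYSICAL_LIMITS) -> pd.DataFrame:
    """Set physically invalid readings to NaN, column by column."""
    out = df.copy()
    for col, (lo, hi) in limits.items():
        if col in out.columns:
            invalid = (out[col] < lo) | (out[col] > hi)
            out.loc[invalid, col] = np.nan
    return out


df = pd.DataFrame({"SR": [0.2, 1.4, 0.5], "RH": [45.0, 50.0, 101.0], "BC": [1.1, 1.3, 0.9]})
# Drop whole timestamps (rows) that still contain unresolved values
clean = apply_physical_limits(df).dropna()
print(clean)
```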

Statistical analysis

In the second stage, the Shapiro-Wilk test was applied to assess whether the data were normally distributed, which determines the suitability of subsequent statistical tests62. The results indicated deviations from normality (Figure S6), so non-parametric approaches were selected for the subsequent analysis. Differences between groups were evaluated with the Mann-Whitney U test, which is suitable for non-normally distributed data63. A significance level of p < 0.05 was used.
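The two-step logic above (test normality, then fall back to a rank-based comparison) can be sketched with SciPy. The samples are synthetic lognormal draws standing in for hourly BC, not the study data. One caveat worth checking against the SciPy documentation: `scipy.stats.shapiro` warns that for N > 5000 the statistic is accurate but the p-value may not be, and the hourly series here exceed that size.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05
rng = np.random.default_rng(0)
# Synthetic right-skewed hourly BC samples (µg m-3); NOT the study data
samples = {
    "APO": rng.lognormal(mean=0.8, sigma=0.6, size=500),
    "SCT": rng.lognormal(mean=0.3, sigma=0.7, size=500),
}

# Step 1: Shapiro-Wilk normality test per group
normality = {name: stats.shapiro(x) for name, x in samples.items()}
for name, res in normality.items():
    print(f"{name}: W={res.statistic:.3f}, p={res.pvalue:.2g}, normal={res.pvalue >= ALPHA}")

# Step 2: non-parametric two-sample comparison (Mann-Whitney U, two-sided)
mw = stats.mannwhitneyu(samples["APO"], samples["SCT"], alternative="two-sided")
print(f"U={mw.statistic:.0f}, p={mw.pvalue:.2g}, significant={mw.pvalue < ALPHA}")
```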

Associations among variables, including BC concentration, meteorological parameters, and air pollutants, were determined using Spearman’s rank correlation, with |r| = 0.5 as the reference value for a strong correlation64. Given the exploratory nature of the correlation analysis and the absence of hypothesis-driven testing, no adjustments for multiple comparisons (e.g., Bonferroni or false discovery rate) were implemented, as the results were intended to inform subsequent multivariate analyses rather than support inferential conclusions.
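A minimal sketch of the correlation screening described above: compute the Spearman matrix with pandas and list each variable pair whose absolute coefficient reaches the 0.5 reference value. The data frame is a small made-up example and the function name is illustrative.

```python
import numpy as np
import pandas as pd


def strong_spearman_pairs(df: pd.DataFrame, threshold: float = 0.5):
    """Spearman matrix plus the variable pairs with |rho| >= threshold."""
    rho = df.corr(method="spearman")
    # Keep the strict upper triangle so that each pair is listed once
    upper = rho.where(np.triu(np.ones(rho.shape, dtype=bool), k=1))
    pairs = upper.stack()
    strong = pairs[pairs.abs() >= threshold]
    return rho, strong.reindex(strong.abs().sort_values(ascending=False).index)


# Hypothetical hourly records (not the study data)
df = pd.DataFrame({
    "BC": [1, 2, 3, 4, 5, 6, 7, 8],
    "NOx": [2, 4, 6, 8, 10, 12, 14, 16],
    "WS": [8, 7, 6, 5, 4, 3, 2, 1],
    "T": [5, 1, 7, 3, 8, 2, 6, 4],
})
rho, strong = strong_spearman_pairs(df)
print(strong)
```

Here BC, NOx, and WS are perfectly rank-related (|ρ| = 1), while T is only weakly related to the others (ρ ≈ 0.10), so only the three BC/NOx/WS pairs pass the threshold.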

Additionally, multivariate techniques, namely cluster analysis and principal component analysis (PCA), were used to identify latent patterns in the data. A total of 14 variables were included in the PCA, consisting of eight air pollutants and six meteorological parameters, as described in Sect. 2.1, all relevant to air quality dynamics. Prior to PCA, all variables were standardized using z-scores to ensure comparability across variables with differing units and magnitudes; accordingly, PCA was performed on the correlation matrix. The optimal number of components and clusters was determined using the elbow method, i.e., the point beyond which additional components provided diminishing marginal gains in explained variance. Subsequently, the K-means algorithm was applied to the PCA-transformed data to explore natural groupings65.
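The PCA-then-K-means sequence above can be sketched with NumPy alone. This is an illustrative reimplementation, not the authors' code: PCA is computed from the eigendecomposition of the correlation matrix, and K-means is a plain Lloyd's iteration with k-means++ seeding (the seeding choice, `n_pc`, and the synthetic 14-variable data are assumptions). The elbow is read from the cumulative explained variance for PCA and from the within-cluster sum of squares (WCSS) for K-means.

```python
import numpy as np


def pca_correlation(X: np.ndarray):
    """PCA on the correlation matrix (equivalent to PCA on z-scored data)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # ascending order for a symmetric matrix
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), Z @ eigvecs  # explained-variance ratios, PC scores


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means with k-means++ seeding; returns labels and WCSS."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        d2 = ((points[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((points[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = ((points[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
    wcss = float(((points - centers[labels]) ** 2).sum())
    return labels, wcss


# Synthetic 14-variable data with three well-separated groups (not the study data)
rng = np.random.default_rng(42)
group_means = rng.normal(scale=10.0, size=(3, 14))
X = np.vstack([m + rng.normal(size=(100, 14)) for m in group_means])

ratios, scores = pca_correlation(X)
print("cumulative explained variance:", np.round(np.cumsum(ratios)[:5], 3))

# Elbow curve for K-means on the retained PC scores
n_pc = 3  # hypothetical choice read from the explained-variance elbow
wcss = {k: kmeans(scores[:, :n_pc], k)[1] for k in range(1, 7)}
print(wcss)
```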

The database was structured to account for temporal and seasonal changes, enabling a more accurate interpretation of the data and the identification of significant patterns. Three main climatological seasons were defined based on previous research and the climatological characteristics of the MMA: cold-dry (November-February), hot-dry (March-June), and hot-humid (July-September)66. Finally, each record was labeled by season, month, week, weekday/weekend, day, hour, and day/night period.
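The temporal labeling above can be sketched as a month-to-season lookup plus calendar attributes from a `DatetimeIndex`. The three stated ranges do not cover October, so this sketch leaves October unassigned instead of guessing; the day period (6 am to 6 pm) follows the definition used in the Results. The function name and example timestamps are illustrative.

```python
import pandas as pd

# Season definitions from the text. October is not covered by the three stated
# ranges, so it is deliberately left unassigned (NaN) rather than guessed.
SEASON_BY_MONTH = {
    11: "cold-dry", 12: "cold-dry", 1: "cold-dry", 2: "cold-dry",
    3: "hot-dry", 4: "hot-dry", 5: "hot-dry", 6: "hot-dry",
    7: "hot-humid", 8: "hot-humid", 9: "hot-humid",
}


def add_time_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Add season, calendar, and day/night labels to a frame with a DatetimeIndex."""
    out = df.copy()
    idx = out.index
    out["season"] = idx.month.map(SEASON_BY_MONTH)
    out["month"] = idx.month
    out["weekday"] = idx.dayofweek  # Monday = 0 ... Sunday = 6
    out["is_weekend"] = idx.dayofweek >= 5
    out["hour"] = idx.hour
    out["period"] = ["day" if 6 <= h < 18 else "night" for h in idx.hour]
    return out


idx = pd.DatetimeIndex(["2022-01-15 07:00", "2022-07-04 20:00", "2022-10-10 12:00"])
labels = add_time_labels(pd.DataFrame({"BC": [1.0, 2.0, 3.0]}, index=idx))
print(labels)
```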

Source contribution

The “aethalometer model”41 is a well-known and extensively used method to derive source contribution estimates from aethalometer readings. Briefly, the method assumes that the observed BC is emitted by either fossil fuel combustion or biomass burning. Because the light-absorption properties of BC from these sources differ, the method uses a short-wavelength reading, where biomass burning particles absorb more strongly, and a long-wavelength reading, where absorption is dominated by fossil fuel BC. There is no consensus on which wavelength pair to use: some authors have used the 370–880 nm pair33, while others have used the 470–950 nm pair42 or the 470–880 nm pair28,39. In any case, since the absorption coefficient (b_abs) at a given wavelength (λ) is proportional to λ^(−α), where α is the AAE, if the 470–950 nm pair is used (as in this study), it follows that:

$$\frac{b_{abs,ff}(\lambda=470\,\mathrm{nm})}{b_{abs,ff}(\lambda=950\,\mathrm{nm})}=\left(\frac{470}{950}\right)^{-\alpha_{ff}}$$
(1)
$$\frac{b_{abs,bb}(\lambda=470\,\mathrm{nm})}{b_{abs,bb}(\lambda=950\,\mathrm{nm})}=\left(\frac{470}{950}\right)^{-\alpha_{bb}}$$
(2)

where the subscripts ff and bb denote fossil fuel combustion and biomass burning, respectively. The absorption coefficient measured by the aethalometer at each wavelength is the sum of the contributions of the two sources:

$$b_{abs}(\lambda=950\,\mathrm{nm})=b_{abs,ff}(\lambda=950\,\mathrm{nm})+b_{abs,bb}(\lambda=950\,\mathrm{nm})$$
(3)
$$b_{abs}(\lambda=470\,\mathrm{nm})=b_{abs,ff}(\lambda=470\,\mathrm{nm})+b_{abs,bb}(\lambda=470\,\mathrm{nm})$$
(4)

For closure, values of α_ff and α_bb must be assumed. Values of α_ff tend to fall between 0.9 and 1.1, with many studies adopting 1.0, whereas α_bb values span a wider range (e.g., 1.2–2.5), with many studies assuming 2.0 22,32,42. Source contributions can then be estimated by solving Eqs. (1)–(4) and applying Eqs. (5) and (6):

$$x_{ff}=\frac{b_{abs,ff}(\lambda=950\,\mathrm{nm})}{b_{abs}(\lambda=950\,\mathrm{nm})}$$
(5)
$$x_{bb}=1-x_{ff}$$
(6)

where x_ff and x_bb are the fractional contributions of fossil fuel combustion and biomass burning to the observed BC, respectively. A major source of uncertainty in this method is the assumed α_bb, which has been shown to vary strongly with the biomass burned and the combustion conditions. For example, Rajesh et al.32 used combustion chambers to obtain source-specific values of α_bb, finding an average of 1.87; in their calculations, using 2.0 instead of the chamber-derived value decreased the biomass burning contribution by 14%. Some studies have used filter-derived36 or size-resolved data67 to estimate appropriate α_bb values, while others have used mobile micro-aethalometers to measure near-source emissions under controlled conditions22.
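Substituting Eqs. (1) and (2) into Eqs. (3) and (4) gives b_abs(470)/b_abs(950) = x_ff·r_ff + (1 − x_ff)·r_bb, with r = (470/950)^(−α), which can be solved for x_ff directly. The sketch below implements that closed form with the commonly assumed α_ff = 1.0 and α_bb = 2.0 as defaults; the function names and the synthetic absorption values are illustrative. It also shows how the estimate shifts when α_bb is changed, which is the sensitivity discussed above.

```python
import numpy as np

LAMBDA_SHORT, LAMBDA_LONG = 470.0, 950.0  # wavelength pair used in this study (nm)


def aae(b_short, b_long, lam_short=LAMBDA_SHORT, lam_long=LAMBDA_LONG):
    """Absorption Ångström exponent from a two-wavelength pair (b_abs ∝ λ^-α)."""
    return -np.log(np.asarray(b_short) / np.asarray(b_long)) / np.log(lam_short / lam_long)


def fossil_fraction(b_short, b_long, alpha_ff=1.0, alpha_bb=2.0):
    """x_ff from Eqs. (1)-(5); x_bb = 1 - x_ff by Eq. (6).

    Solves b_short / b_long = x_ff * r_ff + (1 - x_ff) * r_bb for x_ff.
    Results are not clipped: values outside [0, 1] mean the measured AAE lies
    outside [alpha_ff, alpha_bb].
    """
    r_ff = (LAMBDA_SHORT / LAMBDA_LONG) ** (-alpha_ff)
    r_bb = (LAMBDA_SHORT / LAMBDA_LONG) ** (-alpha_bb)
    return (np.asarray(b_short) / np.asarray(b_long) - r_bb) / (r_ff - r_bb)


# Synthetic check: build b_abs(470) for a known mix (70% fossil fuel at 950 nm)
b950 = 10.0
r_ff, r_bb = (470 / 950) ** -1.0, (470 / 950) ** -2.0
b470 = b950 * (0.7 * r_ff + 0.3 * r_bb)
x_ff_default = fossil_fraction(b470, b950)               # alpha_bb = 2.0
x_ff_low_bb = fossil_fraction(b470, b950, alpha_bb=1.8)  # lower alpha_bb assumption
print(f"AAE={aae(b470, b950):.2f}, x_bb(2.0)={1 - x_ff_default:.2f}, "
      f"x_bb(1.8)={1 - x_ff_low_bb:.2f}")
```

For the same measurement, assuming a lower α_bb attributes more of the BC to biomass burning, which is why the choice of α_bb drives much of the uncertainty in this method.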

Other studies have used the ΔBC/ΔCO ratio obtained from ambient measurements to indicate whether emissions from fossil fuel combustion or biomass burning prevail in the observed air masses, or to perform top-down assessments of emission inventories37,47. Moreover, this ratio has been combined with trace-type measurements to constrain source attribution estimates29,35. A benefit of this approach is its low cost when CO is measured concurrently with BC. When ΔBC/ΔCO is “high”, biomass burning emissions dominate, whereas when it is “low”, fossil fuel combustion emissions dominate. Thus, we define (ΔBC/ΔCO)_amb as the ratio obtained from ambient air measurements, which can be expressed as a linear combination of the contributions from fossil fuel combustion and biomass burning:

$$\left(\frac{\Delta BC}{\Delta CO}\right)_{amb}=x_{ff}\left(\frac{\Delta BC}{\Delta CO}\right)_{ff}+x_{bb}\left(\frac{\Delta BC}{\Delta CO}\right)_{bb}$$
(7)

Following Xiao et al.29, we calculated the denominator of (ΔBC/ΔCO)_amb as (CO_amb − CO_bll), where CO_amb is the measured CO level in each time interval (e.g., hourly averages) and CO_bll is the CO baseline level (estimated as the 1.25th percentile of the CO time series measured during the sampling period); the numerator is the measured BC level in the same interval. For (ΔBC/ΔCO)_ff and (ΔBC/ΔCO)_bb, which represent “pure” fossil fuel combustion or biomass burning conditions, values reported in the literature could be used29,68,69. Here, we explore an approach in which these source ratios are obtained from ambient data: candidate source ratios were taken from predefined percentile pairs of the entire time series of the monitoring period. Then, similar to the approach consistently used to characterize trends in mobile source emissions from ambient CO/NOx ratios70, observations from 6–9 am were used together with the assumed values of (ΔBC/ΔCO)_ff and (ΔBC/ΔCO)_bb to derive source contributions from Eq. (7). In addition, the ratios obtained for those periods were plotted against wind speed to explore distribution patterns. Finally, the values of (ΔBC/ΔCO)_ff and (ΔBC/ΔCO)_bb that best represented the source contributions were used to establish a “top-down” value of α_bb (i.e., by solving Eqs. (1)–(6) with α_bb as the unknown).
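The ratio-based apportionment above can be sketched as follows: compute the hourly ΔBC/ΔCO with the 1.25th-percentile CO baseline, take candidate source ratios from a percentile pair, and invert Eq. (7) (with x_ff + x_bb = 1) for the 6–9 am hours. The 5th/95th percentile pair, the hour-beginning convention for the 6–9 am window, the function names, and the synthetic data are all assumptions for illustration. Hours at or below the CO baseline give an undefined ratio and are set to NaN, and x_bb values outside [0, 1] correspond to hours that violate the validity condition stated below.

```python
import numpy as np


def delta_bc_co(bc, co, baseline_pct=1.25):
    """Hourly (ΔBC/ΔCO)_amb = BC / (CO_amb - CO_bll), CO_bll = low percentile of CO."""
    bc = np.asarray(bc, dtype=float)
    co = np.asarray(co, dtype=float)
    co_bll = np.nanpercentile(co, baseline_pct)
    d_co = co - co_bll
    ratio = np.full_like(bc, np.nan)
    ok = d_co > 0  # ratio undefined at or below the baseline
    ratio[ok] = bc[ok] / d_co[ok]
    return ratio, co_bll


def biomass_fraction(ratio_amb, ratio_ff, ratio_bb):
    """Invert Eq. (7) with x_ff + x_bb = 1: x_bb = (R_amb - R_ff) / (R_bb - R_ff)."""
    return (np.asarray(ratio_amb) - ratio_ff) / (ratio_bb - ratio_ff)


# Synthetic hourly data (not the study data); BC in µg m-3, CO in ppm
rng = np.random.default_rng(3)
hours = np.tile(np.arange(24), 30)
co = rng.lognormal(mean=-0.5, sigma=0.4, size=hours.size)
bc = 2.0 * co * rng.lognormal(mean=0.0, sigma=0.3, size=hours.size)

ratio, co_bll = delta_bc_co(bc, co)
r_ff, r_bb = np.nanpercentile(ratio, [5, 95])  # hypothetical percentile pair
morning = (hours >= 6) & (hours < 9)           # 6-9 am, hour-beginning convention assumed
x_bb = biomass_fraction(ratio[morning], r_ff, r_bb)
print(f"CO_bll={co_bll:.3f}, R_ff={r_ff:.2f}, R_bb={r_bb:.2f}, "
      f"median x_bb={np.nanmedian(x_bb):.2f}")
```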

This method has limitations. First, as with the “aethalometer model”, it assumes that only two major sources contribute to the observed BC levels. Second, it is valid only when:

$$\left(\frac{\Delta BC}{\Delta CO}\right)_{ff}\le\left(\frac{\Delta BC}{\Delta CO}\right)_{amb}\le\left(\frac{\Delta BC}{\Delta CO}\right)_{bb}$$
(8)

a condition that can be violated when processes or conditions not accounted for dominate the observed pollutant levels (e.g., secondary production of CO or long-range transport).
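The final “top-down” step described in this section, solving Eqs. (1)–(6) for α_bb once x_ff is known from Eq. (7), can be restricted to the hours that satisfy the validity condition above. The sketch below rearranges b_abs(470)/b_abs(950) = x_ff·r_ff + (1 − x_ff)·r_bb for r_bb = (950/470)^α_bb and takes the logarithm. Aggregating the valid hours with a median, the source ratios, and the synthetic inputs are assumptions for illustration; the synthetic round trip recovers the α_bb = 2.0 used to generate the data.

```python
import numpy as np


def satisfies_eq8(ratio_amb, ratio_ff, ratio_bb):
    """Mask of hours whose ambient ratio lies between the source ratios, Eq. (8)."""
    r = np.asarray(ratio_amb, dtype=float)
    return (r >= ratio_ff) & (r <= ratio_bb)


def alpha_bb_top_down(b_short, b_long, x_ff, alpha_ff=1.0, lam_short=470.0, lam_long=950.0):
    """Solve Eqs. (1)-(6) for alpha_bb, taking x_ff from Eq. (7) as known."""
    b_short, b_long, x_ff = (np.asarray(a, dtype=float) for a in (b_short, b_long, x_ff))
    r_ff = (lam_short / lam_long) ** (-alpha_ff)
    with np.errstate(divide="ignore", invalid="ignore"):
        r_bb = (b_short / b_long - x_ff * r_ff) / (1.0 - x_ff)
        alpha = np.log(r_bb) / np.log(lam_long / lam_short)
    # Undefined when fossil fuel explains everything or r_bb is non-positive
    return np.where((x_ff < 1.0) & (r_bb > 0), alpha, np.nan)


# Synthetic round trip: 30% biomass burning generated with alpha_bb = 2.0
r_ff, r_bb_true = (470 / 950) ** -1.0, (470 / 950) ** -2.0
b950 = np.array([10.0, 8.0, 12.0])
b470 = b950 * (0.7 * r_ff + 0.3 * r_bb_true)
ratio_amb = np.array([3.8, 1.0, 3.8])  # hypothetical hourly (ΔBC/ΔCO)_amb
ratio_ff, ratio_bb = 2.0, 8.0          # hypothetical source ratios
valid = satisfies_eq8(ratio_amb, ratio_ff, ratio_bb)
x_ff = (ratio_bb - ratio_amb) / (ratio_bb - ratio_ff)  # Eq. (7) solved for x_ff
alpha_valid = alpha_bb_top_down(b470, b950, x_ff)[valid]
print(f"top-down alpha_bb (median of valid hours) = {np.median(alpha_valid):.3f}")
```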

Results

Air quality conditions at the sampling sites

Descriptive statistics for the air pollutant concentrations and meteorological variables monitored at the two sampling sites (APO and SCT) are reported in Table 1. The same statistics, with the data classified by the three assumed climatic seasons, are presented in Tables S1 and S2 of the SM. The meteorological variables indicate that the East-to-West wind pattern historically observed in the MMA prevails, with lower overall wind speeds at SCT, probably because of its more urbanized surroundings (i.e., greater surface roughness) and because air parcels must traverse the conurbation before reaching it. Temperatures also tend to be lower at SCT, which is closer to the mountains of the basin. Regarding air pollutants, SCT reports higher concentrations of PM2.5 and of all gas-phase species except CO, while APO registered higher PM10 levels. The pollution levels at SCT are associated with a mix of primary and secondary contributions in air masses that age as they are transported across the metropolitan area71,72. At APO, local emission sources seem to drive pollutant concentrations, as CO and PM10 levels are higher than at SCT, and both species are well known to originate mainly from primary sources.

Table 1 Statistical summary of air pollutant concentrations (including BC) and meteorological variables(a) recorded during the study period (2022) at the two sampling sites. For the APO site, N = 7,659 (for all retained variables), and for the SCT site, N = 6,804.

Patterns in BC levels are harder to establish from the descriptive statistics alone. For BC, Table 1 shows that both the mean and median concentrations are higher at APO, suggesting a persistent influence of local combustion sources. This pattern aligns with the known characteristics of the Apodaca site, which is located near major roadways and densely populated urban areas with high vehicular activity, including freight corridors and heavy-duty traffic, all recognized sources of BC emissions. In contrast, SCT exhibits greater dispersion in BC levels, as evidenced by its higher standard deviation and interquartile range, which may be attributed to intermittent industrial emissions or long-range pollutant transport. The Mann-Whitney U test was used to investigate spatial and temporal differences in BC levels. BC concentrations differed significantly between sampling sites, as they did between weekdays and weekends and between day (6 am to 6 pm) and night (6 pm to 6 am) at each site. In addition, a Kruskal-Wallis test (a rank-based one-way analysis of variance) revealed that median BC concentrations at both sites differed (p < 0.001) among the three assumed climatic seasons, with higher medians at APO. Exploring the seasonal differences further, Tables S1 and S2, as well as Figure S7, show that BC, PM2.5, and CO levels at SCT change more between seasons than those at APO, with the lowest and highest PM2.5 values occurring at SCT during the hot-humid and cold-dry seasons, respectively. These results indicate that BC levels are not homogeneous across the MMA and can vary substantially with activity (emissions) and meteorological conditions around each sampling site.
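The group comparisons above can be sketched with a small helper that applies the Mann-Whitney U test to two-level groupings (weekday/weekend, day/night, site) and the Kruskal-Wallis test to the three seasons. The helper name and the synthetic data, which have a seasonal contrast built in, are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_groups(df: pd.DataFrame, value: str = "BC", by: str = "season"):
    """Mann-Whitney U for two groups, Kruskal-Wallis H for three or more."""
    groups = [g[value].dropna().to_numpy() for _, g in df.groupby(by)]
    if len(groups) == 2:
        return stats.mannwhitneyu(*groups, alternative="two-sided")
    return stats.kruskal(*groups)


# Synthetic hourly BC (µg m-3) with a built-in seasonal contrast; NOT the study data
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "BC": rng.lognormal(mean=np.repeat([1.0, 0.6, 0.3], 300), sigma=0.5),
    "season": np.repeat(["cold-dry", "hot-dry", "hot-humid"], 300),
    "period": np.tile(["day", "night"], 450),
})
kw = compare_groups(df, by="season")  # three groups -> Kruskal-Wallis
mw = compare_groups(df, by="period")  # two groups -> Mann-Whitney U
print(f"Kruskal-Wallis p={kw.pvalue:.2g}; day vs night Mann-Whitney p={mw.pvalue:.2g}")
```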

BC relationship with routine monitoring variables

Pairwise relationships among air pollutants and meteorological variables were computed to explore possible differences in source contributions between the monitoring sites. The analysis focused on pollutants commonly associated with combustion processes, including CO, NOₓ, PM₂.₅, and PM₁₀, because they are frequently emitted alongside BC. CO and NOₓ are key indicators of vehicular and industrial combustion, while particulate matter, particularly the PM₂.₅ fraction, often contains BC produced through incomplete combustion. Statistical relationships between BC and these species help to infer potential common sources, assess emission patterns, and explore meteorological influences across time and space. Figure 2 depicts the corresponding Spearman correlation coefficients (r); high absolute values suggest a possible shared origin for pairs of pollutant species, or a shared controlling process for pollutant–meteorology pairs (e.g., same emission class, distant transport).

Fig. 2

Spearman correlation coefficients between BC, criteria pollutants, and meteorological parameters at a) APO and b) SCT sampling sites.

Overall, the SCT site exhibits stronger positive and negative correlations among variables than the APO site. At Apodaca, strong correlations (r > 0.5) were obtained between BC and the NOx species (i.e., NO and NO2), while moderate correlations (0.4 < r < 0.5) were obtained for BC–CO, BC–PM2.5, and CO–NOx species. The BC–PM10 correlation was borderline strong (r = 0.49). Other moderate correlations corresponded to well-known relationships, such as O3 with temperature and O3 with NO2. At Santa Catarina, BC was also strongly correlated with the NOx species and PM10, but with r > 0.7. In contrast to APO, BC and PM2.5 were highly correlated (r = 0.76), and the BC–CO and CO–NOx correlations were strong (0.55 < r < 0.60). A relevant difference between sites is the stronger correlation of wind speed with air pollutants at SCT, notably BC, the NOx species, O3, and PM2.5. These results could be explained by the strong influence at SCT of air masses that traverse the basin and carry a mix of aged and fresh emissions. In contrast, APO, a near-background site, is in a setting influenced mainly by local primary combustion processes.

Scatter plots were constructed for both sampling sites to further explore the relationship between BC and the pollutants with which it was most strongly correlated (i.e., CO, NOx, PM2.5, and PM10). In addition, correlations were computed for the three climatic seasons (hot-dry, hot-humid, and cold-dry). Stratifying the analysis by season helps to examine how atmospheric conditions such as boundary layer height, wind patterns, sunlight intensity, and humidity affect the relationship between BC and related pollutants. It also helps to understand how changes in emission intensity or human activities, such as greater use of heating systems during colder months or increased traffic at certain times, affect how closely the concentrations of these pollutants are linked.

At the APO site (Fig. 3), the most consistent relationships across seasons are those of BC with NOx and CO: regardless of the season, the slope of the regression line remains similar and its uncertainty bounds are narrow. Correlations between BC and PM are stronger during the cold-dry season, when low wind speeds, low temperatures, and shallow mixing heights prevail, and weaker during the hot months, when conditions shift to higher wind speeds, temperatures, and mixing heights. These patterns suggest that in colder months primary emissions accumulate due to reduced atmospheric dilution, reinforcing the associations between BC and other combustion-related species. During warmer periods, enhanced dispersion and photochemical activity may alter pollutant ratios and reduce correlation strengths.
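The season-by-season comparison of slopes and their uncertainty can be sketched with `scipy.stats.linregress` (ordinary least squares with the slope's standard error) alongside Spearman's ρ for each season. The helper name and the synthetic data, built with the same BC–NOx slope in every season but different scatter, are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats


def seasonal_fits(df: pd.DataFrame, x: str = "NOx", y: str = "BC", by: str = "season") -> pd.DataFrame:
    """Per-season OLS fit of y on x (slope and its standard error) plus Spearman rho."""
    rows = []
    for season, g in df.groupby(by):
        g = g[[x, y]].dropna()
        fit = stats.linregress(g[x], g[y])
        rho, _ = stats.spearmanr(g[x], g[y])
        rows.append({by: season, "slope": fit.slope, "slope_se": fit.stderr,
                     "intercept": fit.intercept, "spearman": rho, "n": len(g)})
    return pd.DataFrame(rows).set_index(by)


# Synthetic data with the same BC-NOx slope in every season (not the study data)
rng = np.random.default_rng(7)
season = np.repeat(["cold-dry", "hot-dry", "hot-humid"], 300)
nox = rng.uniform(5, 100, size=season.size)
noise_sd = np.repeat([0.2, 0.4, 0.3], 300)
df = pd.DataFrame({"season": season, "NOx": nox,
                   "BC": 0.5 + 0.03 * nox + rng.normal(0.0, noise_sd)})
fits = seasonal_fits(df)
print(fits.round(4))
```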

Fig. 3

Dispersion graphs and corresponding linear regressions of BC and (a) CO, (b) NOx, (c) PM10, and (d) PM2.5 at the APO site by assumed climatic season.

At the SCT site (Fig. 4), the relationships between BC and CO (ρ = 0.51–0.69), NOx (ρ = 0.67–0.87), and PM2.5 (ρ = 0.69–0.80) remain consistently strong across the three seasons. In contrast, the correlation between BC and PM10 shows notable seasonal variation: it weakens markedly during the hot-dry season (ρ = 0.39) compared to the cold-dry (ρ = 0.66) and hot-humid (ρ = 0.57) seasons. During the hot-dry season, the increase in PM10 concentrations, driven primarily by dust resuspension, can mask or dilute the contribution of combustion sources, thus weakening the correlation between BC and PM10. These findings emphasize how local environmental factors, such as meteorology and topography, influence the temporal patterns of pollutant interactions by modulating emission dispersion, accumulation, and transformation.

Fig. 4

Dispersion graphs and corresponding linear regressions of BC and (a) CO, (b) NOx, (c) PM10, and (d) PM2.5 at the SCT site by assumed climatic season.

BC variation patterns

Temporal profiles

The above results provide evidence of the different conditions that drive the BC levels observed at the two selected sites. This information is relevant to contextualize source contribution estimates (e.g., identifying whether fossil fuel combustion or biomass burning impacts the site, and the origin of the transported air masses). To delve into BC dynamics, average temporal BC profiles were analyzed. In 2023, the MMA was estimated to have 2.2 million registered vehicles (INEGI, 2025c), which transit the main streets during the rush hours of 7:00–9:00 and 18:00–20:00 (Montalvo-Urquizo et al., 2017). Consistent with this, the daily profiles at both sites (Fig. 5) follow the typical morning rush-hour maximum, which is 30% more pronounced at SCT. Interestingly, BC concentrations are 31–43% lower at SCT during the late afternoon. At APO, two similar maxima are reached during the morning and evening rush hours. The weekly profiles are similar at both sites. However, APO shows a smooth day-to-day transition, peaking on Wednesdays (probably due to accumulated concentrations) and falling to a Sunday minimum, whereas SCT's weekly pattern is less smooth, probably because it responds to varying basin-wide emissions. Regarding monthly averages, Fig. 6 indicates that APO reports the higher monthly BC average of the two sites for most months. This can be associated with the fact that APO is located in the most industrialized sector of the MMA in terms of infrastructure and development.
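The hourly and day-of-week profiles with confidence bands (Fig. 5) amount to grouping the BC record by a time key and averaging. The sketch below assumes the 95% confidence interval is approximated as the mean ± 1.96 standard errors; the study's exact interval procedure is not restated here, and the function and helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt
from statistics import mean, stdev


def profile_with_ci(samples, key):
    """Average BC per group with an approximate 95% CI.

    samples: iterable of (datetime, bc) pairs; key: function mapping a datetime to a
    group label (hour of day, day of week, month, ...). The CI uses a normal
    approximation, mean +/- 1.96 * standard error (an assumption). Groups with a
    single value get NaN bounds.
    """
    groups = defaultdict(list)
    for ts, bc in samples:
        groups[key(ts)].append(bc)
    profile = {}
    for label, values in sorted(groups.items()):
        m = mean(values)
        if len(values) > 1:
            half = 1.96 * stdev(values) / sqrt(len(values))
        else:
            half = float("nan")
        profile[label] = (m, m - half, m + half)
    return profile


def hour_of_day(ts):
    return ts.hour


def day_of_week(ts):
    return ts.weekday()  # 0 = Monday ... 6 = Sunday


# Made-up BC values in ng/m3 (not study data).
samples = [
    (datetime(2022, 1, 3, 7), 3000.0),   # Monday, 07:00
    (datetime(2022, 1, 4, 7), 3400.0),   # Tuesday, 07:00
    (datetime(2022, 1, 3, 14), 1200.0),  # Monday, 14:00
]
hourly = profile_with_ci(samples, hour_of_day)
weekly = profile_with_ci(samples, day_of_week)
```

Passing a key such as `lambda ts: ts.month` to the same function would produce monthly averages of the kind shown in Fig. 6a.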
The annual profile follows a decreasing trend from the cold months (APO: 2497 ± 2340 ng/m3; SCT: 2731 ± 2545 ng/m3), when fossil fuel consumption and biomass burning tend to be highest and wind speeds and mixing heights are lowest, to lower concentrations during spring and early summer (APO: 1936 ± 1235 ng/m3; SCT: 1481 ± 1524 ng/m3), when wind speeds and mixing heights are highest, reaching a minimum around the wet summer. Differences in the dynamics of the two sites appear when comparing the weekend and weekday monthly averages. At APO, the weekday monthly averages are always 36–109% larger than the weekend averages, whereas at SCT the weekday and weekend monthly averages tend to be similar from April to September. These results reinforce the interpretation that the APO site responds preferentially to local emission sources that differ between weekdays and weekends, while SCT responds to a mix of local emissions and transport. All these results are in line with the findings of the PCA and cluster analysis.
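The weekday-versus-weekend comparison (e.g., weekday averages 36–109% larger at APO) can be computed per calendar month as the percentage excess of the weekday mean over the weekend mean. This sketch assumes Saturday and Sunday define the weekend and ignores public holidays; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def weekday_excess_by_month(samples):
    """Percent by which the weekday mean BC exceeds the weekend mean, per calendar month.

    samples: iterable of (datetime, bc) pairs. Saturday and Sunday count as the weekend;
    public holidays are not treated specially (a simplifying assumption). Months
    lacking either weekday or weekend data are skipped.
    """
    buckets = defaultdict(lambda: {"weekday": [], "weekend": []})
    for ts, bc in samples:
        kind = "weekend" if ts.weekday() >= 5 else "weekday"
        buckets[ts.month][kind].append(bc)
    excess = {}
    for month, groups in sorted(buckets.items()):
        if groups["weekday"] and groups["weekend"]:
            wd, we = mean(groups["weekday"]), mean(groups["weekend"])
            excess[month] = 100.0 * (wd - we) / we
    return excess


# Made-up BC values in ng/m3 (not study data).
samples = [
    (datetime(2022, 1, 3, 8), 2000.0),  # Monday
    (datetime(2022, 1, 8, 8), 1000.0),  # Saturday
    (datetime(2022, 2, 7, 8), 1500.0),  # Monday; February has no weekend data here
]
print(weekday_excess_by_month(samples))  # {1: 100.0}
```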

Fig. 5

Hourly (a) and day-of-week (b) average variations of BC concentrations at the APO and SCT sampling sites during 2022. Colored bands represent the 95% confidence interval over the mean.

Fig. 6

BC time series at the APO and SCT sampling sites: (a) monthly averages (colored bands represent the 95% confidence interval over the mean) and (b) weekend/weekday monthly averages.

Multivariate analysis

A total of 14 variables were included in the PCA to further explore the conditions accompanying BC variations. Three principal components (PCs) were extracted (Table 2), accounting for 94% and 83% of the variance observed at SCT and APO, respectively. Variables with absolute loadings greater than or equal to 0.30 were considered markers of each PC. At SCT, 69% of the variance is explained by the first PC, with high BC (-0.35), NO2 (-0.34), NO (-0.37), and WS (0.37) loadings. This PC can be associated mostly with freshly emitted air masses. The second PC presents high PM10 (-0.37), SO2 (-0.30), RH (0.42), and SR (-0.34) loadings and explains 20% of the variance; it can be associated preferentially with aged air masses. Finally, the third PC is related to P (0.64) and T (-0.45), suggesting local stagnation of pollutants. Interestingly, O3 and CO contribute relevantly to the first two PCs, as does PM2.5.
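The marker-selection rule (|loading| ≥ 0.30) can be sketched as a PCA of the correlation matrix of standardized variables. To stay dependency-free, this sketch uses power iteration with deflation instead of a library eigensolver; it assumes loadings are unit-norm eigenvector coefficients (some software scales them by the square root of the eigenvalue instead), and all function names are hypothetical. Loading signs are arbitrary, which is why the negative values in Table 2 are interpreted by magnitude.

```python
from math import sqrt


def standardize(columns):
    """z-score each variable; columns is a list of equal-length value lists."""
    z = []
    for col in columns:
        n = len(col)
        m = sum(col) / n
        sd = sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        z.append([(v - m) / sd for v in col])
    return z


def correlation_matrix(z):
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def leading_eigenpairs(c, k, iters=500):
    """Top-k eigenpairs of a symmetric matrix via power iteration with deflation."""
    p = len(c)
    c = [row[:] for row in c]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(p)]  # asymmetric start vector
        s = sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        # Remove the component just found so the next pass finds the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def pca_markers(columns, names, k=3, threshold=0.30):
    """[(explained_variance_fraction, {name: loading})] for the first k PCs,
    keeping only loadings with |loading| >= threshold."""
    pairs = leading_eigenpairs(correlation_matrix(standardize(columns)), k)
    n_vars = len(columns)  # trace of a correlation matrix
    return [
        (lam / n_vars, {n: loading for n, loading in zip(names, v) if abs(loading) >= threshold})
        for lam, v in pairs
    ]


# Toy data: A and B are perfectly correlated, C is uncorrelated with both.
report = pca_markers([[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [1, -1, 0, 0, -1, 1]],
                     ["A", "B", "C"], k=2)
```

With the 14 monitored variables as `columns`, the returned fractions would correspond to the per-PC explained variance and the dictionaries to the marker variables listed above.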

Table 2 Principal component loadings and cumulative explained variance at the SCT and APO monitoring sites.

Following similar arguments, at APO the first PC was related to O3 (0.46), SR (0.30), T (0.42), and WS (0.31) and explained 42% of the variance; this PC can be associated with secondary air pollutant production. The second PC was related to BC (-0.39), CO (-0.37), NO2 (-0.34), NO (-0.31), PM10 (-0.39), and PM2.5 (-0.34) and explained 34% of the variance; it can be associated with fresh emissions. Finally, the third PC was related to WD (-0.56), WS (-0.32), and RH (0.37), suggesting local dispersion of pollutants.

Of interest is that the SCT site seems to be dominated by fresh BC emissions and aged particulate matter, followed by local stagnation. In contrast, the variance at the APO site seems to be dominated by secondary pollutant production, while BC is grouped with CO, suggesting an incomplete-combustion origin, and the influence of O3, RH, and PM2.5 suggests photochemical aging. Therefore, higher BC concentrations are expected at APO than at SCT, owing to the stronger influence of fresh emissions at APO.

To support the PCA results, a cluster analysis was performed using the same variables. This analysis yielded groups equivalent to the PCs obtained from the PCA, supporting the inferred origin of the BC concentrations. The dendrogram obtained using data from both monitoring sites is presented in Figure S8.
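The clustering method is not detailed in this section, so the following is only a sketch of one common choice: average-linkage agglomerative clustering of variables with distance 1 − |r|, which, like the PCA markers, groups variables by the magnitude of their co-variation. The distance definition, the linkage, and the function names are assumptions. The recorded merge steps and their distances are what a dendrogram such as Figure S8 displays.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster_variables(data, n_clusters):
    """Average-linkage agglomerative clustering of variables.

    data: {name: list of values}; distance between two variables is 1 - |r|.
    Returns (merge steps as (cluster_a, cluster_b, distance), final clusters).
    """
    names = list(data)
    dist = {frozenset((a, b)): 1 - abs(pearson(data[a], data[b]))
            for a, b in combinations(names, 2)}
    clusters = [[n] for n in names]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pair_d = [dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j]]
            d = sum(pair_d) / len(pair_d)  # average linkage
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return merges, clusters


# Made-up series for illustration only (not study data).
data = {"BC": [1, 2, 3, 4, 5], "NO": [2, 4, 6, 8, 11],
        "O3": [5, 3, 4, 1, 2], "T": [10, 7, 9, 3, 4]}
merges, clusters = cluster_variables(data, 2)
```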

Estimation of αbb using ΔBC/ΔCO ambient data

As presented in Eq. 7, we propose estimating source contributions from fossil fuel combustion and biomass burning by representing the observed ambient ΔBC/ΔCO ratios as a linear combination of the source ratios. To accomplish this, we calculated the (0.5, 99.5), (1.0, 99.0), (1.25, 98.75), (2.5, 97.5), and (5.0, 95.0) percentile bound pairs of the ΔBC/ΔCO time series at the APO and SCT sites, using all available data. In doing so, we assume that the extremes of the distribution represent conditions under which relatively “pure” emissions from fossil fuel combustion (lower ratio values) or biomass burning (higher ratio values) prevail. Selecting these pairs also provides a formal method to choose the values of (ΔBC/ΔCO)ff and (ΔBC/ΔCO)bb. This approach is similar to the well-known EC tracer method, which attempts to discriminate between the primary and secondary organic components of atmospheric aerosols. One approach to obtaining the parameters of that model is to use the low end of the organic carbon (OC) to elemental carbon (EC) ratio distribution as a proxy for the primary (OC/EC) ratio73,74. The EC tracer method has been extended to also use the upper end of the OC/EC distribution (“high EC edge method”) in areas where the primary contribution to OC tends to be small75.
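Selecting candidate end members from the tails of the ratio distribution reduces to computing paired percentiles. The sketch below assumes the ΔBC/ΔCO series has already been computed (how ΔBC and ΔCO are derived is not restated here) and uses the linear-interpolation percentile convention that is also NumPy's default; the study's percentile convention is an assumption, and the names are hypothetical.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (NumPy's default 'linear' convention)."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


PAIRS = [(0.5, 99.5), (1.0, 99.0), (1.25, 98.75), (2.5, 97.5), (5.0, 95.0)]


def candidate_source_ratios(ratios):
    """Map each percentile pair to (ratio_ff, ratio_bb): the low tail is taken as the
    fossil-fuel end member and the high tail as the biomass-burning end member."""
    s = sorted(ratios)
    return {pair: (percentile(s, pair[0]), percentile(s, pair[1])) for pair in PAIRS}


# Synthetic ratios 0.0, 0.1, ..., 20.0 for illustration only (not study data).
bounds = candidate_source_ratios([i / 10 for i in range(201)])
```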

Table 3 presents the values of the pairs obtained. We assessed the sensitivity of these values by drawing random subsamples of the original time series, each containing 70% of the data (sampled without replacement). Ten random subsamples were created for each site, the percentile bounds were calculated for each, and the 95% confidence interval (CI) of the mean of each percentile bound was estimated (Table 3). In all cases, a t-test indicated no significant difference (p > 0.05) between the mean percentile bound estimated from the random subsamples and the value obtained from the whole dataset. In addition, the CIs of bounds belonging to the same tail do not overlap, indicating that the percentile bounds are robust.
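The robustness check can be sketched as follows, assuming a one-sample t-test of the ten subsample estimates against the full-data percentile, with the two-sided 95% critical value for 9 degrees of freedom hard-coded (valid only for ten subsamples). The seed, the helper names, and the use of Python's `random.Random.sample` are illustrative choices, not the study's code.

```python
import random
from math import sqrt
from statistics import mean, stdev

N_SUBSAMPLES = 10
T_CRIT_DF9 = 2.262  # two-sided 95% Student t critical value for df = N_SUBSAMPLES - 1 = 9


def percentile(sorted_values, q):
    """Linear-interpolation percentile (NumPy's default 'linear' convention)."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def subsample_check(ratios, q, fraction=0.7, seed=0):
    """Mean and 95% CI of the q-th percentile over random 70% subsamples drawn without
    replacement, plus a one-sample t-test against the full-data percentile.

    Returns (mean, (ci_low, ci_high), not_significant), where not_significant means
    |t| < T_CRIT_DF9, i.e. p > 0.05 (two-sided).
    """
    rng = random.Random(seed)
    k = round(fraction * len(ratios))
    estimates = [percentile(sorted(rng.sample(ratios, k)), q) for _ in range(N_SUBSAMPLES)]
    m = mean(estimates)
    se = stdev(estimates) / sqrt(N_SUBSAMPLES)
    full = percentile(sorted(ratios), q)
    not_significant = abs(m - full) < T_CRIT_DF9 * se
    return m, (m - T_CRIT_DF9 * se, m + T_CRIT_DF9 * se), not_significant


# Synthetic ratios for illustration only (not study data).
m, ci, ok = subsample_check([i / 10 for i in range(201)], 2.5)
```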

Table 3 ΔBC/ΔCO ratios at specific percentile pairs obtained from the time series of observed values at both monitoring sites. Values in parentheses represent the 95% confidence interval of the mean obtained from random samples of the whole time series (see text for details).

We then selected the 6–9 am periods to analyze the distribution of the ΔBC/ΔCO ratios and to estimate source contributions. In addition, given that previous results indicated that the best correlations between BC and CO were obtained under cold-dry and hot-dry climatic conditions, the analysis was conducted using January–April data. Figure 7 depicts the behavior of the ΔBC/ΔCO ratio with respect to wind speed. At SCT (Fig. 7a), a typical L-shaped pattern is obtained, with a wind speed threshold (the elbow of the curve) between 1 and 2 m/s. The highest ratio values are around 8.0, and the curve tends to settle at a minimum of around 0.5. At APO (Fig. 7b), a similar highest value was obtained. However, the L-shaped curve is less well defined, with ratios around 0.5 for wind speeds between 4 and 6 m/s, followed by dispersed observations at higher speeds, probably reflecting the influence of long-range transported air masses. No substantial differences were observed when the same analysis was performed separately for weekend and weekday data. We contrasted these results with a similar analysis for the 9 pm–12 midnight period, which could present analogous conditions, as illustrated in Fig. 5. For brevity, these results are presented in the SM (Figure S9). The L-shaped curve obtained for the SCT data has a smoother transition and again settles around a ratio of 0.5, with maximum values below 5.0 (Figure S9a). Results for the APO site do not present a clear pattern (Figure S9b). Thus, the SCT data are more consistent, and the 6–9 am period appears to better represent the ratio distribution. These results indicate that (ΔBC/ΔCO)ff could be around 0.5 (µg/m3)/ppmv, while (ΔBC/ΔCO)bb could be in the 6 to 8 range.
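The L-shaped ratio-versus-wind-speed curve in Fig. 7 can be summarized by filtering the morning January–April records and taking the median ratio per wind-speed bin. This sketch assumes hourly timestamps mark the start of the averaging hour, so "6–9 am" is taken as hours 6, 7, and 8; the 1 m/s bin width and the function name are hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def morning_ratio_by_wind(records, bin_width=1.0, hours=(6, 7, 8), months=(1, 2, 3, 4)):
    """Median dBC/dCO ratio per wind-speed bin for the selected hours and months.

    records: iterable of (datetime, wind_speed_m_s, ratio). Returns
    {bin_lower_edge_m_s: median_ratio}; the elbow of the L-shaped curve is where
    the medians level off.
    """
    bins = defaultdict(list)
    for ts, ws, ratio in records:
        if ts.hour in hours and ts.month in months:
            bins[int(ws // bin_width)].append(ratio)
    return {b * bin_width: median(v) for b, v in sorted(bins.items())}


# Made-up records for illustration only (not study data).
records = [
    (datetime(2022, 1, 10, 7), 0.5, 8.0),
    (datetime(2022, 1, 11, 6), 0.8, 6.0),
    (datetime(2022, 2, 1, 8), 3.2, 0.5),
    (datetime(2022, 2, 1, 14), 3.5, 9.0),  # afternoon: excluded
    (datetime(2022, 7, 1, 7), 0.4, 9.0),   # July: excluded
]
curve = morning_ratio_by_wind(records)
```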

Finally, we obtained the hourly percentage contributions of fossil fuel combustion and biomass burning sources using Eq. 7 for all the [(ΔBC/ΔCO)ff, (ΔBC/ΔCO)bb] pairs listed in Table 3 at each monitoring site, employing the 6–9 am data. These were regressed against the contributions calculated from the ratios used by others, as reported in the literature. The most consistent results were obtained with the (2.5, 97.5) percentile pair obtained from the SCT data. For example, regressing the contributions estimated with (ΔBC/ΔCO)ff = 0.4068 and (ΔBC/ΔCO)bb = 17.1927 against those estimated with (ΔBC/ΔCO)ff = 0.9 and (ΔBC/ΔCO)bb = 15.6, as reported by Xiao et al.29, yielded a slope of 0.88 and an intercept of 0.03 (R2 > 0.95) at both sites. All other pairs reported in Table 3 from either site gave slopes larger than 1.5 or smaller than 0.5. Moreover, as expected, using (ΔBC/ΔCO)ff = 0.9 produced a substantial number of negative contributions, given the number of observations falling below this value. Thus, given these and previous results, a robust selection of [(ΔBC/ΔCO)ff, (ΔBC/ΔCO)bb] could be (0.4, 17.2). These are average values, appropriate for the conditions analyzed. Further investigation would be required to obtain estimates at finer temporal resolution or if conditions vary substantially from year to year.
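Eq. 7 is not restated in this section, so the following is a sketch under the assumption of simple two-source mixing: ΔBC = (ΔBC/ΔCO)ff·ΔCOff + (ΔBC/ΔCO)bb·ΔCObb. The observed ratio is then the CO-weighted linear combination (1 − w)·Rff + w·Rbb, and the biomass-burning share of the BC increment is w·Rbb/R. Whether Eq. 7 reports w or this BC share is an assumption here, and the function names are hypothetical. Ratios below Rff give negative values, which is the behavior noted above for Rff = 0.9.

```python
def bb_weight(ratio, r_ff, r_bb):
    """CO-based weight w of the biomass-burning end member, from
    ratio = (1 - w) * r_ff + w * r_bb. Not clipped: ratios below r_ff give w < 0."""
    return (ratio - r_ff) / (r_bb - r_ff)


def bb_bc_share(ratio, r_ff, r_bb):
    """Share of the BC increment attributable to biomass burning: w * r_bb / ratio."""
    return bb_weight(ratio, r_ff, r_bb) * r_bb / ratio


def ols(x, y):
    """Slope, intercept and R^2 of the least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


# Illustrative hourly ratios in (ug/m3)/ppmv (made-up values, not study data).
ratios = [0.6, 1.2, 2.5, 4.0, 7.5, 12.0]
ours = [bb_bc_share(r, 0.4068, 17.1927) for r in ratios]
literature = [bb_bc_share(r, 0.9, 15.6) for r in ratios]
slope, intercept, r2 = ols(literature, ours)
```

Because the BC share is nonlinear in the observed ratio, two different end-member pairs generally do not map onto each other exactly, so the regression slope, intercept, and R2 summarize how closely they agree.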

Fig. 7

ΔBC/ΔCO ambient ratios as a function of wind speed for the SCT (a) and APO (b) sampling sites during the 6–9 am periods from January to April 2022.

Discussion

An issue that remains to be addressed is the appropriateness of the ΔBC/ΔCO ratios selected after applying the proposed methodology. Different authors have suggested ΔBC/ΔCO ratios based on bottom-up and top-down approaches. Table 4 summarizes some of the value ranges reported in the literature. Notably, the ranges can be quite large, highlighting the difficulty of establishing average values. Nevertheless, the ratios suggested for the MMA seem reasonable. For instance, the value of 0.4 for (ΔBC/ΔCO)ff is in the mid-range proposed for gasoline vehicles by Wang et al.68. Coal is not consumed in the MMA for industrial activities; this sector is limited to using natural gas. Diesel or fuel oil is used only sparingly in the productive sector. One major industrial source of BC is the Cadereyta refinery, located 40 km east of the MMA. Thus, most fossil fuel-related BC inside the conurbation could be attributed to the transportation sector. According to official data76, 75% of the vehicle fleet in the MMA consists of light-duty gasoline vehicles, SUVs, and trucks < 3.5 t, while 18% are diesel trucks, buses, and trucks > 3.5 t, and the remainder are motorcycles. In contrast, biomass burning (in the form of charcoal, wood, or generic biomass waste) is performed across the MMA and is associated with both domestic and commercial activities (e.g., charbroiling and other food preparation activities)77. In particular, ΔBC/ΔCO ratios from charcoal burning can be relatively high, even higher than those found in coal combustion78. Thus, the ΔBC/ΔCO ratio of the mix of biomass burnt could be well represented by the biofuel category reported in Table 4.

The estimated (ΔBC/ΔCO)ff and (ΔBC/ΔCO)bb source ratios could be compared with emission estimates to identify similarities and differences. However, a significant challenge in this respect is the lack of quantitative uncertainty assessments for the emission inventories of Mexican cities other than Mexico City79. The most recent report for the MMA indicates emissions of 436 metric tons of BC and 346,829 metric tons of CO per year from mobile sources76, which translates to roughly 1.1 (µg/m3)/ppmv68. However, the same report indicates 9,858 metric tons of BC and 3,027 metric tons of CO per year from area sources, which translates to approximately 2,600 (µg/m3)/ppmv. This highlights the need for further research on reconciling bottom-up and top-down BC emission inventories.

Table 4 ΔBC/ΔCO ratios reported in the literature for different emission sources.

The percentage contributions of fossil fuel combustion and biomass burning obtained using 0.4 and 17.2 as the values of (ΔBC/ΔCO)ff and (ΔBC/ΔCO)bb, respectively, were used to estimate hourly values of αbb. This step was performed using 6–9 am data from the January–April SCT records, following the previous steps of the methodology. The median value obtained for αbb was 2.34 (95% CI: 2.21, 2.45), while the mean was 2.53 (95% CI: 2.39, 2.67). The distribution of the hourly αbb values is not normal but skewed to the right (the mean exceeds the median); thus, we elected to continue the analysis using the median, as it provides a conservative alternative to αbb = 2.0. Figure 8 presents the biomass-burning source contribution estimates obtained using αff = 1.0 and αbb = 2.0, as suggested in the literature, and using αbb = 2.34, as derived here from the ambient ΔBC/ΔCO analysis. On average, the biomass-burning contribution decreased by 7.6% at the SCT site and 5.7% at the APO site. As expected, the temporal profiles of the contributions do not change; only their magnitudes do. The highest contributions occur during the cold months and decrease during summer. The SCT site presents higher biomass-burning contributions than the APO site, probably related to the accumulation of commercial and domestic emissions released near, or transported to, the monitoring site. In particular, transport of basin-wide emissions is relevant, as depicted in Fig. 9 and highlighted in Sect. 3. Higher ΔBC/ΔCO ratios are observed at SCT from the prevailing wind sector, whereas at APO the observed ratios are more evenly distributed across all wind sectors that impact the site.
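The link between the mixing-based contributions and αbb can be sketched with the standard two-component aethalometer model, in which each source's absorption scales with wavelength as λ^(−α). The forward function splits measured absorption into fossil and biomass parts for given αff and αbb; the inverse solves for the αbb implied by an externally estimated biomass share. This assumes the 470/950 nm wavelength pair and that the ΔBC/ΔCO-based share refers to absorption at the longer wavelength; both are assumptions, and the function names are hypothetical.

```python
from math import log
from statistics import median

# Assumed wavelength pair (nm); the pair used in the study is not restated here.
L_SHORT, L_LONG = 470.0, 950.0


def bb_fraction(b_short, b_long, a_ff=1.0, a_bb=2.0):
    """Aethalometer model: biomass-burning share of absorption at L_LONG.

    Solves b_long = ff + bb and b_short = k_ff * ff + k_bb * bb,
    where k = (L_SHORT / L_LONG) ** (-alpha) for each source.
    """
    k_ff = (L_SHORT / L_LONG) ** (-a_ff)
    k_bb = (L_SHORT / L_LONG) ** (-a_bb)
    bb = (b_short - k_ff * b_long) / (k_bb - k_ff)
    return bb / b_long


def alpha_bb(b_short, b_long, f_bb, a_ff=1.0):
    """Invert the model: biomass-burning AAE implied by an external estimate f_bb
    (e.g., from the dBC/dCO mixing step) of the biomass share at L_LONG."""
    ff_short = (1.0 - f_bb) * b_long * (L_SHORT / L_LONG) ** (-a_ff)
    bb_short = b_short - ff_short  # absorption left for biomass burning at L_SHORT
    return -log(bb_short / (f_bb * b_long)) / log(L_SHORT / L_LONG)


# Round trip with made-up numbers: build b_short from a known alpha_bb = 2.34.
b_long, f_true = 10.0, 0.3
b_short = ((1 - f_true) * b_long * (L_SHORT / L_LONG) ** -1.0
           + f_true * b_long * (L_SHORT / L_LONG) ** -2.34)
recovered = alpha_bb(b_short, b_long, f_true)

# Hourly estimates are then summarized by their median (made-up triples).
hourly = [(25.0, 10.0, 0.30), (18.0, 8.0, 0.25)]
alpha_median = median(alpha_bb(b_s, b_l, f) for b_s, b_l, f in hourly)
```

Raising αbb from 2.0 to a larger value in `bb_fraction` lowers the biomass share computed from the same absorption data, which is the direction of the change reported for Fig. 8.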

Fig. 8

Monthly average biomass-burning emission contributions at the SCT site (upper panel) and APO site (lower panel). Colored bands represent the 95% confidence interval over the mean.

Fig. 9

BC concentration roses and ΔBC/ΔCO ambient ratios for the SCT (a and a’, respectively) and APO (b and b’, respectively) sampling sites.

Given the narrow range of the (ΔBC/ΔCO)ff values reported in Table 3, a sensitivity test was conducted in which (ΔBC/ΔCO)bb was fixed at 17.2 and (ΔBC/ΔCO)ff was varied from 0.28 to 0.51 (i.e., the minimum and maximum values reported for SCT in Table 3). APO values were not used, based on the results presented in Sect. 3.4. For each (ΔBC/ΔCO)ff value tested, the corresponding αbb was obtained and the source contributions were re-estimated. The resulting αbb values ranged from 2.30 to 2.37. For APO, the maximum difference in the monthly biomass-burning source contribution between using αbb = 2.34 and αbb in the range 2.30–2.37 was marginal (-0.42%). A similar result was obtained for the SCT site (-0.54%). These results indicate that the source contribution estimates are not very sensitive to small changes in (ΔBC/ΔCO)ff.

The estimated value of 2.34 for αbb is higher than the commonly used value of 2.0. However, reported αbb values as high as 4.4 are not uncommon80,81,82. Recently, Savadkoohi et al.83 reported αbb values in the range of 2.4 to 3.0 for the Los Angeles Basin (LAB), California, using a time-resolved multi-species receptor modeling approach combined with aethalometer data. This result is relevant given the relative geographic proximity between the LAB and the MMA and the fact that they share some common characteristics of their major emission sources. Moreover, Acuña Askar et al.84 analyzed PM2.5 samples obtained at one site in the MMA to determine their optical properties. The AAE values they obtained for water-soluble organic carbon (WSOC) extracts in the 300–600 nm wavelength range for winter (7.33 ± 1.02) and summer (6.32 ± 0.78) indicate an important contribution of brown carbon, supporting the estimation of a higher αbb.

Other studies have reported AAE values for some Mexican cities. Even though no source separation is provided, comparing those values with the results obtained here is relevant. For example, Lara et al.85 reported average AAE values of 2.15 ± 0.68 and 3.12 ± 1.4 for winter and summer conditions, respectively, in Ciudad Juárez. This northern city shares the international border with El Paso (Texas), a metropolitan region with intense transport activity (including heavy-duty diesel vehicles crossing the border) as well as commercial and industrial emissions. On the other hand, Retama et al.86 reported a two-year average AAE of 1.19 ± 0.14 for Mexico City, which increased to 1.57 ± 0.25 during a 10-day period in which regional wildfires constantly impacted the city. Mexico City has the strictest emission control strategies of the three conurbations. Thus, the proposed approach yields consistent AAE values based on top-down (ΔBC/ΔCO)ff and (ΔBC/ΔCO)bb estimates, which also provide guidance on possible areas of improvement in emission inventory estimates.

Conclusions

The “aethalometer model” is a method that has been used extensively to provide source contribution estimates of BC emitted from fossil fuel combustion and biomass burning. However, the method assumes fixed AAE values for both sources, an assumption that, as demonstrated in the literature, does not necessarily hold, particularly for biomass burning. We therefore propose using ambient ΔBC/ΔCO ratio data to complement the “aethalometer model,” providing an effective way to refine source contribution estimates, particularly by estimating region-specific values of the biomass-burning AAE. By implementing the proposed methodology, we obtained a biomass-burning AAE of 2.34 (95% CI: 2.21, 2.45), higher than the commonly used value of 2. This result implies non-negligible reductions, of 6 to 8%, in the annual average biomass-burning contribution to BC emissions in the MMA. One benefit of the approach is that only routine air quality measurements are required, making it rather inexpensive compared with approaches that rely on chemical speciation analyses. On the downside, the method is still limited because it assumes fixed average AAE values for fossil fuel combustion and biomass burning.