Abstract
Particulate matter (PM) is among the most hazardous forms of atmospheric pollution, posing significant risks to human health by triggering or worsening numerous heart, brain, and lung ailments, and even increasing the likelihood of cancer and premature mortality. Accurate monitoring of PM levels is therefore of paramount importance, particularly in densely populated urban zones. However, precise measurement of PM concentration requires bulky and costly equipment, typically stationed at widely spaced reference sites. Low-cost PM sensors have been gaining popularity as potential substitutes, although their reliability is hampered by manufacturing flaws, instability, and susceptibility to environmental variations. In this work, we introduce a novel approach to field calibration of low-cost PM sensors. Our method integrates multiplicative and additive corrections, with coefficients determined by an artificial neural network (ANN) surrogate. The ANN model takes environmental parameters and the sensor’s PM readings as inputs, and its architecture is tuned to ensure the best generalization capability. Additionally, we consider an extended set of input parameters, including local temporal changes of environmental variables and short sequences of previous low-cost sensor readings, to further enhance calibration reliability. We validate our technique using non-stationary measurement equipment alongside reference data acquired by government-approved reference stations in Gdansk, Poland. The obtained coefficients of determination reach 0.89 for PM1, 0.87 for PM2.5, and 0.77 for PM10, while the corresponding root mean square error (RMSE) values are 3.0, 3.9, and 4.9 µg/m³. Such performance positions the calibrated low-cost sensor as a potential alternative to stationary measurement equipment.
Introduction
Polluted air worsens citizens’ wellbeing and increases morbidity. Some works estimate that nearly nine million premature deaths per year are caused by atmospheric contamination1,2. The European Environment Agency (EEA)3 considers polluted air in urban areas the predominant environment-related health concern, leading to chronic diseases and increased mortality4. Most of the current research on particulate matter (PM) pollutants concentrates on fine PM2.5 particles of diameter below 2.5 μm. This is because small-diameter particles are deemed most harmful: once inhaled, they penetrate considerably deeper into the lungs than larger particles5. Multiple reports consistently link PM2.5 with increased incidence of cardiac conditions6,7, cancer8,9, and premature births10. In 2020, according to the World Health Organization (WHO), 96% of the European Union’s urban population experienced PM2.5 concentrations surpassing the admissible level of 5 micrograms per cubic meter (µg/m3)3. The primary sources of PM2.5 include household combustion, traffic-related sources, as well as mineral dust emitted at construction sites and high-temperature industrial processes (especially steel processing)11.
At present, stationary government-approved reference stations are employed for precise monitoring of PM2.5. The most common measurement techniques are gravimetric methods, where the particles are deposited on filters and then stabilized. The particle mass is then quantified by accredited laboratories, with the filters weighed before and after sampling. Despite being highly accurate, this method is expensive and time-consuming. Moreover, it generates data sets of unsatisfactory spatial and temporal resolution, making it difficult to assess air PM2.5 concentrations in their entirety and complexity12,13. As a result, high-cost government facilities are nowadays predominantly used as a benchmark for calibrating data collected with affordable low-cost sensors (LCS). This has instigated the development of numerous field calibration techniques.
LCSs hold significant potential for assessing exposure to ambient pollutants with enhanced spatial density14,15. The benefits of using LCSs include enhancement of spatial coverage, reduced energy consumption, straightforward operation and easy maintenance, as well as relocation flexibility. LCSs may be employed independently or as supplements to existing government stations16. They may also be integrated into dense stationary networks17, deployed within vehicular networks18,19, or utilized as wearable instruments20,21,22 for personal observation.
Inexpensive devices predominantly utilize optical measurement techniques, which provide only an approximation of the mass concentrations directly measured by reference facilities. The drawbacks of optical particle sensors include decreased accuracy, inferior dependability, measurement inconsistency, and the need for calibration23,24,25,26. The operating principle of cost-effective commercial PM sensors is based on rapid yet imprecise light scattering27. Hence, significant discrepancies are observed between LCS readings and those of reference stations28. One of the reasons is that increased relative humidity may lead to hygroscopic particle growth, resulting in overestimation of the dry mass29,30,31. The inability of LCSs to detect particles of diameters below a threshold value poses another challenge. Moreover, research conducted in laboratory-controlled settings indicates that the accuracy of measured pollutant concentrations varies considerably across optical sensors32. Thus, LCS-acquired data needs to undergo thorough calibration.
Recent years have seen a considerable growth in the number of calibration techniques. One of the simplest approaches is linear regression, where sensor readings are utilized as the sole input33,34,35. Multivariate linear regression is slightly more involved, as it includes supplementary input variables (e.g., temperature or humidity) in the calibration process36,37,38,39. An alternative method involves the gain-offset model40,41,42, which accounts for additive and multiplicative bias. Still, the aforementioned straightforward techniques fail to address the nonlinearities of sensors43, which can be successfully handled by machine learning (ML) approaches44,45,46. The reported frameworks include random forest47,48, support vector regression49,50, and gradient boosting methods51,52. Recently, correction techniques using neural networks (NNs) have been growing in popularity, including feedforward NNs53, long short-term memory NNs54,55, recurrent NNs56,57, and convolutional NNs58,59.
This study aims to introduce a novel methodology for reliable calibration of low-cost PM sensors. Our method involves a joint multiplicative and additive sensor correction, with coefficients determined by an artificial neural network (ANN) surrogate in the form of a multi-layer perceptron (MLP). Environment-related data (temperature, humidity, and atmospheric pressure) and the current PM reading from the LCS serve as inputs for calibration. A dedicated hyper-parameter controls the balance between multiplicative and additive scaling; it is optimized alongside the MLP architecture during surrogate training to enhance model generalization. To augment correction reliability, supplementary input variables are included, namely temporal changes of environmental variables and short time sequences of previous sensor measurements. These additions enable the MLP surrogate to learn typical temporal dependencies between environmental parameters and PM sensor outputs, ultimately improving calibration reliability. Reference data was collected from multiple government monitoring facilities across the city of Gdansk. As demonstrated, all components of our calibration framework contribute to its performance.
Sensor hardware and software
This section describes a custom-designed measurement platform utilizing affordable PM sensors for outdoor air pollution monitoring. The included tailored hardware and software facilitated data acquisition to compile a comprehensive dataset from the sensors. The procedure for collecting reference measurements is discussed in Sect. Collecting Reference Data. Section ML-based Sensor Correction elucidates the PM sensor calibration approach.
The primary hardware module is the Beaglebone® Blue microprocessor board60 tailored for robotic applications. This board meets the project’s needs due to robust processing capabilities and numerous connectivity options. It is a compact device (87 × 55 mm) featuring an ARM Cortex-A8 processor equipped with 64KB RAM. The board also incorporates a circuit for charging a 2-cell Li-Po battery, enabling the platform to operate autonomously for about 24 h with a 7.4 V, 4400mAh battery. For prolonged experimentation, an external power source ranging from 9 V to 18 V DC can be connected. Long-distance communication, essential in field testing and calibration, was ensured by employing a compact modem61 incorporating a GPS module for geolocation purposes.
The low-cost sensor of choice was a Sensirion SPS30 device62, selected for its affordability, electrical characteristics, and size. At the time of equipment development, the availability of low-cost PM sensors with sufficiently high declared quality was limited, so the SPS30, which met the requirements, was used. Evaluation of newer sensors as they become available is planned as future work. The SPS30 uses an optical method to measure PM: a fan moves air through a laser beam, and a detector registers the light scattered by the particles. The sensor can measure PM concentrations from 0 to 1000 µg/m3 and offers specific accuracies for different PM sizes. The measurement system runs on Ubuntu Linux version 18.04 LTS and features two layers of software: (i) drivers implemented in C and Python for interfacing the GSM modem, along with the PM and environmental sensors, and (ii) the primary software (Python), which manages the system’s overall synchronization and operation.
Designed for outdoor use, the platform is encased in a weather-resistant enclosure, as seen in Fig. 1. All components are mounted on a PET-G chassis, produced using 3D printing with fused deposition modeling (FDM) technology, which also was used to create the enclosure and mounting bracket. This design facilitates maintenance by providing convenient access to inner components.
Collecting reference data
Several air quality monitoring stations were established by the municipality of Gdansk, Poland, and are operated by the ARMAG foundation (Agencja Regionalnego Monitoringu Atmosfery Gdansk-Gdynia-Sopot; English: Agency of Regional Air Quality Monitoring in Gdansk metropolitan area)63. The equipment operated by ARMAG serves as the official reference for the entire city area, and no alternative PM data sources are currently available. These stations are kept in air-conditioned containers, and their equipment includes professional-grade devices for automated measurement of key air pollutants: PM1, PM2.5, PM10, carbon monoxide, and nitrogen oxide, alongside environmental parameters (temperature, atmospheric pressure, direction and speed of wind, precipitation, and humidity). For particulate matter quantification, the stations employ GRIMM #180 Environmental Dust Monitors, utilizing the 90° laser light scattering technique. The measurements procured from these stations constitute the reference dataset for the calibration of low-cost PM sensors. Table 1 provides the important details of the GRIMM analyzer. The measurements are taken hourly and are disseminated daily via the ARMAG foundation’s website. To amass data gathered over extended periods, a Python script was devised to automate the extraction of data from ARMAG’s website into CSV format.
The hardware units employing low-cost PM sensors were mounted at ARMAG’s reference stations (cf. Fig. 2) for a period of about two months (March to May 2023). The units used GSM modems to transfer measurement data to the cloud, from where the data was retrieved in CSV format containing raw sensor readings. This dataset, in conjunction with the CSV file comprising reference data, constituted the foundation for the development of the correction model elaborated on in Sect. ML-based Sensor Correction.
Geographic distribution of ARMAG’s air quality monitoring stations in Gdansk, Poland. Map from OpenStreetMap64.
ML-based sensor correction
This section outlines the proposed technique for LCS calibration. The section is structured as follows: Sect. Calibration Task provides a comprehensive formulation of the sensor calibration task. Joint multiplicative and additive scaling is explained in Sect. Affine Output Scaling. Section ANN Calibration Model delves into the calibration model in the form of the artificial neural network and discusses auxiliary calibration inputs. The operational flow of the entire framework is detailed in Sect. Calibration Procedure.
Calibration task
The data collected by both reference stations and LCSs is illustrated in Fig. 3. In this work, we consider particulate matter (PM) pollutants, with reference readings denoted as PMr.x and LCS readings as PMs.x. The subscript x refers to a particular type of PM, i.e., 1, 2.5, or 10 (ultrafine, fine, coarse; size in µm). The LCS also yields environmental parameters such as temperature, humidity, and atmospheric pressure. Distinct sensors are employed to measure external and internal conditions, which differ due to heating by the electronic devices embedded in the hardware unit. Given the sensor’s sensitivity to temperature and humidity, both internal and external parameters are considered as calibration inputs to improve the dependability of the calibration process. The notation employed to represent the outputs of the reference stations and LCSs is consolidated in Fig. 3(c).
We will denote as N the total number of collected data samples. The dataset is divided into the training and testing parts of the sizes Nb and Nt samples, respectively; Nt is set to be about 20% of N. The following notation will be employed:
- PMr.x(b.j), j = 1, …, Nb – reference training samples;
- PMs.x(b.j), j = 1, …, Nb – LCS training samples (PM values);
- v(b.j), j = 1, …, Nb – LCS training samples (environmental data);
- PMr.x(t.j), j = 1, …, Nt – reference testing samples;
- PMs.x(t.j), j = 1, …, Nt – LCS testing samples (PM values);
- v(t.j), j = 1, …, Nt – LCS testing samples (environmental data).
The notation used for the calibration model is C(PMs.x,v;p), with p referring to the aggregated parameter vector (e.g., weights of the NN model, see Sect. ANN Calibration Model). The surrogate yields predicted output of the calibrated LCS. Identification of parameters p is carried out to better the alignment between LCS data and the reference readings. The optimal vector p* of parameters of calibration model is obtained by minimizing the loss function (mean square error, MSE):
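In the notation introduced above, the MSE loss over the training set can be written as follows. The 1/N_b normalization is an assumption; it does not affect the minimizer.

```latex
\mathbf{p}^{*} = \arg\min_{\mathbf{p}} \; \frac{1}{N_b} \sum_{j=1}^{N_b}
\left[ PM_{r.x}^{(b.j)} - C\!\left( PM_{s.x}^{(b.j)}, \mathbf{v}^{(b.j)}; \mathbf{p} \right) \right]^{2}
```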
Low-cost sensor and reference station outputs: (a) reference PMx readings; (b) LCS PMx readings. The supplementary data includes: external temperature and humidity, To, and Ho, respectively, internal temperature and humidity, Ti, and Hi, respectively, as well as atmospheric pressure P; (c) utilized notation; (d) definitions of performance metrics: correlation coefficient, and RMSE.
The corrected LCS reading equals
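Using the symbol PM_{c.x} for the corrected reading (a label assumed here for illustration; the consolidated notation is given in Fig. 3(c)), this can be expressed as the calibration model evaluated at the optimal parameter vector:

```latex
PM_{c.x} = C\!\left( PM_{s.x}, \mathbf{v}; \mathbf{p}^{*} \right)
```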
We use two performance metrics: the coefficient of determination between the corrected LCS and reference data, and the RMSE (cf. Fig. 3(d) for definitions). Both assess the closeness of the sensor readings to the reference data.
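As an illustration, both metrics can be computed as in the minimal sketch below. It uses the standard textbook definitions (RMSE, and the coefficient of determination as 1 − SS_res/SS_tot); the exact definitions adopted in Fig. 3(d) may differ, for example a squared correlation coefficient, so this should be read as an assumption rather than the study's implementation. The function names are hypothetical.

```python
import math


def rmse(reference, corrected):
    """Root mean square error between reference and corrected readings."""
    n = len(reference)
    return math.sqrt(sum((r - c) ** 2 for r, c in zip(reference, corrected)) / n)


def r_squared(reference, corrected):
    """Coefficient of determination, 1 - SS_res / SS_tot (standard definition)."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - c) ** 2 for r, c in zip(reference, corrected))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    ref = [10.0, 14.0, 9.0, 20.0]   # made-up reference values, in µg/m3
    cor = [11.0, 13.0, 9.5, 18.0]   # made-up corrected LCS values
    print(rmse(ref, cor), r_squared(ref, cor))
```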
Affine output scaling
The employed sensor calibration scheme involves affine response scaling with joint additive and multiplicative corrections. By additive correction, we understand a functional dependence between the uncorrected (raw) and calibrated sensor readings in which a correction term is added to the former. Multiplicative correction, in turn, consists of multiplying the sensor’s reading by a correction term. The multiplicative correction is applied first, followed by the additive one. The balance between these correction types is governed by an adjustable coefficient.
Figure 4 summarizes the proposed correction approach. Because the LCS output PMs.x is scalar, determination of correction coefficients (multiplicative and additive ones) is non-unique. To ensure uniqueness, we introduce hyper-parameter α, controlling the balance between both types of scaling. This parameter will be optimized simultaneously with identification of the neural network calibration model as elaborated on in Sect. ANN Calibration Model.
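One way to make the role of α concrete is sketched below. The split used here is an assumption and is not taken from Fig. 4: the additive term is assigned the fraction α of the raw discrepancy, and the multiplicative term absorbs the remainder. This choice is consistent with α = 1 giving a purely additive correction, and with the multiplicative correction being applied before the additive one. The function names are hypothetical.

```python
def correction_targets(pm_ref, pm_raw, alpha):
    """Split the discrepancy between reference and raw readings.

    Assumed convention (illustrative only): the additive term takes the
    fraction `alpha` of the raw error and the multiplicative term absorbs
    the rest, so alpha = 1 is purely additive and alpha = 0 is purely
    multiplicative.
    """
    if pm_raw == 0:
        raise ValueError("raw reading must be non-zero for multiplicative scaling")
    a_add = alpha * (pm_ref - pm_raw)
    a_mul = (pm_ref - a_add) / pm_raw
    return a_mul, a_add


def apply_correction(pm_raw, a_mul, a_add):
    """Multiplicative correction first, then the additive one."""
    return a_mul * pm_raw + a_add


if __name__ == "__main__":
    # Made-up values: reference 12 µg/m3, raw sensor reading 20 µg/m3.
    for alpha in (0.0, 0.5, 1.0):
        a_mul, a_add = correction_targets(12.0, 20.0, alpha)
        print(alpha, a_mul, a_add, apply_correction(20.0, a_mul, a_add))
```

For any α, the pair of coefficients reproduces the reference value exactly on the sample it was derived from, which illustrates why the two coefficients are not unique without fixing α.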
ANN calibration model
In our work, LCS calibration utilizes artificial neural networks65. Specifically, we employ a feedforward ANN in the configuration of a multi-layer perceptron66,67. With three fully-connected hidden layers, the MLP is sufficiently flexible while remaining resilient to overfitting, an essential characteristic given the significant dissimilarities between the reference and LCS measurements. In addition to the conventional training of network weights, we also optimize the number of neurons in the hidden layers, along with the affine scaling coefficient α discussed in Sect. Affine Output Scaling.
The training setup of the MLP correction model is as follows: the model is trained using the backpropagation Levenberg–Marquardt routine68 (sigmoid activation function, max. 1000 epochs, MSE loss function, and a random training/testing data split). The model’s inputs consist of environmental parameters (represented as vector v) and the particulate matter measurement from the sensor (PMs.x). The model outputs the correction coefficients Aa and Am.
Owing to its architectural simplicity, the network effectively mitigates the inherent noise in PM measurements. Moreover, with quick model training (completed within a few seconds), it becomes feasible to implement a nested optimization procedure. This procedure enables testing of various architectural arrangements and adjusting relevant hyper-parameters to enhance the model’s generalization capability.
Figure 5 presents the calibration model’s identification process involving optimization of the model’s hyper-parameters. Notably, an exhaustive search is conducted within a discrete space of hyper-parameters, encompassing multiple combinations of ANN architectures and coefficient α. For each hyper-parameter vector H, the network is trained fifty times, and the best-performing model is chosen. Numerous training runs are essential due to the random (internal) partitioning of data into training and testing samples.
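The nested search described above can be sketched as follows. `train_once` is a hypothetical stand-in that returns a synthetic loss, not the Levenberg–Marquardt training used in the study. Reading the entries of H as the number of hidden layers, the number of neurons per layer, and α is likewise an assumption, and the candidate values are arbitrary.

```python
import itertools
import random


def train_once(n_layers, n_neurons, alpha, rng):
    """Hypothetical stand-in for a single MLP training run.

    Returns a synthetic loss so the search loop can be executed. A real
    implementation would train the network on a random internal split of
    the data and return the achieved loss.
    """
    noise = rng.uniform(0.0, 0.05)  # mimics run-to-run variation
    return (n_neurons - 20) ** 2 / 400 + (alpha - 0.5) ** 2 + 0.01 * abs(n_layers - 3) + noise


def exhaustive_search(layer_options, neuron_options, alpha_options, n_runs=50, seed=0):
    """Try every combination H = (n_layers, n_neurons, alpha); keep the best run."""
    rng = random.Random(seed)
    best_loss, best_h = None, None
    for h in itertools.product(layer_options, neuron_options, alpha_options):
        for _ in range(n_runs):
            loss = train_once(*h, rng)
            if best_loss is None or loss < best_loss:
                best_loss, best_h = loss, h
    return best_loss, best_h


if __name__ == "__main__":
    print(exhaustive_search([1, 2, 3], [10, 20, 30], [0.0, 0.5, 1.0]))
```

Because each training run is assumed to take only seconds, the cost of the full grid (number of combinations times the number of repetitions) remains manageable.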
Auxiliary calibration inputs
For calibration process enhancement, the standard set of calibration model inputs, i.e., environmental variables (vector v) and PM measurements collected by LCS (PMs.x) will also include differentials of the environmental parameters summarized in Table 2. More specifically, we will consider the following quantities Δx = [x(0) – x(–dt)]/dt, with x(t) referring to a parameter measured by the sensor, whereas dt is the time gap between acquiring the readings (here, one hour for both the reference stations and LCS). If the collected differential vector Δv is used as an auxiliary calibration model input, the correction model is referred to as C(PMs.x,v,Δv;p).
Evaluation of differentials requires storing only one additional set of measurements, namely, one previous sensor reading. Nevertheless, they capture the local temporal changes in environmental conditions, potentially aiding in the prediction of future changes. Additionally, they may enhance understanding of the dynamics of explicit or implicit factors influencing LCS operation.
During ANN model identification, the employment of differentials results in extension of the training dataset by adding differential vectors Δv(b.j), j = 1, …, Nb, corresponding to all original training samples. Because the samples are allocated sequentially in time, i.e., the sample with index j has been acquired dt later than the sample with index j – 1, we simply have Δv(b.j) = [v(b.j) – v(b.j–1)]/dt, j = 1, …, Nb.
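The construction of the differential vectors from sequential samples can be sketched as below. The ordering of the environmental variables in each vector and the handling of the first sample, which has no predecessor and is marked with `None` here, are assumptions made for illustration.

```python
def differentials(samples, dt=1.0):
    """Finite differences of consecutive environmental vectors.

    `samples` is a time-ordered list of equal-length lists (one vector v per
    reading) and `dt` is the sampling interval (one hour in the text). The
    first sample has no predecessor, so its differential is set to None.
    """
    if not samples:
        return []
    result = [None]
    for prev, curr in zip(samples, samples[1:]):
        result.append([(c - p) / dt for c, p in zip(curr, prev)])
    return result


if __name__ == "__main__":
    # Made-up readings: [temperature in °C, humidity in %, pressure in hPa]
    v = [[10.0, 50.0, 1000.0], [12.0, 48.0, 1001.0], [11.0, 49.0, 1001.0]]
    print(differentials(v))
```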
Procedure of identifying the correction model with the meta hyper-parameters H = [NL ML α]T simultaneously adjusted: (a) MLP model hyper-parameters and their optimization setup; (b) flowchart of the entire procedure. The ANN model training is carried out for each meta hyper-parameter combination to find the optimal configuration and the best value of the scaling coefficient α.
Another type of auxiliary calibration input is a short time series of previously collected PMx measurements provided by the LCS. More specifically, we consider vectors of the form wK = [PMs.x(–K·dt) … PMs.x(–2dt) PMs.x(–dt)]T,
where K refers to the time series length, and dt denotes the inter-measurement time interval. Time series are often handled using recurrent neural networks (RNN)69. Here, however, only fixed-length series are considered, which makes feedforward networks a sufficient and simpler tool. It can also be noted that using K = 1 is equivalent to employing a differential of the primary reading PMx. If the time series vector wK is used as an auxiliary calibration model input, the correction model may be referred to as C(PMs.x,v,wK;p), or C(PMs.x,v,Δv,wK;p) if both wK and differentials are utilized.
The reason for integrating the time series data is to enable the calibration model to understand the typical temporal variations in the LCS readings, including its dependency on environmental conditions. This may enhance the reliability of predicting the values of the correction coefficients Aa and Am.
For the purpose of calibration model identification, the training dataset needs to be extended by adding vectors wK(b.j), j = 1, …, Nb, corresponding to all Nb original samples. Because the sample indices are in correspondence with the timestamps of respective measurements, i.e., sample with index j – 1 was acquired dt earlier than sample with index j, sample with index j – 2 was taken dt earlier than that with sample j – 1, and so on, the vector wK(b.j) takes the form of wK(b.j) = [PMs.x(b.j–K) PMs.x(b.j–K+1) … PMs.x(b.j–2) PMs.x(b.j–1)]T.
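The assembly of the vectors wK(b.j) from a time-ordered list of raw readings can be sketched as follows. How samples with fewer than K predecessors are treated is not stated in the text; marking them with `None` is an assumption, and the function name is hypothetical.

```python
def lag_vectors(pm_raw, k):
    """Return w_K for each sample j: the K raw readings preceding sample j.

    The entries are ordered from oldest to newest, i.e.
    [PM(j-K), ..., PM(j-2), PM(j-1)]. Samples with fewer than K
    predecessors are marked with None (an assumed convention).
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [pm_raw[j - k:j] if j >= k else None for j in range(len(pm_raw))]


if __name__ == "__main__":
    pm = [5.0, 6.0, 7.0, 8.0, 9.0]  # made-up hourly raw readings, in µg/m3
    print(lag_vectors(pm, 2))
```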
Calibration procedure
Figure 6 illustrates the PMx measurement process utilizing the corrected LCS readings. The initial step involves preparing the calibration inputs, which, if utilizing differentials and time series, necessitates accessing prior sensor readings stored in a memory unit. Subsequently, in the second step, the MLP calibration model generates predicted correction coefficients, which are subsequently employed to calculate the corrected outputs of the LCS. It should be mentioned that the work70 discusses an alternative calibration procedure employing ANN and time series alignment, where the calibrated model output is produced directly by the surrogate. In contrast, the technique proposed here is primarily based on a combination of multiplicative and additive corrections with an optimizable control factor, which enhances the calibration process flexibility.
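The two-step flow described above (input preparation from stored readings, then coefficient prediction and correction) can be illustrated with the minimal sketch below. The class, the order of the assembled inputs, and the `predict_coefficients` callable are hypothetical; the callable stands in for the trained MLP and is assumed to return the pair (Am, Aa).

```python
from collections import deque


class CalibratedSensor:
    """Keeps the last K raw PM readings and the previous environmental vector."""

    def __init__(self, predict_coefficients, k=3, dt=1.0):
        self.predict = predict_coefficients  # hypothetical model: inputs -> (a_mul, a_add)
        self.k = k
        self.dt = dt
        self.history = deque(maxlen=k)       # previous raw PM readings (memory unit)
        self.prev_env = None                 # previous environmental vector

    def step(self, pm_raw, env):
        """Return the corrected reading, or None while the memory is still filling."""
        corrected = None
        if self.prev_env is not None and len(self.history) == self.k:
            # Step 1: prepare inputs (current reading, v, differentials, time series).
            d_env = [(c - p) / self.dt for c, p in zip(env, self.prev_env)]
            inputs = [pm_raw, *env, *d_env, *self.history]
            # Step 2: predict coefficients and apply the affine correction.
            a_mul, a_add = self.predict(inputs)
            corrected = a_mul * pm_raw + a_add  # multiplicative, then additive
        self.history.append(pm_raw)
        self.prev_env = list(env)
        return corrected


if __name__ == "__main__":
    sensor = CalibratedSensor(lambda inputs: (0.5, 1.0), k=2)  # dummy model
    for pm, env in [(10.0, [20.0, 40.0]), (12.0, [21.0, 41.0]), (14.0, [23.0, 40.0])]:
        print(sensor.step(pm, env))
```

The same routine could run either on the platform itself or on the receiving side after the raw readings have been transmitted.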
Results and discussion
Here, we focus on validating the introduced calibration strategy, applied to the mobile measurement platform. The field calibration process utilizes reference data collected by the equipment of governmental stations outlined in Sect. Collecting Reference Data, along with the LCS data obtained from portable platforms situated near the respective reference stations. Section Training and Testing Data explores the reference and LCS data, including their partitioning into training and testing sets. The experimental setup is explained in Sect. Results, detailing the employed scenarios. Key points of investigation include the effects of affine versus additive-only correction, MLP architecture optimization, and the significance of different calibration inputs, particularly differentials and the sensor’s time series. Numerical results are compiled within this section as well. Section Discussion analyses the calibration process performance and offers an in-depth discussion of the paper’s findings.
Training and testing data
The calibration methodology introduced in Sect. ML-based Sensor Correction was demonstrated using the mobile hardware units of Sect. Sensor Hardware and Software, along with reference data acquired at five stationary monitoring stations located in Gdansk, Poland, as detailed in Sect. Collecting Reference Data. The reference and LCS data were gathered over a nearly two-month period, spanning from March to May 2023. The developed measurement units were positioned in the vicinity of their respective reference stations. Measurements of PMx and environmental parameters were recorded hourly. The entire dataset was partitioned into training and testing subsets in a 5:1 ratio. Testing data was organized into seven-day intervals, spanning four weeks for PM1 and six weeks for both PM2.5 and PM10. Detailed information regarding data acquisition and the composition of the training and testing sets can be found in Fig. 7. Meanwhile, Fig. 8 illustrates the combined reference and LCS measurements for all five stations, with testing periods highlighted in grey. Our main objective in allocating the testing data was to ensure that it represents several consecutive periods (rather than individual samples randomly picked from the data pool). On the one hand, this makes the calibration problem considerably more challenging. On the other hand, the testing periods correspond to different PM levels (low, medium, and high), thereby providing a dependable representation of the sensor’s actual working conditions.
It should also be mentioned that the ranges of environmental parameters throughout the data acquisition periods were quite broad. In particular, the recorded internal temperature Ti varied between 7 °C and 35 °C, whereas the external temperature To ranged from −2 °C to 25 °C. The ranges of internal and external humidity Hi and Ho were 9–49% and 30–82%, respectively, whereas the atmospheric pressure P ranged from 974 hPa to 1030 hPa.
These numbers cover typical environmental conditions that might be encountered in urban areas, especially in central and northern parts of Europe. As mentioned earlier, the internal temperature is higher than the external one as a result of heating by the electronic devices installed in the measurement unit. For the same reason, the internal humidity is lower than the external one.
Results
In this section, we compile the results obtained for calibration of the PMx sensors installed on the mobile hardware units of Sect. Sensor Hardware and Software. The results have been gathered for all categories of pollutants, i.e., PM10, PM2.5, and PM1. We considered the scenarios presented in Fig. 9. The first two scenarios address different levels of hyper-parameter optimization: (i) fixed α = 1 (corresponding to purely additive correction), and (ii) optimizable α and optimizable neural network architecture (i.e., adjusting the complete hyper-parameter vector H = [NL ML α]T). The remaining scenarios correspond to a fully optimized vector H and different setups concerning calibration inputs. We consider the basic setup with vector v comprising environmental parameters as the only input, the use of differentials Δv, and the use of time series wK of previous PMx samples acquired by the LCS. The calibration scenarios referring to PM1, PM2.5, and PM10 were labeled as 1.k, 2.k, and 3.k, k = 1, 2, …, 5, respectively. In each configuration, model identification and optimization were conducted fifty times, and the best setup in terms of the achieved loss function value was selected as the ultimate MLP model. Multiple runs were required due to the random internal split of testing and training data employed by the training algorithm.
Table 3 presents a summary of the results obtained for all examined calibration scenarios and types of PM. The table includes the coefficient of determination between the reference and corrected LCS data, as well as the root mean squared error (RMSE), calculated for the training and testing data sets. It is essential to note that our primary interest lies in the sensor’s performance over the testing data, as it reflects the generalization capability of the calibration process and is crucial for assessing the practical utility of the corrected LCS. Definitions of the coefficient of determination and RMSE can be found in Sect. Calibration Task (Fig. 3(d)).
Reference and calibrated LCS data are presented in Figs. 10 through 12. Figure 10 displays the PMx samples for selected training intervals, Fig. 11 shows the testing data, and Fig. 12 presents the scatter plots. The most comprehensive setup for each PM type (configurations 1.5, 2.5, and 3.5) is illustrated, which includes full hyper-parameter optimization and incorporates environmental variable differentials, as well as sensor time series, as additional calibration inputs.
Discussion
The performance of the developed calibration methodology was verified by assessing the accuracy of the calibrated LCS in terms of its correlation with the reference data and typical error levels. These metrics are crucial for determining the practical utility of the corrected sensor for monitoring particulate matter pollution. Additionally, we investigated the significance and impact of various constituent parts of the calibration procedure, such as the utilization of affine response scaling, hyper-parameter optimization of the MLP model, and the incorporation of supplementary inputs (temporal changes and series of previous PMx measurements). Observe, that the calibration process poses a significant challenge due to the significant discrepancies between the reference and LCS readings. For instance, the coefficient of determination between the raw sensor measurements and reference data is only 0.40 (for PM1), 0.44 (for PM2.5), and 0.17 (for PM10) Furthermore, the measurements exhibit a wide range, from almost zero to nearly 60 µg/m3, while PMx values vary significantly over short timeframes.
Despite the challenges mentioned earlier, the proposed calibration methodology demonstrates remarkable reliability. For the most comprehensive configurations 1.5, 2.5, and 3.5 (i.e., optimized hyper-parameter vector H, incorporating supplementary inputs), the coefficient of determination reaches impressive levels of 0.89, 0.87, and 0.77 for PM1, PM2.5, and PM10, respectively. As previously noted, PM10 presents the most challenging scenario with an extremely poor correlation (0.17) for raw LCS readings. However, even in this case, our calibration framework was able to produce satisfactory results. These improvements are demonstrated in Figs. 10 and 11, and 12, where visual agreement between reference measurements and corrected LCS data is significantly enhanced in comparison to raw sensor data. Moreover, scatter plots of the corrected sensor exhibit much closer concentration around the identity function in comparison to raw readings. Similar enhancements are observed for RMSE values. For the uncorrected sensor, we have RMSE values of 9.6, 5.6, and 5.3 µg/m3 (testing data); analogical data for corrected LCS is 4.9, 3.9, and 3.0 µg/m3 for PM10, PM2.5, and PM1, respectively. Analysis of the average relative error indicates values of 29, 22, and 18% for PM10, PM2.5, and PM1, respectively, meaning they are practically acceptable.
In addition to evaluating the overall performance of the calibration methodology, we are interested in analysing the relevance of specific components of the procedure. This includes comparing affine scaling to purely additive correction (i.e., coefficient α = 1), assessing the impact of MLP hyper-parameter optimization, and examining the effect of incorporating differentials and time series. As observed in Table 2, varying the coefficient α leads to noticeable improvements compared to conventional additive correction. The average increase in the coefficient of determination is approximately 0.04 for PM1 (setup 1.2 versus 1.1), and around 0.02 for PM2.5 and PM10. Visible enhancements have been also obtained by optimizing parameters NL and ML of an MLP model, resulting in an increase of the coefficient of determination by 0.03 to 0.04 for all PM categories. Additionally, both factors permit reducing RMSE values, their combined effect amounts to approximately 0.5 µg/m3. Further improvements albeit minor ones can be achieved by incorporating environmental parameter differentials as auxiliary calibration inputs (setups 1.3, 2.3, and 3.3 versus 1.2, 2.2, and 3.2). This results in an average improvement of up to 0.01 in the coefficient of determination, and an RMSE reduction of approximately 0.03 µg/m3. Utilization of time series data (setups 1.4 and 1.5, 2.4 and 2.5, and 3.4 through 3.5) is also associated with some benefits: an average improvement in the coefficient of determination of 0.01, and an RMSE reduction of approximately 0.1 to 0.15 µg/m3. However, increasing the time series length K has mixed effects depending on the PM type.
The proposed combination of correction mechanisms, calibration model and its optimization, as well as primary and auxiliary inputs (including supplementary data accounting for local temporal variations of the environmental variables and PMx readings) results in a significant improvement in the accuracy of LCS. This is evidenced by the high values of the coefficient of determination, which approach 0.9 (or 0.8 for PM10), despite the poor dependability of the raw sensor. Similar improvements are observed for the modelling error, with RMSE ranging from about 3 to 5 µg/m3, depending on the PM type. In practical implementation, calibration can be carried out using the built-in computational resources of the portable platform. Alternatively, it can be applied after the raw sensor readings have been transmitted from the platform (and before the data is made available to the end user).
Conclusion
This study presents an innovative approach to efficiently calibrate low-cost particulate matter sensors in field conditions. Our methodology integrates an artificial neural network surrogate model serving as the main prediction tool determining the coefficients of additive and multiplicative response scaling. The ANN configuration specifically employed is a multi-layer perceptron, whose hidden layers are fully connected. The layers’ sizes are optimized during the model identification process. In addition to environmental parameters assessed by LCS, our calibration inputs include supplementary data. These auxiliary inputs encompass time derivatives of environmental variables and short time series of previous PMx samples from the sensor undergoing calibration. Incorporation of these supplementary inputs facilitates learning of typical temporal dependencies between parameters such as temperature, humidity, or atmospheric pressure, and the PMx measurements by the MLP model.
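As an illustration of how such an extended input vector might be assembled, the sketch below concatenates the current environmental readings, their first-order differences with respect to the previous sample, and the last K raw PMx samples. The ordering of the entries, the simple backward difference, and the function name are assumptions made for this example only; the actual input layout is defined by the calibration model described earlier in the paper.

```python
def build_calibration_inputs(env_samples, pm_samples, k):
    """Assemble an extended calibration input vector.

    env_samples: time-ordered list of (temperature, humidity, pressure) tuples
    pm_samples:  time-ordered list of raw PMx readings aligned with env_samples
    k:           number of most recent PMx readings to include (k >= 1)

    Returns [current env values] + [env differentials] + [last k PMx readings].
    The layout is a hypothetical choice for illustration.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(env_samples) != len(pm_samples):
        raise ValueError("environmental and PM series must be aligned")
    if len(env_samples) < max(2, k):
        raise ValueError("not enough history for differentials and time series")

    current_env = list(env_samples[-1])
    previous_env = env_samples[-2]
    # Backward difference as a simple stand-in for the time derivative.
    differentials = [c - p for c, p in zip(current_env, previous_env)]
    recent_pm = list(pm_samples[-k:])
    return current_env + differentials + recent_pm


# Illustrative values only: (temperature °C, humidity %, pressure hPa).
env = [(10.0, 60.0, 1010.0), (11.0, 58.0, 1009.0), (11.5, 57.0, 1009.5)]
pm = [12.0, 14.0, 13.0]
print(build_calibration_inputs(env, pm, k=2))
```

With three environmental variables and K = 2, the vector has 3 + 3 + 2 = 8 entries; a larger K increases the input dimension, and therefore the model size, accordingly.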
The introduced correction scheme was verified using mobile hardware units constructed at Gdansk University of Technology, Poland. The employed equipment included low-cost PMx and environmental sensors, as well as electronic circuits designed for carrying out measurements, storing data, and wirelessly transmitting it using the built-in GSM modem. Field calibration of the sensor was performed based on the reference readings gathered by five public monitoring stations located in Gdansk, Poland. The corresponding LCS data was collected using multiple copies of the portable platform located in the vicinity of the reference stations. Extensive experiments were performed involving various configurations of the calibration model. The results demonstrate the high reliability of the presented procedure for all three categories of particulate matter: PM10, PM2.5, and PM1. For the most comprehensive calibration setup (optimized MLP hyper-parameters, combined multiplicative/additive scaling, auxiliary inputs), the coefficient of determination between the corrected LCS and reference readings equals 0.89 for PM1, 0.87 for PM2.5, and 0.77 for PM10. These values indicate a substantial improvement over the uncorrected sensor, which exhibits coefficients of determination of 0.40, 0.44, and 0.17, respectively. Moreover, the corresponding RMSE values are about 3.0, 3.9, and 4.9 µg/m3.
Further experiments were conducted to assess the significance and impact of specific constituent parts of the calibration framework. These experiments confirmed that each component contributes to achieving top performance in the sensor correction process. In particular, the use of combined additive/multiplicative scaling led to an increase of up to 0.04 in the coefficient of determination over additive scaling alone. Optimization of the MLP model architecture contributed a further average increase of up to 0.04 in the coefficient of determination. The effects of incorporating supplementary inputs (such as differentials and time series) were less prominent but still noticeable. Overall, these components jointly reduced RMSE by 0.5 to 1.3 µg/m3 (e.g., from 3.64 to 2.97 µg/m3 for PM1).
Future research will aim at enhancing the reliability of the calibration process even further. One avenue to explore involves more sophisticated artificial intelligence methods, particularly convolutional and recurrent neural networks, as potential calibration models. Another option is to develop global correction mechanisms that reduce discrepancies between LCS and reference data at the level of complete datasets, before applying specific correction mechanisms at the level of individual sensor measurements.
Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request. Contact person: anna.dabrowska@pg.edu.pl.
References
Lelieveld, J. et al. Cardiovascular disease burden from ambient air pollution in Europe reassessed using novel hazard ratio functions. Eur. Heart J. 20, 1590–1596 (2019).
Ambient air pollution: a global assessment of exposure and burden of disease, World Health Organization, Geneva. (2016). https://apps.who.int/iris/handle/10665/250141
Air quality in Europe. Report no. 05/2022, European Environment Agency, doi: 10.2800/488115 (2022).
Loomis, D. et al. The carcinogenicity of outdoor air pollution. Lancet Oncol. 14 (13), 1262–1263 (2013).
Feng, S., Gao, D., Liao, F., Zhou, F. & Wang, X. The health effects of ambient PM2.5 and potential mechanisms. Ecotoxicol. Environ. Saf. 128, 67–74 (2016).
GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the global burden of disease study 2019. Lancet 396, 1204–1222 (2019).
Krittanawong, C. et al. PM2.5 and cardiovascular health risks, current problems in cardiology. Int. J. Cardiol. Cardiovasc. Risk Prev. 48 (6), 101670 (2023).
Zhang, T. et al. The effects of PM2.5 on lung cancer-related mortality in different regions and races: A systematic review and meta-analysis of cohort studies. Air Qual. Atmos. Health. 15, 1523–1532 (2022).
Hamra, G. B. et al. Outdoor particulate matter exposure and lung cancer: a systematic review and meta-analysis. Environ. Health Perspect. 122, 906–911 (2014).
Alman, B. L. et al. Associations between PM2.5 and risk of preterm birth among liveborn infants. Ann. Epidemiol. 39, 46–53 (2019).
Juda-Rezler, K., Reizer, M., Maciejewska, K., Błaszczak, B. & Klejnowski, K. Characterization of atmospheric PM2.5 sources at a central European urban background site. Sci. Total Environ. 713, 136729 (2020).
Mehadi, A. et al. Laboratory and field evaluation of real-time and near real-time PM2.5 smoke monitors. J. Air Waste Manag. Assoc. 70 (2), 158–179 (2020).
Bagkis, E., Kassandros, T. & Karatzas, K. Learning calibration functions on the fly: hybrid batch online stacking ensembles for the calibration of low-cost air quality sensor networks in the presence of concept drift. Atmosphere 13, 416 (2022).
Morawska, L. et al. Applications of low-cost sensing technologies for air quality monitoring and exposure assessment: how far have they gone? Environ. Int. 116, 286–299 (2018).
Zheng, T. et al. Field evaluation of low-cost particulate matter sensors in high-and low-concentration environments. Atmos. Meas. Tech. 11 (8), 4823–4846 (2018).
Datta, A. et al. Statistical field calibration of a low-cost PM2.5 monitoring network in Baltimore. Atmos. Environ. 242, 117761 (2020).
Gao, M., Cao, J. & Seto, E. A distributed network of low-cost continuous reading sensors to measure spatiotemporal variations of PM2.5 in Xi’an, China. Environ. Pollut. 199, 56–65 (2015).
Apte, J. S. et al. High-resolution air pollution mapping with Google street view cars: exploiting big data. Environ. Sci. Technol. 51 (12), 6999–7008 (2017).
Hasenfratz, D. et al. Deriving high-resolution urban air pollution maps using mobile sensor nodes. Pervasive Mob. Comput. 16 (Part B), 268–285 (2015).
Kane, F., Abbate, J., Landahl, E. C. & Potosnak, M. J. Monitoring particulate matter with wearable sensors and the influence on student environmental attitudes. Sensors 22, 1295 (2022).
Palomeque-Mangut, S. et al. Wearable system for outdoor air quality monitoring in a WSN with cloud computing: design, validation and deployment. Chemosphere 307 (3), 135948 (2022).
Zamora, L. et al. Maternal exposure to PM2.5 in South Texas, a pilot study. Sci. Total Environ. 628, 1497–1507 (2018).
Castell, N. et al. Can commercial low-cost sensor platforms contribute to air quality monitoring and exposure estimates? Environ. Int. 99, 293–302 (2017).
Malings, C. et al. Fine particle mass monitoring with low-cost sensors: corrections and long-term performance evaluation. Aerosol Sci. Technol. 54, 160–174 (2020).
Giordano, M. R. et al. From low-cost sensors to high-quality data: A summary of challenges and best practices for effectively calibrating low-cost particulate matter mass sensors. J. Aerosol Sci. 158, 105833 (2021).
Barkjohn, K. K., Gantt, B. & Clements, A. L. Development and application of a United States-wide correction for PM2.5 data collected with the PurpleAir sensor. Atmos. Meas. Tech. 14, 4617–4637 (2021).
Khreis, H., Johnson, J., Jack, K., Dadashova, B. & Park, E. S. Evaluating the performance of low-cost air quality monitors in Dallas, Texas. Int. J. Environ. Res. Public Health 19, 1647 (2022).
deSouza, P. et al. Calibrating networks of low-cost air quality sensors. Atmos. Meas. Tech. 15, 6309–6328 (2022).
Kelly, K. E. et al. Ambient and laboratory evaluation of a low-cost particulate matter sensor. Environ. Pollut. 221, 491–500 (2017).
Badura, M., Batog, P., Drzeniecka-Osiadacz, A. & Modzel, P. Evaluation of low-cost sensors for ambient PM2.5 monitoring. J. Sens. 5096540 (2018).
Jayaratne, R., Liu, X., Thai, P., Dunbabin, M. & Morawska, L. The influence of humidity on the performance of a low-cost air particle mass sensor and the effect of atmospheric fog. Atmos. Meas. Tech. 11, 4883–4890 (2018).
Kim, D., Shin, D. & Hwang, J. Calibration of low-cost sensors for measurement of indoor particulate matter concentrations via laboratory/field evaluation. Aerosol Air Qual. Res. 23, 230097 (2023).
Sousan, S. et al. Inter-comparison of low-cost sensors for measuring the mass concentration of occupational aerosols. Aerosol Sci. Technol. 50, 462–473 (2016).
Tancev, G. & Pascale, C. The relocation problem of field calibrated low-cost sensor systems in air quality monitoring: A sampling bias. Sensors 20, 6198 (2020).
Liu, H. Y., Schneider, P. & Haugen, R. Performance assessment of a low-cost PM2.5 sensor for a near four-month period in Oslo, Norway. Atmosphere 10 (2), 41 (2019).
Holstius, D. M., Pillarisetti, A., Smith, K. R. & Seto, E. Field calibrations of a low-cost aerosol sensor at a regulatory monitoring site in California. Atmos. Meas. Tech. 7, 1121–1131 (2014).
Magi, B. I., Cupini, C., Francis, J., Green, M. & Hauser, C. Evaluation of PM2.5 measured in an urban setting using a low-cost optical particle counter and a federal equivalent method beta attenuation monitor. Aerosol Sci. Technol. 54, 147–159 (2019).
Zimmerman, N. et al. A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring. Atmos. Meas. Tech. 11, 291–313 (2018).
Masiol, M., Squizzato, S., Chalupa, D., Rich, D. Q. & Hopke, P. K. Evaluation and field calibration of a low-cost ozone monitor at a regulatory urban monitoring station. Aerosol Air Qual. Res. 18, 2029–2037 (2018).
Hong, G. H. et al. Long-term evaluation and calibration of three types of low-cost PM2.5 sensors at different air quality monitoring stations. J. Aerosol Sci. 157, 105829 (2021).
Balzano, L. & Nowak, R. Blind calibration of sensor networks. Int. Conf. Information Processing in Sensor Networks (IPSN), 79–88 (2007).
Hofman, J. et al. Mapping air quality in IoT cities: cloud calibration and air quality inference of sensor data. Conf. IEEE Sensors, Rotterdam, Netherlands, 1–4 (2020).
Narayana, M. V., Jalihal, D. & Nagendra, S. Establishing a sustainable low-cost air quality monitoring setup: a survey of the state-of-the-art. Sensors 22, 394 (2022).
Zusman, M. et al. Calibration of low-cost particulate matter sensors: model development for a multi-city epidemiological study. Environ. Int. 134, 105329 (2020).
Considine, E. M., Reid, C. E., Ogletree, M. R. & Dye, T. Improving accuracy of air pollution exposure measurements: statistical correction of a municipal low-cost airborne particulate matter sensor network. Environ. Pollut. 268, 115833 (2021).
Liang, L. Calibrating low-cost sensors for ambient air monitoring: techniques, trends, and challenges. Environ. Res. 197, 111163 (2021).
Venkatraman Jagatha, J. et al. Calibration method for particulate matter low-cost sensors used in ambient air quality monitoring and research. Sensors 21, 3960 (2021).
Wang, Y., Du, Y., Yanjun, J. & Li, T. Calibration of a low-cost PM2.5 monitor using a random forest model. Environ. Int. 133, 105161 (2019).
De Vito, S. et al. Calibrating chemical multisensory devices for real world applications: an indepth comparison of quantitative machine learning approaches. Sens. Actuator B Chem. 255, 1191–1210 (2018).
Mahajan, S. & Kumar, P. Evaluation of low-cost sensors for quantitative personal exposure monitoring. Sustain. Cities Soc. 57 (2020).
Si, M. & Du, K. Development of a predictive emissions model using a gradient boosting machine learning method. Environ. Technol. Innov. 20, 101028 (2020).
Loh, B. G. & Choi, G. H. Calibration of portable particulate matter–Monitoring device using web query and machine learning. Saf. Health Work. 10 (4), 452–460 (2019).
Chen, C. C. et al. Calibration of low-cost particle sensors by using machine-learning method. IEEE Asia Pacific Conf. Circuits and Systems (APCCAS), Chengdu, China, 111–114 (2018).
Jeon, H., Ryu, J., Kim, K. M. & An, J. The development of a low-cost particulate matter 2.5 sensor calibration model in daycare centers using long short-term memory algorithms. Atmosphere 14, 1228 (2023).
Ali, S., Alam, F., Arif, K. M. & Potgieter, J. Low-cost CO sensor calibration using one dimensional convolutional neural network. Sensors 23, 854 (2023).
Athira, V., Geetha, P., Vinayakumar, R. & Soman, K. P. DeepAirNet: applying recurrent networks for air quality prediction. Procedia Comput. Sci. 132, 1394–1403 (2018).
Dai, X., Liu, J. & Li, Y. A recurrent neural network using historical data to predict time series indoor PM2.5 concentrations for residential buildings. Indoor Air. 31, 1228–1237 (2021).
Yu, H. et al. A deep calibration method for low-cost air monitoring sensors with multilevel sequence modeling. IEEE Trans. Instrum. Meas. 69 (9), 7167–7179 (2020).
Kureshi, R. R. et al. Data-driven techniques for low-cost sensor selection and calibration for the use case of air quality monitoring. Sensors 22, 1093 (2022).
BeagleBone® Blue. https://www.beagleboard.org/boards/beaglebone-blue
Arduino and RaspberryPI modems - u-GSM shield LTE CAT M1 & NB IoT, LTE CAT NB1, LTE CAT NB2, LTE CAT4, LTE CAT1, UMTS, and GSM: presentation (2024). https://itbrainpower.net/u-GSM/features.php
Datasheet SPS30. Particulate matter sensor for air quality monitoring and control, Sensirion (2024). https://sensirion.com/media/documents/8600FF88/616542B5/Sensirion_PM_Sensors_Datasheet_SPS30.pdf
ARMAAG Foundation. Home: https://armaag.gda.pl/en/index.htm
Map data from OpenStreetMap. http://openstreetmap.org/copyright.
Aggarwal, C. C. Neural Networks and Deep Learning (Springer, 2018).
Vang-Mata, R. (ed) Multilayer Perceptrons (Nova Science Pub. Inc., 2020).
Dlugosz, S. Multi-layer Perceptron Networks for Ordinal Data Analysis (Logos, 2008).
Hagan, M. T. & Menhaj, M. Training feed-forward networks with the Marquardt algorithm. IEEE Trans. Neural Networks. 5 (6), 989–993 (1994).
Salem, F. M. Recurrent Neural Networks. From Simple To Gated Architectures (Springer, 2022).
Koziel, S., Pietrenko-Dabrowska, A., Wojcikowski, M. & Pankiewicz, B. Efficient calibration of cost-efficient particulate matter sensors using machine learning and time-series alignment. Knowl.-Based Syst. 295, 111879 (2024).
Acknowledgements
This work was supported in part by the Icelandic Research Fund Grant 2410297 and by the National Science Centre of Poland Grant 2020/37/B/ST7/01448.
Author information
Contributions
S.K.: development of concept, methodology, software and numerical results, validation, writing original manuscript, funding acquisition; A.P.D.: development of concept, methodology, manuscript corrections; M.W.: project supervision, hardware development, funding acquisition, manuscript corrections; B.P.: hardware development, software drivers, validation, funding acquisition, manuscript corrections.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Koziel, S., Pietrenko-Dabrowska, A., Wojcikowski, M. et al. Efficient field correction of low-cost particulate matter sensors using machine learning, mixed multiplicative/additive scaling and extended calibration inputs. Sci Rep 15, 18573 (2025). https://doi.org/10.1038/s41598-025-02069-w
DOI: https://doi.org/10.1038/s41598-025-02069-w