QUANT: a long-term multi-city commercial air sensor dataset for performance evaluation

Diez, Sebastian; Lacy, Stuart; Urquiza, Josefina; Edwards, Pete

doi:10.1038/s41597-024-03767-2

Data Descriptor
Open access
Published: 21 August 2024

Sebastian Diez ORCID: orcid.org/0000-0001-9659-0356^1,2,
Stuart Lacy²,
Josefina Urquiza^3,4 &
…
Pete Edwards²

Scientific Data volume 11, Article number: 904 (2024)

Abstract

The QUANT study represents the most extensive open-access evaluation of commercial air quality sensor systems to date. This comprehensive study assessed 49 systems from 14 manufacturers across three urban sites in the UK over a three-year period. The resulting open-access dataset captures high time-resolution measurements of a variety of gases (NO, NO₂, O₃, CO, CO₂), particulate matter (PM₁, PM_2.5, PM₁₀), and key meteorological parameters (humidity, temperature, atmospheric pressure). The quality and scope of the dataset are enhanced by reference monitors’ data and calibrated products from sensor manufacturers across the three sites. This publicly accessible dataset serves as a robust and transparent resource that details the methods used for data collection and procedures to ensure dataset integrity. It provides a valuable tool for a wide range of stakeholders to analyze the performance of air quality sensors in real-world settings. Policymakers can leverage this data to refine sensor deployment guidelines and develop standardized protocols, while manufacturers can utilize it as a benchmark for technological innovation and product certification. Moreover, the dataset has supported the development of a UK code of practice, and the certification of one of the participating companies, underscoring the dataset’s utility and reliability.

Background & Summary

In a world where the impacts of air pollution are increasingly relevant¹, sensor technologies emerge as potentially transformative tools² designed to augment monitoring³ and intervention strategies⁴. While the advantages of extensive spatial coverage⁵ and real-time data collection⁶ are compelling, the accuracy⁷ and reliability⁸ of the data obtained from air sensors remain fundamental concerns⁹. End-users must have a clear and accurate understanding of the performance of sensors in real-world environments to make well-informed decisions¹⁰. This is particularly critical in the realm of commercial applications, where proprietary systems often operate as “black boxes”¹¹ providing users with limited insight into data processing mechanisms.

Despite the rapid evolution of commercial systems, significant challenges remain, such as cross sensitivities¹², internal consistency¹³, signal drift¹⁴, long-term performance¹⁵, data coverage¹⁶, and environmental influences¹⁷. The wide range of devices available on the market and few impartial real-world evaluations make it hard for end-users to predict device performance in specific applications. Furthermore, the variety of assessment methodologies¹⁸, the use of diverse data quality metrics, and the lack of robust open-access datasets render the comparison of studies a complex task.

Recent studies have addressed sensor performance evaluations with various approaches, albeit with some limitations. For example, Park et al.¹⁹ evaluated 30 nodes in urban settings (measuring CO, NO₂, O₃, PM_2.5, and PM₁₀) and conducted short-term evaluations. Jiao et al.²⁰ focused on a suburban environment (evaluating sensors for NO₂, O₃, PM_2.5, and SO₂) over eight months. Collier-Oxandale et al.²¹ conducted laboratory and field tests in California with 28 gas sensors (for CO, NO₂, and O₃), while Liu et al.⁷ extended the duration of the study to 13 months in Australia (evaluating PM_2.5 and CO) with an unspecified number of sensors. Munir et al.⁶, the only evaluation in the UK (Sheffield) found in the literature, focused on the evaluation of 10 sensors (measuring NO, NO₂, and CO) over a year. None of the mentioned studies seem to provide public data access.

The QUANT dataset represents, to the best of our knowledge, the most extensive open-access evaluation of commercial sensor systems on a global scale to date¹⁸. Part of the UK Research and Innovation Clean Air programme, the QUANT (Quantification of Utility of Atmospheric Network Technologies) project, aims to tackle these issues by evaluating the performance of commercial sensor systems within urban environments across the UK¹⁸. Moreover, limited access to highly accurate measurement instruments —and the expertise required to effectively employ them— continues to restrict improvements in these newer technologies²². To address this issue, collaborative efforts are needed to transform academic knowledge into practical insights that can benefit the wider user community.

Through the QUANT project, a wide array of sensor technologies was systematically deployed across three representative urban sites in the UK, divided into two distinct phases: the first called “Main QUANT” and the second, the “Wider Participation Study” (WPS). The chosen sites for this initiative included two urban-background measurement supersites: the Manchester Air Quality Supersite (MAQS) and the London Air Quality Supersite (LAQS), along with a roadside monitoring station in York (YoFi), part of the Automatic Urban and Rural Network (AURN). The workflow of the QUANT study is depicted in Fig. 1.

The Main QUANT phase, spanning nearly three years (December 2019 to October 2022), focused on a long-term, transparent evaluation of selected commercial sensor devices, acquiring and assessing 26 units from 5 commercial brands at the MAQS, LAQS, and YoFi sites. Additionally, with the aim of fostering sensor innovation, the WPS was organized, covering the period from June 2021 to October 2022, and conducted entirely at the MAQS site. This second stage offered a cost-free opportunity for any commercial entities to engage in an impartial evaluation. During the WPS, 23 units from 9 companies were assessed. Altogether, 49 commercial devices were evaluated, yielding 119 gas (including NO, NO₂, O₃, CO and CO₂), 118 particulate matter (PM) measurements (almost all measuring PM₁, PM_2.5, PM₁₀), and a number of meteorological measurements (including temperature (Temp), relative humidity (RH), and atmospheric pressure (Pres)). Throughout both phases, the study encompassed a range of meteorological conditions and pollutant concentrations, providing a comprehensive view of sensor performance in varied environments. In order to minimize uncertainties external to the systems and the companies involved, the study implemented a robust study design complemented by stringent quality control procedures. To rigorously evaluate system performance and identify some of their strengths and weaknesses, comprehensive reference (and equivalent-to-reference as defined by the European Commission²³) measurement data were collected throughout the study. Making use of this extensive reference data, the study also explored how local corrections by manufacturers influenced sensor performance, further enriching the understanding of each device’s capabilities.

The QUANT dataset empowers stakeholders, including researchers, policymakers, and urban planners, to understand the behavior of commercial sensor technologies in various environments and refine correction models for specific applications. Policymakers, for instance, can leverage this data to refine sensor deployment guidelines and develop standardized protocols. The insights from this dataset have already contributed to the development of the UK PAS 4023:2023 (https://standardsdevelopment.bsigroup.com/projects/2022-00710), which outlines best practices for the selection, deployment, and quality assurance of air quality sensor systems. For companies not participating in the original study, the dataset can reveal challenges associated with long-term sensor evaluations and pinpoint specific opportunities for innovation. For participating companies, on the other hand, it serves as an invaluable resource for benchmarking products and facilitating improvements, as well as for certification processes. An example is AQMesh, which recently leveraged the QUANT data to achieve UK MCERTS certification (see https://www.csagroup.org/wp-content/uploads/MC240422.pdf).

Methods

Systems selection

The selection and purchasing process of the devices for the Main QUANT took place between September and October 2019. Our choice of sensor systems was informed by specific criteria:

Measure key pollutants: each device had to measure either NO₂ or PM_2.5, due to their importance in the UK regulatory framework; we also opted to include devices that reported O₃ due to its importance globally.
High temporal resolution: the sensors were required to provide data at resolutions ranging from 1 to 15 minutes, to allow for detailed temporal analysis.
Continuous unattended operation: it was important for the devices to operate continuously over extended periods to minimize personnel interventions.
Data accessibility in near real-time: to prevent further post-processing of the data and also to support timely analysis and internal decision-making processes (e.g., maintenance scheduling).
Documented performance: proven performance in prior research and/or market presence was also of key consideration.

The selected products were (the abbreviations used in the study to identify each system are given in parentheses):

AQY from Aeroqual (https://www.aeroqual.com): NO₂, O₃, PM_2.5 and PM₁₀;
AQMesh (AQM) from Environmental Instruments (https://www.aqmesh.com): NO₂, NO, O₃, CO₂, PM₁, PM_2.5 and PM₁₀;
ARIsense (Ari) from QuantAQ (https://quant-aq.com): NO, NO₂, O₃, CO, CO₂, PM₁, PM_2.5 and PM₁₀;
Zephyr (Zep) from EarthSense (https://www.earthsense.co.uk): NO, NO₂, O₃, PM₁, PM_2.5 and PM₁₀;
PurpleAir (PA; https://www2.purpleair.com): PM₁, PM_2.5 and PM₁₀.

For more details on the specifications and hardware of each system, please refer to Table 1.

Table 1 Overview of sensor hardware and measurement capabilities for the sensor systems in the Main QUANT study.

More than one year after the start of the Main QUANT study, the WPS phase was initiated. Participation was offered at no cost, and the call for this stage was publicly announced in March 2021, leveraging the established test-bed infrastructure to demonstrate sensor performance. The WPS encompassed a wider array of platforms and was carried out exclusively at MAQS (as detailed in “Co-location sites”), with manufacturers supplying a minimum of two sensor devices each. The participating products were:

Atmos (Atm) from Urban Sciences (http://urbansciences.in/): PM₁, PM_2.5 and PM₁₀;
IMB from Bosch (https://www.bosch-mobility-solutions.com): NO₂, O₃, PM_2.5 and PM₁₀;
Polludrone (Poll) from Oizom (https://oizom.com): NO, NO₂, O₃, PM_2.5 and PM₁₀;
Kunak Air Pro (AP) from Kunak (https://www.kunak.es/): NO, NO₂, O₃, CO, PM₁, PM_2.5 and PM₁₀;
Silax Air (SA) from Vortex (https://vortexiot.com): NO₂, O₃, PM_2.5 and PM₁₀;
Node-S (NS) from Clarity (https://www.clarity.io): NO₂, PM₁, PM_2.5 and PM₁₀;
Praxis/Urban (Prax) from South Coast Science (https://www.southcoastscience.com): NO, NO₂, O₃, CO₂, PM₁, PM_2.5 and PM₁₀;
Modulair-PM (Mod) from QuantAQ: PM₁, PM_2.5 and PM₁₀;
AQMesh (AQM): NO₂, NO, O₃, CO₂, PM₁, PM_2.5 and PM₁₀.

Details about the measured variables and the hardware components of these devices are presented in Table 2.

Table 2 Overview of sensor hardware and measurement capabilities for the sensor systems in the WPS study.

Co-location sites

For the Main QUANT deployment, three different field sites across the UK were selected in order to capture a variety of conditions. This included two extensively equipped urban background supersites (i.e., MAQS and LAQS), plus a roadside monitoring site (i.e., YoFi). The selection of these sites was based on three primary criteria: (i) extensive instrumentation for measuring the chemical composition and physical properties of the atmosphere, (ii) practical considerations, such as ease of access, available space, and continuous technical assistance from site managers, and (iii) the inclusion of at least one representative roadside site. Of the three urban supersites currently available in the UK—LAQS, MAQS, and the Birmingham supersite—only two were considered due to funding constraints. MAQS was selected for its practical aspects, including ample space for installing a large number of sensors, easy access, full-time dedicated technical personnel, and transportation facilities to and from the site. London was chosen for its uniqueness in the UK, both in terms of population size and emission profile. Additionally, the space available for sensor deployment at LAQS and the on-site technical assistance made it the second site selected. The YoFi roadside site was chosen as the third site due to its ease of access and the support received from its administrators, allowing for the accommodation of additional instrumentation. Although the highly instrumented Marylebone Road site in the UK was considered, logistical and cost constraints limited its selection. Furthermore, the high traffic volume on this central London road makes it less representative of typical roadside sites across the UK, where low-cost sensors are commonly deployed. Figure 2 shows some panoramic pictures of the sites.

These selected sites offer a wide range of reference measurements, representing chemical environments typical of UK urban areas. Given time constraints, but also motivated by the MAQS capabilities, this was the only site used for the WPS study.

MAQS measures gases, aerosols and meteorology and is one of the most extensive air quality study facilities in the UK (for detailed information visit: http://www.cas.manchester.ac.uk/restools/firs/). Located in the south of the metropolitan area (Fallowfield Campus, University of Manchester; 53° 26′ 39.2″ N, 2° 12′ 51.9″ W), it offers a typical UK urban background setting. This site is free from direct traffic emissions and surrounded by student accommodations, university buildings, and sports facilities. The neighborhood’s shops, bars, and restaurants contribute to foot traffic and vehicle movement. Additionally, emissions from heating and cooking in residential buildings affect the area’s ambient air quality. The average winter Temp at MAQS is 4–5 °C and RH is around 87%. In summer, the mean Temp is 16–17 °C with RH approximately 88% (see Fig. 3). The research-grade instrumentation used for this analysis comprised a chemiluminescence NO analyzer (Thermo, 42i-y. Limit of detection <50 ppt, root mean square “zero” noise <25 ppt), a Cavity Attenuated Phase Shift Spectroscopy (CAPS) NO₂ analyzer (Teledyne, T500. Limit of detection <40 ppt, root mean square “zero” noise <20 ppt), a UV photometric O₃ analyzer (Thermo Scientific, 49i. Limit of detection <1.0 ppb, root mean square “zero” noise <0.25 ppb), and an optical aerosol spectrometer (Palas, FIDAS200. Mass range 0–10000 µg/m3, particle size range 0.18–18 µm).

LAQS (as of this writing, this site does not yet have a website) also supports extensive measurements of gases, aerosols and meteorology, comparable to MAQS. It is located in an urban background (Honor Oak Park; 51° 26′ 58.9″ N, 0° 02′ 14.6″ W), within the vast urban sprawl of Greater London. LAQS is surrounded by middle-class neighborhoods, parks, and green spaces, away from major roads and pollution sources. The area features low commercial activity, with local shops and restaurants barely affecting the overall noise and bustle. This setting offers a representative view of typical residential London air quality. The site experiences a winter Temp of approximately 5 °C on average and an RH of 84%, while in summer the mean Temp is around 17 °C with 72% RH. At this site, the research-grade instrumentation employed comprised a chemiluminescence NO analyzer (Teledyne, T200U. Limit of detection < 50 ppt, root mean square “zero” noise < 25 ppt), a CAPS NO₂ analyzer (Teledyne, T500), a UV photometric O₃ analyzer (Teledyne, 400E. Limit of detection < 0.6 ppb, root mean square “zero” noise < 0.3 ppb), and an optical aerosol spectrometer (Palas, FIDAS200) for PM.

York Fishergate (YoFi) is a roadside monitoring station embedded within a mixed-use neighborhood, very close to the York city center (53° 57′ 06.9″ N, 1° 04′ 33.1″ W). Located on a traffic island in a residential area, the site sits between two key lanes of Fishergate Road, close to a commercial zone with pubs and restaurants, and near Walmgate Stray’s recreational fields, blending light industrial features. This air quality monitoring station registers typical winter temperatures near 4 °C with 87% RH, and summer conditions averaging 15 °C and 80% RH. YoFi offered more diverse pollutant levels commonly associated with traffic-dominated areas, contrasting with the urban background sites MAQS and LAQS. This site is equipped with a chemiluminescence NOx analyzer (Teledyne, T200UP. Limit of detection < 50 ppt, root mean square “zero” noise < 25 ppt) and two beta-attenuation PM monitors (Met One, BAM 1020. Mass range 0–10000 µg/m3, limit of detection < 4.8 µg/m3 for 1-hour avg.), one dedicated to PM_2.5 and the other to PM₁₀.

Table 3 summarizes the information referring to the reference instruments for all sites.

Table 3 Research grade instrumentation used for the QUANT study.

Sensor systems deployment

The installation of the Main QUANT systems was carried out at MAQS between December 10 and 19, 2019 (see Fig. 4). For the first three months, all systems remained at MAQS (until mid-March 2020). Subsequently, more than half of them were distributed to the other two co-location sites, LAQS (London, March 11, 2020) and YoFi (York, March 23, 2020).

For 2 years and 3 months (mid-March 2020 to early July 2022), 12 systems remained at MAQS, with the rest distributed between LAQS (7) and YoFi (7). All systems were relocated back to MAQS (July 2022) until the end of the study (November 2022). This schedule was established to initially subject all sensors to identical conditions to evaluate their performance, followed by exposure to varied environments to understand their adaptation, and finally regrouping them at MAQS to gather data reflecting the systems’ aging.

All systems were mounted on poles, acquired specifically for this project, or mounted on rails at the co-location sites. The manufacturers’ instructions were carefully followed, including those for electrical installation, mounting, cleaning, and maintenance of the sensors. At YoFi, space constraints required meticulous planning to guarantee effective co-location without compromising the systems’ operation. This involved optimizing spatial usage to maintain data integrity.

Tailored electrical setups were implemented for each sensor, considering their energy requirements. This involved using location-specific energy sources, connecting to the electrical grid with weather-resistant safety systems, and implementing security measures against vandalism. The sensors underwent maintenance checks at least once a month, except during the period of COVID-19 restrictions (March to June 2022), when site visits were limited, resulting in a maximum period of four months without on-site maintenance.

Complementary to the Main QUANT setup, the WPS was carried out exclusively at MAQS from 10 June 2021 to 31 October 2022 (16 months in total). In terms of installation and mounting, the WPS sensors were installed following the same practices as the Main QUANT, including compliance with the manufacturers’ instructions, electrical installation, mounting height and proximity to the inlets. Also, similar strategies were implemented to ensure installation efficacy and maintenance of data integrity.

Sensor data collection

Data from all sensors were collected and processed using standardized methods throughout our study. This included maintaining uniform data logging intervals and adhering to consistent data transmission protocols (GPRS/LTE, supplemented by WiFi for specific units). To safeguard data integrity and prevent any potential data manipulation, we implemented a bespoke Extract, Transform, Load (ETL) pipeline in Python, executed daily within Amazon Web Services (AWS) containers. This automated pipeline systematically retrieved the previous day’s data from each company’s API, organizing it into a standardized CSV format. This daily retrieval ensures that data is captured and stored in near real-time, also reducing the risk of data loss or alteration. An exception involved the PurpleAir devices, which, due to connectivity challenges, required on-site data gathering and manual upload. These data were then integrated into the standardized CSV format to maintain consistency. All CSV files were securely stored in Cloud Storage (Google Drive), with strict version control and backup protocols to secure data availability and integrity.
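
As an illustration only, the daily retrieval step described above could be sketched as follows. The column layout in FIELDS, the record keys, and the function names are assumptions made for this example; they do not reproduce the project's pipeline or any manufacturer's API. The sketch only shows how a previous-day window can be derived and how heterogeneous records can be written to one CSV layout without altering the measured values.

```python
import csv
import io
from datetime import date, datetime, time, timedelta, timezone

# Assumed column layout for the standardized CSV (hypothetical, not the project schema).
FIELDS = ["timestamp", "system_id", "variable", "value"]


def previous_day_window(today):
    """Return the UTC [start, end) bounds of the day before `today`."""
    end = datetime.combine(today, time.min, tzinfo=timezone.utc)
    return end - timedelta(days=1), end


def to_standard_csv(records):
    """Write raw records (dicts) to one CSV layout, ordered by timestamp.

    Values are passed through unchanged; missing values (None) become
    empty cells and keys outside FIELDS are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in sorted(records, key=lambda r: r["timestamp"]):
        writer.writerow(record)
    return buffer.getvalue()
```

In a scheduled job, the window returned by previous_day_window would be passed to whatever client retrieves data from each manufacturer, and the returned records handed to to_standard_csv before storage.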

Throughout the study, we further processed the raw data into minute-by-minute averages for consistent timestamps and batch-inserted them into a relational database (Postgres) with relevant metadata and co-located reference data using custom R and SQL scripts. No additional modifications to the original measurements were applied. The final database (outlined in the “Data Records” section) was created by converting the CSV files into NetCDF format.
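
The minute-by-minute averaging can be pictured with the short sketch below. It is written in Python for illustration, whereas the study used custom R and SQL scripts, and it assumes that each minute is labeled by its start and that missing readings are represented as None; neither detail is stated in the text.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def minute_averages(samples):
    """Average (timestamp, value) samples into one value per minute.

    Missing readings (None) are skipped. A minute that contains only
    missing readings is reported as None rather than being filled in.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Assumption: a minute is labeled by its start (seconds floored to zero).
        minute = timestamp.replace(second=0, microsecond=0)
        bucket = buckets[minute]  # registers the minute even if the value is missing
        if value is not None:
            bucket.append(value)
    return {
        minute: (fmean(values) if values else None)
        for minute, values in sorted(buckets.items())
    }
```

Keeping empty minutes as None mirrors the statement that no modifications or imputations were applied to the original measurements.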

Duplicate reference instrument deployment

As part of QUANT, a specific deployment of duplicate reference instruments was conducted exclusively in Manchester. This was aimed at providing end-users with a more accurate characterization of the measurement uncertainties associated with reference methods. Initially, we planned to install duplicate monitors for PM_2.5, NOx, and O₃, reflecting their status as critical pollutants in UK air quality management. However, while the duplicate NOx and O₃ monitors were successfully deployed, the installation of a second PM_2.5 monitor encountered significant delays. These were primarily due to the COVID-19 pandemic and funding constraints, leading to its deployment only towards the end of the QUANT project. Consequently, data from this second PM_2.5 instrument were not included in the initial dataset presented in this study.

For NOx, we utilized two Teledyne T200 instruments (employing chemiluminescence; temporal resolution ~2 min), installed during two different periods of the QUANT study. The first instrument (serial 21842) was operational from October 13, 2020, to December 17, 2020. The second instrument (serial 23828) operated from March 27, 2021, to December 1, 2021.

For ozone, two distinct devices were deployed. Initially, a 2B Technologies 202 instrument (utilizing ultraviolet (UV) photometry; serial 312D; temporal resolution 1-2 min) was deployed from April 9, 2021, to July 29, 2021. This was complemented by a Thermo 49i instrument (also based on UV photometry; serial 1008241369; temporal resolution ~1 min), operational from June 30, 2021, to November 19, 2021.

Quality Assurance and Quality Control (QA/QC) procedures were meticulously applied to all instruments both before and after their deployment, conducted by the skilled personnel from our lab (the Wolfson Atmospheric Chemistry Laboratories, WACL). During their co-location, these instruments adhered to the same rigorous checking routines as those already on-site, ensuring data integrity and comparability. For more detailed information on these QA/QC routines, refer to the section “Reference Data Validation”.

Data Records

The QUANT sensors dataset is available at CEDA²⁴. Within the repository, there are three folders, one per site: fishergate, maqs, and laqs. Refer to Table 4 for a descriptive summary. The same root directory also contains three files:

metadata.yaml: a YAML document with a detailed description of the QUANT dataset.
00README_catalogue_and_licence.txt: contains information on the publication status, a link to the CEDA data catalog, and the data usage license.
Quant_instrument_list.csv: provides the following details:
- system_id: device identification (internal to the project);
- study: co-location study, i.e., “QUANT” or “Wider Participation Study”;
- manufacturer: company name;
- model: instrument model version;
- url: manufacturer’s website;
- serial: manufacturer’s device ID;
- description: brief description of the use given to each sensor, detailing manufacturers, models, and the pollutants measured (i.e., particles, gases, and met parameters).

Table 4 Distribution and count of sensor data files by brand across site folders.

It is important to note that this repository²⁴ does not include reference data.

The naming of the main data files (NetCDF files) follows this convention: “Manufacturer-system_id-variable_site_initDate-finishDate.nc”, where:

Manufacturer: company name;
system_id: systems ID (internal to the project);
variable: either a pollutant (i.e., NO2, O3, NO, PM1, PM2.5, PM10) or the meteorological (abbreviated as “Met”) variables measured by the system;
site: identifier of the study in which the sensor was used;
initDate: start date of the data collection by that specific sensor;
finishDate: end date of the data collection by that sensor.
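
A file name following this convention can be split into its components with a few string operations. The sketch below is only an illustration: the example name, the date format, and the assumption that the site and date fields contain no underscores (and the system ID and variable no hyphens) are hypothetical and should be checked against the actual files in the repository.

```python
from typing import NamedTuple


class QuantFileName(NamedTuple):
    manufacturer: str
    system_id: str
    variable: str
    site: str
    init_date: str
    finish_date: str


def parse_quant_filename(name):
    """Split 'Manufacturer-system_id-variable_site_initDate-finishDate.nc'."""
    if not name.endswith(".nc"):
        raise ValueError(f"not a NetCDF file name: {name!r}")
    stem = name[: -len(".nc")]
    # Split from the right so that only the last two underscores are used.
    left, site, dates = stem.rsplit("_", 2)
    # Split from the right so a hyphen inside the manufacturer name is kept.
    manufacturer, system_id, variable = left.rsplit("-", 2)
    # Assumption: dates are written without hyphens (e.g. YYYYMMDD).
    init_date, finish_date = dates.split("-", 1)
    return QuantFileName(manufacturer, system_id, variable, site, init_date, finish_date)


# Hypothetical example name, constructed for illustration only.
example = parse_quant_filename("Aeroqual-AQY875-PM2.5_maqs_20191210-20221031.nc")
```

A name that does not match the expected pattern raises ValueError, which makes unexpected files easy to spot when iterating over a folder.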

Following the naming convention detailed earlier for the data files, Table 5 provides the structure and format of the sensor system files and Table 6 outlines the quality flag variables. For more details on the calibrated data products, refer to the “Calibrated products” section in Technical Validation.

Table 5 Description of the main variables and attributes of the NetCDF files.

Table 6 Description of the quality flag variables and attributes of the NetCDF files.

Hourly records for sensor systems, reference and duplicate reference monitors

To simplify access and enhance user interaction, we have standardized and consolidated the QUANT sensor data, reference monitor data, and duplicate reference data into a more user-friendly CSV format available on a Zenodo repository²⁵. This repository contains three files:

QUANT_SensorSystems_hourly.csv: contains the complete QUANT sensors dataset in hourly averages (detailed in Table 7);
QUANT_Reference_hourly.csv: includes reference data from MAQS, LAQS, and YoFi (see Table 8 for details);
QUANT_DuplicateRef_hourly.csv: offers duplicate reference monitor data (refer to Table 9).

Table 7 Description of the variables and attributes of the “QUANT_SensorSystems_hourly.csv”.

Table 8 Description of the variables and attributes included in the “QUANT_Reference_hourly.csv” file.

Table 9 Description of the variables and attributes included in the “QUANT_DuplicateRef_hourly.csv”.

The choice of CSV format for the QUANT dataset improves its accessibility, leveraging its widespread familiarity to facilitate ease of use compared to the more complex NetCDF files housed at the CEDA repository²⁴. These files, while robust, often challenge end-users with their volume and technical demands. Additionally, accessing reference data from MAQS, LAQS, and YoFi involves navigating multiple repositories, which are not necessarily easy to find and vary in terms of data origin, accessibility, and format (e.g., variable naming uniformity, physical units, and time formatting). Our streamlined approach enhances the dataset’s utility for diverse end-users, supporting more effective analysis.

Interactive data visualization platform: the QUANT Shiny app

To further our commitment to data accessibility, especially for non-experts, we have developed a user-friendly platform called QUANT Shiny app (https://shiny.york.ac.uk/quant/). Currently under active development, this platform facilitates the exploration of the dataset through interactive visualizations and basic analysis. This tool is publicly accessible and allows users to select data products (O₃, NO₂, PM_2.5, and various calibration versions), sensors by brand, co-location periods, and preview performance characteristics like time series for sensors and reference instruments, Bland-Altman plots, regression plots, including the regression equation, the Coefficient of Determination (R²), and the Root Mean Squared Error (RMSE). This tool enhances the dataset’s practical utility.
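
For readers who prefer to compute such performance characteristics offline, the following sketch shows common textbook definitions of the regression line, R², RMSE, and Bland-Altman statistics for paired sensor and reference values. It is not the code behind the Shiny app, whose exact definitions are not given here; the 1.96 multiplier for the limits of agreement is the conventional choice and an assumption of this example.

```python
from math import sqrt
from statistics import fmean, stdev


def linear_fit(reference, sensor):
    """Ordinary least squares fit: sensor = slope * reference + intercept."""
    mean_x, mean_y = fmean(reference), fmean(sensor)
    sxx = sum((x - mean_x) ** 2 for x in reference)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, sensor))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(reference, sensor):
    """Coefficient of determination of the least squares fit."""
    slope, intercept = linear_fit(reference, sensor)
    mean_y = fmean(sensor)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(reference, sensor))
    ss_tot = sum((y - mean_y) ** 2 for y in sensor)
    return 1.0 - ss_res / ss_tot


def rmse(reference, sensor):
    """Root mean squared error of the sensor against the reference."""
    return sqrt(fmean([(y - x) ** 2 for x, y in zip(reference, sensor)]))


def bland_altman(reference, sensor):
    """Mean difference (bias) and 1.96-sigma limits of agreement."""
    diffs = [y - x for x, y in zip(reference, sensor)]
    bias = fmean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread
```

All four functions expect equal-length sequences of paired values with missing pairs already removed.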

Technical Validation

The overarching aim of the QUANT dataset is to provide end-users with the means to characterize the performance of current commercial air quality sensors and to assess the associated uncertainties across different systems and brands under real-world conditions.

Sensor data quality assurance and quality control

To guarantee the consistency and comparability of the data collected, and to mitigate the impact of external factors on sensor performance, all sensors were deployed, maintained, and operated under identical conditions. This standardized methodology included uniform installation procedures, operational settings, QA/QC protocols, maintenance schedules, and documentation practices.

All sensors across these three sites were subjected to identical testing conditions during the co-location period. Sensors were placed within 3 meters of the reference instruments’ inlets to maximize data representativeness and accuracy, given the rapid changes in urban environments.

The measurement capabilities at each site allowed critical parameters such as temperature and relative humidity, along with other potential confounders (e.g., wind speed, and interfering gases like O₃, CO and CO₂), to be rigorously monitored. This enhances the reliability of the QUANT dataset, enabling end-users to accurately assess the influence of environmental factors on the observed variations in sensor performance.

Data integrity was maximized through daily retrievals, with periodic comparisons against the data available on the manufacturer’s cloud to verify that no unauthorized post-collection modifications had been made to the data. Throughout the study, no undue changes were identified.
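
One simple way to picture such a comparison is to match the archived values against a fresh download by timestamp and list every mismatch. The sketch below assumes both copies are available as dictionaries keyed by timestamp; the function name, the data structure, and the tolerance are hypothetical and do not describe the procedure actually used by the QUANT team.

```python
from math import isclose


def find_discrepancies(stored, cloud, rel_tol=1e-9):
    """Return the timestamps at which the archived and cloud copies disagree.

    `stored` and `cloud` map timestamps to values (None for missing).
    A timestamp present in only one copy also counts as a discrepancy.
    """
    flagged = []
    for timestamp in sorted(set(stored) | set(cloud)):
        if timestamp not in stored or timestamp not in cloud:
            flagged.append(timestamp)
            continue
        a, b = stored[timestamp], cloud[timestamp]
        if a is None or b is None:
            # Missing on one side only is a mismatch; missing on both is not.
            if a is not b:
                flagged.append(timestamp)
        elif not isclose(a, b, rel_tol=rel_tol):
            flagged.append(timestamp)
    return flagged
```

An empty result corresponds to the situation reported above, in which no changes were identified.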

No data post-processing aiming at improving data product quality was performed on the data from the sensor devices by the QUANT team. This was to ensure that the data collected in this study is representative of that collected by any end-user of these technologies. The data processing done by sensor device manufacturers prior to reporting of the data is treated as confidential intellectual property by the majority of device manufacturers, and as such is unknown to the QUANT team, and any other end-user.

Minor processing was carried out to prepare the data files, including aligning data to standard formats and applying time averaging where necessary. We did not apply any modifications or imputations to the original measurements. Missing values, regardless of the cause, were preserved as missing to maintain the authenticity of the dataset.

Potential issues (e.g., malfunction, disruption of data, anomalies, etc.) were closely monitored through a master record (internally called “Units Log”; see “Documentation practices” for more details) and daily summary emails sent to the QUANT team. These emails provided quantitative information for each company and sensor ID, detailing the percentage of data received (i.e., timestamps, pollutant and environmental variables measurements), and used a color-coded system (green for “all OK”, yellow for “attention needed”, and red for “potential issues”) that offered a quick qualitative insight into each instrument’s status. This dynamic monitoring allowed us to take preventive actions, such as addressing the deterioration in data reception over time, and corrective measures, such as immediate intervention. Basic manual time series analysis was also used to identify early signs of sensor malfunctions, facilitating proactive maintenance. In cases of data disruption, the first response was to consult site administrators (MAQS, LAQS, YoFi); if unresolved, the QUANT team contacted suppliers for further support. Site visits were arranged as necessary to inspect and maintain the devices. Detailed metadata also document periods when instruments were non-operational, with reasons for these outages noted (see “Documentation practices”).
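
The color-coded summary can be illustrated with a small sketch that converts the share of expected records received into a status label. The thresholds of 95% and 75% are placeholders chosen for this example; the cut-offs actually used in the daily emails are not given in the text.

```python
def percent_received(received, expected):
    """Percentage of expected records that actually arrived."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100.0 * received / expected


def status_color(percent, ok_threshold=95.0, attention_threshold=75.0):
    """Map data coverage to a status color (thresholds are assumptions)."""
    if percent >= ok_threshold:
        return "green"   # all OK
    if percent >= attention_threshold:
        return "yellow"  # attention needed
    return "red"         # potential issues
```

For a system reporting once per minute, expected would be 1440 records per day, so 1200 received records (about 83%) would be labeled yellow under these placeholder thresholds.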

Documentation practices

Rigorous documentation practices were implemented throughout QUANT. These centered on the “Units Log”, supplemented by documentation provided by the sensor manufacturers and the site managers. Maintained manually on a daily basis by our team, this master record logged day-by-day information for each site and sensor, including the sites’ reference instruments and our own. It documented installation and de-installation events; instrument locations; operational status (sensors and reference monitors); changes in operational conditions; calibrations performed; cartridge and unit changes; errors, failures, power outages, maintenance visits, and sensor replacements; links to internal documents (such as site audits, calibration certificates, manufacturers’ operational procedures, contracts and service agreements, software update documentation, site plans, communication logs, technical decision records, and incident and problem resolution reports); and external resources such as links to company dashboards and relevant websites. This enabled comprehensive oversight of sensor functionality and the identification and resolution of issues.

Relevant information collected through the Units Log was associated with corresponding metadata in the database, including software details (e.g., calibration versions), hardware information (e.g., parts replacements), and manufacturer-provided flags. This enhances the traceability of each measurement.

Sensor data availability

During QUANT a number of device failures resulted in lost data (see Figs. 5, 6), with some of the most significant issues experienced during the Main QUANT assessment being: mechanical malfunction, connectivity, water ingress, power supply failures, and wiring/connector failures. Some specific problems include a compromised SD card causing the on-board computer to fail (e.g. AQY875, missing data from Feb to May 2020), moisture seeping into a power supply PCB due to a broken seal (e.g. Ari078, missing data from Feb to Jul 2020), a main unit chip failure compounded by supply chain delays for a replacement (e.g. Zep311, missing data from Sep 2021 to Mar 2022 & Poll1, from Feb to Apr 2022), and sudden irreversible failure (e.g. PA2, 3 and 4, after Nov 2021; Atm1 and 2 after Aug 2021).

Reference data sharing

We periodically supplied the manufacturers with retrospective reference data on gas- and particulate-phase pollutants. This allowed them to use the data for (i) validation, (ii) corrections, (iii) calibration and (iv) benchmarking of their products. In turn, suppliers were expected to provide updated data products if they developed new ones during this process.

Data sharing was conducted in three phases, each consisting of one-month periods of reference data. After the close of each data collection period, we shared preliminary reference data, allowing manufacturers to make an immediate check of their sensors’ performance. Once the reference data had been audited by the National Physical Laboratory (NPL, UK), a process called “data ratification” (see “Reference data validation” for more details), we distributed the ratified data. The intervals between these sharing phases were set at approximately six months, giving manufacturers ample time to analyze the data and apply any corrections to their devices.

To preserve the fairness and integrity of the evaluation process, all reference data was kept under embargo until it was ready to be released. Once available, it was disseminated simultaneously to all participating manufacturers. Table 10 outlines the dates and durations for each reference data release for both the Main QUANT and the WPS.

Table 10 Timeline for the release of the reference data during the Main QUANT and WPS.


Reference data validation

The procedures implemented for reference data validation at MAQS and LAQS are as follows:

for NO, regular calibration checks are carried out at least once a month. These include zero and span checks using a calibrated standard cylinder and a scrubber to remove any trace gases that may interfere with the measurements. Following these checks, any necessary corrections to zero and span values are applied to uphold measurement accuracy.
for NO₂, daily automatic zero and span checks are applied. These are facilitated by an internal NO₂ diffusion tube and scrubber. Zero values are corrected in response to these checks, and the span readings are closely monitored for any indications of instability.
for O₃, zero and span corrections are applied automatically every day using an internal O₃ lamp and a scrubber. Adjustments are made to the zero readings, and span checks provide insight into the stability of the readings.
for CO, the instrumentation is checked every three hours for zero and monthly for span with the use of an onsite standard gas cylinder. Both zero and span values are then adjusted based on these frequent checks.
for CO₂, stability checks are regularly performed using an onsite cylinder, although these checks do not lead to direct corrections.
for PM, the QA/QC process involves the verification of sizing response using manufacturer-provided Mono dust, and flow rates are confirmed with a Gilibrator flow calibrator.
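The zero and span corrections mentioned in the list above are commonly expressed as a two-point linear correction. The sketch below shows that generic form only; it is an assumption about the shape of such a correction, not the specific procedure used at MAQS or LAQS, and the function name and numbers are hypothetical.

```python
def zero_span_correct(raw, zero_reading, span_reading, span_reference):
    """Apply a generic two-point (zero and span) linear correction.

    zero_reading   : analyzer output while sampling scrubbed (zero) air
    span_reading   : analyzer output while sampling the span standard
    span_reference : certified concentration of the span standard
    """
    response = span_reading - zero_reading
    if response == 0:
        raise ValueError("span and zero readings are identical")
    gain = span_reference / response
    return (raw - zero_reading) * gain


# Hypothetical check: a 2 ppb zero offset and a unit gain.
corrected = zero_span_correct(raw=52.0, zero_reading=2.0,
                              span_reading=102.0, span_reference=100.0)
```

With these illustrative numbers the offset of 2 ppb is removed and the gain is 1, so a raw reading of 52 ppb becomes 50 ppb.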

To ensure sustained quality and consistency, all instruments are set to continuously log operational parameters. These parameters are systematically monitored, and any deviation from the established ranges triggers an automatic alert to the site operators and the inclusion of flags within the data records.
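A range check of this kind might look like the following minimal sketch. The parameter names and limits are hypothetical placeholders, since the source does not list the operational parameters or their established ranges.

```python
def out_of_range_flags(sample, limits):
    """Return the names of operational parameters outside their limits.

    `limits` maps a parameter name to an inclusive (low, high) range.
    A parameter that is absent or None in `sample` is flagged as well,
    because a missing diagnostic is itself worth an operator's attention.
    """
    flags = []
    for name, (low, high) in limits.items():
        value = sample.get(name)
        if value is None or not (low <= value <= high):
            flags.append(name)
    return sorted(flags)


# Hypothetical limits and one logged sample with a low flow rate.
limits = {"sample_flow_lpm": (0.9, 1.1), "cell_temp_c": (40.0, 50.0)}
sample = {"sample_flow_lpm": 0.7, "cell_temp_c": 45.0}
flags = out_of_range_flags(sample, limits)
```

The returned list can be stored alongside the measurement as its flags and used to decide whether an alert should be sent.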

In addition to these procedures, both sites undergo biannual data ratification audits conducted by NPL, which include comparisons with external gas standards, along with assessments of sizing and flow for PM. Any final data corrections are informed by audit results, which help define the concentration values for the onsite standards.

In the case of YoFi, the standard procedures set out in ref. 26 are followed. For gas analyzers (NO, NO₂, O₃, CO and CO₂), routine QA/QC procedures include:

Regular manual and automatic calibrations of analyzers: zero and span controls, and stability checks, using certified calibration standards and contaminant-specific equipment.
Site audits and network intercalibrations are carried out at semi-annual intervals by the QA/QC unit, providing a detailed assessment of network performance and compliance with national metrology standards.

For particle analyzers, similar procedures include:

Verification of size response and flow rates using manufacturer-specific standards and calibrators.
Semiannual zero checks to identify high baseline responses in the absence of particulate matter, with corrections applied based on the results of these tests.

Calibrated data products

During the full QUANT study (Main QUANT and WPS), the calibration of sensor devices was conducted exclusively by the manufacturers, without any intervention from our research team. This approach was chosen to ensure that the sensor outputs and any subsequent calibrations mirrored the experience of standard consumers in the market. The shared reference data enabled manufacturers to carry out an independent review and, if they chose, to use this data to create and submit advanced calibrated data products. However, not all manufacturers opted to use the reference data to improve their calibrations. For those who did, the result was a set of updated data products. These were treated as separate and distinct data versions, labeled “out-of-box” (the initial data provided with no additional calibration), “cal1” (the first round of calibrations), and “cal2” (subsequent calibration adjustments). Tables 11 and 12 summarize these data products.
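Because the data products are distinct versions, an analysis would normally keep them apart rather than pooling them. The sketch below illustrates one way to do that, assuming each record carries a device identifier and a product label; the field names and the device ID "DEV1" are hypothetical, while the labels “out-of-box”, “cal1” and “cal2” are those given in the text.

```python
from collections import defaultdict


def split_by_product(rows):
    """Group measurement values by (device, data product).

    Each row is a dict with the hypothetical keys "device", "product"
    and "value". Versions are never merged, so every product can be
    evaluated against the reference on its own.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["device"], row["product"])].append(row["value"])
    return dict(groups)


# Hypothetical records for one device with two data products.
rows = [
    {"device": "DEV1", "product": "out-of-box", "value": 31.0},
    {"device": "DEV1", "product": "cal1", "value": 27.5},
    {"device": "DEV1", "product": "out-of-box", "value": 29.0},
]
by_product = split_by_product(rows)
```

Each key of the result identifies one device and one data version, which makes it straightforward to compare, for example, “out-of-box” against “cal1” for the same unit.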

Table 11 QUANT data products and calibration-related information.


Table 12 WPS data products and calibration-related information.


Limitations of the QUANT dataset

The QUANT dataset, while comprehensive, is subject to several limitations inherent to the use of air quality sensors. The study tested a limited array of sensors and brands over a specified duration, which may not capture the diversity of technologies available. Sensor performance can vary with environmental factors, which affect sensors’ chemical sensitivity and physical responses. Additionally, each manufacturer applied its own calibration procedures and standards, which were beyond our control and may have led to inconsistencies in the data that are internal to the systems. Rapid technological advances may also date the findings, limiting their future applicability. Moreover, results obtained under the specific conditions tested in the UK may not extrapolate directly to other regions with different atmospheric compositions and climates. Users should keep these limitations in mind when interpreting the dataset and drawing conclusions from it, particularly when applying the findings to environmental conditions or sensor configurations not directly tested in this study.

Usage Notes

Besides the R Shiny web app (https://shiny.york.ac.uk/quant/) described in the section “Interactive data visualization platform: the QUANT Shiny app”, Python and R code for producing some of the diagnostic plots and metrics developed for QUANT is available in this GitHub repository: https://github.com/wacl-york/quant-air-pollution-measurement-errors. The repository also includes examples taken from the QUANT dataset.
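As a simple illustration of the kind of metric such an analysis might start from, the sketch below computes the mean bias and the root-mean-square error of a sensor series against a reference series. It is not taken from the repository above; the function names and the numbers are hypothetical, and missing values are assumed to be stored as None and are excluded pairwise rather than filled.

```python
from math import sqrt


def valid_pairs(sensor, reference):
    """Keep only the time steps where both series have a value."""
    return [(s, r) for s, r in zip(sensor, reference)
            if s is not None and r is not None]


def mean_bias(sensor, reference):
    """Mean of (sensor - reference) over the valid pairs."""
    pairs = valid_pairs(sensor, reference)
    if not pairs:
        raise ValueError("no overlapping valid samples")
    return sum(s - r for s, r in pairs) / len(pairs)


def rmse(sensor, reference):
    """Root-mean-square error of the sensor against the reference."""
    pairs = valid_pairs(sensor, reference)
    if not pairs:
        raise ValueError("no overlapping valid samples")
    return sqrt(sum((s - r) ** 2 for s, r in pairs) / len(pairs))


# Hypothetical hourly values; the second sensor sample is missing.
sensor = [12.0, None, 18.0, 25.0]
reference = [10.0, 11.0, 20.0, 21.0]
bias, error = mean_bias(sensor, reference), rmse(sensor, reference)
```

Reporting both numbers is useful because a small mean bias can hide large errors of opposite sign, which the root-mean-square error exposes.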

Code availability

The data retrieval pipeline was written in Python (version 3.7.6) and ran in Docker on AWS Fargate (this code is found in https://github.com/wacl-york/quant-scraper). The post-processing code to upload the data from the daily CSVs into the Postgres database (version 14.10), and then to export the database into NetCDF and CSV for storage into the CEDA repository was written in R (version 4) (found in https://github.com/wacl-york/quant-tools).

References

1. Lu, J. G. Air pollution: A systematic review of its psychological, economic, and social effects. Curr. Opin. Psychol. 32, 52–65 (2020).
2. Schnell, I., Cohen, P., Mandelmilch, M. & Potchter, O. Portable - trackable methodologies for measuring personal and place exposure to nuisances in urban environments: Towards a people oriented paradigm. Comput. Environ. Urban Syst. 86, 101589 (2021).
3. De Vito, S., Esposito, E., Castell, N., Schneider, P. & Bartonova, A. On the robustness of field calibration for smart air quality monitors. Sens. Actuators B Chem. 310, 127869 (2020).
4. Popoola, O. A. M. et al. Use of networks of low cost air quality sensors to quantify air quality in urban settings. Atmos. Environ. 194, 58–70 (2018).
5. Schneider, P. et al. Mapping urban air quality in near real-time using observations from low-cost sensors and model information. Environ. Int. 106, 234–247 (2017).
6. Munir, S., Mayfield, M., Coca, D., Jubb, S. A. & Osammor, O. Analysing the performance of low-cost air quality sensors, their drivers, relative benefits and calibration in cities: a case study in Sheffield. Environ. Monit. Assess. 191, 94 (2019).
7. Liu, X. et al. Low-cost sensors as an alternative for long-term air quality monitoring. Environ. Res. 185, 109438 (2020).
8. Chojer, H. et al. Development of low-cost indoor air quality monitoring devices: Recent advancements. Sci. Total Environ. 727, 138385 (2020).
9. Maag, B., Zhou, Z. & Thiele, L. A Survey on Sensor Calibration in Air Pollution Monitoring Deployments. IEEE Internet Things J. 5, 4857–4870 (2018).
10. Diez, S. et al. Air pollution measurement errors: is your data fit for purpose? Atmos. Meas. Tech. 15, 4091–4105 (2022).
11. Schmitz, S. et al. Unravelling a black box: an open-source methodology for the field calibration of small air quality sensors. Atmos. Meas. Tech. 14, 7221–7241 (2021).
12. Pang, X., Shaw, M. D., Gillot, S. & Lewis, A. C. The impacts of water vapour and co-pollutants on the performance of electrochemical gas sensors used for air quality monitoring. Sens. Actuators B Chem. 266, 674–684 (2018).
13. Ripoll, A. et al. Testing the performance of sensors for ozone pollution monitoring in a citizen science approach. Sci. Total Environ. 651, 1166–1179 (2019).
14. Miech, J. A. et al. In situ drift correction for a low-cost NO₂ sensor network. Environ. Sci. Atmospheres 3, 894–904 (2023).
15. Bulot, F. M. J. et al. Long-term field comparison of multiple low-cost particulate matter sensors in an outdoor urban environment. Sci. Rep. 9, 7497 (2019).
16. Feinberg, S. et al. Long-term evaluation of air sensor technology under ambient conditions in Denver, Colorado. Atmos. Meas. Tech. 11, 4605–4615 (2018).
17. Crilley, L. R. et al. Evaluation of a low-cost optical particle counter (Alphasense OPC-N2) for ambient air monitoring. Atmos. Meas. Tech. 11, 709–720 (2018).
18. Diez, S. et al. Long-term evaluation of commercial air quality sensors: an overview from the QUANT (Quantification of Utility of Atmospheric Network Technologies) study. Atmos. Meas. Tech. 17, 3809–3827 (2024).
19. Park, H. S. et al. The Potential of Commercial Sensors for Spatially Dense Short-term Air Quality Monitoring Based on Multiple Short-term Evaluations of 30 Sensor Nodes in Urban Areas in Korea. Aerosol Air Qual. Res. 20, 269–380 (2020).
20. Jiao, W. et al. Community Air Sensor Network (CAIRSENSE) project: evaluation of low-cost sensor performance in a suburban environment in the southeastern United States. Atmos. Meas. Tech. 9, 5281–5292 (2016).
21. Collier-Oxandale, A. et al. Field and laboratory performance evaluations of 28 gas-phase air quality sensors by the AQ-SPEC program. Atmos. Environ. 220, 117092 (2020).
22. Sá, J. P. et al. Two step calibration method for ozone low-cost sensor: Field experiences with the UrbanSense DCUs. J. Environ. Manage. 328, 116910 (2023).
23. European Commission. Guide to the demonstration of equivalence of ambient air monitoring methods, Report by an EC Working Group on Guidance. European Commission. (2010).
24. Lacy, S., Diez, S. & Edwards, P. Quantification of Utility of Atmospheric Network Technologies: (QUANT): Low-cost air quality measurements from 52 commercial devices at three UK urban monitoring sites. CEDA https://catalogue.ceda.ac.uk/uuid/ae1df3ef736f4248927984b7aa079d2e (2023).
25. Diez, S., Lacy, S., Read, K., Edwards, P. & Urquiza, J. QUANT: A Three-Year, Multi-City Air Quality Dataset of Commercial Air Sensors and Reference Data for Performance Evaluation. Zenodo https://doi.org/10.5281/zenodo.10775692 (2024).
26. DEFRA. Quality Assurance and Quality Control (QA/QC) Procedures for UK Air Quality Monitoring under the Air Quality Standards Regulations. (2023).
27. Tryner, J. et al. Laboratory evaluation of low-cost PurpleAir PM monitors and in-field correction using co-located portable filter samplers. Atmos. Environ. 220, 117067 (2020).


Acknowledgements

This work was supported by the UKRI Strategic Priorities Fund Clean Air program (NERC NE/T00195X/1), with support from Defra. We extend our gratitude to the MAQS team (NERC NE/T001984/1, NE/T001917/1), Dr Michael Flynn, Dr Nicholas Marsden and Dr Thomas Bannan at the MAQS for their great help and assistance with regulatory-grade instruments data collection and support in maintenance tasks during QUANT. We would also like to thank the LAQS team (NERC NE/T001909/1) Dr Max Priestman, Dr Stefan Gillott and Dr David Green (Imperial College London) for granting access, support in maintenance tasks and sharing the data from the London site. The authors wish to acknowledge Dr Katie Read and the Atmospheric Measurement and Observation Facility (AMOF), a Natural Environment Research Council (UKRI-NERC) funded facility, for providing the duplicate references (a Thermo 49i and a 2B Technologies 202 for ozone, and two Teledyne T200U for NOx) used in this study and for their expertise on its deployment. Our efforts were greatly facilitated by Andrew Gillah, Jordan Walters, Liz Bates, and Michael Golightly from the City of York Council, whose support was crucial in granting site access and monitoring instrument status. Further appreciation is directed towards Chris Anthony, Killian Murphy, Steve Andrews, and Jenny Hudson-Bell from WACL for their invaluable help and support throughout the project. Lastly, we thank Stuart Murray and Chris Rhodes from the Department of Chemistry Workshop for their indispensable technical assistance and advice.

Author information

Authors and Affiliations

Centro de Investigación en Tecnologías para la Sociedad, Universidad del Desarrollo, Santiago, CP, 7550000, Chile
Sebastian Diez
Wolfson Atmospheric Chemistry Laboratories, University of York, York, YO10 5DD, UK
Sebastian Diez, Stuart Lacy & Pete Edwards
Grupo de Estudios de la Atmósfera y el Ambiente (GEAA), Universidad Tecnológica Nacional, Facultad Regional Mendoza (UTN-FRM), Cnel. Rodriguez 273, Mendoza, 5501, Argentina
Josefina Urquiza
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
Josefina Urquiza


Contributions

Sebastian Diez: Conceptualization, writing – original draft preparation, experimental deployment, maintenance and technical preparation of sensor systems and data processing. Stuart Lacy: Data infrastructure, data processing, data quality control, preparation of data files, writing – original draft preparation. Josefina Urquiza: review of metadata files, writing – original draft preparation. Pete Edwards: Conceptualization, supervision, experimental deployment, writing – original draft preparation.

Corresponding author

Correspondence to Sebastian Diez.

Ethics declarations

Competing interests

The authors declare that there are no competing financial or personal interests that could have influenced the work described in this article.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Diez, S., Lacy, S., Urquiza, J. et al. QUANT: a long-term multi-city commercial air sensor dataset for performance evaluation. Sci Data 11, 904 (2024). https://doi.org/10.1038/s41597-024-03767-2


Received: 06 March 2024
Accepted: 09 August 2024
Published: 21 August 2024
DOI: https://doi.org/10.1038/s41597-024-03767-2

