Assessing and correcting neighborhood socioeconomic spatial sampling biases in citizen science mosquito data collection

Padilla-Pozo, Álvaro; Bartumeus, Frederic; Montalvo, Tomás; Sanpera-Calbet, Isis; Valsecchi, Andrea; Palmer, John R. B.

doi:10.1038/s41598-024-73416-6

Download PDF

Article
Open access
Published: 28 September 2024

Assessing and correcting neighborhood socioeconomic spatial sampling biases in citizen science mosquito data collection

Álvaro Padilla-Pozo^1,2,3,4,
Frederic Bartumeus^3,5,6,
Tomás Montalvo^7,8,9,
Isis Sanpera-Calbet⁴,
Andrea Valsecchi⁷ &
…
John R. B. Palmer⁴

Scientific Reports volume 14, Article number: 22462 (2024) Cite this article

3336 Accesses
3 Citations
1 Altmetric
Metrics details

Subjects

Abstract

Climatic, ecological, and socioeconomic factors are facilitating the spread of mosquito-borne diseases, heightening the importance of vector surveillance and control. Citizen science is proving to be an effective tool to track mosquito populations, but methods are needed to detect and account for small scale sampling biases in citizen science surveillance. In this article we combine two types of traditional mosquito surveillance records with data from the Mosquito Alert citizen science system to explore the ways in which the socioeconomic characteristics of urban neighborhoods result in sampling biases in citizen scientists’ mosquito reports, while also shaping the spatial distribution of mosquito populations themselves. We use Barcelona, Spain, as an example, and focus on Aedes albopictus, an invasive vector species of concern worldwide. Our results suggest citizen scientists’ sampling effort is focused more in Barcelona’s lower and middle income census tracts than in its higher income ones, whereas Ae. albopictus populations are concentrated in the city’s upper-middle income tracts. High resolution estimates of the spatial distribution of Ae. albopictus risk can be improved by controlling for citizen scientists’ sampling effort, making it possible to provide better insights for efficiently targeting control efforts. Our methodology can be replicated in other cities faced with vector mosquitoes to improve public health responses to mosquito-borne diseases, which impose massive burdens on communities worldwide.

Aedes albopictus in a recently invaded area in Spain: effects of trap type, locality, and season on mosquito captures

Article Open access 25 January 2024

Deep learning identification for citizen science surveillance of tiger mosquitoes

Article Open access 25 February 2021

Spatial and seasonal dynamics of mosquito species in Harris County, Texas, highlight risk of arbovirus introduction and transmission

Article Open access 25 March 2025

Introduction

Climatic, ecological and socioeconomic factors are shifting and expanding the ranges, seasons, and fecundity of disease vector mosquitoes^{16,55,62,73,78,79,86}, while also driving faster and earlier multiplication of pathogens^14,83, all of which heightens the importance of vector surveillance and control efforts^18,92. In the absence of viable vaccines, the World Health Organization (WHO) has long highlighted vector control as an essential but underutilized strategy for addressing the increasing global burden of vector-borne disease⁹⁰, and the calls for member states to strengthen vector control and surveillance are becoming increasingly urgent^3,91,92.

Responding to these calls requires adding new tools and new ways of thinking to traditional entomological approaches^26,58,74. Citizen science is proving to be a particularly effective one^5,12,29,70, so much so, that it is now being incorporated explicitly into national and local vector control strategies^60,61. Citizen science involves participation in scientific research by members of the public who need not be professional scientists^37,41,66. Participation ranges from “classical” scientific observation (like the century-old Christmas Bird Count), to networked citizen science that harnesses the internet and digital sensors to link participants and enable involvement in different parts of the research process³⁶.

One of the main challenges of citizen science is to collect reliable data that can be used to make unbiased inferences. This is especially important for projects that rely on citizen scientists’ observations and reporting but have no participation restrictions—i.e., people can participate whenever and wherever they want. In these types of projects, it is generally necessary to distinguish between variation in the phenomenon of interest and variation in citizen scientists’ sampling effort²² because citizen scientists tend to sample some areas more than others⁴⁶. For example sampling may be more intense in areas closer to participants’ homes²¹ or closer to cities or roads⁹. Sampling effort may also be mediated by a variety of other physical and socioeconomic spatial characteristics, including human population density, altitude, land cover type, degree of urbanization, and aggregate income levels of resident populations^32,81.

From a vector surveillance perspective, this issue is crucial because human-related factors generally have the most unpredictable influence on vector dynamics in urban ecosystems⁴⁹. A better understanding of the role of these factors in citizen science participation makes it possible to learn more about their influence on vectors themselves. The analysis of potential spatial sampling biases in mosquito vector population estimates that use data collected from citizen scientists is also crucial from a health equity perspective, as these estimates are increasingly used by local, regional, and national public health authorities as an integral part of vector surveillance and control⁶⁰. Reducing spatial biases in these estimates allows for more efficient control interventions and targeting of surveillance, consequently reducing the risk of diseases that place massive burdens on society around the world and that widen existing health inequalities^8,56.

In the present study, we investigate how socioeconomic census tract characteristics in Barcelona shape reporting patterns in Mosquito Alert, a networked citizen science platform used for vector mosquito surveillance^5,47,70. We untangle the effects of census tract characteristics on participation from their effects on mosquito populations and the risk of exposure using ground truth data about the presence of mosquitoes. This allows us to reduce spatially structured socieconomic biases in estimates of mosquito distributions relied on by public health authorities.

Mosquito Alert leverages mobile phone sensors, the Internet, expert validation, and artificial intelligence to enable people without any prior training to identify and report vector mosquitoes^5,47,70. Since its launch in 2014, the Mosquito Alert system has been operational in Barcelona^67,70, and Barcelona’s public health agency, the Agència de Salut Pública de Barcelona (ASPB), has gradually incorporated it into its vector surveillance program^59,61.

Since June 2014 Mosquito Alert has received over 197,353 reports from 67,276 citizen scientists in 183 countries. Of these reports, 54% represent adult mosquitoes, 35% mosquito bites, and 10% mosquito breeding sites. (For the latest participation statistics, updated daily, see https://labs.mosquitoalert.com/participation/.) Seventy-seven percent of adult and site reports are accompanied by at least one photograph, and all adult reports with photographs are validated within Mosquito Alert’s Digital Entolab system by entomologists who review the photographs and score each report’s probability of representing one of five vector species (or species complexes) of public health concern: Ae. albopictus, Ae. aegypti, Ae. japonicus, Ae. koreicus, or Culex pipiens⁴⁷.

The Mosquito Alert system was designed to collect not only citizen scientists’ reports of vector mosquitoes, but also information about sampling effort that can be used to correct for bias. To protect participants’ privacy, however, its approach relies on anonymized background tracking masked to a grid of 0.025 degrees latitude and longitude (see Suppl. Info. Fig. 1)^12,47,70. This makes it possible to estimate variation in sampling effort between cells of approximately 4 $km^2$, but not within these cells. Yet, smaller-scale variation in sampling effort may well be an important source of bias in the type of census tract-level analysis necessary for effective mosquito surveillance and control in cities.

In Barcelona, the mosquito species of primary concern is the tiger mosquito, Ae. albopictus, an invasive species that poses a nuisance and a public health risk through its aggressive daytime biting and its ability to transmit dengue, chikungunya, Zika and a wide range of other viruses^15,20,38,77. Although autochthonous transmission of these viruses has not yet been detected in Barcelona, the presence of an established Ae. albopictus population in the city is particularly worrying given this species’ high vector competence and the city’s position as a major travel and tourism hub, with frequent cases of imported mosquito-borne infections^{7,34,35,52,59}.

Barcelona contains 1068 census tracts, with important variation along many socioeconomic dimensions including income (Fig. 1). Our key variable of interest is the yearly net mean income per consumption unit of each census tract, which we use as a proxy for area-level socioeconomic status. Worldwide, evidence suggests that Aedes mosquitoes often cluster in poorer areas^{17,76,80,87,89} due to community features such as higher concentrations of unmanaged water containers (e.g., flower pots), overgrown vegetation, and poorly maintained or abandoned buildings^{17,49,65,76,87}. Nevertheless, Aedes mosquito populations are also capable of clustering in census tracts with high socioeconomic status due to the availability of breeding sites created through gardening and other activities involving water inputs⁴⁹ like swimming pools¹⁷. In the case of Barcelona, very little is known about the relationship between census tract socioeconomic characteristics and Aedes mosquito populations. Our analysis helps to address this gap for the first time, while also adding to the literature on this topic globally.

Results

We examine citizen scientists’ reporting through Mosquito Alert in Barcelona from the time of the system’s launch in June 2014 through December 2023. This period encompasses 10 mosquito seasons, with the seasons running each year from approximately April through November. We start by examining overall participation by citizen scientists in the form of their submission of adult mosquito reports and mosquito bite reports through the system (Section "General participation model"). We then analyze citizen scientists’ Mosquito Alert reports in the immediate vicinity of catch basin drains with known mosquito activity to better isolate the socioeconomic determinants of participation levels from those of mosquito populations (Section "Active catch basin drain participation model"). We combine these results with expert-validated Ae. albopictus reports from Mosquito Alert to analyze the socioeconomic determinants of this species’ populations in Barcelona’s census tracts and improve high resolution estimates (Section "Mosquito Alert vector model"). Finally, we compare our findings to an independent set of estimates based on data from 51 traditional adult mosquito traps positioned around the city between 2018 and 2022 (Section "Mosquito trap vector model"). We focus our discussion on our estimated coefficients on the income variable, but we include coefficient estimates for other variables and tables of model comparisons in the Supplemental Information.

Citizen scientist participation patterns

From June 2014 through December 2023 Mosquito Alert received a total of 5,393 adult mosquito and mosquito bite reports in Barcelona (3,563 adults and 1,830 bites). These reports were heterogeneously distributed across the city and covered most of its area (79% of its census tracts). We observe a lower density of reports in the southernmost area of the city, which is an industrial and commercial area, as well as the northeastern and northern border zones of the city, which mostly consist of natural areas like a marsh and lands covered by forest (Fig. 2a).

The distribution of mean income by consumption unit (from now on referred to as “income”; we describe its calculation in the Methods section) across Barcelona’s census tracts ranges from 10,399 euros to 51,033, with a median of 23,510 (calculated in terms of the number of tracts, not people). Twenty-five percent of Barcelona’s census tracts have mean incomes below 19,970 euros; 75% have mean incomes below 27,517 euros. Spatially, income is highly autocorrelated in Barcelona (Fig. 1), with a correlation coefficient of 0.84 between the income of each census tract and that of its neighboring tracts. Overall, we find 60% of Mosquito Alert reports in census tracts above the median; we find 25% in census tracts in the highest income quartile (over 27,517 euros), and 19% in the lowest quartile (under 19,970 euros).

General participation model

We analyze the Mosquito Alert participation data in a Bayesian logistic regression framework, modeling the log odds of a report being sent from a given point in Barcelona as a function of census tract income, population density, mean age, and proportion of households composed of only a single member (see Methods section for a description) using an intrinsic conditional auto-regressive term to account for the spatial autocorrelation of our covariates across neighboring census tracts. We transform our presence-only data (reports of mosquitoes) to a binary variable by randomly sampling pseudo-absences within the perimeter of the city of Barcelona (we describe this in detail in the Methods section). In this model (hereinafter General Participation Model) we find that, all else equal, the probability of citizen scientists’ participation has a very weak inverted-U-shaped relationship with income (Fig. 3), but there is too much uncertainty to draw firm conclusions about this relationship. We do see a clearly positive relationship between the probability of submitting a report to Mosquito Alert and population density, with the effect diminishing as density increases (Suppl. Info. Fig. 2). We also see a positive relationship with the proportion of single-member households (Suppl. Info. Fig. 3), and a negative relationship with age (Suppl. Info. Fig. 4), but in both cases there is a lot of uncertainty in the estimate. This model does not allow us to untangle whether the predicted probabilities of submitting a mosquito report are due to socioeconomic neighborhood-level processes (e.g., a higher concentration of residents with more free time), a higher (or lower) density of mosquitoes, or both. Testing this requires some independent source of information about the actual mosquito population, and for this, we turn to official mosquito surveillance and control data from the ASPB.

Active catch basin drain participation model

The ASPB regularly monitors mosquito activity in and around catch basin drains in public spaces of Barcelona, recording mosquito activity whenever detected³⁰. We rely on the locations of 1024 drains in which the ASPB recorded mosquito activity from 2019-23. We model the presence or absence of citizen scientists’ Mosquito Alert reports within 200 m of those drains in which the ASPB detected mosquito activity during the year in which the report was sent (Fig. 2b) as a function of Mosquito Alert sampling effort (described in Methods) along with the set of independent variables used in the Mosquito Alert General Participation Model. This enables us to estimate the probability of a citizen scientist reporting mosquito activity to Mosquito Alert given that mosquito activity was actually occurring. We again use a Bayesian logistic regression, but in this case, instead of fitting a spatial autocorrelation term, we take a multilevel approach, giving each drain a random intercept to account for the fact that some of the drains are observed as active repeatedly (in multiple years). We refer to this model as our Active Catch Basin Drain Participation Model.

The Active Catch Basin Drain Participation Model shows a concave and mostly negative relationship between census tract income and probability of reporting (Fig. 4). We find the highest probabilities of reporting mosquitos in census tracts with incomes just below 30,000 euros (the highest income quartile among Barcelona’s census tracts), with probabilities then dropping at higher and lower incomes. Holding our other covariates at their means, the probability of a citizen scientist reporting mosquito activity near an active drain is between 40% and 50% in census tracts with incomes under 30,000 euros, and it drops to 30% in the highest-income tracts (Fig. 4). We also see the same positive, diminishing effect of population density as in the General Participation Model (Suppl. Info. Fig. 5), and here we see clear positive relationships with the proportion of households composed of a single person (Suppl. Info. Fig. 6) and with Mosquito Alert sampling effort (Suppl. Info. Fig. 7) and a clear negative relationship with age (Suppl. Info. Fig. 8).

As a robustness check, we test whether the relationships estimated in this model hold if we randomly eliminate catch basin drains from the dataset to ensure that no two modeled drains are within 200 m of one another. We find largely the same relationships in this case, albeit with more noise (Suppl. Info. Figs. 9, 10, 11, 12). The relationship between the probability of reporting a mosquito and income appears more like an inverted U in this model, showing a larger variation of sampling probability values, with much lower sampling effort in both lower and higher income tracts (Suppl. Fig. 9). Nonetheless, we attribute this to the loss of data from eliminating the drains closer than 200 m to one another.

Vector-mosquito risk

We interpret the relationships estimated in the Active Catch Basin Drain Participation Model as reflecting sampling biases—systematic differences in the probabilities of citizen scientists reporting mosquitoes when mosquitoes are present, based on socioeconomic neighborhood characteristics. These sampling biases are important not just for what they tell us about the socioeconomic determinants of citizen science participation, but also because correcting for them should make it possible to disentangle the determinants of mosquito population sizes from the determinants of citizen scientists’ participation. In the same way that we use ASPB data on mosquito activity in the sewage system to control for variation in the actual mosquito distribution, we can use the predictions from the Active Catch Basin Drain Participation Model to control for sampling bias in a mosquito distribution model.

Mosquito Alert vector model

Specifically, we model the probability of Ae. albopictus presence, as measured through validated Mosquito Alert reports, controlling for the predicted sampling effort from our Active Catch Basin Drain Participation Model. We again rely on Bayesian logistic regressions. As in the General Participation Model, we use an intrinsic conditional auto-regressive term to account for the spatial autocorrelation of our covariates across neighboring census tracts. In this case, however, the presences in our outcome variable are locations of expert-validated Ae. albopictus reports (as opposed to all adult and bite reports, regardless of validation status, as modeled in the General Participation Model). In this case we generate pseudo-absences by sampling random points across Barcelona with sampling weights proportional to the sampling probabilities predicted from the Active Catch Basin Drain Participation Model. In addition to using the predicted sampling probabilities as sampling weights, we also use the log of these probabilities as an offset in the model to control for any extra socioeconomic bias for which the sampling weights do not account. Finally, we control for proximity to privately owned spaces with green land cover, a variable that the ASPB regularly uses in its mosquito surveillance and control planning based on habitat suitability of green land cover and the difficulty of carrying out mosquito control in privately-owned spaces due to lack of access.

This model estimates the relationship between census tract mean income and the probability of Ae. albopictus presence to be concave, but mostly increasing (Fig. 5). Holding the other variables at their means, the probability is lowest, below 1%, in the lowest-income census tracts. It increases to a maximum of just over 5% in census tracts with mean incomes between 40,000 and 50,000 euros and then decreases beyond that point, dropping to 5% in the highest-income tracts. The model also estimates a clear positive relationship between the probability of Ae. albopictus presence and proximity to privately owned spaces with green land cover (Suppl. Info. Fig. 13).

Figure 6 shows the predicted probabilities from this model on a regular lattice of 256,898 points across Barcelona, spaced at 20 m intervals, and it compares these predictions to those from the same model without controlling for sampling effort. The map in Figure 6a shows the predictions from the model with sampling effort, i.e., the predicted probabilities of reporting a mosquito from our Active Catch Basin Drain Participation Model, while the one in Figure 6b shows the differences between these predictions and those from the model without sampling effort, subtracting the no-sampling-effort model predictions from the sample-effort-model predictions. In this second map, the orange and red areas are those in which the model without sampling effort under-predicts the probability of Ae. albopictus presence, while the green and blue ones are those in which it over-predicts the probability. The model without sampling effort mostly over-predicts the probability of Ae. albopictus presence. It does so by more than 10 percentage points in 9 out of Barcelona’s 10 districts (Suppl. Info. Tab. 1). Under prediction by more than 10 percentage points is seen in only 3 of the 10 districts (Suppl. Info. Tab. 1).

Mosquito trap vector model

As a final step, we partially validate these results by testing the relationship between census tract income and a total of 1368 female Ae. albopictus counts that we recorded in 51 BG Sentinel-2 adult mosquito traps placed around Barcelona between 2018 and 2022 (Fig. 2c). These counts and the sampling process on which they are based have no connection to citizen scientists and so provide an independent source of information about the mosquito distribution in Barcelona. Controlling for the effects of weather (since traps were checked at different times throughout the mosquito season) and land cover, we find a similar relationship with income as in the Mosquito Alert Vector Model: The predicted counts of female albopictus in traps go up as we move from the lowest income census tracts to the tracts with mean incomes of 40,000 euros (Fig. 5). The predictions appear to peak at slightly lower incomes than in the Mosquito Alert Vector Model, but they are also much more noisy (Fig. 5). We hypothesize that this is due to the fact that there were very few traps in high income tracts: only 89 out of 1368 (6.5%) trap counts are from traps in tracts with mean income over 35,000 euros, and none are over 42,283. In contrast, the Mosquito Alert Vector Model is estimated using reports (both presences and pseudo-absences) from tracts with mean incomes up to 51,000.

Discussion

These results show how spatially structured socioeconomic characteristics shape sampling bias in the Mosquito Alert citizen science reporting in Barcelona. In addition, they show how a better understanding of that bias makes it possible to learn more about the socioeconomic determinants of the small-scale spatial distribution of disease vector mosquitoes and to improve our predictive models of mosquito populations at high resolution (20m).

We find a predominantly negative relationship between census tract mean income and the probability of submitting a mosquito report to Mosquito Alert (Fig. 4). Once we control for this sampling bias, the modeled citizen science data shows a positive relationship between census tract mean income and Ae. albopictus probabilities up to incomes above 40,000 euros, which is largely the same as the relationship we find when we model mosquito trap data. Beyond these tract mean incomes the citizen science data suggests that probabilities begin to decrease, but there is little mosquito trap data with which to make a comparison at this income range values. Note also that 40,000 euros is at the top of Barcelona’s census tract mean income distribution, only 5% of Barcelona’s census tracts have higher mean incomes.

Whereas citizen science often faces criticism for insufficiently including communities with lower resources^42,71, our results suggest that citizen science mosquito sampling effort in Barcelona is actually higher in these areas than in others. We hypothesize that higher-income census tracts in Barcelona have lower probabilities of submitting reports to Mosquito Alert because their residents are less exposed to mosquitoes, irrespective of mosquito prevalence, because they have higher access to air conditioning, private transportation, mosquito window nets, mosquito repellents, indoor working environments, and other amenities that reduce human-mosquito contact.

From a health and spatial inequality perspective, hazards are theorized to accumulate in poorer areas as a reflection of inequality processes at the individual level as well as structural decisions to organize space in ways that perpetuate inequality^23,33,40. Empirical evidence supports this idea worldwide, e.g.^13,94, even in spaces that make substantive efforts to make themselves more equitable⁸⁵. This is also the case in Barcelona when it comes to hazards such as air pollution⁵³.

Our models suggest a more complicated story when it comes to vector mosquitoes, with higher prevalence in wealthier areas. Although this is not the most common distribution pattern observed worldwide, our results emphasize the heterogeneous and location-specific nature of the association between socioeconomic status and Aedes mosquito presence that has been recently highlighted by other scholars⁸⁸. In the case of Barcelona, the positive association between tract-level income and Ae. albopictus presence makes sense since higher income housing in Barcelona is characterized by having higher access to outdoor private spaces like terraces and gardens, where mosquitoes can proliferate easily. This does not mean that mosquito-borne disease infections should be expected to follow the same pattern as mosquito vector prevalence. Were an outbreak to occur in Barcelona, many factors could cause transmission to amplify in different parts of the city. For example, human mobility flows can facilitate the rapid spread of infected people in the viremic phase, moving the virus between mosquito subpopulations across neighborhoods¹. Moreover, disease transmission depends not just on the density of mosquito populations but also on mosquito behavior, the density of human populations and on individuals’ exposure to mosquito bites.

Notably, the Agència de Salut Pública de Barcelona (ASPB) is making an effort to monitor mosquito presence in the city and avoid the production of mosquito-related health inequalities. In its latest report⁶¹, ASPB found that 59% of potential mosquito risk factors (e.g., public catch basins or fountains) were in the low-income areas of the city. Consequently, the ASPB decided to increase monitoring of mosquito activity and related risk factors in these areas, which now represent 67% of its monitoring activity⁶¹. This may well explain our finding, from both the Mosquito Alert Vector model and the Mosquito Trap Vector model, that Ae. albopictus probabilities are lower in the low-income areas.

The prevalence of Ae. albopictus in higher-income census tracts also suggests a mechanism by which this species’ population may be supported in Barcelona: metapopulation stability. A set of interconnected Ae. albopictus subpopulations utilizing different types of breeding sites, some in public spaces and others in private spaces, is more likely to survive and rebound after concerted control programs¹. For example, when control programs focus on public spaces, they may reduce the subpopulations there, but those subpopulations can be replenished through in-migration from subpopulations in private spaces. Even with their limited dispersal capacity, Ae. albopictus individuals can cross the boundaries between high and low-income areas by flying between adjacent sites, through human-aided dispersal in vehicles^28,54, or through slower multi-generational processes.

In addition to these findings about citizen science participation and the spatial distribution of mosquito populations, our approach here offers a reliable and replicable method for other citizen science projects that collect anonymized data to assess their socioeconomic spatial sampling biases as long as they have access to some source of ground truth data about the presence of the phenomenon they study. Studying the socioeconomic distribution of the underlying phenomena observed by citizen scientists in any field is crucial, as it can overlap with detrimental outcomes and trigger a process of widening social inequality.

Methods

The core of the data for this study consists of anonymous geolocated reports sent by citizen scientists through the Mosquito Alert citizen science system^5,47,70. We combine these reports with data from mosquito traps that we collected during 2018-22, data from the ASPB’s surveillance of catch basin drains during 2019-23, sociodemographic data from the Spanish National Statistical Institute (https://www.ine.es) and the Barcelona City Council’s Open Data Portal (https://opendata-ajuntament.barcelona.cat), and weather data from the Meteorological Service of Catalonia.

We estimate all models using Hamiltonian Monte Carlo Markov chain sampling with the no U-turn sampler implemented by brms 2.19.0^10,11, cmdstanr 0.7.1, and cmdstan 2.32.2³¹ in R 4.4.1⁷². We use weakly informative priors for our population effects and estimate complete posterior distributions for each parameter through 4 parallel chains and 2000 iterations.

Mosquito Alert data

We rely on descriptive content of the Mosquito Alert reports and the report dates and locations, along with optional anonymous background tracking masked to a grid of 0.025 degrees latitude and longitude (approx. 4 $km^{2}$, see Supplemental Fig. 1). Anyone can participate in Mosquito Alert as a citizen scientist if they are 18 or older. Citizen scientists participate by using a mobile phone app to report adult mosquitoes, breeding sites, and mosquito bites. Participants can optionally include photographs with adult mosquito reports, and these photographs are reviewed by a large international team of entomologists, who classify them by target species using Mosquito Alert’s digital Entolab⁴⁷. Mosquito Alert operates around the world and collaborates with local and national public health agencies to ensure that the vector intelligence it produces is useful and actionable^{12,29,47,60,74}. The system avoids collecting socio-demographic or economic information about participants for privacy reasons. However, a recent analysis of a voluntary sample of citizen scientists who had registered to use Mosquito Alert in Spain from August 2020 to September 2021 showed that participants are more likely to be women, middle-aged (30-49), employed, high earners, and live in urban spaces¹⁹. The tendency of higher earners to participate in Mosquito Alert in Spain as a whole contrasts with the lower propensity to report mosquitoes in the highest income areas of Barcelona. Notably, the first finding is based on a voluntary sample of the Mosquito Alert citizen scientist pool all over Spain who had begun participating in 2020-21, whilst the findings in this paper are based on mosquito bite and mosquito reports collected specifically in Barcelona and over a much wider time period (2014-23). Together, these results suggest that participation “profiles” may vary within countries and over time.

The present study uses 3563 reports of adult mosquitoes and 1830 reports of mosquito bites collected from Barcelona Municipality from June 2014 through December 2023 through Mosquito Alert⁶⁹. The study also relies on a subset of 992 of the adult mosquito reports that contained photographs assessed by a team of expert entomologists as showing Ae. albopictus. That subset was derived from the full dataset of expert-validated adult reports downloaded on 20 February 2024⁷⁵. Barcelona is the city with the highest concentration of reports in the Mosquito Alert system, in part because it is where the system was first launched and has received press attention for the longest time.

Mosquito Alert sampling effort data is taken from the open data published by Mosquito Alert daily and downloaded on 20 February 2024⁶⁴. Specifically, we rely on the SE_expected variable in the sampling_effort_daily_cellres_025.csv.gz dataset. The dataset provides values on a grid of 0.025 degrees longitude and latitude (see Suppl. Info. Fig. 1). Each cell in the grid is referred to as a sampling cell, and it is used to protect the anonymity of the sampling effort data collected from participants⁷⁰. The SE_expected variable represents the estimated expected number of participants sending at least one report from each sampling cell during each day from 2018 to present. The estimates are calculated by first estimating the discrete hazard function²⁵ for participants’ reporting over time since registering with Mosquito Alert. In other words, we calculate the empirical probability of participants sending a report through Mosquito Alert on each day since they register, treating reporting as a repeated event². We then apply this empirical probability to each participant in the background tracking data to estimate their probability of sending a report on the date in question. For the SE_expected variable, we aggregate these probabilities by summing them by sampling cell (see Suppl. Info. Fig. 1) and by day to get the expected number of participants sending at least one report from each cell each day. In other words, SE_expected for sampling cell c on day d is:

$$\begin{aligned} {\text {SE}}\_{\text {expected}}_{cd} = \sum _{i=1}^{N_{cd}}P(x_{icd}) \end{aligned}$$

(1)

where $P(x_{icd})$ is the probability of participant i sending at least one report from cell c on day d and $N_{cd}$ is the total number of participants observed in cell c on day d.

We incorporate this variable into the dataset used here by further aggregating each cell by year, summing the daily expected values to get the expected number of participants sending at least one report from each cell each year.

Mosquito trap data

We rely on data that we collected from BG Sentinel-2 mosquito traps with BG lures (BioGents Corporation, Regensbourg, Germany). These traps attract adult, host-seeking female mosquitoes using air currents, visual queues, and (optionally) chemical queues, and capture them with a small suction fan^6,48,51. We deployed 51 traps, between 12 and 24 per year, over four years and four mosquito seasons in cooperation with the ASPB. Trap locations are shown in Fig. 2c. Each trap was checked and mosquitoes were classified and counted approximately every week over the course of each mosquito season. There are a total of 1368 trap observations, 453 in 2018, 165 in 2020, 426 in 2021, and 324 in 2022.

Catch basin drain data

The drain data were collected by the ASPB from 2019 to 2023 as part of its continuous surveillance and control of mosquitoes in the city³⁰. This involves inspecting possible breeding sites in public spaces and applying anti-larval treatments as needed. The triggers for these inspections include direct communications about mosquito nuisance from city residents, Mosquito Alert reporting, and communications of arboviruses cases from relevant authorities, i.e. humans infected with mosquito-born viruses. Some drain inspections are also part of regular surveillance that is carried out in different risk areas of the city (around 80 areas each year). These risk areas are inspected from April to November with an inspection frequency per drain of approximately 1 visit every 3 weeks. In these inspections, all possible sources of breeding sites found in public spaces (mainly streets and roads) are checked (mainly catch basins but also ornamental fountains or other elements that can accumulate water), and data is collected on whether or not mosquito activity is detected.

Socio-demographic data

Our principal socioeconomic indicator in all of our models is the yearly net mean income per consumption unit of each census tract in 2021, measured in euros, obtained from the most recent atlas of income distribution provided by the Spanish National Statistics Institute (INE)⁴³. This variable is calculated as the sum of net incomes generated in a household divided by the units of consumption living in it. Adults are assigned a weight of one unit of consumption, individuals 14-18 years old are given a weight of 0.5, and individuals younger than 14 are assigned a weight of 0.3⁴³. Adjusting household incomes by consumption unit in this way is intended to provide a value that can be more meaningfully compared across families by correcting for differences in consumption needs⁹⁵. This variable is correlated with standard measures of inequality such as the Gini index, with a Pearson correlation coefficient of 0.54.

We obtain the percentage of census tract households composed of a single member and mean age of the census tract residential population from INE’s atlas of income distribution⁴³. We estimate the population density of each census tract by dividing each tract’s population by its area, in square kilometers, which we obtain from INE’s cartography of census tracts⁴⁴.

Landcover data

We use data provided by the ASPB on green areas located in private spaces. This data is generated from Normalized Difference Vegetation Index (NDVI) values taken from 2019 orthophoto data available from the Cartographic and Geographic Institute of Catalonia⁴⁵, cropped to private spaces in Barcelona. The ASPB uses this dataset as part of its surveillance planning based on its experience that these private green spaces serve as important mosquito breeding sites, particularly when breeding sites in public areas are being effectively suppressed. The data consists of a set of polygons indicating private green spaces in Barcelona, and we use this to generate a proximity variable calculated based on the distances between each observation and the closest green area. We transform these distances into a Green Proximity Index (GPI) using the following standard distance decay function:

$$\begin{aligned} \text {GPI} = e^{-a*d^b} \end{aligned}$$

(2)

where d is distance in meters and we set constants a to 0.01 and b to 1. This function takes the value of 1 when the distance is 0, and decreases with greater distances, quickly approaching zero at distances beyond 200 m, which corresponds roughly to the maximum flight-based dispersal distance of Ae. albopictus⁵⁷.

Weather data

Our Mosquito Trap Vector Model relies on historical temperature data obtained from the Meteorological Service of Catalonia. We calculate the 30-day average temperature of each trap observation using data from the five weather stations across Barcelona. The average takes the 30 days up to and including each day on which the trap was checked. We label this variable TMP.