Introduction

The World Health Organization (WHO) estimates that 6–7 million people worldwide, primarily in Latin America, are infected with Trypanosoma cruzi, the etiological agent of Chagas disease1. Trypanosoma cruzi, a multi-host zoonosis, is distributed from the southern USA to Argentinean Patagonia, where it is transmitted by hematophagous triatomines (order: Hemiptera, family: Reduviidae, subfamily: Triatominae) to domestic, synanthropic and sylvatic mammals, including humans2,3. While the majority (60–70%) of chronically infected humans remain asymptomatic throughout life, others develop the most severe symptoms of heart failure and sudden cardiac death4. Despite progress in recent decades, the global burden of Chagas disease is substantial, with an estimated 275,000 disability-adjusted life-years (DALYs) lost in 20195. The classic setting for Chagas disease is rural Latin America, where adobe or low-quality cement houses and the presence of domestic animals favor household and peri-domestic triatomine infestations, and consequently inhabitants experience high levels of exposure to T. cruzi-infected triatomines6.

Except for imported chronic infections in approximately 288,000 migrants from Latin America7, T. cruzi in the USA is almost exclusively confined to sylvatic transmission cycles involving 11 triatomine species: T. gerstaeckeri, T. incrassata, T. indictiva, T. lecticularia, T. neotomae, T. protracta, T. recurva, T. rubida, T. rubrofasciata, T. sanguisuga, and Paratriatoma hirsuta6. Autochthonous vector-borne transmission in the USA is extremely rare, with fewer than 50 case reports since 19556. Triatomines can be found from coast to coast across the southern two-thirds of the continental USA. In the American Southwest the most frequently observed triatomine species are T. protracta (Uhler, 1894), T. rubida (Uhler, 1894), and T. recurva (Stål, 1868)6. Triatoma protracta is composed of five morphologically distinct subspecies, three of which occur in the USA: T. p. navajoensis (Ryckman) in the Four Corners (where Colorado, Utah, Arizona, and New Mexico meet); T. p. woodi (Usinger) in Texas; and T. p. protracta (Ryckman) in California, Nevada, Utah, Arizona and New Mexico; the latter two subspecies are also dispersed across northern Mexico2. Triatoma rubida is composed of five subspecies, T. r. rubida (Uhler), T. r. jaegeri (Ryckman), T. r. cochimiensis (Ryckman), T. r. sonoriana (Usinger), and T. r. uhleri (Usinger). Members of the T. rubida complex occur over a wide geographical range, with T. r. rubida and T. r. uhleri found in Arizona, California, Texas and New Mexico and all five subspecies distributed across northwestern and eastern Mexico2,8. Triatoma recurva is sympatric with T. rubida in northwestern Mexico and Arizona2. Triatomines in the USA are often associated with specific microenvironments, such as woodpiles, rockpiles, rodent nests, livestock pens or dog kennels6. 
While US triatomines do not tend to colonize modern habitations, the overlap between humans and triatomine habitats is increasing with ever-growing anthropogenic expansion9; in some localities triatomines have been documented infesting older, more precarious housing10. Triatomines of the American Southwest, particularly T. protracta and T. rubida, have been implicated in local anaphylactic reactions11,12,13, and are mostly encountered during the late spring and summer months, when seasonal flight dispersal occurs in response to starvation14,15, with most flights occurring during the first four hours after sunset16.

Citizen science is a growing area of research, where a network of unpaid community members, with an enthusiasm for science, volunteer their time to contribute to research objectives. Citizen science entomology has proven particularly successful, including triatomine research studies in Texas and Venezuela17,18,19,20,21. iNaturalist, an application established in 2008, is one of the most popular citizen science data portals in the world, where users can record an encounter with any individual organism (by July 2024, more than 198 million observations were reported)22. Triatomine species in the USA have been notoriously difficult to collect longitudinally over a large area, due to lack of a gold-standard trap and difficulty predicting spatiotemporal dispersal14,15.

The primary objective of the study was to model the zones of potentially suitable habitat for T. protracta, T. rubida and T. recurva in the American Southwest, Texas and northern Mexico, using maximum entropy (Maxent) models of iNaturalist observations and bioclimatic variables. Accurate identification of putative hot spots of triatomines can aid with targeted surveillance efforts to address the paucity of contemporary information regarding triatomine species diversity, T. cruzi infection prevalence, and geographical and ecological associations in the American Southwest.

Methods

Study area and species occurrence data

The study focused on the three most frequent triatomine species in the American Southwest (based on iNaturalist observations), namely T. protracta, T. rubida and T. recurva, for distribution mapping and analysis (Fig. 1). The American Southwest is defined for this study as Arizona, California, Colorado, Nevada, New Mexico, and Utah. As the distribution of these three species expands beyond the American Southwest, data were also included for Texas and northern Mexico.

Fig. 1

Dorsal photographs of the three most frequent triatomine species of the American Southwest reported on iNaturalist. Image source: Dr. Richard Oxborough.

iNaturalist data processing and sampling bias correction

Each recorded observation of an organism on iNaturalist includes name of user, date of observation, geolocation of observation, geospatial accuracy, name of species observed and photo of observation. A growing online community of citizens and experts review submitted observations and confirm species identification. The number of triatomine observations has risen in recent years, presumably due to increased recognition of the iNaturalist platform, improved quality of mobile phone applications and cameras, and recent public health campaigns and media reports raising Chagas disease awareness23. As of November 2023, there were 722 triatomine observations in the American Southwest reported on iNaturalist, with seven species represented. Supplementary Fig. 1 indicates that the total reported annual observations (with confirmed species identification) for the three most frequent triatomine species of the American Southwest increased gradually since iNaturalist platform inception, with a substantial rise in recent years from 41 records in 2019 to 149 in 2022. Supplementary Fig. 2 indicates the total number of confirmed observations made by state and by triatomine species for T. protracta, T. rubida and T. recurva. Triatoma protracta was reported in all states of the American Southwest, with the greatest number of observations made in California. Triatoma rubida was the most reported species in Arizona, although some T. rubida observations came from neighboring states (California, New Mexico, Texas and Northern Mexico). Triatoma recurva occurrences were restricted primarily to Arizona and Mexico. In Colorado, Utah, and Nevada very few triatomine observations were made by iNaturalist users. The probability of detecting a given species is a function of both the likelihood of having chosen an appropriate sampling location and the probability of species detection at that location24. 
The human population of California (39.24 million) is far greater than that of other states in the American Southwest, which is likely to lead to greater triatomine-human overlap and sampling bias. Dividing total triatomine (T. protracta, T. rubida and T. recurva) observations by human population size shows that triatomine observations per capita were highest in Arizona, New Mexico, California and Sinaloa (Table 1).

Table 1 Triatomines per capita in the Southwestern United States and Mexico. *Total triatomines refers only to number of observations of the three species in this study: T. protracta, T. rubida and T. recurva.

Figure 2 indicates the number of triatomine observations that were exported from iNaturalist and any data exclusions or additions before modelling was undertaken. As model performance decreases rapidly with small sample sizes, no analysis was conducted for the sympatric triatomine species T. indictiva (n = 11), P. hirsuta (n = 5), T. gerstaeckeri (n = 2) or T. incrassata (n = 1), which were also reported in the American Southwest. To ensure accuracy of observations reported to iNaturalist, species identification was conducted independently for each occurrence point by at least two of the authors (ET, ZS, RMO and LAM), using photographs uploaded by users to iNaturalist. In rare cases of discordant identification, a third author arbitrated. Species identification was performed using the key of Lent and Wygodzinsky and a simplified guide by the US Centers for Disease Control and Prevention25,26. Observations were excluded from the study if photographs on iNaturalist were not of sufficient quality to allow for accurate identification. Only observations of adult triatomines were included; nymphal stages were excluded due to the greater difficulty of identification to species level using photographs alone. Observations were filtered to those with ≤ 1 km spatial accuracy, resulting in a total of 533 confirmed records extracted from the iNaturalist database, with observations made from 21 July 2002 through 27 November 2023. Exclusion of observations with > 1 km spatial accuracy reduced the number of observations for T. protracta from 414 to 286 and for T. rubida from 207 to 163, while T. recurva retained all 84 occurrences. Spatial filtering is recommended for use in situations when the restriction of the species occurrence is the result of sampling bias27.
Spatial thinning was applied using the R package spThin (version 0.2.0)28 for 10 iterations at 10 km for the minimum neighbor distance, to reduce sample bias from multiple observations from the same location (often from an observer’s land or house). Per species, this reduced the total number of observations for T. protracta from 286 to 160, for T. rubida from 163 to 76 and for T. recurva from 84 to 37. Supplementary Fig. 3 indicates the locations of observations for each species that were included for modelling.
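
The thinning step can be illustrated with a short Python sketch. It is not the spThin algorithm itself: it is a simplified randomized greedy filter that enforces the same 10 km minimum neighbor distance and keeps the largest set found over 10 repetitions. The function names and example coordinates are hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_once(points, min_km, rng):
    """One randomized pass: keep a point only if it lies at least
    min_km from every point already kept."""
    order = list(points)
    rng.shuffle(order)
    kept = []
    for p in order:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept


def thin(points, min_km=10.0, reps=10, seed=0):
    """Repeat the randomized pass and return the largest retained set."""
    rng = random.Random(seed)
    best = []
    for _ in range(reps):
        kept = thin_once(points, min_km, rng)
        if len(kept) > len(best):
            best = kept
    return best


# Hypothetical records: two about 1 km apart, one far away.
records = [(33.45, -112.07), (33.46, -112.07), (32.22, -110.97)]
thinned = thin(records)
```

Running several randomized repetitions matters because the retained set depends on the order in which points are visited; keeping the largest result preserves as many records as possible for modelling.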

Fig. 2

Flow chart demonstrating the number of observations per species exported from iNaturalist, exclusions, additions and final number of observations used in the Maxent models (indicated in bold font in the lower-most boxes).

MaxEnt model parameters

Predictive modelling of T. protracta, T. rubida and T. recurva was performed using Maxent (version 3.4.4), a machine-learning method by which environmental variables and presence-only data are used to predict probability of presence, potential distribution, and habitat suitability29,30. Optimal model features for each species were imported from ENMeval (version 2.0.4)31. Ten thousand background points were generated randomly within the study area for comparison to presence locations. A random 70% of the sample data was used as training data to create the model, and the remaining 30% was used as test (validation) data to quantify the model’s performance. A raster file containing the complementary log-log (cloglog) transformed output values was generated and imported into QGIS version 3.34 for visualization32. Response curves were generated for each of the included variables. Variable importance was estimated by jackknife tests. Environmental clamping was enabled to restrain predictions to values in the training data.
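
As an illustration of the cloglog output, the sketch below applies the transformation generally described for Maxent 3.4 and later, 1 − exp(−exp(H) × raw), where raw is the model's raw output for a cell and H is the entropy of the fitted distribution. The function name and the numeric values are hypothetical and are not taken from the fitted models.

```python
import math


def cloglog_from_raw(raw, entropy):
    """Map a Maxent raw output value to the cloglog scale (0 to 1).

    raw     -- raw (relative occurrence rate) value for one grid cell
    entropy -- entropy H of the fitted Maxent distribution
    """
    return 1.0 - math.exp(-math.exp(entropy) * raw)


# Hypothetical values: a cell whose raw value equals exp(-H) maps to
# 1 - exp(-1), the cloglog value of a "typical" presence cell.
H = 8.5
typical = cloglog_from_raw(math.exp(-H), H)
low = cloglog_from_raw(0.0, H)
```

The transformation is monotonic, so it changes the scale of the map but not the ranking of cells by suitability.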

Environmental variables

In this study, 19 historical bioclimatic factors, averaged for the years 1970–2000, at a spatial resolution of 2.5 arc-minutes, were downloaded from ‘WorldClim version 2’33. The inclusion of highly correlated predictor variables results in multicollinearity and can influence the interpretability of MaxEnt outputs34. Hence, pairwise Pearson’s correlation coefficients were calculated among the variables across a geographic range approximating the extent of the collective species distributions, and variables with pairwise coefficients exceeding 0.80 were excluded from the analysis, as they provided redundant information. As a result, nine variables were included in the final model: mean diurnal range, isothermality, maximum temperature of warmest month, minimum temperature of coldest month, mean temperature of wettest quarter, annual precipitation, precipitation of driest quarter, precipitation of warmest quarter and elevation (Supplementary Table 1), while the remaining 11 variables were excluded from further analysis.
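
A correlation screen of this kind can be sketched in a few lines of Python. The rule used here (walk through the variables in a fixed order and drop any that correlates above the threshold with one already kept) is an assumption for illustration; the text does not state which member of a correlated pair was retained. Variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.80):
    """Return the names of variables kept after a greedy screen.

    columns -- dict mapping variable name to values sampled across the
               study extent; iteration order sets the retention priority.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical samples: "bio5" is almost a rescaled copy of "bio1".
samples = {
    "bio1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bio5": [2.0, 4.0, 6.0, 8.0, 10.5],
    "bio12": [5.0, 1.0, 4.0, 2.0, 3.0],
}
retained = drop_correlated(samples)
```

Because the screen is order-dependent, placing the most ecologically meaningful variables first gives them priority when a correlated pair is found.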

Model selection

To limit model overfitting, the R package ENMeval (version 2.0.4) was used for model selection31. Ten thousand background points were generated randomly across the study area. Data were partitioned via k-fold cross-validation (k = 10) for T. protracta and T. rubida, and via jackknife for T. recurva, as is recommended for datasets with low sample size35. Inappropriate model complexity can affect model performance36. For each species, candidate models with the lowest difference in corrected Akaike Information Criterion score (ΔAICc) were selected for further evaluation in MaxEnt (Supplementary Table 2).
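
The ΔAICc ranking can be illustrated with the standard small-sample formula, AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1), where k is the number of model parameters and n the number of occurrence records. The sketch below is generic and does not reproduce how ENMeval derives the likelihood from Maxent output; the candidate names, log-likelihoods and parameter counts are hypothetical.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike Information Criterion for one candidate model."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)


def delta_aicc(candidates, n):
    """Return each candidate's AICc minus the best (lowest) AICc.

    candidates -- dict mapping model name to (log_likelihood, k)
    """
    scores = {name: aicc(ll, k, n) for name, (ll, k) in candidates.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Hypothetical candidates for a species with 160 occurrence records.
candidates = {
    "LQ_rm1": (-100.0, 5),    # simpler model
    "LQH_rm2": (-98.0, 12),   # better fit, more parameters
}
deltas = delta_aicc(candidates, n=160)
```

In this made-up case the simpler model has ΔAICc = 0: the more complex model fits slightly better, but not by enough to offset the penalty for its extra parameters.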

Results

Evaluation of model performance

Receiver operating characteristic (ROC) and omission curves are presented for all three species in Supplementary Fig. 4. The area under the ROC curve (AUC) for test data was 0.847 (± 0.027) for T. protracta, 0.941 (± 0.013) for T. rubida, and 0.958 (± 0.009) for T. recurva. Omission rates were within predicted ranges for T. protracta and T. rubida but were lower for T. recurva.
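
For readers unfamiliar with AUC in a presence-only setting, the sketch below computes it as the probability that a randomly chosen test presence point receives a higher predicted value than a randomly chosen background point, with ties counted as one half. This is the rank-based definition of AUC, not a reproduction of Maxent's internal code, and the scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: share of presence/background pairs in which the
    presence point scores higher (ties count as 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical cloglog values at test presences and background points.
test_presences = [0.9, 0.4]
background = [0.5, 0.1]
example_auc = auc(test_presences, background)
```

An AUC of 0.5 corresponds to a model that ranks presences no better than random background, while values approaching 1 indicate that presences are consistently ranked above background.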

The most important variables for each species are displayed by jackknife values (Fig. 3), as well as percent contribution to the model and permutation importance values (Table 2). For T. protracta, precipitation of the warmest quarter had the highest single-variable AUC value (0.7619) as well as the highest permutation importance (22.4); annual precipitation (16.7) and isothermality (the extent of day-to-night temperature oscillation relative to annual oscillations) (15.2) also made important contributions to the model. For T. rubida, precipitation of the driest quarter had the highest single-variable AUC (0.7828) and permutation importance (38.9); isothermality (24.7) and maximum temperature of the warmest month (20.2) also made important contributions to the model. For T. recurva, precipitation of the driest quarter had the highest single-variable AUC (0.7565) and permutation importance (36.1); minimum temperature of the coldest month (12.8), annual precipitation (12.7), mean diurnal range (12.6), precipitation of the warmest quarter (12.2) and maximum temperature of the warmest month (11.4) all contributed to the model.

Regarding annual precipitation, compared to T. rubida and T. recurva, T. protracta was observed at both a higher pooled mean annual precipitation (458 mm/year) and across a wider range (139 mm/year near Fallon, Nevada to 1,489 mm/year north of Sacramento, California). Mean annual precipitation for T. rubida and T. recurva was 354 mm/year and 583 mm/year, respectively. The annual precipitation range of T. recurva was slightly narrower, from 240 mm/year in the Gulf Coast of Sonora, Mexico to 1,041 mm/year in Sinaloa, Mexico. Triatoma rubida was found from 71 mm/year in Imperial County, California to 933 mm/year near Mazatlán in Sinaloa, Mexico.

For temperature, T. protracta was observed in areas with a minimum temperature in the coldest quarter averaging 1.6°C, ranging from −17°C in Saguache County, Colorado to 9°C in San Diego, California. This species was also reported at both a lower average maximum temperature in the warmest month (32°C) and across a wider range, from 23°C in San Diego, California to 41°C in Glendale, Arizona. By comparison, T. recurva was encountered in areas with the highest average minimum temperature in the coldest quarter of the three species (4.6°C), ranging from −5.3°C in Whispering Pines, Arizona to 12.3°C near Mazatlán in Sinaloa, Mexico. The average maximum temperature in the warmest month of T. recurva localities was 36°C, ranging from 30°C near Whispering Pines, Arizona to 40°C in Wikieup, Arizona. Triatoma rubida was found in areas with an average maximum temperature in the warmest month of 37°C, ranging from 29°C in Big Bend National Park, Texas to 43°C near Brawley, California. Triatoma rubida localities had an average minimum temperature in the coldest quarter of 2.1°C, ranging from −6.6°C in Socorro County, New Mexico to 12°C in Sinaloa, Mexico.

Elevation was not an important predictive variable for any of the three species. The median elevation was 717 m above sea level (m.a.s.l.) (range: 15 to 2,473 m.a.s.l.) for T. protracta, 817 m.a.s.l. (range: −18 to 1,600 m.a.s.l.) for T. rubida and 861 m.a.s.l. (range: 15 to 1,664 m.a.s.l.) for T. recurva. Triatoma recurva observations were recorded from near sea level in coastal Sonora, Mexico to the mountain ranges of southern Arizona, while T. rubida ranged from below sea level near El Centro, California to 1,600 m.a.s.l. in southern Arizona. Triatoma protracta was observed from near sea level in coastal southern California to mountainous areas above 2,000 m.a.s.l. in California and New Mexico.

Fig. 3

Jackknife tests of variable importance by triatomine species.

Table 2 Variable percent contribution (PC) and permutation importance (PI) values.

Spatial predictive maps

Spatial maps displaying the cloglog values across the study area indicated where ecological and climatic factors may favor triatomine occurrence (Figs. 4, 5, 6). Triatoma protracta is predicted to have a wide distribution across all states of the American Southwest, albeit with regional differences in predicted suitability. Highly suitable areas (cloglog 0.8–1.0) were focused along coastal California, extending into northern Baja California, Mexico. The majority of the Central Valley in California (an intensive agricultural area) is predicted to have far lower suitability for T. protracta (cloglog 0.2–0.4), but the neighboring Sierra Nevada foothills are predicted to be highly suitable (cloglog 0.8–1.0). While elevation per se was not strongly predictive of presence, high mountain areas in the Sierra Nevada range and elsewhere were found to be unsuitable, likely owing to the combined differences in bioclimatic variables (e.g., temperature and precipitation) found at higher elevations. Areas of more moderate suitability (cloglog 0.2–0.6) included Northern California, Southern Nevada, Northwestern Arizona and Eastern Utah, extending into Colorado.

The predicted areas of suitability for T. rubida resemble a belt running from Southern California eastwards to West Texas, extending into the northern states of Sonora, Chihuahua, Coahuila and Nuevo Leon in Mexico. The largest concentrated region of high suitability was in the southern half of Arizona and along the border between New Mexico, Texas and the Mexican states of Chihuahua and Coahuila.

The model for T. recurva yielded the largest areas of high predicted suitability, extending from Southern Nevada into Northwestern Mexico, with a secondary area of high predicted suitability distributed from Southern Texas into Central and Southern Mexico. Adjacent areas, particularly throughout Mexico and in Northern and Southern California, Southern New Mexico and Northern Texas, were moderately suitable.

Fig. 4

Predicted suitability range of T. protracta (cloglog values indicate % predicted occurrence likelihood).

Fig. 5

Predicted suitability range of T. rubida (cloglog values indicate % predicted occurrence likelihood).

Fig. 6

Predicted suitability range of T. recurva (cloglog values indicate % predicted occurrence likelihood).

Discussion

This study is the first to apply user observations from the iNaturalist platform to prepare predictive occurrence maps of potential Chagas disease vector habitat suitability in the USA. Complementary log-log transformed MaxEnt output values most closely approximate probability of presence. However, this interpretation assumes spatially independent presence/absence data32. Given the nature of the occurrence data from iNaturalist, the maps are best considered as relative indices of habitat suitability, rather than probabilities of presence34. The number of triatomine observations on iNaturalist in the American Southwest has risen substantially in the last three years, most likely due to increased awareness of the iNaturalist platform, ever-improving capabilities of software applications and higher-quality mobile phone photographs. In addition to iNaturalist, data from BugGuide and the Global Biodiversity Information Facility platforms have been used to expand our understanding of US triatomine species ranges37. In this study, the number of observations on iNaturalist was sufficient for predictive modelling and is likely to increase substantially in the future, making this a valuable resource, particularly for US triatomine species that are difficult to trap across wide areas but can be easily identified using photographs. iNaturalist triatomine observations probably represent domestic and peridomestic triatomines (inside buildings and in yards) as well as sylvatic triatomines (e.g. encountered when camping or hiking).

Overall, the predicted suitable habitats for T. protracta, T. rubida and T. recurva aligned with previously reported species geographical ranges6. This study indicated predicted zones of high T. protracta habitat suitability along coastal California and the Sierra Nevada foothills, with wide-ranging, but lower suitability across parts of all Southwestern states. Although generally considered to be a triatomine vector species of low epidemiological importance, T. protracta is known to invade human dwellings and has been responsible for autochthonous Chagas disease in the western USA6,38. In the Southwestern USA, triatomine bites from T. protracta and T. rubida frequently cause serious systemic allergic reactions that can lead to anaphylaxis12,38,39. Despite many mammal species being potential reservoirs of T. cruzi, a particularly close association of T. protracta has been established with the genus Neotoma (commonly known as packrats or woodrats), therefore sustaining T. cruzi in natural habitats40. The bioclimatic variables used in the models generally had good predictive power; however, the models did not include other key factors affecting triatomine geographic distribution such as the biotic component of their environmental space or current and historic biogeographic processes41. Elsewhere, niche modeling has been used to calculate areas in Mexico shared between species or subspecies of the T. protracta complex and each species of Neotoma42. Therefore, it is important to note that in addition to climatic variables of importance identified in this study, local species-specific mammalian host-triatomine interactions are likely to influence triatomine species presence. 
While elevation was not an important predictive variable for presence of any of the three triatomine species in this study, their broad altitudinal ranges were striking and consistent with some of the major triatomine vector species in Central and South America (including Rhodnius prolixus: 5–2964 m.a.s.l.; T. dimidiata: 3–3103 m.a.s.l.; and Panstrongylus geniculatus: 9–2885 m.a.s.l.)43.

Of the three species included in this study, T. rubida is often considered to have the greatest potential for human Chagas disease transmission due to its greater tendency for home invasion44. Triatoma rubida habitat suitability was greatest in southern Arizona, southern New Mexico and along border areas with northern Mexico. Laboratory feeding and defecation studies in northern Mexico concluded that T. rubida could be considered an efficient vector, whereas T. recurva and T. protracta would be considered of secondary importance45. However, despite numerous examples of home invasion in southern Arizona, often with people exposed to hundreds of bites over many years, transmission of T. cruzi appears to remain a rare event9,10. Following investigations into home invasion by T. rubida in the desert Southwest, Klotz et al. concluded that although yearly intrusion likely occurs in some homes, T. rubida does not domiciliate and there is little risk to homeowners of Chagas disease9. Further reasons for low disease infection incidence include the inefficient transmission modality, with studies in Arizona indicating that T. rubida rarely defecates immediately after taking a blood-meal (contrary to defecation studies in Mexico)46.

Spatial thinning likely reduced the sample size of T. recurva to a level that affected the omission rates, with a high proportion of samples left out of the predicted area. Thus, the utility of iNaturalist for modelling rare or cryptic species may be limited. Similar MaxEnt modelling of T. recurva conducted for Chihuahua state, northern Mexico (using just 14 occurrence records) determined that mean temperature of the driest quarter, maximum temperature of the warmest month, precipitation of the driest month, and altitude were the most important predictor variables47. This partially concurs with our finding that precipitation in the driest quarter was the most important predictor variable.

This study is important for predicting zones of human-triatomine interactions; however, there are limitations of this approach, including the biased sampling distribution, which is inevitable when using citizen science observations. To generate true probabilities of presence across the landscape, presence-only modelling assumes a random or representative sampling scheme. Points selected from non-random presence data reflect the sample selection distribution rather than the true species distribution48. In most cases, sampling biases are caused by uneven search processes49. In our case, user observations were most likely biased heavily by human population density. The finding that T. protracta observations were greatest in California may reflect California being the most populous state, with an estimated 39 million inhabitants, compared with just 2 million in New Mexico. Detectability and sampling effort were likely affected by the accessibility of the surrounding landscape. A further limitation is the spatial accuracy of point data in iNaturalist, which can be entered manually after upload, with no associated accuracy metric. Furthermore, geoprivacy options allow an observer three levels of location disclosure: open, obscured, and private. Open geoprivacy lists the precise location of the observation, with the accuracy limited only by the collection device. With geoprivacy set to obscured, the location is set to a random point within a 0.2 × 0.2-degree cell (~324 km2 at 34 degrees latitude). Private restricts both geospatial data and all notes associated with the observation. With the environmental data set to such fine-grained resolution (2.5 arc-minutes, or ~4 km2 at 34 degrees latitude), the effect of including low-accuracy point data could be substantial. The use of random background points is likely to have biased predictions towards areas that were more intensively sampled48. MaxEnt allows for the creation of bias grids to select background points with a similar selection bias to that of the occurrence dataset. However, use of a bias grid is not recommended for datasets such as the one used in this study, where sample bias is unknown or data are collected haphazardly24.

While autochthonous transmission of Chagas disease in the American Southwest has been rare, changes in triatomine ecology can lead to behavioral changes, as demonstrated by increased domestic invasion of T. rubida complex members in northwest Mexico and southern Arizona9,44. Climate change increases the potential for changes in triatomine species distribution in addition to behavioral shifts. Fossil evidence suggests that historic climatic variability caused changes in vegetation abundance, which likely led to shifts in the relative abundance of different Neotoma species50. Given the close association between some triatomine species and their Neotoma hosts, reduced abundance of Neotoma could trigger increased triatomine dispersal and home invasion in search of alternative mammalian hosts. It is important to regularly monitor triatomine species distributions, as well as investigate aspects of their behavioral ecology, to identify factors that could lead to greater T. cruzi transmission within the USA. As with all predictive distribution maps, ground truthing through triatomine trapping along transects during the dispersal season would be highly informative, particularly in locations where human populations are low and records are therefore less likely to be present on iNaturalist.