Introduction

Lyme borreliosis (Lb) is the most prevalent vector-borne human disease in Europe, with approximately 129,000 cases reported annually from national surveillance systems1. However, as Lb is not a notifiable disease in most European countries, these reports probably underestimate the true incidence. In France, the routine surveillance system estimated approximately 39,000 cases in 2023, but recent analyses of data from a computerised physician decision support system suggest that the actual incidence may have been closer to 191,000 cases during the same year2.

Lyme borreliosis is a zoonotic disease caused by pathogenic spirochetes of the Borrelia burgdorferi sensu lato (Bbsl) species complex, primarily transmitted by the hard tick Ixodes ricinus3,4. In Europe, Borrelia afzelii and Borrelia garinii are the most common genospecies5, predominantly maintained by rodents and birds, respectively6. Their distinct eco-epidemiological cycles, combined with environmental and anthropogenic factors, drive spatial heterogeneity in Bbsl distribution7. Because genospecies differ in clinical outcomes (B. garinii, for instance, is more often linked to neuroborreliosis8), mapping this variation is critical. Human risk likewise varies widely across landscapes, underscoring the need to map this heterogeneity for surveillance, prevention, and public health action.

Assessing spatial Lb risk typically relies on two layers of information: acarological hazard, defined by the density of host-seeking Bbsl-infected I. ricinus nymphs, and human exposure, often approximated by clinical findings5,9. Both are essential for modelling the local states of the host–vector–pathogen system and predicting infection risk10. Two complementary approaches are used: mechanistic models, which explicitly represent life-cycle and transmission processes11, and statistical models12, which link infection patterns to environmental or socio-economic factors10,13.

Statistical models can target specific components, such as the spatial distribution of Bbsl in ticks, rather than full system dynamics. This focus allows the use of large-scale, spatially explicit data, whereas mechanistic models require extensive parameterisation, for which data are often difficult and costly to obtain owing to the complexity of the system.

Field methods, such as sampling ticks from hosts, provide insights into pathogen ecology but rarely capture the full diversity of reservoirs within a single study (e.g., for birds in France14). Standardised questing tick sampling offers a good proxy for tick density and pathogen hazard (see, for instance, refs 15,16). However, tick sampling is often not standardised at the country level, is limited to small areas or accessible environments, and lacks information on human exposure. Human case reports from medical practitioners can provide broader spatial coverage of human infection risk but are often limited by coarse spatial resolution17.

To overcome these limitations, citizen science initiatives have emerged as promising tools to generate geographically informed, large-scale data on human-tick encounters. Such data would otherwise be extremely difficult, if not impossible, to obtain through conventional research means, and these initiatives also foster engagement and raise awareness18. In particular, the collection of human-biting ticks is directly associated with human exposure19 and can be used to study spatial variations in the probability of Bbsl genospecies infection in human-biting ticks.

France hosts diverse climates and environments20 that support both tick populations and pathogen reservoirs15,16,21. Previous studies on Bbsl distribution drivers in France were conducted at highly localised scales22,23, making extrapolation to the contiguous national level difficult. At the other end of the spectrum, European-scale models of Bbsl distribution have been developed7, but these rely on literature data that, for France, are limited and very local in scale. This highlights the need for more extensive data and studies to better understand and predict Bbsl risk in a country with a high Lyme borreliosis incidence24.

Here, we used human-biting ticks collected through the CiTIQUE citizen-science programme between 2017 and 2019, to study the spatial distribution of Bbsl and its major genospecies across continental France. Combining statistical relative risk mapping, which quantifies the relative density of pathogen presence in relation to its absence, and generalised additive models (GAMs), we characterised the spatial heterogeneity of tick infection risks and identified environmental and ecological drivers shaping the distribution of Bbsl in human-biting ticks, providing a detailed, data-driven picture of Lb eco-epidemiology across continental France.

Results

Descriptive analysis

A total of 1,891 human-biting Ixodes ricinus ticks were screened for pathogens, of which 291 (15%) were found to be infected with Borrelia burgdorferi sensu lato (Bbsl). Among these, Borrelia afzelii and Borrelia garinii were the most prevalent genospecies, infecting 136 (7.2%) and 80 (4.2%) individual ticks, respectively (Table 1). Infections with other genospecies were less common, with 37 ticks (2%) infected with Borrelia valaisiana, 25 (1.3%) with Borrelia burgdorferi sensu stricto, 8 (0.4%) with Borrelia spielmanii, and 5 (0.3%) with Borrelia lusitaniae. No ticks were found to be infected with Borrelia bissettii (Table 1).
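The overall prevalences quoted above follow directly from the reported counts. The sketch below recomputes them in Python and attaches a 95% confidence interval. It assumes a Wilson score interval, which is only one common choice; the text does not state which interval method was used for Table 1, and the study's own analyses were run in R. The function and variable names are illustrative.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z = 1.96 for 95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


N_TICKS = 1891  # human-biting I. ricinus ticks screened
GENOSPECIES_COUNTS = {  # counts reported in the text
    "B. afzelii": 136,
    "B. garinii": 80,
    "B. valaisiana": 37,
    "B. burgdorferi s.s.": 25,
    "B. spielmanii": 8,
    "B. lusitaniae": 5,
    "B. bissettii": 0,
}
# One genospecies is recorded per infected tick, so the counts add up
# to the number of Bbsl-positive ticks.
N_INFECTED = sum(GENOSPECIES_COUNTS.values())


def summarise(k, n=N_TICKS):
    """Return prevalence and Wilson bounds, all in percent."""
    lower, upper = wilson_interval(k, n)
    return 100 * k / n, 100 * lower, 100 * upper


for name, k in [("Bbsl", N_INFECTED), *GENOSPECIES_COUNTS.items()]:
    prev, lower, upper = summarise(k)
    print(f"{name}: {prev:.1f}% [{lower:.1f}; {upper:.1f}]")
```

Because each infected tick contributes to exactly one genospecies count, the genospecies prevalences sum to the overall Bbsl prevalence.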

Table 1 Count and prevalence of Borrelia burgdorferi sensu lato genospecies found in human-biting ticks per region. Prevalences are given as percentages, with 95% confidence intervals in brackets. n: total number of ticks analysed in each region. Regions: ARA: Auvergne-Rhône-Alpes; BFC: Bourgogne-Franche-Comté; BRE: Bretagne; CVL: Centre-Val de Loire; GES: Grand-Est; IDF: Île-de-France; NOR: Normandie; NAQ: Nouvelle-Aquitaine; OCC: Occitanie; PDL: Pays-de-la-Loire.
Fig. 1

Prevalence of Borrelia burgdorferi sensu lato per region in analysed human-biting Ixodes ricinus ticks (central map), and histograms of the specific prevalence of the main Borrelia species in each region. Central map: n is the number of I. ricinus ticks analysed in the region considered. Histograms: 95% confidence intervals are shown. ARA: Auvergne-Rhône-Alpes; BFC: Bourgogne-Franche-Comté; BRE: Bretagne; CVL: Centre-Val de Loire; GES: Grand-Est; HDF: Hauts-de-France; IDF: Île-de-France; NAQ: Nouvelle-Aquitaine; NOR: Normandie; OCC: Occitanie; PAC: Provence-Alpes-Côte d’Azur; PDL: Pays-de-la-Loire. Reproduced from Durand et al.25 under a Creative Commons Attribution (CC-BY) license.

Factors associated with Bbsl distribution

The results of the first GAM (M0), which investigated factors associated with Bbsl prevalence, are presented in Fig. 2. After penalisation, only two variables were retained as significant predictors: the I. ricinus habitat suitability index and the grass cover fraction (Table 2). The I. ricinus suitability index showed a positive association with Bbsl prevalence (edf = 1.560; χ² = 6.558; p-value = 0.007): higher habitat suitability corresponded to a higher probability of Bbsl infection, with the relationship plateauing at the highest suitability values (Fig. 2D). Bbsl prevalence exhibited a convex relationship with grass cover fraction (edf = 1.104; χ² = 3.445; p-value = 0.026). Prevalence declined as grass cover increased from low values, reached a minimum, and then rose slightly at high grass cover values. However, this latter trend was characterised by greater uncertainty, likely due to the limited number of observations in areas with high grass cover (Fig. 2E).

The relative risk map (Fig. 2A) highlights significant low-risk areas (blue contours), which include the Bretagne (BRE) and Normandie (NOR) regions in northwestern France, and significant high-risk areas (red contours), concentrated in the Grand Est (GES), Bourgogne-Franche-Comté (BFC), and Centre-Val de Loire (CVL) regions in the east and centre of the country. Model predictive performance was low (mean AUC = 0.56 based on 100 repetitions of 90/10 cross-validation). Nevertheless, high-risk regions identified by the relative risk map correspond to areas with higher Bbsl prevalence predicted by the GAM (Fig. 2B). Conversely, BRE and NOR exhibited lower predicted prevalence, consistent with the low-risk areas identified by the relative risk analysis. Additional regions, particularly in the southwest and Auvergne-Rhône-Alpes (ARA), also displayed elevated Bbsl prevalence according to the GAM predictions (Fig. 2B). The associated uncertainty in prevalence predictions was highest in the Rhône Valley (southeastern France) and the Alpine regions (eastern France), where sample density was lower and environmental heterogeneity may be greater (Fig. 2C).
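The predictive performance reported here is a mean AUC over 100 repetitions of a 90/10 split. The Python sketch below shows one way such a figure can be obtained, on simulated data. It is an illustration under stated assumptions rather than the authors' procedure: the study was run in R, `fit_sign_model` is a hypothetical stand-in for refitting the GAM on each training set, and the toy data carry only a weak signal.

```python
import random


def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def repeated_holdout_auc(rows, fit, n_rep=100, test_frac=0.1, seed=1):
    """Mean AUC over n_rep random train/test splits (90/10 by default)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_rep):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_frac))
        test, train = shuffled[:n_test], shuffled[n_test:]
        labels = [y for _, y in test]
        if len(set(labels)) < 2:  # AUC is undefined without both classes
            continue
        predict = fit(train)
        aucs.append(auc([predict(x) for x, _ in test], labels))
    return sum(aucs) / len(aucs)


def fit_sign_model(train):
    """Hypothetical stand-in for model fitting: orient one covariate so that
    higher scores point towards the infected class."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x


# Simulated ticks: infection probability rises weakly with a single covariate.
rng = random.Random(42)
toy_rows = []
for _ in range(500):
    x = rng.random()
    y = 1 if rng.random() < 0.10 + 0.10 * x else 0
    toy_rows.append((x, y))

mean_auc = repeated_holdout_auc(toy_rows, fit_sign_model)
print(f"mean AUC over repeated 90/10 splits: {mean_auc:.2f}")
```

An AUC of 0.5 corresponds to ranking infected and uninfected ticks no better than chance, which is why values of 0.56 to 0.62 are described as low to moderate.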

Fig. 2

Model predictions for Borrelia burgdorferi sensu lato (Bbsl) prevalence in Ixodes ricinus ticks across France. (A) Bbsl relative risk surface based on the Bbsl presence and absence data. Tolerance contours delineate areas of significantly lower risk in blue and significantly higher risk in red. (B) Predicted prevalence of Bbsl based on the M0 GAM, expressed as the predicted proportion of Bbsl-infected ticks (range 0–1), with values ranging from low (dark blue) to high (yellow-green) prevalence. (C) Corresponding standard error of the GAM predictions, with lower uncertainty indicated in dark blue and higher uncertainty in yellow. (D,E) Marginal effects of the I. ricinus habitat suitability index (D) and grass cover fraction (E) on predicted Bbsl prevalence. Shaded areas show 95% confidence intervals, and values for which the slope is significantly different from zero are highlighted in red. Black ticks along the x-axis represent observed values of the covariates.

Factors associated with Borrelia afzelii distribution

The results of the M1 GAM model assessing factors associated with B. afzelii distribution are presented in Fig. 3. This model focused on sites where a Borrelia genospecies was detected, with the presence probability of B. afzelii (conditional on overall Borrelia presence) as the response variable. Therefore, the model highlights determinants of the variability in relative incidence of B. afzelii among all Borrelia occurrences. Four covariates were identified as significant predictors: cattle density, I. ricinus habitat suitability index, rodent species richness, and grass cover fraction.

Cattle density was negatively associated with B. afzelii presence probability (edf = 0.753; χ² = 2.808; p-value = 0.035), although uncertainty increased at higher cattle densities (Fig. 3D). The relationship between B. afzelii presence and I. ricinus habitat suitability index was convex (edf = 2.697; χ² = 9.142; p-value = 0.009), with a decline in presence probability observed at lower suitability values, reaching a minimum around 0 on the x-axis, followed by an increase at higher suitability values. However, both trends were accompanied by high uncertainty (Fig. 3E). A concave relationship was observed with rodent species richness (edf = 1.788; χ² = 6.618; p-value = 0.015), peaking around a richness value of one, before slightly declining. Uncertainty was highest at both low and high richness values (Fig. 3F). Grass cover fraction also showed a concave relationship (edf = 1.041; χ² = 2.959; p-value = 0.041), with B. afzelii presence probability increasing to a maximum at intermediate grass cover values (around 2 on the x-axis), followed by a slight decline and higher uncertainty at greater grass cover fractions (Fig. 3G).

The relative risk map (Fig. 3A) revealed that areas of significantly low (blue contours) and high (red contours) B. afzelii risk broadly overlapped with those identified for Bbsl. However, an additional high-risk area was detected in the northern part of the Occitanie (OCC) region. Model predictive performance was moderate (mean AUC = 0.62 based on 100 repetitions of 90/10 cross-validation), yet predictions from the GAM aligned with the relative risk surface, with the highest B. afzelii prevalence in eastern and central regions, including Grand Est (GES), Bourgogne-Franche-Comté (BFC), and Centre-Val de Loire (CVL). High infection risk was also predicted in regions such as Nouvelle-Aquitaine and Auvergne-Rhône-Alpes (Fig. 3B). Prediction uncertainty was greatest in the Rhône Valley (southeastern France), the Alps (eastern France), and the eastern part of the GES region, which also correspond to areas of high predicted prevalence (Fig. 3C).

Fig. 3

Model predictions for Borrelia afzelii prevalence in Ixodes ricinus ticks across France. (A) B. afzelii relative risk surface based on the B. afzelii presence and absence data. Tolerance contours delineate areas of significantly lower risk in blue and significantly higher risk in red. (B) Predicted prevalence of B. afzelii, computed as the product of the predicted overall Borrelia burgdorferi sensu lato (Bbsl) presence (M0) and the relative probability of B. afzelii given Bbsl presence (M1). Prevalence values are expressed as the predicted proportion of B. afzelii-infected ticks (range 0–1), ranging from low (dark blue) to high (yellow-green) prevalence. (C) Corresponding standard error of the predicted prevalence of B. afzelii based on the product of M0 and M1, with lower uncertainty indicated in dark blue and higher uncertainty in yellow. (D–G) Marginal effects on the relative probability of B. afzelii given Bbsl presence (M1) of cattle density (D), I. ricinus habitat suitability index (E), rodent species richness index (F) and grass cover fraction (G). Shaded areas show 95% confidence intervals, and values for which the slope is significantly different from zero are highlighted in red. Black ticks along the x-axis represent observed values of the covariates.

Factors associated with Borrelia garinii distribution

The results of the M2 GAM assessing factors associated with B. garinii distribution are presented in Fig. 4. This model, again restricted to locations where Borrelia was detected, used the presence probability of B. garinii (conditional on Borrelia presence) as the response variable. Therefore, the model highlights determinants of the variability in relative incidence of B. garinii among all Borrelia occurrences. Two covariates were identified as significant predictors: rodent species richness and Turdidae abundance.

Rodent species richness showed a significant negative association with B. garinii presence probability (edf = 0.898; χ² = 8.573; p-value = 0.002), with increased uncertainty at lower values along the x-axis (Fig. 4D). In contrast, Turdidae abundance had a significant positive effect on B. garinii presence probability (edf = 0.714; χ² = 2.422; p-value = 0.049), although uncertainty increased markedly at higher abundance values (Fig. 4E).

The relative risk map of B. garinii (Fig. 4A) revealed localised high-risk areas (red contours) in the western part of the Bretagne (BRE) region and the western part of the Centre-Val de Loire (CVL) region. Low-risk areas (blue contours) were concentrated in the southwestern regions, including Occitanie and Nouvelle-Aquitaine (NAQ) (Fig. 4A). Model predictive performance was low (mean AUC = 0.58 based on 100 repetitions of 90/10 cross-validation), yet GAM model predictions (Fig. 4B) were consistent with the relative risk map, showing the highest B. garinii infection probabilities in the identified high-risk areas. Additional suitable areas for B. garinii were predicted in northwestern France, particularly in Bretagne (BRE), Normandie (NOR), and Hauts-de-France (HDF), as well as in the northern parts of NAQ and CVL. Uncertainty associated with the infection probability predictions was generally low but increased in areas with the highest predicted infection probability, notably in the high-risk zones highlighted on the map (Fig. 4C).

Fig. 4

Model predictions for Borrelia garinii prevalence in Ixodes ricinus ticks across France. (A) B. garinii relative risk surface based on the B. garinii presence and absence data. Tolerance contours delineate areas of significantly lower risk in blue and significantly higher risk in red. (B) Predicted prevalence of B. garinii, computed as the product of the predicted overall Borrelia burgdorferi sensu lato (Bbsl) presence (M0) and the relative probability of B. garinii given Bbsl presence (M2). Prevalence values are expressed as the predicted proportion of B. garinii-infected ticks (range 0–1), ranging from low (dark blue) to high (yellow-green). (C) Corresponding standard error of the predicted prevalence of B. garinii based on the product of M0 and M2, with lower uncertainty indicated in dark blue and higher uncertainty in yellow. (D–F) Marginal effects on the relative probability of B. garinii given Bbsl presence (M2) of the rodent species richness index (D) and Turdidae abundance (E). Shaded areas show 95% confidence intervals, and values for which the slope is significantly different from zero are highlighted in red. Black ticks along the x-axis represent observed values of the covariates.

Discussion

Assessing spatial risk of Lyme borreliosis is challenging, as localized surveys lack coverage and broad incidence data miss ecological drivers. By leveraging CiTIQUE citizen-science data, our study provides the first geographically explicit, large-scale view of Borrelia burgdorferi sensu lato distribution in France, linking infection risk with key environmental, ecological, and anthropogenic factors.

B. afzelii and B. garinii were the most prevalent genospecies in the biting ticks, which is consistent with the dominant genospecies and their relative frequencies reported across European countries, primarily based on questing tick studies5,26. However, the pathogen detection method only identified the dominant genospecies in co-infected ticks, potentially underestimating the prevalence of other species. For example, if B. afzelii dominates in co-infected ticks, it may have been disproportionately reported in areas with high tick abundance and higher frequencies of co-infection.

Previous work at the European scale7, using a coarse resolution (around 28 km × 28 km per cell), reported high prevalence of B. afzelii across France, particularly in eastern Auvergne-Rhône-Alpes. In contrast, our analyses at a finer resolution (5 km × 5 km per cell) revealed a more heterogeneous pattern, with higher prevalence in eastern and central regions (Grand Est, Bourgogne-Franche-Comté, Centre-Val de Loire) and lower rates in western, northern, and southern regions (Bretagne, Normandie, Occitanie). For B. garinii, that study found very high prevalence in these western, southern and northern regions, whereas our analyses revealed a more even distribution nationwide, while still confirming higher prevalence in the western and northern parts.

Our GAMs highlighted distinct environmental and ecological factors associated with the distribution of Bbsl and its two main genospecies, B. afzelii and B. garinii. At the overall Bbsl level, infection probability was positively associated with the I. ricinus habitat suitability index, a composite indicator derived from a multi-criteria decision analysis integrating climate, altitude, land cover and wild ungulate density, all factors known to influence tick abundance21. This result suggests that, in areas where human exposure occurs, favourable environments for ticks also promote higher Bbsl circulation and prevalence. This finding aligns with theoretical models predicting a positive, yet non-linear, relationship between tick density and pathogen prevalence, modulated by host community composition, including the presence of hosts that are non-competent for Bbsl transmission27,28.

Observational studies in comparable ecological settings are consistent with these results. In Belgium, tick density was identified as a key driver of Bbsl-infected nymph density, with limited evidence for dilution effects from non-competent hosts29. Similarly, in the Netherlands, increased ungulate abundance led to higher tick density and a non-linear, accelerating rise in the density of Bbsl-infected nymphs, without clear evidence for dilution effects attributable to ungulates30. Long-term monitoring in southern England further showed that habitats favourable to ticks, such as structurally diverse forests and woodland edges, sustain higher densities of both questing and Bbsl-infected I. ricinus nymphs31.

At the genospecies level, habitat and host associations differed, reflecting the distinct ecological characteristics of B. afzelii and B. garinii6,32,33. Borrelia garinii infection probability was positively associated with Turdidae bird abundance, consistent with its known reliance on avian hosts34,35, and negatively associated with rodent species richness, possibly reflecting a dilution effect from these non-specific hosts30. Conversely, B. afzelii infection probability was positively associated with rodent richness, consistent with the role of small mammals as primary reservoirs for this genospecies36. Rodent richness was also positively associated with the presence of Bbsl, likely reflecting both the dominance of B. afzelii in our dataset and the association of several minor Bbsl genospecies, such as B. burgdorferi sensu stricto, with small mammal hosts37. However, the B. afzelii results may reflect either the genuine dominance of this genospecies, especially in the core areas of Borrelia presence, or a methodological bias towards B. afzelii in co-infected ticks, which would link its presence to areas of high prevalence.

Grass cover exhibited a non-linear association with Bbsl and B. afzelii infection probabilities, with a negative trend across most observed values. Elevated risk was observed at low grass cover levels, characteristic of forest, fragmented woodlands, and associated ecotones, including adjacent private gardens, which are all known to favour both tick and Borrelia presence38. In contrast, extensive grass cover, typical of open meadows or pastures, was associated with lower infection probabilities, likely due to reduced tick survival in open, hot, and dry environments, and a lower density of competent hosts39,40.

Our modelling approach prioritised interpretability using a parsimonious set of covariates, selected to limit collinearity and capture the main ecological drivers, while accounting for the current sampling structure. Although additional covariates, such as vegetation indices7, forest structural complexity41 or soil properties42, have been highlighted in previous studies, our sampling resolution likely limits the reliable detection of such fine-scale environmental effects. This limitation is reflected in the modest predictive performance of the models, as indicated by the low AUC values.

Sampling constraints also restricted our capacity to investigate several relevant aspects of Bbsl infection risk, including variation across tick developmental stages, seasonal and inter-annual dynamics, and prevalence across specific environmental contexts. Despite efforts to mitigate sampling bias by analysing a minimum of 150 ticks per region, a substantial proportion of the analysed samples originated from densely populated areas, leaving gaps in remote and sparsely populated regions of France. A further challenge in estimating Lyme disease risk is the limited understanding of the factors that bring host-seeking Bbsl-infected ticks into contact with human populations43. In this regard, our analysis of human-biting ticks provides valuable insights44. However, a more precise assessment of risk would require comparison with prevalence in questing ticks, to disentangle ecological hazard from human exposure. Finally, integrating CiTIQUE tick-bite reports could provide complementary information on the socio-behavioural and ecological factors shaping human exposure, thereby improving risk mapping45.

Despite these limitations, our study demonstrates the substantial potential of citizen-generated data for monitoring tick-borne pathogen distribution at a national scale. The ticks analysed here represent approximately 40% of all human-biting ticks submitted to CiTIQUE along with a tick-bite report during the study period. Since its inception, the CiTIQUE programme has grown considerably, with over 60,000 human and animal-biting ticks currently stored in the national tick bank. This continuously expanding resource offers unique opportunities to refine surveillance and research on tick-borne diseases in France.

As data and biting ticks continue to be collected, targeted sub-sampling strategies could be implemented to improve spatial representativeness, assess temporal dynamics, compare prevalence across tick developmental stages, and address more refined ecological questions regarding Bbsl infection risk. Such approaches will contribute to a more detailed understanding of the eco-epidemiology of Bbsl genospecies and other tick-borne pathogens in human-biting ticks.

In the context of global environmental change and potential shifts in the distributions of ticks and tick-borne pathogens, the continued development of CiTIQUE provides a scalable, adaptable, citizen-driven tool to support large-scale surveillance, environmental risk assessment, and public health preparedness.

Materials and methods

Tick acquisition

Ticks were collected through the CiTIQUE citizen science programme (www.citique.fr), launched in July 2017 as a collaboration between INRAE (French National Research Institute for Agriculture, Food and Environment), the Laboratory of Excellence ARBRE, Anses (French Agency for Food, Environmental and Occupational Health & Safety), and the CPIE network (Permanent Centre for Environmental Initiatives). This programme aims to improve the understanding of the ecology of ticks and the tick-borne diseases they transmit, particularly Lyme Borreliosis, in order to support prevention efforts based on monitoring.

Citizens can participate in the CiTIQUE programme by choosing among various levels of involvement, ranging from promoting the programme to actively contributing to research through activities in the open lab “Tous Chercheurs”, or by reporting tick bites and collecting and sending biting ticks to the tick bank maintained by the national programme. Participants can report tick bites via a website, a mobile application or a paper form, providing the date and GPS location of the bite, along with basic personal information (age, sex, activity at the time of the bite, tick localisation on the body) and the ecological characteristics of the place where the tick bite occurred (e.g., forest, garden, meadow…). When a tick was collected, citizens were instructed to enclose it in a piece of kitchen roll and tape it to a sheet of paper before sending it by courier. Upon receipt, ticks were stored in a freezer before identification and linked to their respective tick-bite reports (https://www.citique.fr/signaler-une-piqure/). Between 2017 and 2020, more than 17,000 human tick-bite reports were submitted, with over 4,500 of these associated with one or more tick specimens.

Pathogen identifications

A total of 2,009 ticks, collected between 2017 and 2019 (inclusive), were randomly selected for DNA extraction, with the objective of including at least 150 human-biting ticks for each French NUTS-1 region (Nomenclature of Territorial Units for Statistics; major socio-economic regions), except for Provence-Alpes-Côte d’Azur (PAC), which contributed 59 records. From this dataset, 1,891 Ixodes ricinus ticks were retained for the modelling analyses presented in this study, including 221 adults, 1,324 nymphs and 110 larvae, with 236 undetermined specimens. Details on the overall sample, selection procedure, and dataset structure can be found in Durand et al.25.

The tick stages and species were identified morphologically using identification keys by Pérez-Eid46 and Estrada-Pena et al.47. Species identification was confirmed using specific primers and probes on the microfluidic real-time PCR assay described below. Tick DNA was extracted and screened for pathogens using the method described in Melis et al.48. Briefly, tick DNA was extracted using NucleoSpin® Tissue kit (Macherey-Nagel, Germany), following manufacturer’s instructions. Then, all samples underwent a pre-amplification step by PCR with the Preamp Master Mix (Standard BioTools, USA). High-throughput microfluidic real-time PCR was then performed on a BioMark™ real-time PCR system (Standard BioTools, USA), using the 48.48 Dynamic Array™ (Standard BioTools, USA) to detect pathogens and identify ticks. Only results for the different Bbsl genospecies are presented and used in this paper: Borrelia afzelii, Borrelia garinii, Borrelia burgdorferi sensu stricto (s.s.), Borrelia valaisiana, Borrelia spielmanii, Borrelia bissettii, Borrelia lusitaniae. This method cannot detect coinfections between different Borrelia genospecies. The sequences of the primers and probes are provided in Michelet et al.49.

Covariates acquisition

To explain Bbsl spatial distribution, we extracted covariates related to the density of host-seeking I. ricinus, pathogen occurrence and persistence, and human exposure. Climatic variables were included for their effect on tick habitat suitability50, along with habitat suitability indices for I. ricinus and host-related variables. Non-competent hosts (e.g., roe deer) were also considered for their role as tick amplifiers51. Because pathogens were identified in human-biting ticks, we further included variables reflecting human exposure, such as population density and human pressure.

In total, 103 covariates were extracted at a 5-km grid resolution across continental France (Cf. Supplementary Table 1). To reduce collinearity and overfitting, covariates were first grouped into seven categories: bioclimatic, land cover/soil, human pressure, and host-related (deer, birds, rodents, species richness). Then, within each category, hierarchical clustering on principal components (HCPC) was applied, with clusters defined by the largest relative loss of inertia. The most relevant covariates within each cluster were selected based on literature and prior hypotheses (Cf. Table 2).

For each tick, selected covariate values were extracted within a 5 km radius around the GPS coordinates to account for environmental heterogeneity and geolocation imprecision, and the weighted median, accounting for the proportion of each cell covered by the buffer, was retained (Cf. Table 2). All covariates were centred and scaled before analysis. To further limit collinearity between selected covariates, variance inflation factors (VIFs) were assessed, and collinear covariates were removed using a cut-off value of 5 (car package, version 3.1.3;52).
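The buffer extraction and standardisation steps described above can be illustrated with a short Python sketch. It makes several assumptions that are not from the paper: the weighted median is taken as the smallest value at which the cumulative weight reaches half the total (other conventions exist), scaling uses the sample standard deviation, and the cell values and coverage fractions are invented for the example. The study itself performed these steps in R.

```python
import statistics


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def centre_and_scale(xs):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]


# Hypothetical covariate values of four grid cells touched by one tick's
# 5 km buffer, with the fraction of each cell lying inside the buffer.
cell_values = [0.10, 0.35, 0.60, 0.80]
cell_cover = [1.00, 0.75, 0.40, 0.05]
buffer_value = weighted_median(cell_values, cell_cover)
print(buffer_value)

# Hypothetical per-tick values of the same covariate, standardised before modelling.
per_tick_values = [0.35, 0.10, 0.60, 0.20]
scaled = centre_and_scale(per_tick_values)
print([round(v, 2) for v in scaled])
```

Weighting by coverage means that a cell only grazed by the buffer has little influence on the value assigned to the tick.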

Table 2 Covariates selected for the models with their descriptions and associated hypotheses on their effects on Lyme borreliosis occurrence.

Modelling

We used Generalised Additive Models (GAMs) to investigate the variation in the distribution of Bbsl and of its genospecies, used as the response variable, in relation to the selected set of continuous explanatory covariates (see Table 2), while accounting for the spatial distribution of the observations. The general structure of the models was:

$$y_{i}\sim \text{Binom}\left(p_{i}\right)\quad \text{where}\quad \text{logit}\left(p_{i}\right)=\beta_{0}+\sum_{n}f_{n}\left(x_{n,i}\right)+te\left(\text{long}_{i},\text{lat}_{i}\right)$$

where \({f}_{n}\) are spline functions applied to the explanatory covariates \({x}_{n,i}\), and \(te\) is a bivariate tensor product function accounting for the spatial structure based on the longitude and latitude.

Given the large number of explanatory covariates (23), we applied a double penalty approach to control the smoothness of the model terms (i.e., the curves \({f}_{n}\) and the spatial surface \(te\left(\text{long},\text{lat}\right)\)) and to perform covariate selection92. The first penalty controlled the complexity of smooth functions to prevent overfitting by ensuring that relationships remained smooth and interpretable. The second penalty shrank entire uninformative smooth functions towards zero, effectively removing them from the model. The degree of smoothing was selected using the restricted maximum likelihood (REML) method, and the number of basis functions for smooth terms was limited to the default of 10 simple terms to avoid overfitting.
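As a reading aid, the double penalty can be written as a penalised log-likelihood. The expression below uses generic notation for this family of methods, not notation taken from the paper, so the symbols \(S_{j}\), \(S_{j}^{*}\), \(\lambda_{j}\) and \(\lambda_{j}^{*}\) are assumptions introduced here for illustration:

$$\ell_{p}\left(\beta\right)=\ell\left(\beta\right)-\frac{1}{2}\sum_{j}\left(\lambda_{j}\,\beta^{\top}S_{j}\,\beta+\lambda_{j}^{*}\,\beta^{\top}S_{j}^{*}\,\beta\right)$$

where \(\ell\) is the binomial log-likelihood, \(\beta\) collects the basis coefficients of all smooth terms, \(S_{j}\) is the usual wiggliness penalty matrix of smooth term \(j\), and \(S_{j}^{*}\) penalises only the components that \(S_{j}\) leaves unpenalised (its null space, for example a purely linear trend). The smoothing parameters \(\lambda_{j}\) and \(\lambda_{j}^{*}\) are estimated by REML. When both become large, the whole term is shrunk towards zero, which is how a smooth can end up with an effective degrees of freedom below 1, as for some of the edf values reported in the Results.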

Three GAM specifications were fitted. The first model (M0) investigated factors associated with the presence of Bbsl across all observations, encompassing both presence (\({y}_{i}=1\)) and absence (\({y}_{i}=0\)) of Bbsl genospecies. In a second step, we restricted the modelling to locations where a Borrelia genospecies was detected, to identify factors associated with the infection probability of specific genospecies. Only B. afzelii and B. garinii had sufficient data for the whole territory, as they were the most frequent genospecies. Therefore, two separate GAMs were constructed (M1 for B. afzelii and M2 for B. garinii), using the subset of 285 individuals where a Bbsl genospecies was present. In these models, infection with B. garinii or B. afzelii (depending on the model) was coded as presence (\({y}_{i}=1\)), while infection with another Bbsl genospecies was coded as absence (\({y}_{i}=0\)).
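The response coding of the three models can be made concrete with a small Python sketch. It is illustrative only: the function name, the genospecies labels and the toy list of ticks are hypothetical, and the study's own data preparation was done in R.

```python
def code_responses(dominant_genospecies):
    """Build the binary responses of M0, M1 and M2.

    dominant_genospecies holds one entry per tick: None for a Bbsl-negative
    tick, otherwise the name of the genospecies detected in that tick.
    """
    # M0 uses every tick: 1 if any Bbsl genospecies was detected, else 0.
    m0 = [0 if g is None else 1 for g in dominant_genospecies]
    # M1 and M2 use only Bbsl-positive ticks.
    infected = [g for g in dominant_genospecies if g is not None]
    m1 = [1 if g == "afzelii" else 0 for g in infected]  # B. afzelii vs other Bbsl
    m2 = [1 if g == "garinii" else 0 for g in infected]  # B. garinii vs other Bbsl
    return m0, m1, m2


# Hypothetical detection results for seven ticks.
ticks = [None, "afzelii", None, "garinii", "valaisiana", "afzelii", None]
m0, m1, m2 = code_responses(ticks)
print(m0)
print(m1)
print(m2)
```

Because M1 and M2 are fitted on infected ticks only, their predictions are conditional probabilities: the chance that an infected tick carries the focal genospecies rather than another one.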

The predicted prevalence of B. afzelii and B. garinii was calculated by the product of the predicted general Bbsl presence (M0) with the species-specific relative probability (M1 or M2 depending on the genospecies).
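The product rule above is the chain rule for probabilities: P(genospecies) = P(Bbsl) × P(genospecies | Bbsl). The Python sketch below checks it against the raw counts reported in the Results, used here as stand-ins for model predictions. The function name is hypothetical, and no attempt is made to reproduce how the standard error of the product was obtained, since the text does not describe it.

```python
def genospecies_prevalence(p_bbsl, p_given_bbsl):
    """Unconditional prevalence as P(Bbsl) * P(genospecies | Bbsl)."""
    for p in (p_bbsl, p_given_bbsl):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_bbsl * p_given_bbsl


# Raw proportions from the descriptive results, standing in for the
# M0 prediction and the M1 conditional prediction at one location.
p_bbsl = 291 / 1891              # ticks infected with any Bbsl genospecies
p_afzelii_given_bbsl = 136 / 291  # share of infected ticks carrying B. afzelii

p_afzelii = genospecies_prevalence(p_bbsl, p_afzelii_given_bbsl)
print(f"{100 * p_afzelii:.1f}%")
```

With these inputs the product recovers the overall B. afzelii proportion of 136 out of 1,891 ticks, which shows that the two-step formulation is consistent with the descriptive prevalence.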

Model fitting was performed using the mgcv package (v1.9.1;93). Residual diagnostics for all final models were checked using the DHARMa package (v0.4.7;94).

In addition to GAMs, we used spatial relative risk analysis to identify areas of elevated risk (i.e., “hot spots”) where tick infection by Bbsl and its genospecies was higher than expected, while accounting for underlying tick distribution. Spatial relative risk was estimated using the sparr package in R (v2.3.15;95). This method estimates the relative density of pathogen presence versus absence points. To account for our spatial sampling, we applied an adaptive bandwidth, allowing finer resolution in densely sampled regions and smoother estimates in sparsely sampled areas. The relative risk surface was estimated asymmetrically, meaning that different bandwidths were applied to the case and control point distributions, as recommended by Davies and Hazelton96. Edge effects were corrected using the default settings described by Diggle97. Statistical significance of elevated and diminished risk areas was assessed using asymptotic tolerance contours based on p-values, generated via the Monte Carlo method with 1,000 simulations.
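The core quantity behind a relative risk surface is the log ratio of two kernel density estimates, one for infected ticks (cases) and one for uninfected ticks (controls). The Python sketch below shows that idea in its simplest form. It is a deliberate simplification and not the sparr procedure: it uses a fixed Gaussian bandwidth per group instead of adaptive bandwidths, applies no edge correction, and computes no tolerance contours. All coordinates and bandwidth values are invented.

```python
import math


def kde(points, x, y, h):
    """Fixed-bandwidth isotropic Gaussian kernel density estimate at (x, y)."""
    norm = 1.0 / (2 * math.pi * h * h * len(points))
    total = sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * h * h))
        for px, py in points
    )
    return norm * total


def log_relative_risk(cases, controls, x, y, h_case, h_control):
    """Log ratio of case density to control density at (x, y)."""
    return math.log(kde(cases, x, y, h_case)) - math.log(kde(controls, x, y, h_control))


# Hypothetical coordinates: infected ticks cluster near (1, 1),
# uninfected ticks cluster near (0, 0).
cases = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (0.0, 0.0)]
controls = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (1.0, 1.0)]

# Separate bandwidths per group mirror the asymmetric estimation described above.
rr_hot = log_relative_risk(cases, controls, 1.0, 1.0, h_case=0.3, h_control=0.3)
rr_cold = log_relative_risk(cases, controls, 0.0, 0.0, h_case=0.3, h_control=0.3)
print(f"log relative risk near the case cluster: {rr_hot:.2f}")
print(f"log relative risk near the control cluster: {rr_cold:.2f}")
```

Positive values mark locations where infected ticks are over-represented relative to the sampled tick distribution, and negative values mark the opposite. Dividing by the control density is what accounts for uneven sampling effort.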

All analyses were performed in R version 4.3.298.