Background & Summary

The West Nile virus (WNV) is an arbovirus belonging to the Flaviviridae family, first isolated in 1937 in Uganda, specifically in the West Nile district1. It is known to cause West Nile fever, an infectious disease that can present asymptomatically or with variable symptoms, which can lead to severe forms involving the central nervous system1. Transmission of the virus occurs primarily through the bite of infected mosquitoes, particularly those of the genus Culex, which includes the common house mosquito (Culex pipiens). These mosquitoes become infected by feeding on viremic birds, which act as reservoirs for the virus.

At the same time, humans, horses, and other mammals are occasional dead-end hosts, typically developing low and transitory viremia which is not considered to be sufficient to infect competent mosquito species, and are therefore not thought to play a role in the subsequent spread of the virus.

Although human-to-human transmission does not occur through direct contact, it is possible through organ transplants, blood transfusions, and during pregnancy2. West Nile virus infection in humans is asymptomatic in about 80% of the cases. About 20% of the infected individuals experience mild flu-like symptoms, such as fever, headache, fatigue, muscle aches, nausea, and vomiting. In rare cases, symptoms can progress to more severe forms, such as encephalitis or meningitis, occurring in approximately 1 in 150 cases. Severe symptoms are more common among elderly adults or individuals with compromised immune systems3.

In Italy, West Nile virus (WNV) was first detected in 1998 in horses in the Tuscany region, with no reported clinical cases in humans4. In 2001, a multi-species surveillance plan was implemented, incorporating monitoring of wild bird mortality, mosquito collection, and repeated testing of sentinel animals to detect potential introduction or circulation of WNV and to track the spread of the infection.

A decade later, in 2008, a widespread WNV outbreak occurred in northern Italy, particularly across the Po Valley—the country’s largest plain—affecting the Emilia-Romagna, Veneto, and Lombardy regions5. Since 2008, WNV has exhibited persistent and intensified circulation in various areas of Italy6,7,8.

Culex pipiens was identified as the primary vector, comprising 81.4% of the analyzed specimens9.

Since 2016, an integrated approach has been adopted, coordinating veterinary and human surveillance activities under a unified national plan (One Health Surveillance)10. Surveillance efforts on animals (birds and poultry) and mosquitoes are mainly aimed at enabling early detection of viral circulation6,11,12. Once WNV is identified, specific measures are implemented at the provincial level, including blood and organ safety protocols and mosquito control interventions. The surveillance plan is reviewed annually based on observed changes in the geographical distribution and circulation of WNV10,13.

Some studies in the literature have analyzed the spread of the virus across the Italian territory using artificial intelligence techniques. Calderolo et al.14 developed a machine learning model to predict human positive cases of WNV based on satellite data. A similar study by Bonicelli et al.15 used graph neural networks and Earth observation data to predict, for each pixel, positivity in birds and mosquitoes. Mingione et al. have also provided a dataset containing all WNV cases recorded and published in the annual reports between 2012 and 202416,17.

In Italy, apart from the seasonal reports of the ISS10 and the dataset compiled by Mingione et al., there is no archive of epidemiological information on WNV. These sources and the previously cited studies provide or directly use the number of positive WNV cases; standardized indices, however, are of particular interest for epidemiological studies and could provide additional information to better understand the spread of the virus.

In this study, we present a 13-year dataset (2012–2024) containing Standardized Incidence Ratios (SIR) for West Nile virus at the level of Italian provinces. Throughout, “provinces” refers to NUTS-3 administrative units (ISTAT denominations). This indicator, which is frequently used in epidemiological studies18, allows a direct comparison of the incidence rate of a province with that of the chosen reference standard, defined as the set of provinces recording at least one human case in that year.

Our goal was to provide a valuable tool for local and national stakeholders and researchers, enabling them not only to monitor the evolution of the virus but also to understand the potential causes of its spread. Figure 1 shows the SIR values calculated for the years 2012 and 2023 at the Italian provincial level. This comparison illustrates how the number of provinces affected by WNV cases has increased over time. The SIR dataset, together with all the source data, is available on the Dryad public data repository19.

Fig. 1

Geographical distribution of the Standardized Incidence Ratio for WNV in 2012 (panel A) and 2023 (panel B) at the Italian provincial level.

Methods

Data Source

For the calculation of the SIR, we used several public datasets provided by different sources. Mingione et al. made the data on positive cases directly available for download on their GitHub page17. We define positive cases as laboratory-confirmed human WNV cases meeting EU criteria, including neuroinvasive disease and non-neuroinvasive fever; asymptomatic blood-donor detections and imported cases are excluded16,20,21. Detection practices evolved across years and regions (e.g., testing intensity for West Nile fever, donor screening uptake); therefore, the SIR standardizes for age composition but remains sensitive to heterogeneity in case ascertainment.

For population data (provincial, single year of age) we used the official ISTAT endpoints following the I.Stat → IstatData migration: 2019–2024 from the IstatData DataBrowser (dataset DCIS_POPRES1) and 2012–2018 from the legacy interface, harmonizing NUTS-3 identifiers and age strata across sources22,23.

ISTAT is a public organization that provides statistical information about the Italian territory and population.

Table 1 describes all the variables used for the calculation of the SIR.

Table 1 List of variables, with respective definitions and data sources, used to compute Standardized Incidence Ratio.

Observed number of positives (\(O_{p,y}\))

We define \(O_{p,y}\) as the annual count of laboratory-confirmed human West Nile virus infections assigned to the province (p) of exposure and stratified by age class, as reported in the national ISS surveillance bulletins in year y. In line with the EU surveillance case definition (Decision (EU) 2018/945) and its adoption in Italy, confirmed cases meet clinical and laboratory criteria20. For the purpose of incidence estimation, we include clinical presentations classified as neuroinvasive disease and non-neuroinvasive fever, while we exclude asymptomatic detections in blood donors and imported cases. Weekly bulletin counts are aggregated to yearly totals by province and age group before computing the SIR. Human case data are taken from the curated, bulletin-derived open dataset by Mingione et al., which standardizes provincial identifiers and age strata and provides the per-year CSV files used here16. The ISS bulletins and the integrated national plan describe surveillance flows, case classifications, and reporting by province of exposure21,24.

The data on human positive cases for WNV can be downloaded from the GitHub repository (https://github.com/fbranda/west-nile)17. The file “latest-wnv.csv” contains all positive WNV cases, including humans, horses, mosquitoes, and birds. The data are provided at the provincial level and cover the period 2012–2024. We aggregated the data by age group and symptoms to avoid information loss. The final dataset is available on the Dryad platform.
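The aggregation from weekly bulletin counts to yearly totals by province and age group can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields (`province`, `year`, `week`, `age_group`, `classification`, `cases`) and the classification labels are hypothetical and do not reproduce the column names of the published CSV files, and the counts are made up.

```python
from collections import defaultdict

# Hypothetical classification labels for records that are not counted
# (asymptomatic blood-donor detections and imported cases).
EXCLUDED = {"asymptomatic_donor", "imported"}

def yearly_observed(records):
    """Sum weekly counts into yearly totals keyed by (province, year, age_group)."""
    totals = defaultdict(int)
    for rec in records:
        if rec["classification"] in EXCLUDED:
            continue  # excluded from incidence estimation
        key = (rec["province"], rec["year"], rec["age_group"])
        totals[key] += rec["cases"]
    return dict(totals)

# Made-up weekly records for one hypothetical province.
records = [
    {"province": "A", "year": 2022, "week": 30, "age_group": "65-74",
     "classification": "neuroinvasive", "cases": 2},
    {"province": "A", "year": 2022, "week": 31, "age_group": "65-74",
     "classification": "fever", "cases": 1},
    {"province": "A", "year": 2022, "week": 31, "age_group": "15-44",
     "classification": "asymptomatic_donor", "cases": 4},
]
observed = yearly_observed(records)
```

In this toy input the two symptomatic records are summed, while the donor detection is dropped before aggregation.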

Age-specific number of positive cases of the reference population (\(P_{i,y}\))

Using the surveillance dataset curated by Mingione et al.16, we estimated the number of WNV-positive cases in the reference population, stratified by age group. Specifically, the data are divided into the following five age groups: <=14, 15–44, 45–64, 65–74, and >=75. We define \(i\) as the age-group index. The reference population, as detailed in the section “Standardized Incidence Ratios computation (SIR)”, consists of all provinces that reported at least one positive human case in the reference year.

Resident population at provincial (\(n_{i,p,y}\)) and national levels (\(N_{i,y}\))

The total resident population at provincial level (\(n_{i,p,y}\)) was retrieved from the ISTAT data warehouse. Following the recent migration from I.Stat to the new IstatData platform, provincial resident-population series by single year of age are split across two official endpoints. For years 2019–2024 we retrieved data from the IstatData DataBrowser (dataset DCIS_POPRES1). For years 2012–2018 we used the legacy ISTAT interface providing annual population estimates by single year of age. We harmonized province identifiers (NUTS-3), age strata, and field names across sources and verified year-to-year continuity prior to computing the SIR22,23.

The reconstruction process takes into account demographic flows such as births, deaths, migration and acquisition of citizenship. The data are stratified by age, with a resolution of 1 year, from 0 to over 100 years, and additionally grouped by province. The population values at national level (\(N_{i,y}\)) were then calculated directly by aggregation.
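As an illustration of how single-year-of-age population counts can be collapsed into the five age groups used for the case data and then summed over provinces, consider the following sketch. The input layout (a dictionary keyed by province and age) and the convention that the open-ended class is coded as age 100 are assumptions made for this example, and the resident counts are invented.

```python
from collections import defaultdict

def age_group(age):
    """Map an age in completed years to one of the five age groups."""
    if age <= 14:
        return "<=14"
    if age <= 44:
        return "15-44"
    if age <= 64:
        return "45-64"
    if age <= 74:
        return "65-74"
    return ">=75"  # also absorbs the open-ended oldest class

def group_population(pop_by_age):
    """Collapse {(province, age): residents} into {(province, group): residents}."""
    grouped = defaultdict(int)
    for (province, age), residents in pop_by_age.items():
        grouped[(province, age_group(age))] += residents
    return dict(grouped)

def aggregate_over_provinces(grouped):
    """Sum grouped provincial counts over all supplied provinces."""
    totals = defaultdict(int)
    for (_, group), residents in grouped.items():
        totals[group] += residents
    return dict(totals)

# Made-up resident counts for two hypothetical provinces.
toy = {("A", 10): 100, ("A", 14): 50, ("A", 15): 70, ("A", 100): 5,
       ("B", 70): 40, ("B", 75): 30}
n = group_population(toy)          # provincial counts by age group
N = aggregate_over_provinces(n)    # totals by age group
```

Passing only a subset of provinces to `aggregate_over_provinces` yields the totals for that subset instead of the national totals.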

Standardized Incidence Ratios computation (SIR)

In this study, we focused on calculating the Standardized Incidence Ratio (SIR) for WNV cases in humans. This index compares the number of positive cases recorded in each Italian province with the number expected if the province experienced the age-specific rates of the reference population, which in our analysis is the subset of Italian provinces reporting at least one case in the given year (2012–2024).

The SIR eliminates the effect of population size in a province, as it is standardized by the population of both the province under examination and the reference population25. This standardization enables comparisons between provinces with different population sizes. Consequently, the SIR facilitates epidemiological analyses while minimizing potential biases associated with different study areas. In this framework, the SIR is a relative, age-adjusted metric: SIR > 1 indicates more cases than expected given the province’s age structure (excess risk relative to the reference), SIR = 1 indicates parity with expectation, and SIR < 1 indicates fewer cases than expected. The SIR complements, rather than replaces, crude incidence per 100,000, especially when comparing small areas or sparse counts. Its use and interpretation are well established in disease mapping and public-health surveillance26,27.

A critical aspect of calculating the SIR is the choice of the reference population. In the case of WNV, many provinces report no positive cases. Including these provinces in the reference population lowers the reference rates and hence the expected counts, so that the SIR of nearly every affected province would exceed 1 and the index would fail to provide meaningful information.

To solve this problem, we included in the reference population only those provinces that reported at least one positive case in the year in question. With this choice, the resulting SIR values fall both below and above 1, thus providing the desired contrast between provinces.

The SIR also accounts for the effect of age within the analyzed population. Advancing age substantially increases the risk of WNV neuroinvasive disease and the likelihood of clinical detection and reporting; therefore, age strongly influences observed case counts and motivates age standardization in our analyses28,29. In some cases, age-specific information was not available in the downloaded dataset. To avoid underestimating the number of positives, we standardized these data using the average population across all age groups.

We calculated the SIR for each available year between 2012 and 2024. The mathematical expression of the SIR is as follows:

$$SI{R}_{p,y}=\frac{{O}_{p,y}}{{E}_{p,y}}$$
(1)

where \(O_{p,y}\) and \(E_{p,y}\) are the observed and the expected number of recorded cases for province p and year y, respectively. \(E_{p,y}\) is defined as:

$${E}_{p,y}=\sum_{i=1}^{5}{R}_{i,y}^{P}\,{n}_{i,p,y}$$
(2)

in which \({R}_{i,y}^{P}\) is the age-specific incidence rate of the reference population and \(n_{i,p,y}\) is the age-specific population size of province p in year y. In particular, \({R}_{i,y}^{P}\) is calculated using the formula:

$${R}_{i,y}^{P}=\frac{{P}_{i,y}}{{N}_{i,y}}$$
(3)

That is, \({R}_{i,y}^{P}\) is obtained by dividing the number of positive cases in age group i of the reference population, \(P_{i,y}\), by the corresponding age-specific reference population size, \(N_{i,y}\).
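Equations (1) to (3) can be illustrated for a single year with the following sketch. It is not the authors' code. It assumes that cases and residents are available as nested dictionaries keyed by province and age group, and it sums \(N_{i,y}\) over the reference provinces only (those with at least one case), which is one reading of the definition of the reference population. The toy data use two age groups instead of five purely to keep the arithmetic short, and all counts are invented.

```python
def sir_by_province(cases, population):
    """Return {province: SIR} for one year.

    cases, population: {province: {age_group: count}}.
    Reference population: provinces with at least one observed case.
    """
    groups = sorted({g for strata in population.values() for g in strata})
    reference = [p for p in population if sum(cases.get(p, {}).values()) > 0]

    # Eq. (3): age-specific rates of the reference population.
    rates = {}
    for g in groups:
        P = sum(cases[p].get(g, 0) for p in reference)
        N = sum(population[p].get(g, 0) for p in reference)
        rates[g] = P / N if N > 0 else 0.0

    sir = {}
    for p, strata in population.items():
        observed = sum(cases.get(p, {}).values())
        # Eq. (2): expected cases under the reference rates.
        expected = sum(rates[g] * strata.get(g, 0) for g in groups)
        # Eq. (1); undefined when no cases are expected.
        sir[p] = observed / expected if expected > 0 else None
    return sir

# Made-up counts for three hypothetical provinces.
population = {
    "A": {"<65": 1000, ">=65": 1000},
    "B": {"<65": 3000, ">=65": 1000},
    "C": {"<65": 2000, ">=65": 2000},
}
cases = {
    "A": {"<65": 1, ">=65": 3},
    "B": {"<65": 1, ">=65": 1},
}
sir = sir_by_province(cases, population)
```

Here provinces A and B form the reference population, with rates 2/4000 and 4/2000. Province A has 4 observed against 2.5 expected cases (SIR 1.6), and province B has 2 observed against 3.5 expected (SIR about 0.57). Province C, with no cases, evaluates to 0 in this sketch; a published table may instead record such province-years as missing.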

Data Records

The dataset of Standardized Incidence Ratios for West Nile virus is available on the Dryad platform19. The dataset includes SIR values for the years 2012–2024 at the provincial level. Provinces where no positive cases were ever recorded in the considered time interval are excluded. The root directory in Dryad, named “DryadWNV”, contains a .csv file called “sir_tot.csv” and a subfolder named “Auxiliary_data”, which includes all datasets used to compute the SIR index. Specifically, this folder contains an Excel file, “Ni.xlsx”, and 13 .csv files named “wn-ita-provinces-human-surveillance-20***.csv”, one for each year from 2012 to 2024. Both file types include a “Province” column, indicating the corresponding province, and an “Age” column, specifying the age group. The Excel file provides annual population values by age group at the provincial level (with each year as a separate column), while the surveillance .csv files report the number of newly registered cases per province and age group for each year.

The “sir_tot.csv” file compiles the calculated SIR values for each province and each year. In addition, the columns “mean”, “min”, “max”, and “std” report the mean, minimum, maximum, and standard deviation of the SIR, respectively, computed excluding missing values. We also included the “Gradient” feature, described in the section “Slope Analysis”, for all provinces. This feature summarizes the trend of the SIR values over the considered years. Finally, a “README” file that provides further details about the data structure is included.

Technical Validation

To evaluate the informational content of the calculated index, we compared SIR values with the number of positive cases in humans, mosquitoes, and birds, as shown in Table 2. These data are available from the cited source17. Specifically, we employed Pearson’s correlation coefficient (R) and its associated p-value, repeating the analysis for each year, as presented in the table. As expected, we found a strong positive correlation between SIR values and the number of positive human cases. Although the strength of this correlation varies, the results are statistically significant. This pattern was not observed when analyzing mosquitoes and birds. Indeed, prior to 2017, positive cases in animals were not recorded30. For subsequent years, the results show that the R values calculated for other variables are not statistically significant, especially in birds. However, an interesting trend emerges when analyzing the number of positive mosquitoes in recent years. Starting in 2022, the calculated correlation values are strongly positive and statistically significant. This may indicate an increase in positive cases among mosquitoes, potentially driven by rising temperatures31, which in turn could lead to an increase in human cases. Beyond temperature, heterogeneity in entomological sampling intensity (trap density, temporal coverage), laboratory diagnostic protocols, and circulating WNV lineages may influence mosquito-positivity indicators and their comparability across years and areas.

Table 2 Pearson correlation coefficients and associated p-values between SIR values and the number of positive humans, mosquitoes, and birds for each year.

Finally, when all years are considered together, the results become statistically more reliable. The correlation between SIR and human cases is 0.50, indicating that the SIR is related to the number of positive cases but is not a mere proxy for it: by adjusting for provincial age structure, it carries information distinct from raw counts. Together with the slope (“Gradient”) and the summary metrics (Mean, Sd), it facilitates cross-province comparability and temporal trend assessment. The correlation with positive mosquito cases is 0.32, the highest among the other variables, indicating a relationship between infected mosquito populations and high SIR values. The correlation with positive bird cases is weaker but still statistically significant.
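The correlation analysis described in this section can be sketched with a plain implementation of Pearson's coefficient. The input vectors below are made up and stand for the SIR values and the human case counts of four hypothetical provinces in one year. The p-value is not computed here; it follows from the t statistic with n - 2 degrees of freedom, and `scipy.stats.pearsonr` returns both the coefficient and the p-value if SciPy is available.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Made-up values for four hypothetical provinces.
sir_values = [0.5, 1.0, 1.5, 2.0]
human_cases = [1, 2, 2, 5]
r = pearson_r(sir_values, human_cases)
t = t_statistic(r, len(sir_values))
```

Repeating the call once per year, and once on the pooled province-years, would mirror the per-year and all-years comparisons reported above.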

Slope Analysis

After calculating the SIR values, we analyzed their temporal trends. Specifically, for each province, we fitted a linear regression line to the SIR values over the study period (2012–2024). From the regression line, we obtained the slope coefficient, which was used to generate Fig. 2. Missing values were replaced with 0.
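The slope computation described above can be sketched as an ordinary least-squares fit of SIR on calendar year, with missing years set to 0 before fitting. The input layout (a dictionary from year to SIR, with missing years absent, `None`, or NaN) is an assumption of this example, and the series are made up.

```python
import math

def sir_slope(sir_by_year, years=range(2012, 2025)):
    """OLS slope of SIR on calendar year; missing values are replaced with 0."""
    xs = list(years)
    ys = []
    for year in xs:
        value = sir_by_year.get(year)
        if value is None or math.isnan(value):
            value = 0.0  # missing SIR treated as 0
        ys.append(value)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Made-up series: a steady rise of 0.1 per year, and a single late value.
steady = {year: 0.1 * (year - 2012) for year in range(2012, 2025)}
late_only = {2024: 1.3}
slope_steady = sir_slope(steady)
slope_late = sir_slope(late_only)
```

The second series shows the effect of the zero-filling convention: a province with a single non-zero SIR in the last year still receives a positive slope (7.8/182, about 0.043), so the slope reflects both the timing and the size of the recorded values.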

Fig. 2

Trend of SIR values over the analyzed period (2012–2024) for each Italian province, expressed as the slope of the line fitted by linear regression. For provinces that recorded at least one positive case, missing SIR values are replaced with 0.

As shown in the figure, most trends are positive, except for the provinces of Matera, Gorizia, Rimini, Livorno, and Pavia, which exhibit a slightly negative coefficient (mean: -0.018 ± 0.001), with an overall mean trend of 0.05 ± 0.07. Among the provinces with increasing SIR values, Padua (0.404), Lodi (0.237), and Modena (0.224) stand out. Further details are provided directly in the dataset.

These results suggest that WNV cases are increasing over time in Italy, underscoring the importance of studying the potential causes of this trend.

Usage Notes

This paper presents a database on incidence rates (in the form of Standardized Incidence Ratios, SIR) for WNV in Italy at provincial level between 2012 and 2024. The dataset is open to public use without limitation. The permanent storage is on Dryad19. The archive contains two components for analysis. For most applications, use sir_tot.csv, which provides province-level SIRs for human West Nile virus for 2012–2024 alongside simple across-year summaries (Gradient, Mean, Sd, Max, Min). Users who wish to verify or extend the table can consult Auxiliary_data/, which stores the inputs used to assemble the dataset: Ni.xlsx (year-specific sheets with provincial resident population by single year of age) and the per-year surveillance files wn-ita-provinces-human-surveillance-20YY.csv. Reading these files requires only standard CSV/Excel import routines. No special preprocessing is required beyond ensuring consistent province identifiers; after loading, analysts can filter years of interest, compute additional summaries, or merge with covariates for modeling or visualization.
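As a starting point for the workflow described above, the following sketch reads a wide SIR table with the standard `csv` module and filters provinces by trend. The layout of the sample (a `Province` column, one column per year, and a `Gradient` column) is an assumption made for illustration and should be checked against the README before use; the SIR values in the sample are invented.

```python
import csv
import io

MISSING = {"", "NA", "NaN"}

def load_sir_table(handle, province_col="Province"):
    """Read a wide table into {province: {column: float or None}}."""
    table = {}
    for row in csv.DictReader(handle):
        name = row.pop(province_col)
        table[name] = {
            col: (None if value in MISSING else float(value))
            for col, value in row.items()
        }
    return table

# In-memory sample standing in for the real file (hypothetical layout).
sample = io.StringIO(
    "Province,2012,2013,Gradient\n"
    "Padova,1.2,,0.404\n"
    "Lodi,,0.8,0.237\n"
)
table = load_sir_table(sample)

# Provinces whose trend exceeds an arbitrary threshold chosen for the example.
rising = sorted(
    name for name, row in table.items()
    if row["Gradient"] is not None and row["Gradient"] > 0.3
)
```

With the real file, the same function can be called on a handle obtained from `open("sir_tot.csv", newline="")`, adjusting `province_col` to the actual header name.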