Abstract
West Nile virus (WNV) is transmitted by mosquitoes that feed on viremic birds; humans, horses, and other mammals act as dead-end hosts. The virus causes West Nile fever, which in severe cases can progress to neuroinvasive disease in humans. WNV was first detected in Italy in 1998 in horses, with no human cases reported at the time. A multispecies surveillance plan was launched in 2001, and a large-scale outbreak occurred in northern Italy in 2008. Since then, WNV circulation has intensified, with Culex pipiens identified as the main vector. This study provides a dataset covering 2012 to 2024 that includes standardized incidence ratio (SIR) values for human WNV cases at the NUTS-3 (provincial) level in Italy, with the aim of supporting local and national stakeholders in monitoring the distribution of the virus. The results show a significant correlation between mosquito positivity and high SIR values, suggesting that infected mosquito populations influence the incidence of WNV. They also show an increasing trend in WNV cases in Italy, underlining the need to investigate the factors favouring this increase.
Background & Summary
The West Nile virus (WNV) is an arbovirus belonging to the Flaviviridae family, first isolated in 1937 in Uganda, specifically in the West Nile district1. It is known to cause West Nile fever, an infectious disease that can present asymptomatically or with variable symptoms, which can lead to severe forms involving the central nervous system1. Transmission of the virus occurs primarily through the bite of infected mosquitoes, particularly the Culex species, generally known as the common house mosquito. These mosquitoes become infected by feeding on viremic birds, which act as reservoirs for the virus.
At the same time, humans, horses, and other mammals are occasional dead-end hosts, typically developing low and transitory viremia which is not considered to be sufficient to infect competent mosquito species, and are therefore not thought to play a role in the subsequent spread of the virus.
Although human-to-human transmission does not occur through direct contact, it is possible through organ transplants, blood transfusions, and during pregnancy2. West Nile virus infection in humans is asymptomatic in about 80% of the cases. About 20% of the infected individuals experience mild flu-like symptoms, such as fever, headache, fatigue, muscle aches, nausea, and vomiting. In rare cases, symptoms can progress to more severe forms, such as encephalitis or meningitis, occurring in approximately 1 in 150 cases. Severe symptoms are more common among elderly adults or individuals with compromised immune systems3.
In Italy, West Nile virus (WNV) was first detected in 1998 in horses in the Tuscany region, with no reported clinical cases in humans4. In 2001, a multi-species surveillance plan was implemented, incorporating monitoring of wild bird mortality, mosquito collection, and repeated testing of sentinel animals to detect potential introduction or circulation of WNV and to track the spread of the infection.
A decade after the first detection, in 2008, a widespread WNV outbreak occurred in northern Italy, particularly across the Po Valley (the country’s largest plain), affecting the Emilia-Romagna, Veneto, and Lombardy regions5. Since 2008, WNV has exhibited persistent and intensified circulation in various areas of Italy6,7,8.
Culex pipiens was identified as the primary vector comprising 81.4% of the analyzed specimens9.
Since 2016, an integrated approach has been adopted, coordinating veterinary and human surveillance activities under a unified national plan (One Health Surveillance)10. Surveillance efforts on animals (birds and poultry) and mosquitoes are mainly aimed at enabling early detection of viral circulation6,11,12. Once WNV is identified, specific measures are implemented at the provincial level, including blood and organ safety protocols and mosquito control interventions. The surveillance plan is reviewed annually based on observed changes in the geographical distribution and circulation of WNV10,13.
Some studies in the literature have analyzed the spread of the virus across Italy using artificial intelligence techniques. Candeloro et al.14 developed a machine learning model to predict human positive cases of WNV based on satellite data. A similar study by Bonicelli et al.15, which used graph neural networks and Earth observation data, aimed to predict, for each pixel, WNV positivity in birds and mosquitoes. Mingione et al. have also provided a dataset containing all the WNV cases recorded and published in the annual reports between 2012 and 202416,17.
In Italy, apart from the seasonal reports of the ISS10 and the dataset compiled by Mingione et al., there is no archive of epidemiological information on WNV. These studies and the previously cited papers provide or directly use the number of positive WNV cases; standardised indices, however, are of particular interest for epidemiological studies and could provide additional information to better understand the spread of the virus.
In our study, we present a 13-year dataset (2012–2024) containing standardized incidence ratios (SIR) for West Nile virus at the level of Italian provinces. Throughout, “provinces” refer to NUTS-3 administrative units (ISTAT denominations). This indicator, which is frequently used in epidemiological studies18, allows a direct comparison of the incidence rate of a province with the chosen reference standard, defined as the set of provinces recording at least one human case in that year.
Our goal was to provide a valuable tool for local and national stakeholders and researchers, enabling them not only to monitor the evolution of the virus but also to understand the potential causes of its spread. Figure 1 shows the SIR values calculated for the years 2012 and 2023 at the Italian provincial level. This comparison illustrates how the number of provinces affected by WNV cases has increased over time. The SIR dataset, together with all the source data, is available on the Dryad public data repository19.
Fig. 1: Geographical distribution of the Standardized Incidence Ratio for WNV in 2012 (panel A) and 2023 (panel B) at the Italian provincial level.
Methods
Data Source
For the calculation of the SIR, we utilized several public datasets provided by different sources. Mingione et al. made the data on positive cases directly available for download on their GitHub page17. We define positive cases as laboratory-confirmed human WNV cases meeting EU criteria, including neuroinvasive disease and non-neuroinvasive fever; asymptomatic blood-donor detections and imported cases are excluded16,20,21. Detection practices evolved across years and regions (e.g., testing intensity for WNF, donor screening uptake); therefore, the SIR standardizes for age composition but remains sensitive to heterogeneity in case ascertainment.
For population data (provincial, single year of age) we used the official ISTAT endpoints following the I.Stat → IstatData migration: 2019–2024 from the IstatData DataBrowser (dataset DCIS_POPRES1) and 2012–2018 from the legacy interface, harmonizing NUTS-3 identifiers and age strata across sources22,23.
ISTAT is a public organization that provides statistical information about the Italian territory and population.
Table 1 describes all the variables used for the calculation of the SIR.
Observed number of positives (Op, y)
We define Op, y as the annual count of laboratory-confirmed human West Nile virus infections assigned to the province (p) of exposure and stratified by age classes, as reported in the national ISS surveillance bulletins in year y. In line with the EU surveillance case definition (Decision (EU) 2018/945) and its adoption in Italy, confirmed cases meet clinical and laboratory criteria20. For the purpose of incidence estimation, we include clinical presentations classified as neuroinvasive disease and non-neuroinvasive fever, while we exclude asymptomatic detections in blood donors and imported cases. Weekly bulletin counts are aggregated to yearly totals by province and age group before computing the SIR. Human case data are taken from the curated, bulletin-derived open dataset by Mingione et al., which standardizes provincial identifiers and age strata and provides the per-year CSV files used here16. The ISS bulletins and the integrated national plan describe surveillance flows, case classifications, and reporting by province of exposure21,24.
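The aggregation of weekly bulletin counts into yearly totals by province and age group can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the record keys (`year`, `week`, `province`, `age_group`, `cases`) and the sample values are hypothetical and do not reflect the actual column names or counts in the surveillance files.

```python
from collections import defaultdict


def aggregate_yearly(records):
    """Sum weekly case counts into totals per (year, province, age group).

    `records` is an iterable of dicts with the hypothetical keys
    'year', 'week', 'province', 'age_group' and 'cases'.
    """
    totals = defaultdict(int)
    for rec in records:
        key = (rec["year"], rec["province"], rec["age_group"])
        totals[key] += rec["cases"]
    return dict(totals)


# Made-up weekly records for illustration only (not real surveillance data).
weekly = [
    {"year": 2022, "week": 30, "province": "Padova", "age_group": "65-74", "cases": 2},
    {"year": 2022, "week": 31, "province": "Padova", "age_group": "65-74", "cases": 3},
    {"year": 2022, "week": 31, "province": "Padova", "age_group": ">=75", "cases": 1},
    {"year": 2022, "week": 31, "province": "Lodi", "age_group": "65-74", "cases": 1},
]

yearly = aggregate_yearly(weekly)
for key in sorted(yearly):
    print(key, yearly[key])
```

Keying the totals on the (year, province, age group) triple keeps the result directly usable as the stratified observed counts needed later for the SIR.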
The data on human positive cases for WNV can be downloaded from the GitHub repository (https://github.com/fbranda/west-nile)17. The file “latest-wnv.csv” contains all positive WNV cases, including humans, horses, mosquitoes, and birds. The data are provided at the provincial level and cover the period 2012–2024. We aggregated the data by age group and symptoms to avoid information loss. The final dataset is available on the Dryad platform.
Age-specific number of positive cases of reference population (Pi, y)
Using the surveillance dataset curated by Mingione et al.16, we derived the number of WNV-positive cases in the reference population, stratified by age group. Specifically, the data are divided into the following five age groups: <=14, 15–44, 45–64, 65–74, and >=75. We define i as the age-group index. The reference population, as detailed in the section “Standardized Incidence Ratios computation (SIR)”, consists of all provinces that reported at least one positive human case in the reference year.
Resident population at provincial (ni, p, y) and national levels (Ni, y)
The total resident population at provincial level (ni, p, y) was retrieved from the ISTAT data warehouse. Following the recent migration from I.Stat to the new IstatData platform, provincial resident-population series by single year of age are split across two official endpoints. For years 2019–2024 we retrieved data from the IstatData DataBrowser (dataset DCIS_POPRES1). For years 2012–2018 we used the legacy ISTAT interface providing annual population estimates by single year of age. We harmonized province identifiers (NUTS-3), age strata, and field names across sources and verified year-to-year continuity prior to computing SIR22,23.
The reconstruction process takes into account demographic flows such as births, deaths, migration and acquisition of citizenship. The data are stratified by age, with a resolution of 1 year, from 0 to over 100 years, and additionally grouped by provinces. The population values at national level (Ni, y) were then calculated directly by an aggregation process.
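As an illustration of how single-year-of-age population counts might be collapsed into the five age groups used for the case data, and then summed across provinces, consider the sketch below. It is written in Python rather than the authors' R, the function names are hypothetical, the group labels use ASCII hyphens for convenience, and the population figures are invented.

```python
AGE_GROUPS = ["<=14", "15-44", "45-64", "65-74", ">=75"]


def age_group(age):
    """Map a single year of age to one of the five age groups."""
    if age <= 14:
        return "<=14"
    if age <= 44:
        return "15-44"
    if age <= 64:
        return "45-64"
    if age <= 74:
        return "65-74"
    return ">=75"


def bin_population(pop_by_age):
    """Collapse {single-year age: residents} into {age group: residents}."""
    binned = {group: 0 for group in AGE_GROUPS}
    for age, residents in pop_by_age.items():
        binned[age_group(age)] += residents
    return binned


def aggregate_population(provincial):
    """Sum {province: {age group: residents}} over all provinces."""
    total = {group: 0 for group in AGE_GROUPS}
    for groups in provincial.values():
        for group, residents in groups.items():
            total[group] += residents
    return total


# Invented single-year counts for two toy provinces (illustration only).
prov_a = bin_population({0: 100, 14: 50, 15: 70, 44: 30, 64: 20, 65: 10, 74: 5, 75: 8, 100: 1})
prov_b = bin_population({10: 40, 30: 60, 50: 25, 70: 12, 80: 6})
aggregated = aggregate_population({"A": prov_a, "B": prov_b})
print(prov_a)
print(aggregated)
```

The boundary ages (14, 44, 64, 74) are included in the toy input to make the inclusive upper limits of each group explicit.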
Standardized Incidence Ratios computation (SIR)
In this study, we focused on calculating the Standardized Incidence Ratio (SIR) for WNV cases in humans. This index compares the number of recorded positive cases, in our case at the Italian provincial level, with the expected number of positive cases in the reference population, in our analysis the subset of Italian provinces reporting at least one case in each given year (2012–2024).
The SIR eliminates the effect of population size in a province, as it is standardized by the population of both the province under examination and the reference population25. This standardization enables comparisons between provinces with different population sizes. Consequently, the SIR facilitates epidemiological analyses while minimizing potential biases associated with different study areas. In this framework, the SIR is a relative, age-adjusted metric: SIR > 1 indicates more cases than expected given the province’s age structure (excess risk relative to the reference), SIR = 1 indicates parity with expectation, and SIR < 1 indicates fewer cases than expected. The SIR complements—rather than replaces—crude incidence per 100,000, especially when comparing small areas or sparse counts. Its use and interpretation are well established in disease mapping and public-health surveillance26,27.
A critical aspect of calculating the SIR is the choice of the reference population. In the case of WNV, many provinces report no positive cases. Including these provinces in the reference population would lower the reference rates and could bias the distribution of SIR values, pushing the values of provinces with cases above 1 and thereby failing to provide meaningful information.
To solve this problem, we included in the reference population only those provinces that reported at least one positive case in the year in question. This approach ensured that the resulting SIR values were distributed both below and above 1, thus providing the desired insight.
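The selection of the reference set described above can be sketched as follows. This is a hedged Python illustration with hypothetical function names and invented counts; it only shows how provinces with at least one case in a given year might be selected and how their age-specific case counts might be pooled. The matching population denominators would have to be assembled separately.

```python
def reference_provinces(cases_by_province):
    """Return the provinces with at least one human case in the year.

    `cases_by_province` maps province -> {age group: case count}.
    """
    return {
        province
        for province, groups in cases_by_province.items()
        if sum(groups.values()) > 0
    }


def pooled_reference_cases(cases_by_province):
    """Pool age-specific case counts over the reference provinces only."""
    pooled = {}
    for province in reference_provinces(cases_by_province):
        for group, count in cases_by_province[province].items():
            pooled[group] = pooled.get(group, 0) + count
    return pooled


# Invented counts for three toy provinces in a single year.
cases = {
    "A": {"65-74": 2, ">=75": 1},
    "B": {"65-74": 0, ">=75": 0},  # no cases: excluded from the reference
    "C": {"65-74": 1, ">=75": 3},
}

ref_set = reference_provinces(cases)
ref_cases = pooled_reference_cases(cases)
print(sorted(ref_set), ref_cases)
```

Because province B reports no cases, it contributes neither cases nor (in the subsequent rate calculation) population to the reference, which is what keeps the reference rates from being diluted.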
The SIR value also considers the possible effects of age and gender within the analyzed population. Advancing age substantially increases the risk of WNV neuroinvasive disease and the likelihood of clinical detection and reporting; therefore, age strongly influences observed case counts and motivates age standardization in our analyses28,29. In some cases, age-specific information was not available in the downloaded dataset. In order to avoid underestimating the number of positives, we standardized these data using the average population across all age groups.
We calculated the SIR for each available year between 2012 and 2024. The mathematical expression of the SIR is as follows:
\[{SIR}_{p,y}=\frac{{O}_{p,y}}{{E}_{p,y}}\]
where \({O}_{p,y}\) and \({E}_{p,y}\) are the observed and the expected number of recorded cases for province p and year y, respectively. \({E}_{p,y}\) is defined as:
\[{E}_{p,y}=\sum_{i}{R}_{i,y}^{P}\,{n}_{i,p,y}\]
in which \({R}_{i,y}^{P}\) is the age-specific incidence rate of the reference population and \({n}_{i,p,y}\) is the age-specific population size for the given locality. In particular, \({R}_{i,y}^{P}\) is calculated using the formula:
\[{R}_{i,y}^{P}=\frac{{P}_{i,y}}{{N}_{i,y}}\]
that is, \({R}_{i,y}^{P}\) is obtained by dividing the number of positive cases by age in the reference population, \({P}_{i,y}\), by the age-specific reference population size, \({N}_{i,y}\).
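A worked numerical sketch of the three quantities defined above (reference rates, expected cases, and the SIR) is given below. It is a Python illustration under stated assumptions, not the authors' `calculate_sir` function: the function names are hypothetical, only three age groups are shown for brevity, and all counts are invented.

```python
def reference_rates(ref_cases, ref_pop):
    """Age-specific reference rates R_i = P_i / N_i."""
    return {group: ref_cases.get(group, 0) / ref_pop[group] for group in ref_pop}


def expected_cases(rates, prov_pop):
    """Expected cases E = sum_i R_i * n_i for one province and year."""
    return sum(rates[group] * prov_pop[group] for group in prov_pop)


def sir(observed, rates, prov_pop):
    """Standardized incidence ratio O / E (NaN if E is zero)."""
    expected = expected_cases(rates, prov_pop)
    return observed / expected if expected > 0 else float("nan")


# Invented reference data: cases P_i and population N_i by age group.
ref_cases = {"45-64": 10, "65-74": 20, ">=75": 30}
ref_pop = {"45-64": 1_000_000, "65-74": 400_000, ">=75": 300_000}

# Invented province: population n_i by age group and 9 observed cases.
prov_pop = {"45-64": 100_000, "65-74": 40_000, ">=75": 30_000}
observed = 9

rates = reference_rates(ref_cases, ref_pop)
expected = expected_cases(rates, prov_pop)  # 1 + 2 + 3 = 6 expected cases
value = sir(observed, rates, prov_pop)      # 9 / 6 = 1.5
print(round(expected, 6), round(value, 6))
```

In this toy case the province records 9 cases against 6 expected, so its SIR of 1.5 would be read as a 50% excess relative to the reference, given its age structure.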
Data Records
The dataset of Standardized Incidence Ratio for West Nile Virus is available on Dryad platform19. The dataset includes SIR values for the years 2012–2024 at the provincial level. Provinces where no positive cases were ever recorded (in the considered time interval) are excluded. The root directory in Dryad, named “DryadWNV”, contains a .csv file called “sir_tot.csv” and a subfolder named “Auxiliary_data”, which includes all datasets used to compute the SIR index. Specifically, this folder contains an Excel file, “Ni.xlsx”, and 13 .csv files named “wn-ita-provinces-human-surveillance-20***.csv”, one for each year from 2012 to 2024. Both file types include a “Province” column, indicating the corresponding province, and an “Age” column, specifying the age group. The Excel file provides annual population values by age group at the provincial level (with each year as a separate column), while the surveillance .csv files report the number of newly registered cases per province and age group for each year.
The “sir_tot.csv” file compiles the calculated SIR (Standardized Incidence Ratio) values for each province and each year. Furthermore, the columns “mean”, “min”, “max”, and “std” represent the mean, minimum, maximum, and standard deviation of the SIR, respectively, excluding missing values. We also included the “Gradient” feature, described in the “Slope Analysis” section, for all the provinces. This feature provides information about the trend of the SIR values over the considered years. Finally, a “README” file that provides further details about the data structure is included.
Technical Validation
To evaluate the informational content of the calculated index, we compared SIR values with the number of positive cases in humans, mosquitoes, and birds, as shown in Table 2. These data are available from the cited source17. Specifically, we employed Pearson’s correlation coefficient (R) and its associated p-value, repeating the analysis for each year, as presented in the table. As expected, we found a strong positive correlation between SIR values and the number of positive human cases. Although the strength of this correlation varies, the results are statistically significant. This pattern was not observed when analyzing mosquitoes and birds. Indeed, prior to 2017, positive cases in animals were not recorded30. For subsequent years, the results show that the R values calculated for other variables are not statistically significant, especially in birds. However, an interesting trend emerges when analyzing the number of positive mosquitoes in recent years. Starting in 2022, the calculated correlation values are strongly positive and statistically significant. This may indicate an increase in positive cases among mosquitoes, potentially driven by rising temperatures31, which in turn could lead to an increase in human cases. Beyond temperature, heterogeneity in entomological sampling intensity (trap density, temporal coverage), laboratory diagnostic protocols, and circulating WNV lineages may influence mosquito-positivity indicators and their comparability across years and areas.
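For readers who wish to reproduce this kind of check, Pearson's correlation coefficient can be computed directly from its definition, as in the sketch below. This is a plain-Python illustration with invented values; it returns only the coefficient, and the associated p-value would require a statistical library routine (for example `scipy.stats.pearsonr`, which returns both).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


# Invented provincial values for one year (illustration only).
sir_values = [0.5, 1.0, 1.5, 2.0]
human_cases = [1, 2, 3, 4]
r_up = pearson_r(sir_values, human_cases)
r_down = pearson_r([1, 2, 3], [3, 2, 1])
print(round(r_up, 6), round(r_down, 6))
```

In a per-year analysis, the two sequences would hold one value per province for the year in question, paired in the same province order.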
Finally, when considering all years together, the results become statistically more reliable. The correlation between SIR and human cases is 0.50, indicating that SIR is related to the number of positive cases but is not a proxy for it. SIR contributes information distinct from raw counts by adjusting for provincial age structure; together with the slope (“Gradient”) and dispersion metrics (Mean, Sd), it facilitates cross-province comparability and temporal trend assessment. The correlation with positive mosquito cases is 0.32, the highest among the other variables, indicating a relationship between infected mosquito populations and high SIR values. The correlation with positive bird cases is weaker but still statistically significant.
Slope Analysis
After calculating the SIR values, we analyzed their temporal trends. Specifically, for each province, we computed a linear regression line using the SIR values over the study period (2012–2024). From the regression line, we obtained the slope coefficient, which was used to generate Fig. 2. Missing values were replaced with 0.
Fig. 2: Trend of SIR values over the analyzed time period (2012–2024) for each Italian province, expressed as the slope of the fitted linear regression line. Missing values for a province are replaced with 0 if the province has recorded at least one positive case.
As shown in the figure, most trends are positive, except for the provinces of Matera, Gorizia, Rimini, Livorno, and Pavia, which exhibit a slightly negative coefficient (mean: -0.018 ± 0.001), with an overall mean trend of 0.05 ± 0.07. Among the provinces with increasing SIR values, Padua (0.404), Lodi (0.237), and Modena (0.224) stand out. Further details are provided directly in the dataset.
These results demonstrate that WNV cases are increasing over time in Italy, underscoring the importance of studying the potential causes of this trend.
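The slope computation described in this section can be sketched with an ordinary least-squares fit, as below. This is a Python illustration with a hypothetical function name and invented SIR values, not the authors' R implementation; missing years are set to 0 before fitting, as stated in the text, which the toy example shows can noticeably lower the estimated slope.

```python
def sir_slope(sir_by_year, years):
    """Least-squares slope of SIR against year, with missing years set to 0."""
    xs = list(years)
    ys = []
    for year in xs:
        value = sir_by_year.get(year)
        ys.append(0.0 if value is None else float(value))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


years = range(2012, 2017)  # short toy period; the study covers 2012-2024

# Invented series with no gaps: SIR rises by 1 per year.
complete = {2012: 0.0, 2013: 1.0, 2014: 2.0, 2015: 3.0, 2016: 4.0}
# Same series with 2015 missing, which is then treated as 0.
with_gap = {2012: 0.0, 2013: 1.0, 2014: 2.0, 2016: 4.0}

slope_complete = sir_slope(complete, years)
slope_gap = sir_slope(with_gap, years)
print(round(slope_complete, 6), round(slope_gap, 6))
```

Users who prefer a different treatment of missing years (for example, fitting only the observed years) can recompute the slope from the yearly SIR columns.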
Usage Notes
This paper presents a database on incidence rates (in the form of Standardized Incidence Ratios, SIR) for WNV in Italy at provincial level between 2012 and 2024. The dataset is open to public use without limitation. The permanent storage is on Dryad19. The archive contains two components for analysis. For most applications, use sir_tot.csv, which provides province-level SIRs for human West Nile virus for 2012–2024 alongside simple across-year summaries (Gradient, Mean, Sd, Max, Min). Users who wish to verify or extend the table can consult Auxiliary_data/, which stores the inputs used to assemble the dataset: Ni.xlsx (year-specific sheets with provincial resident population by single year of age) and the per-year surveillance files wn-ita-provinces-human-surveillance-20YY.csv. Reading these files requires only standard CSV/Excel import routines. No special preprocessing is required beyond ensuring consistent province identifiers; after loading, analysts can filter years of interest, compute additional summaries, or merge with covariates for modeling or visualization.
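A minimal loading-and-summary sketch in Python is shown below. The table layout assumed here (a `Province` column followed by one column per year, with empty cells for missing values) is hypothetical and should be checked against the README in the archive; the sample rows are invented, and the use of the sample standard deviation is likewise an assumption.

```python
import csv
import io
import statistics

# Invented sample with an ASSUMED wide layout; replace with
# open("sir_tot.csv", newline="") after checking the real column names.
SAMPLE = """Province,2012,2013,2014
Alpha,0.8,,1.4
Beta,,,2.0
"""


def load_sir_table(handle):
    """Read {province: {year: SIR or None}} from a wide CSV table."""
    table = {}
    for row in csv.DictReader(handle):
        province = row.pop("Province")
        table[province] = {
            int(year): (float(cell) if cell.strip() else None)
            for year, cell in row.items()
        }
    return table


def summarize(values):
    """Mean, min, max and sample std of the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.fmean(present),
        "min": min(present),
        "max": max(present),
        "std": statistics.stdev(present) if len(present) > 1 else float("nan"),
    }


table = load_sir_table(io.StringIO(SAMPLE))
alpha = summarize(table["Alpha"].values())
beta = summarize(table["Beta"].values())
print(alpha)
print(beta)
```

Keeping missing values as `None` rather than 0 at load time lets each downstream analysis decide explicitly how gaps should be handled.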
Data availability
The West Nile virus SIR dataset described in this Data Descriptor is publicly available on Dryad at https://doi.org/10.5061/dryad.95x69p8x3. All accompanying metadata and yearly surveillance auxiliary files are included in the same repository.
Code availability
The scripts used to compute province-level SIRs and assemble the final dataset were written in R (v4.2.2)32 using readxl and dplyr. The workflow reads the auxiliary inputs (ISTAT population by age and ISS surveillance CSVs), derives age-specific reference rates, applies the SIR formula by province and year, and writes the consolidated output (sir_tot.csv) with summary statistics. The complete, reproducible code, including the calculate_sir function and minimal examples, is openly available on Zenodo33 at https://doi.org/10.5281/zenodo.15211573.
References
Petersen, L. R. & Roehrig, J. T. West Nile virus: a reemerging global pathogen. Emerging Infectious Diseases 7(4), 611–614, https://doi.org/10.3201/eid0704.010401 (2001).
Gyure, K. A. West Nile Virus Infections. Journal of Neuropathology & Experimental Neurology 68(10), 1053–1060, https://doi.org/10.1097/NEN.0b013e3181b88114 (2009).
Salcuni, P. & Rizzo, C. West Nile disease: review of clinical features and risk factors associated with severe disease. Infezioni in Medicina 19(1), 5–15 (2011).
Autorino, G. L. et al. West Nile virus epidemic in horses, Tuscany region, Italy. Emerging Infectious Diseases 8(12), 1372–1375, https://doi.org/10.3201/eid0812.020234 (2002).
Savini, G., Monaco, F., Calistri, P. & Lelli, R. Phylogenetic analysis of West Nile virus isolated in Italy in 2008. Eurosurveillance 13(48), 19048 (2008).
Rizzo, C. et al. West Nile virus transmission: results from the integrated surveillance system in Italy, 2008 to 2015. Eurosurveillance 21(37), 30340, https://doi.org/10.2807/1560-7917.ES.2016.21.37.30340 (2016).
Monaco, F. et al. The 2011 West Nile disease outbreak in Sardinia region, Italy. Veterinaria Italiana 51, 5–16, https://doi.org/10.12834/VetIt.260.2386.2 (2015).
Savini, G. et al. Evidence of West Nile virus lineage 2 circulation in Northern Italy. Veterinary Microbiology 158(3-4), 267–273, https://doi.org/10.1016/j.vetmic.2012.02.018 (2012).
Mancini, G. et al. Mosquito species involved in the circulation of West Nile and Usutu viruses in Italy. Veterinaria Italiana 53(2), 97–110, https://doi.org/10.12834/VetIt.114.933.4764.2 (2017).
Istituto Superiore di Sanità. La sorveglianza dei casi umani di infezione da West Nile e Usutu virus - Bollettini. (2023).
Calzolari, M. et al. West Nile virus surveillance in 2013 via mosquito screening in northern Italy and the influence of weather on virus circulation. PLoS ONE 10(10), e0140915, https://doi.org/10.1371/journal.pone.0140915 (2015).
Riccardo, F. et al. An early start of West Nile virus seasonal transmission: the added value of One Health surveillance in detecting early circulation and triggering timely response in Italy, June to July 2018. Eurosurveillance 23(32), 1800427, https://doi.org/10.2807/1560-7917.ES.2018.23.32.1800427 (2018).
Istituto Superiore di Sanità. West Nile Fever (English factsheet). (2023).
Candeloro, L. et al. Predicting WNV circulation in Italy using Earth observation data and extreme gradient boosting model. Remote Sensing 12(18), 3064, https://doi.org/10.3390/rs12183064 (2020).
Bonicelli, L. et al. Spotting Virus from Satellites: Modeling the Circulation of West Nile Virus Through Graph Neural Networks. IEEE Transactions on Geoscience and Remote Sensing (2023).
Mingione, M., Branda, F., Maruotti, A., Ciccozzi, M. & Mazzoli, S. Monitoring the West Nile virus outbreaks in Italy using open access data. Scientific Data 10(1), 777 (2023).
Branda, F. West Nile data Italy (fbranda/west-nile). (2025).
Stella, R. Quah Observational Epidemiology. International Encyclopedia of Public Health (Second Edition) 295–307, https://doi.org/10.1016/B978-0-12-803678-5.00310-6 (2017).
Fania, A. et al. Standardized incidence ratio dataset of human West Nile virus in Italy (2012–2024). Dryad https://doi.org/10.5061/dryad.95x69p8x3 (2025).
Commission Implementing Decision (EU) 2018/945 of 22 June 2018 on the communicable diseases and related special health issues to be covered by epidemiological surveillance as well as relevant case definitions. European Commission (2018).
Istituto Superiore di Sanità. La sorveglianza dei casi umani di infezione da West Nile virus - Bollettini. (2025).
Istituto Nazionale di Statistica (ISTAT). Resident population by single year of age - Provinces (DCIS_POPRES1). (2025).
Istituto Nazionale di Statistica (ISTAT). Population estimates 2002–2019 by single year of age. (2025).
Salute, M. D., Sanità Sd Piano nazionale di prevenzione, sorveglianza e risposta alle Arbovirosi (PNA) 2020–2025. (2020).
Rioux, C., Grandbastien, B. & Astagneau, P. The standardized incidence ratio as a reliable tool for surgical site infection surveillance. Infection Control & Hospital Epidemiology 27(8), 817–824, https://doi.org/10.1086/506082 (2006).
Clayton, D. & Kaldor, J. Empirical Bayes Estimates of Age-Standardized Relative Risks for Use in Disease Mapping. Biometrics 43(3), 671–681, https://doi.org/10.2307/2532003 (1987).
Centers for Disease Control and Prevention. The NHSN Standardized Infection Ratio (SIR): A Guide to the SIR (Based on 2015 National Baseline). (2024).
Petersen, L. R., Brault, A. C. & Nasci, R. S. West Nile Virus: Review of the Literature. JAMA 310(3), 308–315, https://doi.org/10.1001/jama.2013.8042 (2013).
Clark, M. B. & Chandrasekaran, P. West Nile Virus. StatPearls https://www.ncbi.nlm.nih.gov/books/NBK544246/ (2023).
Istituto Superiore di Sanità (ISS). West Nile virus - Archivio 2017. (2017).
Dohm, D. J., O’Guinn, M. L. & Turell, M. J. Effect of environmental temperature on the ability of Culex pipiens (Diptera: Culicidae) to transmit West Nile virus. Journal of Medical Entomology 39(1), 221–225, https://doi.org/10.1603/0022-2585-39.1.221 (2002).
R Core Team. R: A language and environment for statistical computing. (2018).
Fania, A. et al. Standardized incidence ratio dataset of human West Nile virus in Italy (2012–2024): Code. Zenodo https://doi.org/10.5281/zenodo.15211573 (2025).
Acknowledgements
We are thankful to Marco Mingione et al. and the Italian Statistical Institute (ISTAT) for providing the raw data on positive cases and demography. This research was supported by EU funding within the MUR PNRR Extended Partnership initiative on Emerging Infectious Diseases (Project no. PE00000007, INF-ACT).
Author information
Contributions
Conceptualization A.F., A.M.; Methodology A.F., A.M., N.A., N.F.; Formal analysis A.F., A.M.; Writing (Original Draft) A.F., A.M. and P.C.; Writing (Review & Editing) A.F., P.C., V.I.V., G.E., A.P., A.M., N.A., L.B., R.C.G., N.F., A.L., E.P., S.T., A.V., R.B.; Data Curation A.F.; Software A.F.; Visualization A.F., A.M.; Validation A.M., N.A., R.C.G., G.E. and A.P.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Fania, A., Capozza, P., Cazzolla Gatti, R. et al. Standardized Incidence Ratio dataset of Human West Nile Virus in Italy (2012-2024). Sci Data 12, 1861 (2025). https://doi.org/10.1038/s41597-025-06129-8