Abstract
Tracking influenza and similar respiratory diseases is an important problem in public health and clinical medicine. The problem is complicated by the clinical similarity and co-occurrence of many of these illnesses. Additionally, recent history has shown that detecting new or reemergent diseases, such as COVID-19, is of paramount importance. This paper describes the design and testing of a system called ILI Tracker that tracks known influenza-like illnesses and detects the presence of a novel disease, such as COVID-19, early and accurately. We extracted clinical findings from 2.9M clinical records from five emergency departments using natural language processing. We constructed statistical models of six influenza-like illnesses from the first five years of the dataset and then used these models and a Bayesian filter to track the rates of these diseases in the five remaining years of data. The tracked daily rates correlated significantly with the number of patients who were diagnosed with influenza and respiratory syncytial virus (RSV), but less strongly with the numbers for the other tracked diseases. We extended ILI Tracker to detect the presence of a novel, unmodeled disease. The extended system produced a strong signal near the beginning of the COVID-19 outbreak, in response to artificial injections of COVID-19 cases into case data streams, and during known outbreaks of influenza and RSV that were treated as novel, unmodeled diseases. Our results suggest that ILI Tracker can detect the presence of a novel, unmodeled disease in a timely fashion with few false alarms. The ILI Tracker system is freely available.
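To make the idea of tracking a disease rate with a Bayesian filter concrete, the following is a minimal sketch and not the ILI Tracker implementation. It assumes a single disease, a discrete grid of candidate daily rates, a simple random-walk drift between days, and per-patient likelihoods that some upstream disease model has already computed. All function names, parameters, and numbers are hypothetical and chosen only for illustration.

```python
import math

def make_grid(n=101):
    """Candidate values for the daily fraction of patients with the disease."""
    return [i / (n - 1) for i in range(n)]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def predict(belief, stay=0.6):
    """Assumed random-walk drift: part of each cell's mass moves to its neighbors."""
    n = len(belief)
    move = (1.0 - stay) / 2.0
    out = [0.0] * n
    for i, b in enumerate(belief):
        out[i] += stay * b
        out[max(i - 1, 0)] += move * b      # mass reflects at the lower edge
        out[min(i + 1, n - 1)] += move * b  # mass reflects at the upper edge
    return out

def update(belief, grid, patients):
    """Bayes update from one day of patients.

    patients: list of (P(findings | disease), P(findings | no disease)) pairs,
    both assumed strictly positive so every logarithm below is defined.
    """
    log_post = []
    for p, b in zip(grid, belief):
        ll = math.log(b) if b > 0 else float("-inf")
        for like_disease, like_other in patients:
            # Each patient is a mixture: diseased with probability p, otherwise not.
            ll += math.log(p * like_disease + (1.0 - p) * like_other)
        log_post.append(ll)
    peak = max(log_post)  # subtract the peak before exponentiating to avoid underflow
    return normalize([math.exp(v - peak) for v in log_post])

def posterior_mean(belief, grid):
    return sum(p * b for p, b in zip(grid, belief))

def track(days, n=101):
    """Return the posterior mean rate after each day, starting from a uniform prior."""
    grid = make_grid(n)
    belief = normalize([1.0] * n)
    means = []
    for patients in days:
        belief = update(predict(belief), grid, patients)
        means.append(posterior_mean(belief, grid))
    return means

# Hypothetical data: 20 patients per day whose findings first argue against
# the disease and then argue for it.
quiet = [(0.1, 0.9)] * 20
busy = [(0.9, 0.1)] * 20
estimates = track([quiet, quiet, busy, busy])
print([round(e, 3) for e in estimates])
```

In this toy run the estimated rate stays low over the two quiet days and rises once the patient findings begin to favor the disease. Because the assumed drift is narrow, the estimate responds gradually rather than jumping to the new level.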
Data availability
ILI Tracker is freely available for use in monitoring for disease outbreaks, including outbreaks of novel or reemergent diseases. The source code is available at https://github.com/RodsLaboratory/PDS. We also have made available a Docker container that embeds the ILI Tracker code within a more comprehensive outbreak detection system, which is available at https://github.com/rodslaboratory/pds-docker. The container includes the following components: (1) scripts to run MetaMap Lite to convert free-text ED reports into coded Concept Unique Identifiers (CUIs) from UMLS, (2) the CDS case-detection system, which performs Bayesian case diagnosis of modeled diseases, (3) the ILI Tracker program, (4) scripts that integrate all components into a complete processing pipeline, (5) a web application for running the entire system, and (6) simulated ED reports to use in conducting a test drive of the system. Documentation for the Docker container also includes instructions for processing a user’s own ED reports through the system. Real-time deployment in EDs and other high-volume healthcare settings could help clinicians and public health officials recognize the emergence of new disease outbreaks earlier. The source ED data used in this project are protected health information and therefore cannot be shared externally.
Funding
This research was supported by grant R01LM013509 (Automated Surveillance of Overlapping Outbreaks and New Outbreak Diseases) from the National Library of Medicine (NLM) of the U.S. National Institutes of Health (NIH). Harry Hochheiser and Jessi Espino also received support from NIGMS grant U24GM132013 (MIDAS Coordination Center) and NIGMS grant R24GM153920 (MIDAS Coordination Center). Ye Ye also received support from NLM grant R00LM013383 (Transfer Learning to Improve the Re-usability of Computable Biomedical Knowledge). Marian Michaels also received support from CDC grant U01IP001152 (New Vaccine Surveillance Network). Jessi Espino also received support from CDC grant 5U01IP001184 (Evaluating Respiratory Virus Vaccine Effectiveness in a Large, Diverse Healthcare System). This work was also supported by the National Institutes of Health through Grant Number UL1 TR001857.
Author information
Contributions
All authors contributed to the writing of this paper. John Aronis provided conceptual formulation, mathematical modeling, implementation, and testing of the ILI Tracker system. Ye Ye provided conceptual formulation, mathematical modeling, implementation, and testing of patient modeling. Jessi Espino provided conceptual formulation, data management, and extraction of MetaMap findings. Marian Michaels provided conceptual formulation and clinical expertise. Harry Hochheiser provided conceptual formulation. Gregory Cooper provided conceptual formulation, clinical expertise, and mathematical modeling.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Aronis, J.M., Ye, Y., Espino, J. et al. An evaluation of a Bayesian method to track outbreaks of known and novel influenza-like illnesses. Sci Rep (2026). https://doi.org/10.1038/s41598-026-45934-y