Time series anomaly detection in helpline call trends for early detection of COVID-19 spread across Sweden, 2020

Hashemi, Atiye Sadat; Dietler, Dominik; Fall, Tove; Inghammar, Malin; Johansson, Anders F.; Bonander, Carl; Ohlsson, Mattias; Björk, Jonas

doi:10.1038/s41598-025-20641-2


Article
Open access
Published: 24 September 2025


Atiye Sadat Hashemi¹,
Dominik Dietler¹,
Tove Fall²,
Malin Inghammar^3,4,
Anders F. Johansson⁵,
Carl Bonander⁶,
Mattias Ohlsson⁷ &
…
Jonas Björk^1,8

Scientific Reports volume 15, Article number: 32701 (2025)


Abstract

Timely detection and surveillance of disease community spread is a potent tool for implementing effective public health interventions. This study investigates the National Telehealth Service (1177 helpline) across 18 regions in Sweden in 2020 to identify early signals of community transmission of COVID-19 at the beginning of the pandemic. Focusing on calls related to key COVID-19 symptoms (cough, fever, and breathing difficulties in adults), we analyze their frequency and distribution across referral categories, comparing them to 2019 data. We employ an explainable time series anomaly detection algorithm using daily call data to identify the first collective anomalies across regions. The results show that anomalies in call data were correlated with, but preceded, the first confirmed case infected in Sweden by a median of 7 days (IQR: 2.5–10.5) and the first hospitalized case infected in Sweden by a median of 13 days (IQR: 7.25–16). They also preceded the estimated onset of community spread, indicated by the absolute confirmed cases (median: 24.5, IQR: 18.25-32.5), and severe outcomes defined by hospitalizations (median: 33, IQR: 27.25-44). These findings showcase how helpline call monitoring, using time series anomaly detection, can aid early outbreak detection.


Introduction

The coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first identified in December 2019 in Wuhan, China¹. Its rapid spread across Asia, Europe, and the rest of the world led to a global pandemic declaration by the World Health Organization in March 2020². Europe experienced its first wave of infections in early 2020. In Sweden, the first SARS-CoV-2 infection was detected on January 31³ in a traveler from Jönköping returning from Wuhan, China⁴. In March 2020, community transmission was evident, posing significant challenges for public health surveillance and containment^5,6.

Surveillance of community spread is essential for effective public health responses⁶. Various approaches exist for disease surveillance, with one method being the monitoring of confirmed cases through widespread testing programs⁷. However, case counts are an unreliable metric during the emergence of a novel infectious disease due to limited testing, lab capacity, and selection bias in severity and demographics, hindering accurate outbreak tracking and decision-making. Alternative approaches, such as monitoring syndromic health data, can provide near-real-time insights into emerging diseases, as evidenced by research conducted in the United States and the United Kingdom^8,9,10,11. Other surveillance approaches include wastewater sampling in the United States¹², app-based symptom reporting in Sweden¹³, England and the UK^14,15,16,17, and search ranks from internet search engines in the United States¹⁸.

Patients infected with the SARS-CoV-2 virus may experience symptoms such as cough, fever, breathing difficulties, sore throat, diarrhea, headache, muscle or joint pain, fatigue, and loss of smell and taste¹⁹. In Sweden, data on such symptoms can be gathered from the National Telehealth Service (1177 helpline)²⁰. This telehealth service for triage and referral is operated by registered nurses and is available for everyone free of charge^21,22. It is also widely used among various population groups, including the elderly population²³.

The call data from the 1177 helpline has been analyzed for its potential use in the national surveillance of various diseases in Sweden^24,25,26,27. For instance, Bjelkmar et al.²⁴ demonstrated the early detection of a large waterborne cryptosporidiosis outbreak in Skellefteå by retrospectively linking calls to the 1177 helpline with water distribution areas. Martin et al.²⁵ introduced a novel approach to tick-borne encephalitis (TBE) surveillance, utilizing data from the 1177 helpline, diagnosed case reports, and environmental factors to improve the detection and monitoring of TBE outbreaks between 2010 and 2017. Ma et al.²⁶ assessed the effectiveness of three syndromic surveillance methods, web queries, 1177 helpline data, and school absenteeism, in detecting influenza activity in Sweden. Their findings revealed that web queries and 1177 helpline data produced results comparable to traditional surveillance systems, while school absenteeism data proved unreliable. Although these tools were not consistently early indicators, they could detect influenza cases before primary healthcare systems. Andersson et al.²⁷ also compared local outbreak signals from 1177 helpline, web queries, and OTC antidiarrheal sales against known outbreaks, identifying helpline data as the most effective tool for early detection and monitoring. Some spatiotemporal analyses have also been conducted for COVID-19 in Sweden or for specific regions^28,29,30.

The purpose of our study is to investigate whether the 1177 helpline in Sweden could serve as an early indicator of emerging disease patterns during the COVID-19 pandemic. The specific aim is to test the hypothesis that collective anomalies, groups of data points exhibiting abnormal behavior, in 1177 helpline call patterns could provide early signals of key pandemic outcomes, such as the onset of community spread and severe cases leading to hospitalization or death. To address this, we develop an algorithm to detect anomalies in time series call data and visualize their spatiotemporal patterns. We also conduct a comparative analysis of 1177 call patterns from the early phase of the COVID-19 pandemic in 2020 and the corresponding period in 2019, focusing on COVID-19 key symptoms, and examining their frequency and distribution across referral categories. Using the COVID-19 pandemic as a case study, this approach aims to enhance syndromic surveillance methods for detecting emerging disease patterns in future outbreaks.

Telehealth call and COVID-19 data

Telehealth call data in this study include records from the 1177 helpline, which provides medical counselling on care and illnesses in Sweden^20,31. Registered nurses who answer calls on the 1177 helpline use a medical decision support system to assess the healthcare needs of callers³². Each call is classified according to one specific reason from a predetermined list (known as Contact Reason). We also utilize the public data on confirmed SARS-CoV-2 cases from Sweden’s public health authority (based on the SMINET register)³³. Furthermore, the individual-level data from 1177 helpline calls are linked with hospitalization data (National Patient Register, Inpatient and Outpatient Care) and death data (Cause of Death Register) from the Swedish National Board of Health and Welfare, to analyze community spread. The data sources for confirmed SARS-CoV-2 cases, hospitalizations, and deaths cover the entire country and include individuals of all age groups during the study period.

Table 1 gives an overview of the 1177 helpline dataset. A key variable in this study is “Contact Reason”, encompassing about 190 distinct codes for reasons ranging from common symptoms such as cough, fever, and abdominal pain, to sleep problems, as well as general medical inquiries or administrative matters (e.g., scheduling appointments). In the dataset, only one main “Contact Reason” is recorded per call, reflecting the primary reason for contact as assessed by the nurse. Specifically, our analysis focuses on three key COVID-19 symptoms, namely cough, fever, and breathing difficulties in adults³⁴, to elucidate changes in their combined frequency, distribution, and temporal trends. Another key variable, “Referral Priority”, includes five unique categories: Immediate, Urgent, Within the next 24 hours, Next weekday, and Wait. For analysis, these categories are grouped into Emergency care (Immediate, Urgent), Primary care (Within the next 24 hours, Next weekday), and Wait. A Norwegian study shows that callers understand the advice given by registered nurses, and that the large majority of patients advised to wait did not contact their GP or other healthcare services again with the same complaints in the following week³⁵. This suggests that the classification of calls based on the “Contact Reason” and “Referral Priority” variables is instrumental in managing healthcare needs efficiently and ensuring appropriate care pathways³⁵.
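The grouping of the five referral priorities into three analysis categories can be sketched as a simple lookup. This is a minimal illustration only: the dictionary name `REFERRAL_GROUPS`, the function `count_by_group`, and the example calls are hypothetical and are not taken from the study's code or data.

```python
from collections import Counter

# Mapping from the five "Referral Priority" values to the three groups
# described in the text. The variable name is an illustrative assumption.
REFERRAL_GROUPS = {
    "Immediate": "Emergency care",
    "Urgent": "Emergency care",
    "Within the next 24 hours": "Primary care",
    "Next weekday": "Primary care",
    "Wait": "Wait",
}


def count_by_group(priorities):
    """Count calls per grouped referral category."""
    return Counter(REFERRAL_GROUPS[p] for p in priorities)


# Hypothetical example: four calls with their nurse-assigned priorities.
example_calls = ["Urgent", "Wait", "Next weekday", "Immediate"]
print(count_by_group(example_calls))
```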

Table 1 Overview of the 1177 helpline dataset in this study.


The dataset spans January 1, 2019, to December 31, 2020, capturing COVID-19-related activity and comparing helpline calls before the pandemic and during its first year.

The geographical coverage of our call data includes all regions in Sweden, excluding Stockholm, Östergötland, and Gotland, since data were not available. Specifically, we consider data from 16 regions, excluding the regions of Värmland and Södermanland from January 2019 to October/November 2019, and from 18 regions thereafter (because Värmland and Södermanland began using the 1177 system through Inera in October/November 2019). We present call data figures relative to regional population size. For reference, the populations of the regions in 2019 are listed in Table 2³⁶.

Method development

In this section, we outline the approach for detecting the first collective anomalies in symptoms reported to the 1177 helpline, which are then visualized through geographic mapping to highlight regional patterns. Additionally, we present the estimation of the onset of community spread and severe outcomes based on data from confirmed SARS-CoV-2 infections, hospitalizations, and deaths.

Detecting first anomalies in call data

Time series anomaly detection models are used in real-time surveillance systems, with different types of anomalies that can be prioritized depending on the specific application³⁷. In time series data, anomalies can be classified into various types, including point-wise, contextual, and collective anomalies³⁸. Point-wise anomalies refer to individual data points that deviate significantly from the overall distribution, while contextual anomalies arise when a data point deviates from expected behavior in a specific context. Collective anomalies, on the other hand, involve a group of data points that together exhibit abnormal behavior. Detecting collective anomalies requires examining the relationships between consecutive data points over time. While it is challenging to isolate specific types of anomalies as they are often interrelated, we aim to place a stronger emphasis on detecting collective anomalies. This is because sustained patterns over time, such as consecutive days of high call volumes to a helpline, are more likely to indicate meaningful trends rather than isolated anomalies.

To detect the first collective anomaly, we propose an algorithm that analyzes daily call signals derived from observed data in each region. The algorithm accounts for weekly and collective-term patterns through two key conditions:

Weekly Average Condition: evaluates whether the current day’s value exceeds the average of the preceding 7 days multiplied by a dynamic threshold.
Sequential Anomaly Condition: identifies sustained increases over 1 to 7 consecutive days by comparing each sequence of days to the corresponding sequence from the previous week. It accounts for fluctuations in call volume that vary by weekday and requires the current sequence to exceed its past counterpart by a dynamic, proportional threshold, prioritizing collective anomalies over isolated spikes.

Mathematically, the applied algorithm can be expressed as:

$$\begin{aligned} (X_t > \alpha _t \cdot \bar{X}_{\text {last\_week}}) \quad \text {and} \quad \bigvee _{k=1}^{7} \left( \bigwedge _{i=0}^{k-1} \left( X_{t-i} > \left( 1+\frac{\alpha _t}{k} \right) X_{t-7-i} \right) \right) \end{aligned} \tag{1}$$

where $X_t$ denotes the observed data (in this study, the sum of calls for the three specified COVID-19 symptoms) in a specific region on day $t$, $\alpha _t$ is a dynamic threshold, and $k$ represents the number of consecutive recent days (ranging from 1 to 7) being evaluated for sequential anomalies compared to the corresponding days in the previous week. The equation detects anomalies by evaluating the current value against the two contextual conditions mentioned above. First, it compares $X_t$ with the 7-day average ($\bar{X}_{\text {last\_week}} = \frac{1}{7} \sum _{i=1}^{7} X_{t-i}$). Then, it checks for sequential anomalies over varying sequence lengths to verify whether the values over any 1- to 7-day period are significantly larger than the corresponding values from the previous week, capturing sustained and sequential increases.

We employ a dynamic threshold for the parameter $\alpha _t$ in Equation (1) to improve sensitivity and reduce false positives. Rather than using a fixed value, $\alpha _t$ is adjusted over time according to the rolling standard deviation of the data within a 7-day window. This dynamic threshold is expressed as $\alpha _t = \alpha _0 + c\sigma _t$ where $\alpha _t$ represents the threshold at time $t$, $\alpha _0$ is the base threshold value reflecting the initial sensitivity to anomalies determined from the data, $\sigma _t$ is the rolling standard deviation of the data over the specified window, and $c$ is a scaling factor that adjusts the contribution of $\sigma _t$, which in this study is set to 0.001 after tuning using random search. To ensure generalizability and avoid overfitting to region-specific patterns, the algorithm and parameters, including $\alpha _0$ and $c$, were applied uniformly across all regions without any region-specific optimization.
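The two conditions of Equation (1), together with the dynamic threshold, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the study's implementation: the function names are hypothetical, the rolling standard deviation is assumed to be the sample standard deviation over the 7 days ending on day $t$, and the base threshold `alpha0=1.5` in the example is an arbitrary placeholder because the tuned value of $\alpha _0$ is not reported here.

```python
from statistics import mean, stdev


def dynamic_threshold(x, t, alpha0, c=0.001, window=7):
    """alpha_t = alpha0 + c * sigma_t.

    Assumption: sigma_t is the sample standard deviation of the
    `window` days ending on day t (the text does not fix this detail).
    """
    return alpha0 + c * stdev(x[t - window + 1 : t + 1])


def is_collective_anomaly(x, t, alpha0, c=0.001):
    """Evaluate both conditions of Equation (1) for day t."""
    if t < 13:  # X_{t-7-i} with i up to 6 needs 13 earlier days
        return False
    alpha_t = dynamic_threshold(x, t, alpha0, c)
    # Weekly average condition: mean of X_{t-1} ... X_{t-7}.
    weekly = x[t] > alpha_t * mean(x[t - 7 : t])
    # Sequential condition: for some k in 1..7, each of the last k days
    # exceeds the same weekday one week earlier by a factor 1 + alpha_t / k.
    sequential = any(
        all(x[t - i] > (1 + alpha_t / k) * x[t - 7 - i] for i in range(k))
        for k in range(1, 8)
    )
    return weekly and sequential


def first_collective_anomaly(x, alpha0, c=0.001):
    """Return the index of the first flagged day, or None."""
    for t in range(len(x)):
        if is_collective_anomaly(x, t, alpha0, c):
            return t
    return None


# Hypothetical daily call counts: three flat weeks, then a sustained rise.
calls = [10] * 21 + [40, 45, 50]
print(first_collective_anomaly(calls, alpha0=1.5))
```

With these made-up counts the first flagged day is index 21, the first day of the rise, while a completely flat series is never flagged.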

Seasonal factors such as school breaks, national holidays, and vacation periods can influence telehealth call volumes and healthcare-seeking behavior. While the main analysis did not explicitly adjust for seasonality in order to avoid excluding meaningful anomalies coinciding with seasonal patterns, we additionally implemented a complementary seasonality adjustment as:

$$\begin{aligned} (X_t > \alpha _t \cdot \bar{X}_{\text {last\_week}}) \quad \text {and} \quad \bigvee _{k=1}^{7} \left( \bigwedge _{i=0}^{k-1} \left( X_{t-i} > \left( 1+\frac{\alpha _t}{k} \right) X_{t-7-i} \right) \right) \quad \text {and} \quad \bigvee _{k=1}^{7} \left( \bigwedge _{i=0}^{k-1} \left( X_{t-i} > \left( 1+\frac{\alpha _t}{k} \right) X_{t-i-365} \right) \right) \end{aligned} \tag{2}$$

where the last condition compares the current call volumes with those from the same period in the previous year. This approach balances sensitivity to both seasonal and outbreak-driven changes in call patterns.
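The seasonality-adjusted rule in Equation (2) adds a third condition that reuses the sequential comparison with a 365-day lag. The sketch below illustrates this under simplifying assumptions: the function names are hypothetical, and $\alpha _t$ is passed in as a fixed number rather than computed from the rolling standard deviation.

```python
from statistics import mean


def exceeds_lagged_sequence(x, t, alpha_t, lag):
    """True if, for some k in 1..7, each of the last k days exceeds the
    day `lag` days earlier by a factor of 1 + alpha_t / k."""
    return any(
        all(x[t - i] > (1 + alpha_t / k) * x[t - lag - i] for i in range(k))
        for k in range(1, 8)
    )


def is_anomaly_seasonal(x, t, alpha_t):
    """Evaluate the three conditions of Equation (2) for day t."""
    if t < 365 + 6:  # X_{t-i-365} with i up to 6 needs 371 earlier days
        return False
    weekly = x[t] > alpha_t * mean(x[t - 7 : t])
    return (
        weekly
        and exceeds_lagged_sequence(x, t, alpha_t, lag=7)
        and exceeds_lagged_sequence(x, t, alpha_t, lag=365)
    )


# Hypothetical series: a spike on day 380 with a quiet previous year ...
quiet_last_year = [10] * 400
quiet_last_year[380] = 40
# ... and the same spike when the previous year had an equal peak.
peak_last_year = list(quiet_last_year)
peak_last_year[15] = 40

print(is_anomaly_seasonal(quiet_last_year, 380, alpha_t=1.5))
print(is_anomaly_seasonal(peak_last_year, 380, alpha_t=1.5))
```

In this toy example the spike is flagged only when the same period of the previous year was quiet, which is the behavior the additional condition is meant to provide.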

While the proposed method focuses on developing an explainable data-driven algorithm for detecting collective anomalies, we also employ a z-score-based anomaly detector³⁹ to identify significant temporal deviations, providing a baseline for comparison. This method uses a 30-day rolling window to calculate the mean and standard deviation. The z-score is computed by subtracting the rolling mean from the current value and dividing by the rolling standard deviation. Anomalies are flagged when the z-score exceeds a predefined threshold, indicating a significant deviation from the mean. Due to the data’s high variability, an appropriate threshold was selected to balance sensitivity and specificity. The first anomaly date for each region is recorded and presented in the results for comparison with our algorithm.
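A rolling z-score baseline of this kind might look as follows. The sketch makes several assumptions that the text leaves open: the 30-day window is taken to be the 30 days before the current day, only upward deviations are flagged, and the threshold of 3.0 is a placeholder rather than the value used in the study. The function name is hypothetical.

```python
from statistics import mean, stdev


def first_zscore_anomaly(x, window=30, threshold=3.0):
    """Return the first day whose z-score against the preceding `window`
    days exceeds `threshold`, or None if no day qualifies."""
    for t in range(window, len(x)):
        past = x[t - window : t]
        sigma = stdev(past)
        if sigma == 0:
            # z-score is undefined for a constant window; skipped here.
            continue
        z = (x[t] - mean(past)) / sigma
        if z > threshold:
            return t
    return None


# Hypothetical counts: 40 days alternating between 10 and 12, then a jump.
calls = [10, 12] * 20 + [40]
print(first_zscore_anomaly(calls))
```

Unlike the collective-anomaly rule, this baseline looks at one day at a time, so a single isolated spike is enough to trigger it.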

Community spread and severe outcomes: estimation of date of onset

To assess the temporal relationship between anomalies in call data and community spread, we first identify key event dates for each region. As community spread refers to widespread disease transmission without identifiable sources like travel, we report the first COVID-19 confirmed and hospitalized cases (ICD-10 codes U07.1 and U07.2) infected in Sweden for each region, as shown in Table 2. We also provide additional dates, including the first confirmed and hospitalized cases, and the first death, regardless of the infection source (which might be linked to travel) in each region. It should be noted that although the first case in Sweden was reported in the media on January 31 in Jönköping, in our dataset it is registered on February 4 in the same location. Moreover, COVID-19 hospitalizations or deaths in a region occurring markedly before the first confirmed case were regarded as registration errors and thus removed (three cases in Stockholm and one case in Skåne).

As a complementary approach, we also estimated the date of onset of community spread based on the daily number of confirmed SARS-CoV-2 infections in each region, and the date of onset of severe outcomes based on hospitalizations and deaths.

Specifically, we define the onset of community spread in a region as the point when the cumulative number of confirmed cases, hospitalizations, or deaths caused by COVID-19 reaches a specified threshold for that region. In this study, we apply relative and absolute thresholds based on regional population sizes to define community spread, using data from 2020. The relative threshold for defining significant outbreak activity is set at 25 confirmed cases per 100,000 people (cumulative over the entire pandemic)⁴⁰. Alternatively, an absolute threshold is calculated from the population of the median-sized region, i.e., Gävleborg (287,382). Applying the same rate (25/100,000) to this population gives 71.8 cases, which is rounded upward to 75 as the absolute threshold. Thresholds for hospitalizations and deaths follow the same logic, with relative and absolute thresholds set accordingly.
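The threshold-based onset definition can be illustrated with a short sketch. The function names and the daily counts are hypothetical; only the rate of 25 per 100,000, the Gävleborg population of 287,382, and the absolute threshold of 75 come from the text.

```python
from itertools import accumulate


def relative_threshold(population, rate_per_100k=25):
    """Cumulative case count that corresponds to the rate for a region."""
    return population * rate_per_100k / 100_000


def onset_day(daily_counts, threshold):
    """First day index on which the cumulative count reaches the
    threshold, or None if it is never reached."""
    for day, total in enumerate(accumulate(daily_counts)):
        if total >= threshold:
            return day
    return None


# Rate applied to the median-sized region (Gävleborg).
print(relative_threshold(287_382))

# Hypothetical daily confirmed cases, checked against the absolute
# threshold of 75 cases used in the text.
daily_cases = [0, 10, 20, 30, 40]
print(onset_day(daily_cases, threshold=75))
```

The same two functions apply unchanged to hospitalizations and deaths once the corresponding thresholds are supplied.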

In the Results section, we present trend analyses to examine changes in calls to the 1177 helpline before the pandemic (2019) and after its onset (2020), along with a referral priority analysis to evaluate shifts in nurse-assigned urgency levels over the same period. We also assess the correlation between the timing of the first anomalies in call data and the onset of community spread and severe outcomes, using Spearman’s rank correlation coefficients. Additionally, we report the median difference between the first anomaly and key public health milestones.
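The two summary statistics mentioned here, Spearman's rank correlation and the median time difference with its interquartile range, can be computed as sketched below. The function names and the five example regions are hypothetical, and the quartile convention (linear interpolation, via `method="inclusive"`) is an assumption, since the text does not state which one was used.

```python
from statistics import mean, median, quantiles


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def lead_time_summary(anomaly_day, event_day):
    """Median and IQR of (event day - anomaly day) across regions."""
    diffs = [e - a for a, e in zip(anomaly_day, event_day)]
    q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
    return median(diffs), (q1, q3)


# Hypothetical day-of-year values for five regions.
anomaly_day = [54, 58, 60, 63, 65]
event_day = [60, 70, 66, 75, 80]
print(spearman(anomaly_day, event_day))
print(lead_time_summary(anomaly_day, event_day))
```

A positive median difference means the event occurred after the call anomaly, which is how the lead times in Table 2 are read.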

Table 2 Summary table for the important dates and statistical analysis of first detected anomalies in call data compared to other important dates in Sweden in 2020.


Results

Data and trend analysis

Our analysis of the temporal trends in symptom-related calls to the 1177 helpline in Sweden includes a comparison of the daily call patterns for selected COVID-19 symptoms in 2019 and 2020. Figure 1 illustrates the daily call numbers per million population for COVID-19-related symptoms (breathing difficulties, cough, and fever), and shows a striking contrast between the years 2019 and 2020. In 2019, the number of calls with the selected symptoms remained almost stable throughout the year. In 2020, however, there was a notable spike in calls during the early weeks of the pandemic (weeks 11 to 14, the first wave of the outbreak). This surge corresponds with the onset of COVID-19 in Sweden (as well as with the influenza epidemic and human metapneumovirus) and illustrates significant shifts in symptom reporting patterns.

Referral priority analysis

The frequency of the three referral priority categories (Emergency care, Primary care, and Wait) over the study period is shown in Fig. 2a,b. The figures show how urgency in healthcare-seeking behavior varied during the first year of the pandemic for the selected symptoms and highlight that the number of calls by referral priority shifted markedly for all three categories from 2019 to 2020. Figure 3 shows the daily call numbers per million population categorized by referral priority for breathing difficulties, cough, and fever, highlighting an increase in Wait-category calls for cough and fever among adults.

Anomaly detection and correlation analysis

The COVID-19 outbreak reached different regions of Sweden at different time points between February 23, 2020, and March 11, 2020, according to the anomaly detection algorithm (Fig. 4a). Using the z-score method alone led to only marginal changes in the estimated outbreak pattern, which is illustrated in Fig. 4b. Figure 5a,b show the variable and the ratios from Equation (1) for Uppsala (R3) and Södermanland (R4), providing two examples of how our algorithm detects the first collective anomaly in the time series data in 2020.

The relationship between the timing of the first anomaly in call data and the date of the first confirmed and hospitalized cases infected in Sweden in each region in 2020 is illustrated in Fig. 6a,b. The scatter plot shows individual data points on the y-axis in relation to the corresponding first anomaly in the call data (on the x-axis). The black dashed line indicates where the values would be equal, serving as a reference to compare whether anomalies in call data occur before or after the events in each region. Spearman correlation coefficients further highlight the observed correlations. In certain regions, such as R25, anomalies were detected substantially earlier than the first confirmed COVID-19 case. While this may suggest a false positive, it is also plausible that these signals captured early, undetected transmission, particularly given the limited and delayed testing capacity during the initial phase of the pandemic.

The key data for each region, including population, the date of the first detected anomaly in call data, the date of the first confirmed case, first confirmed case infected in Sweden, first hospitalized case, first hospitalized case infected in Sweden, and first death as well as the estimated dates of onset of the community spread and severe outcomes based on different data sources in this study are summarized in Table 2 and Appendix Table A1. They present the statistical analysis, including correlation coefficients with 95% confidence intervals and median differences with interquartile ranges, comparing the first detected anomalies in call data to other important dates. Almost exclusively, the first confirmed and hospitalized cases infected in Sweden, and the first death occurred later than anomalies detected in the 1177 helpline data, with medians of 7 days (IQR: 2.5, 10.5), 13 days (IQR: 7.25, 16), and 22 days (IQR: 15.25, 26.75), respectively. Figure 7 also provides a visual timeline of early COVID-19 events across Swedish regions, complementing the data in Table 2 and aiding regional comparison.

As can be seen in Table 2, for both infections and hospitalizations, the anomaly signal correlated more strongly with the first case than with the first case of Swedish origin. This pattern is expected, as people usually call soon after the onset of symptoms, regardless of the origin of the infection. Regarding community spread (Appendix Table A1), the anomaly signal aligns more closely with absolute thresholds for community spread than with relative thresholds, and it shows stronger correlations with confirmed cases and hospitalizations than with deaths. Furthermore, the strongest correlations observed in Tables 2 and A1 are of similar magnitude, supporting the robustness of the anomaly signal as an early proxy for emerging community spread for both sources of validation data.

Standard ROC analysis is not directly applicable in our setting, not only due to the absence of a well-defined ground truth, but also because we are estimating a time point and a time difference (in days) rather than a binary outcome. Instead, the reported IQR of the time difference versus the anomaly signal can be used to describe the variability in prediction error across regions. IQR thus serves as an indication of how well the anomaly signals in call data could have predicted the occurrence of COVID-19-related events across regions.

For instance, in Table 2 under the column for “first hospitalized case infected in Sweden”, the median time difference suggests that the first case infected in Sweden requiring hospitalization can be predicted to occur 13 days after the anomaly signal. Additionally, we can interpret the IQR (7.25 to 16) as the variability in the prediction error across regions. This suggests that in 50% of regions, the actual event (hospitalization) occurred between 6 days earlier (-6) and 3 days later (+3) than the date predicted from the anomaly signal.
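The prediction-error bounds quoted above follow from subtracting the median lead time from the two quartiles reported in Table 2:

$$\begin{aligned} Q_1 - \text {median} &= 7.25 - 13 = -5.75 \approx -6 \text { days} \\ Q_3 - \text {median} &= 16 - 13 = +3 \text { days} \end{aligned}$$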

To assess the impact of seasonal effects, we performed a complementary analysis as mentioned in the method development part. Two regions were excluded due to a lack of data. For the remaining 16 regions with sufficient historical data, the adjustment yielded largely consistent anomaly detection dates, with only one region (Uppsala, R3) showing a one-day difference in the first detected anomaly.

The estimated onset of community spread, based on relative and absolute confirmed cases, as well as the onset of severe outcomes based on relative and absolute hospitalizations and deaths, occurs considerably later than when anomalies in helpline calls are used (Appendix Figure A1).

To enhance the contextual understanding of the data, we have included a supplementary table (Appendix Table A2) presenting the average daily number of calls per region related to selected COVID-19 symptoms (cough, fever, and breathing difficulties) in 2019 and 2020. The table also reports regional disposable income for the same years as a proxy for socioeconomic status. The data indicate that average disposable income varies only modestly across regions, suggesting limited regional socioeconomic disparities. In the Swedish context, socio-economic differences are mainly observed within different parts of the regions and even within cities while regions themselves are relatively comparable.

Discussion

The findings of the present study highlight the potential of monitoring 1177 helpline calls as an early indicator of infectious disease outbreaks. According to our proposed monitoring algorithm, the emerging COVID-19 pandemic in 2020 led to a surge in influenza-like symptoms at different time points in different regions in Sweden, starting on February 23, 2020. This surge was present in all included regions 17 days later, on March 11, 2020. At the regional level, the algorithm signaled community spread a median of one week before the first officially confirmed case of domestic origin and a median of almost two weeks before the first domestic case was hospitalized. This finding aligns with a recent study by Dyrdak et al.⁴¹, who used stored respiratory samples and genome sequencing, and found that sustained community transmission in Sweden began at least a week earlier than previously recognized.

A major strength of the present study was that the available data, covering 18 out of 21 Swedish regions and 72% of the population, comes from a nationally implemented telehealth system (a service that is generally accessible to the public at no cost). Another strength was the granularity in the available individual-level data on symptoms, referral priority, confirmed cases, their likely origin, hospitalizations and deaths, which facilitated the estimation of spatiotemporal differences in disease transmission dynamics.

A limitation of using call data for anomaly detection is that citizens tend to use telehealth services more frequently in the event of a health crisis, such as a pandemic⁴². This would make the anomalies in call data more prone to noise and thereby dilute the association with the actual spread of the disease in society. Additionally, the absence of detailed data on media coverage and public announcements limits our ability to distinguish between increased calls due to awareness and those due to actual disease spread. Moreover, some studies have shown that clinical cases tend to represent a more consistent proportion of total cases compared to those reported through self-reported digital apps⁴³. Specifically, clinical cases demonstrated the highest correlation with unbiased household survey data, highlighting their reliability as a timely epidemic indicator. In contrast, self-reported data from digital apps, while also strongly correlated, captured a less consistent proportion of cases. Another important limitation was that call data from the largest region (Stockholm) were not available, especially since this region was affected earlier and more severely by the first wave of the pandemic⁴⁴.

Still, syndromic surveillance can be a helpful tool in detecting, monitoring, and managing public health events of concern⁴⁵. For example, the NHS Direct syndromic surveillance system has played a pivotal role in detecting and evaluating the impact of various health crises in the UK, including the 2009 influenza pandemic, the widespread flooding in 2007, and the ash cloud resulting from the Eyjafjallajökull volcanic eruption in Iceland in 2010¹¹. Data from the 1177 helpline holds promise for syndromic surveillance in Sweden^24,25,26,27, such as through time series anomaly detection (TAD) that can enable timely and accurate detection in surveillance systems. While several TAD methods have been proposed^46,47, the lack of a universally accepted definition of anomalies, along with challenges like dynamic changes in normal behavior, and the difficulty of generalizing across different domains, pose significant obstacles^47,48. Implementing data-driven approaches offers a promising solution to these challenges. The anomaly detection algorithm in this paper is a data-driven approach designed to identify both collective and point-wise anomalies. The z-score method produces results similar to our algorithm, which is expected in the context of COVID-19. However, we have tailored the more complex criteria to detect sequential outlier patterns. One should also consider that an important strength of anomaly detection approaches is their applicability to real-time surveillance. Although the present study used historical data, the analytical process was designed for prospective application, continuously assessing each new day’s data against recent trends using predefined rules to enable dynamic, real-time monitoring.

Variations in healthcare-seeking behavior and telehealth accessibility, shaped by factors such as demographics, healthcare infrastructure, and regional policies, can also influence the performance of anomaly detection. In this study, we used regional disposable income as a proxy for socioeconomic deprivation and observed relatively limited variability across Swedish regions. However, in settings with greater socioeconomic heterogeneity, incorporating deprivation indices would be crucial to assess and ensure the robustness of the anomaly detection method across diverse population groups. To account for such variation, the algorithm could be adapted in future applications by stratifying the data and computing thresholds separately for different socioeconomic strata or by including deprivation indices as covariates in a hierarchical modeling framework. These modifications would help the method adjust for structural differences in baseline call behavior related to access to care, risk perception, or health literacy.

Our findings suggest that COVID-19 was likely seeded at multiple locations in Sweden around the same time, but nevertheless escalated at different speeds across regions. The timing of this escalation exhibited a clear inverse association with population size: our algorithm signaled anomalies earlier in more populated regions. This may reflect a larger initial inflow of the virus from abroad in those regions⁴⁹, but may also reflect the importance of population density as a determinant of the spread of emerging diseases⁵⁰. The observed pattern, where various areas experienced seeding simultaneously but with transmission escalating at different rates, is difficult to capture with diffusion models for spatio-temporal analyses⁵¹, which assume a more uniform spread. While methods like genome sequencing are also valuable for tracking the virus⁵², approaches such as ours, based on helpline call data, provide real-time insights into the spatio-temporal spread and are thus a cost-effective means for early detection of community transmission.

The Swedish Public Health Agency communicated on March 4, 2020, that all confirmed cases in Sweden were still linked to travelling⁵³. Clear signs of community spread were first officially observed on March 10 in the two largest regions, Stockholm and Västra Götaland, but no general spread was yet noted for the rest of the country⁵⁴. For comparison, our monitoring algorithm had signaled symptom anomalies in 9 regions by March 4, in 13 regions by March 10, and in the remaining 5 regions the next day. The delay in the official reporting of community spread was most likely due to the limited testing for SARS-CoV-2, which during the early phase of the pandemic in Sweden was restricted to suspected cases linked to travelling and to people with symptoms seeking hospital care. A less restrictive testing policy could have led to earlier identification of cases of domestic origin. However, our findings suggest that syndromic surveillance could have monitored disease transmission and detected community spread earlier, even without extensive testing.

The present study was part of the larger SWECOV project (Swedish Register-based Research Program on COVID-19), with cross-linked register data for the full population of Sweden during the pandemic years⁵⁵. This data infrastructure holds considerable promise for further research. Future studies in this setting could, for example, increase the spatio-temporal granularity down to neighborhood or city areas, incorporate individual-level data on socioeconomic conditions, comorbidities, and health outcomes, integrate advanced machine learning models into disease surveillance, and include additional validation of our signal against other indicators of community spread, such as⁴¹. In conclusion, health helpline data can enhance public health surveillance by enabling early outbreak detection and monitoring of spatio-temporal disease transmission. It is especially valuable in resource-limited settings for planning healthcare needs, protecting risk groups, and limiting disease spread.

Data availability

While some of the datasets analyzed in this study are publicly available, the telehealth call data remain restricted due to ethical considerations under the SWECOV project. Researchers interested in access may contact the SWECOV project coordinator for additional information. Requests for the data from this study should be directed to Dominik Dietler.

References

Delatorre, E., Mir, D., Gräf, T. & Bello, G. Tracking the onset date of the community spread of SARS-CoV-2 in western countries. Mem. Inst. Oswaldo Cruz 115, e200183 (2020).
Olsen, S. J. et al. Early introduction of severe acute respiratory syndrome coronavirus 2 into Europe. Emerg. Infect. Dis. 26(7), 1567 (2020).
Pashakhanlou, A. H. Sweden’s coronavirus strategy: The Public Health Agency and the sites of controversy. World Med. Health Policy 14(3), 507–527 (2022).
Bylund, P. L. & Packard, M. D. Separation of power and expertise: evidence of the tyranny of experts in Sweden’s COVID-19 responses. South. Econ. J. 87(4), 1300–1319 (2021).
Li, G., Hilgenfeld, R., Whitley, R. & Clercq, E. Therapeutic strategies for COVID-19: progress and lessons learned. Nat. Rev. Drug Discov. 22(6), 449–475 (2023).
Smith, R. W. et al. Centralization and integration of public health systems: Perspectives of public health leaders on factors facilitating and impeding COVID-19 responses in three Canadian provinces. Health Policy 127, 19–28 (2023).
Nsubuga, P., White, M. E., Thacker, S. B., Anderson, M. A., Blount, S. B., Broome, C. V., Chiller, T. M. et al. Public health surveillance: a tool for targeting and monitoring interventions (2011).
Güemes, A. et al. A syndromic surveillance tool to detect anomalous clusters of COVID-19 symptoms in the United States. Sci. Rep. 11(1), 4660 (2021).
Meyer, N., McMenamin, J., Robertson, C., Donaghy, M., Allardice, G. & Cooper, D. A multi-data source surveillance system to detect a bioterrorism attack during the G8 Summit in Scotland. Epidemiol. Infect. 136(7), 876–885 (2008).
Rolland, E., Moore, K. M., Robinson, V. A. & McGuinness, D. Using Ontario’s ‘Telehealth’ health telephone helpline as an early-warning system: a study protocol. BMC Health Serv. Res. 6, 1–7 (2006).
Elliot, A. J. et al. Syndromic surveillance to assess the potential public health impact of the Icelandic volcanic ash plume across the United Kingdom, April 2010. Eurosurveillance 15(23), 19583 (2010).
Peccia, J., Zulli, A., Brackney, D. E., Grubaugh, N. D., Kaplan, E. H., Casanovas-Massana, A., Ko, A. I. et al. SARS-CoV-2 RNA concentrations in primary municipal sewage sludge as a leading indicator of COVID-19 outbreak dynamics. medRxiv 2020-05 (2020).
Kennedy, B. et al. App-based COVID-19 syndromic surveillance and prediction of hospital admissions in COVID Symptom Study Sweden. Nat. Commun. 13(1), 2110 (2022).
Varsavsky, T. et al. Detecting COVID-19 infection hotspots in England using large-scale self-reported data from a mobile application: a prospective, observational study. The Lancet Public Health 6(1), e21–e29 (2021).
Canas, L. S. et al. Early detection of COVID-19 in the UK using self-reported symptoms: a large-scale, prospective, epidemiological surveillance study. The Lancet Digital Health 3(9), e587–e598 (2021).
Menni, C. et al. Real-time tracking of self-reported symptoms to predict potential COVID-19. Nat. Med. 26(7), 1037–1040 (2020).
Fry, R. et al. Real-time spatial health surveillance: mapping the UK COVID-19 epidemic. Int. J. Med. Inf. 149, 104400 (2021).
Ginsberg, J. et al. Detecting influenza epidemics using search engine query data. Nature 457(7232), 1012–1014 (2009).
Struyf, T., Deeks, J. J., Dinnes, J., Takwoingi, Y., Davenport, C., Leeflang, M. M. G. et al. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Datab. Syst. Rev. 5 (2022).
National Telehealth Service (Inera). Available at: https://www.inera.se/tjanster/1177/.
1177. När du ringer 1177 [When you call 1177] (Region Stockholm). Available at: https://www.1177.se/Stockholm/om-1177/nar-du-ringer-1177/nar-du-ringer-1177/.
Martin, L. J., Kühlmann-Berenzon, S., Azerkan, F. & Bjelkmar, P. Comparing healthcare needs by language: interpreted Arabic and Somali telehealth calls in two regions of Sweden, 2014–18. Eur. J. Pub. Health 34(3), 537–543 (2024).
Dahlgren, K. et al. The use of a Swedish telephone medical advice service by the elderly-a population-based study. Scand. J. Prim. Health Care 35(1), 98–104 (2017).
Bjelkmar, P. et al. Early outbreak detection by linking health advice line calls to water distribution areas retrospectively demonstrated in a large waterborne outbreak of cryptosporidiosis in Sweden. BMC Public Health 17, 1–10 (2017).
Martin, L. J., Hjertqvist, M., Straten, E. & Bjelkmar, P. Investigating novel approaches to tick-borne encephalitis surveillance in Sweden, 2010–2017. Ticks and Tick-Borne Diseases 11(5), 101486 (2020).
Ma, T., Englund, H., Bjelkmar, P., Wallensten, A. & Hulth, A. Syndromic surveillance of influenza activity in Sweden: an evaluation of three tools. Epidemiol. Infect. 143(11), 2390–2398 (2015).
Andersson, T. et al. Syndromic surveillance for local outbreak detection and awareness: evaluating outbreak signals of acute gastroenteritis in telephone triage, web-based queries and over-the-counter pharmacy sales. Epidemiol. Infect. 142(2), 303–313 (2014).
Jaya, I. G., Mindra, N., Folmer, H. & Lundberg, J. A joint Bayesian spatiotemporal risk prediction model of COVID-19 incidence, IC admission, and death with application to Sweden. Ann. Reg. Sci. 72(1), 107–140 (2024).
Han, B., Cronie, O., Adiels, M., Rosengren, A. & Söderberg, M. The Influence of Overcrowding and Socioeconomy on the Spatio-temporal Spread of Covid-19 – a Swedish Register Study (2024).
Zoest, V. et al. Spatio-temporal predictions of COVID-19 test positivity in Uppsala County, Sweden: a comparative approach. Sci. Rep. 12(1), 15176 (2022).
Robinson, R. Usability Evaluation of a Health Web Portal: Case Study 1177.se in Sweden. (2018).
Thorstensson, B. Clinical Decision Support Systems in Context: Benefits, Challenges and Future Recommendations for Implementation in Rural Uganda. (2010).
Folkhälsomyndigheten (The Public Health Authority). Available at: https://www.folkhalsomyndigheten.se/.
Guan, W. et al. China medical treatment expert group for Covid-19. Clinical characteristics of coronavirus disease 2019 in China. N. Engl. J. Med. 382(18), 1708–1720 (2020).
Hansen, E. H. & Hunskaar, S. Understanding of and adherence to advice after telephone counselling by nurse: a survey among callers to a primary emergency out-of-hours service in Norway. Scand. J. Trauma Resuscit. Emerg. Med. 19, 1–8 (2011).
Statistikmyndigheten (Statistics Sweden). Population by region and year. Available at: https://www.statistikdatabasen.scb.se/pxweb/sv/ssd/START__BE__BE0101__BE0101A/BefolkningNy/.
Toledano, M., Cohen, I., Ben-Simhon, Y., & Tadeski, I. Real-time anomaly detection system for time series at scale. In KDD 2017 Workshop on Anomaly Detection in Finance, pp. 56-65. PMLR, 2018.
Liu, Q., Boniol, P., Palpanas, T. & Paparrizos, J. Time-Series Anomaly Detection: Overview and New Trends. Proc. VLDB Endow. (PVLDB) 17(12), 4229–4232 (2024).
Rousseeuw, P. J. & Hubert, M. Anomaly detection by robust statistics. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8(2), e1236 (2018).
Blanford, J. I., Jong, N. B., Schouten, S. E., Friedrich, A. W. & Araújo-Soares, V. Navigating travel in Europe during the pandemic: from mobile apps, certificates and quarantine to traffic-light system. J. Travel Med. 29(3), taac006 (2022).
Dyrdak, R. et al. Early unrecognised SARS-CoV-2 introductions shaped the first pandemic wave, Sweden, 2020. Eurosurveillance 29(41), 2400021 (2024).
Berg, J. & Wretborn, J. Impact of the COVID-19 pandemic on the National Telehealth Service for triage and referral in Sweden: a national retrospective observational study. BMJ Open 14(12), e091627 (2024).
Brainard, J. et al. Comparison of surveillance systems for monitoring COVID-19 in England: a retrospective observational study. Lancet Public Health 8(11), e850–e858 (2023).
Drefahl, S. et al. A population-based cohort study of socio-demographic risk factors for COVID-19 deaths in Sweden. Nat. Commun. 11(1), 5097 (2020).
Elliot, A. J. et al. From Fax to Secure File Transfer Protocol: The 25-Year Evolution of Real-Time Syndromic Surveillance in England. J. Med. Internet Res. 26, e58704 (2024).
Hashemi, A. S., Ghazani, M. M., Ohlsson, M., Björk, J., & Dietler, D. Surveillance of Disease Outbreaks Using Unsupervised Uni-Multivariate Anomaly Detection of Time-Series Symptoms. In Digital Health and Informatics Innovations for Sustainable Health Care Systems, pp. 1916–1920. IOS Press, (2024).
Chandola, V., Banerjee, A. & Kumar, V. Anomaly detection: A survey. ACM Comput. Surveys (CSUR) 41(3), 1–58 (2009).
Schmidl, S., Wenig, P. & Papenbrock, T. Anomaly detection in time series: a comprehensive evaluation. Proc. VLDB Endow. 15(9), 1779 (2022).
Björk, J., Mattisson, K. & Ahlbom, A. Impact of winter holiday and government responses on mortality in Europe during the first wave of the COVID-19 pandemic. Eur. J. Pub. Health 31(2), 272–277 (2021).
Hazarie, S., Soriano-Paños, D., Arenas, A., Gómez-Gardeñes, J. & Ghoshal, G. Interplay between population density and mobility in determining the spread of epidemics in cities. Commun. Phys. 4(1), 191 (2021).
Cao, W. et al. Spatial-temporal diffusion model of aggregated infectious diseases based on population life characteristics: a case study of COVID-19. MBE 20(7), 13086–13112 (2023).
Geoghegan, J. L. et al. Use of genomics to track coronavirus disease outbreaks, New Zealand. Emerg. Infect. Dis. 27(5), 1317 (2021).
Folkhälsomyndigheten (The Public Health Authority). COVID-19 pandemic timeline. Available at: https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/utbrottsarkiv/covid-19-pandemin-2019-2023/nar-hande-vad-under-pandemin/.
Olofsson, T. & Vilhelmsson, A. Dataset: COVID-19 epidemic policy and events timeline (Sweden). Data Brief 40, 107698 (2022).
Altmejd, A., Östergren, O., Björkegren, E. & Persson, T. Inequality and COVID-19 in Sweden: Relative risks of nine bad life events, by four social gradients, in pandemic vs. prepandemic years. Proc. Natl. Acad. Sci. 120(46), e2303640120 (2023).


Acknowledgements

We acknowledge the SWECOV project, hosted by Stockholm University and financed by Riksbankens Jubileumsfond (RIK21-0004). Ethical permission was granted by the Swedish Ethical Review Authority (permit numbers 2020-06492, 2021-01115, 2022-01355-02, 2022-06118-02, and 2024-02342-02).

Funding

Open access funding provided by Lund University. This work is funded by grants from the Lars Mikael Karlsson Foundation (LMK-stiftelsen), from the Swedish Research Council (VR; dnr 2022-06358), and from the Knut and Alice Wallenberg Foundation to SciLifeLab for research in Data-driven Life Science, DDLS (KAW 2024.0159).

Author information

Authors and Affiliations

Division of Occupational and Environmental Medicine, Department of Laboratory Medicine, Faculty of Medicine, Lund University, Lund, Sweden
Atiye Sadat Hashemi, Dominik Dietler & Jonas Björk
Molecular Epidemiology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
Tove Fall
Section for Infection Medicine, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
Malin Inghammar
Department of Hospital Hygiene and Infection Prevention and Control, Lund, Region Skåne, Sweden
Malin Inghammar
Department of Clinical Microbiology and Molecular Infection Medicine, Umeå University, Umeå, Sweden
Anders F. Johansson
School of Public Health and Community Medicine, Institute of Medicine, Center for Societal Risk Research, University of Gothenburg, Karlstad University, Karlstad, Sweden
Carl Bonander
Center for Environmental and Climate Science (CEC), Faculty of Science, Lund University, Lund, Sweden
Mattias Ohlsson
Clinical Studies Sweden, Forum South, Skåne University Hospital, Lund, Sweden
Jonas Björk

Authors

Atiye Sadat Hashemi
Dominik Dietler
Tove Fall
Malin Inghammar
Anders F. Johansson
Carl Bonander
Mattias Ohlsson
Jonas Björk

Contributions

A.S.H.: Conceptualization, methodology, formal analysis, data curation, development of the analysis, and writing (original draft preparation). J.B. and D.D.: Conceptualization, investigation, data curation, development of the analysis and experiments, and result interpretation. T.F., M.I., A.J., C.B., and M.O.: Analysis and interpretation of the results. All authors reviewed and edited the final manuscript.

Corresponding author

Correspondence to Atiye Sadat Hashemi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hashemi, A.S., Dietler, D., Fall, T. et al. Time series anomaly detection in helpline call trends for early detection of COVID-19 spread across Sweden, 2020. Sci Rep 15, 32701 (2025). https://doi.org/10.1038/s41598-025-20641-2

Received: 17 March 2025
Accepted: 16 September 2025
Published: 24 September 2025
Version of record: 24 September 2025
DOI: https://doi.org/10.1038/s41598-025-20641-2